The Python Mixture Package (PyMix, http://www.pymix.org/pymix/) is a freely available Python library implementing algorithms and data structures for a wide variety of data mining applications with basic and extended mixture models. Features include:
* Finite mixture models of discrete and continuous features
* Wide range of available distributions (Normal, Exponential, Discrete, Dirichlet, Normal-Gamma, Uniform, HMM)
* Bayesian mixture models
* ML and MAP parameter estimation
* Context-specific independence structure learning
* Partially supervised parameter learning
* Parameter estimation for pairwise constrained samples
====== Installing ======
(tested on Ubuntu 12.10)
* Download the newest version from http://www.pymix.org/pymix/index.php?n=PyMix.Download
* Adjust setup.py for Ubuntu:
> vi setup.py
##
# 1.
# replace the line
# from distutils.core import setup, Extension,DistutilsExecError
# with
from distutils.core import setup, Extension
from distutils.errors import DistutilsExecError
##
# 2.
# replace the line
# numpypath = prefix + '/lib/python' +pyvs + '/site-packages/numpy/core/include/numpy' # path to arrayobject.h
# with
numpypath = '/usr/share/pyshared/numpy/core/include/numpy' # path to arrayobject.h
* Build and install:
python setup.py build
sudo python setup.py install --prefix /usr/local/
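The hard-coded numpypath in step 2 is specific to Ubuntu's packaging. As a possible alternative, the header directory can be queried from NumPy itself with numpy.get_include(). The following is a minimal sketch: numpy_header_dir is a hypothetical helper name, not part of PyMix's setup.py, and whether the resulting path works with the PyMix build has not been verified here.

```python
import os

def numpy_header_dir():
    """Return the directory that should contain arrayobject.h,
    or None if NumPy cannot be imported.
    (Hypothetical helper, not part of PyMix.)"""
    try:
        import numpy
    except ImportError:
        return None
    # numpy.get_include() returns the include root;
    # the NumPy headers live in its "numpy" subdirectory.
    return os.path.join(numpy.get_include(), "numpy")

numpypath = numpy_header_dir()
print(numpypath)
```

If the printed path contains arrayobject.h, it could be used as the value of numpypath in setup.py instead of the fixed Ubuntu path.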
====== Clustering ======
The following example follows the tutorial at http://www.pymix.org/pymix/index.php?n=PyMix.Tutorial:
import numpy
import mixture
# create dummy data with speeds of trucks (LKW) and cars (PKW)
raw_data = numpy.array([75 , 80 , 120, 83, 134, 150, 89, 160, 80, 160] )
data = mixture.DataSet()
data.fromArray(raw_data)
# create mixture model
n1 = mixture.NormalDistribution(80,3.0)
n2 = mixture.NormalDistribution(130,10.0)
m = mixture.MixtureModel(2,[0.5,0.5], [n1,n2])
# run the Expectation Maximization (EM) algorithm
m.EM(data, max_iter=40, delta=0.1) # stops after 40 iterations or once the log-likelihood improves by less than delta = 0.1
# show cluster assignment of data
clust = m.classify(data)
print clust
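To illustrate what the EM call does conceptually, here is a minimal pure-Python sketch of EM for a one-dimensional mixture of normal distributions, run on the same dummy data. It is not PyMix's implementation: the function names (normal_pdf, e_step, em_normal_mixture) are made up for this example, the second parameter of each component is treated as a standard deviation, and convergence is assumed to be tested on the change in log-likelihood.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution (sigma = standard deviation)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def e_step(xs, mus, sigmas, weights):
    """Return per-sample component posteriors and the total log-likelihood."""
    resp, log_lik = [], 0.0
    for x in xs:
        joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
        total = sum(joint)
        log_lik += math.log(total)
        resp.append([p / total for p in joint])
    return resp, log_lik

def em_normal_mixture(xs, mus, sigmas, weights, max_iter=40, delta=0.1):
    """Illustrative EM loop; stops after max_iter iterations or when the
    log-likelihood changes by less than delta (assumed criterion)."""
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    prev = None
    for _ in range(max_iter):
        resp, log_lik = e_step(xs, mus, sigmas, weights)
        if prev is not None and abs(log_lik - prev) < delta:
            break
        prev = log_lik
        # M-step: re-estimate each component from its weighted samples
        for j in range(len(mus)):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-6)  # floor avoids division by zero
    # final assignment: most probable component for each sample
    resp, _ = e_step(xs, mus, sigmas, weights)
    labels = [max(range(len(mus)), key=lambda j: r[j]) for r in resp]
    return mus, sigmas, weights, labels

speeds = [75, 80, 120, 83, 134, 150, 89, 160, 80, 160]
mus, sigmas, weights, labels = em_normal_mixture(
    speeds, [80.0, 130.0], [3.0, 10.0], [0.5, 0.5])
print(labels)
```

The sketch works directly with probabilities rather than log-probabilities, which is adequate for this small data set but could underflow for samples far from every component.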