The Python Mixture Package (PyMix, http://www.pymix.org/pymix/) is a freely available Python library implementing algorithms and data structures for a wide variety of data mining applications with basic and extended mixture models. Features include:

  * Finite mixture models of discrete and continuous features
  * Wide range of available distributions (Normal, Exponential, Discrete, Dirichlet, Normal-Gamma, Uniform, HMM)
  * Bayesian mixture models
  * ML and MAP parameter estimation
  * Context-specific independence structure learning
  * Partially supervised parameter learning
  * Parameter estimation for pairwise constrained samples

====== Installing ======

(tested on Ubuntu 12.10)

  * Download the newest version from http://www.pymix.org/pymix/index.php?n=PyMix.Download
  * Adjust setup.py for Ubuntu:

    > vi setup.py
    ##
    # 1.
    # replace the line
    #   from distutils.core import setup, Extension,DistutilsExecError
    # with
    from distutils.core import setup, Extension
    from distutils.errors import DistutilsExecError
    ##
    # 2.
    # replace the line
    #   numpypath = prefix + '/lib/python' +pyvs + '/site-packages/numpy/core/include/numpy' # path to arrayobject.h
    # with
    numpypath = '/usr/share/pyshared/numpy/core/include/numpy' # path to arrayobject.h

  * Build and install:

    python setup.py build
    sudo python setup.py install --prefix /usr/local/

====== Clustering ======

This example follows the tutorial at http://www.pymix.org/pymix/index.php?n=PyMix.Tutorial

    import numpy
    import mixture

    # create dummy data with speeds of trucks (LKW) and cars (PKW)
    raw_data = numpy.array([75, 80, 120, 83, 134, 150, 89, 160, 80, 160])
    data = mixture.DataSet()
    data.fromArray(raw_data)

    # create a mixture model with two normal components and equal weights
    n1 = mixture.NormalDistribution(80, 3.0)
    n2 = mixture.NormalDistribution(130, 10.0)
    m = mixture.MixtureModel(2, [0.5, 0.5], [n1, n2])

    # run the Expectation Maximization algorithm;
    # it stops after 40 iterations or when delta < 0.1
    m.EM(data, max_iter=40, delta=0.1)

    # show the cluster assignment of the data
    clust = m.classify(data)
    print(clust)
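
To make the EM step in the clustering example less of a black box, here is a minimal pure-Python sketch of EM for a two-component, one-dimensional Gaussian mixture on the same speed data. It does not use PyMix. The function names (normal_pdf, log_likelihood, responsibilities, em_gaussian_mixture, classify) are hypothetical, and both the stopping rule (change in log-likelihood below delta) and the reading of the second NormalDistribution argument as a standard deviation are assumptions made for illustration, not a description of PyMix internals.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(data, weights, mus, sigmas):
    """Log-likelihood of the data under the mixture."""
    total = 0.0
    for x in data:
        total += math.log(sum(w * normal_pdf(x, m, s)
                              for w, m, s in zip(weights, mus, sigmas)))
    return total


def responsibilities(data, weights, mus, sigmas):
    """E-step: posterior probability of each component for each data point."""
    resp = []
    for x in data:
        joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
        norm = sum(joint)
        resp.append([p / norm for p in joint])
    return resp


def em_gaussian_mixture(data, weights, mus, sigmas, max_iter=40, delta=0.1):
    """Run EM until max_iter is reached or the log-likelihood gain drops below delta.

    The stopping rule is an assumption; PyMix's own criterion may differ.
    """
    weights, mus, sigmas = list(weights), list(mus), list(sigmas)
    prev_ll = log_likelihood(data, weights, mus, sigmas)
    for _ in range(max_iter):
        resp = responsibilities(data, weights, mus, sigmas)
        # M-step: re-estimate weight, mean and standard deviation per component
        for j in range(len(weights)):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # guard against a collapsing component
        ll = log_likelihood(data, weights, mus, sigmas)
        if abs(ll - prev_ll) < delta:
            break
        prev_ll = ll
    return weights, mus, sigmas


def classify(data, weights, mus, sigmas):
    """Assign each data point to the component with the highest posterior."""
    return [r.index(max(r)) for r in responsibilities(data, weights, mus, sigmas)]


# Same dummy speeds and starting parameters as in the PyMix example
speeds = [75, 80, 120, 83, 134, 150, 89, 160, 80, 160]
weights, mus, sigmas = em_gaussian_mixture(
    speeds, [0.5, 0.5], [80.0, 130.0], [3.0, 10.0])
labels = classify(speeds, weights, mus, sigmas)
print(labels)
```

Label 0 corresponds to the component started at mean 80 (slow vehicles) and label 1 to the component started at mean 130 (fast vehicles). The sketch works on plain floats rather than log-densities, which is fine for ten well-separated points but would need a log-space formulation for larger or more extreme data.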