pymix

Last modified: 2013/09/09 13:07 by hkoller (previous revision: 2013/09/09 11:27 by hkoller)
The Python Mixture Package (PyMix, http://www.pymix.org/pymix/) is a freely available Python library implementing algorithms and data structures for a wide variety of data mining applications with basic and extended mixture models. Features include:

  * Finite mixture models of discrete and continuous features
  * Wide range of available distributions (Normal, Exponential, Discrete, Dirichlet, Normal-Gamma, Uniform, HMM)
  * Bayesian mixture models
  * Maximum likelihood (ML) and maximum a posteriori (MAP) parameter estimation
  * Context-specific independence structure learning
  * Partially supervised parameter learning
  * Parameter estimation for pairwise constrained samples
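
For readers new to mixture models: a finite mixture density is a weighted sum of component densities, p(x) = w_1 f_1(x) + ... + w_K f_K(x), where the weights are non-negative and sum to one. The standalone sketch below is plain Python, not PyMix; the function names and the example mixture of one Normal and one Exponential component are hypothetical, chosen only to illustrate the idea.

<code python>
import math

def normal_pdf(mu, sigma):
    """Return the density function of a Normal distribution (mean mu, std. dev. sigma)."""
    def pdf(x):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    return pdf

def exponential_pdf(rate):
    """Return the density function of an Exponential distribution with the given rate."""
    def pdf(x):
        return rate * math.exp(-rate * x) if x >= 0 else 0.0
    return pdf

def mixture_pdf(x, weights, densities):
    """Finite mixture density: sum over components of weight * component density."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must be non-negative and sum to 1")
    return sum(w * f(x) for w, f in zip(weights, densities))

# hypothetical mixture: 70% Normal(5, 1) and 30% Exponential(rate 1)
weights = [0.7, 0.3]
densities = [normal_pdf(5.0, 1.0), exponential_pdf(1.0)]
for x in (0.5, 5.0):
    print("p(%.1f) = %.4f" % (x, mixture_pdf(x, weights, densities)))
</code>

Fitting such a model means estimating the weights and the component parameters from data, which is what the EM call in the clustering example does.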

====== Installing ======
(tested on Ubuntu 12.10)

  * Download the newest version from http://www.pymix.org/pymix/index.php?n=PyMix.Download
  * Adjust setup.py for Ubuntu:

<code>
> vi setup.py

##
# 1.
# replace the line
# from distutils.core import setup, Extension,DistutilsExecError
# with

from distutils.core import setup, Extension
from distutils.errors import DistutilsExecError

##
# 2.
# replace the (indented) line
#   numpypath =  prefix + '/lib/python' +pyvs + '/site-packages/numpy/core/include/numpy'  # path to arrayobject.h
# with
    numpypath = '/usr/share/pyshared/numpy/core/include/numpy' # path to arrayobject.h

</code>
  * Build and install:
<code bash>
python setup.py build
sudo python setup.py install --prefix /usr/local/
</code>
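
The two setup.py edits can also be scripted, which helps when re-applying them to a freshly downloaded release before running the build. This is a hypothetical helper (patch_setup_py and patch_file are not part of PyMix); it assumes the original lines look like the ones quoted in the setup.py comments above and keeps the indentation of the numpypath line. If NumPy's headers live elsewhere on your system, os.path.join(numpy.get_include(), 'numpy') gives the directory containing arrayobject.h and can be passed as numpypath.

<code python>
import re

# Header path used in the Ubuntu instructions; pass another path if needed.
UBUNTU_NUMPY_PATH = '/usr/share/pyshared/numpy/core/include/numpy'

def patch_setup_py(text, numpypath=UBUNTU_NUMPY_PATH):
    """Return the contents of setup.py with both Ubuntu edits applied."""
    # Edit 1: import DistutilsExecError from distutils.errors instead of distutils.core
    text = re.sub(
        r'^from distutils\.core import setup,[ \t]*Extension,[ \t]*DistutilsExecError[ \t]*$',
        'from distutils.core import setup, Extension\n'
        'from distutils.errors import DistutilsExecError',
        text, flags=re.MULTILINE)
    # Edit 2: replace the site-packages based numpypath with a fixed path,
    # keeping the original indentation of that line
    text = re.sub(
        r'^([ \t]*)numpypath[ \t]*=.*site-packages/numpy/core/include/numpy.*$',
        lambda m: "%snumpypath = '%s'  # path to arrayobject.h" % (m.group(1), numpypath),
        text, flags=re.MULTILINE)
    return text

def patch_file(path='setup.py', numpypath=UBUNTU_NUMPY_PATH):
    """Rewrite the given setup.py in place."""
    with open(path) as f:
        original = f.read()
    with open(path, 'w') as f:
        f.write(patch_setup_py(original, numpypath))
</code>

Calling patch_file('setup.py') from the unpacked PyMix source directory applies both edits; running it twice is harmless because neither pattern matches the already patched lines.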
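After installing, a quick check is whether Python can import the package, including the compiled extension that setup.py builds against NumPy's arrayobject.h. The helper below is a hypothetical sketch: it reports the file the module was loaded from, or None if the import fails.

<code python>
import importlib

def locate_module(name):
    """Return the file a module is loaded from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, '__file__', None)

# 'mixture' is the module name imported in the clustering example
location = locate_module('mixture')
if location is None:
    print("mixture is not importable; check that the install prefix is on sys.path")
else:
    print("mixture loaded from " + location)
</code>
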
====== Clustering ======
The following example is based on the PyMix tutorial (http://www.pymix.org/pymix/index.php?n=PyMix.Tutorial). It clusters a small set of vehicle speeds into two groups with a two-component normal mixture model fitted by Expectation Maximization (EM).

<code python>
import numpy
import mixture

# create dummy data: speeds of trucks (LKW) and cars (PKW)
raw_data = numpy.array([75, 80, 120, 83, 134, 150, 89, 160, 80, 160])
data = mixture.DataSet()
data.fromArray(raw_data)

# create a mixture of two normal distributions (mean, standard deviation)
# with equal initial mixture weights
n1 = mixture.NormalDistribution(80, 3.0)
n2 = mixture.NormalDistribution(130, 10.0)
m = mixture.MixtureModel(2, [0.5, 0.5], [n1, n2])

# run the EM algorithm: stops after at most 40 iterations
# or once the log-likelihood changes by less than delta = 0.1
m.EM(data, max_iter=40, delta=0.1)

# show the cluster assignment of each data point
clust = m.classify(data)
print(clust)

</code>
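
To make the EM and classify steps less of a black box, here is a standalone pure-Python sketch of EM for a one-dimensional mixture of normals, run on the same data and starting values as the example above. It is not PyMix's implementation: the function names are hypothetical, and treating delta as a threshold on the change in log-likelihood between iterations is an assumption about how PyMix uses that parameter.

<code python>
import math

def log_normal_pdf(x, mu, sigma):
    """Log-density of a Normal distribution with mean mu and std. dev. sigma at x."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - 0.5 * ((x - mu) / sigma) ** 2

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def em_normal_mixture(data, weights, means, sigmas, max_iter=40, delta=0.1):
    """Fit a 1-D normal mixture by EM, starting from the given parameters."""
    weights, means, sigmas = list(weights), list(means), list(sigmas)
    k = len(weights)
    previous = None
    for _ in range(max_iter):
        # E-step: posterior probability (responsibility) of each component per point
        resp = []
        log_lik = 0.0
        for x in data:
            logs = [math.log(weights[j]) + log_normal_pdf(x, means[j], sigmas[j])
                    for j in range(k)]
            total = log_sum_exp(logs)
            log_lik += total
            resp.append([math.exp(v - total) for v in logs])
        # stop once the log-likelihood changes by less than delta
        if previous is not None and abs(log_lik - previous) < delta:
            break
        previous = log_lik
        # M-step: re-estimate weights, means and standard deviations
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            weights[j] = n_j / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            sigmas[j] = math.sqrt(max(var, 1e-6))  # floor avoids a collapsed component
    return weights, means, sigmas

def classify(data, weights, means, sigmas):
    """Assign each point to the component with the highest posterior probability."""
    labels = []
    for x in data:
        scores = [math.log(weights[j]) + log_normal_pdf(x, means[j], sigmas[j])
                  for j in range(len(weights))]
        labels.append(scores.index(max(scores)))
    return labels

# same data and starting values as the PyMix example
raw_data = [75, 80, 120, 83, 134, 150, 89, 160, 80, 160]
fitted = em_normal_mixture(raw_data, [0.5, 0.5], [80.0, 130.0], [3.0, 10.0])
print(classify(raw_data, *fitted))
</code>

The classification here is a hard assignment to the most probable component, and the labels are component indices, so which group is called 0 depends on the starting means.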
 +