This presentation covers efficient model selection for high-dimensional data using the minimum message length criterion with a generalized Dirichlet mixture model. It covers theory, implementation, and experimental results.
High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length by Nizar Bouguila and Djemel Ziou Discussion led by Qi An Duke University Machine Learning Group
Outline • Introduction • The generalized Dirichlet mixture • The minimum message length (MML) criterion • Fisher information matrix and priors • Density estimation and model selection • Experimental results • Conclusions
Introduction • How to determine the number of components in a mixture model for high-dimensional data? • Implementation of model selection criteria: • Stochastic and resampling (slow): the fully Bayesian way • Deterministic (fast): approximate Bayesian criteria, and information/coding-theory concepts such as the minimum message length (MML) and Akaike's information criterion (AIC)
The generalized Dirichlet distribution • A d-dimensional generalized Dirichlet distribution is defined as $p(x_1,\dots,x_d \mid \alpha) = \prod_{i=1}^{d} \frac{\Gamma(\alpha_i+\beta_i)}{\Gamma(\alpha_i)\Gamma(\beta_i)}\, x_i^{\alpha_i-1} \Big(1-\sum_{j=1}^{i} x_j\Big)^{\gamma_i}$, where $0 < x_i < 1$, $\sum_{i=1}^{d} x_i < 1$, $\gamma_i = \beta_i - \alpha_{i+1} - \beta_{i+1}$ for $i = 1,\dots,d-1$, and $\gamma_d = \beta_d - 1$. It can be reduced to the Dirichlet distribution when $\beta_i = \alpha_{i+1} + \beta_{i+1}$.
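As a concrete reference, here is a minimal NumPy/SciPy sketch of the log-density above; the function name gd_logpdf and its argument layout are our own, not from the paper.

import numpy as np
from scipy.special import gammaln

def gd_logpdf(x, alpha, beta):
    # Log-density of a d-dimensional generalized Dirichlet distribution
    # (Connor-Mosimann form), valid for 0 < x_i and sum(x) < 1.
    x, alpha, beta = map(np.asarray, (x, alpha, beta))
    # gamma_i = beta_i - alpha_{i+1} - beta_{i+1} for i < d; gamma_d = beta_d - 1
    gamma = np.append(beta[:-1] - alpha[1:] - beta[1:], beta[-1] - 1.0)
    # Partial tails 1 - x_1 - ... - x_i for each i
    tails = 1.0 - np.cumsum(x)
    log_norm = gammaln(alpha + beta) - gammaln(alpha) - gammaln(beta)
    return float(np.sum(log_norm + (alpha - 1.0) * np.log(x) + gamma * np.log(tails)))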
The generalized Dirichlet distribution • For the generalized Dirichlet distribution, the mean is $E[X_i] = \frac{\alpha_i}{\alpha_i+\beta_i} \prod_{j=1}^{i-1} \frac{\beta_j}{\alpha_j+\beta_j}$. • The GDD has a more general covariance structure than the DD, and it is conjugate to the multinomial distribution.
GDD vs. Gaussian • The GDD has a smaller number of parameters to estimate, so the estimation can be more accurate. • The GDD is defined on the support [0,1] and can be extended to a compact support [A,B], which is more appropriate for the nature of the data. • Beta distribution: $p(u) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, u^{\alpha-1}(1-u)^{\beta-1}$ • Beta type-II distribution: $p(v) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, v^{\alpha-1}(1+v)^{-(\alpha+\beta)}$ • They are equal if we set u = v/(1+v).
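The u = v/(1+v) identity is easy to check numerically; the snippet below (our own illustration using SciPy's beta and betaprime) matches the two densities through the Jacobian of the change of variables.

from scipy.stats import beta, betaprime

a, b, v = 2.0, 3.0, 0.7
u = v / (1.0 + v)  # change of variables u = v/(1+v)
# f_V(v) = f_U(u) * |du/dv|, with du/dv = (1 - u)^2:
print(betaprime.pdf(v, a, b))              # ~0.5916
print(beta.pdf(u, a, b) * (1.0 - u) ** 2)  # ~0.5916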
A GDD mixture model • A generalized Dirichlet mixture model with M components is $p(X \mid \Theta) = \sum_{j=1}^{M} p(X \mid \alpha_j)\, p_j$, where the mixing weights satisfy $p_j \ge 0$ and $\sum_{j=1}^{M} p_j = 1$, and $p(X \mid \alpha_j)$ takes the form of the GDD.
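Given the component log-density, the mixture density follows directly; a small sketch reusing the hypothetical gd_logpdf defined above, with a log-sum-exp for numerical stability:

import numpy as np

def gd_mixture_logpdf(x, weights, alphas, betas):
    # log p(X|Theta) = log sum_j p_j * p(X|alpha_j) for an M-component mixture
    comp = np.array([gd_logpdf(x, a, b) for a, b in zip(alphas, betas)])
    m = comp.max()  # log-sum-exp trick
    return m + np.log(np.dot(np.asarray(weights), np.exp(comp - m)))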
The MML criterion • The message length is defined as minus the logarithm of the posterior probability. • After placing an explicit prior over the parameters, the message length for a mixture of distributions is given by $\mathrm{MessLen} \simeq \underbrace{-\log p(\Theta)}_{\text{prior}} \underbrace{-\log p(\mathcal{X}\mid\Theta)}_{\text{likelihood}} + \underbrace{\tfrac{1}{2}\log|F(\Theta)|}_{\text{Fisher information}} + \underbrace{\tfrac{N_p}{2}\big(1+\log\kappa_{N_p}\big)}_{\text{optimal quantization constant}}$, where $N_p$ is the number of free parameters.
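The formula translates into a few lines of code. In this sketch we substitute kappa ≈ 1/12, the one-dimensional quantization constant often used in practice; that substitution is our assumption, not necessarily the paper's exact choice.

import numpy as np

def message_length(log_prior, log_lik, log_det_fisher, n_params):
    # MessLen ≈ -log p(Theta) - log p(X|Theta) + 0.5*log|F(Theta)|
    #           + (N_p/2)*(1 + log kappa)
    kappa = 1.0 / 12.0  # assumed quantization constant
    return (-log_prior - log_lik + 0.5 * log_det_fisher
            + 0.5 * n_params * (1.0 + np.log(kappa)))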
Fisher information matrix • The Fisher information matrix is the expected value of the Hessian of minus the logarithm of the likelihood: $F(\Theta) = -E\left[\frac{\partial^2 \log p(\mathcal{X}\mid\Theta)}{\partial \Theta\, \partial \Theta^{T}}\right]$
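As a toy illustration of this definition (our own, not from the paper), the snippet below checks the closed-form Fisher information of a Beta(a, b) distribution with respect to a against a second-difference of the sample-average log-likelihood:

import numpy as np
from scipy.stats import beta
from scipy.special import polygamma

a, b = 2.0, 5.0
exact = polygamma(1, a) - polygamma(1, a + b)  # trigamma identity

x = beta.rvs(a, b, size=10_000, random_state=0)
eps = 1e-3
ll = lambda aa: np.mean(beta.logpdf(x, aa, b))  # average log-likelihood
# (for the Beta family, the second derivative w.r.t. a is data-independent)
numeric = -(ll(a + eps) - 2.0 * ll(a) + ll(a - eps)) / eps**2
print(exact, numeric)  # the two agree closely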
Prior distribution • Assume independence between the different components: $p(\Theta) = p(P) \prod_{j=1}^{M} p(\alpha_j)$, where $P = (p_1,\dots,p_M)$ are the mixture weights and the $\alpha_j$ are the GDD parameters. • Place a Dirichlet distribution on $P$ and a generalized Dirichlet distribution on each $\alpha_j$, with all parameters set to 1.
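With all hyperparameters set to 1, the weight prior is simply the uniform Dirichlet over the simplex; a one-liner for its log-density (our own sketch, omitting the GDD-parameter prior):

import numpy as np
from scipy.stats import dirichlet

def log_weight_prior(weights):
    # Dirichlet prior with all parameters 1, i.e. uniform over the simplex;
    # its log-density is the constant log Gamma(M) = log (M-1)!
    return dirichlet.logpdf(np.asarray(weights), np.ones(len(weights)))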
Message length • After obtaining the Fisher information and specifying the prior distributions, these terms are substituted into the MML formula above to obtain the final message-length expression, which is minimized over the number of components.
Estimation and selection algorithm • The authors use an EM algorithm to estimate the mixture parameters. • To overcome computational issues and the local-maxima problem, they implement a fairly sophisticated initialization algorithm. • The whole algorithm is summarized on the next page; a sketch of the outer model-selection loop follows below.
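A sketch of the outer loop that pairs EM fitting with the MML score. Here fit_gd_mixture is a hypothetical stand-in for the paper's EM-plus-initialization procedure; it is assumed to return fitted parameters together with the quantities message_length (defined above) needs.

def select_num_components(X, max_M=10):
    # Fit a mixture for each candidate M and keep the shortest message.
    best = None
    for M in range(1, max_M + 1):
        theta = fit_gd_mixture(X, M)  # hypothetical: EM with careful initialization
        length = message_length(theta.log_prior, theta.log_lik,
                                theta.log_det_fisher, theta.n_params)
        if best is None or length < best[1]:
            best = (M, length, theta)
    return best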
Experimental results • The correct numbers of mixture components are 5, 6, and 7, respectively.
Experimental results • Web mining: • Train on documents from multiple labeled classes • Use the fitted mixtures to predict the label of each testing sample • Represent each document by the frequencies of the top 200 words
Conclusions • An MML-based criterion is proposed to select the number of components in generalized Dirichlet mixtures. • The full dimensionality of the data is used. • Generalized Dirichlet mixtures allow more modeling flexibility than mixtures of Gaussians. • The results indicate clearly that the MML and LEC model selection methods outperform the other methods.