  1. THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY COMP 5213: Introduction to Bayesian Networks L12: Latent Tree Models and Multidimensional Clustering Nevin L. Zhang, Room 3504, phone: 2358-7015, Email: lzhang@cs.ust.hk, Home page

  2. Outline • Latent Tree Models • Definition • Generalizing finite mixture models • Generalizing phylogenetic trees • Attractive representation of joint distributions • Basic Properties • Learning Algorithms • Applications • Latent structure discovery • Multidimensional clustering • Probabilistic inference • Readings: http://www.cse.ust.hk/~lzhang/ltm/index.htm

  3. Latent Tree Models (LTM) • Bayesian networks with • Rooted tree structure • Leaves observed (manifest variables) • Discrete or continuous • Internal nodes latent (latent variables) • Discrete • Also known as hierarchical latent class (HLC) models P(Y1), P(Y2|Y1), P(X1|Y2), P(X2|Y2), …
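A minimal Python sketch of the factorization above, P(Y1) P(Y2|Y1) P(X1|Y2) P(X2|Y2), for a four-variable LTM; the CPT values are made up purely for illustration:

    import numpy as np

    # Hypothetical CPTs for a small LTM: latent Y1 -> latent Y2 -> manifest X1, X2.
    # All variables are binary; the numbers are for illustration only.
    P_Y1 = np.array([0.6, 0.4])                      # P(Y1)
    P_Y2_Y1 = np.array([[0.9, 0.1],                  # P(Y2 | Y1), rows indexed by Y1
                        [0.2, 0.8]])
    P_X1_Y2 = np.array([[0.7, 0.3],                  # P(X1 | Y2), rows indexed by Y2
                        [0.1, 0.9]])
    P_X2_Y2 = np.array([[0.8, 0.2],                  # P(X2 | Y2), rows indexed by Y2
                        [0.3, 0.7]])

    # Joint over the manifest variables: sum out the latent variables Y1 and Y2.
    # P(X1, X2) = sum_{y1, y2} P(y1) P(y2 | y1) P(X1 | y2) P(X2 | y2)
    P_X1X2 = np.einsum('a,ab,bc,bd->cd', P_Y1, P_Y2_Y1, P_X1_Y2, P_X2_Y2)
    print(P_X1X2)                                    # a 2x2 table that sums to 1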

  4. Example with Continuous Leaves • A leaf node can contain • One discrete observed variable, • One continuous observed variable, or • Multiple continuous observed variables.

  5. Example • Manifest variables • Math Grade, Science Grade, Literature Grade, History Grade • Latent variables • Analytic Skill, Literal Skill, Intelligence

  6. More General Tree Models • Some internal nodes can be observed • Internal nodes can be continuous • … • We do not consider such models

  7. Outline • Latent Tree Models • Definition • Generalizing finite mixture models • Generalizing phylogenetic trees • Attractive representation of joint distributions • Basic Properties • Learning Algorithms • Applications • Latent structure discovery • Multidimensional clustering • Probabilistic inference • Readings: http://www.cse.ust.hk/~lzhang/ltm/index.htm

  8. Finite Mixture Models • Gaussian Mixture Models and Latent class models • Contains one latent variable • Produces one partition of data
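For concreteness, a short sketch using scikit-learn's GaussianMixture (not part of the lecture material): fitting a finite mixture model assigns each record a single cluster label, i.e. it yields exactly one partition of the data.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Toy data: two Gaussian blobs in 2-D.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, size=(100, 2)),
                   rng.normal(5, 1, size=(100, 2))])

    # A finite mixture model has one latent variable, so it produces one partition.
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    labels = gmm.predict(X)      # one cluster label per data point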

  9. Is One Partition Sufficient?

  10. How to Cluster Those?

  11. How to Cluster Those? • Style of picture

  12. How to Cluster Those? • Type of object in picture

  13. How to Cluster Those? • Need multiple partitions • In general, complex data usually • Have multiple facets • Can be meaningfully clustered in multiple ways

  14. LTMs and Multidimensional Clustering • An LTM contains multiple latent variables • Each represents a partition of the data • Hence, LTMs can be used to produce multiple partitions of the data • Called multidimensional clustering, with each latent variable being one dimension
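A hypothetical sketch of the idea (the ltm object and its latent_variables, cardinality and posterior members are assumed here for illustration, not an actual API): each latent variable of a fitted LTM induces its own partition of the data through its posterior distribution.

    # Hypothetical: given a fitted LTM, each latent variable Y induces its own
    # partition of the data via the posterior P(Y | record).
    def multidimensional_clustering(ltm, data):
        partitions = {}
        for y in ltm.latent_variables:            # assumed attribute of a fitted LTM
            # assign each record to the most probable state of Y
            partitions[y] = [max(range(ltm.cardinality(y)),
                                 key=lambda s: ltm.posterior(y, s, record))
                             for record in data]
        return partitions                          # one partition per latent variable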

  15. From FMMs to LTMs • Start with several FMMs, • Each based on a distinct subset of attributes • Each partition from a certain perspective. • Different partitions are independent of each other • Link them up to form a tree model • Get LTM • Consider different perspectives in a single model

  16. Outline • Latent Tree Models • Definition • Generalizing finite mixture models • Generalizing phylogenetic trees • Attractive representation of joint distributions • Basic Properties • Learning Algorithms • Applications • Latent structure discovery • Multidimensional clustering • Probabilistic inference • Readings: http://www.cse.ust.hk/~lzhang/ltm/index.htm

  17. Phylogeny • Assumption • All organisms on Earth have a common ancestor • This implies that any set of species is related. • Phylogeny • The relationship between any set of species. • Phylogenetic tree • Usually, the relationship can be represented by a tree which is called a phylogenetic (evolution) tree • this is not always true

  18. Phylogenetic trees

  19. Phylogenetic trees • TAXA (sequences) identify species • Edge lengths represent evolution time • Assumption: bifurcating tree topology

  20. Probabilistic Models of Evolution • Characterize the relationship between taxa using substitution probability: • P(x | y, t): probability that ancestral sequence y evolves into sequence x along an edge of length t • P(X7), P(X5|X7, t5), P(X6|X7, t6), P(S1|X5, t1), P(S2|X5, t2), …

  21. Probabilistic Models of Evolution • What should P(x|y, t) be? • Two assumptions of commonly used models • There are only substitutions, no insertions/deletions (aligned) • One-to-one correspondence between sites in different sequences • Each site evolves independently and identically • P(x | y, t) = ∏_{i=1..m} P(x(i) | y(i), t) • m is the sequence length

  22. Probabilistic Models of Evolution • What should P(x(i)|y(i), t) be? • Jukes-Cantor (Character Evolution) Model [1969] • Rate of substitution a (Constant or parameter?)
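A minimal sketch of slides 21-22, assuming the common Jukes-Cantor parameterization P(x(i) = y(i) | t) = 1/4 + (3/4) e^{-4αt} and P(x(i) ≠ y(i) | t) = 1/4 − (1/4) e^{-4αt}; the value of α, the edge length, and the sequences are illustrative only.

    import math

    def jc_site_prob(x, y, t, alpha=1.0):
        # Jukes-Cantor substitution probability P(x | y, t) for a single site,
        # under one common parameterization with substitution rate alpha.
        e = math.exp(-4.0 * alpha * t)
        return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

    def sequence_prob(x_seq, y_seq, t, alpha=1.0):
        # Aligned sequences whose sites evolve independently and identically,
        # so P(x | y, t) factorizes over the m sites (slide 21).
        assert len(x_seq) == len(y_seq)
        p = 1.0
        for xi, yi in zip(x_seq, y_seq):
            p *= jc_site_prob(xi, yi, t, alpha)
        return p

    # Example: probability that ancestral 'ACGT' evolves into 'ACGA' over time t = 0.1.
    print(sequence_prob('ACGA', 'ACGT', 0.1))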

  23. Phylogenetic Trees are Special LTMs • The structure is a binary tree • The variables share the same state space • The conditional probabilities come from the character evolution model, parameterized by edge lengths instead of the usual parameterization.

  24. Outline • Latent Tree Models • Definition • Generalizing finite mixture models • Generalizing phylogenetic trees • Attractive representation of joint distributions • Basic Properties • Learning Algorithms • Applications • Latent structure discovery • Multidimensional clustering • Probabilistic inference • Readings: http://www.cse.ust.hk/~lzhang/ltm/index.htm

  25. Attractive Representation of Joint Distributions • Characteristics of LTMs • Are computationally very simple to work with. • Can represent complex relationships among manifest variables. • Useful tool for density estimation.

  26. What can LTMs be Used for? • Generalizing finite mixture models • Tool for multidimensional clustering • Generalizing phylogenetic trees • Tool for latent structure discovery • Attractive representation of joint distributions • Tool for density estimation (general probabilistic modeling)

  27. Outline • Latent Tree Models • Definition • Generalizing finite mixture models • Generalizing phylogenetic trees • Attractive representation of joint distributions • Basic Properties • Learning Algorithms • Applications • Latent structure discovery • Multidimensional clustering • Probabilistic inference • Readings: http://www.cse.ust.hk/~lzhang/ltm/index.htm

  28. Root-Walking: Proof

  29. Root-Walking: Proof

  30. Root Walking and Model Equivalence • M1: root walks to X2; M2: root walks to X3 • Root walking leads to equivalent models on manifest variables • Implications: • Cannot determine edge orientation from data • Can only learn unrooted models
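A small numeric check of the claim (made-up CPTs, two binary variables): walking the root across an edge and re-parameterizing by Bayes' rule leaves the joint distribution unchanged.

    import numpy as np

    # Original rooting: root Y with child X, both binary.
    P_Y = np.array([0.7, 0.3])
    P_X_Y = np.array([[0.9, 0.1],         # P(X | Y), rows indexed by Y
                      [0.4, 0.6]])

    # Joint under the original rooting: P(Y, X) = P(Y) P(X | Y).
    joint1 = P_Y[:, None] * P_X_Y

    # Walk the root to X: P(X) = sum_y P(y) P(X | y), and P(Y | X) by Bayes' rule.
    P_X = joint1.sum(axis=0)
    P_Y_X = joint1 / P_X                  # P(Y | X), columns indexed by X
    joint2 = P_X[None, :] * P_Y_X         # P(X) P(Y | X)

    print(np.allclose(joint1, joint2))    # True: the same joint distribution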

  31. Regularity

  32. Regularity

  33. Regularity • Can focus on regular models only • Irregular models can be made regular • Regularized models are better than the irregular models they come from • Theorem: The set of all regular models is finite.

  34. Outline • Latent Tree Models • Definition • Generalizing finite mixture models • Generalizing phylogenetic trees • Basic Properties • Learning Algorithms • Applications • Latent structure discovery • Multidimensional clustering • Probabilistic inference • Readings: http://www.cse.ust.hk/~lzhang/ltm/index.htm

  35. Learning Latent Tree Models Determine • Number of latent variables • Cardinality of each latent variable • Model Structure • Conditional probability distributions

  36. Different Types of Algorithms • Search Algorithms • Clustering of manifest variables • Generalization of phylogenetic tree reconstruction algorithms, particularly Neighbor-Joining

  37. Model Selection • Bayesian score: posterior probability P(m|D) • P(m|D) = P(m) ∫ P(D|m, θ) P(θ|m) dθ / P(D) • BIC score: large-sample approximation • BIC(m|D) = log P(D|m, θ*) − (d/2) log N • d: standard dimension, the number of free parameters • BICe score: BICe(m|D) = log P(D|m, θ*) − (de/2) log N • de: effective dimension • Effective dimensions are difficult to compute • BICe is not practical
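The BIC score as defined above is a one-liner; the log-likelihood, dimension, and sample size in the example are made-up numbers.

    import math

    def bic_score(loglik, d, N):
        # BIC(m | D) = log P(D | m, theta*) - (d / 2) * log N,
        # where d is the number of free parameters and N the sample size.
        return loglik - 0.5 * d * math.log(N)

    # Example: maximized log-likelihood -1234.5, 20 free parameters, 500 records.
    print(bic_score(-1234.5, 20, 500))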

  38. Effective Dimension • Standard dimension: • Number of free parameters • Effective dimension • X1, X2, …, Xn: observed variables • P(X1, X2, …, Xn) is a point in a high-dimensional space for each value of the parameters • Spans a manifold as the parameter values vary • Effective dimension: dimension of that manifold • Parsimonious model: • Standard dimension = effective dimension • Open question: How to test parsimony?

  39. Effective Dimension • Paper: • N. L. Zhang and Tomas Kocka (2004). Effective dimensions of hierarchical latent class models. Journal of Artificial Intelligence Research, 21: 1-17. • Open question: Effective dimension of LTMs with one latent variable

  40. Model Selection • Other choices • Cheeseman-Stutz (CS): reduces the impact of the approximation error in BIC • AIC • Holdout likelihood • (Cross-validation: too expensive) • Simulation studies indicate that • BIC and CS result in good models • AIC and holdout likelihood do not • Therefore, we chose to work with BIC.

  41. Model Optimization • Double hill climbing (DHC), 2002 • 7 manifest variables. • Single hill climbing (SHC), 2004 • 12 manifest variables • Heuristic SHC (HSHC), 2004 • 50 manifest variables • EAST, 2012 • As efficient as HSHC, and more principled • 100+ manifest variables • Reference: T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence,  176(1), 2246-2269. doi:10.1016/j.artint.2011.09.003.

  42. EAST Algorithm: 5 Search Operators • EAST: Expansion, Adjustment, Simplification until Termination • Expansion operators: • Node introduction (NI): M1 => M2; |X1| = |X| • Constraint: the new latent node mediates between an existing latent node and only two of its neighbors • State introduction (SI): adds a new state to a latent variable • Adjustment operator: node relocation (NR), M2 => M3 • Simplification operators: node deletion (ND), state deletion (SD)

  43. Search Operators and Model Inclusion • M → M’ by NI or SI: M’ includes M • M → M’ by ND or SD: M’ is included in M • M → M’ by NR: no inclusion property in general

  44. Naïve Search • Start with an initial model • At each step: • Construct all possible candidate models • Evaluate them one by one • Pick the best one • Inefficient • Too many candidate models • Too expensive to run EM on all of them • Structural EM assumes a fixed set of variables • Does not work here: the latent variables in models produced by NI, SI, SD differ from those in the current model

  45. Reducing Number of Candidate Models • Do not use all the operators at once • How? • BIC: BIC(m|D) = log P(D|m, θ*) − (d/2) log N • Improve the two terms alternately • SD and ND reduce the penalty term • Which operators improve the likelihood term?

  46. Improve Likelihood Term • Let m’ be obtained from m using NI or SI • log P(D|m’, θ’*) ≥ log P(D|m, θ*) • NI and SI improve the likelihood term. [Homework: Prove this.] • Follow each NI operation with NR operations • Overcomes the constraint on NI and allows the transition from M1 to M3

  47. Choosing between Models by SI and NI • Operation granularity • p = 100 • SI: 101 additional parameters • NI: 2 additional parameters • Like comparing shovels with a bulldozer • SI always preferred initially • Cost-effectiveness principle • Select the candidate model with the highest improvement ratio
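A hedged sketch of the cost-effectiveness principle, assuming the improvement ratio is the BIC gain divided by the increase in the number of free parameters; the operators and numbers below are illustrative only.

    def improvement_ratio(bic_new, bic_old, d_new, d_old):
        # BIC improvement per additional free parameter, used to compare candidates
        # generated by SI (many new parameters) and NI (few new parameters).
        return (bic_new - bic_old) / (d_new - d_old)

    # Current model: BIC -1250.0. Candidates: (operator, BIC, extra parameters).
    candidates = [('SI', -1200.0, 101), ('NI', -1210.0, 2)]
    best = max(candidates, key=lambda c: improvement_ratio(c[1], -1250.0, c[2], 0))
    print(best[0])   # 'NI': a smaller absolute gain, but far more cost-effective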

  48. Variable Complexity vs Structure Complexity • NI: Increases structure complexity. • SI: Increases variable complexity. • Cost-effectiveness principle: • Achieves good balance between the two kinds of complexity
