Latent Tree Models

AAAI 2014 Tutorial Latent Tree Models Nevin L. Zhang Dept. of Computer Science & Engineering The Hong Kong Univ. of Sci. & Tech. http://www.cse.ust.hk/~lzhang

HKUST 2014 HKUST 1988

Latent Tree Models • Part I: Non-Technical Overview (25 minutes) • Part II: Definition and Properties (25 minutes) • Part III: Learning Algorithms (110 minutes, 30 minutes break half way) • Part IV: Applications (50 minutes)

Part I: Non-Technical Overview • Latent tree models • What can LTMs be used for: • Discovery of co-occurrence/correlation patterns • Discovery of latent variable/structures • Multidimensional clustering • Examples • Danish beer survey data • Text data

Latent Tree Models (LTMs) • Tree-structured probabilistic graphical models • Leaves observed (manifest variables) • Discrete or continuous • Internal nodes latent (latent variables) • Discrete • Each edge is associated with a conditional distribution • One node with marginal distribution • Defines a joint distributions over all the variables (Zhang, JMLR 2004)

Latent Tree Analysis (LTA) From data on observed variables, obtain latent tree model Learning latent tree models: Determine • Number of latent variables • Numbers of possible states for latent variables • Connections among nodes • Probability distributions

LTA on Danish Beer Market Survey Data • 463 consumers, 11 beer brands • Questionnaire: For each brand: • Never seen the brand before (s0); • Seen before, but never tasted (s1); • Tasted, but do not drink regularly (s2) • Drink regularly (s3). (Mourad et al. JAIR 2013)

Why variables grouped as such? • Responses on brands in each group strongly correlated. • GronTuborg and Carlsberg: Main mass-market beers • TuborgClas and CarlSpec: Frequent beers, bit darker than the above • CeresTop, CeresRoyal, Pokal, …: minor local beers • In general, LTA partitions observed variables into groups such that • Variables in each group are strongly correlated, and • The correlations among each group can be properly be modeled using one single latent variable

Multidmensional Clustering • Each Latent variable gives a partition of consumers. • H1: • Class 1: Likely to have tasted TuborgClas, Carlspecand Heineken , but do not drink regularly • Class 2: Likely to have seen or tasted the beers, but did not drink regularly • Class 3: Likely to drink TuborgClas and Carlspec regularly • H0 and H2 give two other partitions. • In general, LTA is a technique for multiple clustering. • In contrast, K-Means, mixture models give only one partition.

Unidimensional vs Multidimensional Clustering • Grouping of objects intoclusters such that objects in the same cluster are similar while objects from different clusters are dissimilar. • Result of clustering is often a partition of all the objects.

How to Cluster Those?

How to Cluster Those? Style of picture

How to Cluster Those? Type of object in picture

Multidimensional Clustering • Complex data usually have multiple facets and can be meaningfully partitioned in multiple ways. Multidimensional clustering / Multi-Clustering • LTA is a model-based method for multidimensional clustering. • Other methods: http://www.siam.org/meetings/sdm11/clustering.pdf

Clustering of Variables and Objects • LTA produces a partition of observed variables. • For each cluster of variables, it produces a partition of objects.

Binary Text Data: WebKB • 1041web pages collected from 4 CS departments in 1997 • 336 words

Latent Tree Model for WebKB Data 89 latent variables (Liu et al. MLJ 2013)

Latent Tree Modes for WebKB Data

Why variables grouped as such? • Words in each group tend to co-occur. • On binary text data, LTA partitions word variables into groups such that • Words in each group tend to co-occur and • The correlations can be properly be explained using one single latent variable LTA is a method for identifying co-occurrence relationships.

Multidimensional Clustering LTA is an alternative approach to topic detection • Y66=4: Object Oriented Programming (oop) • Y66=2: Non-oop programming • Y66=1: programming language • Y66=3: Not on programming More on this in Part IV

Summary • Latent tree models: • Tree-structured probabilistic graphical models • Leaf nodes: observed variables • Internal nodes: latent variable • What can LTA be used for: • Discovery of co-occurrence patterns in binary data • Discovery of correlation patterns in general discrete data • Discovery of latent variable/structures • Multidimensional clustering • Topic detection in text data • Probabilistic modelling

Key References: • Anandkumar, A., Chaudhuri, K., Hsu, D., Kakade, S. M., Song, L., & Zhang, T. (2011). Spectral methods for learning multivariate latent tree structure. In Twenty-Fifth Conference in Neural Information Processing Systems (NIPS-11). • Anandkumar, A., Ge, R., Hsu, D., Kakade, S.M., and Telgarsky, M. Tensor decompositions for learning latent variable models. In Preprint, 2012a. • Anandkumar, A., Hsu, D., and Kakade, S. M. A method of moments for mixture models and hidden Markov models. In An abridged version appears in the Proc. Of COLT, 2012b. • Choi, M. J., Tan, V. Y., Anandkumar, A., & Willsky, A. S. (2011). Learning latent tree graphical models. Journal of Machine Learning Research, 12, 1771–1812. • Friedman, N., Ninio, M., Pe’er, I., & Pupko, T. (2002). A structural EM algorithm for phylogenetic inference.. Journal of Computational Biology, 9(2), 331–353. • Harmeling, S., & Williams, C. K. I. (2011). Greedy learning of binary latent trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(6), 1087–1097. • Hsu, D., Kakade, S., & Zhang, T. (2009). A spectral algorithm for learning hidden Markov models. In The 22nd Annual Conference on Learning Theory (COLT 2009).

Key References: • E. Mossel, S. Roch, and A. Sly. Robust estimation of latent tree graphical models: Inferring hidden states with inexact parameters. Submitted. http://arxiv.org/abs/1109.4668, 2011. • Mourad, R., Sinoquet, C., & Leray, P. (2011). A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genomewideassociation studies. BMC Bioinformatics, 12, 16. • Mourad R., Sinoquet C., Zhang N. L., Liu T. F. and Leray P. (2013). A survey on latent tree models and applications. Journal of Artificial Intelligence Research, 47, 157-203 , 13 May 2013. doi:10.1613/jair.3879. • Parikh, A. P., Song, L., & Xing, E. P. (2011). A spectral algorithm for latent tree graphical models. In Proceedings of the 28th International Conference on Machine Learning (ICML-2011). • Saitou, N., & Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees.. Molecular Biology and Evolution, 4(4), 406–425. • Song, L., Parikh, A., & Xing, E. (2011). Kernel embeddings of latent tree graphical models. In Twenty-Fifth Conference in Neural Information Processing Systems (NIPS-11). • Tan, V. Y. F., Anandkumar, A., & Willsky, A. (2011). Learning high-dimensional Markov forest distributions: Analysis of error rates. Journal of Machine Learning Research,12, 1617–1653.

Key References: • T. Chen and N. L. Zhang (2006). Quartet-based learning of shallow latent variables. In Proceedings of the Third European Workshop on Probabilistic Graphical Model,59-66 , September 12-15, 2006. • Chen, T., Zhang, N. L., Liu, T., Poon, K. M., & Wang, Y. (2012). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1), 2246–2269. • Liu, T. F., Zhang, N. L., Liu, A. H., & Poon, L. K. M. (2013). Greedy learning of latent tree models for multidimensional clustering. Machine Learning, doi:10.1007/s10994-013-5393-0. • Liu, T. F., Zhang, N. L., and Chen, P. X. (2014). Hierarchical latent tree analysis for topic detection. ECML, 2014 • Poon, L. K. M., Zhang, N. L., Chen, T., & Wang, Y. (2010). Variable selection in modelbased clustering: To do or to facilitate. In Proceedings of the 27th International Con-ference on Machine Learning (ICML-2010). • Wang, Y., Zhang, N. L., & Chen, T. (2008). Latent tree models and approximate inference in Bayesian networks. Journal of Articial Intelligence Research, 32, 879–900. • Wang, X. F., Guo, J. H., Hao, L. Z., Zhang, N.L., & P. X. Chen (2013). Recovering discrete latent tree models by spectral methods. • Wang, X. F., Zhang, N. L. (2014). A Study of Recently Discovered Equalities about Latent Tree Models using Inverse Edges. PGM 2014. • Zhang, N. L. (2004). Hierarchical latent class models for cluster analysis. The Journal of Machine Learning Research, 5, 697–723. • Zhang, N. L., & Kocka, T. (2004a). Effective dimensions of hierarchical latent class models. Journal of Articial Intelligence Research, 21, 1–17.

Key References: • Zhang, N. L., & Kocka, T. (2004b). Efficient learning of hierarchical latent class models. In Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 585–593. • Zhang, N. L., Nielsen, T. D., & Jensen, F. V. (2004). Latent variable discovery in classification models. Artificial Intelligence in Medicine, 30(3), 283–299. • Zhang, N. L., Wang, Y., & Chen, T. (2008). Discovery of latent structures: Experience with the CoIL Challenge 2000 data set*. Journal of Systems Science and Complexity, 21(2), 172–183. • Zhang, N. L., Yuan, S., Chen, T., & Wang, Y. (2008). Latent tree models and diagnosis in traditional Chinese medicine. Artificial Intelligence in Medicine, 42(3), 229–245. • Zhang, N. L., Yuan, S., Chen, T., & Wang, Y. (2008). Statistical Validation of TCM Theories. Journal of Alternative and Complementary Medicine, 14(5):583-7. • Zhang, N. L., Fu, C., Liu, T. F., Poon, K. M., Chen, P. X., Chen, B. X., Zhang, Y. L. (2014). The Latent Tree Analysis Approach to Patient Subclassificationin Traditional Chinese Medicine. Evidence-Based Complementary and Alternative Medicine. • Xu, Z. X., Zhang, N. L., Wang, Y. Q., Liu, G. P., Xu, J., Liu, T. F., and Liu A. H. (2013). Statistical Validation of Traditional Chinese Medicine Syndrome Postulates in the Context of Patients with Cardiovascular Disease. The Journal of Alternative and Complementary Medicine. 18, 1-6. • Zhao, Y. Zhang , N. L., Wang, T. F., Wang, Q. G. (2014). Discovering Symptom Co-Occurrence Patterns from 604 Cases of Depressive Patient Data using Latent Tree Models. The Journal of Alternative and Complementary Medicine. 20(4):265-71.

Latent Tree Models