
Latent Tree Analysis of Unlabeled Data


Presentation Transcript


  1. Latent Tree Analysis of Unlabeled Data Nevin L. Zhang Dept. of Computer Science & Engineering The Hong Kong Univ. of Sci. & Tech. http://www.cse.ust.hk/~lzhang

  2. Outline • Latent tree models • Latent tree analysis algorithms • What can LTA be used for: • Discovery of co-occurrence/correlation patterns • Discovery of latent variables/structures • Multidimensional clustering • Examples • Danish beer survey data • Text data • TCM survey data

  3. Latent Tree Models • Tree-structured probabilistic graphical models (Zhang, JMLR 2004) • Leaves are observed (manifest) variables: discrete or continuous • Internal nodes are latent variables: discrete • Each edge is associated with a conditional distribution; one node carries a marginal distribution • Together these define a joint distribution over all the variables
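The joint distribution the slide refers to factors along the tree: the root carries the marginal distribution and every other node is conditioned on its single parent. A minimal rendering of that factorization, with illustrative node names (H for latent, X for observed):

```latex
% Latent tree factorization: the root H_1 carries the marginal distribution,
% every other node V (latent or observed) is conditioned on its parent pa(V).
\[
  P(X_1,\dots,X_n, H_1,\dots,H_m)
    = P(H_1) \prod_{V \neq H_1} P\bigl(V \mid \mathrm{pa}(V)\bigr)
\]
```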

  4. Latent Tree Analysis • From data on the observed variables, obtain a latent tree model • Learning latent tree models means determining: • the number of latent variables • the number of possible states for each latent variable • the connections among nodes • the probability distributions • Model selection criterion: find the model that maximizes the BIC score BIC(m|D) = log P(D|m, θ*) − (d/2) log N, where D is the data, N the sample size, m the model, θ* the MLE of the parameters, and d the number of free parameters
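A minimal Python helper for the BIC score exactly as written on the slide; the numbers in the usage line are made up for illustration:

```python
from math import log

def bic_score(loglik: float, num_free_params: int, sample_size: int) -> float:
    """BIC(m|D) = log P(D|m, theta*) - (d/2) * log N.

    loglik          -- maximized log-likelihood log P(D|m, theta*)
    num_free_params -- d, number of free parameters of the model
    sample_size     -- N, number of data cases
    """
    return loglik - 0.5 * num_free_params * log(sample_size)

# Example: score one candidate model (illustrative numbers only).
print(bic_score(loglik=-1520.3, num_free_params=42, sample_size=604))
```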

  5. Algorithms: EAST • Search-based: Extension, Adjustment and Simplification phases, repeated until termination • Can deal with ~100 observed variables (Chen, Zhang et al. AIJ 2011)
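The exact EAST operators are described in Chen, Zhang et al. (AIJ 2011); the sketch below only illustrates the BIC-driven hill-climbing skeleton implied by the slide. The `expand`, `adjust` and `simplify` arguments are placeholders for the three classes of operators, not the paper's API:

```python
def east_search(data, initial_model, score, expand, adjust, simplify):
    """Skeleton of a score-driven search in the spirit of EAST.

    `expand`, `adjust` and `simplify` are caller-supplied functions that each
    return a list of candidate models produced by the corresponding class of
    operators (e.g. adding latent variables/states, relocating or removing them).
    `score` is a model-selection score such as BIC.
    """
    model, best = initial_model, score(initial_model, data)
    improved = True
    while improved:
        improved = False
        for propose in (expand, adjust, simplify):
            candidates = propose(model, data)
            if not candidates:
                continue
            # Keep the best candidate from this phase if it improves the score.
            cand = max(candidates, key=lambda m: score(m, data))
            if score(cand, data) > best:
                model, best = cand, score(cand, data)
                improved = True
    return model
```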

  6. Unidimensionality Test (Liu, Zhang et al. MLJ 2013)

  7. (Liu, Zhang et al. MLJ 2013)

  8. (Liu, Zhang et al. MLJ 2013) Chow-Liu tree (1968)
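The Chow-Liu procedure referenced on the slide (Chow & Liu, 1968) builds a tree over the observed variables by taking a maximum spanning tree of pairwise mutual information. A self-contained sketch of that classic algorithm, not of the paper's specific use of it; the function names and the networkx dependency are my choices:

```python
import numpy as np
import networkx as nx

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete columns."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for a, b in zip(x, y):
        joint[a, b] += 1
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])).sum())

def chow_liu_tree(data):
    """Maximum spanning tree over pairwise mutual information (Chow & Liu, 1968).

    `data` is an (N, n) integer array; each column is an observed variable.
    """
    n = data.shape[1]
    g = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            g.add_edge(i, j, weight=mutual_information(data[:, i], data[:, j]))
    return nx.maximum_spanning_tree(g)

# Toy usage on random binary data (illustrative only).
tree = chow_liu_tree(np.random.randint(0, 2, size=(100, 5)))
print(sorted(tree.edges()))
```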

  9. The BI algorithm: close to EAST in terms of model quality; can deal with 1,000 observed variables (Liu, Zhang et al. MLJ 2013)

  10. Outline • Latent tree models • Latent tree analysis algorithms • What can LTA be used for: • Discovery of co-occurrence/correlation patterns • Discovery of latent variables/structures • Multidimensional clustering • Examples • Danish beer survey data • Text data • TCM survey data

  11. Danish Beer Market Survey (Mourad et al. JAIR 2013) • 463 consumers, 11 beer brands • Questionnaire, for each brand: • Never seen the brand before (s0) • Seen before, but never tasted (s1) • Tasted, but do not drink regularly (s2) • Drink regularly (s3)

  12. Why are variables grouped as such? • GronTuborg and Carlsberg: main mass-market beers • TuborgClas and CarlSpec: frequent beers, a bit darker than the above • CeresTop, CeresRoyal, Pokal, …: minor local beers • They are grouped as such because responses on the brands in each group are strongly correlated • Intuitively, latent tree analysis partitions the observed variables into groups such that • variables in each group are strongly correlated, and • the correlations within each group can be properly modeled using one single latent variable
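One simple way to see the "strongly correlated within a group" intuition on survey data like this is a pairwise association measure such as Cramér's V; this is only an illustrative check, not the grouping criterion LTA itself uses (LTA models the correlations with latent variables). The survey frame below is randomly generated and the column names are taken from the slide:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical variables (0 = independent, 1 = perfectly associated)."""
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    r, c = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))

# Stand-in survey: 463 consumers, responses s0..s3 coded as 0..3 (random, illustrative).
survey = pd.DataFrame(np.random.randint(0, 4, size=(463, 3)),
                      columns=["GronTuborg", "Carlsberg", "TuborgClas"])
assoc = pd.DataFrame({a: {b: cramers_v(survey[a], survey[b]) for b in survey} for a in survey})
print(assoc.round(2))   # strongly associated brands would land in the same group
```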

  13. Multidimensional Clustering • Each latent variable gives a partition of consumers • H1: • Class 1: likely to have tasted TuborgClas, CarlSpec and Heineken, but do not drink them regularly • Class 2: likely to have seen or tasted the beers, but do not drink them regularly • Class 3: likely to drink TuborgClas and CarlSpec regularly • Intuitively, latent tree analysis is a technique for multiple clustering; K-means and mixture models give only one partition
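A toy sketch of how one latent variable induces a partition: given the (learned) conditional distributions of its child variables, each consumer is assigned to the class with the highest posterior probability. All numbers below are made up and only the two brand variables are used as evidence, ignoring the rest of the tree:

```python
import numpy as np

# Toy parameters: latent H1 with 3 classes, two child brands with states s0..s3.
prior_h1 = np.array([0.30, 0.45, 0.25])                      # P(H1)
p_tuborgclas_given_h1 = np.array([[0.05, 0.15, 0.60, 0.20],   # P(TuborgClas | H1 = class k)
                                  [0.10, 0.40, 0.40, 0.10],
                                  [0.02, 0.08, 0.20, 0.70]])
p_carlspec_given_h1 = np.array([[0.05, 0.20, 0.55, 0.20],     # P(CarlSpec | H1 = class k)
                                [0.15, 0.45, 0.30, 0.10],
                                [0.05, 0.10, 0.25, 0.60]])

def posterior_h1(tuborgclas_state: int, carlspec_state: int) -> np.ndarray:
    """P(H1 | responses) by Bayes' rule, children independent given H1."""
    joint = (prior_h1
             * p_tuborgclas_given_h1[:, tuborgclas_state]
             * p_carlspec_given_h1[:, carlspec_state])
    return joint / joint.sum()

# A consumer who drinks both brands regularly (state s3) gets a class posterior;
# the argmax is their cluster in the partition defined by H1.
print(posterior_h1(3, 3))
```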

  14. Binary Text Data: WebKB (Liu et al. PGM 2012, MLJ 2013) • 1,041 web pages collected from 4 CS departments in 1997 • 336 words

  15. Latent Tree Model for WebKB Data by the BI Algorithm • 89 latent variables

  16. Latent Tree Models for WebKB Data


  19. Why are variables grouped as such? • They are grouped as such because words in each group tend to co-occur • On binary data, latent tree analysis partitions the observed word variables into groups such that • words in each group tend to co-occur, and • the correlations can properly be explained using one single latent variable • LTA is thus a method for identifying co-occurrence relationships

  20. Multidimensional Clustering • LTA is an approach to topic detection • Y66=4: Object-oriented programming (OOP) • Y66=2: Non-OOP programming • Y66=1: Programming language • Y66=3: Not on programming

  21. Outline • Latent tree models • Latent tree analysis algorithms • What can LTA be used for: • Discovery of co-occurrence/correlation patterns • Discovery of latent variables/structures • Multidimensional clustering • Examples • Danish beer survey data • Text data • TCM survey data

  22. Background of Research • TCM is common practice in China, and increasingly in the Western world • Patients with a WM (Western medicine) disease are divided into several TCM classes • Different classes are treated differently using TCM treatments • Example: • WM disease: depression • TCM classes: • Liver-Qi Stagnation (肝气郁结). Treatment principle: soothe the liver and relieve stagnation (疏肝解郁); prescription: Chaihu Shugan Powder (柴胡疏肝散) • Deficiency of Liver Yin and Kidney Yin (肝肾阴虚). Treatment principle: nourish the kidney and the liver (滋肾养肝); prescription: Xiaoyao Powder combined with Liuwei Dihuang Pill (逍遥散合六味地黄丸) • Vacuity of both Heart and Spleen (心脾两虚). Treatment principle: boost qi and fortify the spleen (益气健脾); prescription: Guipi Decoction (归脾汤) • …

  23. Key Question • How should patients with a WM disease be divided into subclasses from the TCM perspective? • What are the TCM classes? • What are the characteristics of each TCM class? • How do we differentiate between the TCM classes? • This is important for • clinical practice • research: randomized controlled trials of efficacy; modern biomedical understanding of TCM concepts • There is no consensus: different doctors/researchers use different schemes. This is a key weakness of TCM.

  24. Key Idea • Our objective: provide an evidence-based method for TCM patient classification • Key idea: • Cluster analysis of symptom data => empirical partition of the patients • Check whether the partition corresponds to a TCM class concept • Key technology: multidimensional clustering • This was the motivation for developing latent tree analysis

  25. Symptom Data of Depressive Patients (Zhao et al. JACM 2014) • Subjects: • 604 depressive patients aged between 19 and 69, from 9 hospitals • Selected using the Chinese classification of mental disorders clinical guideline CCMD-3 • Exclusions: subjects who took anti-depression drugs within two weeks prior to the survey; women in the gestational or suckling periods; etc. • Symptom variables: • Taken from the TCM literature on depression between 1994 and 2004 • Searched with the phrase "抑郁 (depression) and 证 (syndrome)" on the CNKI (China National Knowledge Infrastructure) database • Kept only studies where patients were selected using the ICD-9, ICD-10, CCMD-2, or CCMD-3 guidelines • 143 symptoms reported in those studies altogether

  26. The Depression Data • Data as a table: • 604 rows, one per patient • 143 columns, one per symptom • Table cells: 0 – symptom not present, 1 – symptom present • Removed: symptoms occurring fewer than 10 times • 86 symptom variables entered latent tree analysis • The structure of the latent tree model obtained is shown on the next two slides
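A minimal pandas sketch of the preprocessing step described on this slide (dropping rarely occurring symptoms). The table below is randomly generated as a stand-in for the real survey data, and the column names are invented:

```python
import numpy as np
import pandas as pd

# Stand-in for the real table: 604 patients x 143 binary symptom columns
# (values here are random; the real data come from the survey described above).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.binomial(1, 0.05, size=(604, 143)),
                    columns=[f"symptom_{i}" for i in range(143)])

# Remove symptoms occurring fewer than 10 times, as on the slide; the remaining
# columns (86 in the actual study) enter latent tree analysis.
counts = data.sum(axis=0)
frequent = data.loc[:, counts >= 10]
print(f"{frequent.shape[1]} symptom variables retained out of {data.shape[1]}")
```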

  27. Model Obtained for the Depression Data (Top)

  28. Model Obtained for the Depression Data (Bottom)

  29. The Empirical Partitions • The first cluster (Y29 = s0) consists of 54% of the patients, while the second cluster (Y29 = s1) consists of 46% • The two symptoms 'fear of cold' and 'cold limbs' seldom occur in the first cluster, while both tend to occur with high probabilities (0.8 and 0.85) in the second cluster

  30. Probabilistic Symptom Co-occurrence Patterns • The table indicates that the two symptoms 'fear of cold' and 'cold limbs' tend to co-occur in the cluster Y29 = s1 • The pattern is meaningful from the TCM perspective: TCM asserts that YANG DEFICIENCY (阳虚) can lead to, among other symptoms, 'fear of cold' and 'cold limbs' • So the co-occurrence pattern suggests the TCM syndrome type (证型) YANG DEFICIENCY (阳虚) • The partition Y29 thus suggests that • among depressive patients there is a subclass of patients with YANG DEFICIENCY, and • in this subclass, 'fear of cold' and 'cold limbs' occur with high probabilities (0.8 and 0.85)
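A small sketch of how per-cluster symptom probabilities like the 0.8 and 0.85 above can be tabulated once each patient has been assigned to a cluster. Note this uses hard assignments for simplicity, whereas the empirical partitions in the paper come from the latent variable's posterior in the learned model; the variable names in the commented usage are hypothetical:

```python
import pandas as pd

def symptom_profile(data: pd.DataFrame, cluster: pd.Series, symptoms: list) -> pd.DataFrame:
    """Estimate P(symptom = 1 | cluster) from 0/1 data and a hard cluster assignment.

    `cluster` holds one label per patient (e.g. each patient's most probable
    state of Y29). Several symptoms with high probability in the same cluster
    indicate a co-occurrence pattern like the one on the slide.
    """
    return data[symptoms].groupby(cluster).mean()

# Hypothetical usage, assuming `frequent` is the 0/1 symptom table and `y29`
# holds each patient's most probable Y29 state:
# print(symptom_profile(frequent, y29, ["fear of cold", "cold limbs"]))
```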

  31. Probabilistic Symptom Co-occurrence Patterns • Y28 = s1 captures the probabilistic co-occurrence of 'aching lumbus', 'lumbar pain like pressure' and 'lumbar pain like warmth' • The pattern is present in 27% of the patients • It suggests that among depressive patients there is a subclass that corresponds to the TCM concept of KIDNEY DEPRIVED OF NOURISHMENT (肾虚失养) • The characteristics of the subclass are given by the distributions for Y28 = s1

  32. Probabilistic Symptom Co-occurrence Patterns • Y27 = s1 captures the probabilistic co-occurrence of 'weak lumbus and knees' and 'cumbersome limbs' • The pattern is present in 44% of the patients • It suggests that among depressive patients there is a subclass that corresponds to the TCM concept of KIDNEY DEFICIENCY (肾虚) • The characteristics of the subclass are given by the distributions for Y27 = s1 • Y27, Y28 and Y29 together provide evidence for defining KIDNEY YANG DEFICIENCY

  33. Probabilistic Symptom Co-occurrence Patterns • Y21 = s1: evidence for defining STAGNANT QI TURNING INTO FIRE (气郁化火) • Y15 = s1: evidence for defining QI DEFICIENCY • Y17 = s1: evidence for defining HEART QI DEFICIENCY • Y16 = s1: evidence for defining QI STAGNATION • Y19 = s1: evidence for defining QI STAGNATION IN HEAD

  34. Probabilistic Symptom Co-occurrence Patterns • Y9 = s1: evidence for defining DEFICIENCY OF BOTH QI AND YIN (气阴两虚) • Y10 = s1: evidence for defining YIN DEFICIENCY (阴虚) • Y11 = s1: evidence for defining DEFICIENCY OF STOMACH/SPLEEN YIN (脾胃阴虚)

  35. Symptom Mutual-Exclusion Patterns • Some empirical partitions reveal symptom exclusion patterns • Y1 reveals the mutual exclusion of 'white tongue coating', 'yellow tongue coating' and 'yellow-white tongue coating' • Y2 reveals the mutual exclusion of 'thin tongue coating', 'thick tongue coating' and 'little tongue coating'
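One simple, illustrative way to check a mutual-exclusion pattern directly in the 0/1 data (not the paper's method, which reads the pattern off the learned latent variable); the commented usage assumes the hypothetical `frequent` table from the preprocessing sketch above:

```python
import pandas as pd

def mutual_exclusion_rate(data: pd.DataFrame, symptoms: list) -> float:
    """Fraction of patients showing at most one of the given 0/1 symptoms.

    A value close to 1.0 is consistent with a mutual-exclusion pattern such as
    the one Y1 reveals for the three tongue-coating colours on the slide.
    """
    return float((data[symptoms].sum(axis=1) <= 1).mean())

# Hypothetical usage:
# print(mutual_exclusion_rate(frequent,
#       ["white tongue coating", "yellow tongue coating", "yellow-white tongue coating"]))
```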

  36. Summary of TCM Data Analysis • By analyzing 604 cases of depressive patient data using latent tree models, we have discovered a host of probabilistic symptom co-occurrence patterns and symptom mutual-exclusion patterns • Most of the co-occurrence patterns have clear TCM syndrome connotations, and the mutual-exclusion patterns are also reasonable and meaningful • The patterns can be used as evidence for defining TCM classes in the context of depressive patients and for differentiating between those classes

  37. Another Perspective: Statistical Validation of TCM Postulates (Zhang et al. JACM 2008) • TCM terms such as Yang Deficiency were introduced to explain symptom co-occurrence patterns observed in clinical practice • Y28 = s1: Kidney deprived of nourishment • Y29 = s1: Yang Deficiency

  38. Value of Work in View of Others • D. Haughton and J. Haughton, Living Standards Analytics: Development through the Lens of Household Survey Data, Springer, 2012: • Zhang et al. provide a very interesting application of latent class (tree) models to diagnoses in traditional Chinese medicine (TCM) • The results tend to confirm known theories in Chinese traditional medicine • This is a significant advance, since the scientific bases for these theories are not known • The model proposed by the authors provides at least a statistical justification for them

  39. Summary • Latent tree models: • Tree-structured probabilistic graphical models • Leaf nodes: observed variables • Internal nodes: latent variables • What can LTA be used for: • Discovery of co-occurrence patterns in binary data • Discovery of correlation patterns in general discrete data • Discovery of latent variables/structures • Multidimensional clustering • Topic detection in text data • A key role in TCM patient classification

  40. References
N. L. Zhang (2004). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, 5(6):697-723.
T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1):2246-2269.
T. F. Liu, N. L. Zhang, A. H. Liu, L. K. M. Poon (2012). A novel LTM-based method for multidimensional clustering. European Workshop on Probabilistic Graphical Models (PGM-12), 203-210.
T. F. Liu, N. L. Zhang, P. X. Chen, A. H. Liu, L. K. M. Poon, Y. Wang (2013). Greedy learning of latent tree models for multidimensional clustering. Machine Learning, doi:10.1007/s10994-013-5393-0.
R. Mourad, C. Sinoquet, N. L. Zhang, T. F. Liu, P. Leray (2013). A survey on latent tree models and applications. Journal of Artificial Intelligence Research, 47:157-203. doi:10.1613/jair.3879.
N. L. Zhang, S. H. Yuan, T. Chen, Y. Wang (2008). Statistical validation of TCM theories. Journal of Alternative and Complementary Medicine, 14(5):583-587.
N. L. Zhang, S. H. Yuan, T. Chen, Y. Wang (2008). Latent tree models and diagnosis in traditional Chinese medicine. Artificial Intelligence in Medicine, 42:229-245.
Z. X. Xu, N. L. Zhang, Y. Q. Wang, G. P. Liu, J. Xu, T. F. Liu, A. H. Liu (2013). Statistical validation of traditional Chinese medicine syndrome postulates in the context of patients with cardiovascular disease. The Journal of Alternative and Complementary Medicine.
Y. Zhao, N. L. Zhang, T. F. Wang, Q. G. Wang (2014). Discovering symptom co-occurrence patterns from 604 cases of depressive patient data using latent tree models. The Journal of Alternative and Complementary Medicine.

  41. Thank You!
