Modeling Annotated Data (SIGIR 2003)
David M. Blei, Michael I. Jordan, Univ. of California, Berkeley
Presented by ChengXiang Zhai, July 10, 2003
A One-slide Summary
• Problem: probabilistic modeling of annotated data (images + captions)
• Method: Correspondence latent Dirichlet allocation (Corr-LDA), compared with two baseline models (GM-mixture, GM-LDA)
• Evaluation
  – Held-out likelihood: Corr-LDA ≈ GM-LDA > GM-mixture
  – Automatic annotation: Corr-LDA > GM-mixture > GM-LDA
  – Image retrieval: Corr-LDA > {GM-mixture, GM-LDA}
• Conclusion: Corr-LDA is a good model
The Problem
• Image/caption data: (r, w)
  – R = {r1, …, rN}: regions (primary data)
  – W = {w1, …, wM}: words (annotations)
  – R and W are different data types
• Need models for 3 tasks
  – Modeling the joint distribution p(r, w) (for clustering)
  – Modeling the conditional distribution p(w | r) (labeling an image, retrieval)
  – Modeling the per-region word distribution p(w | ri) (labeling a region)
Three Generative Models
• General setup
  – Each region ri is modeled by a multivariate Gaussian with diagonal covariance
  – Each word wi is modeled by a multinomial distribution
  – Assume k clusters
• GM-mixture: Gaussian-multinomial mixture
  – An image-caption pair belongs to exactly one cluster
• GM-LDA: Gaussian-multinomial LDA
  – An image-caption pair may belong to several clusters; each region/word belongs to exactly one cluster
  – Regions and words of the same image may belong to completely disjoint clusters
• Corr-LDA: Correspondence LDA
  – An image-caption pair may belong to several clusters; each region/word belongs to exactly one cluster
  – A word must belong to exactly one of the clusters of the regions
Detour: Probabilistic Models for Document/Term Clustering…
A Mixture Model of Documents
• Generative process: select a group Ci (from C1, …, Ck) with probability p(Ci), then generate a document (word sequence) from the word distribution p(w|Ci)
• Parameters are fit with the maximum likelihood estimator
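A minimal sketch of the resulting likelihood, in standard document-mixture notation (c(w,d) here is my shorthand for the count of word w in document d, not a symbol from the slide):

  p(d) = \sum_{j=1}^{k} p(C_j) \prod_{w} p(w \mid C_j)^{c(w,d)}

  \hat{\theta}_{ML} = \arg\max_{\theta} \sum_{i=1}^{n} \log p(d_i \mid \theta), \qquad \theta = (\{p(C_j)\}, \{p(w \mid C_j)\})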
Applying the EM Algorithm
• Data: D = {d1, …, dn}; clusters/groups c1, …, ck
• Hidden variables: zi1, …, zik for each document di, with zij ∈ {0,1} and zij = 1 iff di is in cluster j
• Incomplete likelihood: log-likelihood of D with z marginalized out
• Complete likelihood Lc(θ|D): log-likelihood of D with z treated as observed
• E-step: compute E_{z|θ^old}[Lc(θ|D)], i.e., compute p(zij | di, θ^old)
• M-step: θ = argmax_θ E_{z|θ^old}[Lc(θ|D)], i.e., compute expected counts for estimating θ
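In standard mixture-model notation (a hedged reconstruction; the slide's own formulas are not preserved in this text), the two likelihoods are

  L(\theta \mid D) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} p(C_j)\, p(d_i \mid C_j)

  L_c(\theta \mid D) = \sum_{i=1}^{n} \sum_{j=1}^{k} z_{ij} \big[ \log p(C_j) + \log p(d_i \mid C_j) \big]

and the E-step expectation replaces each zij in Lc by its posterior probability p(zij = 1 | di, θ^old).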
EM Updating Formulas
• Parameters: θ = ({p(Ci)}, {p(wj|Ci)})
• Initialization: randomly set θ^0
• Repeat until convergence
  – E-step
  – M-step
• Practical issues: underflow; smoothing
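A hedged sketch of the usual updating formulas for this multinomial mixture (the slide's own equations are not reproduced here; c(w,d) again denotes a word count and t indexes iterations):

  E-step:  p(z_{ij}=1 \mid d_i, \theta^{(t)}) = \frac{ p^{(t)}(C_j) \prod_{w} p^{(t)}(w \mid C_j)^{c(w,d_i)} }{ \sum_{j'} p^{(t)}(C_{j'}) \prod_{w} p^{(t)}(w \mid C_{j'})^{c(w,d_i)} }

  M-step:  p^{(t+1)}(C_j) \propto \sum_{i} p(z_{ij}=1 \mid d_i, \theta^{(t)}), \qquad
           p^{(t+1)}(w \mid C_j) \propto \sum_{i} p(z_{ij}=1 \mid d_i, \theta^{(t)})\, c(w, d_i)

The two practical issues map directly onto these formulas: the products in the E-step should be computed in log space to avoid underflow, and the M-step counts are typically smoothed (e.g., with pseudo-counts) so that no word receives zero probability.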
GM-Mixture
• Model (sketched below): a single latent cluster generates both the regions and the words of an image-caption pair
• Estimation: EM
• Annotation: p(w|r), obtained by conditioning on the regions
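A hedged sketch of the generative process, following the description above (notation is mine, with π the mixing weights):

  z \sim \mathrm{Mult}(\pi)  \quad\text{(one cluster for the whole pair)}
  r_n \mid z \sim \mathcal{N}(\mu_z, \mathrm{diag}(\sigma_z^2)), \quad n = 1, \dots, N
  w_m \mid z \sim \mathrm{Mult}(\beta_z), \quad m = 1, \dots, M

Annotation then follows by Bayes' rule:

  p(w \mid r) = \sum_{z} p(z \mid r)\, p(w \mid z), \qquad p(z \mid r) \propto p(z) \prod_{n=1}^{N} p(r_n \mid z)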
Gaussian-Multinomial LDA
• Model (sketched below): an LDA-style mixed-membership model in which regions and words draw their clusters independently from a shared topic-proportion vector
• Estimation: variational Bayes
• Annotation: by marginalization
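A hedged sketch of the generative process (notation mine), consistent with the model overview above:

  \theta \sim \mathrm{Dirichlet}(\alpha)
  \text{for each region } n = 1, \dots, N: \quad z_n \sim \mathrm{Mult}(\theta), \quad r_n \mid z_n \sim \mathcal{N}(\mu_{z_n}, \mathrm{diag}(\sigma_{z_n}^2))
  \text{for each word } m = 1, \dots, M: \quad v_m \sim \mathrm{Mult}(\theta), \quad w_m \mid v_m \sim \mathrm{Mult}(\beta_{v_m})

Because z_n and v_m are drawn independently given θ, nothing forces the word clusters to match the region clusters, which is exactly the weakness noted in the model overview.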
Correspondence LDA
• Model (sketched below): like GM-LDA, but each word is attached to one of the image's regions and inherits that region's cluster
• Estimation: variational Bayes
• Annotation: by marginalization
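A hedged sketch of the generative process (notation mine):

  \theta \sim \mathrm{Dirichlet}(\alpha)
  \text{for each region } n = 1, \dots, N: \quad z_n \sim \mathrm{Mult}(\theta), \quad r_n \mid z_n \sim \mathcal{N}(\mu_{z_n}, \mathrm{diag}(\sigma_{z_n}^2))
  \text{for each word } m = 1, \dots, M: \quad y_m \sim \mathrm{Uniform}(1, \dots, N), \quad w_m \mid y_m, z \sim \mathrm{Mult}(\beta_{z_{y_m}})

The indicator y_m selects a region, and the word is generated from that region's cluster, so a caption word must belong to a cluster actually used by some region of the image.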
Variational EM
• General idea: use a variational approximation to compute a lower bound on the likelihood in the E-step
• Procedure
  – Initialization
  – E-step: maximize the variational lower bound (usually itself an iterative procedure)
  – M-step: given the variational distribution, estimate the model parameters by maximum likelihood
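The lower bound in question has the standard form (a sketch; q denotes the variational distribution over the hidden variables h, e.g., θ, z, y in Corr-LDA):

  \log p(x \mid \Theta) \;\ge\; E_q[\log p(x, h \mid \Theta)] - E_q[\log q(h)]
                        \;=\; \log p(x \mid \Theta) - \mathrm{KL}\big(q(h) \,\|\, p(h \mid x, \Theta)\big)

The E-step tightens the bound by fitting q (for a factorized q this is the inner iteration mentioned above), and the M-step maximizes the bound over the model parameters Θ with q held fixed.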
Experiment Results
• Held-out likelihood: Corr-LDA ≈ GM-LDA > GM-mixture
• Automatic annotation: Corr-LDA > GM-mixture > GM-LDA
• Image retrieval: Corr-LDA > {GM-mixture, GM-LDA}
Summary
• A powerful generative model for annotated data (Corr-LDA)
• Interesting empirical results