MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification

J. Zhu, A. Ahmed and E.P. Xing
Carnegie Mellon University
ICML 2009
Presented by Haojun Chen
Sources: http://www.cs.cmu.edu/~junzhu/medlda.htm

Outline
• Motivation
• Supervised topic model (sLDA) and support vector regression (SVR)
• Maximum entropy discrimination LDA (MedLDA)
• MedLDA for Regression
• MedLDA for Classification
• Experimental Results
• Conclusion

Motivation
• Learning latent topic models with side information, as in sLDA, has attracted increasing attention.
• sLDA uses maximum likelihood estimation for posterior inference and parameter estimation.
• Max-margin methods for classification, such as SVM, have demonstrated success in many applications.
• This paper proposes a general principle for learning max-margin discriminative supervised latent topic models for both regression and classification.

Supervised Topic Model (sLDA)
• Joint distribution for sLDA
• Variational MLE for sLDA
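As a reminder of what the sLDA joint distribution factorizes over, here is a minimal sketch of the generative process usually given for sLDA: topic proportions θ ~ Dirichlet(α), a topic z_n ~ Mult(θ) and a word w_n ~ Mult(β_{z_n}) for each position, and a response y ~ N(η^T z̄, δ²), where z̄ is the vector of empirical topic frequencies. The function and parameter names are illustrative assumptions, not taken from the slides, and the sketch covers sampling only, not inference.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_slda_document(alpha, beta, eta, delta2, n_words, rng):
    """Sample (words, topic assignments, response) for one document.

    alpha  : Dirichlet parameter, length K
    beta   : K rows, each a distribution over the vocabulary
    eta    : regression weights, length K
    delta2 : variance of the Gaussian response noise
    """
    k = len(alpha)
    theta = sample_dirichlet(alpha, rng)
    topics, words = [], []
    for _ in range(n_words):
        z = rng.choices(range(k), weights=theta)[0]
        w = rng.choices(range(len(beta[z])), weights=beta[z])[0]
        topics.append(z)
        words.append(w)
    # The response depends on the empirical topic frequencies z_bar.
    z_bar = [topics.count(t) / n_words for t in range(k)]
    mean = sum(e * zb for e, zb in zip(eta, z_bar))
    y = rng.gauss(mean, delta2 ** 0.5)
    return words, topics, y
```

Passing a seeded `random.Random` instance as `rng` keeps the draws reproducible.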

Support Vector Regression (SVR)
• Given a training set, the linear SVR finds an optimal linear function by solving a constrained convex optimization problem.
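In its standard textbook form, linear SVR minimizes ½‖η‖² + C Σ_d (ξ_d + ξ_d*) subject to y_d − η^T x_d ≤ ε + ξ_d, η^T x_d − y_d ≤ ε + ξ_d*, and ξ_d, ξ_d* ≥ 0. At the optimum the two slacks of a point sum to its ε-insensitive loss max(0, |y_d − η^T x_d| − ε). The sketch below evaluates that objective for a fixed weight vector. The symbols η, C and ε follow this common convention and are assumed rather than read off the slide, and the function names are illustrative.

```python
def epsilon_insensitive_loss(y, prediction, epsilon):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(y - prediction) - epsilon)


def svr_objective(eta, xs, ys, c, epsilon):
    """Primal linear SVR objective with slacks replaced by their optimal values."""
    regularizer = 0.5 * sum(e * e for e in eta)
    total_slack = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(e * xi for e, xi in zip(eta, x))
        total_slack += epsilon_insensitive_loss(y, prediction, epsilon)
    return regularizer + c * total_slack
```

Points whose residual is at most ε contribute nothing to the objective, which is why only the points on or outside the tube end up as support vectors.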

Max-Entropy Discrimination LDA (MedLDA)
• Maximum entropy discrimination LDA (MedLDA): an integration of max-margin prediction models (e.g. SVR and SVM) and hierarchical Bayesian topic models (e.g. LDA and sLDA)
• Specifically, a distribution is learned in a max-margin manner in MedLDA.
• MedLDA for regression and MedLDA for classification are both considered in this paper.

MedLDA for Regression
• For regression, MedLDA is defined as an integration of Bayesian sLDA and SVR, using a variational approximation for the posterior.
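To make the idea of an expected margin constraint concrete, the sketch below assumes the constraints take the SVR form with the prediction replaced by its expectation under the variational distribution: y_d − E[η^T z̄_d] ≤ ε + ξ_d and E[η^T z̄_d] − y_d ≤ ε + ξ_d*. It further assumes a fully factorized (mean-field) approximation, so that E[η^T z̄_d] = E[η]^T E[z̄_d] with E[z̄_d] being the average of the per-word topic posteriors. The names `phi`, `eta_mean` and both functions are hypothetical and only illustrate the constraint, not the full optimization.

```python
def expected_topic_frequencies(phi):
    """Average per-word topic posteriors phi (N rows, K columns) into E[z_bar]."""
    n_words = len(phi)
    n_topics = len(phi[0])
    return [sum(row[t] for row in phi) / n_words for t in range(n_topics)]


def expected_margin_slacks(y, eta_mean, phi, epsilon):
    """Smallest slacks (xi, xi_star) that satisfy both expected margin constraints."""
    z_bar = expected_topic_frequencies(phi)
    expected_prediction = sum(e * z for e, z in zip(eta_mean, z_bar))
    xi = max(0.0, y - expected_prediction - epsilon)
    xi_star = max(0.0, expected_prediction - y - epsilon)
    return xi, xi_star
```

A document whose expected prediction lies within ε of its response gets zero slack on both sides and does not constrain the learned distribution.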

EM Algorithm for MedLDA Regression
• Variational EM Algorithm:
• The key difference between sLDA and MedLDA lies in updating

MedLDA for Classification
• Similar to the regression model, MedLDA for classification integrates LDA with a multi-class max-margin classification model.
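The multi-class case can be illustrated with a small sketch. It assumes a Crammer-Singer style formulation with one weight vector per class, a score E[η_y]^T E[z̄] for class y, prediction by the highest score, and a unit margin required between the true class and every other class. These choices, along with the names `eta_means` and `z_bar`, are assumptions made for illustration and are not read off the slide.

```python
def class_scores(eta_means, z_bar):
    """Expected linear score of each class for one document."""
    return [sum(e * z for e, z in zip(eta_y, z_bar)) for eta_y in eta_means]


def predict(eta_means, z_bar):
    """Return the index of the class with the highest expected score."""
    scores = class_scores(eta_means, z_bar)
    return max(range(len(scores)), key=scores.__getitem__)


def multiclass_slack(eta_means, z_bar, true_label, margin=1.0):
    """Smallest slack so the true class beats every other class by `margin`."""
    scores = class_scores(eta_means, z_bar)
    best_other = max(s for y, s in enumerate(scores) if y != true_label)
    return max(0.0, margin - (scores[true_label] - best_other))
```

A correctly classified document can still have positive slack when its winning score leads by less than the margin.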

EM Algorithm for MedLDA Classification
• Similar to the EM algorithm for MedLDA regression
• Update equation for

Embedding Results
• 20 Newsgroups dataset
[Figure: embeddings on the 20 Newsgroups dataset; panels: MedLDA, LDA]

Example Topics Discovered

Classification Results
• 20 Newsgroups Data
Relative ratio =

Regression Results
• Movie Review Data

Time Efficiency

Conclusion
• MedLDA: an integration of max-margin prediction models and hierarchical Bayesian topic models by optimizing a single objective function with a set of expected margin constraints
