This document outlines the essential aspects of logistic regression, focusing on its application to classification tasks such as heart disease prediction and text categorization. Key topics include learning parameters by Maximum Likelihood Estimation (MLE), the role of regularization in combating overfitting, sparse solutions via lasso regularization, a Bayesian treatment, and the extension to multi-class classification. It also compares logistic regression with naïve Bayes classifiers, discussing their relative advantages and disadvantages. Course logistics are noted, including the phased release of project data and the deadlines culminating in a project report and presentation.
Midterm Exam: 02/28 (Thursday), take home, turn in by noon on 02/29 (Friday)
Project
• 03/14 (Phase 1): 10% of training data is available for algorithm development
• 04/04 (Phase 2): full training data and test examples are available
• 04/17 (submission): submit your prediction before 11:59pm Apr. 20 (Wednesday)
• 04/23 and 04/25: project presentations; competition results announced
• 04/28: project report is due
Logistic Regression
Rong Jin
Logistic Regression
• Generative models often lead to a linear decision boundary
• Logistic regression is a linear discriminative model: it models the linear decision boundary directly
• w is the parameter vector to be learned
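The model equation on the original slide was an image; a standard reconstruction, for labels y in {-1, +1}, is:

```latex
p(y \mid \mathbf{x}; \mathbf{w}) = \frac{1}{1 + \exp\left(-y\,\mathbf{w}^{\top}\mathbf{x}\right)}
```

The decision boundary p(y = +1 | x) = p(y = -1 | x) is exactly the hyperplane w^T x = 0, which is why the model is linear.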
Logistic Regression
• Learn the parameter w by Maximum Likelihood Estimation (MLE)
• Given training data {(x_i, y_i)}, i = 1, ..., n, choose the w that maximizes the likelihood of the observed labels
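The MLE objective itself did not survive extraction; under the model above, the standard log-likelihood to maximize is:

```latex
\ell(\mathbf{w}) = \sum_{i=1}^{n} \log p(y_i \mid \mathbf{x}_i; \mathbf{w})
                 = -\sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i\,\mathbf{w}^{\top}\mathbf{x}_i\right)\right)
```

This function is concave in w (equivalently, the negative log-likelihood is convex), which justifies the next slide's claim that gradient descent reaches the global optimum.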
Logistic Regression
• Convex objective function: any local optimum is the global optimum
• Optimize by gradient descent (a sketch follows this slide)
• Classification error
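A minimal gradient-ascent sketch of this optimization in plain NumPy; the learning rate, iteration count, and toy data are illustrative assumptions, not from the slides:

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function, evaluated branch-wise.
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Gradient ascent on the log-likelihood for labels y in {-1, +1}.

    The gradient of sum_i -log(1 + exp(-y_i w^T x_i)) is
    sum_i y_i x_i * sigmoid(-y_i w^T x_i).
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        margins = y * (X @ w)                  # y_i * w^T x_i
        grad = X.T @ (y * sigmoid(-margins))   # ascent direction
        w += lr * grad / n
    return w

def predict(X, w):
    return np.sign(X @ w)

# Toy usage: two Gaussian blobs with labels -1/+1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])
w = fit_logistic(X, y)
print("training error:", np.mean(predict(X, w) != y))
```

Because the objective is convex, the result does not depend on the zero initialization; any starting point converges to the same optimum.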
Example: Heart Disease
• Input feature x: age-group id
  • 1: 25-29, 2: 30-34, 3: 35-39, 4: 40-44, 5: 45-49, 6: 50-54, 7: 55-59, 8: 60-64
• Output y: whether the subject has heart disease
  • y = +1: has heart disease
  • y = -1: no heart disease
Example: Text Categorization
• Learn to classify text into two categories
• Input d: a document, represented by a word histogram
• Output y: +1 for a political document, -1 for a non-political document
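A small sketch of the word-histogram representation; the example document is hypothetical, and `Counter` simply counts word occurrences:

```python
from collections import Counter

def word_histogram(document: str) -> Counter:
    # Lowercase, split on whitespace, and count each word's occurrences.
    return Counter(document.lower().split())

d = "the senate passed the budget bill"
print(word_histogram(d))
# Counter({'the': 2, 'senate': 1, 'passed': 1, 'budget': 1, 'bill': 1})
```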
Example: Text Categorization
• Training data
[Figure: example training documents with their labels; not preserved in extraction]
Example 2: Text Classification
• Dataset: Reuters-21578
• Classification accuracy
  • Naïve Bayes: 77%
  • Logistic regression: 88%
Logistic Regression vs. Naïve Bayes
• Both produce linear decision boundaries
  • Naïve Bayes: weights fixed by the estimated class-conditional word probabilities (see the reconstruction below)
  • Logistic regression: weights learned by MLE
• Both can be viewed as modeling p(d|y)
  • Naïve Bayes: independence assumption
  • Logistic regression: assumes an exponential-family distribution for p(d|y), a broad family and hence a much weaker assumption
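The Naïve Bayes weights on the original slide were an image; a standard derivation for a multinomial bag-of-words model (my reconstruction) shows the linear form:

```latex
\log \frac{p(y = +1 \mid d)}{p(y = -1 \mid d)}
  = \underbrace{\log \frac{p(+1)}{p(-1)}}_{\text{bias } b}
  + \sum_{j} n_j(d)\, \underbrace{\log \frac{p(w_j \mid +1)}{p(w_j \mid -1)}}_{\text{weight } w_j}
```

Here n_j(d) is the count of word j in document d. Naïve Bayes is thus linear in the word histogram, with each weight determined by the estimated conditional probabilities, whereas logistic regression fits the same linear form by optimizing the conditional likelihood directly.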
Discriminative vs. Generative
• Generative models: model p(x|y)
  • Pros: usually converge fast; cheap computation; robust to noisy data
  • Cons: usually perform worse
• Discriminative models: model p(y|x)
  • Pros: usually good performance
  • Cons: slow convergence; expensive computation; sensitive to noisy data
Overfitting Problem
• Consider text categorization
• What is the weight for a word j that appears in only one training document d_k?
• MLE keeps increasing that weight without bound: enlarging it raises the likelihood of d_k while affecting no other document, so the classifier fits that single document perfectly. This is overfitting.
Overfitting Problem
[Figure: classification accuracy on test data vs. iteration, with and without regularization; without regularization, test accuracy decreases as training proceeds]
Solution: Regularization
• Regularized log-likelihood (a sketch follows this slide)
• The effects of the regularizer:
  • Favors small weights
  • Guarantees a bounded norm of w
  • Guarantees a unique solution
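The regularized objective on the original slide was an image; a standard L2-regularized form (the tradeoff constant s > 0 and its symbol are my choice) is:

```latex
\max_{\mathbf{w}} \; \sum_{i=1}^{n} \log p(y_i \mid \mathbf{x}_i; \mathbf{w}) \;-\; \frac{s}{2}\,\lVert \mathbf{w} \rVert_2^2
```

The penalty fixes the rare-word problem from the previous slide: increasing w_j now carries a cost that grows with w_j, so the weight of a word seen in a single document no longer diverges.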
Regularized Logistic Regression
[Figure: classification performance over iterations, comparing training with regularization against training without regularization]
Regularization as Robust Optimization
• Assume each data point is unknown but bounded: the true input lies in a sphere of radius δ centered at x_i
• Learn w to perform well under the worst-case perturbation (a sketch follows this slide)
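The formulation on the slide was an image; a standard reconstruction, with δ as my notation for the lost radius symbol, treats training as a minimax problem over worst-case perturbations:

```latex
\min_{\mathbf{w}} \; \sum_{i=1}^{n} \; \max_{\lVert \boldsymbol{\delta}_i \rVert_2 \le \delta} \;
\log\left(1 + \exp\left(-y_i\,\mathbf{w}^{\top}(\mathbf{x}_i + \boldsymbol{\delta}_i)\right)\right)
```

Since the worst-case perturbation achieves max over ||δ_i|| ≤ δ of -y_i w^T δ_i = δ||w||_2, each loss term becomes log(1 + exp(-y_i w^T x_i + δ||w||_2)): robustness to bounded input noise acts like a norm penalty on w, which is the connection the slide title names.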
Sparse Solution by Lasso Regularization
• RCV1 collection:
  • 800K documents
  • 47K unique words
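The lasso objective itself was an image; a standard L1-regularized log-likelihood (λ > 0 is my symbol for the tradeoff constant) is:

```latex
\max_{\mathbf{w}} \; \sum_{i=1}^{n} \log p(y_i \mid \mathbf{x}_i; \mathbf{w}) \;-\; \lambda \lVert \mathbf{w} \rVert_1
```

Unlike the L2 penalty, the L1 penalty drives many coordinates exactly to zero; with 47K unique words, most weights vanish and the solution is sparse.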
Sparse Solution by Lasso Regularization
• How to solve the optimization problem? The L1 norm is not differentiable at w_j = 0, so plain gradient descent does not apply.
• Subgradient descent (a sketch follows this slide)
• Minimax formulation
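A minimal subgradient sketch for the L1-regularized objective above; the learning rate, λ, iteration count, and toy data are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function, evaluated branch-wise.
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def fit_lasso_logistic(X, y, lam=0.1, lr=0.05, n_iters=2000):
    """Subgradient ascent on  sum_i -log(1+exp(-y_i w^T x_i))/n - lam*||w||_1.

    np.sign(w) is a valid subgradient of ||w||_1 (0 at w_j = 0), so each
    step follows the log-likelihood gradient minus lam * sign(w).
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        margins = y * (X @ w)
        grad_ll = X.T @ (y * sigmoid(-margins)) / n
        w += lr * (grad_ll - lam * np.sign(w))
    return w

# Toy usage: only 2 of 20 features are informative, so most weights
# should be driven (approximately) to zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200))
w = fit_lasso_logistic(X, y)
# Subgradient iterates hover near zero rather than reaching it exactly,
# hence the tolerance below.
print("weights above tolerance:", np.sum(np.abs(w) > 1e-2))
```

In practice, plain subgradient steps oscillate near zero instead of hitting it exactly; proximal methods with soft-thresholding are the usual refinement, but the subgradient version matches the solver named on the slide.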
Bayesian Treatment
• Compute the posterior distribution of w given the training data
• Approximate the posterior with the Laplacian approximation (a sketch follows this slide)
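The equations on the original slides were images; a standard reconstruction, assuming a prior p(w), is:

```latex
p(\mathbf{w} \mid D) \;\propto\; p(\mathbf{w}) \prod_{i=1}^{n} p(y_i \mid \mathbf{x}_i; \mathbf{w}),
\qquad
p(\mathbf{w} \mid D) \;\approx\; \mathcal{N}\!\left(\mathbf{w}_{\mathrm{MAP}},\; \mathbf{H}^{-1}\right)
```

The Laplacian approximation fits a Gaussian at the posterior mode: w_MAP is the maximizer of the log posterior, and H is the negative Hessian of the log posterior evaluated at w_MAP.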
Multi-class Logistic Regression
• How to extend the logistic regression model to multi-class classification?
Conditional Exponential Model
• Let the classes be {1, 2, ..., K}
• Need to learn a weight vector w_k for each class k
• The normalization factor (partition function) makes the class probabilities sum to one
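The model equation was an image; the standard conditional exponential (softmax) form it refers to is:

```latex
p(y = k \mid \mathbf{x}) = \frac{\exp\left(\mathbf{w}_k^{\top}\mathbf{x}\right)}{\sum_{k'=1}^{K} \exp\left(\mathbf{w}_{k'}^{\top}\mathbf{x}\right)}
```

The denominator is the partition function; for K = 2 the model reduces to binary logistic regression.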
Conditional Exponential Model
• Learn the weights w_1, ..., w_K by maximum likelihood estimation
• Any problem?