Forecasting Choices
This document gives an overview of Generalized Linear Models (GLM) for forecasting choices, covering continuous, discrete, qualitative, and ordinal variable types. It describes the logistic regression framework for binary and multinomial outcomes and explains maximum likelihood estimation. Response categories, link functions, and hypothesis testing are discussed, along with the role of deviance, standard errors, and covariates in model evaluation.
Presentation Transcript
Types of Variable • Quantitative: Continuous, Discrete (counting) • Qualitative: Ordinal, Nominal
Nominal or Ordinal Dependent Variable • Indicates the “choices” of a decision maker (d.m.), say a consumer. • Response categories: mutually exclusive, collectively exhaustive, finite in number • Desired regression outputs: • Probability that the d.m. chooses each category • Coefficient of each independent variable
Generalized Linear Models (GLM) • Regression model for a continuous Y: $Y = b_0 + b_1 X_1 + b_2 X_2 + \varepsilon$, with $\varepsilon$ following $N(0, \sigma)$ • GLM formulation: • Model for Y: Y is $N(\mu, \sigma)$ • Link function (model for the predictors): $\mu = b_0 + b_1 X_1 + b_2 X_2$
Estimation of Parameters of GLM • Maximum likelihood estimation (MLE) • For normal Y, MLE coincides with least-squares (LS) estimation • Maximize $\sum_i L_i$, where $L_i$ is the log of the likelihood function of the i-th observation
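MLE for Regression Model • Y is $N(\mu, \sigma)$ • MLE: Maximize $\sum_i L_i$, where $L_i = -\ln\sigma - \tfrac{1}{2}\ln(2\pi) - (Y_i - \mu_i)^2 / (2\sigma^2)$ and $\mu_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$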
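The claim that MLE reduces to least squares for normal Y can be checked numerically. The sketch below is only an illustration: it assumes a single predictor, made-up data, and a fixed $\sigma = 1$ (none of these come from the slides), and takes the per-observation log-likelihood of a normal Y to be $-\ln\sigma - \tfrac{1}{2}\ln(2\pi) - (Y_i - \mu_i)^2/(2\sigma^2)$.

```python
import math

def normal_log_likelihood(b0, b1, sigma, xs, ys):
    """Sum over observations of the normal log-likelihood with mean b0 + b1*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = b0 + b1 * x
        total += (-math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
                  - (y - mu) ** 2 / (2.0 * sigma ** 2))
    return total

def least_squares(xs, ys):
    """Closed-form LS estimates for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0_ls, b1_ls = least_squares(xs, ys)
best = normal_log_likelihood(b0_ls, b1_ls, 1.0, xs, ys)

# Log-likelihood on a small grid of coefficient values around the LS estimates.
nearby = [normal_log_likelihood(b0_ls + d0, b1_ls + d1, 1.0, xs, ys)
          for d0 in (-0.1, 0.0, 0.1) for d1 in (-0.1, 0.0, 0.1)]
```

No value in `nearby` exceeds `best`, because for a fixed $\sigma$ the log-likelihood is a decreasing function of the sum of squared errors.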
GLM for Binary Dependent Variable, Y • Model for response: Y is $B(n, p)$ • Model for predictors (link function): $\mathrm{logit}(p) = \ln\frac{p}{1-p} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_K X_K = g$ • Probability: $p = \exp(g) / (1 + \exp(g))$
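A minimal sketch of the logit link and its inverse follows. The coefficient and covariate values are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math

def logit(p):
    """Link function: log-odds of p."""
    return math.log(p / (1.0 - p))

def inverse_logit(g):
    """Probability p = exp(g) / (1 + exp(g))."""
    return math.exp(g) / (1.0 + math.exp(g))

# Hypothetical coefficients b0, b1, b2 and covariates X1, X2 (K = 2).
b = [-1.0, 0.5, 0.25]
x = [2.0, 4.0]

g = b[0] + sum(bk * xk for bk, xk in zip(b[1:], x))  # -1 + 1 + 1 = 1
p = inverse_logit(g)
```

Applying `logit` to `p` recovers `g`, which is what makes the two functions a link and its inverse.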
X : Covariates • Independent variables are often referred to as “covariates.” • Example: • SPSS binary logistic regression routine • SPSS multinomial logistic regression routine
A. Logistic Regression for Ungrouped Data ($n_i = 1$) • Model of observation for the i-th observation: $Y_i = 1$: choose category 1 with probability $p_i$; $Y_i = 0$: choose category 2 with probability $1 - p_i$ • Log-likelihood function for the i-th observation: $L_i = Y_i \ln p_i + (1 - Y_i)\ln(1 - p_i)$
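MLE • Maximize: $\sum_i L_i = \sum_i \left[ Y_i \ln p_i + (1 - Y_i)\ln(1 - p_i) \right]$, with $p_i = \exp(g_i) / (1 + \exp(g_i))$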
Setting Up a Worksheet for MLE • Define an array for storing the parameters of the link function. Enter an initial estimate for each parameter. • Then for each observation, compute: the link function $g_i$, the parameter of the likelihood (the probability $p_i$), and ln(likelihood) $L_i$. • Sum the log-likelihoods $L_i$ and invoke the solver to maximize the sum by changing the parameters. • Multiply the maximized value by -2 for the test of significance of the regression.
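The worksheet steps can be mirrored in a few lines of code. This is a sketch under stated assumptions, not the presenter's worksheet: the data are made up, there is a single predictor, and a hand-written Newton-Raphson loop stands in for the spreadsheet solver.

```python
import math

def prob(params, x):
    """Link function g_i = b0 + b1*x, then p_i = exp(g_i) / (1 + exp(g_i))."""
    g = params[0] + params[1] * x
    return 1.0 / (1.0 + math.exp(-g))

def log_likelihood(params, xs, ys):
    """Sum of L_i = Y_i ln(p_i) + (1 - Y_i) ln(1 - p_i) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = prob(params, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def maximize(xs, ys, iterations=25):
    """Newton-Raphson stand-in for the solver (two parameters only)."""
    params = [0.0, 0.0]  # initial estimates
    for _ in range(iterations):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            p = prob(params, x)
            w = p * (1.0 - p)
            s0 += y - p            # gradient with respect to b0
            s1 += (y - p) * x      # gradient with respect to b1
            i00 += w               # entries of the 2x2 information matrix
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        params[0] += (i11 * s0 - i01 * s1) / det
        params[1] += (i00 * s1 - i01 * s0) / det
    return params

# Hypothetical ungrouped data (not linearly separable, so the MLE is finite).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

ll_start = log_likelihood([0.0, 0.0], xs, ys)
params_hat = maximize(xs, ys)
ll_max = log_likelihood(params_hat, xs, ys)
minus_two_ll = -2.0 * ll_max  # value used in the test of significance
```

A general-purpose optimizer would serve equally well here; Newton-Raphson was chosen only because it needs nothing beyond the standard library for a two-parameter model.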
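Test of Significance • Hypotheses: $H_0: b_1 = b_2 = \dots = b_K = 0$; $H_1$: at least one $b_j \neq 0$ • Test statistic: $G = -2\,(LL_0 - LL_1)$, where $LL_1$ is the maximized log-likelihood of the fitted model and $LL_0$ is that of the model containing only the constant $b_0$ • Distribution under $H_0$: $\chi^2$ (DF = K)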
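The test can be sketched as a likelihood-ratio computation. The statistic used here, $-2$ times the difference between the constant-only and fitted log-likelihoods, is the usual form of this test and is assumed rather than taken from the slide. The fitted log-likelihood of -4.0 and K = 2 are hypothetical, and the helper `chi2_sf_even_df` covers only even degrees of freedom, where the chi-square tail probability has a simple closed form.

```python
import math

def null_log_likelihood(ys):
    """Maximized log-likelihood of the constant-only model, where p = mean(Y)."""
    n = len(ys)
    n1 = sum(ys)
    p = n1 / n
    return n1 * math.log(p) + (n - n1) * math.log(1.0 - p)

def lr_statistic(ll_null, ll_model):
    """-2 * (log-likelihood under H0 minus log-likelihood of the fitted model)."""
    return -2.0 * (ll_null - ll_model)

def chi2_sf_even_df(x, df):
    """P(chi-square(df) > x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this helper handles positive even df only")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

ys = [0, 0, 1, 0, 1, 0, 1, 1]      # hypothetical binary responses
ll_null = null_log_likelihood(ys)
ll_model = -4.0                    # hypothetical maximized log-likelihood, K = 2
G = lr_statistic(ll_null, ll_model)
p_value = chi2_sf_even_df(G, df=2)
```

A small `p_value` leads to rejecting $H_0$ that all slope coefficients are zero.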
Standard Errors of Logistic Regression Coefficients (optional) • Estimate of Information Matrix, I (K=2)
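The matrix on this slide did not survive in the transcript. For reference, the standard form of the estimated information matrix for a binary logit model with a constant and K = 2 covariates, evaluated at the maximum likelihood estimates, is shown below. This is the textbook expression and is assumed to match what the slide showed.

```latex
I = \sum_{i} p_i (1 - p_i)
\begin{pmatrix}
1      & X_{1i}        & X_{2i} \\
X_{1i} & X_{1i}^2      & X_{1i} X_{2i} \\
X_{2i} & X_{1i} X_{2i} & X_{2i}^2
\end{pmatrix},
\qquad
\mathrm{SE}(b_j) = \sqrt{\left[ I^{-1} \right]_{jj}}
```

The standard error of each coefficient is the square root of the corresponding diagonal element of $I^{-1}$.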
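Deviance Residuals and Deviance for Logistic Regression (Optional) • Deviance (corresponds to SSE): $D = -2\sum_i L_i$, where $L_i$ is the log-likelihood of the i-th ungrouped observation • Deviance residual: $d_i = \mathrm{sign}(Y_i - p_i)\sqrt{-2 L_i}$, so that $D = \sum_i d_i^2$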
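A short sketch of deviance residuals for ungrouped binary data follows. It assumes the usual definitions: the residual is the signed square root of $-2 L_i$, and the deviance is the sum of squared residuals. The fitted probabilities are hypothetical.

```python
import math

def deviance_residual(y, p):
    """Signed square root of -2 * L_i for one ungrouped binary observation."""
    log_lik = y * math.log(p) + (1 - y) * math.log(1.0 - p)
    sign = 1.0 if y - p >= 0 else -1.0
    return sign * math.sqrt(-2.0 * log_lik)

# Hypothetical responses and fitted probabilities.
ys = [1, 0, 1, 0]
ps = [0.8, 0.3, 0.4, 0.1]

residuals = [deviance_residual(y, p) for y, p in zip(ys, ps)]
deviance = sum(d * d for d in residuals)  # plays the role of SSE
```

Observations that the model predicts poorly (here the third one, with Y = 1 and p = 0.4) contribute the largest squared residuals to the deviance.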
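B. Logistic Regression for Grouped Data Using WLS • The observation for the i-th group: $Y_i$ of the $n_i$ decision makers choose category 1 → sample proportion $\hat p_i = Y_i / n_i$ → empirical logit $\ln(\hat p_i / (1 - \hat p_i))$ → approximate variance $1 / (n_i \hat p_i (1 - \hat p_i))$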
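WLS for Logistic Regression • Regress $\ln(\hat p_i / (1 - \hat p_i))$ on $X_{1i}, \dots, X_{Ki}$ with weights $w_i = n_i \hat p_i (1 - \hat p_i)$, where $\hat p_i = Y_i / n_i$ is the sample proportion in the i-th group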
WLS for Unequal Variance Data [Figure: scatter plot of Y against X with observations 1 and 2 marked] • Observation 2 is subject to a larger variance than observation 1, so it makes sense to give it a lower weight. • In WLS, the weight is proportional to 1/variance.
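A sketch of WLS for grouped logistic data follows. It assumes the common empirical-logit approach: the response is $\ln(\hat p_i/(1-\hat p_i))$ and the weight is $n_i \hat p_i (1 - \hat p_i)$, the reciprocal of the approximate variance of the empirical logit. The grouped data and the single predictor are hypothetical.

```python
import math

def empirical_logit_wls(xs, successes, trials):
    """Weighted least squares of the empirical logit on one predictor.

    Requires 0 < successes < trials in every group, otherwise the logit is undefined.
    """
    zs, ws = [], []
    for y, n in zip(successes, trials):
        p_hat = y / n
        zs.append(math.log(p_hat / (1.0 - p_hat)))  # empirical logit
        ws.append(n * p_hat * (1.0 - p_hat))        # weight = 1 / approximate variance
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    z_bar = sum(w * z for w, z in zip(ws, zs)) / w_sum
    sxz = sum(w * (x - x_bar) * (z - z_bar) for w, x, z in zip(ws, xs, zs))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    b1 = sxz / sxx
    b0 = z_bar - b1 * x_bar
    return b0, b1, ws

# Hypothetical grouped data: 50 decision makers per group.
xs = [1.0, 2.0, 3.0, 4.0]
successes = [10, 20, 30, 40]
trials = [50, 50, 50, 50]

b0, b1, weights = empirical_logit_wls(xs, successes, trials)
```

Groups with proportions near 0 or 1 get smaller weights, since their empirical logits are estimated less precisely.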
Modeling of Forecasting Choices - GLM • Model for observation of the dependent variable: a probability distribution • Link function (model for independent variables): a mathematical function
Forecasting Choices • # of choices = 2: Binomial distribution • # of choices > 2: Multinomial distribution, with unordered or ordered categories
Multinomial Logit Regression • Multinomial Choice (m=3) , Ungrouped Data: • Y1=1: Choose category 1 with probability p1 • Y1=0: Choose category 2 or 3 with probability 1- p1 • Y2=1: Choose category 2 with probability p2 • Y2=0: Choose category 1 or 3 with probability 1- p2 • Y3=1: Choose category 3 with probability p3 • Y3=0: Choose category 1 or 2 with probability 1- p3
Log Likelihood Function • Log-likelihood function of the i-th ungrouped observation: $L_i = Y_{1i}\ln p_{1i} + Y_{2i}\ln p_{2i} + Y_{3i}\ln p_{3i}$ • MLE: Maximize $\sum_i L_i$
Y3 and p3 can be omitted • Exactly one category is chosen, so $Y_3 = 1 - Y_1 - Y_2$ and $p_3 = 1 - p_1 - p_2$. • Multinomial Choice (m=3), Ungrouped Data: • Y1=1: Choose category 1 with probability p1 • Y1=0: Choose category 2 or 3 with probability 1- p1 • Y2=1: Choose category 2 with probability p2 • Y2=0: Choose category 1 or 3 with probability 1- p2
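Log Likelihood Function • Log-likelihood function of the i-th (ungrouped) observation: $L_i = Y_{1i}\ln p_{1i} + Y_{2i}\ln p_{2i} + (1 - Y_{1i} - Y_{2i})\ln(1 - p_{1i} - p_{2i})$ • MLE: Maximize $\sum_i L_i$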
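The sketch below checks that dropping Y3 and p3 leaves the log-likelihood of an observation unchanged. It assumes the standard multinomial form, in which each category contributes its indicator times the log of its probability. The probabilities are hypothetical.

```python
import math

def log_lik_three(y1, y2, y3, p1, p2, p3):
    """Log-likelihood of one observation written with all three categories."""
    return y1 * math.log(p1) + y2 * math.log(p2) + y3 * math.log(p3)

def log_lik_two(y1, y2, p1, p2):
    """Same quantity using Y3 = 1 - Y1 - Y2 and p3 = 1 - p1 - p2."""
    return (y1 * math.log(p1) + y2 * math.log(p2)
            + (1 - y1 - y2) * math.log(1.0 - p1 - p2))

# Hypothetical choice probabilities for one decision maker.
p1, p2 = 0.5, 0.3
p3 = 1.0 - p1 - p2

# One (Y1, Y2, Y3) triple per possible choice.
choices = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
pairs = [(log_lik_three(y1, y2, y3, p1, p2, p3), log_lik_two(y1, y2, p1, p2))
         for y1, y2, y3 in choices]
```

Each pair in `pairs` holds the same value twice, once from each formulation.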
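1: Formulating “Link” Functions: Unordered Choice Categories • Category 3 as the baseline category: $\ln(p_1/p_3) = b_{01} + b_{11}X_1 + \dots + b_{K1}X_K = g_1$ and $\ln(p_2/p_3) = b_{02} + b_{12}X_1 + \dots + b_{K2}X_K = g_2$ • Probabilities: $p_1 = \exp(g_1)/(1 + \exp(g_1) + \exp(g_2))$, $p_2 = \exp(g_2)/(1 + \exp(g_1) + \exp(g_2))$, $p_3 = 1/(1 + \exp(g_1) + \exp(g_2))$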
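A sketch of the baseline-category link for three unordered categories follows. It assumes the usual multinomial logit form, in which the log-odds of categories 1 and 2 against baseline category 3 are each linear in the covariates. The coefficient values and the single covariate are hypothetical.

```python
import math

def multinomial_probs(x, coef1, coef2):
    """Probabilities of categories 1, 2, 3 with category 3 as the baseline.

    coef1 and coef2 hold the constant followed by the slopes for
    ln(p1/p3) and ln(p2/p3) respectively.
    """
    g1 = coef1[0] + sum(b * xk for b, xk in zip(coef1[1:], x))
    g2 = coef2[0] + sum(b * xk for b, xk in zip(coef2[1:], x))
    denom = 1.0 + math.exp(g1) + math.exp(g2)
    return math.exp(g1) / denom, math.exp(g2) / denom, 1.0 / denom

# Hypothetical coefficients and a single covariate value.
coef1 = [0.5, 1.0]    # gives g1 = 0.5 + 1.0 * 1.0 = 1.5
coef2 = [-0.5, 0.5]   # gives g2 = -0.5 + 0.5 * 1.0 = 0.0
x = [1.0]

p1, p2, p3 = multinomial_probs(x, coef1, coef2)
```

Because `g2` is 0 in this example, categories 2 and 3 are equally likely, and raising `g1` alone would lower both of their probabilities at once.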
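Test of Significance • Hypotheses: $H_0: b_{11} = b_{21} = \dots = b_{K1} = b_{12} = b_{22} = \dots = b_{K2} = 0$; $H_1$: at least one $b_{ij} \neq 0$ • Test statistic: $G = -2\,(LL_0 - LL_1)$, where $LL_1$ is the maximized log-likelihood of the fitted model and $LL_0$ is that of the model containing only the constants • Distribution under $H_0$: $\chi^2$ (DF = 2K)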
Interpreting Coefficients • Not easy, as a change in the probability of one category affects the probabilities of the other two categories.
2: Formulating Link Functions: Ordered Choice Categories • An underlying variable U defines the categories, with cut points $g_1 < g_2$ • Category 1: $U < g_1$ • Category 2: $g_1 \le U < g_2$ • Category 3: $U \ge g_2$
Choices for Probability Distribution of U • a. Ordered probit model for the i-th DM: $U_i$ follows $N(\mu_i, \sigma = 1)$ • b. Ordered logit model for the i-th DM: $U_i$ follows Logistic Distribution($\mu_i$) • $\mu_i = b_1 X_{1i} + b_2 X_{2i}$ (no constant term)
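The two choices for the distribution of U can be compared in a short sketch. It assumes that category 1 lies below cut point g1, category 2 between g1 and g2, and category 3 above g2, and that the logistic distribution has unit scale. The coefficients, covariates, and cut points are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF (ordered probit)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Standard logistic CDF (ordered logit)."""
    return 1.0 / (1.0 + math.exp(-z))

def ordered_probs(mu, g1, g2, cdf):
    """Probabilities of categories 1, 2, 3 when U has location mu and cut points g1 < g2."""
    p1 = cdf(g1 - mu)
    p2 = cdf(g2 - mu) - cdf(g1 - mu)
    p3 = 1.0 - cdf(g2 - mu)
    return p1, p2, p3

# Hypothetical coefficients (no constant term), covariates, and cut points.
b = [0.5, -0.25]
x = [2.0, 4.0]
mu = b[0] * x[0] + b[1] * x[1]   # 1.0 - 1.0 = 0.0

probit_probs = ordered_probs(mu, -1.0, 1.0, normal_cdf)
logit_probs = ordered_probs(mu, -1.0, 1.0, logistic_cdf)
```

With the same cut points the logit version puts more probability in the outer categories, because the unit-scale logistic distribution has heavier tails and a larger variance than the standard normal.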
Types of Variable • Quantitative: Continuous, Discrete (counting) • Qualitative: Ordinal, Nominal
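Poisson Regression for Counting • Model of observations for Y: Y is Poisson($\lambda$) • Link function: $\ln\lambda = b_0 + b_1 X_1 + \dots + b_K X_K$ • Log-likelihood function of the i-th observation: $L_i = Y_i \ln\lambda_i - \lambda_i - \ln(Y_i!)$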
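A sketch of the Poisson regression log-likelihood follows. It assumes the standard log link, in which the log of the Poisson mean is linear in the covariates, and uses a single predictor with made-up counts. The name `lam` for the Poisson mean is a choice made here, not taken from the slides.

```python
import math

def poisson_log_likelihood(params, xs, ys):
    """Sum over observations of Y ln(lam) - lam - ln(Y!), with ln(lam) = b0 + b1*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        lam = math.exp(params[0] + params[1] * x)
        total += y * math.log(lam) - lam - math.lgamma(y + 1)  # lgamma(y + 1) = ln(y!)
    return total

# Hypothetical counts for three observations.
xs = [0.0, 1.0, 2.0]
ys = [0, 2, 1]

ll_flat = poisson_log_likelihood([0.0, 0.0], xs, ys)   # lam = 1 for every observation
ll_slope = poisson_log_likelihood([0.0, 0.2], xs, ys)  # lam grows with x
```

As with the logistic model, the parameters would be estimated by maximizing this sum with a solver.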