4th meeting: Multilevel modeling: logistic regression

Subjects for today:
• What is logistic regression?
• Logistic regression in MLwiN
• Event history analysis: an introduction

Up till now we discussed linear regression: it was assumed that there is a linear relationship between X and Y. In logistic regression this assumption changes to a curvilinear relationship, described by the logistic curve. This curve is very interesting: it is never below 0 and never larger than 1. This makes it suitable for the analysis of a dependent variable that has only the scores 0 and 1.

Take for instance a test in Math: you may pass the test (1) or you may fail the test (0) (data: LOGSCHOOL.sav). Now we may want to know whether white students pass the test more frequently than non-white students. Of the white students about 44% pass the test; of the non-white students about 25% pass.

There is more to it: for non-whites there is a chance of .25 to pass the test (p1), and a chance of .75 to fail (p0). The odds then are p1/p0 = .25/.75 or 1/3: out of every 4, 1 will pass and 3 will fail. For whites the odds are .44/.56 = about 8/10: out of every 18, 8 will pass and 10 will fail. Now the ratio between the odds tells us something about the relationship between passing the test and race: (8/10) / (1/3) = 2.4 (the logistic regression on the full data, reported further on, gives 2.33). This is called an odds ratio; it tells us how many times larger one odds (a) is than another odds (b). When the odds ratio is 1 there is no relationship. Odds ratios range between 0 and infinity.
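The odds and odds ratio arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function names `odds` and `odds_ratio` are made up for illustration, and the proportions are the rounded 44% and 25% from the cross table, so the result differs slightly from the estimate on the full data.

```python
def odds(p):
    """Odds of an event with probability p: p1 / p0, where p0 = 1 - p1."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """How many times larger the odds for group a are than the odds for group b."""
    return odds(p_a) / odds(p_b)


# Rounded proportions from the cross table (assumed values for illustration).
p_white = 0.44      # white students passing the test
p_nonwhite = 0.25   # non-white students passing the test

odds_white = odds(p_white)
odds_nonwhite = odds(p_nonwhite)
ratio = odds_ratio(p_white, p_nonwhite)

print(round(odds_white, 2), round(odds_nonwhite, 2), round(ratio, 2))
```

When both groups have the same chance to pass, `odds_ratio` returns 1, which is the "no relationship" case.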

No relationship between passing the math test and race: of the white students about 40% pass the test, and of the non-white students about 40% pass as well. The odds are 4/6 for both, so the odds ratio = 1!

Now, if we expand our analyses to x-variables with more than 2 categories and/or use interval variables, cross tables are poor instruments. Instead we use logistic regression analysis. Equation: log(p1/p0) = a + b * X (where a and b are logit parameters and log is the natural logarithm), or equivalently p1/p0 = e^a * (e^b)^X, where e^a is the odds for X = 0 and e^b is the odds ratio. Table: output of logistic regression: dependent variable = to pass the test (1 = passed, 0 = failed), x-variable = race (white = 1, other = 0).
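To see how the logit parameters translate back into odds and probabilities, here is a small sketch. It assumes that the estimates quoted on the MLwiN slide (a = -1.075, b = .847) belong to this race model; the function name `predicted_probability` is invented for the example.

```python
import math


def predicted_probability(a, b, x):
    """Return p1 implied by log(p1/p0) = a + b * x."""
    log_odds = a + b * x
    odds = math.exp(log_odds)      # p1 / p0
    return odds / (1.0 + odds)     # solve p1 / (1 - p1) = odds for p1


# Assumed to be the logit parameters of the race model (white = 1, other = 0).
a = -1.075
b = 0.847

p_other = predicted_probability(a, b, 0)
p_white = predicted_probability(a, b, 1)

# The multiplicative form gives the same odds: e^a * (e^b)^X.
odds_white_multiplicative = math.exp(a) * math.exp(b) ** 1

print(round(p_other, 3), round(p_white, 3), round(odds_white_multiplicative, 3))
```

The two predicted probabilities come out close to the 25% and 44% of the cross table, which is what one would expect for a model with a single 0/1 predictor.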

With log(p1/p0) (with p1 the probability that Y = 1 and p0 the probability that Y = 0) we get the logistic curve for p1 as a function of X. This is statistically a good idea because it takes into account floor and ceiling effects, which occur quite easily with 0/1 data.
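The floor and ceiling behaviour of the logistic curve can be illustrated numerically. The sketch below uses a hypothetical curve with a = 0 and b = 1 (these values are not from the data) and simply evaluates p1 over a range of X values.

```python
import math


def logistic(z):
    """Inverse of the logit: maps log odds z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical X values; with a = 0 and b = 1 the log odds equal X itself.
xs = [-6, -3, 0, 3, 6]
curve = [logistic(x) for x in xs]

for x, p in zip(xs, curve):
    print(x, round(p, 3))
```

However far X moves in either direction, p1 flattens out near 0 or near 1 instead of crossing those bounds, which a straight regression line would do.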

Logistic regression in MLwiN:
One level only: odds = e^-1.075 = 0.34 and odds ratio = e^.847 = 2.33 (where e = 2.7182…).
Two levels: ICC = .658 / (.658 + 3.29) = 0.17 (3.29 = π²/3 is the variance of e, the level 1 residual on the logit scale). For testing use t = .658 / .061 = about 10.8, with df = number of level 2 units. Alternatively use the macro 'vpc.txt' in MLwiN. See A User's Guide to MLwiN 2.0, p. 131-134.
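The ICC calculation for the two-level model can be reproduced as follows. This is a sketch under the usual assumption that the level 1 residual variance on the logit scale is fixed at π²/3 (about 3.29, the variance of the standard logistic distribution); the names `icc`, `school_variance` and `standard_error` are chosen for the example only.

```python
import math

# Variance of the standard logistic distribution, about 3.29.
LEVEL1_VARIANCE = math.pi ** 2 / 3


def icc(level2_variance):
    """Intraclass correlation for a two-level logistic model."""
    return level2_variance / (level2_variance + LEVEL1_VARIANCE)


school_variance = 0.658   # level 2 variance from the slide
standard_error = 0.061    # its standard error from the slide

icc_value = icc(school_variance)
t_value = school_variance / standard_error

print(round(icc_value, 2), round(t_value, 1))
```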

Adding another level 1 variable, this time an interval variable. The parameter .299 says that the log odds, log(p1/p0), increase by .299 for every extra hour spent on homework. This means that the chances to pass (p1) increase when more time is spent on homework. In terms of odds ratios, the odds to pass the test are multiplied by e^.299 = 1.35 for every extra hour.

Now maybe we want to include the level 2 variable PUBLIC to test whether public schools do worse on passing the Math test: the estimate of -0.6 tells us that the chances to pass go down for students at a public school (odds ratio = e^-0.6 = 0.55!).
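Turning the two logit estimates above into odds ratios is a one-line calculation each. The sketch below uses the estimates from the slides (.299 for homework, -0.6 for PUBLIC); the helper name `odds_multiplier` is an assumption for illustration.

```python
import math


def odds_multiplier(b, extra_units=1):
    """Factor by which the odds change when X increases by extra_units."""
    return math.exp(b * extra_units)


b_homework = 0.299   # logit estimate per extra hour of homework
b_public = -0.6      # logit estimate for public (1) versus private (0) schools

one_hour = odds_multiplier(b_homework)
two_hours = odds_multiplier(b_homework, extra_units=2)
public = odds_multiplier(b_public)

print(round(one_hour, 2), round(two_hours, 2), round(public, 2))
```

Note that the effect compounds: two extra hours multiply the odds by the one-hour factor squared, not by twice the one-hour factor.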

Now maybe we want to test whether the effect of homework on passing the test varies across schools. The first thing is to set homework as random at the school level (do not worry about the significance of the (co)variance). The test of the interaction is on the next slide.

We continue by including an interaction between homework and public. It turns out that the effect of homework is lower when the student is at a private school (estimate .246) than at a public school (effect = .246 + .142 = .388).
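The way the interaction term changes the homework effect can be written out explicitly. This sketch assumes that .246 is the homework estimate for private schools (public = 0) and .142 is the homework-by-public interaction estimate, as the slide suggests; the constant and function names are invented for the example.

```python
import math

B_HOMEWORK = 0.246            # homework effect when public = 0 (private school)
B_HOMEWORK_X_PUBLIC = 0.142   # extra homework effect when public = 1


def homework_effect(public):
    """Logit effect of one extra hour of homework for a given school type."""
    return B_HOMEWORK + B_HOMEWORK_X_PUBLIC * public


effect_private = homework_effect(0)
effect_public = homework_effect(1)

# The same effects expressed as odds ratios per extra hour.
print(round(math.exp(effect_private), 2), round(math.exp(effect_public), 2))
```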

Introduction to event history analysis. Example: the Spanish Flu, which took many casualties worldwide in and just after 1918. Maybe we record the data like this (I’ve made the data up, based on http://en.wikipedia.org/wiki/1918_flu_pandemic; the fake data are in FLU.sav, and the SPSS syntax file to create the person period file is in flu.sps). Every row is an individual, and it is recorded whether that person died from the flu during a certain period.

Cross table on this data set: we have 200 people in periods 1-7 and 100 of them died within that time, but the effect of period comes out totally wrong here.

Now we turn this into a person period file:
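The conversion itself is done with the SPSS syntax in flu.sps; purely as an illustration of the idea, a Python sketch could look like this. The three example persons, the field layout and the function name `to_person_period` are all made up and are not taken from FLU.sav.

```python
def to_person_period(people, last_period):
    """Expand one row per person into one row per person per period at risk.

    people: list of (person_id, death_period), with death_period None
    for persons who survived the whole observation window.
    """
    rows = []
    for person_id, death_period in people:
        # A person stays in the file until death or the end of observation.
        final = death_period if death_period is not None else last_period
        for period in range(1, final + 1):
            event = 1 if period == death_period else 0
            rows.append((person_id, period, event))
    return rows


# Hypothetical data: person 1 dies in period 3, person 2 survives, person 3 dies in period 1.
people = [(1, 3), (2, None), (3, 1)]
rows = to_person_period(people, last_period=4)

for row in rows:
    print(row)
```

Each person contributes one record for every period in which he or she was still at risk, and only the last record of someone who died is coded 1.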

A person period file has an observation for every period, which means that individuals are in the file more than once. In case we have an event that can occur only once, we have relatively few records coded 1 on the dependent variable. This means that p1 (the chance to encounter the ‘Event’), which is called the hazard rate, is rather low, while p0 (1 - p1) is rather high. Recall that in logistic regression the odds ratio e^b multiplies the odds p1/p0. Because p0 is rather close to one, p1/p0 is close to p1, so the odds ratio is approximately the number of times the hazard rate increases. A life table:

The life table again, but now as a graph. We now get a clear view of the period effect: in period 4 the chance to die from the Spanish flu was about .18! Please note that this hazard rate is conditional upon the period: it says that IF you survived all periods before t, then the chance to die in period t is p1(t).
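The conditional nature of the hazard rate becomes clear when the life table is computed directly from person period records. The sketch below uses a handful of invented records (not the FLU.sav data); `life_table` is a hypothetical helper name.

```python
from collections import defaultdict


def life_table(rows):
    """rows: (person_id, period, event). Returns {period: (at_risk, events, hazard)}."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for _person_id, period, event in rows:
        at_risk[period] += 1        # everyone with a record was still alive at the start
        events[period] += event     # number who died during this period
    return {t: (at_risk[t], events[t], events[t] / at_risk[t]) for t in sorted(at_risk)}


# Hypothetical person period records for four persons.
rows = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1),
    (2, 1, 0), (2, 2, 0), (2, 3, 0), (2, 4, 0),
    (3, 1, 1),
    (4, 1, 0), (4, 2, 1),
]
table = life_table(rows)

for period, (n_at_risk, n_events, hazard) in table.items():
    print(period, n_at_risk, n_events, round(hazard, 2))
```

The denominator shrinks from period to period because people who already died are no longer at risk; that is exactly what the simple cross table ignores.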

A person period file can be analyzed with multilevel models in at least 3 cases:
a) When you’ve got individuals nested in higher levels like districts and your dependent variable is a one-shot event (happens only once) (see assignment 4).
b) In case you’ve got individuals and your dependent variable is a recurrent event (happens more than once), for instance unemployment: level 1 is period, level 2 is individuals.
c) In case you’ve got individuals nested in higher levels like districts and your dependent variable is a recurrent event (happens more than once): level 1 is period, level 2 is individuals and level 3 is district.
