PhD Research Methods Models December 1 & 15, 2008 Sessions

PhD Methods Duration Models December 1 & 15, 2008

Sessions & Inter-session • Session 1: Objectives & Tools; Questions → Models; Models → Analyses • Inter-session: Offline assignments • Session 2: Analyses → Presentations; Presentations → Questions

Objectives and Tools

Objectives • Understand duration models • What questions do they help answer? • What ‘flavors’ exist? • How do they work? • What do you need to check? • Get more practice with Stata • Watch me do tricks… • …and do your own with assignments!

Applied researcher’s toolkit • Reference texts • Wooldridge’s “Econometric Analysis of Cross Section and Panel Data” • Greene’s “Econometric Analysis” • StataCorp’s “Survival Analysis and Epidemiological Tables” • Rabe-Hesketh & Everitt’s “A Handbook of Statistical Analyses Using Stata” • Stata v10

Questions to Models

Typical questions? • Asked in life, engineering, economic and administrative sciences • Interested in the length of a spell of time: • What are predictors of this duration? • Does duration depend on elapsed time?

Key terms? • Spell (origin, failure, duration) • Hazard • At-risk • Risk set • Censoring (left, right)

A little theory • Spell length: $T \sim f(t)$, $f$ ‘nice’ • Cumulative probability: $F(t)$ • Survival function: $S(t) = 1 - F(t) = \Pr\{T \ge t\}$ • Hazard rate: $\lambda(t) = \lim_{\Delta t \to 0} \Pr\{T \in [t, t+\Delta t] \mid T \ge t\} / \Delta t = \lim_{\Delta t \to 0} \frac{F(t+\Delta t) - F(t)}{\Delta t} \cdot \frac{1}{S(t)} = f(t)/S(t) = -d \ln S(t)/dt$
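The chain of equalities for the hazard rate can be checked numerically. The sketch below is in Python rather than Stata so that it runs without any dataset. It assumes a Weibull spell length with illustrative parameters `lam = 0.5` and `p = 1.5` (these values are not from the slides) and compares $f(t)/S(t)$ with a finite-difference estimate of $-d \ln S(t)/dt$.

```python
import math

LAM, P = 0.5, 1.5  # illustrative Weibull parameters (an assumption, not from the slides)

def survival(t):
    """S(t) = exp(-(LAM * t) ** P) for the assumed Weibull spell length."""
    return math.exp(-((LAM * t) ** P))

def density(t):
    """f(t) = -dS/dt = LAM * P * (LAM * t) ** (P - 1) * S(t)."""
    return LAM * P * (LAM * t) ** (P - 1) * survival(t)

def hazard_from_ratio(t):
    """Hazard computed as f(t) / S(t)."""
    return density(t) / survival(t)

def hazard_from_log_survival(t, h=1e-6):
    """Hazard computed as a central finite difference of -ln S(t)."""
    return -(math.log(survival(t + h)) - math.log(survival(t - h))) / (2 * h)

for t in (0.5, 1.0, 2.0):
    print(t, hazard_from_ratio(t), hazard_from_log_survival(t))
```

The two columns should agree to several decimal places at every `t`, which is all the identity claims.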

Hazard models • Non-parametric (e.g. Kaplan-Meier) • Semi-parametric (e.g. Cox) • Fully parametric (e.g. Weibull)

Kaplan-Meier non-parametric • Let’s look at a purely empirical estimate of the survivor function • Suppose $n_j$ is the number of units at risk just before $d_j$ failures occur at $t_j$ • Then the estimate is $\hat{S}(t) = \prod_{j \mid t_j \le t} (n_j - d_j)/n_j$
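To make the product-limit formula concrete, here is a minimal from-scratch sketch in Python. The spell data are hypothetical, and the function name `kaplan_meier` is illustrative only; in practice Stata's `sts` commands do this work. Units censored at $t_j$ are treated as still at risk at $t_j$, which is the usual convention.

```python
from collections import Counter

def kaplan_meier(durations, failed):
    """Return [(t_j, n_j, d_j, S_hat(t_j))] for each distinct failure time.

    durations: observed spell lengths
    failed:    1 if the spell ended in failure, 0 if it was right-censored
    """
    failures_at = Counter(t for t, f in zip(durations, failed) if f)
    rows, surv = [], 1.0
    for t_j in sorted(failures_at):
        n_j = sum(1 for t in durations if t >= t_j)  # at risk just before t_j
        d_j = failures_at[t_j]                       # failures at t_j
        surv *= (n_j - d_j) / n_j                    # product-limit update
        rows.append((t_j, n_j, d_j, surv))
    return rows

# Hypothetical spells: eight units, three of them right-censored.
durations = [2, 3, 3, 5, 6, 8, 8, 9]
failed    = [1, 1, 0, 1, 0, 1, 1, 0]

km_rows = kaplan_meier(durations, failed)
for t_j, n_j, d_j, s in km_rows:
    print(f"t={t_j}  at risk={n_j}  failures={d_j}  S_hat={s:.4f}")
```

Note how the censored spells shrink the risk set $n_j$ at later times without ever counting as failures.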

Kaplan-Meier implementation • KM curves are easily plotted in Stata:
    use http://www.stata-press.com/data/r10/drugtr
    list
    stset
    sts graph
    sts graph, by(drug) ci level(95)
    sts graph, by(drug)
    sts test drug
    sts test drug, wilcoxon
    gen ageless57 = (age < 57)
    sts test ageless57
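By default `sts test` reports a log-rank test of equal survivor functions across groups. As a rough illustration of what that statistic measures, the Python sketch below computes the two-group log-rank chi-square from observed minus expected failures at each failure time. The data and the function name `logrank_two_groups` are hypothetical, and the sketch does not guard against a zero variance (for example, when every unit is in one group).

```python
import math

def logrank_two_groups(durations, failed, group):
    """Log-rank chi-square (1 df) and p-value comparing group 1 with group 0."""
    fail_times = sorted({t for t, f in zip(durations, failed) if f})
    obs_minus_exp, variance = 0.0, 0.0
    for t_j in fail_times:
        n_j = n_1j = d_j = d_1j = 0
        for t, f, g in zip(durations, failed, group):
            if t >= t_j:                 # unit is in the risk set at t_j
                n_j += 1
                n_1j += g
                if f and t == t_j:       # unit fails at t_j
                    d_j += 1
                    d_1j += g
        # Observed group-1 failures minus their expectation under equal hazards.
        obs_minus_exp += d_1j - d_j * n_1j / n_j
        if n_j > 1:                      # hypergeometric variance term
            share = n_1j / n_j
            variance += d_j * share * (1 - share) * (n_j - d_j) / (n_j - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical data: group 1 fails early, group 0 fails late, no censoring.
durations = [1, 2, 3, 4, 5, 6]
failed    = [1, 1, 1, 1, 1, 1]
group     = [1, 1, 1, 0, 0, 0]

chi2, p_value = logrank_two_groups(durations, failed, group)
print("chi2:", chi2, "p-value:", p_value)
```

The Wilcoxon variant requested with `sts test drug, wilcoxon` weights each failure time by the size of the risk set, so it puts more weight on early differences than the unweighted sum used here.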

Cox’s semi-parametric model • Specification: $\lambda(t_i) = \exp(x_i \beta)\, \lambda_0(t_i)$ • Problem: how do we estimate $\beta$ in the presence of the unknown baseline hazard $\lambda_0(t_i)$? • Solution: condition on exactly one individual leaving the risk set at the time of interest

Cox’s model • Let $T_k$ be the $k$th exit time, and let $R_k$ be the at-risk set at $T_k$. • Then $\Pr\{t_i = T_k \mid R_k\} = \exp(x_i \beta) / \sum_{j \in R_k} \exp(x_j \beta)$, sweeping out the $\lambda_0(t_i)$ terms • Maximize this partial likelihood function: $\ln L = \sum_k [x_k \beta - \ln(\sum_{j \in R_k} \exp(x_j \beta))]$

Cox’s model if tied exit times • The partial likelihood function must now account for non-unique exit times. • Suppose $D_j$ is the set of units that fail at time $t_j$, $d_j$ is the cardinality of that set, and $R_j$ is the at-risk set of units at $t_j$. Then $\ln L = \sum_j [\sum_{k \in D_j} x_k \beta - d_j \ln(\sum_{i \in R_j} \exp(x_i \beta))]$
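The partial likelihood with ties is short enough to evaluate by hand. The Python sketch below assumes a single covariate and hypothetical data with one tied failure time, and it maximizes the log-likelihood with a simple golden-section search. The search is a convenience for this illustration, not a description of how Stata's `stcox` maximizes the likelihood.

```python
import math

def breslow_loglik(beta, durations, failed, x):
    """Cox partial log-likelihood for one covariate, with tied failures."""
    fail_times = sorted({t for t, f in zip(durations, failed) if f})
    loglik = 0.0
    for t_j in fail_times:
        # Risk set R_j: every unit whose spell lasts at least until t_j.
        risk_sum = sum(math.exp(beta * xi)
                       for t, xi in zip(durations, x) if t >= t_j)
        # Failure set D_j: units observed to fail exactly at t_j.
        x_failed = [xi for t, f, xi in zip(durations, failed, x)
                    if f and t == t_j]
        loglik += beta * sum(x_failed) - len(x_failed) * math.log(risk_sum)
    return loglik

def golden_section_max(func, lo, hi, tol=1e-8):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if func(c) > func(d):
            hi = d   # maximum lies in [lo, d]
        else:
            lo = c   # maximum lies in [c, hi]
    return (lo + hi) / 2

# Hypothetical data: two units fail at t = 3, two spells are right-censored.
durations = [1, 2, 3, 3, 5, 6, 7, 8]
failed    = [1, 1, 1, 1, 0, 1, 1, 0]
x         = [1, 0, 1, 1, 0, 1, 0, 0]

beta_hat = golden_section_max(
    lambda b: breslow_loglik(b, durations, failed, x), -10.0, 10.0)
print("beta_hat:", beta_hat)
print("log-likelihood:", breslow_loglik(beta_hat, durations, failed, x))
```

At `beta = 0` every unit in a risk set is equally likely to fail, so the log-likelihood collapses to minus the sum of $d_j \ln n_j$, which is a handy sanity check on any implementation.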

Let’s try Cox’s model • Stata implements Cox’s model very clearly:
    use http://www.stata-press.com/data/r10/kva
    * if stset reports that the data are not st, declare them first, e.g. stset failtime
    stset
    stcox load
    estimates store load
    stcox load bearings
    lrtest . load
    drop bearings
    stcox load
    * look at the estimated coefficient on load

What’s Stata doing with Cox? • Look at the Excel spreadsheet in http://faculty.fuqua.duke.edu/~willm/Classes/PhD/PhD_2008_2009_LongStrat/Strategy591_2008_2009_ResearchMethods.htm • I’ve tried to show in easy sequences how the ado file in Stata parallels the partial likelihood function we just learned. • Note the log-likelihood and the estimate of $\beta$ this spreadsheet yields. Compare with Stata

Key Cox assumptions • Recall the Cox specification for the hazard rate for individual $i$ at time $t_k$: $\lambda_i(t_k) = \exp(x_i \beta)\, \lambda_0(t_k)$ • Consider the hazard ratio for two individuals $i$ and $m$, again at time $t_k$: $\lambda_i(t_k)/\lambda_m(t_k) = \exp(x_i \beta)\lambda_0(t_k) / [\exp(x_m \beta)\lambda_0(t_k)] = \exp((x_i - x_m)\beta)$, some proportionality constant that does not depend on time

Testing Cox’s assumptions • Is global proportionality reasonable?
    use http://www.stata-press.com/data/r10/drugtr
    gen ageless57 = (age < 57)
    sts graph, by(ageless57)
    * curves roughly parallel?
    stcox drug
    stcox drug, strata(ageless57)
    stphplot, by(ageless57)
    * curves roughly parallel?
    stcoxkm, by(ageless57)
    * predicted vs observed?
• Mitigation with stratification

Testing Cox model residuals • Are there significant outliers?
    use http://www.stata-press.com/data/r10/kva
    stcox load bearings, mgale(mart)
    predict devr, deviance
    predict xb, xb
    twoway scatter devr xb
    * residuals look reasonable?
    stcox load bearings, esr(score*)
    twoway scatter score1 failtime
    * large deviations?
    twoway scatter score2 failtime
    * large deviations?

Fully parametric • So far the underlying baseline hazard rate has been left unspecified. We can modify this assumption using parametric models. • The easiest choice is the exponential survival function, in which the hazard rate is constant: $-d \ln S(t)/dt = \lambda(t) = \lambda \Rightarrow S(t) = \exp(-\lambda t)$
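For the exponential model without covariates, the maximum likelihood estimate has a closed form even with right censoring. The log-likelihood is $\sum_i [\delta_i \ln \lambda - \lambda t_i]$, where $\delta_i = 1$ for an observed failure and 0 for a censored spell, so $\hat{\lambda}$ is the number of failures divided by total time at risk. The Python sketch below uses hypothetical data and illustrative function names.

```python
import math

def exponential_loglik(lam, durations, failed):
    """Log-likelihood of a constant hazard lam with right-censored spells."""
    return sum(f * math.log(lam) - lam * t for t, f in zip(durations, failed))

def exponential_mle(durations, failed):
    """Closed-form MLE: observed failures divided by total time at risk."""
    return sum(failed) / sum(durations)

# Hypothetical spells: five failures and three right-censored units.
durations = [2, 3, 3, 5, 6, 8, 8, 9]
failed    = [1, 1, 0, 1, 0, 1, 1, 0]

lam_hat = exponential_mle(durations, failed)
print("lambda_hat:", lam_hat)
print("implied median spell:", math.log(2) / lam_hat)
print("log-likelihood at lambda_hat:",
      exponential_loglik(lam_hat, durations, failed))
```

Censored spells contribute their exposure time to the denominator but nothing to the numerator, which is why simply dropping them would overstate the hazard.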

Other fully parametric models • Weibull specification of a monotonic hazard rate with $p > 0$: $\lambda(t) = \lambda p (\lambda t)^{p-1}$
    use http://www.stata-press.com/data/r10/kva
    streg load bearings, d(weibull)
    stcurve, haz
    streg load bearings, d(exponential)
    stcurve, haz
    sts graph, hazard
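A quick way to see what "monotonic" means here is to tabulate the Weibull hazard for a few shape values. The Python sketch below uses the parameterization written on the slide, with illustrative values `lam = 0.5` and `p` in {0.5, 1, 2}; Stata's `streg` may report the Weibull under a different parameterization, so these numbers are not meant to match its output.

```python
def weibull_hazard(t, lam, p):
    """Hazard lam * p * (lam * t) ** (p - 1), defined for t > 0."""
    return lam * p * (lam * t) ** (p - 1)

times = [0.5, 1.0, 2.0, 4.0]
for p in (0.5, 1.0, 2.0):
    # p < 1: falling hazard; p = 1: constant (exponential); p > 1: rising hazard
    hazards = [weibull_hazard(t, lam=0.5, p=p) for t in times]
    print(f"p = {p}:", [round(h, 4) for h in hazards])
```

With `p = 1` the hazard equals `lam` at every `t`, which recovers the exponential model as a special case.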

Models to Analyses

Practice (1): Equine risks • stset the data • Basic non-parametric exploration • More parametric models • Model assumptions tested • Constructing additional variables as needed

Practice (2): Military risks • stset the data • Basic non-parametric exploration • More parametric models • Model assumptions tested • Constructing additional variables as needed

Practice (3): Hospital stay risks • stset the data Warning: this is a huge set • Basic non-parametric exploration • More parametric models • Model assumptions tested • Constructing additional variables as needed

Offline assignments

Assignments • Data assignment • Reading assignment

Assignments: data • Datasets from military, veterinary and medical science • Data may be fictional, is certainly de-identified, and should not be re-used • Think of a simple, plausible research question, model it, analyze one set of data, write up and present results (1-2 p)

Assignments: reading • Read and briefly critique each of: • Jensen, M. 2006. Should we stay or should we go? Accountability, status anxiety, and client defections. ASQ 51: 97-128 • Rao H, Greve HR, Davis GF. 2001. Fool's gold: social proof in the initiation and abandonment of coverage by Wall Street analysts. ASQ 46(3): 502-526 • My 2008 working paper on cardiologists

Assignments: reading… • Typical questions we’ll discuss • Are the research question, data and the model choice congruent? • How else could they have answered the question? • Different data? • Different model? • Different analysis? • Is the presentation of the analyses clear and compelling? • Do you buy it? Why or why not? • What is left to do?

Assignments… • Reading and datasets posted at http://faculty.fuqua.duke.edu/~willm/Classes/PhD/PhD_2008_2009_LongStrat/Strategy591_2008_2009_ResearchMethods.htm • Email your write-up by next Friday, Dec 12, by close-of-business • Be prepared to discuss reading and answer questions on Monday, Dec 15

Analyses to Presentation

Equine data • What predicts fatal injury hazard here? • Does that make sense? • What model did you use and why? • How did you check it? • What summary results do you have? • What’s missing in the data? • What’s wrong with our model?

Discharge data • What predicts the discharge hazard? • Does that make sense? • What model did you use and why? • How did you check it? • What summary results do you have? • What’s missing in the data? • What’s wrong with our model?

Military data • What predicts the fatal wound hazard? • Does that make sense? • What model did you use and why? • How did you check it? • What summary results do you have? • What’s missing in the data? • What’s wrong with our model?

Presentation to Questions

Some recent ‘presentations’ • Jensen, M. 2006. Should we stay or should we go? Accountability, status anxiety, and client defections. Administrative Science Quarterly 51: 97-128 • Rao H, Greve HR, Davis GF. 2001. Fool's gold: social proof in the initiation and abandonment of coverage by Wall Street analysts. Administrative Science Quarterly 46(3): 502-526 • My working paper on cardiologists

Loose Ends & The End

We (probably) didn’t cover… • When covariates vary over time? • What to do about a lot of left censoring? • Frailty models for omitted variables • Shared frailty models to explain similarity in duration in groups of units → The Stata manual and experimentation are almost always the best next steps

Summary • Neat modeling tools exist when you have data on timings and care about differences in timings and their reason • Really neat when you care about firm longevity, leadership durations, spells of some management activity

Thank you!
