This section delves into the analysis of duration data, focusing on the time until events occur, such as time until death, unemployment duration, or PhD completion. We discuss various statistical models, including the hazard function and survival analysis techniques, traditionally employed in lifespan studies of objects, and their applications in analyzing human events. Key concepts such as the probability density function (PDF), cumulative distribution function (CDF), and the importance of duration dependence are examined. Additionally, we explore data collection through the NHIS survey and identifying duration data in STATA.
Section Duration Data
Introduction • Sometimes we have data on the length of time until a particular event, or on the length of ‘spells’ • Time until death • Time on unemployment • Time to complete a PhD • The techniques we will discuss were originally used to examine the lifespan of objects like light bulbs or machines. These models are often referred to as “time to failure” models.
Notation • T is a random variable that indicates duration (time until death, time to find a new job, etc.) • t is a realization of that variable • f(t) is the PDF that describes the process that determines the time to failure • F(t) is the CDF, the probability that the event will happen by time t: F(t) = Pr(T ≤ t)
F(t) represents the probability that the event happens by ‘t’. • What is the probability a person will die on or before the 65th birthday?
Survivor function: what is the chance you live past t? • S(t) = 1 – F(t) • If 10% of a cohort dies by their 65th birthday, 90% will die sometime after their 65th birthday
Hazard function, λ(t) • What is the probability the spell will end at time t, given that it has already lasted until t? • What is the chance you find a new job in month 12, given that you’ve been unemployed for 12 months already?
PDF, CDF (Failure function), survivor function and hazard function are all related • λ(t) = f(t)/S(t) = f(t)/(1-F(t)) • We focus on the ‘hazard’ rate because its relationship to time indicates ‘duration dependence’
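To see how the four functions fit together, here is a minimal Python sketch. It assumes, purely for illustration, that durations are uniform on [0, 100]; that distribution and the names `T_MAX`, `pdf`, `cdf`, `survivor`, and `hazard` are not from the slides.

```python
# Illustrative assumption (not from the slides): T is uniform on [0, T_MAX].
T_MAX = 100.0

def pdf(t):
    """f(t): density of the duration."""
    return 1.0 / T_MAX if 0.0 <= t <= T_MAX else 0.0

def cdf(t):
    """F(t) = Pr(T <= t): probability the event happens by time t."""
    return min(max(t / T_MAX, 0.0), 1.0)

def survivor(t):
    """S(t) = 1 - F(t): probability the spell lasts past t."""
    return 1.0 - cdf(t)

def hazard(t):
    """lambda(t) = f(t) / S(t); only defined where S(t) > 0."""
    return pdf(t) / survivor(t)

for t in (10, 50, 90):
    print(f"t={t:3d}  F={cdf(t):.2f}  S={survivor(t):.2f}  hazard={hazard(t):.4f}")
```

Even though f(t) is flat in this example, the hazard rises with t because the pool of survivors S(t) shrinks, so the same density is spread over fewer remaining spells.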
Example: suppose the longer someone is out of work, the lower the chance they will exit unemployment (the ‘damaged goods’ effect) • This is an example of duration dependence: the probability of exiting a state of the world is a function of how long the spell has already lasted
Mathematically • If dλ(t)/dt = 0, there is no duration dependence • If dλ(t)/dt > 0, there is positive duration dependence: the probability the spell will end increases with time • If dλ(t)/dt < 0, there is negative duration dependence: the probability the spell will end decreases over time
Your choice is to pick a functional form for f(t) that exhibits positive, negative, or no duration dependence
Different Functional Forms • Exponential: λ(t) = λ • The hazard is the same over time, a ‘memoryless’ process • Weibull: F(t) = 1 – exp(–γ t^α), where α, γ > 0 • λ(t) = αγ t^(α–1) • If α > 1, increasing hazard • If α < 1, decreasing hazard • If α = 1, the Weibull reduces to the exponential with λ = γ
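The Weibull case can be checked numerically. The sketch below assumes the parameterization F(t) = 1 – exp(–γ t^α); the function names and the parameter values (γ = 0.1 and the three values of α) are chosen here only for illustration. It computes the hazard both from the closed form αγ t^(α–1) and as f(t)/S(t).

```python
import math

def weibull_cdf(t, alpha, gamma):
    """F(t) = 1 - exp(-gamma * t**alpha)."""
    return 1.0 - math.exp(-gamma * t ** alpha)

def weibull_pdf(t, alpha, gamma):
    """f(t) = dF/dt = alpha * gamma * t**(alpha - 1) * exp(-gamma * t**alpha)."""
    return alpha * gamma * t ** (alpha - 1) * math.exp(-gamma * t ** alpha)

def weibull_hazard(t, alpha, gamma):
    """Closed-form hazard: alpha * gamma * t**(alpha - 1)."""
    return alpha * gamma * t ** (alpha - 1)

gamma = 0.1  # hypothetical scale value
for alpha in (0.5, 1.0, 1.5):  # hypothetical shape values
    for t in (1.0, 4.0, 9.0):
        closed_form = weibull_hazard(t, alpha, gamma)
        # Same quantity computed as f(t) / S(t), with S(t) = 1 - F(t).
        ratio = weibull_pdf(t, alpha, gamma) / (1.0 - weibull_cdf(t, alpha, gamma))
        print(f"alpha={alpha}  t={t}  hazard={closed_form:.4f}  f/S={ratio:.4f}")
```

With α = 0.5 the printed hazard falls as t grows (negative duration dependence), with α = 1.5 it rises (positive duration dependence), and with α = 1 it stays at γ, which is the exponential case.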
NHIS Multiple Cause of Death • NHIS • Annual survey of 60K households • Data on individuals • Self-reported health, doctor visits, lost workdays, etc. • MCOD • Linked NHIS respondents from 1986-1994 to the National Death Index through Dec 31, 1995 • Identified whether the respondent died and of what cause
Our sample • Males, 50-70, who were married at the time of the survey • 1987-1989 surveys • Give everyone 5 years (60 months) of followup
Key Variables • max_mths: maximum months in the survey • diedin5: respondent died during the 5 years of followup • Note: if diedin5=0, then max_mths=60. diedin5 identifies whether the observation is censored or not.
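The censoring rule behind these two variables can be sketched in a few lines of Python. This is not how the actual file was built: the input `months_to_death` is a hypothetical quantity (months from the survey until death, or `None` if no death is recorded in the linked data), and treating a death in exactly month 60 as observed is an assumption made here.

```python
FOLLOWUP_MONTHS = 60  # 5 years of followup, as in the slides

def build_duration(months_to_death):
    """Return (max_mths, diedin5) for one respondent.

    months_to_death is a hypothetical input: months from the survey
    until death, or None if no death is recorded in the linked data.
    """
    if months_to_death is not None and months_to_death <= FOLLOWUP_MONTHS:
        return months_to_death, 1   # spell ends in failure (death observed)
    return FOLLOWUP_MONTHS, 0       # censored at the end of followup

# Made-up respondents: dies at month 14, dies at month 75, never observed dead.
for m in (14, 75, None):
    print(m, build_duration(m))
```

A respondent who dies after month 60 is recorded exactly like one who never dies in the linked data: all we know is that the spell lasted at least 60 months.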
Identifying Duration Data in STATA • Need to tell STATA which variable holds the duration: stset length, failure(failvar) • length = duration variable • failvar = 1 when the duration ends in failure, = 0 for censored values • If all data are uncensored, omit failure(failvar)
In our case • stset max_mths, failure(diedin5)
Getting Kaplan-Meier Curves • Tabular presentation of results: sts list • Graphical presentation: sts graph • Results by subgroup: sts graph, by(income)
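For intuition about what the Kaplan-Meier table contains, here is a minimal Python sketch of the product-limit estimator. It is a hand-rolled illustration, not STATA's implementation, and the eight observations are made up; only the variable names max_mths and diedin5 come from the slides.

```python
def kaplan_meier(durations, failed):
    """Product-limit estimate of S(t) at each distinct failure time.

    durations: observed spell lengths (e.g. max_mths)
    failed:    1 if the spell ended in failure, 0 if censored (e.g. diedin5)
    Returns a list of (time, at_risk, failures, survival) tuples.
    """
    failure_times = sorted({t for t, d in zip(durations, failed) if d == 1})
    survival = 1.0
    table = []
    for t in failure_times:
        # At risk: every spell still ongoing just before time t.
        at_risk = sum(1 for u in durations if u >= t)
        failures = sum(1 for u, d in zip(durations, failed) if u == t and d == 1)
        survival *= 1.0 - failures / at_risk
        table.append((t, at_risk, failures, survival))
    return table

# Made-up data: four deaths, four respondents censored at 60 months.
max_mths = [12, 30, 30, 45, 60, 60, 60, 60]
diedin5 = [1, 1, 1, 1, 0, 0, 0, 0]

for time, at_risk, failures, survival in kaplan_meier(max_mths, diedin5):
    print(f"month {time:2d}: at risk={at_risk}, died={failures}, S(t)={survival:.3f}")
```

Because no one in this toy sample is censored before month 60, the final estimate equals the plain share of respondents still alive (4 of 8). The estimator only differs from that simple share when some spells are censored before others fail.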