Survival Analysis
張新儀, 7/12/2004
Introduction
• What is survival analysis?
• Outcome variable: time until an event occurs
• Time origin: precisely defined and comparable across individuals; it need not be the same calendar date for everyone
[Figure: event times (×) of several individuals on a calendar-time axis (1980, 1985, 1987), re-plotted as years since the beginning of the study and as years since diagnosis/entry]
• Time: years, months, age, etc.; always positive
• Event: death, disease incidence, relapse (for now we consider only one type of event)
Our Interests
• Distribution of failure times
• Comparison of failure times between groups
• Effects of explanatory variables on survival
• If the exact failure time of every individual were known, standard statistical techniques could be applied directly; censoring is what usually prevents this
Censoring
• Incomplete observation of the failure time
• The study ends before the event happens
• Lost to follow-up
• Withdrawal from the study
[Figure: example subject A enters after the study starts and is still event-free when the study ends, so A is censored]
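Censoring (cont'd)
• Example: survival (years) of AIDS patients
Steroid: 1, 1, 1, 1+, 4+, 5
Placebo: 1+, 2+, 3, 3+
(+ indicates that no event had occurred by that time)
• (1) Ignore the +: mean survival time is 13/6 ≈ 2.17 for steroid and 9/4 = 2.25 for placebo
• (2) Delete the + observations: only 4 observations remain in the steroid group (mean 8/4 = 2) and only 1 in the placebo group (mean 3)
• Approach (1) treats lower bounds as exact times and approach (2) throws information away: traditional techniques do not work here!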
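Both naive summaries are easy to reproduce. The sketch below encodes each observation as a `(time, censored)` pair, an encoding chosen here for illustration only, and computes the two means together with how many subjects survive the deletion in approach (2).

```python
# AIDS example: (survival time in years, censored?) where censored=True marks a "+".
steroid = [(1, False), (1, False), (1, False), (1, True), (4, True), (5, False)]
placebo = [(1, True), (2, True), (3, False), (3, True)]


def mean_ignoring_censoring(data):
    """Approach (1): treat every time as if it were an observed event time."""
    return sum(t for t, _ in data) / len(data)


def mean_deleting_censored(data):
    """Approach (2): drop censored subjects; return (mean, number left)."""
    events = [t for t, censored in data if not censored]
    return sum(events) / len(events), len(events)


for name, data in [("steroid", steroid), ("placebo", placebo)]:
    mean_del, n_left = mean_deleting_censored(data)
    print(name, round(mean_ignoring_censoring(data), 3), round(mean_del, 3), n_left)
```

A censored time such as 4+ only says the true survival exceeds 4 years, so neither number is an unbiased estimate of mean survival.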
Types of Censoring
• Notation
$T_i$: potential failure time for the $i$th individual
$C_i$: potential censoring time…
$X_i = \min(T_i, C_i)$: observed time
$\delta_i = 1$ if $T_i \le C_i$ (uncensored); $\delta_i = 0$ if $T_i > C_i$ (censored)
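The notation maps each pair of potential times to what is actually recorded. A minimal sketch, with three hypothetical $(T_i, C_i)$ pairs:

```python
def observe(T, C):
    """Map potential failure time T and censoring time C to (X, delta).

    X = min(T, C) is what we actually see; delta = 1 if the failure is observed
    (T <= C) and 0 if the individual is censored (T > C).
    """
    return min(T, C), 1 if T <= C else 0


# Hypothetical (T_i, C_i) pairs for three individuals.
pairs = [(2.5, 4.0), (6.0, 3.0), (5.0, 5.0)]
observed = [observe(T, C) for T, C in pairs]
print(observed)  # [(2.5, 1), (3.0, 0), (5.0, 1)]
```

In real data only $(X_i, \delta_i)$ is available; $T_i$ and $C_i$ are never both seen.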
Right Censoring (most common case)
• The person's exact survival time is incomplete at the right side of the follow-up period
[Figure: follow-up of subjects A to F on a time axis from 0 to 12; A and F end in events (×); the others end at the end of the study, by withdrawal, or by loss to follow-up (○)]
• 2 events: A, F; 4 censored: B, C, D, E
Right Censoring (cont'd)
• Type I censoring: the study ends when a fixed time point is reached
• Ex.: a 1-year study, $C_i = C = 1$ year, fixed in advance
• Type II censoring: the study ends when a fixed number of failures has occurred
• Ex.: a study of light-bulb lifetimes that ends when 10 light bulbs have failed
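The difference between the two schemes is what is fixed in advance: the stopping time (Type I) or the number of failures (Type II). The sketch below applies both to simulated light-bulb lifetimes; the exponential lifetimes, the mean of 1000 hours, the 500-hour cutoff and the sample size of 25 are all hypothetical choices.

```python
import random


def type1_censor(times, c):
    """Type I: follow every unit to a fixed time c chosen in advance."""
    return [(min(t, c), 1 if t <= c else 0) for t in times]


def type2_censor(times, r):
    """Type II: stop at the r-th failure; units still working are censored there."""
    stop = sorted(times)[r - 1]
    return [(min(t, stop), 1 if t <= stop else 0) for t in times]


random.seed(1)
# Hypothetical light-bulb lifetimes in hours (exponential with mean 1000).
bulbs = [random.expovariate(1 / 1000) for _ in range(25)]

type1 = type1_censor(bulbs, c=500)   # end time fixed, number of failures random
type2 = type2_censor(bulbs, r=10)    # number of failures fixed, end time random
print(sum(d for _, d in type1), sum(d for _, d in type2))
```

Under Type II censoring the number of failures is always exactly 10, while the study length varies from sample to sample.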
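Left Censoring
• A person's survival time is incomplete at the left side of the follow-up period: the event is known only to have occurred before some time
Interval Censoring
• The event is known only to have occurred within an interval
[Figure: HIV example on a time axis with points 0, 3, 6, showing HIV exposure (×) and a positive HIV test (+)]
• The event occurred between times 3 and 6, but its exact time is unknown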
More Notation and Terminology
• $S(t) = P(T > t)$: the survival function, fundamental to survival analysis
• The probability that a person survives longer than a specified time $t$
• Survival probabilities at different values of $t$ provide crucial summary information about the survival pattern
• Properties: $S(t)$ is non-increasing, $S(0) = 1$, $S(\infty) = 0$
• In practice an estimated $S(t)$ is a step function
[Figure: a smooth survival curve falling from $S(0) = 1$ toward 0, and an observed step function]
Hazard Function
• $\lambda(t) = \lim_{\Delta t \to 0} P(t \le T < t + \Delta t \mid T \ge t)/\Delta t$
• Based on a conditional probability: the instantaneous potential per unit time for the event to occur, given that the individual has survived up to time $t$
• The hazard function is not a probability; it is a rate, ranging from 0 to ∞
• Properties: $\lambda(t) \ge 0$; $\lambda(t)$ has no upper bound
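Survival Distribution
• Probability distribution for (a) continuous, (b) non-negative random variables
• Probability density function (pdf): $f(t) = \lim_{\Delta t \to 0} P(t \le T < t + \Delta t)/\Delta t$
• Cumulative distribution function: $F(t) = P(T \le t) = \int_0^t f(s)\,ds$
[Figure: pdf $f(t)$ with the area between $t$ and $t + \Delta t$ marked, and the CDF $F(t)$]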
Some Special Distributions
• Exponential distribution (parameter $\rho$)
• $f(t) = \rho e^{-\rho t}$
• $S(t) = e^{-\rho t}$, $F(t) = 1 - e^{-\rho t}$
• $\lambda(t) = \rho$, $\Lambda(t) = \rho t$ (cumulative hazard)
• $E(T) = 1/\rho$
• Lack of memory
• Constant hazard
• Coefficient of variation = s.d./mean = 1
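The exponential properties can be checked numerically. The sketch assumes a hypothetical rate $\rho = 0.2$ and uses $E(T) = \int_0^\infty S(t)\,dt$ and $E(T^2) = \int_0^\infty 2t\,S(t)\,dt$ (valid for non-negative $T$), integrated with a simple trapezoid rule truncated at $t = 200$.

```python
import math

rho = 0.2  # hypothetical rate parameter


def S(t):
    return math.exp(-rho * t)


def f(t):
    return rho * math.exp(-rho * t)


def integrate(g, a, b, n=20000):
    """Composite trapezoid rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * g(a) + sum(g(a + i * h) for i in range(1, n)) + 0.5 * g(b))


# Constant hazard: lambda(t) = f(t) / S(t) is the same at every t.
hazards = [f(t) / S(t) for t in (0.0, 1.0, 5.0, 20.0)]

# Lack of memory: P(T > s + t | T > s) = S(s + t) / S(s) equals S(t).
s, t = 3.0, 4.0
memory_gap = S(s + t) / S(s) - S(t)

# Moments from the survival function: E(T) = int S, E(T^2) = int 2t S(t) dt.
mean = integrate(S, 0.0, 200.0)
second = integrate(lambda u: 2 * u * S(u), 0.0, 200.0)
cv = math.sqrt(second - mean ** 2) / mean
print(hazards, memory_gap, round(mean, 4), round(cv, 4))
```

The mean comes out close to $1/\rho = 5$ and the coefficient of variation close to 1, up to the small error of the trapezoid rule.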
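Gamma Distribution (parameters $\rho$, $\kappa$)
• $f(t) = \rho(\rho t)^{\kappa-1} e^{-\rho t}/\Gamma(\kappa)$
• $S(t) = 1 - I(\kappa, \rho t)$, where $I$ is the regularized (lower) incomplete gamma function
• $E(T) = \kappa/\rho$
• $\lambda(t)$ is monotone increasing from 0 if $\kappa > 1$ and monotone decreasing from ∞ if $\kappa < 1$; in either case it approaches $\rho$ as $t \to \infty$
• If $\kappa = 1$, the gamma distribution reduces to the exponential distribution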
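The hazard shapes can be seen by computing $\lambda(t) = f(t)/S(t)$ directly. This sketch uses only the standard library, so the regularized incomplete gamma function is evaluated from its power series. That is adequate for moderate $\rho t$ but loses relative accuracy far in the tail, where a library routine would be preferable. The rate $\rho = 0.5$ and the time grid are hypothetical.

```python
import math


def reg_lower_gamma(k, x, max_terms=1000):
    """Regularized lower incomplete gamma P(k, x) from its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / k
    total = term
    for n in range(1, max_terms):
        term *= x / (k + n)
        total += term
        if term < 1e-17 * total:
            break
    return math.exp(k * math.log(x) - x - math.lgamma(k)) * total


def gamma_pdf(t, rho, k):
    return rho * (rho * t) ** (k - 1) * math.exp(-rho * t) / math.gamma(k)


def gamma_survival(t, rho, k):
    return 1.0 - reg_lower_gamma(k, rho * t)


def gamma_hazard(t, rho, k):
    return gamma_pdf(t, rho, k) / gamma_survival(t, rho, k)


rho = 0.5  # hypothetical rate
grid = (0.5, 2.0, 10.0, 40.0)
for k in (0.5, 1.0, 2.0):
    print(k, [round(gamma_hazard(t, rho, k), 4) for t in grid])
```

For $\kappa = 2$ the hazard rises toward 0.5, for $\kappa = 0.5$ it falls toward 0.5, and for $\kappa = 1$ it stays at 0.5.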
Weibull Distribution (parameters $\rho$, $\kappa$)
• $f(t) = \kappa\rho(\rho t)^{\kappa-1}\exp[-(\rho t)^\kappa]$
• $S(t) = \exp[-(\rho t)^\kappa]$, $F(t) = 1 - \exp[-(\rho t)^\kappa]$
• $\lambda(t) = \kappa\rho(\rho t)^{\kappa-1}$, $\Lambda(t) = (\rho t)^\kappa$
• An important generalization of the exponential distribution that allows a power dependence of the hazard on time
• $\lambda(t)$ is monotone increasing from 0 if $\kappa > 1$ and monotone decreasing from ∞ if $\kappa < 1$
• If $\kappa = 1$, it reduces to the exponential (constant) hazard
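Because every Weibull quantity has a closed form, the hazard shapes can be tabulated directly. A small sketch with a hypothetical $\rho = 0.5$; it avoids $t = 0$, where the hazard is infinite for $\kappa < 1$.

```python
import math


def weibull(t, rho, kappa):
    """f(t), S(t), hazard and cumulative hazard for S(t) = exp[-(rho t)^kappa], t > 0."""
    cum_hazard = (rho * t) ** kappa
    surv = math.exp(-cum_hazard)
    hazard = kappa * rho * (rho * t) ** (kappa - 1)
    return hazard * surv, surv, hazard, cum_hazard  # f = lambda * S


rho = 0.5  # hypothetical rate
grid = (0.5, 1.0, 2.0, 4.0)
for kappa in (0.5, 1.0, 2.0):
    print(kappa, [round(weibull(t, rho, kappa)[2], 4) for t in grid])
```

The density is returned as $\lambda(t)S(t)$, which is the same expression as $f(t)$ above.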
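Other Distributions
• Log-normal distribution
• Log-logistic distribution
• Generalized gamma distribution
• Gompertz-Makeham distribution
• Ways to choose among distributions
• Plotting the estimated density function is not effective
• Plot the estimated $\lambda(t)$ or $\log\lambda(t)$ against $t$ or $\log t$
• Plot the estimated $\Lambda(t) = -\log S(t)$, $\log[-\log S(t)]$, or another transformation against $t$ or $\log t$ (e.g. for the Weibull, $\log[-\log S(t)] = \kappa\log\rho + \kappa\log t$ is a straight line in $\log t$)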
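The $\log[-\log S(t)]$ versus $\log t$ check can be sketched on simulated data. This assumes uncensored Weibull samples with hypothetical $\rho = 0.1$, $\kappa = 1.5$, so the empirical survival function can stand in for the product-limit estimate; with censored data one would plug in the Kaplan-Meier estimate instead. A least-squares line then estimates slope $\kappa$ and intercept $\kappa\log\rho$.

```python
import math
import random

random.seed(2004)
rho, kappa = 0.1, 1.5   # hypothetical Weibull parameters
n = 2000

# random.weibullvariate(alpha, beta) has S(t) = exp[-(t/alpha)^beta], so alpha = 1/rho.
times = sorted(random.weibullvariate(1 / rho, kappa) for _ in range(n))

# Empirical survival at the i-th ordered time, using i/(n+1) to avoid S = 0.
xs = [math.log(t) for t in times]
ys = [math.log(-math.log(1 - i / (n + 1))) for i in range(1, n + 1)]

# Least-squares line through the points (log t, log[-log S(t)]).
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# For Weibull data the slope estimates kappa and the intercept kappa*log(rho).
print(round(slope, 3), round(intercept, 3), round(kappa * math.log(rho), 3))
```

Clear curvature in such a plot would argue against a Weibull model.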
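RELATIONSHIPS among $S(t)$, $\lambda(t)$, $\Lambda(t)$, $F(t)$
• $F(t) = 1 - S(t)$ and $f(t) = dF(t)/dt = -dS(t)/dt$
• $\lambda(t) = f(t)/S(t) = -\frac{d}{dt}\ln S(t)$
• $\Lambda(t) = \int_0^t \lambda(u)\,du = -\ln S(t)$, so $S(t) = \exp[-\Lambda(t)]$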
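These identities can be checked numerically for any distribution with a known $S(t)$. The sketch uses a Weibull with hypothetical $\rho = 0.3$, $\kappa = 1.8$, central finite differences for the derivatives, and a trapezoid rule for $\Lambda(t)$.

```python
import math

rho, kappa = 0.3, 1.8   # hypothetical Weibull parameters


def S(t):
    return math.exp(-(rho * t) ** kappa)


def hazard(t):
    return kappa * rho * (rho * t) ** (kappa - 1)


def derivative(g, t, h=1e-5):
    """Central finite-difference approximation of g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)


def integrate(g, a, b, n=10000):
    """Composite trapezoid rule on [a, b]."""
    step = (b - a) / n
    return step * (0.5 * g(a) + sum(g(a + i * step) for i in range(1, n)) + 0.5 * g(b))


t = 2.5
F = 1 - S(t)                                               # F(t) = 1 - S(t)
f = -derivative(S, t)                                      # f(t) = -dS/dt
lam_from_f = f / S(t)                                      # lambda(t) = f(t) / S(t)
lam_from_log = -derivative(lambda u: math.log(S(u)), t)    # lambda(t) = -d ln S(t) / dt
Lam = integrate(hazard, 0.0, t)                            # Lambda(t) = integral of lambda
print(hazard(t), lam_from_f, lam_from_log, math.exp(-Lam), S(t), F)
```

The two hazard estimates agree with the closed-form $\lambda(t)$, and $\exp[-\Lambda(t)]$ agrees with $S(t)$.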
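Statistical Inference
• $H_0: \theta = \theta_0$, with nuisance parameter $\phi$
• Likelihood ratio statistic: $W(\theta_0) = 2[\ln L(\hat\theta, \hat\phi) - \ln L(\theta_0, \hat\phi_0)]$, where $(\hat\theta, \hat\phi)$ is the joint MLE of $(\theta, \phi)$ and $\hat\phi_0$ is the MLE of $\phi$ under the null hypothesis; asymptotically $W(\theta_0) \sim \chi^2$ with d.f. $= \dim(\theta)$
• Wald statistic
• Score statistic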
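Example: Exponential Distribution
• Exponentially distributed failure times: $f(t) = \rho e^{-\rho t}$, $S(t) = e^{-\rho t}$, $\lambda(t) = \rho$
• Each subject contributes $f(x_i)^{\delta_i} S(x_i)^{1-\delta_i} = \rho^{\delta_i} e^{-\rho x_i}$ to the likelihood, so $l = \ln(\text{likelihood}) = \sum\delta_i\ln\rho - \rho\sum x_i = d\ln\rho - \rho\sum x_i$, where $d = \sum\delta_i$ is the total number of failures and $\sum x_i$ is the sum of censored and uncensored times (total time at risk)
• MLE: $\hat\rho = d/\sum x_i$ = total number of failures / total time at risk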
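A minimal sketch of this estimator, applied to the steroid arm of the AIDS example from the censoring slides (1+ and 4+ censored). Censored subjects add to the time at risk but not to $d$.

```python
import math


def exponential_mle(times, events):
    """MLE of rho for right-censored exponential data.

    times  : observed times x_i (censored or not)
    events : delta_i = 1 for an observed failure, 0 for a censored time
    """
    d = sum(events)            # total number of failures
    total_time = sum(times)    # total time at risk
    return d / total_time


def loglik(rho, times, events):
    """l(rho) = d ln(rho) - rho * sum(x_i)."""
    return sum(events) * math.log(rho) - rho * sum(times)


# AIDS example, steroid arm: 1, 1, 1, 1+, 4+, 5
times = [1, 1, 1, 1, 4, 5]
events = [1, 1, 1, 0, 0, 1]
rho_hat = exponential_mle(times, events)
print(rho_hat)  # 4/13
```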
Numerical Example
• Gregory et al., NEJM 1978 (severe viral hepatitis): 29 patients followed for 16 weeks, 14 on steroid treatment and 15 controls
• Steroid: 1, 1, 1, 1+, 4+, 5, 7, 8, 10, 10+, 12+, 16+, 16+, 16+
• Control: 1+, 2+, 3, 3, 3+, 5+, 5+, 16+ (×8)
Group      d    Σxᵢ    ρ̂        μ̂ = 1/ρ̂    ln ρ̂
Steroid    7    108    0.0648    15.4       −2.74
Control    2    150    0.0133    75.0       −4.32
• Ratio of hazards: 0.0648/0.0133 ≈ 5. Which group is at higher risk of dying?
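The estimates for this example can be recomputed from the listed times, and the likelihood ratio statistic from the inference slide gives a test of equal hazards. This is a sketch assuming the exponential model for both groups. The `parse` helper and its "+" suffix convention are illustrative, and the $\chi^2_1$ tail probability uses $P(\chi^2_1 > W) = \mathrm{erfc}(\sqrt{W/2})$.

```python
import math

# Hepatitis data in weeks; a trailing "+" marks a censored time.
steroid = "1 1 1 1+ 4+ 5 7 8 10 10+ 12+ 16+ 16+ 16+".split()
control = "1+ 2+ 3 3 3+ 5+ 5+".split() + ["16+"] * 8


def parse(tokens):
    times = [float(tok.rstrip("+")) for tok in tokens]
    events = [0 if tok.endswith("+") else 1 for tok in tokens]
    return times, events


def summary(tokens):
    times, events = parse(tokens)
    d, total = sum(events), sum(times)
    rho = d / total                                   # MLE: failures / time at risk
    return {"d": d, "sum_x": total, "rho": rho, "mu": 1 / rho,
            "ln_rho": math.log(rho), "loglik": d * math.log(rho) - rho * total}


s, c = summary(steroid), summary(control)
print("hazard ratio", s["rho"] / c["rho"])

# Likelihood ratio test of H0: rho_steroid = rho_control (1 d.f.).
pooled = summary(steroid + control)
W = 2 * (s["loglik"] + c["loglik"] - pooled["loglik"])
p_value = math.erfc(math.sqrt(W / 2))   # upper tail of chi-square with 1 d.f.
print(round(W, 2), round(p_value, 3))
```

Under $H_0$ the common rate is estimated from the pooled sample, which is what `pooled` computes.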
Advantages of Parametric Methods
• Convenience for statistical inference
• Explicit, reasonably simple forms for $S(t)$, $\lambda(t)$, and $\Lambda(t)$
• Ability to represent both over- and under-dispersion relative to the exponential distribution
• The qualitative shape of the hazard function (increasing, decreasing, constant) is explicit
• The behavior of $S(t)$ for small and large times is reasonably easy to study
NONPARAMETRIC METHODS
• Product-limit (Kaplan-Meier) estimator
• Order the failure times. The probability of surviving $k$ ($\ge 2$) or more years from the beginning of the study is the product of the $k$ observed conditional survival probabilities, $S(k) = P_1 P_2 P_3 \cdots P_k$, where $P_j$ is the observed proportion of those alive at the start of year $j$ who survive that year
• Rank-based comparisons of two survival distributions do not require the actual survival times, only their order
Product-Limit Estimator (example)
t in        S(t)
[0, 1)      1
[1, 3)      1 × 5/6 = 0.833
[3, 4)      1 × 5/6 × 3/4 = 0.625
[4, 5)      1 × 5/6 × 3/4 × 2/3 = 0.417
[5, ∞)      1 × 5/6 × 3/4 × 2/3 × 1/2 = 0.208
[Figure: the step function S(t) plotted for t from 0 to 7]
• How would you estimate the median survival time?
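A minimal product-limit sketch. The data set is hypothetical, chosen only to be consistent with the step values above: six subjects, with the times 2 and 6 censored. The median uses one common convention, the smallest $t$ with $S(t) \le 0.5$.

```python
def kaplan_meier(times, events):
    """Product-limit estimate as a list of (t, S(t)) at each distinct failure time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data[i:] if tt == t]   # everyone leaving at time t
        deaths = sum(tied)
        if deaths:
            surv *= (at_risk - deaths) / at_risk       # conditional survival past t
            curve.append((t, surv))
        at_risk -= len(tied)                           # deaths and censorings leave the risk set
        i += len(tied)
    return curve


def km_median(curve):
    """Smallest failure time with S(t) <= 0.5 (None if S never drops that far)."""
    return next((t for t, s in curve if s <= 0.5), None)


# Hypothetical data consistent with the table: 2+ and 6+ are censored.
times = [1, 2, 3, 4, 5, 6]
events = [1, 0, 1, 1, 1, 0]
curve = kaplan_meier(times, events)
print(curve, km_median(curve))
```

Subjects censored at a failure time are kept in that time's risk set, the usual convention.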
Comparing Survival Times
[Figure: survival curves that differ by the same amount at 5 years] Same difference at 5 years: which is better?
• Log-rank (Mantel-Haenszel) test: order the failure times in the pooled sample (e.g. samples from two treatments), then form a series of 2×2 tables, one at each ordered death time. The statistic compares the observed number of deaths in each group with the number expected if the hazard at $t_i$ were the same in both groups.
• Example: groups A and B with ordered death times $t_1 = 2$, $t_2 = 4$, $t_3 = 10$, $t_4 = 15$, $t_5 = 19$, $t_6 = 23$; at each $t_i$ a 2×2 table of deaths (d) and survivors (s) by group is formed
• A χ² statistic is calculated from this series of tables. As long as the difference between the hazards has a consistent sign, the log-rank test usually does well. Use PROC LIFETEST in SAS to perform the test.
• Gehan/Breslow generalized Wilcoxon test: weight $W_i = N_i$ (number at risk at $t_i$); better than the log-rank test at detecting early differences, worse at detecting later ones
• Peto/Prentice generalized Wilcoxon test: weight $W_i = S(t_i)$; similar to the Gehan test, but its weights depend less on the censoring pattern, so they do not jump as wildly
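The statistic described above can be sketched for two groups with a choice of weight: $W_i = 1$ (log-rank) or $W_i = N_i$ (Gehan). At each death time it accumulates $W_i(d_{A,i} - E_{A,i})$ and $W_i^2 V_i$, using the usual hypergeometric variance, and returns the 1-d.f. χ² statistic. This is an illustrative implementation, not SAS's; Peto/Prentice weights are omitted.

```python
def weighted_logrank(times_a, events_a, times_b, events_b, weight="logrank"):
    """Two-sample weighted log-rank chi-square (1 d.f.).

    weight="logrank": W_i = 1;  weight="gehan": W_i = N_i (number at risk).
    """
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in pooled if e == 1})
    num, var = 0.0, 0.0
    for t in death_times:
        n = sum(1 for tt, _, _ in pooled if tt >= t)                  # at risk, both groups
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)     # at risk, group A
        d = sum(1 for tt, e, _ in pooled if tt == t and e == 1)       # deaths at t
        d_a = sum(1 for tt, e, g in pooled if tt == t and e == 1 and g == 0)
        w = 1.0 if weight == "logrank" else float(n)
        expected = d * n_a / n                                        # E[d_A] under H0
        v = d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1) if n > 1 else 0.0
        num += w * (d_a - expected)
        var += w * w * v
    if var == 0:
        raise ValueError("no information to compare the groups")
    return num * num / var


# Hypothetical toy data: group A fails early, group B late.
print(weighted_logrank([1, 2], [1, 1], [3, 4], [1, 1]))
print(weighted_logrank([1, 2], [1, 1], [3, 4], [1, 1], weight="gehan"))
```

For identical samples every table has $d_A = E_A$, so the statistic is 0.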
Modeling
• To compare two or more sets of data
• To assess the effects of explanatory variables
• What types of explanatory variables are usually of interest?
• Treatments; characteristics of individuals (e.g. sex, age, medical history)
• Environmental variables (e.g. living conditions, working environment)
• Interactions, etc.
• Another way to classify them: constant or time-dependent covariates; time-dependent covariates may be influenced by the treatments under investigation
Types of Models
• Accelerated life model
• These models assume that the effect of the explanatory variables on the event-time distribution is multiplicative on the event time
• $T = \psi(z)\,T_0 = \exp(x'\beta)\,T_0$
• $\log T = \log T_0 + x'\beta$
• Choices of $\psi(z)$:
• $\psi(z) > 0$
• simple interpretation
• mathematically simple if possible
• $\psi(z) = \exp(x'\beta)$ is most commonly used
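Under $T = \psi(z)T_0$ we have $S(t; z) = S_0(t/\psi(z))$, so every quantile of $T$ is stretched by the same factor $\psi(z)$. The sketch checks this for the median, assuming a hypothetical Weibull baseline ($\rho = 0.2$, $\kappa = 1.5$), a single binary covariate, and $\beta = 0.7$. The median is found by bisection.

```python
import math


def weibull_survival(t, rho=0.2, kappa=1.5):
    """Baseline S0(t) = exp[-(rho t)^kappa] (hypothetical parameters)."""
    return math.exp(-(rho * t) ** kappa)


def aft_survival(t, x, beta):
    """Accelerated life: T = exp(x*beta) * T0, so S(t; x) = S0(t / exp(x*beta))."""
    psi = math.exp(x * beta)
    return weibull_survival(t / psi)


def median(surv, lo=0.0, hi=1e6):
    """Solve S(t) = 0.5 by bisection (S is decreasing)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if surv(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


beta = 0.7
m0 = median(lambda t: aft_survival(t, 0, beta))
m1 = median(lambda t: aft_survival(t, 1, beta))
print(m1 / m0, math.exp(beta))   # the ratio of medians equals psi(z) = exp(beta)
```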
Example (output from SAS PROC LIFEREG)
PROC LIFEREG;
  MODEL time*event(0)=group/distribution=weibull;
RUN;
Variable     Estimate
Intercept    2.248
Group        1.267
Scale        0.732
• In SAS, $T = T_0\,\psi(z) = T_0\exp(x'\beta)$, so $E(T;\text{control}) = E(T_0)\exp(\beta_1)$ and $E(T;\text{treat}) = E(T_0)\exp(\beta_1 + \beta_2)$, with $\beta_1$ the intercept and $\beta_2$ the group coefficient
• $E(T;\text{treat})/E(T;\text{control}) = \exp(1.267) \approx 3.55 > 1$; $E(T_0)$ cancels
However, some may want to know the actual expected survival time, not just a relative measure
• $\kappa = 1/0.732 \approx 1.37$, $\rho = \exp(-2.248)$
• For the control group all covariates are zero, so $E[T;\text{control}] = \Gamma(1 + 1/\kappa)/\rho = \exp(2.248)\,\Gamma(1 + 0.732) \approx 8.7$
• $E[T;\text{treat}] = 8.7\exp(\beta_2) = 8.7\exp(1.267) \approx 8.7 \times 3.55 \approx 31$
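The arithmetic can be checked from the LIFEREG estimates (intercept 2.248, group 1.267, scale 0.732). This sketch assumes the slide's coding: group = 0 for control and 1 for treatment, Weibull shape $\kappa = 1/\text{scale}$, and rate $\rho = \exp(-\text{intercept})$.

```python
import math

# Estimates from the PROC LIFEREG output.
intercept, group_coef, scale = 2.248, 1.267, 0.732

kappa = 1 / scale                 # Weibull shape
rho = math.exp(-intercept)        # Weibull rate for the control group (group = 0)

mean_control = math.gamma(1 + 1 / kappa) / rho     # E[T] = Gamma(1 + 1/kappa) / rho
mean_treat = mean_control * math.exp(group_coef)   # time scale multiplied by exp(beta_2)
print(round(kappa, 3), round(mean_control, 2), round(mean_treat, 2))
```

Since treatment multiplies every survival time by $\exp(\beta_2) \approx 3.55$, the expected survival for the treated group is about 3.55 times the control mean.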
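Note
• $\kappa$ is not really of interest in the accelerated life model: it shapes $S_0(t)$ but does not enter $\psi(z)$. With $T = \psi(z)T_0$ and a Weibull baseline $S_0(t) = \exp[-(\rho t)^\kappa]$, $S(t; z) = S_0(t/\psi(z)) = \exp[-(\rho t)^\kappa\exp(-\kappa x'\beta)]$
• Generalization (written with the equivalent convention $T = T_0/\psi(z)$): $S(t; z) = S_0(t\psi(z))$, $f(t; z) = f_0(t\psi(z))\,\psi(z)$, $\lambda(t; z) = \lambda_0(t\psi(z))\,\psi(z)$
• With $\mu_0 = E(\ln T_0)$: $\ln T = \mu_0 - \ln\psi(z) + \varepsilon$, where $z$ and $\varepsilon$ are independent and $E[\varepsilon] = 0$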
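Proportional Hazards Model
• Similar to the accelerated life model, but focuses on the hazard: $\lambda(t; z) = \psi(z)\,\lambda_0(t)$
• A common form is $\psi(z) = \exp(z'\beta)$
• Example, Weibull distribution: with baseline cumulative hazard $\Lambda_0(t) = (\rho t)^\kappa$, $\Lambda_1(t) = \psi\,\Lambda_0(t) = (\psi^{1/\kappa}\rho t)^\kappa = (\rho^* t)^\kappa$
• So $\lambda_1(t)$ is another Weibull hazard, with $\rho^* = \psi^{1/\kappa}\rho$
• Is the proportional hazards model related to the accelerated life model? Can a model be both proportional hazards and accelerated life?
• Only in the Weibull case (try it yourself, if interested)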
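The Weibull case can be checked numerically. Multiplying the hazard by $\psi$ gives a Weibull with rate $\rho^* = \psi^{1/\kappa}\rho$. The same model read as an accelerated life model has $S(t; z) = S_0(t\,\psi^{1/\kappa})$. The parameter values below are hypothetical.

```python
import math

rho, kappa, psi = 0.2, 1.5, 3.0   # hypothetical baseline Weibull and hazard multiplier


def weibull_hazard(t, r, k):
    return k * r * (r * t) ** (k - 1)


def weibull_survival(t, r, k):
    return math.exp(-(r * t) ** k)


rho_star = psi ** (1 / kappa) * rho
grid = (0.5, 1.0, 4.0, 10.0)

for t in grid:
    ph = psi * weibull_hazard(t, rho, kappa)            # proportional hazards: psi * lambda_0(t)
    new_weibull = weibull_hazard(t, rho_star, kappa)    # Weibull with rate rho* = psi^(1/kappa) rho
    # Same model read as accelerated life: S(t; z) = S_0(t * psi^(1/kappa)).
    ph_surv = weibull_survival(t, rho, kappa) ** psi
    alt_surv = weibull_survival(t * psi ** (1 / kappa), rho, kappa)
    print(t, ph, new_weibull, ph_surv, alt_surv)
```

For other baseline families the two representations generally disagree, which is the point of the question on the slide.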
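Cox PH Model
• Response variable: observed $(x, \delta)$; of interest: $T$, or just $\lambda(t)$. $T$ follows some survival distribution with $S(t)$ and $\lambda(t)$ (note that no specific functional form is assumed)
• Observed covariates: $z_1, z_2, \ldots, z_k$
• $\lambda(t \mid z_1, \ldots, z_k) = \lambda_0(t)\exp(\beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_k z_k)$, with $\lambda_0(t)$ unspecified (any intercept is absorbed into $\lambda_0(t)$)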
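Note
• $\lambda_0(t) = \lambda(t \mid z_1 = \cdots = z_k = 0)$
• $\exp(\beta_1 z_1 + \cdots + \beta_k z_k) = RR = \lambda(t \mid z_1, \ldots, z_k)/\lambda(t \mid z_1 = \cdots = z_k = 0)$: the relative risk (hazard ratio) of death comparing covariates $z_1, \ldots, z_k$ to $z_1 = \cdots = z_k = 0$, constant over time
• Hence the name proportional hazards (very popular)
• First proposed by Cox (JRSS, 1972)
• The proportional hazards model is appropriate only when the survival functions for different covariate values do not cross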
Interpreting Coefficients for Cox's Model
• $\beta_k$ is the log RR (log hazard ratio) for a unit change in $z_k$, given that all other covariates remain constant
• The RR comparing two sets of covariate values, $(z_1, \ldots, z_k)$ vs. $(z'_1, \ldots, z'_k)$: $RR = \lambda(t \mid z_1, \ldots, z_k)/\lambda(t \mid z'_1, \ldots, z'_k) = \exp[\beta_1(z_1 - z'_1) + \beta_2(z_2 - z'_2) + \cdots + \beta_k(z_k - z'_k)]$
• The Cox model is fitted by maximizing the partial likelihood function
• Assumes the covariates act linearly on the log hazard
• Assumes observations are independent
• SAS: PROC PHREG;
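Partial likelihood fitting can be sketched for a single covariate with Newton-Raphson. This assumes no tied failure times; real software such as PROC PHREG handles ties with Breslow or Efron approximations. The three-subject data set is hypothetical and small enough to solve by hand. Here $L(\beta) = \frac{e^\beta}{2e^\beta + 1}\cdot\frac{1}{e^\beta + 1}\cdot 1$, which is maximized at $e^\beta = 1/\sqrt{2}$.

```python
import math


def log_partial_likelihood(beta, times, events, z):
    """Cox log partial likelihood for one covariate, assuming no tied failure times."""
    total = 0.0
    for ti, di, zi in zip(times, events, z):
        if di:
            risk = [zj for tj, zj in zip(times, z) if tj >= ti]   # risk set R(t_i)
            total += beta * zi - math.log(sum(math.exp(beta * zj) for zj in risk))
    return total


def cox_fit_1d(times, events, z, iters=50):
    """Newton-Raphson on the log partial likelihood (single covariate)."""
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for ti, di, zi in zip(times, events, z):
            if not di:
                continue
            risk = [zj for tj, zj in zip(times, z) if tj >= ti]
            w = [math.exp(beta * zj) for zj in risk]
            s0 = sum(w)
            mean_z = sum(wj * zj for wj, zj in zip(w, risk)) / s0
            mean_z2 = sum(wj * zj * zj for wj, zj in zip(w, risk)) / s0
            score += zi - mean_z              # first derivative
            info += mean_z2 - mean_z ** 2     # minus second derivative
        if info == 0:
            break
        step = score / info
        beta += step
        if abs(step) < 1e-12:
            break
    return beta


# Hypothetical data: three subjects, all failures, binary covariate z.
times, events, z = [1, 2, 3], [1, 1, 1], [1, 0, 1]
beta_hat = cox_fit_1d(times, events, z)
print(beta_hat, math.exp(beta_hat))   # log hazard ratio and hazard ratio, z = 1 vs 0
```

The baseline hazard $\lambda_0(t)$ never appears: only the ordering of failure times and the risk sets enter the partial likelihood.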
PROC lifetest data=uis plots=(s);
  time time*censor(0);
  strata treat;
run;
(Other output omitted.)
Test of Equality over Strata
Test         Chi-Square   DF   Pr > Chi-Square
Log-Rank     6.7979       1    0.0091
Wilcoxon     9.4608       1    0.0021
-2Log(LR)    7.8267       1    0.0051
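Each test statistic in this table is referred to a χ² distribution with 1 d.f. The p-values can be reproduced from $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$. A small sketch that uses only the statistics printed above:

```python
import math


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square with 1 d.f.: P(Z^2 > x) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2))


for name, stat in [("Log-Rank", 6.7979), ("Wilcoxon", 9.4608), ("-2Log(LR)", 7.8267)]:
    print(name, round(chi2_1df_pvalue(stat), 4))
```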