
Survival Analysis

張新儀, 7/12/2004


Presentation Transcript


  1. Survival Analysis. 張新儀, 7/12/2004

  2. Introduction
     • What is survival analysis?
     • Outcome variable: time until an event occurs
     • Time origin: precisely defined and comparable across individuals; it need not be the same calendar date
     [Figure: time axis running from the origin 0 to the event]

  3. Time
     [Figure: individual follow-up lines ending in events (×), drawn on two time scales: calendar year since the beginning of the study (1980, 1985, 1987) and years since diagnosis/entry]
     • Time: years, months, age, etc.; always positive
     • Event: death, disease incidence, relapse (only one event is considered for now)

  4. Our Interests
     • Distribution of failure times
     • Comparison of failure times
     • Effects of explanatory variables on survival
     • If the exact failure time of every individual is known, all the standard statistical techniques apply

  5. Censoring
     • Incomplete observation of the failure time
     • The study ends before the event happens
     • Lost to follow-up
     • Withdrawal from the study
     • Example [Figure: individual A enters after the study has started and has had no event when the study ends]

  6. Censoring (cont'd)
     Ex. AIDS patients, survival in years (+ indicates that no event had occurred by that time)
       Steroid: 1, 1, 1, 1+, 4+, 5
       Placebo: 1+, 2+, 3, 3+
     (1) Ignore the +: the mean survival time is 13/6 under steroid treatment and 9/4 under placebo
     (2) Delete the + observations: 4 remain in the steroid group and only 1 in the placebo group
     Traditional techniques do not work here!
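
The two naive approaches on this slide can be reproduced with a small sketch. The `(time, event)` tuple encoding and the function names are assumptions made for illustration; the data are the values listed on the slide.

```python
# Each observation is (time_in_years, event); event=False marks a "+" (censored) value.
steroid = [(1, True), (1, True), (1, True), (1, False), (4, False), (5, True)]
placebo = [(1, False), (2, False), (3, True), (3, False)]

def mean_ignoring_censoring(data):
    """Naive approach (1): treat censored times as if they were failure times."""
    return sum(t for t, _ in data) / len(data)

def uncensored_only(data):
    """Naive approach (2): drop every censored observation."""
    return [t for t, event in data if event]

for name, group in (("steroid", steroid), ("placebo", placebo)):
    print(name,
          "mean ignoring '+':", round(mean_ignoring_censoring(group), 3),
          "| kept after deleting '+':", uncensored_only(group))
```

Approach (1) understates survival because censored subjects lived at least as long as recorded, and approach (2) throws away most of the placebo group, which is why neither is acceptable.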

  7. Types of Censoring
     • Notation
       $T_i$: potential failure time for the $i$th individual
       $C_i$: potential censoring time…
       $X_i = \min(T_i, C_i)$: observed time
       $\delta_i = 1$ if $T_i \le C_i$ (uncensored); $\delta_i = 0$ if $T_i > C_i$ (censored)
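
A minimal sketch of this notation, assuming we could see both the potential failure time and the potential censoring time (in real data only the resulting pair is observed). The function name `observe` and the example values are hypothetical.

```python
def observe(failure_time, censoring_time):
    """Return (X, delta): the observed time and the event indicator."""
    x = min(failure_time, censoring_time)
    delta = 1 if failure_time <= censoring_time else 0
    return x, delta

# Hypothetical potential times (T_i, C_i) for three individuals.
potential = [(3.0, 5.0), (7.0, 5.0), (5.0, 5.0)]
observed = [observe(t, c) for t, c in potential]
print(observed)
```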

  8. Right Censoring (most cases)
     • The person's exact survival time is incomplete at the right side of the follow-up period
     [Figure: follow-up lines for individuals A to F on a time axis from 0 to 12; × marks an event, ○ marks withdrawal or loss to follow-up, and the other lines run to the end of the study]
     • 2 events: A, F; 4 censored: B, C, D, E

  9. Right Censoring (cont'd)
     • Type I censoring: the study ends when a fixed time point is reached
     • Ex. 1-year study, $C_i = C = 1$ year, fixed in advance
     • Type II censoring: the study ends when a fixed number of failures has occurred
     • Ex. a study of light bulb lifetimes that ends when 10 light bulbs have failed

  10. Left Censoring
      • A person's survival time is incomplete at the left side of the follow-up period
      Interval Censoring
      [Figure: time axis marked 0, 3 and 6, running from HIV exposure to a positive HIV test]
      • The event occurred between 3 and 6, but the exact time is not known

  11. More Notation and Terminology
      • $S(t)$ = survival function, fundamental to survival analysis: $S(t) = P(T > t)$
      • The probability that a person survives longer than a specified time $t$
      • Survival probabilities at different values of $t$ give crucial summary information about the survival pattern
      • Properties: non-increasing, $S(0) = 1$, $S(\infty) = 0$
      • In practice a step function is observed
      [Figure: $S(t)$ against $t$, falling from 1 toward 0]

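  12. Hazard Function
      • $\lambda(t) = \lim_{\Delta t \to 0} P[t \le T < t + \Delta t \mid T \ge t] / \Delta t$: a conditional probability per unit time
      • Instantaneous potential per unit time for the event to occur, given that the individual has survived up to time $t$
      • The hazard function does not give a probability; it gives a rate ranging from 0 to ∞
      • Properties: $\lambda(t) \ge 0$; $\lambda(t)$ has no upper bound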

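  13. Survival Distribution
      Probability distribution for (a) continuous, (b) non-negative random variables
      • Probability density function (pdf) $f(t)$: the area under $f$ between $t$ and $t + \Delta t$ is $P[t \le T < t + \Delta t]$
      • Cumulative distribution function: $F(t) = P[T \le t] = \int_0^t f(s)\,ds$
      [Figure: the pdf $f(t)$ with the area between $t$ and $t + \Delta t$ shaded, and the cumulative distribution function $F(t)$]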

  14. Some Special Distributions
      • Exponential distribution (with parameter $\rho$)
      • $f(t) = \rho\exp(-\rho t)$
      • $S(t) = \exp(-\rho t)$, $F(t) = 1 - \exp(-\rho t)$
      • $\lambda(t) = \rho$, $\Lambda(t) = \rho t$
      • $E(T) = 1/\rho$
      • Lack of memory
      • Constant hazard
      • Coefficient of variation = s.d./mean = 1
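
The "lack of memory" property says that $P(T > s + t \mid T > s) = P(T > t)$. A quick numerical check, as a sketch with a hypothetical rate `rho` and hypothetical times `s` and `t`:

```python
import math

def exp_survival(t, rho):
    """Survival function of the exponential distribution, S(t) = exp(-rho * t)."""
    return math.exp(-rho * t)

rho = 0.25        # hypothetical hazard rate
s, t = 2.0, 3.0   # hypothetical time already survived and additional time

# P(T > s + t | T > s) = S(s + t) / S(s)
conditional = exp_survival(s + t, rho) / exp_survival(s, rho)
unconditional = exp_survival(t, rho)
print(conditional, unconditional)
```

Both numbers agree up to floating-point rounding, which is the constant-hazard behaviour listed on the slide.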

  15. Gamma Distribution (parameters: $\rho$, $\kappa$)
      • $f(t) = \rho(\rho t)^{\kappa - 1}\exp(-\rho t)/\Gamma(\kappa)$
      • $S(t)$ is expressed through the incomplete gamma function
      • $E(T) = \kappa/\rho$
      • $\lambda(t)$ is monotone increasing from 0 if $\kappa > 1$ and monotone decreasing from ∞ if $\kappa < 1$; in either case it approaches $\rho$ as $t \to \infty$
      • If $\kappa = 1$, the gamma distribution reduces to the exponential distribution

  16. Weibull Distribution (parameters: $\rho$, $\kappa$)
      • $f(t) = \kappa\rho(\rho t)^{\kappa - 1}\exp[-(\rho t)^{\kappa}]$
      • $S(t) = \exp[-(\rho t)^{\kappa}]$, $F(t) = 1 - \exp[-(\rho t)^{\kappa}]$
      • $\lambda(t) = \kappa\rho(\rho t)^{\kappa - 1}$, $\Lambda(t) = (\rho t)^{\kappa}$
      • An important generalization of the exponential distribution that allows a power dependence of the hazard on time
      • $\lambda(t)$ is monotone increasing from 0 if $\kappa > 1$ and monotone decreasing from ∞ if $\kappa < 1$
      • If $\kappa = 1$, it reduces to the constant exponential hazard
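
The shape behaviour of the Weibull hazard can be seen by evaluating the formulas above. This is a sketch only; the parameter values and the evaluation times are hypothetical.

```python
import math

def weibull_survival(t, rho, kappa):
    """S(t) = exp(-(rho * t) ** kappa)."""
    return math.exp(-((rho * t) ** kappa))

def weibull_hazard(t, rho, kappa):
    """lambda(t) = kappa * rho * (rho * t) ** (kappa - 1), for t > 0."""
    return kappa * rho * (rho * t) ** (kappa - 1)

rho = 0.1                       # hypothetical rate parameter
times = [1.0, 5.0, 10.0, 20.0]  # hypothetical evaluation times

for kappa in (0.5, 1.0, 2.0):
    hazards = [round(weibull_hazard(t, rho, kappa), 4) for t in times]
    print("kappa =", kappa, "hazard at", times, "->", hazards)
```

With `kappa = 1.0` every hazard value equals `rho`, with `kappa = 2.0` the values rise with time, and with `kappa = 0.5` they fall.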

  17. Other Distributions
      • Log-normal distribution
      • Log-logistic distribution
      • Generalized gamma distribution
      • Gompertz-Makeham distribution
      • Ways to choose among distributions
        • comparing density functions is not effective
        • plot $\lambda(t)$ or $\log\lambda(t)$ against $t$ or $\log t$
        • plot $\Lambda(t) = -\log S(t)$, or another transformation of $S(t)$, against $t$ or $\log t$

  18. RELATIONSHIPS
      • $S(t)$, $\lambda(t)$, $\Lambda(t)$, $F(t)$
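
The formulas for this slide did not survive in the transcript. For a continuous, non-negative failure time $T$, the standard relationships among these functions, written in the notation of the earlier slides, are:

```latex
\begin{align*}
F(t) &= 1 - S(t), \qquad f(t) = \frac{d}{dt}F(t) = -\frac{d}{dt}S(t) \\
\lambda(t) &= \frac{f(t)}{S(t)} = -\frac{d}{dt}\ln S(t) \\
\Lambda(t) &= \int_0^t \lambda(s)\,ds = -\ln S(t) \\
S(t) &= \exp[-\Lambda(t)], \qquad f(t) = \lambda(t)\exp[-\Lambda(t)]
\end{align*}
```

The exponential case is a quick check: $\lambda(t) = \rho$ gives $\Lambda(t) = \rho t$ and $S(t) = \exp(-\rho t)$.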

  19. The Likelihood Function
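
The content of this slide is missing from the transcript. As a hedged reconstruction, the usual likelihood for right-censored data $(x_i, \delta_i)$, assuming independent observations and censoring that carries no information about the failure time, is:

```latex
\begin{align*}
L &= \prod_{i=1}^{n} f(x_i)^{\delta_i}\, S(x_i)^{1-\delta_i}
   = \prod_{i=1}^{n} \lambda(x_i)^{\delta_i}\, S(x_i) \\
l = \ln L &= \sum_{i=1}^{n} \delta_i \ln \lambda(x_i) - \sum_{i=1}^{n} \Lambda(x_i)
\end{align*}
```

An uncensored individual contributes the density at $x_i$, and a censored one contributes only the probability of surviving beyond $x_i$.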

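  20. Statistical Inference
      • $H_0: \omega = \omega_0$
      • Likelihood ratio statistic: $W(\omega_0) = W = 2[\ln L(\hat\omega, \hat\nu) - \ln L(\omega_0, \hat\nu_0)]$, where $(\hat\omega, \hat\nu)$ is the joint MLE of $(\omega, \nu)$ and $\hat\nu_0$ is the MLE of $\nu$ under the null hypothesis; $W(\omega_0) \sim \chi^2$ with d.f. $= \dim(\omega)$
      • Wald statistic
      • Score statistic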

  21. Example: Exponential Distribution
      • Exponentially distributed failure times
      • $f(t) = \rho\exp(-\rho t)$, $S(t) = \exp(-\rho t)$, $\lambda(t) = \rho$
      • $l = \ln(\text{likelihood}) = \sum\delta_i\ln\rho - \rho\sum x_i = d\ln\rho - \rho\sum x_i$, where $d$ is the total number of failures and $\sum x_i$ is the sum of the censored and uncensored times (total time at risk)
      • MLE: $\hat\rho = d/\sum x_i$ = total number of failures / total time at risk
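
The step from the log-likelihood to the MLE is a one-line calculation, sketched here for completeness:

```latex
\begin{align*}
\frac{\partial l}{\partial \rho} &= \frac{d}{\rho} - \sum_i x_i = 0
  \quad\Longrightarrow\quad \hat\rho = \frac{d}{\sum_i x_i} \\
\frac{\partial^2 l}{\partial \rho^2} &= -\frac{d}{\rho^2} < 0 \quad (d > 0)
\end{align*}
```

The negative second derivative confirms that the stationary point is a maximum whenever at least one failure is observed.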

  22. Numerical Example
      • Gregory et al., NEJM 1978 (severe viral hepatitis): 29 patients, follow-up 16 weeks; 14 on steroid treatment, 15 controls
      • Steroid: 1, 1, 1, 1+, 4+, 5, 7, 8, 10, 10+, 12+, 16+, 16+, 16+
      • Control: 1+, 2+, 3, 3, 3+, 5+, 5+, 16+ (8 patients)

        Group     ρ̂ = d/Σx_i   μ̂ = 1/ρ̂   d    Σx_i   ln ρ̂
        Steroid   0.0648        15.4       7    108    -2.74
        Control   0.0133        75.0       2    150    -4.32

      • Ratio of hazards: 0.0648/0.0133 ≈ 5. Which group is at higher risk of dying?
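
The hazard estimates quoted on this slide can be recomputed from the listed data with the exponential MLE $d/\sum x_i$. This is a sketch; the `(time, event)` encoding and the function name are assumptions, and the control group's "16+(8)" is read as eight patients censored at 16 weeks.

```python
# (weeks, event): event=1 is a death, event=0 is a censored ("+") observation.
steroid = [(1, 1), (1, 1), (1, 1), (1, 0), (4, 0), (5, 1), (7, 1), (8, 1),
           (10, 1), (10, 0), (12, 0), (16, 0), (16, 0), (16, 0)]
control = [(1, 0), (2, 0), (3, 1), (3, 1), (3, 0), (5, 0), (5, 0)] + [(16, 0)] * 8

def exponential_mle(data):
    """Return (d, total time at risk, rho_hat) for an exponential model."""
    d = sum(event for _, event in data)
    total_time = sum(t for t, _ in data)
    return d, total_time, d / total_time

d_s, time_s, rho_s = exponential_mle(steroid)
d_c, time_c, rho_c = exponential_mle(control)
print("steroid:", d_s, time_s, round(rho_s, 4))   # 7 deaths, 108 weeks, 0.0648
print("control:", d_c, time_c, round(rho_c, 4))   # 2 deaths, 150 weeks, 0.0133
print("hazard ratio:", round(rho_s / rho_c, 2))
```

The ratio comes out a little under 5, matching the rounded figure on the slide.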

  23. Advantages of Parametric Methods
      • Convenience for statistical inference
      • Existence of explicit, reasonably simple forms for $S(t)$, $\lambda(t)$ and $\Lambda(t)$
      • Capability of representing both over- and under-dispersion relative to the exponential distribution
      • Qualitative shape of the hazard function
      • The behavior of $S(t)$ for small and large times is reasonably easy to study

  24. NONPARAMETRIC METHODS
      • Product-limit (Kaplan-Meier) estimator
      • Order the failure times. Then the probability of surviving $k$ ($\ge 2$) or more years from the beginning of the study is a product of $k$ observed survival probabilities: $S(t) = P_1 \times P_2 \times P_3 \times \cdots \times P_k$
      • Comparing two survival distributions does not require the actual survival times, only their order

  25. Product Limit Estimator (example)

        For t in    S(t)
        [0, 1)      1
        [1, 3)      1 × 5/6 = 0.833
        [3, 4)      1 × 5/6 × 3/4 = 0.625
        [4, 5)      1 × 5/6 × 3/4 × 2/3 = 0.417
        [5, ∞)      1 × 5/6 × 3/4 × 2/3 × 1/2 = 0.208

      [Figure: step plot of S(t) against t, for t from 0 to 7]
      • How would you estimate the median survival time?
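
A minimal product-limit sketch follows. The slide does not list the underlying observations, so the six `(time, event)` values below are hypothetical, chosen only because they reproduce the factors 5/6, 3/4, 2/3 and 1/2 in the table. The median rule used here (first event time at which the estimate drops to 0.5 or below) is one common convention, not necessarily the one intended on the slide.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    survival = 1.0
    curve = []
    for t in sorted({x for x, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= (at_risk - deaths) / at_risk
        curve.append((t, survival))
    return curve

def median_survival(curve):
    """First event time where the estimated survival is <= 0.5, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical data: events at 1, 3, 4, 5; censored at 2 and 7.
times = [1, 2, 3, 4, 5, 7]
events = [1, 0, 1, 1, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
print("median:", median_survival(curve))
```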

  26. Comparing Survival Times
      [Figure: survival curves with the same difference at 5 years; which is better?]
      • Log rank (Mantel-Haenszel) test: order the failure times in the pooled sample (e.g. samples from two treatments), then create a series of 2×2 tables, one per ordered death time. The statistic compares the observed number of deaths with the number expected if the hazard at $t_i$ were the same in both groups.

  27. Example
      [Table: one 2×2 table of deaths (d) and survivors (s) for treatments A and B at each ordered death time t1 = 2, t2 = 4, t3 = 10, t4 = 15, t5 = 19, t6 = 23]
      • A χ² statistic is calculated from this series of tables. As long as the difference between the hazards has a consistent sign, the logrank test usually does well. Use PROC LIFETEST in SAS to run the test.
      • Gehan/Breslow generalized Wilcoxon test: let $W_i = N_i$; better than the logrank test at detecting early differences, worse at detecting later differences.
      • Peto/Prentice generalized Wilcoxon test: let $W_i = S(t_i)$; similar to the previous test, but it does not jump as wildly due to censoring.
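
The series-of-2×2-tables construction can be sketched in a few lines. The cell counts of the slide's example are not in the transcript, so this sketch reuses the hepatitis data from the numerical example; the function name, the `(time, event)` encoding and the hypergeometric variance term are assumptions about the standard form of the test, not output from SAS.

```python
def logrank(group1, group2):
    """Two-sample log-rank test. Each group is a list of (time, event)."""
    event_times = sorted({t for t, e in group1 + group2 if e})
    observed = expected = variance = 0.0
    for t in event_times:
        n1 = sum(1 for x, _ in group1 if x >= t)   # at risk in group 1
        n2 = sum(1 for x, _ in group2 if x >= t)   # at risk in group 2
        d1 = sum(1 for x, e in group1 if x == t and e)
        d2 = sum(1 for x, e in group2 if x == t and e)
        n, d = n1 + n2, d1 + d2
        observed += d1
        expected += d * n1 / n                     # deaths expected under equal hazards
        if n > 1:
            variance += d * n1 * n2 * (n - d) / (n * n * (n - 1))
    chi2 = (observed - expected) ** 2 / variance if variance > 0 else 0.0
    return observed, expected, variance, chi2

steroid = [(1, 1), (1, 1), (1, 1), (1, 0), (4, 0), (5, 1), (7, 1), (8, 1),
           (10, 1), (10, 0), (12, 0), (16, 0), (16, 0), (16, 0)]
control = [(1, 0), (2, 0), (3, 1), (3, 1), (3, 0), (5, 0), (5, 0)] + [(16, 0)] * 8

obs, exp_deaths, var, chi2 = logrank(steroid, control)
print("observed:", obs, "expected:", round(exp_deaths, 2), "chi-square:", round(chi2, 2))
```

The statistic is referred to a χ² distribution with 1 degree of freedom. Replacing each table's contribution by a weighted one ($W_i = N_i$ or $W_i = S(t_i)$) gives the Wilcoxon-type variants named above.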

  28. Modeling
      • To compare two or more sets of data
      • To compare the effects of explanatory variables
      • What types of explanatory variables are usually of interest?
        • Treatments, characteristics of individuals (e.g. sex, age, medical history)
        • Environmental variables (e.g. living conditions, working environment)
        • Interactions, etc.
      • Another way to classify covariates: constant or time-dependent; time-dependent covariates may be influenced by the treatments under investigation

  29. Types of Models
      • Accelerated life model
      • These models assume that the effect of the independent variables on the event-time distribution is multiplicative on the event time
      • $T = \psi(z)\,T_0 = \exp(x'\beta)\,T_0$
      • $\log(T) = \log(T_0) + x'\beta$
      • Choices of $\psi(z)$
        • $\psi(z) > 0$
        • simple interpretation
        • mathematically simple if possible
        • $\psi(z) = \exp(x'\beta)$ is most commonly used

  30. Example (output from SAS PROC LIFEREG)

        PROC LIFEREG;
          MODEL time*event(0) = group / distribution=weibull;

        Variable    Estimate
        Intercept   2.248
        Group       1.267
        Scale       0.732

      • In SAS, $T = T_0\,\psi(z) = T_0\exp(x'\beta)$
      • $E(T; \text{control}) = E(T_0)\exp(\beta_1)$; $E(T; \text{treat}) = E(T_0)\exp(\beta_1 + \beta_2)$
      • $E(T; \text{treat})/E(T; \text{control}) = \exp(\beta_2) = \exp(1.267) > 1$, so $E(T_0)$ can be ignored

  31. However, some may want to know the actual expected survival time, not just a relative measure
      • $\kappa = 1/0.732 = 1.37$, $\rho = \exp(-2.248)$
      • For the control group all the covariates are zero, so $E[T; \text{control}] = (1/\rho)\,\Gamma(1 + 1/\kappa) \approx \exp(2.248)\,\Gamma(1 + 0.732) \approx 8.7$
      • $E[T; \text{treat}] = 8.7\exp(\beta_2) = 8.7\exp(1.267) \approx 30.9$
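
The arithmetic on this slide can be checked directly from the three estimates in the PROC LIFEREG output (intercept 2.248, group 1.267, scale 0.732). A short sketch using only the standard library; the variable names are illustrative.

```python
import math

intercept, group_coef, scale = 2.248, 1.267, 0.732  # estimates from the SAS output

kappa = 1 / scale               # Weibull shape
rho = math.exp(-intercept)      # Weibull rate for the control group

# Weibull mean: E[T] = (1 / rho) * Gamma(1 + 1 / kappa)
mean_control = math.gamma(1 + 1 / kappa) / rho
mean_treat = mean_control * math.exp(group_coef)

print("kappa:", round(kappa, 3))
print("E[T; control]:", round(mean_control, 2))
print("E[T; treat]:", round(mean_treat, 2))
```

Without intermediate rounding the treated-group mean is close to 30.8; multiplying the rounded 8.7 by exp(1.267) gives about 30.9.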

  32. Note
      • $\kappa$ is not really of interest in the accelerated life model. It affects $S_0(t)$ but not $\psi(z)$. Remember $S(t; z) = S_0(t\,\psi(z)) = \exp\{-[\rho t\,\psi(z)]^{\kappa}\}$
      • Generalization: for any $\psi(z)$ such that
        $S(t; z) = S_0(t\,\psi(z))$
        $f(t; z) = f_0(t\,\psi(z))\,\psi(z)$
        $\lambda(t; z) = \lambda_0(t\,\psi(z))\,\psi(z)$
        we have $T = T_0/\psi(z)$ and, with $\mu_0 = E(\ln T_0)$, $\ln T = \mu_0 - \ln\psi(z) + \varepsilon$, where $z$ and $\varepsilon$ are independent and $E[\varepsilon] = 0$

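  33. Proportional Hazard Model
      • Similar to the accelerated life model, but the focus is on the hazard: $\lambda(t; z) = \psi(z)\,\lambda_0(t)$
      • A common form is $\psi(z) = \exp(z'\beta)$
      • Example, Weibull distribution: $\Lambda_0(t) = (\rho t)^{\kappa}$, so $\Lambda_1(t) = \psi\,\Lambda_0(t) = (\psi^{1/\kappa}\rho t)^{\kappa} = (\rho^* t)^{\kappa}$
      • $\Lambda_1(t)$ is another Weibull with $\rho^* = \psi^{1/\kappa}\rho$
      • Is the proportional hazard model related to the accelerated life model? Could a model be both proportional hazard and accelerated life?
      • Only true for the Weibull distribution (try it yourself, if interested)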
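
The Weibull claim can be checked numerically: multiplying a Weibull hazard by a constant $\psi$ gives the hazard of another Weibull with the same shape $\kappa$ and a rescaled rate $\psi^{1/\kappa}\rho$. A sketch with hypothetical parameter values:

```python
def weibull_hazard(t, rho, kappa):
    """lambda(t) = kappa * rho * (rho * t) ** (kappa - 1), for t > 0."""
    return kappa * rho * (rho * t) ** (kappa - 1)

rho, kappa, psi = 0.1, 1.5, 2.0          # hypothetical baseline parameters and multiplier
rho_star = psi ** (1 / kappa) * rho      # rescaled rate

max_gap = 0.0
for t in (0.5, 1.0, 5.0, 20.0):
    proportional = psi * weibull_hazard(t, rho, kappa)   # proportional hazards form
    rescaled = weibull_hazard(t, rho_star, kappa)        # time-rescaled Weibull
    max_gap = max(max_gap, abs(proportional - rescaled))

print("largest difference:", max_gap)
```

The two forms agree to rounding error, so a change of time scale and a constant multiple of the hazard describe the same model in the Weibull family.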

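  34. Cox PH Model
      • Response variable: observed $(x, \delta)$; of interest: $T$, or just $\lambda(t)$
      • The distribution of $T$ is some survival distribution with $S(t)$, $\lambda(t)$ (notice that no specific function is assumed)
      • Observed covariates: $z_1, z_2, \ldots, z_k$
      • $\lambda(t \mid z_1, z_2, \ldots, z_k) = \lambda_0(t)\exp(\beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_k z_k)$, where the baseline hazard $\lambda_0(t)$ is left unspecified and absorbs any intercept $\beta_0$

  35. Note
      • $\lambda_0(t) = \lambda(t \mid z_1 = z_2 = \cdots = z_k = 0)$
      • $\exp(\beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_k z_k) = RR = \lambda(t \mid z_1, z_2, \ldots, z_k)/\lambda(t \mid z_1 = z_2 = \cdots = z_k = 0)$: the relative risk (hazard) of death comparing covariates $z_1, z_2, \ldots, z_k$ with $z_1 = z_2 = \cdots = z_k = 0$
      • This is the so-called proportional hazards model (very popular)
      • First proposed by Cox (JRSS, 1972)
      • The proportional hazards model is appropriate only when the survival functions do not cross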

  36. Interpreting Coefficients for Cox's Model
      • $\beta_k$ is the log RR (hazard ratio) for a unit change in $z_k$, given that all other covariates remain constant
      • The RR comparing two sets of covariate values $(z_1, z_2, \ldots, z_k)$ vs. $(z'_1, z'_2, \ldots, z'_k)$: $RR = \lambda(t \mid z_1, z_2, \ldots, z_k)/\lambda(t \mid z'_1, z'_2, \ldots, z'_k) = \exp[\beta_1(z_1 - z'_1) + \beta_2(z_2 - z'_2) + \cdots + \beta_k(z_k - z'_k)]$
      • The Cox model is fitted by maximizing the partial likelihood function
      • Assumes the covariates act linearly on the log hazard
      • Observations are independent
      • SAS: PROC PHREG;
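
The hazard ratio between two covariate profiles depends only on the coefficients and the covariate differences, not on the baseline hazard. A sketch with hypothetical coefficients (say, a treatment indicator and age in years); none of these numbers come from the presentation.

```python
import math

def hazard_ratio(beta, z, z_ref):
    """exp(sum_k beta_k * (z_k - z_ref_k)) for a Cox proportional hazards model."""
    return math.exp(sum(b * (a - r) for b, a, r in zip(beta, z, z_ref)))

beta = [0.7, -0.02]   # hypothetical coefficients: treatment indicator, age

# Treated vs. untreated at the same age: only the first coefficient matters.
print(hazard_ratio(beta, [1, 50], [0, 50]))
# Same treatment, ten years older.
print(hazard_ratio(beta, [1, 60], [1, 50]))
```

The first ratio equals `exp(0.7)`, which is the "unit change in one covariate, others held constant" interpretation given above.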

  37. References
      • Cox and Oakes (1984), Analysis of Survival Data
      • Lee, E. (1992?), Statistical Methods for Survival Data Analysis
      • SAS manual for survival analysis
      • Website: http://www.ats.ucla.edu/stat/sas/seminars/sas_survival/default.htm

  38. Example (SAS PROC LIFETEST)

        PROC lifetest data=uis plots=(s);
          time time*censor(0);
          strata treat;
        run;

      Other outputs omitted.

        Test of Equality over Strata
        Test         Chi-Square   DF   Pr > Chi-Square
        Log-Rank     6.7979       1    0.0091
        Wilcoxon     9.4608       1    0.0021
        -2Log(LR)    7.8267       1    0.0051
