Lecture 9: Hypothesis Testing • One-sample tests • ≥2-sample tests
Hypothesis Testing for One Sample • Standard set-up • What is θ? • Common approach • Assume distribution is exponential • Test that distribution is exponential with θ = θ0
Pretty Stringent • Actually, as long as the null hazard h0(t) is specified over the whole range of t, tests can be performed
General Form of Test • When H0 is true • When H0 is true, assuming large N • Note: this is a one-sided test of H0 against the alternative h(t) > h0(t)
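A sketch of the general one-sample statistic, following the usual weighted observed-minus-expected construction. The symbols d_i (events at t_i), D (number of distinct event times) and τ (largest observed time) are assumed here, matching the notation used later in the lecture:

```latex
\begin{align*}
Z(\tau) &= O(\tau) - E(\tau)
         = \sum_{i=1}^{D} W(t_i)\,\frac{d_i}{Y(t_i)}
           - \int_0^{\tau} W(s)\,h_0(s)\,ds, \\[4pt]
V[Z(\tau)] &= \int_0^{\tau} W^2(s)\,\frac{h_0(s)}{Y(s)}\,ds
           \qquad \text{(when } H_0 \text{ is true)}, \\[4pt]
\frac{Z(\tau)}{\sqrt{V[Z(\tau)]}} &\;\sim\; N(0,1) \text{ approximately}
           \qquad \text{(when } H_0 \text{ is true, large } N\text{)}.
\end{align*}
```

Large positive values of the standardized statistic favor the alternative h(t) > h0(t).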
Log-Rank • W(ti) = Y(ti) (the most popular choice of weight function) • Y(ti) is the number of individuals in the risk set at time ti
Accounting for Left-Truncation • Choice of weights is still W(t) = Y(t)
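With W(t) = Y(t) the statistic reduces to observed minus expected events, where the expected count adds up the null cumulative hazard each subject accrues between entry and exit, and the variance under H0 equals the expected count. A minimal base-R sketch, assuming a hypothetical helper one_sample_logrank() and a user-supplied null cumulative hazard function H0 (neither is part of the survival package), with made-up illustrative data:

```r
# Hypothetical helper (not from any package): one-sample log-rank test
# with optional left-truncation (entry) times.
one_sample_logrank <- function(time, status, H0, entry = rep(0, length(time))) {
  O <- sum(status)                    # observed number of events
  E <- sum(H0(time) - H0(entry))      # expected events under the null hazard
  Z <- (O - E) / sqrt(E)              # under H0 the variance of O - E equals E
  p <- pnorm(Z, lower.tail = FALSE)   # one-sided p-value for h(t) > h0(t)
  list(observed = O, expected = E, Z = Z, p.one.sided = p)
}

# Made-up data; the null hazard is assumed constant at 0.1 per time unit,
# so the null cumulative hazard is H0(t) = 0.1 * t
H0.exp <- function(t) 0.1 * t
res <- one_sample_logrank(time = c(2, 4, 6), status = c(1, 0, 1), H0 = H0.exp)
res$expected   # 0.1 * (2 + 4 + 6) = 1.2
```

Passing entry times other than zero handles left truncation: each subject then contributes H0(exit) - H0(entry) to the expected count.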
Other Options • Harrington and Fleming • WHF(t) = Y(t) S0(t)^p [1 − S0(t)]^q, where p, q ≥ 0 and S0(t) = exp(−H0(t)) • Allows user to have flexibility in weighting • Can choose early (p >> q) or late (p << q) departures, or departures in the mid-range (p = q > 0), from the null hypothesis to be more influential • Special case: log-rank test, p = q = 0
Notes • An estimator of the variance, V, can be the empirical estimate rather than the hypothesized value • When the alternative h(t) > h0(t) is true, this variance estimator is expected to be larger and the test less powerful • If h(t) < h0(t), this variance will be smaller and the test more powerful
Example: Rheumatoid Arthritis • 10 white males with RA followed for up to 18 years • Objective: • Determine if men with RA are at greater risk of mortality
Bone Marrow Transplant for Leukemia (example 1.3 in the book) • Patients undergoing bone marrow transplant (BMT) for acute leukemia • Three types of leukemia • ALL • AML low risk • AML high risk • What if we are interested in the overall incidence rate (i.e. either relapse or death) across all three leukemia types?
Estimated KM survival probability for the combined endpoint (i.e. either relapse or death)
BMT Example • Want to test whether or not survival in BMT patients follows an exponential distribution • What does this mean we are asking? • Can estimate λ from the data (recall the MLE for an exponential distribution)
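As a reminder of what is being asked, a short sketch in assumed notation, where t_i is the observed time for subject i and δ_i is the event indicator (1 = event, 0 = censored):

```latex
\begin{align*}
H_0 &: h(t) = \lambda \ \text{ for all } t
      \quad\Longleftrightarrow\quad S_0(t) = \exp(-\lambda t), \\[4pt]
\hat{\lambda} &= \frac{\sum_{i=1}^{n} \delta_i}{\sum_{i=1}^{n} t_i}
      = \frac{\text{number of events}}{\text{total follow-up time}} .
\end{align*}
```

This is the quantity computed as sum(event)/sum(failtime) in the R code for this example.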
R Code
### BMT example
library(survival)
data<-read.csv("H:\\public_html\\BMTRY722_Summer2019\\Data\\BMT_1_3.csv")
# Failure time: time to relapse (TTR) if relapsed or event-free; NA otherwise for now
failtime<-ifelse(data$Relapse==0 & data$Death==0 | data$Relapse==1, data$TTR, NA)
# If the patient died and death came no later than TTR, use time to death (TTD)
failtime<-ifelse(data$Death==1 & data$TTR>=data$TTD, data$TTD, failtime)
# Event indicator: 1 if relapse or death, 0 if censored
event<-ifelse(data$Relapse==1 | data$Death==1, 1, 0)
st<-Surv(failtime, event)
fit<-survfit(st~1)
# empirical survival function
plot(fit, xlab="Time", ylab="S(t)", lwd=2)
# Calculating lambda hat for estimated hazard rate
lambda.hat<-sum(event)/sum(failtime)
“survdiff” Function
Description
Tests if there is a difference between two or more survival curves using the G-rho family of tests, or for a single curve against a known alternative.
Usage
survdiff(formula, data, subset, na.action, rho=0)
Arguments
formula: a formula expression as for other survival models, of the form Surv(time, status) ~ predictors. For a one-sample test, the predictors must consist of a single offset(sp) term, where sp is a vector giving the survival probability for each subject.
“survdiff” Function
Method
This function implements the G-rho family of Harrington and Fleming (1982), with weights on each death of S(t)^rho, where S is the Kaplan-Meier estimate of survival. With rho=0 this is the log-rank or Mantel-Haenszel test, and with rho=1 it is equivalent to the Peto & Peto modification of the Gehan-Wilcoxon test. If the right-hand side of the formula consists only of an offset term, then a one-sample test is done. To cause missing values in the predictors to be treated as a separate group, rather than being omitted, use the factor function with its exclude argument.
R code
#Estimating lambda
> lambda.hat<-sum(event)/sum(failtime)
# Expected S(t) = exp(-lambda.hat*t)
> S.exp<-exp(-lambda.hat*failtime)
> one.sample.test1<-survdiff(st~offset(S.exp)) # default rho is 0 i.e. log-rank test
> one.sample.test1
Observed Expected        Z        p
      83       83        0        1
> one.sample.test2<-survdiff(st~offset(S.exp), rho=1)
> one.sample.test2
Observed Expected        Z        p
      83       83        0  0.00521
#Comparing hypothesized dist'n to empirical dist'n
> plot(fit, conf.int=F, lwd=2)
> lines(sort(failtime), rev(sort(S.exp)), col=2, lwd=2, type="s")
R code
#Estimating lambda for failure times <800
> fail2<-failtime[which(failtime<800)]
> event2<-event[which(failtime<800)]
> lambda.hat2<-sum(event2)/sum(fail2)
# Expected S(t) = exp(-.004*t)
> S.exp2<-exp(-lambda.hat2*fail2)
> st2<-Surv(fail2, event2); fit2<-survfit(st2~1)
> one.sample.testa<-survdiff(st2~offset(S.exp2))
> one.sample.testa
Observed Expected        Z        p
      80       80        0        1
> one.sample.testb<-survdiff(st2~offset(S.exp2), rho=1)
> one.sample.testb
Observed Expected        Z        p
      80       80    0.000    0.477
R code
#Estimating lambda for failure times >=800
> fail3<-failtime[which(failtime>=800)]
> event3<-event[which(failtime>=800)]
> lambda.hat3<-sum(event3)/sum(fail3)
# Expected S(t) = exp(-lambda.hat3*t)
> S.exp3<-exp(-lambda.hat3*fail3)
> st3<-Surv(fail3, event3); fit3<-survfit(st3~1)
> one.sample.testc<-survdiff(st3~offset(S.exp3))
> one.sample.testc
Observed Expected         Z        p
       3        3 -2.56e-16        1
> one.sample.testd<-survdiff(st3~offset(S.exp3), rho=1)
> one.sample.testd
Observed Expected         Z        p
       3        3    -0.035   0.9730
Conclusions • So what can we conclude about our original hypothesis?
Relevance • Becoming more common • Phase II cancer studies with TTE outcomes instead of response • But • Often more interested in median or 1 year survival • Yet • Very important for sample size considerations • Most often assume study data will have exponential distribution for sample size
Comparing two or more samples • ANOVA-type approach • Test H0: h1(t) = h2(t) = … = hK(t) for all t ≤ τ against HA: at least one hj(t) differs for some t ≤ τ • Where τ is the largest time for which all groups have at least one subject at risk • Data can be right-censored (and left-truncated) for the tests we will discuss
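Notation • Let t1 < t2 < … < tD be the distinct death times in all samples being compared • At time ti, let dij be the number of events in group j out of Yij individuals at risk (j = 1, 2, …, K) • Define di = Σj dij and Yi = Σj Yij, the total number of events and the total number at risk at ti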
Rationale • Weighted comparisons of the estimated hazard of the jth population under the null hypothesis and the alternative hypothesis • Based on the Nelson-Aalen estimator • If the null is true, the pooled estimate of h(t) should be an estimator for hj(t)
Applying the Test • Let Wj(t) be a positive weight function s.t. Wj(ti) = 0 whenever Yij = 0 • If all Zj(τ)'s are close to zero, then there is little evidence to reject the null
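A sketch of the statistic being referred to, written in the notation of the previous slides (d_ij, Y_ij for group j and d_i, Y_i for the pooled sample). It compares the group-specific hazard increment with the pooled increment at each death time:

```latex
\[
Z_j(\tau) \;=\; \sum_{i=1}^{D} W_j(t_i)
  \left\{ \frac{d_{ij}}{Y_{ij}} - \frac{d_i}{Y_i} \right\},
  \qquad j = 1, \dots, K .
\]
```

The term d_ij/Y_ij is the Nelson-Aalen increment for group j and d_i/Y_i is the pooled increment, so each Z_j(τ) should be near zero when the null hypothesis holds.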
Common Form for Weight Functions • All commonly used tests choose weight functions s.t. • Note that weight is common across all j • Can redefine Z:
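The usual choice, sketched here in the notation of the earlier slides, takes each group's weight to be its risk-set size times a weight W(t_i) shared by all groups. Substituting it into Z_j(τ) gives an observed-minus-expected form:

```latex
\begin{align*}
W_j(t_i) &= Y_{ij}\, W(t_i), \\[4pt]
Z_j(\tau) &= \sum_{i=1}^{D} W(t_i)
   \left[ d_{ij} - Y_{ij}\,\frac{d_i}{Y_i} \right],
   \qquad j = 1, \dots, K .
\end{align*}
```

Here Y_ij d_i / Y_i is the number of events expected in group j at t_i if all groups share a common hazard.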
Test Statistic • Variance and covariance of Zj(τ) (K&M p. 207) • Z1(τ), Z2(τ), ..., ZK(τ) are linearly dependent because their sum is 0 • For the test statistic, choose K – 1 components • Chi-square test with K – 1 d.f., where Σ⁻¹ is the inverse of the variance-covariance matrix of the chosen components
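A sketch of the variance, covariance and quadratic form, assuming the common-weight definition of Z_j(τ) with weight W(t_i) shared across groups:

```latex
\begin{align*}
\hat{\sigma}_{jj} &= \sum_{i=1}^{D} W(t_i)^2\,
   \frac{Y_{ij}}{Y_i}\left(1 - \frac{Y_{ij}}{Y_i}\right)
   \left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i ,
   \qquad j = 1, \dots, K, \\[4pt]
\hat{\sigma}_{jg} &= -\sum_{i=1}^{D} W(t_i)^2\,
   \frac{Y_{ij}}{Y_i}\,\frac{Y_{ig}}{Y_i}
   \left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i ,
   \qquad g \neq j, \\[4pt]
\chi^2 &= \bigl(Z_1(\tau), \dots, Z_{K-1}(\tau)\bigr)\,
   \Sigma^{-1}\,
   \bigl(Z_1(\tau), \dots, Z_{K-1}(\tau)\bigr)^{\top}
   \;\sim\; \chi^2_{K-1} \text{ under } H_0 .
\end{align*}
```

The factor (Y_i − d_i)/(Y_i − 1) equals 1 when there are no tied event times.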
Log-Rank Test for 2 Groups • For log-rank W(ti) = 1 • Have 2 groups and want to test if survival is the same in the groups • We want to develop a nonparametric test of H0: S1(t) = S2(t) for all t
Log-Rank Test for 2 Groups • If S1(t) and S2(t) follow some parametric distribution and are in the same family, this is easy • For example assume • But we need a test whose validity doesn't depend on parametric assumptions
Constructing the Log-Rank Test • Recall our notation • t1 < t2 < … < tD are the D distinct ordered event times • Yij = # of people in group j at risk at ti • Yi = # of people at risk across groups at ti • dij = # of people in group j that fail at ti • di = # of people across groups that fail at ti
Constructing the Log-Rank Test • We can summarize the information at time ti in a 2x2 table
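A sketch of the 2x2 table at time t_i, using the notation recalled above. Conditional on the margins, d_i1 follows a hypergeometric distribution under the null hypothesis, which gives its mean and variance:

```latex
\[
\begin{array}{l|cc|c}
          & \text{Group 1}    & \text{Group 2}    & \text{Total} \\ \hline
\text{Events}    & d_{i1}          & d_{i2}          & d_i \\
\text{No event}  & Y_{i1} - d_{i1} & Y_{i2} - d_{i2} & Y_i - d_i \\ \hline
\text{At risk}   & Y_{i1}          & Y_{i2}          & Y_i
\end{array}
\]
\begin{align*}
E[d_{i1}] &= Y_{i1}\,\frac{d_i}{Y_i}, &
\operatorname{Var}[d_{i1}] &= d_i\,\frac{Y_{i1}}{Y_i}
   \left(1 - \frac{Y_{i1}}{Y_i}\right)\frac{Y_i - d_i}{Y_i - 1} .
\end{align*}
```

Summing d_i1 − E[d_i1] and Var[d_i1] over the D event times gives the numerator and variance of the log-rank statistic.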
Toy Example • Say we have the following data on two groups: • We want to test the hypothesis
Same Test in R
> library(survival)
> time<-c(3,6,9,9,11,16,8,9,10,12,19,23)
> cens<-c(1,0,1,1,0,1,1,1,0,0,1,0)
> grp<-c(1,1,1,1,1,1,2,2,2,2,2,2)
> grp<-as.factor(grp)
>
> sdat<-Surv(time, cens)
> survdiff(sdat~grp)
Call:
survdiff(formula = sdat ~ grp)

      N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1 6        4     2.57     0.800      1.62
grp=2 6        3     4.43     0.463      1.62

 Chisq= 1.6  on 1 degrees of freedom, p= 0.203
Same Test in R
> toy<-survdiff(sdat~grp)
> names(toy)
[1] "n"     "obs"   "exp"   "var"   "chisq" "call"
> toy$obs
[1] 4 3
> toy$exp
[1] 2.566667 4.433333
> toy$var
          [,1]      [,2]
[1,]  1.267778 -1.267778
[2,] -1.267778  1.267778
> toy$chisq
[1] 1.620508
More general: 2 samples • We can change the weight function • For K = 2, can use Z-score or χ² • Corrects for ties
Choices for Weight Functions • W(ti) = 1 • Log-rank test • Optimal power for detecting differences when hazards are proportional • W(ti) = Yi • Gehan test • Generalization of 2-sample Mann-Whitney-Wilcoxon test
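The two weight choices can be compared by building the statistic directly from the 2x2 tables at each event time. A minimal base-R sketch on the toy data from the earlier slide; two_sample_test() is a hypothetical helper written for illustration, not a function from the survival package:

```r
# Toy data from the slides (1 = event, 0 = censored)
time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
grp  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)

# Hypothetical helper: weighted two-sample test built from the 2x2 tables
two_sample_test <- function(time, cens, grp, weight = c("logrank", "gehan")) {
  weight <- match.arg(weight)
  event.times <- sort(unique(time[cens == 1]))     # distinct event times
  Z <- 0
  V <- 0
  for (tt in event.times) {
    Yi  <- sum(time >= tt)                         # at risk, both groups
    Yi1 <- sum(time >= tt & grp == 1)              # at risk, group 1
    di  <- sum(time == tt & cens == 1)             # events, both groups
    di1 <- sum(time == tt & cens == 1 & grp == 1)  # events, group 1
    W   <- if (weight == "logrank") 1 else Yi      # W(t_i) = 1 or Y_i
    Z <- Z + W * (di1 - Yi1 * di / Yi)             # weighted observed - expected
    if (Yi > 1) {                                  # no variance contribution if Y_i = 1
      V <- V + W^2 * di * (Yi1 / Yi) * (1 - Yi1 / Yi) * (Yi - di) / (Yi - 1)
    }
  }
  chisq <- Z^2 / V
  c(Z = Z, V = V, chisq = chisq,
    p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

lr <- two_sample_test(time, cens, grp, "logrank")
gh <- two_sample_test(time, cens, grp, "gehan")
round(lr, 4)
round(gh, 4)
```

With weight = "logrank" the chi-square should agree with the survdiff result for the toy data (about 1.62). The Gehan version puts more weight on early event times, where the risk sets are largest.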
Choices for Weight Functions • Fleming-Harrington • General case • Special cases • Log-rank: p = q = 0 • A version of the Mann-Whitney-Wilcoxon test: p = 1, q = 0 • q = 0, p > 0: gives greater weight to early departures • p = 0, q > 0: gives greater weight to late departures • Allows specific choice of influence (for better or worse!)
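A sketch of the general Fleming-Harrington weight, where \hat{S} is assumed to denote the Kaplan-Meier estimator from the pooled sample, evaluated at the previous event time:

```latex
\[
W_{p,q}(t_i) \;=\; \hat{S}(t_{i-1})^{\,p}\,
   \bigl[\,1 - \hat{S}(t_{i-1})\,\bigr]^{\,q},
   \qquad p \ge 0,\; q \ge 0 .
\]
```

Because \hat{S} is close to 1 early and decreases over time, large p emphasizes early differences and large q emphasizes late ones. The rho argument of survdiff, which weights each death by S(t)^rho, corresponds to p = rho with q = 0.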
Others? • Many • Not all available in all software (e.g. Gehan not in R) • Worth trying a few in each situation to compare inferences
Caveat • Note we are interested in the average difference (consider log-rank specifically) • What if hazards cross? • Could have significant difference prior to some t, and another significant difference after t: but what if direction differs?
Next time • More on different weight functions • Tests for trends