Regression using serial data
This study examines the effectiveness of a treatment on the growth of the upper jaw in Japanese children. The regression analysis is conducted using serial data to determine the impact of age on the dependent variable.
Regression using serial data Jyoti Sarkar, IUPUI jsarkar@math.iupui.edu
The Problem
Given: on n units, (x, y) “before” and (x, y) “after” a treatment
Goal: regress y on x
• x = a predictor (easy/inexpensive to measure)
• y = a response (difficult/expensive to measure)
• Assume the n units are independent
Example
Concern: Many Japanese exhibit a bigger lower jaw than upper jaw.
Treatment (for growth of the upper jaw): children (4-12 years old) wore a mouth gear 8-10 hours daily for 1-2 years.
Questions
• Was the treatment effective? (There was no control group.)
• How did the measurements change with age?
Face Mask Experiment
Sample: 25 boys, 18 girls, measured “before” and “after” treatment
• age (year day)
From X-ray plates measure:
• ccorr = corrected C-axis SM (mm)
• theta = angle (C-axis, anterior cranial base SN)
• alpha = angle (C-axis, palatal plane through M) (½ degree)
Objective: regress y = ccorr on x = age
Face Mask data
patient  gender  age1   theta1  alpha1  ccorr1  age2   theta2  alpha2  ccorr2
1        2       4.99   39.0    35.0    76.076  5.99   40.5    32.0    77.064
2        2       9.90   43.0    32.0    68.666  11.43  47.0    34.0    72.124
etc.
Regress y = ccorr on x = age
(1) “before” data: girls (n=18)
>Regress ccorr1 on 1 age1
ccorr1 = 66.97 + 0.2530 age1
• r² = .023, r²(adj) = .000
• S = 3.632, SE(b1) = 0.4096, p-value = .545
• t(.975, 16) = 2.120
• 95% CI(b1) = (-0.6153, 1.1214)
(2) “after” data: girls (n=18)
>Regress ccorr2 on 1 age2
ccorr2 = 71.30 + 0.1142 age2
• r² = .006, r²(adj) = .000
• S = 3.321, SE(b1) = 0.3738, p-value = .764
• t(.975, 16) = 2.120
• 95% CI(b1) = (-0.6782, 0.9066)
(3) “superimposed” data: (n=36)
Data size doubled, range expanded
>Stack age1 age2 age
>Stack ccorr1 ccorr2 ccorr
>Regress ccorr on 1 age
ccorr = 67.5636 + 0.3793 age
• r² = .049, r²(adj) = .021
• S = 3.745, SE(b1) = 0.2880, p-value = .197
• t(.975, 34) = 2.032
• 95% CI(b1) = (-0.2060, 0.9646)
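The three intervals reported for the naive fits can be re-derived from the printed slope, standard error and t critical value as b1 ± t × SE(b1). The short Python sketch below uses only those reported numbers; the helper name `slope_ci` is illustrative and not part of the original MINITAB session.

```python
# (label, slope b1, SE(b1), t critical value) as reported on the slides
fits = [
    ("before", 0.2530, 0.4096, 2.120),
    ("after", 0.1142, 0.3738, 2.120),
    ("superimposed", 0.3793, 0.2880, 2.032),
]


def slope_ci(b1, se, t_crit):
    """Two-sided confidence interval b1 +/- t_crit * se."""
    half_width = t_crit * se
    return b1 - half_width, b1 + half_width


intervals = {label: slope_ci(b1, se, t) for label, b1, se, t in fits}
for label, (lo, hi) in intervals.items():
    print(f"{label:13s} ({lo:.4f}, {hi:.4f})  contains 0: {lo < 0 < hi}")
```

Each interval straddles zero, which is why none of the naive fits can reject a zero slope.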
Regress y on x: naïve attempts
All 3 naïve attempts yield
• Low r²
• Large p-value ⇒ cannot reject slope = 0
• CI ∋ 0 (the interval contains 0)
Conclusion:
• Either “ccorr does not depend on age”
• Or “we need a better regression model”
Serial Bivariate Plot
• ccorr increases with age (for most girls)
• Regression of ccorr on age should have a positive slope, especially under treatment
Why then is r² low? Between-subject variation is high.
Study within-subject change to see if ccorr depends on age.
Within-subject change
• Dage = age2 - age1 = treatment duration
• Dccorr = ccorr2 - ccorr1 = change in ccorr
• Dccorr / Dage = within-subject slope
Means (n=18 girls)
          age    ccorr
after     8.39   72.26
before    7.26   68.80
change    1.13   3.46
Dccorr / Dage = 3.0251
Recall b1 = (1) 0.2530, (2) 0.1142, (3) 0.3793
Regress Dccorr on Dage
>Regress dccorr on 1 dage;
>noconstant.
dccorr = 3.0763 dage
• S = 2.374, SE(b1) = 0.4847, p-value = .000
• t(.975, 17) = 2.110
• 95% CI(b1) = (2.0536, 4.0990)
Conclusion: ccorr increases with age
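A no-constant regression has the closed-form slope Σ(x·y) / Σ(x²). The Python sketch below applies that formula to the only two patients printed on the data slide, so its result is purely illustrative and is not expected to match the 3.0763 obtained from all 18 girls. The function name `slope_through_origin` is an assumption made for this example.

```python
# The two rows shown on the "Face Mask data" slide; the other 16 girls are not printed.
rows = [
    # (age1, ccorr1, age2, ccorr2)
    (4.99, 76.076, 5.99, 77.064),
    (9.90, 68.666, 11.43, 72.124),
]

# Within-subject changes
dage = [age2 - age1 for age1, _, age2, _ in rows]
dccorr = [ccorr2 - ccorr1 for _, ccorr1, _, ccorr2 in rows]


def slope_through_origin(x, y):
    """Least-squares slope of y on x with no intercept term."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


b1 = slope_through_origin(dage, dccorr)
print(f"Dage = {dage}, Dccorr = {dccorr}")
print(f"slope through the origin (2 patients only) = {b1:.3f}")
```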
A Paradox:
• Naïve regression slopes are not significantly different from zero
• Within-subject slope is significantly non-zero
What to do?
• Find the proper regression model. Candidates:
• Repeated Measures/Growth Curves
• Repeated Measures with Covariate
• Serial Correlation
Serial Correlation Model 1
• Regression model: ccorr = b0 + b1 age + error
• Error variables are identically distributed N(0, σ²), but dependent
• Between-subject errors are uncorrelated
• Within-subject errors have correlation ρ
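Regression Model 1
For subject i = 1, …, n, with j = 1 (“before”) and j = 2 (“after”):
$$ y_{ij} = b_0 + b_1 x_{ij} + e_{ij}, \qquad \operatorname{Var}\begin{pmatrix} e_{i1} \\ e_{i2} \end{pmatrix} = \sigma^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} $$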
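If ρ is known
Pre-multiply each subject's (“before”, “after”) pair by the inverse square root of the within-subject correlation matrix R, which has unit diagonal and off-diagonal ρ:
$$ R^{-1/2} = \begin{pmatrix} k_1 & k_2 \\ k_2 & k_1 \end{pmatrix}, \quad k_1 = \tfrac12\left(\tfrac{1}{\sqrt{1+\rho}} + \tfrac{1}{\sqrt{1-\rho}}\right), \quad k_2 = \tfrac12\left(\tfrac{1}{\sqrt{1+\rho}} - \tfrac{1}{\sqrt{1-\rho}}\right) $$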
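Orthogonalized Model 1
Let R be the within-subject correlation matrix (unit diagonal, off-diagonal ρ), and let $(ty_{i1}, ty_{i2})^\top = R^{-1/2}(y_{i1}, y_{i2})^\top$ and $(tx_{i1}, tx_{i2})^\top = R^{-1/2}(x_{i1}, x_{i2})^\top$. Then
$$ ty_{ij} = \frac{b_0}{\sqrt{1+\rho}} + b_1\, tx_{ij} + te_{ij}, \qquad te_{ij} \text{ uncorrelated with variance } \sigma^2 $$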
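The orthogonalizing weights can be checked numerically. The Python sketch below builds the 2×2 matrix W = [[k1, k2], [k2, k1]] from the same k1 and k2 formulas that appear in the MINITAB code, and confirms that W R W is the identity when R has unit diagonal and off-diagonal ρ. The helper names are illustrative, and ρ = .730 is simply the starting value quoted in the deck.

```python
import math


def orthogonalizing_weights(rho):
    """k1, k2 such that [[k1, k2], [k2, k1]] is the inverse square root of [[1, rho], [rho, 1]]."""
    a = 1.0 / math.sqrt(1.0 + rho)
    b = 1.0 / math.sqrt(1.0 - rho)
    return (a + b) / 2.0, (a - b) / 2.0


def matmul2(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


rho = 0.730
k1, k2 = orthogonalizing_weights(rho)
W = [[k1, k2], [k2, k1]]
R = [[1.0, rho], [rho, 1.0]]

# If W is R^(-1/2), the transformed errors have identity correlation.
WRW = matmul2(matmul2(W, R), W)
print(f"k1 = {k1:.4f}, k2 = {k2:.4f}")
print("W R W =", [[round(v, 12) for v in row] for row in WRW])
```

Because k1 + k2 = 1/√(1+ρ), a column of ones is mapped to a constant column, which is why the transformed regression can still be fitted with an ordinary intercept.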
Stacking …
If the correlation ρ is unknown
Algorithm: Estimate ρ
0. Begin with ρ̂ = correlation(ccorr1, ccorr2)
1. Orthogonalize age and ccorr using ρ̂ to obtain tage and tccorr
2. Regress tccorr on 1, tage; save the residuals (tresi1, tresi2)
3. If corr(tresi1, tresi2) < .001, STOP. Else set ρ̂ = ρ̂ + corr(tresi1, tresi2)/2 and go to Step 1.
MINITAB codes 1
>corr c7 c12          # initial rho
>let k3=.730          # enter above/updated rho
>let k1=(1/sqrt(1+k3)+1/sqrt(1-k3))/2
>let k2=(1/sqrt(1+k3)-1/sqrt(1-k3))/2
# orthogonalize age
>let c21=k1*c3+k2*c8
>let c22=k2*c3+k1*c8
>stack c21 c22 c31
>name c31 'tage'
MINITAB codes 2
# orthogonalize ccorr
>let c23=k1*c7+k2*c12
>let c24=k2*c7+k1*c12
>stack c23 c24 c32
>name c32 'tccorr'
>regress 'tccorr' 1 'tage';
>resi c33;
>coef c34.
>unstack c33 c35 c36;
>subs c18.
>corr c35 c36         # STOP if <.001, else
>let k3=k3+corr(c35,c36)/2
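For readers without MINITAB, the same loop can be sketched in plain Python. This is an illustrative re-expression, not the author's code: the data are synthetic (generated with assumed true values b0 = 60, b1 = 1.5, ρ = 0.6), the function names are invented for the example, and two safeguards that the slides do not mention are added (an absolute value in the stopping test and a clamp that keeps ρ̂ inside (-1, 1)).

```python
import math
import random


def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def ols(x, y):
    """Least-squares (intercept, slope) of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def orthogonalize(v1, v2, rho):
    """Apply the k1/k2 weights to (before, after) pairs and stack the result."""
    a, b = 1 / math.sqrt(1 + rho), 1 / math.sqrt(1 - rho)
    k1, k2 = (a + b) / 2, (a - b) / 2
    return ([k1 * p + k2 * q for p, q in zip(v1, v2)]
            + [k2 * p + k1 * q for p, q in zip(v1, v2)])


def fit_serial(x1, x2, y1, y2, tol=1e-3, max_iter=50):
    """Iterate rho <- rho + corr(residual halves)/2 until that correlation is tiny."""
    n = len(x1)
    rho = corr(y1, y2)                        # Step 0
    for _ in range(max_iter):
        tx = orthogonalize(x1, x2, rho)       # Step 1
        ty = orthogonalize(y1, y2, rho)
        c0, c1 = ols(tx, ty)                  # Step 2
        b0 = c0 * math.sqrt(1 + rho)          # undo the intercept scaling
        resid = [yy - c0 - c1 * xx for xx, yy in zip(tx, ty)]
        delta = corr(resid[:n], resid[n:])
        if abs(delta) < tol:                  # Step 3 (abs() is an added safeguard)
            break
        rho = max(-0.999, min(0.999, rho + delta / 2))  # clamp is an added safeguard
    return rho, b0, c1


# Synthetic data only: 200 hypothetical subjects, NOT the face-mask measurements.
random.seed(0)
TRUE_B0, TRUE_B1, TRUE_RHO = 60.0, 1.5, 0.6
x1 = [random.uniform(4, 12) for _ in range(200)]
x2 = [a + random.uniform(1, 2) for a in x1]
y1, y2 = [], []
for a, b in zip(x1, x2):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    y1.append(TRUE_B0 + TRUE_B1 * a + z1)
    y2.append(TRUE_B0 + TRUE_B1 * b
              + TRUE_RHO * z1 + math.sqrt(1 - TRUE_RHO ** 2) * z2)

rho_hat, b0_hat, b1_hat = fit_serial(x1, x2, y1, y2)
print(f"rho_hat = {rho_hat:.3f}, b0_hat = {b0_hat:.2f}, b1_hat = {b1_hat:.3f}")
```

The half-step update mirrors the `let k3=k3+corr(c35,c36)/2` line; a full step would be another possible choice, but it is not what the deck uses.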
“Orthogonalized” data: (n=36)
First iteration (Model 1): initial ρ̂ = .730
• tccorr = 46.9184 + 1.1271 tage
• r² = .216, r²(adj) = .193, p-value = .004
• Corr(tresi1, tresi2) = .191
Revised ρ̂ = .82545
Iteration History (Model 1)
Iter   ρ̂         Corr(tresi1, tresi2)
0      .730      .191
1      .825      .066
2      .858      .012
3      .8641     .001
4      .8646     .000
5      .864621
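The Model 1 history can be cross-checked against the half-step update used in the MINITAB code (new ρ̂ = old ρ̂ + residual correlation / 2). The Python sketch below applies that rule row by row to the rounded values in the table, so agreement is only expected to about three decimals.

```python
# (rho_hat, residual correlation) for iterations 0..4, as printed in the table
history = [
    (0.730, 0.191),
    (0.825, 0.066),
    (0.858, 0.012),
    (0.8641, 0.001),
    (0.8646, 0.000),
]
# rho_hat reported for the following iteration (1..5)
reported_next = [0.825, 0.858, 0.8641, 0.8646, 0.864621]

predicted_next = [rho + resid_corr / 2 for rho, resid_corr in history]
for it, (pred, rep) in enumerate(zip(predicted_next, reported_next), start=1):
    print(f"iter {it}: predicted {pred:.4f}, reported {rep}")
```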
“Orthogonalized” data: (n=36)
After five iterations: ρ̂ = .8646
• Corr(tresi1, tresi2) = .000
• tccorr = 42.132 + 1.6613 tage
• r² = .347, r²(adj) = .328
• S = 5.1319, SE(c1) = 0.3908, p-value = .000
Regress y on x: (Model 1)
ccorr = 57.532 + 1.6613 age
• ρ̂ = 0.8646
• σ̂ = 5.2091, SE(b1) = .3967, p-value = .000
• t(.975, 33) = 2.0345
• 95% CI(b1) = (0.8560, 2.4683)
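The slides do not show how the transformed fit is converted back to the original scale. The reported numbers are consistent with the following reading, which is an assumption rather than something stated in the deck: the intercept is rescaled by √(1 + ρ̂), because the orthogonalization turns the column of ones into the constant 1/√(1 + ρ̂), and S and the standard error are inflated by √(34/33) to charge one degree of freedom for estimating ρ. A short Python check using only the reported values:

```python
import math

# Reported for the orthogonalized fit after five iterations (Model 1)
rho_hat = 0.8646
c0, c1 = 42.132, 1.6613
s_transformed, se_c1 = 5.1319, 0.3908

# Assumed back-transformation of the intercept; the slope is unchanged.
b0 = c0 * math.sqrt(1 + rho_hat)

# Assumed degrees-of-freedom adjustment: 36 - 2 = 34 residual df, minus 1 for rho.
df_adjust = math.sqrt(34 / 33)
s_adjusted = s_transformed * df_adjust
se_b1 = se_c1 * df_adjust

print(f"b0 = {b0:.3f}, sigma_hat = {s_adjusted:.4f}, SE(b1) = {se_b1:.4f}")
```

Under this reading the results agree with the reported 57.532, 5.2091 and .3967 up to rounding of the inputs.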
Serial Correlation Model 2
• Regression model 2: ccorr = b0 + b1 age + error
• Error variables are identically distributed N(0, σ²), but dependent
• Between-subject errors are uncorrelated
• Within-subject errors have correlation ρ^(age2 - age1)
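Regression Model 2
For subject i = 1, …, n, with treatment duration $d_i = \text{age2}_i - \text{age1}_i$:
$$ y_{ij} = b_0 + b_1 x_{ij} + e_{ij}, \qquad \operatorname{Var}\begin{pmatrix} e_{i1} \\ e_{i2} \end{pmatrix} = \sigma^2 \begin{pmatrix} 1 & \rho^{d_i} \\ \rho^{d_i} & 1 \end{pmatrix} $$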
MINITAB codes 3
>let c19='age2' - 'age1'
>name c19 'dage'
>corr c7 c12
>let k3=.730          # enter above/updated correlation
# use rho**dage to orthogonalize
>let c51=(1/sqrt(1+k3**c19)+1/sqrt(1-k3**c19))/2
>let c52=(1/sqrt(1+k3**c19)-1/sqrt(1-k3**c19))/2
>let c21=c51*c3+c52*c8
>let c22=c52*c3+c51*c8
etc.
Iteration History (Model 2)
Iter   ρ̂         Corr(tresi1, tresi2)
0      .730      .231
1      .845      .063
2      .877      .002
3      .8782     .001
4      .8781     .000
5      .878120
“Orthogonalized” data: (n=36)
After five iterations: ρ̂ = .8781
• Corr(tresi1, tresi2) = .000
• tccorr = 57.935 intdage + 1.6097 tage
• r² = .336, r²(adj) = .316
• S = 5.092, SE(c1) = 0.3912, p-value = .000
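Under Model 2 the orthogonalizing weights differ from subject to subject, because the within-subject correlation is ρ raised to that subject's treatment duration. The transformed column of ones is then no longer constant, so it has to enter the regression as its own predictor. The name `intdage` in the fitted equation is read here as that transformed intercept column, which is an assumption, since the deck does not define it. The Python sketch below computes the weights for the two patients printed on the data slide; the function name is illustrative.

```python
import math


def model2_weights(rho, dage):
    """Per-subject weights k1, k2 when the within-subject correlation is rho**dage."""
    r = rho ** dage
    a = 1.0 / math.sqrt(1.0 + r)
    b = 1.0 / math.sqrt(1.0 - r)
    return (a + b) / 2.0, (a - b) / 2.0


rho_hat = 0.8781                       # final Model 2 estimate from the slides
dages = [5.99 - 4.99, 11.43 - 9.90]    # treatment durations of patients 1 and 2

weights = [model2_weights(rho_hat, d) for d in dages]
# Transformed intercept entry for each subject: k1 + k2 = 1 / sqrt(1 + rho**dage)
intercept_col = [k1 + k2 for k1, k2 in weights]

for d, (k1, k2), ic in zip(dages, weights, intercept_col):
    print(f"dage = {d:.2f}: k1 = {k1:.4f}, k2 = {k2:.4f}, intercept entry = {ic:.4f}")
```

With a coefficient fitted directly on this column, the intercept on the original scale needs no further rescaling.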
Regress y on x: (Model 2)
ccorr = 57.935 + 1.6098 age
• ρ̂ = 0.8781
• σ̂ = 5.169, SE(b1) = 0.3971, p-value = .000
• t(.975, 33) = 2.0345
• 95% CI(b1) = (0.8018, 2.4176)
Summary
• Model serial data properly
• Estimate the serial correlation: use the iterated algorithm
• Regress the orthogonalized data
• Obtain the regression of y on x
• Adjust σ̂, SE(b1) and CI(b1)
• Can extend to more repeats per subject
Thank you. jsarkar@math.iupui.edu