Sequential Multiple Decision Procedures (SMDP) for Genome Scans

Q.Y. Zhang and M.A. Province, Division of Statistical Genomics, Washington University School of Medicine. Statistical Genetics Forum, April 2006.


Presentation Transcript


  1. Sequential Multiple Decision Procedures (SMDP) for Genome Scans. Q.Y. Zhang and M.A. Province, Division of Statistical Genomics, Washington University School of Medicine. Statistical Genetics Forum, April 2006

  2. References
     • R.E. Bechhofer, J. Kiefer, M. Sobel. 1968. Sequential identification and ranking procedures. The University of Chicago Press, Chicago.
     • M.A. Province. 2000. A single, sequential, genome-wide test to identify simultaneously all promising areas in a linkage scan. Genetic Epidemiology, 19:301-332.
     • Q.Y. Zhang, M.A. Province. 2005. Simplified sequential multiple decision procedures for genome scans. 2005 Proceedings of the American Statistical Association, Biometrics Section: 463-468.

  3. SMDP (Sequential Multiple Decision Procedures) combines a sequential test with a multiple hypothesis test.

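  4. Idea 1: Sequential. Start from a small sample size n0. Increase the sample size stage by stage (n0+1, n0+2, …, n0+i, …), applying a sequential test at each stage (as in the sequential probability ratio test, SPRT). Stop when the stopping rule is satisfied. The observations not yet used remain available as extra data for validation or for the experiment in the next stage.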
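As a hedged illustration of the sequential idea only (not of SMDP itself), the sketch below implements Wald's SPRT for a single normal mean with known σ. The function name `sprt_normal_mean`, the hypotheses μ0 and μ1, and the error rates α and β are assumptions chosen for the example. Observations are consumed one at a time; once a boundary is crossed, the remaining observations are never touched, which is what frees data for validation.

```python
import math
import random

def sprt_normal_mean(stream, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean = mu0 vs H1: mean = mu1, with known sigma.

    Reads observations one at a time from `stream` and returns
    (decision, n_used); decision is "H0", "H1", or "undecided" if the
    stream runs out before either boundary is crossed.
    """
    upper = math.log((1 - beta) / alpha)   # crossing above -> accept H1
    lower = math.log(beta / (1 - alpha))   # crossing below -> accept H0
    llr = 0.0
    n = 0
    for x in stream:
        n += 1
        # Log-likelihood-ratio increment log f1(x)/f0(x) for one normal observation.
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n

# Usage: pass an iterator so that unused observations stay available afterwards.
random.seed(1)
observations = [random.gauss(0.5, 1.0) for _ in range(500)]
stream = iter(observations)
decision, n_used = sprt_normal_mean(stream, mu0=0.0, mu1=0.5, sigma=1.0)
held_out = list(stream)   # observations the test never looked at
print(decision, n_used, len(held_out))
```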

  5. Idea 2: Multiple Decision. Independent tests (binary hypothesis tests): test 1, test 2, …, test n, one for each of SNP1, SNP2, …, SNPn; this brings in test-wise vs. experiment-wise error and p-value correction. Simultaneous test (multiple hypothesis test): a single test that splits SNP1, …, SNPn into a signal group and a noise group.
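The need for p-value correction can be made concrete. Under the simplifying assumption that the n SNP tests are independent and every null hypothesis is true, running each test at test-wise level α gives an experiment-wise error of 1 - (1 - α)^n. The sketch uses the Bonferroni correction only as one standard example of a p-value correction; the slides do not say which correction they have in mind, and the function names are hypothetical.

```python
def experimentwise_error(alpha, n_tests):
    """P(at least one false positive) among n independent tests at level alpha,
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_level(alpha_family, n_tests):
    """Test-wise level that keeps the experiment-wise error at most alpha_family."""
    return alpha_family / n_tests

for n in (1, 10, 100, 10_000):
    print(n, round(experimentwise_error(0.05, n), 4), bonferroni_level(0.05, n))
```

With 100 independent tests at 0.05 each, the chance of at least one false positive is already above 0.99, while the Bonferroni level 0.05/100 brings it back below 0.05.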

  6. Binary Hypothesis Test. For SNP1, SNP2, …, SNPn, a separate test is run for each SNP: test 1: H0: Eff.(SNP1) = 0 vs. H1: Eff.(SNP1) ≠ 0; test 2: H0: Eff.(SNP2) = 0 vs. H1: Eff.(SNP2) ≠ 0; …; test n: H0: Eff.(SNPn) = 0 vs. H1: Eff.(SNPn) ≠ 0.

  7. Multiple Hypothesis Test. For SNP1, SNP2, …, SNPn: H1: SNP1, 2, 3 are truly different from the others; H2: SNP1, 2, 4 are truly different from the others; …; H5: SNP4, 5, 6 are truly different from the others; …; Hu: SNPn, n-1, n-2 are truly different from the others. In general, H: some t SNPs are truly different from the other (n - t), and u is the number of all possible combinations of t out of n.
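A small sketch of this hypothesis space enumerates every candidate signal group for six SNPs with t = 3 (the value of t is an assumption matching H1 and H2 above). The helper `candidate_signal_sets` is hypothetical; it simply wraps `itertools.combinations`, whose lexicographic order reproduces H1 = {SNP1, SNP2, SNP3} and H2 = {SNP1, SNP2, SNP4}.

```python
from itertools import combinations
from math import comb

def candidate_signal_sets(snps, t):
    """Yield each composite hypothesis H_u: 'these t SNPs differ from the other n - t'."""
    yield from combinations(snps, t)

snps = ["SNP1", "SNP2", "SNP3", "SNP4", "SNP5", "SNP6"]
hyps = list(candidate_signal_sets(snps, 3))
print(len(hyps), comb(len(snps), 3))   # 20 20
print(hyps[0], hyps[1])                # H1 and H2 on the slide
```

The count u = C(n, t) grows very fast: for n = 1000 and t = 10 it already exceeds 10^23.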

  8. SMDP: a sequential test combined with a multiple hypothesis test gives the Sequential Multiple Decision Procedure.

  9. Koopman-Darmois (K-D) Populations (Bechhofer et al., 1968). The frequency/density function of a K-D population can be written in the form f(x) = exp{P(x)Q(θ) + R(x) + S(θ)}. Examples:
     • the normal density function with unknown mean and known variance;
     • the normal density function with unknown variance and known mean;
     • the exponential density function with unknown scale parameter and known location parameter;
     • the Bernoulli distribution with unknown probability of "success" on a single trial;
     • the Poisson distribution with unknown mean;
     • ……
     The distance between two K-D populations with parameters θi and θj is defined on the Q scale, as Q(θi) - Q(θj).
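The K-D form can be checked numerically. The sketch writes three of the listed families as exp{P(x)Q(θ) + R(x) + S(θ)} and compares the result with their usual formulas. The decompositions are standard; for example, for the normal with known σ, P(x) = x, Q(θ) = θ/σ², R(x) = -x²/(2σ²) - log(√(2π)σ) and S(θ) = -θ²/(2σ²). The function and constant names and the value σ = 2 are arbitrary choices for the example.

```python
import math

def kd_density(x, theta, P, Q, R, S):
    """Koopman-Darmois form: f(x) = exp{P(x) Q(theta) + R(x) + S(theta)}."""
    return math.exp(P(x) * Q(theta) + R(x) + S(theta))

SIGMA = 2.0  # known standard deviation, arbitrary value for the example

# Normal with unknown mean theta and known variance SIGMA**2.
NORMAL_KNOWN_VAR = dict(
    P=lambda x: x,
    Q=lambda th: th / SIGMA**2,
    R=lambda x: -x**2 / (2 * SIGMA**2) - math.log(math.sqrt(2 * math.pi) * SIGMA),
    S=lambda th: -th**2 / (2 * SIGMA**2),
)

# Bernoulli with unknown success probability p (x in {0, 1}).
BERNOULLI = dict(
    P=lambda x: x,
    Q=lambda p: math.log(p / (1 - p)),
    R=lambda x: 0.0,
    S=lambda p: math.log(1 - p),
)

# Poisson with unknown mean lam (x = 0, 1, 2, ...).
POISSON = dict(
    P=lambda k: k,
    Q=lambda lam: math.log(lam),
    R=lambda k: -math.lgamma(k + 1),   # -log(k!)
    S=lambda lam: -lam,
)

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

print(kd_density(1.3, 0.4, **NORMAL_KNOWN_VAR), normal_pdf(1.3, 0.4, SIGMA))
print(kd_density(1, 0.3, **BERNOULLI))   # approximately p = 0.3 when x = 1
print(kd_density(3, 2.5, **POISSON), 2.5**3 * math.exp(-2.5) / 6)
```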

  10. SMDP (Bechhofer et al., 1968): Selecting the t best of M K-D populations. There are U possible combinations of t out of M. Sequential sampling proceeds in stages 1, 2, …, h, h+1, … from Pop. 1, Pop. 2, …, Pop. t-1, Pop. t, Pop. t+1, Pop. t+2, …, Pop. M, giving the statistics Y1,h, Y2,h, …, Yi,h, …, YM,h at stage h, which are combined for each combination u. Stopping rule: Prob. of correct selection (PCS) > P*, whenever D > D*.

  11. SMDP stopping rule. SMDP parameters: P*, t, D*.
     • P*: arbitrary, e.g. 0.95;
     • t: fixed or varied;
     • D*: the indifference zone.
     Requirement: Prob. of correct selection (PCS) > P* whenever D > D*. Correct selection: the populations with Q(θ) > Q(θt) + D* are selected.
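Below is a minimal sketch of one SMDP run for selecting the t best of M populations. The exact rule is not written out on these slides, so the sketch assumes a Bechhofer-Kiefer-Sobel-type form (our reconstruction, to be checked against Bechhofer et al., 1968): at stage h, sort the U combination sums of the Yi,h as T(1) ≤ … ≤ T(U), stop when Σ_{s<U} exp{-D*(T(U) - T(s))} ≤ (1 - P*)/P*, and select the top combination. It further assumes P(x) = x (e.g. normal means with σ = 1, so Q(θ) = θ), making Yi,h a running sum. Function names are hypothetical.

```python
import math
import random
from itertools import combinations

def smdp_should_stop(Y, t, d_star, p_star):
    """One stage of the assumed stopping check, enumerating all U subsets.

    Y: cumulative statistics Y_{i,h}, one per population.
    Returns (stop, best_subset), where best_subset has the largest sum.
    """
    sums = sorted((sum(Y[i] for i in c), c) for c in combinations(range(len(Y)), t))
    top_sum, best = sums[-1]
    # Each non-top combination contributes exp(-D* * (its gap to the top sum)).
    stat = sum(math.exp(-d_star * (top_sum - s)) for s, _ in sums[:-1])
    return stat <= (1 - p_star) / p_star, best

def run_smdp(sample_stage, M, t, d_star, p_star, max_stages):
    """Add one observation per population per stage until the rule stops.

    sample_stage() must return M new observations; with P(x) = x assumed,
    Y_{i,h} is the running sum of population i's observations.
    Returns (stage, selected_indices), or (max_stages, None) if never stopped.
    """
    Y = [0.0] * M
    for h in range(1, max_stages + 1):
        for i, x in enumerate(sample_stage()):
            Y[i] += x
        stop, best = smdp_should_stop(Y, t, d_star, p_star)
        if stop:
            return h, best
    return max_stages, None

# Usage: 6 normal populations (sigma = 1), the first two shifted upward.
random.seed(0)
means = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
h, chosen = run_smdp(lambda: [random.gauss(m, 1.0) for m in means],
                     M=6, t=2, d_star=1.0, p_star=0.95, max_stages=1000)
print(h, chosen)
```

Enumerating all U combinations at every stage is exactly the cost the computational-problem slide raises.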

  12. SMDP: Computational Problem. At each sequential stage h = 1, 2, 3, …, N, the statistics Y1,h, Y2,h, …, Yt,h, Yt+1,h, Yt+2,h, …, YM,h must be formed into U sums, one for each of the U possible combinations of t out of M; each sum contains t members of the Yi,h. How much computer time does this take?
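To put the computational problem in numbers: enumerating every combination means forming U = C(M, t) sums of t terms at every stage. The sketch computes log10 U with log-gamma so it stays cheap for large M; the function names and the (M, t) pairs are illustrative only and are not taken from the slides or the papers.

```python
import math

def log10_num_combinations(M, t):
    """log10 of U = C(M, t), computed with log-gamma so large M is cheap."""
    return (math.lgamma(M + 1) - math.lgamma(t + 1)
            - math.lgamma(M - t + 1)) / math.log(10)

def additions_per_stage(M, t):
    """Additions needed to form all U sums of t terms (t - 1 each) at one stage,
    if every combination is enumerated."""
    return math.comb(M, t) * (t - 1)

# Illustrative (M, t) pairs only.
for M, t in [(10, 3), (100, 5), (1000, 10), (10000, 10)]:
    print(f"M={M:>6}, t={t:>2}: log10(U) = {log10_num_combinations(M, t):.1f}")
```

Because this work repeats at each of up to N stages, full enumeration quickly becomes impractical, which is what motivates the simplified stopping rule.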

  13. Simplified Stopping Rule (Bechhofer et al., 1968). Top Combination Number (TCN) = U - S + 1. When TCN = 2 (i.e. S = U - 1, U - S = 1) => the simplest stopping rule. When TCN = U (i.e. S = 1, U - S = U - 1) => the original stopping rule. How to choose TCN? Balance computational accuracy against computational time.
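The effect of TCN can be sketched under an assumed rule form (our reconstruction, not given on the slide): the original statistic sums exp{-D*(T(U) - T(s))} over all U - 1 non-top combination sums T(s), and TCN = U - S + 1 keeps only the top TCN combinations, i.e. the U - S = TCN - 1 runner-up terms. Two consequences follow. Since every term is positive, the truncated statistic is never larger than the full one, so the simplified rule stops no later than the original. And for TCN = 2 the runner-up combination differs from the best only by swapping the t-th largest Y for the (t+1)-th largest, so no enumeration is needed. Function names and the Y values are hypothetical.

```python
import math
from itertools import combinations

def truncated_statistic(Y, t, d_star, tcn=None):
    """Stopping statistic using only the top `tcn` combination sums.

    tcn=None uses all U combinations (original rule); tcn=2 keeps only the
    runner-up (simplest rule). Each kept non-top sum T adds exp(-D* (T_top - T)).
    """
    sums = sorted((sum(Y[i] for i in c) for c in combinations(range(len(Y)), t)),
                  reverse=True)
    runners_up = sums[1:] if tcn is None else sums[1:tcn]
    return sum(math.exp(-d_star * (sums[0] - s)) for s in runners_up)

def simplest_statistic(Y, t, d_star):
    """TCN = 2 without enumeration: the runner-up swaps the t-th largest Y
    for the (t+1)-th largest, so the gap is y_(t) - y_(t+1)."""
    y = sorted(Y, reverse=True)
    return math.exp(-d_star * (y[t - 1] - y[t]))

Y = [5.0, 3.0, 2.5, 1.0, 0.0]   # illustrative Y_{i,h} at one stage
for tcn in (2, 4, None):
    print(tcn, truncated_statistic(Y, t=2, d_star=0.8, tcn=tcn))
print(simplest_statistic(Y, t=2, d_star=0.8))
```

A smaller TCN is cheaper but less conservative; that is the accuracy-versus-time trade-off the slide refers to.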

  14. SMDP Combined with a Regression Model (M.A. Province, 2000, pages 320-321). Data pairs for a marker: (Z1, X1), (Z2, X2), (Z3, X3), …, (Zh, Xh), (Zh+1, Xh+1), …, (ZN, XN). Yi,h denotes the sequential sum of squares of regression residuals for marker i at stage h.

  15. Combining SMDP with a Regression Model (M.A. Province, 2000, page 319). Case B: the normal density function with unknown variance and known mean.
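For Case B, the normal density with known mean μ and unknown variance σ² is of K-D form with P(x) = (x - μ)², Q(σ²) = -1/(2σ²), R(x) = 0 and S(σ²) = -1/2 log(2πσ²), so the cumulative statistic is a sum of squares; treating regression residuals as mean-zero normal draws leads to a residual sum of squares per marker. The sketch computes one such sequence for a single marker under our own assumption, not stated on the slides, that at stage h a least-squares line Z = a + bX is refit to the first h pairs and Yh is its residual sum of squares. Province (2000) may define the sequential residuals differently; the function names and the toy data are hypothetical.

```python
def rss_simple_ols(z, x):
    """Residual sum of squares of the least-squares fit z ~ a + b*x."""
    n = len(z)
    mx = sum(x) / n
    mz = sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx if sxx > 0 else 0.0   # slope (0 if all x are equal)
    a = mz - b * mx                     # intercept
    return sum((zi - a - b * xi) ** 2 for xi, zi in zip(x, z))

def sequential_rss(pairs, start=3):
    """Y_h for h = start..N: RSS after refitting on the first h (Z, X) pairs."""
    out = []
    for h in range(start, len(pairs) + 1):
        z = [p[0] for p in pairs[:h]]
        x = [p[1] for p in pairs[:h]]
        out.append(rss_simple_ols(z, x))
    return out

# Toy (Z, X) pairs for one marker, for illustration only.
pairs = [(1.0, 0.0), (2.0, 1.0), (2.0, 2.0), (5.0, 3.0), (4.0, 4.0), (7.0, 5.0)]
print(sequential_rss(pairs))
```

Because each refit minimizes over a growing set of observations, this particular Yh sequence can never decrease as h grows.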

  16. Simplified Stopping Rule (M.A. Province, 2000, pages 321-322)

  17. A Real Data Example (M.A. Province, 2000, page 310)

  18. A Real Data Example (M.A. Province, 2000, page 308)

  19. Simulation Results (1) (M.A. Province, 2000, page 312)

  20. Simulation Results (2) (M.A. Province, 2000, page 313)

  21. Simplified SMDP (Bechhofer et al., 1968). Top Combination Number (TCN) = U - S + 1. How to choose TCN? Balance computational accuracy against computational time.

  22. Data

  23. Relation of W and t (h = 50, D* = 10); Effective Top Combination Number (ETCN) (Zhang & Province, 2005, page 465)

  24. ETCN Curve (Zhang & Province, 2005, page 466)

  25. t = ? (Zhang & Province, 2005, page 466)

  26. P* = 0.95, D* = 10, TCN = 10000: 72 SNPs, P < 0.01 (Zhang & Province, 2005, page 467)

  27. SMDP Summary
     Advantages:
     • Tests and identifies all signals simultaneously, with no multiple comparisons.
     • Uses the "minimal" N to find significant signals: efficient.
     • Tightly controls statistical errors (Type I and II): powerful.
     • Saves the rest of N for validation: reliable.
     Further studies:
     • Computer time
     • Extension to more methods/models
     • Extension to non-K-D distributions

  28. Thanks!
