
Advanced Statistical Methods: Beyond Linear Regression

John R. Stevens, Utah State University. Notes 1: Case Study Data Sets. Mathematics Educators Workshop, 28 March 2009. http://www.stat.usu.edu/~jrstevens/pcmi



  1. Advanced Statistical Methods: Beyond Linear Regression. John R. Stevens, Utah State University. Notes 1: Case Study Data Sets. Mathematics Educators Workshop, 28 March 2009. http://www.stat.usu.edu/~jrstevens/pcmi

  2. Why this workshop? • Me … • Outreach mission of USU • Recruitment – undergraduate & graduate • Too much fun • You …

  3. Outline • Notes 1: Case Study Data sets • 1. Challenger Explosion • 2. Beetle Fumigation • 3. T-cell Cancer • Notes 2: Statistical Methods I • Logistic Regression – incl. Separation of Points • EM Algorithm • Notes 3: Statistical Methods II • Tests for Differential Expression • Multiple hypothesis testing • Visualization • Machine Learning • Notes 4: Computer Implementation • (Notes 5): Bonus Material

  4. Case Study 1: Challenger • The January 28, 1986 explosion prompted the Presidential Commission on the Space Shuttle Challenger Accident • The Commission's 1986 report attributed the explosion to a burn-through of an O-ring seal at a field joint in one of the solid-fuel rocket boosters • After each of the previous 24 launches, the solid rocket boosters were inspected, and the presence or absence of damage to the field joint was noted

  5. Challenger Data
     Obs  Flight  Temp  Damage
     1    STS1    66    NO
     2    STS9    70    NO
     3    STS51B  75    NO
     4    STS2    70    YES
     5    STS41B  57    YES
     6    STS51G  70    NO
     7    STS3    69    NO
     8    STS41C  63    YES
     9    STS51F  81    NO
     10   STS4    80
     11   STS41D  70    YES
     12   STS51I  76    NO
     13   STS5    68    NO
     14   STS41G  78    NO
     15   STS51J  79    NO
     16   STS6    67    NO
     17   STS51A  67    NO
     18   STS61A  75    YES
     19   STS7    72    NO
     20   STS51C  53    YES
     21   STS61B  76    NO
     22   STS8    73    NO
     23   STS51D  67    NO
     24   STS61C  58    YES
     Motivating question: What was so different on the 25th launch?
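As a first look at the motivating question, the table can be tallied by launch temperature. The sketch below is only illustrative: the cutoff of 65 is an arbitrary assumption chosen for this example, not a value from the workshop, and the STS4 row is kept with a missing damage value because the table gives none.

```python
# Challenger data transcribed from the table: (flight, temperature, damage).
# damage is True (YES), False (NO), or None where the table has no entry (STS4).
LAUNCHES = [
    ("STS1", 66, False), ("STS9", 70, False), ("STS51B", 75, False),
    ("STS2", 70, True), ("STS41B", 57, True), ("STS51G", 70, False),
    ("STS3", 69, False), ("STS41C", 63, True), ("STS51F", 81, False),
    ("STS4", 80, None), ("STS41D", 70, True), ("STS51I", 76, False),
    ("STS5", 68, False), ("STS41G", 78, False), ("STS51J", 79, False),
    ("STS6", 67, False), ("STS51A", 67, False), ("STS61A", 75, True),
    ("STS7", 72, False), ("STS51C", 53, True), ("STS61B", 76, False),
    ("STS8", 73, False), ("STS51D", 67, False), ("STS61C", 58, True),
]


def damage_counts(launches, threshold):
    """Return (damaged, total) below the threshold and at or above it.

    Launches with a missing damage value are skipped.
    """
    cold = [d for _, temp, d in launches if d is not None and temp < threshold]
    warm = [d for _, temp, d in launches if d is not None and temp >= threshold]
    return (sum(cold), len(cold)), (sum(warm), len(warm))


# The threshold of 65 is a hypothetical cutoff used only for illustration.
cold, warm = damage_counts(LAUNCHES, threshold=65)
print(f"Below 65: {cold[0]} of {cold[1]} launches showed damage")
print(f"65 and above: {warm[0]} of {warm[1]} launches showed damage")
```

A split like this only hints at a temperature effect; it says nothing about how quickly the chance of damage changes with temperature, which is the kind of question the logistic regression listed in the outline under Notes 2 addresses.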

  6. Case Study 2: Beetle Fumigation – Rhyzopertha dominica (Image courtesy Clemson University – USDA Cooperative Extension Slide Series, www.insectimages.org)

  7. Motivation • Beetle: lesser grain borer • A primary pest of stored grain • A year-round problem in moderate climates • Australian grain industry: • $6–8 billion • Zero tolerance for insect-infested grain • Phosphine fumigant for control • Some beetles have developed resistance levels more than 235 times greater than normal (UQ News Online, 18 Oct. 1999)

  8. Experimental Background • Two DNA markers linked to resistance • rp6.79: two genotypes: –,+ • rp5.11: three genotypes: B,H,A • Motivating question: What contributes to the degree of resistance? • Mixture of six beetle genotypes exposed to various concentrations of fumigant (48 hours)
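The "six beetle genotypes" follow from crossing the two markers: 2 genotypes at rp6.79 times 3 at rp5.11. A minimal sketch of that enumeration follows; the variable names and the "marker1/marker2" label format are assumptions made for this example, and an ASCII "-" stands in for the "–" genotype.

```python
from itertools import product

# Genotypes at each marker, as listed on the slide.
RP6_79 = ["-", "+"]        # two genotypes
RP5_11 = ["B", "H", "A"]   # three genotypes

# Every combination of one rp6.79 genotype with one rp5.11 genotype.
# The "x/y" label format is hypothetical, used only for display.
genotypes = [f"{a}/{b}" for a, b in product(RP6_79, RP5_11)]

print(len(genotypes), genotypes)
```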

  9. Experimental Data

  10. Practical Considerations in Choosing Dosage • Clearly a high dosage would kill all beetles, regardless of genotype • Time more important than concentration • Expense: more time with lower dose • Technical limitations: maintain concentration in silos • Safety: spontaneous combustion at high conc.

  11. Case Study 3: T-cell Cancer • Acute lymphoblastic leukemia (ALL) • leukemia – cancer of white blood cells • ALL – excess of lymphoblasts (immature cells that become white blood cells) • Two types of interest here: • T-cell – manage cell-mediated immune response (activation of cells, release of cytokines) • B-cell – manage humoral immune response (secretion of antibodies) • Researchers used gene expression technology

  12. Central Dogma of Molecular Biology

  13. General assumption of microarray technology • Use mRNA transcript abundance level as a measure of the level of “expression” for the corresponding gene • Proportional to degree of gene expression

  14. How to measure mRNA abundance? • Several different approaches with similar themes: • Affymetrix GeneChip • Nimblegen array • Two-color cDNA array • more • Representation of genes on slide • Small portion of gene • Larger sequence of gene oligonucleotide arrays

  15. Affymetrix Probes 25 bp (Images courtesy Affymetrix, www.affymetrix.com)

  16. Affymetrix Technology – GeneChip • Each spot on array represents a single probe sequence (with millions of copies) • Perfect match • Mismatch • Each gene is represented by a unique set of probe pairs (usually 12-20 probe pairs per probe set) • These probes are fixed to the array (Image courtesy Affymetrix, www.affymetrix.com)

  17. Affymetrix Technology – Expression A tissue sample is prepared so that its mRNA has fluorescent tags; wait for hybridization (Images courtesy Affymetrix, www.affymetrix.com)

  18. Affymetrix GeneChip Image courtesy Affymetrix, www.affymetrix.com

  19. Cartoon Representations • Animation 1: GeneChip structure (1 min.) • Animation 2: Measuring gene expression (2.5 min)

  20. Data: Spot Intensities Full Array Image Close-up of Array Image Images courtesy Affymetrix, www.affymetrix.com

  21. Basic goal of microarray technology • “Observe” gene expression in different conditions – healthy vs. diseased, e.g. • Decide which genes’ expression levels are changing significantly between conditions • Target those genes – to halt disease, e.g. • Study those genes – to better understand differences at the genetic level

  22. ALL Data • “Preprocessed” gene expression data • 12625 genes (hgu95av2 Affymetrix GeneChip) • 128 samples (arrays) • a matrix of “expression values” – 128 cols, 12625 rows • phenotypic data on all 128 patients, including: • 95 B-cell cancer • 33 T-cell cancer • Motivating question: Which genes are changing expression values systematically between B-cell and T-cell groups?
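To make the layout concrete: each of the 12625 rows of the matrix is one gene, each of the 128 columns is one sample, and the motivating question compares the B-cell columns with the T-cell columns row by row. The sketch below shows one simple per-gene comparison, a difference in group means. It is not the workshop's method; the column ordering (B-cell samples first) and the toy row are assumptions made purely for illustration, and no real expression values are used.

```python
# Group sizes as stated on the slide.
N_B_CELL = 95
N_T_CELL = 33

# Assumption: B-cell samples come first in column order. In practice the
# ordering would come from the phenotypic data, which is not reproduced here.
group = ["B"] * N_B_CELL + ["T"] * N_T_CELL

b_cols = [j for j, g in enumerate(group) if g == "B"]
t_cols = [j for j, g in enumerate(group) if g == "T"]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def group_mean_difference(expression_row):
    """Mean T-cell value minus mean B-cell value for one gene (one matrix row)."""
    b_values = [expression_row[j] for j in b_cols]
    t_values = [expression_row[j] for j in t_cols]
    return mean(t_values) - mean(b_values)


# Artificial row, NOT real data: 0.0 for each B-cell sample, 1.0 for each T-cell sample.
toy_row = [0.0] * N_B_CELL + [1.0] * N_T_CELL

n_samples = len(group)
toy_difference = group_mean_difference(toy_row)
print(n_samples, toy_difference)
```

Applying such a function to every row gives one number per gene, so 12625 comparisons in total, which is why the outline lists multiple hypothesis testing alongside tests for differential expression.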

  23. Next … • Analysis for these case studies • Build on known statistical methods • Notice huge potential for additional methods
