1 / 47

Microarrays

Microarrays. Acknowledgement. This lecture was, in part, designed to be consistent with lecture material from Johns Hopkins University. Please see the link below for more information. http://www.ams.jhu.edu/~dan/550.435/notes/COURSENOTES435.pdf. Central Dogma of Biology.

lpatricia
Télécharger la présentation

Microarrays

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Microarrays

  2. Acknowledgement This lecture was, in part, designed to be consistent with lecture material from Johns Hopkins University. Please see the link below for more information. http://www.ams.jhu.edu/~dan/550.435/notes/COURSENOTES435.pdf

  3. Central Dogma of Biology Microarrays are a tool that help figure out why, when, and to what extent genes are transcribed

  4. Definitions • Gene expression == mRNA abundance • Increased expression == induced, or up-regulated • Decreased expression == down-regulated • Difference in expression == differential expression

  5. Research Questions • Which genes are expressed in a given context? • Which genes are expressed differently between different contexts? • Which genes’ expression correlate with a variable of interest? • What do the (differentially) expressed expressed genes tell us about the biological processes of the system?

  6. What Is A Microarray? “A collection of microscopic DNA spots attached to a solid surface”

  7. What Is A Microarray?

  8. Microarray Experiments Data driven questions: • Which genes are differentially expressed? • Which genes are co-expressed? • Given gene expression, can we predict a condition? Experimental questions:

  9. Impact of Microarrays 82,731 publications in PubMed Breast cancer prognosis tool Breast cancer predicts chemotherapy response Colon cancer assay predicts risk of recurrence Prostate cancer assay predicts risk of progression

  10. Microarray Study Design Biological Question 1. Setup 2. Data pre-processing 3. Data analysis 4. Interpretation

  11. Two Types of Microarrays • Spotted arrays • High density oligonucleotide arrays

  12. Oligo Arrays (in situ) • Also called Affy arrays (orig. Affymetrix) • Probe - a unique, short cDNA sequence • Probes synthesized on the chip • Terminology: • probe - an individual 25nt sequence • probe cell - millions of copies of a probe placed together • probe set - a set of unique probes for a given gene

  13. Oligo Arrays (in situ) • Each gene is represented by a set of probes • Probe sets • Unique to each gene • Multiple probe cells tiled across exons

  14. Oligo Arrays (in situ)

  15. Oligo Arrays - Labeling & Scanning • Sample cDNA tagged w/ fluorescent marker • Sample washed over flowcell • Unbound cDNA washed away • Luminescence quantified in a scanner

  16. Oligo Arrays - Older Technologies • Affymetrix U133A & B • Probe cells grouped together, led to biases and artifacts

  17. Oligo Arrays - Older Technologies • Used perfect match and mismatch probes • Mismatch probes contain...mismatch in probe sequence to account for non-specific binding

  18. Oligo Arrays - Technology Update • U133 Affy arrays designed against 2001 draft human genome • Mismatch probes didn’t work well • Genome reference improved, more genes needed to be profiled • Gene ST array created and current

  19. Oligo Arrays - Current Technology Several chips: • 3’ IVT - gene expression, measures 3’ UTR • Gene - gene expression, tiled sequence • Exon - DNA sequence, exon sequences only • Tiling - DNA sequence, tiled across genome

  20. Microarray Analysis Methods

  21. Typical Processing Steps

  22. Normalization • Statistically adjust a set of expression matrices so they are comparable • Includes: • Background correction - remove artifacts/noise • Data normalization - adjust probe distributions across arrays to ensure comparability • Probe summarization - combine probe intensities within probeset to gene-level

  23. Normalization

  24. Old Normalization Method: MAS5 • Measured value = Noise + Probe Effects + Signal • Background correction • Intensity adjustment • Examine perfect/mismatch probe ratio • Calculate % present • AUC • The good - usable with single chip, p-value for gene expression • The bad - uses mismatch probes, very complicated

  25. State of the Art Normalization: RMA • Robust Multi-array Average (RMA) • Background correction - kernel density estimation based • Normalization - quantile normalization across batch of arrays • Summarization - linear additive model controlling for probe effects, gene expression, and error

  26. RMA Normalization

  27. Quality Control - RNA Quality • RIN - RNA Integrity Number • Ranges 0-9 • RNA Degradation Plot • RNA degrades 5’ -> 3’ • Probe intensity tends to increase accordingly • Slope >2 may indicate poor RNA quality

  28. Quality Control - RLE • Subtract median intensity across all arrays from each probe • Median RLE != 0 -> number of up/down genes not the same • Large IQR means many genes are differentially expressed

  29. Quality Control - NUSE • Normalized Unscaled Standard Error • Arrays w/ large IQR or median NUSE > 1 may be poor quality

  30. Quality Control - PCA • Principal Component Analysis • Data dimensionality reduction technique • Identifies “directions of variance”

  31. Quality Control Rules of Thumb • RLE, NUSE, RNA Degradation, and PCA • No single metric is indicative of poor quality • Flagged samples should be examined before omission • Make decisions based on multiple criteria • Rule of thumb - filter if all three:Median RLE > 0.10, Median NUSE > 1.05, RIN < 4.0

  32. Correcting for Batch Effects • Batch effects - variation in data due to technical effects: • Arrays processed on different days • The technician who ran the arrays • Sample collection site • Variance in reagent type/quality • Confounding - when batch effects are correlated with condition of interest • ComBat - R package to correct batch effects

  33. Correcting for Batch Effects

  34. Group Assignments

  35. R Primer

  36. R • R is a statistical computing language • Free and open source • Designed for statistics, data analysis, and visualization • Can be run on command line • More commonly run with rstudio

  37. R for Data Science • Excellent guide for learning R • tidyverse: set of packages that makes R less awful https://r4ds.had.co.nz/

  38. Bioconductor

  39. Project Principles

  40. Kindness Patience Acceptance Trust Honesty KPATH

  41. Trust Honesty Patience Acceptance Kindness THPAK

  42. Most ideas are “bad” Many “bad” ideas lead to “good” ideas

  43. You are not your ideas Your ideas are not you

  44. This Room Is A Safe Place To Dissent Remember: KPATH (or THPAK)

  45. Project Setup Suggested project directory structure: project_1/ samples/ ← save sample files here reference/ ← annotations, etc analysis/ ← all code README.txt ← text description of project

  46. Use Relative Paths Portable code in analysis/my_script.R: sample.info <- read.csv(‘../reference/annot.csv’) Nonportable code: sample.info <- read.csv( ‘/projectnb/bf528/group_1/project_1/reference/annot.csv’)

  47. For Next Time • Read project description and paper • Study chapters 1-4 of R for Data Science https://r4ds.had.co.nz/

More Related