DNA Microarrays: Basics

DNA Chips and Their Analysis. Comp. Genomics: Lecture 13, based on many sources, primarily Zohar Yakhini. DNA Microarrays: Basics. What they are. Types of arrays (cDNA arrays, oligo arrays). What is measured using DNA microarrays. How are the measurements done?

Presentation Transcript


  1. DNA Chips and Their Analysis. Comp. Genomics: Lecture 13, based on many sources, primarily Zohar Yakhini

  2. DNA Microarrays: Basics • What they are. • Types of arrays (cDNA arrays, oligo arrays). • What is measured using DNA microarrays. • How are the measurements done?

  3. DNA Microarrays: Computational Questions • Design of arrays. • Techniques for analyzing experiments. • Detecting differential expression. • Similar expression: clustering. • Other analysis techniques (many). • Machine learning techniques, and applications for advanced diagnosis.

  4. What is a DNA Microarray (I) • A surface (nylon, glass, or plastic). • Containing hundreds to thousands of pixels. • Each pixel has copies of a sequence of single-stranded DNA (ssDNA). • Each such sequence is called a probe.

  5. What is a DNA Microarray (II) • An experiment with 500-10k elements. • A way to concurrently explore the function of multiple genes. • A snapshot of the expression level of 500-10k genes under given test conditions.

  6. Some Microarray Terminology • Probe: ssDNA printed on the solid substrate (nylon or glass). These are short substrings of the genes we are going to be testing • Target: cDNA which has been labeled and is to be washed over the probe

  7. Back to Basics: Watson and Crick James Watson and Francis Crick discovered, in 1953, the double helix structure of DNA. From Zohar Yakhini

  8. Watson-Crick Complementarity • A binds to T; C binds to G. • Perfect match: AATGCTTAGTC paired with TTACGAATCAG. • One-base mismatch: AATGCGTAGTC paired with TTACGAATCAG. From Zohar Yakhini
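The pairing rule can be checked mechanically. A minimal Python sketch, assuming the two strands are written base-aligned as on this slide (position i of one strand pairs with position i of the other); `count_mismatches` is a hypothetical helper name, not from the lecture:

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def count_mismatches(top: str, bottom: str) -> int:
    """Count positions where bottom[i] is not the Watson-Crick partner of top[i].

    Assumes both strands are written base-aligned, as drawn on the slide.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    return sum(PAIR[a] != b for a, b in zip(top, bottom))


print(count_mismatches("AATGCTTAGTC", "TTACGAATCAG"))  # perfect match -> 0
print(count_mismatches("AATGCGTAGTC", "TTACGAATCAG"))  # one-base mismatch -> 1
```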

  9. Array Based Hybridization Assays (DNA Chips) • Array of probes: thousands to millions of different probe sequences per array. • Unknown sequence or mixture (target), in many copies. From Zohar Yakhini

  10. Array Based Hyb Assays • The target hybridizes only to Watson-Crick complementary probes. • Therefore, the fluorescence pattern is indicative of the target sequence. From Zohar Yakhini
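To see why the fluorescence pattern reflects the target, here is a toy Python model under strong simplifying assumptions: a probe lights up only if the target contains a perfectly complementary site (any mismatch counts as dark), and strands are compared base-aligned with orientation ignored. The function names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq: str) -> str:
    """Base-by-base Watson-Crick complement (strand orientation ignored)."""
    return "".join(PAIR[b] for b in seq)


def fluorescence_pattern(probes: list[str], target: str) -> list[bool]:
    """Per probe: does the target contain a perfectly complementary site?

    Simplification: only perfect matches hybridize; mismatched sites stay dark.
    """
    return [complement(p) in target for p in probes]


probes = ["AATG", "CTTA", "GGGG", "TAGT"]
target = "TTACGAATCAG"
print(fluorescence_pattern(probes, target))  # [True, True, False, True]
```

Real hybridization is not all-or-nothing: mismatched sites can still bind partially, so this boolean model only captures the idea.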

  11. DNA Sequencing: Sanger Method • Generate all A, C, G, T-terminated prefixes of the sequence by a polymerase reaction that includes chain-terminating versions of the corresponding bases. • Run them in four different gel lanes. • Reconstruct the sequence from the lengths of all A, C, G, T-terminated prefixes. • The need for 4 different reactions is avoided by using differentially dye-labeled terminating bases. From Zohar Yakhini
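The reconstruction step is essentially a sort by length: the prefix of length L ends in the base whose lane (or dye color) shows a band at L. A minimal Python sketch, assuming error-free band calls; `terminated_prefix_lengths` simulates the four reactions and `reconstruct_sequence` reads them back (both names are hypothetical):

```python
def terminated_prefix_lengths(seq: str) -> dict[str, set[int]]:
    """Simulate the four reactions: prefix lengths ending in each base."""
    lanes = {b: set() for b in "ACGT"}
    for i, base in enumerate(seq, start=1):
        lanes[base].add(i)
    return lanes


def reconstruct_sequence(lanes: dict[str, set[int]]) -> str:
    """Rebuild a sequence from the lengths of its A/C/G/T-terminated prefixes."""
    position_to_base = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in position_to_base:
                raise ValueError(f"length {length} reported in two lanes")
            position_to_base[length] = base
    n = max(position_to_base, default=0)
    missing = [i for i in range(1, n + 1) if i not in position_to_base]
    if missing:
        raise ValueError(f"no band for prefix lengths {missing}")
    # Reading the bands from shortest to longest spells out the sequence.
    return "".join(position_to_base[i] for i in range(1, n + 1))


lanes = terminated_prefix_lengths("AATGCTTAGTC")
print(sorted(lanes["T"]))            # [3, 6, 7, 10]
print(reconstruct_sequence(lanes))   # AATGCTTAGTC
```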

  12. Central Dogma of Molecular Biology (reminder) • Gene (DNA) → transcription → mRNA → translation → protein. • Cells express different subsets of their genes in different tissues and under different conditions. From Zohar Yakhini

  13. Expression Profiling on MicroArrays • Differentially label the query sample and the control (1-3). • Mix and hybridize to an array. • Analyze the image to obtain expression levels information. From Zohar Yakhini
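Once the image is quantified, each spot yields one intensity per dye, and expression is typically reported as a ratio. A minimal Python sketch, assuming intensities are already background-corrected and that the query is in one channel and the control in the other; the `floor` clipping value is an illustrative assumption. The log2 scale makes a 2-fold increase (+1) and a 2-fold decrease (-1) symmetric:

```python
import math


def log2_ratios(query: list[float], control: list[float], floor: float = 1.0) -> list[float]:
    """Per-spot log2(query / control) from two-channel intensities.

    Values below `floor` are clipped to `floor` (hypothetical choice) to
    avoid division by zero and log(0).
    """
    return [math.log2(max(q, floor) / max(c, floor)) for q, c in zip(query, control)]


query = [800.0, 200.0, 400.0]
control = [200.0, 800.0, 400.0]
print(log2_ratios(query, control))  # [2.0, -2.0, 0.0]
```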

  14. Microarray: 2 Types of Fabrication • cDNA Arrays: Deposition of DNA fragments • Deposition of PCR-amplified cDNA clones • Printing of already synthesized oligonucleotides • Oligo Arrays: In Situ synthesis • Photolithography • Ink Jet Printing • Electrochemical Synthesis By Steve Hookway lecture and Sorin Draghici’s book “Data Analysis Tools for DNA Microarrays”

  15. cDNA Microarrays vs. Oligonucleotide Probes and Cost By Steve Hookway lecture and Sorin Draghici’s book “Data Analysis Tools for DNA Microarrays”

  16. Photolithography (Affymetrix) • Similar to the process used to generate VLSI circuits. • Photolithographic masks are used to add each base. • If a base is to be added at a pixel, there is a “hole” at that pixel in the corresponding mask. • Can create high-density arrays, but sequence length is limited. [Figure: photodeprotection mask] From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici
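The mask logic can be simulated. This Python sketch assumes a fixed, repeating A, C, G, T deposition order and greedy synthesis (each pixel takes its next needed base as soon as that base's step comes up); actual Affymetrix mask planning may differ, and `photolithography_masks` is a hypothetical name. Each step's mask is the set of pixels left uncovered:

```python
from itertools import cycle


def photolithography_masks(probes: list[str], order: str = "ACGT") -> list[tuple[str, set[int]]]:
    """Plan one mask per synthesis step under a repeating deposition order.

    Returns (base, holes) per step, where holes are the pixels exposed to
    light, i.e. the pixels whose probe needs `base` next.
    """
    if any(base not in order for p in probes for base in p):
        raise ValueError("probe uses a base missing from the deposition order")
    progress = [0] * len(probes)  # bases synthesized so far at each pixel
    masks: list[tuple[str, set[int]]] = []
    for base in cycle(order):
        if all(done == len(p) for done, p in zip(progress, probes)):
            break  # every probe is complete
        holes = {i for i, p in enumerate(probes)
                 if progress[i] < len(p) and p[progress[i]] == base}
        for i in holes:
            progress[i] += 1
        masks.append((base, holes))
    return masks


for step, (base, holes) in enumerate(photolithography_masks(["AC", "CA"]), start=1):
    print(step, base, sorted(holes))
# 1 A [0]
# 2 C [0, 1]
# 3 G []
# 4 T []
# 5 A [1]
```

Under this cyclic order every base costs at most four steps, so a 25-mer can need up to 100 masks, which illustrates why sequence length is limited.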

  17. Photolithography (Affymetrix) From Zohar Yakhini

  18. Ink Jet Printing • Four cartridges are loaded with the four nucleotides: A, G, C, T. • As the printer head moves across the array, the nucleotides are deposited in the pixels where they are needed. • This way, (many copies of) a 20-60 base long oligo is deposited in each pixel. By Steve Hookway lecture and Sorin Draghici’s book “Data Analysis Tools for DNA Microarrays”

  19. Ink Jet Printing (Agilent) • The array is a stack of images in the colors A, C, G, T. … From Zohar Yakhini
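The "stack of images" view translates directly into code. In this Python sketch (function name hypothetical), layer k is the image printed in pass k: each pixel gets the k-th base of its probe, or "." once its probe is complete. Because the print head carries all four nucleotides, the number of passes equals the length of the longest probe:

```python
def inkjet_layers(probes: list[str]) -> list[list[str]]:
    """Split probes into synthesis layers, one image per base position.

    layers[k][i] is the nucleotide deposited at pixel i in pass k,
    or "." if probe i is already complete.
    """
    depth = max((len(p) for p in probes), default=0)
    return [[p[k] if k < len(p) else "." for p in probes] for k in range(depth)]


probes = ["ACGT", "TTA", "GCA"]
for k, layer in enumerate(inkjet_layers(probes), start=1):
    print(k, "".join(layer))
# 1 ATG
# 2 CTC
# 3 GAA
# 4 T..
```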

  20. Inkjet Printed Microarrays • Inkjet head, squirting phosphoramidites. From Zohar Yakhini

  21. Electrochemical Synthesis • Electrodes are embedded in the substrate to manage individual reaction sites • Electrodes are activated in necessary positions in a predetermined sequence that allows the sequences to be constructed base by base • Solutions containing specific bases are washed over the substrate while the electrodes are activated From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici

  22. Preparation of Samples • Use oligo(dT) on a separation column to extract mRNA from total cell populations. • Use an oligo(dT)-primed polymerase to reverse transcribe the RNA into fluorescently labeled cDNA (RNA is unstable because of RNA-digesting enzymes in the environment). • Alternatively, use random priming for this purpose, generating a population of transcript subsequences. From Zohar Yakhini

  24. Expression Profiling: a FLASH Demo URL: http://www.bio.davidson.edu/courses/genomics/chip/chip.html

  25. Expression Profiling – Probe Design Issues • Probe specificity and sensitivity. • Special designs for splice variations or other custom purposes. • Flat thermodynamics (all probes behaving similarly under the same hybridization conditions). • Generic and universal systems. From Zohar Yakhini
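"Flat thermodynamics" asks that the probes behave alike under one set of hybridization conditions. As an illustration only, this Python sketch swaps in the Wallace rule, a rough rule of thumb for short oligos (Tm ≈ 2°C per A/T plus 4°C per G/C), and keeps candidates whose estimated melting temperature falls in a window. The rule, the window, and the function names are assumptions, not the lecture's design method:

```python
def wallace_tm(probe: str) -> int:
    """Rough melting temperature (deg C): 2*(A+T) + 4*(G+C).

    Only a rule of thumb for short oligos; real designs use
    nearest-neighbor thermodynamic models.
    """
    at = sum(probe.count(b) for b in "AT")
    gc = sum(probe.count(b) for b in "GC")
    return 2 * at + 4 * gc


def flat_probe_set(candidates: list[str], target_tm: int, tolerance: int) -> list[str]:
    """Keep candidates whose estimated Tm is within +/- tolerance of target_tm."""
    return [p for p in candidates if abs(wallace_tm(p) - target_tm) <= tolerance]


candidates = ["ATATATATATATAT", "ATGCATGCATGCAT", "GCGCGCGCGCGCGC"]
print([wallace_tm(p) for p in candidates])                    # [28, 40, 56]
print(flat_probe_set(candidates, target_tm=42, tolerance=4))  # ['ATGCATGCATGCAT']
```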

  26. Hybridization Probes • Sensitivity: strong interaction between the probe and its intended target under the assay's conditions. How much target is needed for the reaction to be detectable or quantifiable? • Specificity: no potential cross-hybridization. From Zohar Yakhini

  27. Specificity • Symbolic specificity • Statistical protection in the unknown part of the genome. Methods, software and application in collaboration with Peter Webb, Doron Lipson. From Zohar Yakhini
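One reading of symbolic specificity is a purely sequence-based check: the probe's intended binding site should be the only place in the known sequence that matches it within a few mismatches. A minimal Python sketch under that assumption (Hamming distance, one strand, no gaps; all names hypothetical). The statistical protection for the unknown part of the genome is not modeled:

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def near_match_sites(site: str, genome: str, max_mismatches: int) -> list[int]:
    """Start positions of genome windows within max_mismatches of site."""
    k = len(site)
    return [i for i in range(len(genome) - k + 1)
            if hamming(site, genome[i:i + k]) <= max_mismatches]


def is_symbolically_specific(site: str, genome: str, max_mismatches: int = 1) -> bool:
    """Specific if exactly one window (the intended one) is a near match."""
    return len(near_match_sites(site, genome, max_mismatches)) == 1


genome = "CCGATTACAGGTTGATTCCA"
print(near_match_sites("GATTACA", genome, 1))          # [2, 13]
print(is_symbolically_specific("GATTACA", genome, 1))  # False: one-mismatch site at 13
```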

  28. Reading Results: Color Coding • Numeric tables are difficult to read • Data is presented with a color scale • Coding scheme: • Green = repressed (less mRNA) gene in experiment • Red = induced (more mRNA) gene in experiment • Black = no change (1:1 ratio) • Or: • Green = control condition (e.g. aerobic) • Red = experimental condition (e.g. anaerobic) • We usually use the ratio. Campbell & Heyer, 2003
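The red/green/black scheme can be sketched as a function from a ratio (experiment / control) to an RGB color. This Python version assumes a log2 scale, so ratios 2 and 0.5 are equally bright, and a hypothetical `saturation` parameter beyond which the color stops getting brighter:

```python
import math


def ratio_to_rgb(ratio: float, saturation: float = 2.0) -> tuple[int, int, int]:
    """Map an expression ratio to (R, G, B).

    log2(ratio) > 0 -> red (induced), < 0 -> green (repressed), 0 -> black.
    Brightness saturates at |log2(ratio)| = saturation.
    """
    log_ratio = math.log2(ratio)
    level = min(abs(log_ratio) / saturation, 1.0)
    intensity = round(255 * level)
    if log_ratio > 0:
        return (intensity, 0, 0)
    return (0, intensity, 0)


for r in (4.0, 2.0, 1.0, 0.5, 0.125):
    print(r, ratio_to_rgb(r))
# 4.0 (255, 0, 0)
# 2.0 (128, 0, 0)
# 1.0 (0, 0, 0)
# 0.5 (0, 128, 0)
# 0.125 (0, 255, 0)
```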

  29. Thermal Ink Jet Arrays, by Agilent Technologies • In situ synthesized oligonucleotide array: 25-60-mers. • cDNA array: inkjet deposition.

  30. Application of Microarrays • We only know the function of about 30% of the 30,000 genes in the Human Genome • Gene exploration • Functional Genomics • First among many high throughput genomic devices http://www.gene-chips.com/sample1.html By Steve Hookway lecture and Sorin Draghici’s book “Data Analysis Tools for DNA Microarrays”

  31. A Data Mining Problem • On a given microarray, we test on the order of 10k elements at a time. • The number of microarrays used in a typical experiment is no more than 100. • Insufficient sampling. • Data is obtained faster than it can be processed. • High noise. • Algorithmic approaches are needed to work through this large data set and make sense of the data.

  32. Informative Genes in a Two-Class Experiment • Differentially expressed in the two classes. • Identifying (statistically significant) informative genes: • Provides biological insight • Indicates promising research directions • Reduces data dimensionality • Diagnostic assays From Zohar Yakhini

  33. Scoring Genes • Informative genes (annotation after sorting by expression), e.g.: + + + + + + + + - - - - - - - ; + + + + + + + + - - - - + - - ; etc. • Non-informative genes, e.g.: + - + - + + + + - - + + - - - ; etc. • Expression pattern and pathological diagnosis information (annotation) for a single gene, samples a1 … a15: + + - - + + + - - + - - + + - • Permute the annotation by sorting the expression pattern (ascending, say). From Zohar Yakhini

  34. Threshold Error Rate (TNoM) Score • Find the threshold that best separates tumors from normals, and count the number of errors committed there. • # of errors = min(7,8) = 7. • Ex 1: - + + - + - - + + - + + - - + • Ex 2: + + + + + + + + - - - - - - - : a perfect single-gene classifier gets a score of 0. From Zohar Yakhini
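A minimal Python sketch of the threshold score, assuming thresholds are tried between every pair of neighboring samples (and at both ends) and that either class may sit below the threshold; `tnom_sorted` and `tnom` are hypothetical names. With 8 "+" and 7 "-" labels, the score can never exceed 7, since putting every sample in one class already achieves that:

```python
def tnom_sorted(labels: list[str]) -> int:
    """Fewest misclassifications over all thresholds and both orientations.

    `labels` must already be sorted by the gene's expression level.
    """
    n_plus = labels.count("+")
    n_minus = len(labels) - n_plus
    best = min(n_plus, n_minus)  # threshold at an end: all samples in one class
    plus_left = minus_left = 0
    for label in labels:
        if label == "+":
            plus_left += 1
        else:
            minus_left += 1
        errors_1 = plus_left + (n_minus - minus_left)  # left "-", right "+"
        errors_2 = minus_left + (n_plus - plus_left)   # left "+", right "-"
        best = min(best, errors_1, errors_2)
    return best


def tnom(expression: list[float], labels: list[str]) -> int:
    """Sort the annotation by expression (ascending), then score it."""
    order = sorted(range(len(labels)), key=lambda i: expression[i])
    return tnom_sorted([labels[i] for i in order])


print(tnom_sorted("- + + - + - - + + - + + - - +".split()))  # Ex 1 -> 6
print(tnom_sorted("+ + + + + + + + - - - - - - -".split()))  # Ex 2 -> 0
```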

  35. p-Values • Relevance scores are more useful when we can compute their significance: • p-value: The probability of finding a gene with a given score if the labeling is random • p-Values allow for higher level statistical assessment of data quality. • p-Values provide a uniform platform for comparing relevance, across data sets. • p-Values enable class discovery From Zohar Yakhini
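The p-value definition above (probability of a score this good when the labeling is random) can be estimated by shuffling the labels. This Python sketch uses Monte Carlo permutations rather than an exact computation, redefines the TNoM score so the block is self-contained, and treats lower scores as better; the number of permutations and the seed are arbitrary choices:

```python
import random


def tnom_sorted(labels):
    """Fewest misclassifications over all thresholds and both orientations."""
    n_plus = labels.count("+")
    n_minus = len(labels) - n_plus
    best = min(n_plus, n_minus)
    plus_left = minus_left = 0
    for label in labels:
        if label == "+":
            plus_left += 1
        else:
            minus_left += 1
        best = min(best,
                   plus_left + n_minus - minus_left,
                   minus_left + n_plus - plus_left)
    return best


def permutation_p_value(sorted_labels, n_permutations=10_000, seed=0):
    """Estimate P(score <= observed) under random labeling.

    The +1 terms keep the estimate strictly positive.
    """
    rng = random.Random(seed)
    observed = tnom_sorted(sorted_labels)
    shuffled = list(sorted_labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if tnom_sorted(shuffled) <= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


print(permutation_p_value("+ + + + + + + + - - - - - - -".split()))
print(permutation_p_value("- + + - + - - + + - + + - - +".split()))
```

The perfect separator gets a tiny p-value, while a score of 6 out of a possible 7 is what random labels typically produce.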

  36. BRCA1 Differential Expression • Genes over-expressed in BRCA1 wildtype vs. genes over-expressed in BRCA1 mutants (samples: BRCA1 mutants, BRCA1 wildtype). • Sporadic sample s14321 has a BRCA1-mutant expression profile. • Collaboration with NIH (NEJM 2001). From Zohar Yakhini

  37. Data Analysis: Leave One Out Cross Validation (LOOCV) • Repeat, for each tissue (tumor/normal): • “Hide” the label of the test tissue • Diagnose the test tissue based on the remaining data • Compare the diagnosis to the hidden label • Perform this using different gene subset sizes, toward small, efficient diagnostic assays. From Zohar Yakhini
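The LOOCV loop is independent of the classifier inside it. This Python sketch plugs in a nearest-centroid classifier purely for illustration (it is not necessarily the classifier used in the study), and the optional `genes` argument stands in for a candidate gene subset; all names are hypothetical:

```python
import math


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean (centroid) is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))


def loocv_accuracy(samples, labels, genes=None):
    """Leave-one-out accuracy; `genes` optionally selects column indices."""
    if genes is not None:
        samples = [[row[g] for g in genes] for row in samples]
    correct = 0
    for i in range(len(samples)):
        # "Hide" tissue i and train on all the others.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        prediction = nearest_centroid_predict(train_x, train_y, samples[i])
        if prediction == labels[i]:  # compare to the hidden label
            correct += 1
    return correct / len(samples)


samples = [[5.0, 1.0], [6.0, 1.2], [5.5, 0.8], [1.0, 4.0], [1.2, 5.0], [0.8, 4.5]]
labels = ["tumor"] * 3 + ["normal"] * 3
print(loocv_accuracy(samples, labels))             # 1.0
print(loocv_accuracy(samples, labels, genes=[0]))  # 1.0
```

One caveat: if the informative genes are chosen using all samples, including the hidden one, the LOOCV estimate is optimistic; gene selection belongs inside the loop.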

  38. BRCA1 LOOCV Results • 95% success rate (21/22). • Sporadic tissue (14321) consistently classified as BRCA1-mutant. • The BRCA1 gene is normal, but silenced in the patient’s DNA. From Zohar Yakhini

  39. Lung Cancer Informative Genes Data from Naftali Kaminski’s lab, at Sheba. • 24 tumors (various types and origins) • 10 normals (normal edges and normal lung pools) From Zohar Yakhini

  40. And Now: Global Analysis of Gene Expression Data • First (but not least): clustering, either of genes or of experiments.

  41. Example data: fold change (ratios) What is the pattern? Campbell & Heyer, 2003

  42. Example data 2 Campbell & Heyer, 2003

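  43. Pearson Correlation Coefficient, r • Values lie in the interval [-1, 1]. • Gene expression over d experiments is a vector in R^d, e.g. for gene C: (0, 3, 3.58, 4, 3.58, 3). • Given two vectors X and Y that contain N elements, we calculate r as follows:
$$ r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}} $$
Cho & Won, 2003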

  44. Example: Pearson Correlation Coefficient, r • X = Gene C = (0, 3.00, 3.58, 4, 3.58, 3); Y = Gene D = (0, 1.58, 2.00, 2, 1.58, 1) • ∑XY = (0)(0)+(3)(1.58)+(3.58)(2)+(4)(2)+(3.58)(1.58)+(3)(1) = 28.5564 • ∑X = 3+3.58+4+3.58+3 = 17.16 • ∑X² = 3²+3.58²+4²+3.58²+3² = 59.6328 • ∑Y = 1.58+2+2+1.58+1 = 8.16 • ∑Y² = 1.58²+2²+2²+1.58²+1² = 13.9928 • N = 6 • ∑XY − ∑X∑Y/N = 28.5564 − (17.16)(8.16)/6 = 5.2188 • ∑X² − (∑X)²/N = 59.6328 − (17.16)²/6 = 10.5552 • ∑Y² − (∑Y)²/N = 13.9928 − (8.16)²/6 = 2.8952 • r = 5.2188 / sqrt((10.5552)(2.8952)) = 0.944
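The same computational formula in Python, reproducing the hand calculation (function name hypothetical). The formula is undefined when either gene is constant across experiments, since the denominator is then zero:

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation via the computational (sum-based) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


gene_c = [0, 3.00, 3.58, 4, 3.58, 3]
gene_d = [0, 1.58, 2.00, 2, 1.58, 1]
print(round(pearson_r(gene_c, gene_d), 3))  # 0.944
```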

  45. Example data: Pearson correlation coefficients Campbell & Heyer, 2003

  46. Example: Reorganization of data Campbell & Heyer, 2003

  47. Spearman Rank Order Coefficient • Replace each entry xi by its rank in vector x. • Then compute the Pearson correlation coefficient of the rank vectors. • Example: X = Gene C = (0, 3.00, 3.41, 4, 3.58, 3.01); Y = Gene D = (0, 1.51, 2.00, 2.32, 1.58, 1) • Ranks(X) = (1,2,4,6,5,3) • Ranks(Y) = (1,3,5,6,4,2) • Ties must be handled: (1) they are rare; (2) break them randomly (small effect).
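A Python sketch of the Spearman recipe on this slide: rank each vector, then take the Pearson correlation of the ranks. Ties are broken randomly, following the slide's suggestion; the seed is an arbitrary assumption, and a common alternative is to give tied entries their average rank:

```python
import math
import random


def ranks(values, rng=None):
    """1-based ranks; ties are broken by a random secondary key."""
    if rng is None:
        rng = random.Random(0)
    keys = [(v, rng.random()) for v in values]
    order = sorted(range(len(values)), key=lambda i: keys[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def pearson_r(x, y):
    """Pearson correlation (mean-centered form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y, seed=0):
    """Pearson correlation of the rank vectors."""
    rng = random.Random(seed)
    return pearson_r(ranks(x, rng), ranks(y, rng))


gene_c = [0, 3.00, 3.41, 4, 3.58, 3.01]
gene_d = [0, 1.51, 2.00, 2.32, 1.58, 1]
print(ranks(gene_c), ranks(gene_d))        # [1, 2, 4, 6, 5, 3] [1, 3, 5, 6, 4, 2]
print(round(spearman_rho(gene_c, gene_d), 4))  # 0.8857
```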

  48. Grouping and Reduction • Grouping: Partition items into groups. Items in same group should be similar. Items in different groups should be dissimilar. • Grouping may help discover patterns in the data. • Reduction: reduce the complexity of data by removing redundant probes (genes).

  49. Unsupervised Grouping: Clustering • Pattern discovery via clustering similarly expressed genes together • Techniques most often used: • k-Means Clustering • Hierarchical Clustering • Biclustering • Alternative methods: Self Organizing Maps (SOMs), plaid models, singular value decomposition (SVD), order preserving submatrices (OPSM), …

  50. Clustering Overview • Different similarity measures in use: • Pearson Correlation Coefficient • Cosine Coefficient • Euclidean Distance • Information Gain • Mutual Information • Signal to Noise Ratio • Simple Matching (for nominal data)
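To compare a few of the listed measures on the same pair of genes, this Python sketch computes the cosine coefficient and Euclidean distance for genes C and D from the Pearson example (function names hypothetical). Cosine is Pearson without mean-centering; Euclidean distance depends on absolute levels, so two strongly correlated genes (r ≈ 0.944 here) can still be far apart:

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two expression vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def euclidean_distance(x, y):
    """Straight-line distance; sensitive to absolute expression levels."""
    return math.dist(x, y)


gene_c = [0, 3.00, 3.58, 4, 3.58, 3]
gene_d = [0, 1.58, 2.00, 2, 1.58, 1]
print(round(cosine_similarity(gene_c, gene_d), 3))
print(round(euclidean_distance(gene_c, gene_d), 3))
```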
