
Beyond the Human Genome: Transcriptomics

Beyond the Human Genome: Transcriptomics. Dr Jen Taylor Henry Wellcome Centre for Gene Function Bioinformatics Department of Statistics taylor@stats.ox.ac.uk. Beyond the Human Genome: 1995 Human Genome sequencing begins in earnest “Mapping the Book of Life” 1999 Human Genome


Presentation Transcript


  1. Beyond the Human Genome: Transcriptomics. Dr Jen Taylor, Henry Wellcome Centre for Gene Function, Bioinformatics, Department of Statistics. taylor@stats.ox.ac.uk

  2. Beyond the Human Genome:
     • 1995: Human Genome sequencing begins in earnest (“Mapping the Book of Life”)
     • 1999: Human Genome
     • 2000: First Draft Human Genome
     • 2003: Essential Completion
     • Human Genome = approx. 140,000 genes = 30,000 – 40,000 genes ?? = 24,195 genes !!!???
     • Image: Commemorative stained glass window for F.C. Crick, designed by Maria McClafferty (photograph: Paul Forster). Gonville & Caius College, Cambridge, UK.

  3. Beyond the Human Genome: Gene Number ≠ Complexity. Diagram labels: Gene, Transcriptome, Regulation, Complexity.

  4. Introduction: The scope of transcriptomics – a definition of the transcriptome. Part I: Observing the transcriptome (Experimental methodology; Data analysis). Part II: Using the transcriptome (The regulation of the transcriptome; The transcriptome and the genome; The transcriptome and the proteome). Beyond the Human Transcriptome.

  5. Transcriptome: “transcriptome, the mRNAs expressed by a genome at any given time..” (Abbott, 1999)

  6. Central Dogma of Molecular Biology
     • mRNA: single-stranded RNA molecule
     • Complementary to DNA
     • Processed (spliced and polyadenylated) RNA transcript
     • Carries the sequence of a gene out of the nucleus into the cytoplasm, where it can be translated into a protein
     • Image: Access Excellence, National Institutes of Health

  7. Transcriptome: An evolving definition
     • (the population of) mRNAs expressed by a genome at any given time (Abbott, 1999)
     • The complete collection of transcribed elements of the genome (Affymetrix, 2004)
     • mRNAs: 35,913 transcripts (including alternatively spliced variants)
     • Non-coding RNAs:
       • tRNAs (497 genes)
       • rRNAs (243 genes)
       • snmRNAs (small non-messenger RNAs)
       • microRNAs and siRNAs (small interfering RNAs)
       • snoRNAs (small nucleolar RNAs)
       • snRNAs (small nuclear RNAs)
     • Pseudogenes (~2,000)

  8. The human transcriptome
     • High-density oligonucleotide arrays across 11 different cell lines
     • ~70% of transcripts non-coding
     • ~79-88% have multiple transcripts
     • Kapranov et al., 2002
     • ~90% of transcribed nucleotides outside annotated exons
     • The dimensions of the unique transcriptome?? >>> current 40,000 estimate
     • Kampa et al., Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Research, 2004.

  9. Transcriptomics
     Scope:
     • the population of functional RNA transcripts
     • the mechanisms that regulate the production of RNA transcripts
     • dynamics of the transcriptome (time, cell type, genotype, external stimuli)
     Definition: The study of the characteristics and regulation of the functional RNA transcript population of a cell, cells or organism at a specific time.

  10. Introduction: The scope of transcriptomics – a definition of the transcriptome. Part I: Observing the transcriptome (Experimental methodology; Data analysis). Part II: Using the transcriptome (The regulation of the transcriptome; The transcriptome and the genome; The transcriptome and the proteome). Beyond the Human Transcriptome.

  11. Observing the transcriptome. Diagram levels: Genome, Transcriptome, Proteome. Diagram annotations: High-throughput friendly; Predicts Biology**; Regulatory network; Context dependent and dynamic. **Li et al., 2004

  12. Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Stanford University Medical Center, CA. “The challenge is no longer in the expression arrays themselves, but in developing experimental designs to exploit the full power of a global perspective.” (Eric Lander) Figure: Publications, Expression Profiling vs Proteomics. Data from PubMed.

  13. Observing the transcriptome? Classic Human Transcriptome Profiling Studies: Transcriptome reflects Biology
     • Golub et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999. ALL: acute lymphoblastic leukemia; AML: acute myeloid leukemia
     • Scherf et al., A gene expression database for the molecular pharmacology of cancer. Nature Genetics 2000. 60 human cancer cell lines

  14. Observing the transcriptome
     • Focussed Experimental Approaches:
       • Northern Blotting Analysis
       • Real-time PCR (quantitative or semi-quantitative)
     • High-throughput Approaches:
       • Closed System Profiling: Microarray expression profiling
       • Open System Profiling: Serial analysis of gene expression (SAGE); Massively Parallel Signature Sequencing (MPSS)

  15. Red – increase of Cy5 sample transcripts Green – increase of Cy3 sample transcripts Yellow – equal abundance Limit of Detection: 1 in 30,000 transcripts ~ 20 transcripts/cell
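The colour coding on this slide can be sketched as a log-ratio rule. This is only an illustration: the two-fold threshold, the function name `spot_colour` and the intensities are assumptions, not values from the slides.

```python
import math

def spot_colour(cy5, cy3, threshold=1.0):
    """Classify one spot from background-corrected Cy5 and Cy3 intensities.

    threshold is a hypothetical cut-off on |log2(Cy5/Cy3)|; 1.0 means two-fold.
    """
    log_ratio = math.log2(cy5 / cy3)
    if log_ratio >= threshold:
        return "red", log_ratio      # more transcript in the Cy5 sample
    if log_ratio <= -threshold:
        return "green", log_ratio    # more transcript in the Cy3 sample
    return "yellow", log_ratio       # roughly equal abundance

# Invented intensities for three spots.
for cy5, cy3 in [(4000.0, 1000.0), (500.0, 2000.0), (1500.0, 1500.0)]:
    colour, log_ratio = spot_colour(cy5, cy3)
    print(f"Cy5={cy5:.0f} Cy3={cy3:.0f} log2 ratio={log_ratio:+.2f} {colour}")
```

Working on the log2 scale makes a two-fold increase and a two-fold decrease symmetric around zero, which is why ratios are usually log-transformed before analysis.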

  16. Experimental overview: (1) Cell population A and Cell population B; (2) RNA extraction; (3) Reverse transcription; (4) Klenow label incorporation (Sample A labelled with Cy5 dye, Sample B labelled with Cy3 dye); (5) Hybridisation; (6) Washing; (7) Scan Cy5 channel and scan Cy3 channel; (8) “Overlay images”; (9) Quantify pixel intensities.

  17. Red – increase of Cy5 sample transcripts Green – increase of Cy3 sample transcripts Yellow – equal abundance Limit of Detection: 1 in 30,000 transcripts ~ 20 transcripts/cell

  18. Platforms and Formats
     • Isotope: Nylon; cDNA (300-900 nt)
     • Two-colour: Glass; cDNA or Oligo (80 nt); 500 – 11,000 elements
     • Affymetrix: Silicon; oligo (20 nt); 22,000 elements
     • Tissue Arrays: Glass; Tissue Discs (20-150)

  19. Affymetrix GeneChip®. Limit of detection: 1 in 100,000 transcripts, ~5 transcripts/cell

  20. http://www.affymetrix.com

  21. Affymetrix: Gene Expression Arrays (Transcripts/Genes)
     • Arabidopsis Genome: 24,000
     • C. elegans Genome: 22,500
     • Drosophila Genome: 18,500
     • E. coli Genome: 20,366
     • Human Genome U133 Plus: 47,000
     • Mouse Genome: 39,000
     • Yeast Genome: 5,841 (S. cerevisiae) & 5,031 (S. pombe)
     • Rat Genome: 30,000
     • Zebrafish: 14,900
     • Plasmodium/Anopheles: 4,300 (P. falciparum) & 14,900 (A. gambiae)
     • Barley (25,500), Soybean (37,500 + 23,300 pathogen), Grape (15,700)
     • Canine (21,700), Bovine (23,000)
     • B. subtilis (5,000), S. aureus (3,300 ORFs), Xenopus (14,400)

  22. Microarray and GeneChip Approaches
     Advantages:
     • Rapid
     • Method and data analysis well described and supported
     • Robust
     • Convenient for directed and focussed studies
     Disadvantages:
     • Closed system approach
     • Difficult to correlate with absolute transcript number
     • Sensitive to alternative splicing ambiguities

  23. Serial Analysis of Gene Expression (SAGE)
     The principles (Velculescu et al., Science 1995):
     • A transcript (new or novel) can be recognised by a small subset (e.g. 14) of its nucleotides: a tag
     • Linking tags allows for rapid sequencing
     • Open system for transcript profiling
     Modified SAGE methods:
     • LongSAGE (21 nt)
     • SAGE-lite, micro-SAGE, mini-SAGE
     • RASL/DASL methods (5’ and 3’ Tags)
     Diagram: a 14 nt TAG is taken from each polyadenylated transcript (TAG AAAAAAAAA – 3’); the tags are linked (TAG TAG TAG TAG) and sequenced, e.g. AGCTTGAACCGTGACATCATGGCCATTGGCCCCAATTGAGACAGTGAGTTCAATGC
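The tag idea can be sketched in a few lines. This is a simplification under stated assumptions: the anchoring site is taken to be CATG (the NlaIII site used in the original SAGE protocol), a 14 nt tag is read as CATG plus the 10 bases downstream, and the three transcript sequences are invented for illustration.

```python
from collections import Counter

ANCHOR = "CATG"    # assumed anchoring-enzyme recognition site
TAG_LENGTH = 10    # bases read downstream of the anchor (CATG + 10 = 14 nt)

def sage_tag(transcript):
    """Return the tag next to the 3'-most anchor site, or None if there is none."""
    pos = transcript.rfind(ANCHOR)
    if pos == -1:
        return None                      # transcript lacks the recognition site
    start = pos + len(ANCHOR)
    tag = transcript[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None

# Hypothetical transcript sequences (not real genes).
SEQ_A = "AAACATGTTTTTGGGGGAAAA"
SEQ_B = "CCCATGACGTACGTACAAAA"
SEQ_C = "GGGGTTTTAAAA"                   # no CATG site, so invisible to SAGE

# A toy pool of transcript molecules: three copies of A, one each of B and C.
pool = [SEQ_A, SEQ_A, SEQ_A, SEQ_B, SEQ_C]
tag_counts = Counter(tag for tag in map(sage_tag, pool) if tag is not None)
print(tag_counts)
```

Counting identical tags gives the digital, count-based output that distinguishes SAGE from hybridisation intensities, and the transcript without a site shows one of the blind spots of the method.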

  24. SAGE
     Advantages:
     • Potential ‘open’ system method: new transcripts can be identified
     • Accuracy of unambiguous transcript observation
     • Digital output of data
     • Quantitative and qualitative information
     Disadvantages:
     • Characterising novel transcripts is often computationally difficult from short tag sequences
     • Tag specificity (recently increased length to 21 bp)
     • Length of tags can vary (RE enzyme activity variable with temperature)
     • A subset of transcripts do not contain the enzyme recognition sequence
     • Sensitive to a subset of alternative splice variants

  25. Introduction: The scope of transcriptomics – a definition of the transcriptome. Part I: Observing the transcriptome (Experimental methodology; Data analysis). Part II: Using the transcriptome (The regulation of the transcriptome; The transcriptome and the genome; The transcriptome and the proteome). Beyond the Human Transcriptome.

  26. Analysis workflow: (1) Biological question; (2) Experimental design (Sample Attributes, Platform Choice); (3) Microarray experiment; (4) 16-bit TIFF files; (5) Image analysis, giving (Rspot, Rbkg), (Gspot, Gbkg); (6) Normalization; (7) Statistical Analysis, Clustering, Data Mining, Pattern Discovery, Classification; (8) Biological verification and interpretation.
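The step from image-analysis output to normalized values can be sketched as follows. The slides name the quantities (Rspot, Rbkg), (Gspot, Gbkg) but not a normalization method, so the global median centring used here is an assumption chosen for simplicity, and the spot values are invented.

```python
import math
from statistics import median

def normalized_log_ratios(spots):
    """spots: list of (Rspot, Rbkg, Gspot, Gbkg) tuples from image analysis.

    Returns median-centred log2(R/G) values; spots whose background-corrected
    signal is not positive in both channels are returned as None.
    """
    raw = []
    for r_spot, r_bkg, g_spot, g_bkg in spots:
        r, g = r_spot - r_bkg, g_spot - g_bkg      # background subtraction
        raw.append(math.log2(r / g) if r > 0 and g > 0 else None)
    centre = median(m for m in raw if m is not None)
    # Global median normalization: assume most transcripts are unchanged,
    # so the typical log ratio should sit at zero.
    return [None if m is None else m - centre for m in raw]

# Invented measurements; the last spot is below background in the red channel.
spots = [(2100, 100, 1100, 100), (1100, 100, 1100, 100),
         (4100, 100, 1100, 100), (90, 100, 500, 100)]
print(normalized_log_ratios(spots))
```

Intensity-dependent methods are often preferred in practice; the median shift is only the simplest correction for an overall dye or labelling imbalance.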

  27. Analysis. Liver: 47,000 x 2 x 2 = 188,000 datapoints. Brain: 47,000 x 2 x 2 = 188,000 datapoints. Lymphocyte: 47,000 x 2 x 2 = 188,000 datapoints.

  28. Analysis. Essential problem: given a large dataset with technical and biological noise, find:
     • A) Transcripts: patterns (common themes or differences), with measures of robustness or some idea of uncertainty
     • B) Samples: similarities or differences between samples on a global/multi-gene level

  29. Analysis Brain Liver Lymphocytes Which transcripts are different? What are the patterns?

  30. Biologist's Nightmare: Statistician's Playground. Characteristics of the expression profiling data:
     • High dimensionality
     • Sample number (n) low and observation number (p) high
     • Non-independence of observations
     • Complex patterns: visualisation and extraction
     • Incorporation of contextual information
     • Standardisation and data sharing
     • Integration of and with other data types

  31. Analysis Methods
     • Classical parametric & non-parametric statistical tests for hypothesis testing
     • Unsupervised clustering algorithms: Hierarchical clustering, K-means and Self-Organising Maps
     • Classification, e.g. Machine learning and Linear discriminant analysis
     • Dimensionality Reduction or Principal Component Analysis, e.g. Gene Shaving and Multi-dimensional Scaling
     • Probabilistic Modelling: Dynamic Bayesian Networks, Markov Models

  32. Analysis Methods. Classical Statistical Analysis. Tools: t-test, ANOVA, Mann-Whitney U test (non-parametric), fold change. Samples compared: Liver, Brain, Lymphocyte.
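Two of the tools listed on this slide can be sketched for a single transcript. The replicate values are invented, Welch's unequal-variance form of the t statistic is an assumption (the slides do not say which variant is meant), and no p-value is computed because that needs the t distribution, which the standard library does not provide.

```python
import math
from statistics import mean, variance

def log2_fold_change(a, b):
    """log2 of mean expression in group b over mean expression in group a."""
    return math.log2(mean(b) / mean(a))

def welch_t(a, b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / standard_error

# Invented replicate measurements of one transcript in two tissues.
liver = [100.0, 110.0, 90.0]
brain = [400.0, 440.0, 360.0]

print(f"log2 fold change: {log2_fold_change(liver, brain):.2f}")
print(f"Welch t statistic: {welch_t(liver, brain):.2f}")
```

Fold change ignores replicate variability while the t statistic scales the difference by it, so the two can rank transcripts quite differently.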

  33. Analysis Methods. Classical Parametric Statistical Analysis: at a threshold of P = 0.01, testing 20,000 transcripts gives about 200 transcripts that appear significant by chance alone. Difficulties:
     • Assumes that observations are normally distributed and independent
     • ‘Statistical significance’ does not equal biological significance
     • Appropriate multiple testing corrections are difficult
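The multiple-testing arithmetic can be made concrete. The slides do not name a correction method; the Benjamini-Hochberg false discovery rate procedure shown here is a commonly used option brought in for illustration, and the p-values are invented.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of tests passing the threshold when no effect is real."""
    return alpha * n_tests

def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of hypotheses rejected at the given false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    n_rejected = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value sits under the stepped threshold.
        if p_values[i] <= rank / m * fdr:
            n_rejected = rank
    return sorted(order[:n_rejected])

print(expected_false_positives(0.01, 20000))   # about 200 transcripts
print(0.01 / 20000)                            # Bonferroni per-test threshold

# Invented p-values for ten transcripts.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_values))
```

Bonferroni controls the chance of any false positive and is very strict at this scale; controlling the false discovery rate instead tolerates a known fraction of false calls among the reported transcripts.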

  34. Analysis Methods. Clustering Approaches:
     • Divides or groups genes/samples into groups (“clusters”), based on similarities and differences
     • Number of groups is user defined
     • Algorithms: Hierarchical clustering, K-means clustering, Self-organising maps

  35. Distance Metrics: distance between 2 expression vectors. Figure: expression profiles plotted against Time. Example values, Euclidean / Pearson (r × -1): 1.4 / -0.90; 4.2 / -1.00

  36. Distance Metric: Pearson Distance vs Euclidean Distance. Figure labels: Transcription Factor Transcript, Target Transcript 1, Target Transcript 2
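The two distance metrics can be compared on toy profiles. The expression values are invented; the Pearson distance follows the slides' convention of the correlation coefficient multiplied by -1.

```python
import math
from statistics import mean

def euclidean(x, y):
    """Straight-line distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_r(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mx) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - my) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def pearson_distance(x, y):
    return -pearson_r(x, y)        # r multiplied by -1, as on the slide

# Invented time courses: target_1 has the same shape as the factor at twice
# the level; target_2 stays at a similar level but moves the opposite way.
factor   = [1.0, 2.0, 3.0, 4.0]
target_1 = [2.0, 4.0, 6.0, 8.0]
target_2 = [4.0, 3.0, 2.0, 1.0]

for name, profile in [("target_1", target_1), ("target_2", target_2)]:
    print(name, round(euclidean(factor, profile), 2),
          round(pearson_distance(factor, profile), 2))
```

Euclidean distance places the anti-correlated profile closer because its magnitudes are similar, whereas the Pearson distance places the co-regulated profile closest because it compares shape only.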

  37. Hierarchical Clustering. Figure: transcripts g1 to g8 reordered step by step as clusters are joined. g1 is most like g8; g4 is most like {g1, g8}.

  38. Hierarchical Tree. Leaf order: g1, g8, g4, g5, g7, g2, g3, g6
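The merging process behind a hierarchical tree can be sketched as a small agglomerative procedure. The slides do not state the linkage rule or distance, so average linkage with Euclidean distance is an assumption, and the four profiles are invented (reusing the names g1, g8, g4 and g5 only as labels).

```python
import math
from itertools import combinations

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def hierarchical_clustering(profiles):
    """Return the merges, in order, as pairs of clusters (tuples of names)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters, then repeat until one cluster remains.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], profiles))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges

# Invented two-sample expression profiles.
profiles = {"g1": [1.0, 2.0], "g8": [1.1, 2.1], "g4": [1.5, 2.6], "g5": [5.0, 1.0]}
for a, b in hierarchical_clustering(profiles):
    print(a, "+", b)
```

The order of the merges is what a dendrogram draws: early merges sit near the leaves and late merges near the root.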

  39. Clustering: Case Study Sorlie et al., 2001 Breast tissue subtypes Hierarchical clustering

  40. K-means clustering: partition or centroid algorithms. Step 1: User specifies K clusters (K = 3; x markers in the figure). Figure axes: Brain Expression Level vs Liver Expression Level.

  41. K-means clustering (K = 3). Step 2: Using Euclidean distance, nearest points are assigned to clusters (k). Step 3: New centroids calculated.

  42. K-means clustering (K = 3). Step 4: Points re-assigned to nearest centroid. Step 5: New centroids calculated. Iterates until centroids don’t move.
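The K-means steps can be sketched directly. The points are invented (brain, liver) expression pairs, and the starting centroids are supplied by hand rather than chosen at random, which is a simplification.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, centroids, max_iter=100):
    """Iterate assignment and centroid update until the centroids stop moving."""
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: euclidean(p, centroids[k]))
            clusters[nearest].append(p)
        # Recalculate centroids; an empty cluster keeps its old centroid.
        new_centroids = [centroid(c) if c else centroids[k]
                         for k, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Invented (brain, liver) expression levels for nine transcripts, K = 3.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (1, 9), (2, 9), (1, 8)]
start = [(0, 0), (10, 10), (0, 10)]
final_centroids, clusters = k_means(points, start)
print(final_centroids)
```

The result depends on K and on the starting centroids, so in practice the algorithm is usually run several times from different starts.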

  43. Classification. Methods: K-nearest neighbour methods (KNN); Linear Discriminant Analysis (LDA); Machine Learning: Support Vector Machines, Neural Network Analysis. Figure axes: Transcript A vs Transcript B. Adapted from Florian Markowetz.

  44. Classification. Training Set: 2/3 of sample set. Test Set: 1/3 of sample set. Define Classification Rule: Linear Discriminant Analysis, KNN. Figure axes: Gene A vs Gene B.

  45. Classification: more complex classifiers. KNN voting scheme (k = 3): use the three closest points to classify. Figure axes: Gene A vs Gene B. Adapted from Florian Markowetz.
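The KNN voting scheme with a 2/3 training and 1/3 test split can be sketched as below. The (Gene A, Gene B) values and their class labels are invented, and the split is taken in list order rather than at random to keep the example deterministic.

```python
import math
from collections import Counter

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(sample, training, k=3):
    """Majority vote among the k training profiles closest to the sample."""
    nearest = sorted(training, key=lambda item: euclidean(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented (Gene A, Gene B) expression pairs with hypothetical class labels.
samples = [
    ((1.0, 5.0), "ALL"), ((1.2, 4.8), "ALL"), ((0.8, 5.2), "ALL"),
    ((5.0, 1.0), "AML"), ((4.8, 1.2), "AML"), ((5.2, 0.9), "AML"),
    ((1.1, 5.1), "ALL"), ((4.9, 1.1), "AML"), ((1.5, 4.5), "ALL"),
]
split = len(samples) * 2 // 3
training, test = samples[:split], samples[split:]      # 2/3 train, 1/3 test

correct = sum(knn_classify(profile, training) == label for profile, label in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Holding back a test set matters because accuracy measured on the training samples themselves overstates how well the rule generalises.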

  46. Probabilistic Modelling
     • Incorporate dependencies and prior knowledge into the identification of patterns/clusters: relationships in time between samples; relationships between genes
     • Handle measures of uncertainty well
     • Conceptually simple, consideration needed on implementation
     • Markov modelling
     • Dynamic Bayesian networks

  47. Analysis Methods
     • Classical parametric & non-parametric statistical tests for hypothesis testing
     • Unsupervised clustering algorithms: Hierarchical clustering, K-means and Self-Organising Maps
     • Classification: Machine learning and Linear discriminant analysis
     • Dimensionality Reduction or Principal Component Analysis: Gene Shaving and Multi-dimensional Scaling
     • Probabilistic Modelling and Pattern recognition: Dynamic Bayesian Networks, Markov Models

  48. Introduction: The scope of transcriptomics – a definition of the transcriptome. Part I: Observing the transcriptome (Experimental methodology; Data curation and analysis pipelines). Part II: Using the transcriptome (The regulation of the transcriptome; The transcriptome and the genome; The transcriptome and the proteome). Beyond the Human Transcriptome.

  49. …. to be continued.

  50. Introduction: The scope of transcriptomics – a definition of the transcriptome. Part I: Observing the transcriptome (Experimental methodology; Data curation and analysis pipelines). Part II: Using the transcriptome (The regulation of the transcriptome; The transcriptome and the genome; The transcriptome and the proteome). Beyond the Human Transcriptome.
