
Mapping Transcription Mechanisms from Multimodal Genomic Data

Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni. Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School. March 10, 2010.


Presentation Transcript


  1. Mapping Transcription Mechanisms from Multimodal Genomic Data • Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni • Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School • March 10, 2010

  2. Information Flow in Multimodal Genomic Data • Genetic variants: • 100k–1000k SNPs • 250k copy number variations (CNVs) • 250k methylation measurements • Information flows from the genetic variants to the transcripts. • Transcripts: • 50k mRNA expression levels • 50k microRNA expression levels • 1.5M exon expression / splicing measurements

  3. Expression Quantitative Trait Loci (eQTLs) • The connection from a variant to an expression level is an information channel. • A DNA locus that modulates the expression level of a gene is an eQTL. • Cis (trans) eQTLs are genetic variants located close to (far away from) the genes they modulate. • Identifying cis-eQTLs is easier: focusing on cis-eQTLs reduces the search space. • But what about trans-eQTLs?
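The cis/trans distinction is a distance rule. A minimal sketch, assuming a gene is represented by a single coordinate (for example its start) and a hypothetical cis window of 1 Mb; the slides give no window size, and `Locus`, `classify_eqtl` and `CIS_WINDOW_BP` are illustrative names only:

```python
from dataclasses import dataclass

# Hypothetical cis window: the slides do not specify a distance; 1 Mb is an assumption.
CIS_WINDOW_BP = 1_000_000


@dataclass
class Locus:
    chrom: str  # chromosome name, e.g. "21"
    pos: int    # base-pair coordinate (for a gene: one representative coordinate, an assumption)


def classify_eqtl(snp: Locus, gene: Locus, window: int = CIS_WINDOW_BP) -> str:
    """Label a SNP-gene association 'cis' if the SNP lies near the gene, else 'trans'."""
    if snp.chrom == gene.chrom and abs(snp.pos - gene.pos) <= window:
        return "cis"
    return "trans"


# Made-up coordinates, for illustration only.
print(classify_eqtl(Locus("21", 15_000_000), Locus("21", 15_400_000)))  # nearby on chr21
print(classify_eqtl(Locus("21", 15_000_000), Locus("21", 45_000_000)))  # far away on chr21
```

Restricting a scan to pairs that pass the cis test is what shrinks the search space. Trans pairs are everything else, which is why they are harder to find.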

  4. Clinical Study on Pediatric Leukemia • Cancer arises from genetic modification (variants) and cellular malfunction (gene expression). • Identifying eQTLs helps us understand molecular mechanisms in cancer and provides biological insight. • Clinical study of acute lymphoblastic leukemia (ALL): • The most common malignancy in children, accounting for nearly one third of all pediatric cancers. • A few cases are associated with inherited genetic syndromes (e.g., Down syndrome, Bloom syndrome, Fanconi anemia), but the cause of most cases remains unknown. • Data: • 29 patients. • Genotyped 100,000 SNPs (Affymetrix Human Mapping 100K). • Profiled 50,000 gene expression levels (Affymetrix HG-U133 Plus 2.0).

  5. Challenges in Finding eQTLs • Compare the distribution of each variant to the levels of each expression measurement. • Computational: • Testing all pairs of variants and expressions is costly. • Expression levels are usually discretized (Pensa et al., BioKDD, 2004). • Multiple testing must be taken into account. • Understanding: • Too many associations to test via laboratory science. • Computational methods of biological discovery are needed. • We want to summarize the main informational (biological) pathways. • Answer: use transcriptional information.
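The scale of the problem follows from simple arithmetic on the array sizes of the ALL study. A sketch, assuming an exhaustive scan of every SNP against every expression measurement and, as one common multiple-testing correction (our choice; the slide names none), a Bonferroni adjustment at a family-wise level of 0.05:

```python
def all_pairs_tests(n_snps: int, n_expr: int) -> int:
    """Number of SNP-expression pairs in an exhaustive eQTL scan."""
    return n_snps * n_expr


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Sizes from the pediatric ALL study: 100,000 SNPs and 50,000 expression levels.
n_tests = all_pairs_tests(100_000, 50_000)
threshold = bonferroni_threshold(0.05, n_tests)
print(f"{n_tests:.1e} tests, per-test alpha = {threshold:.1e}")
# prints: 5.0e+09 tests, per-test alpha = 1.0e-11
```

Five billion tests, each held to a per-test level near 1e-11, is hard to satisfy with 29 patients. This is part of the motivation for summarizing the data through transcriptional information rather than testing pairs one at a time.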

  6. Transcription Channel • SNP X → transcript Y: a transcriptional information channel. • Expression levels are modeled as log-normal variables; SNPs are modeled as binomial variables. • Mutual information (MI) quantifies the information flow from X to Y. • Information theory measures uncertainty by entropy, H(X). • Higher MI is achieved by a larger σ² and smaller σₖ², i.e., when expression level Y is more likely to be modulated by SNP X.
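The following sketch shows why a larger σ² and smaller σₖ² raise MI. It rests on our own reading of the symbols, which the text does not define: σ² is the variance of log-expression over all samples, σₖ² is its variance among samples with genotype k, and pₖ is the frequency of genotype k. It further assumes that log-expression is approximated as Gaussian both marginally and within each genotype. Because MI is unchanged by the invertible map Y → log Y, we can work on the log scale:

```latex
\begin{aligned}
I(X;Y) &= I(X;\log Y) = H(\log Y) - H(\log Y \mid X) \\
       &\approx \tfrac{1}{2}\log\!\left(2\pi e\,\sigma^2\right)
          - \sum_{k} p_k\,\tfrac{1}{2}\log\!\left(2\pi e\,\sigma_k^2\right) \\
       &= \sum_{k} p_k\,\tfrac{1}{2}\log\frac{\sigma^2}{\sigma_k^2}
          \qquad \text{(using } \textstyle\sum_k p_k = 1\text{)}
\end{aligned}
```

Under this approximation, MI grows as the overall spread σ² grows relative to the within-genotype spreads σₖ². This happens when genotype groups have well-separated mean expression levels, which matches the statement on the slide.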

  7. • Transcript Y is modulated by SNP X: [figure] • Transcript Y is independent of SNP X: [figure]
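The two cases can be reproduced on simulated data. This is a sketch under stated assumptions. Genotypes are minor-allele counts drawn from Binomial(2, 0.4). Expression is log-normal. MI is estimated with the Gaussian approximation Σₖ pₖ · ½ log(σ²/σₖ²) on log-expression, which is our assumption and not necessarily the authors' estimator. The name `gaussian_mi` and all simulation parameters are hypothetical:

```python
import numpy as np


def gaussian_mi(genotype: np.ndarray, expression: np.ndarray) -> float:
    """Approximate MI (nats) between a SNP and a log-normally distributed transcript.

    Assumed estimator: sum_k p_k * 0.5 * log(var(log Y) / var(log Y | X = k)).
    """
    log_y = np.log(expression)  # MI is unchanged by the invertible log map
    total_var = log_y.var()
    mi = 0.0
    for k in np.unique(genotype):
        group = log_y[genotype == k]
        if group.size < 2:
            continue  # too few samples to estimate a within-genotype variance
        p_k = group.size / log_y.size
        mi += p_k * 0.5 * np.log(total_var / group.var())
    return float(mi)


rng = np.random.default_rng(0)
n = 300
snp = rng.binomial(2, 0.4, size=n)  # minor-allele count ~ Binomial(2, 0.4)
noise = rng.normal(0.0, 0.5, size=n)
y_modulated = np.exp(1.0 + 1.5 * snp + noise)  # mean log-expression shifts with genotype
y_independent = np.exp(1.0 + rng.normal(0.0, 0.5, size=n))  # genotype plays no role

mi_mod = gaussian_mi(snp, y_modulated)
mi_ind = gaussian_mi(snp, y_independent)
print(f"modulated: {mi_mod:.2f} nats, independent: {mi_ind:.3f} nats")
```

With these settings the modulated transcript gets an MI estimate well above zero. The independent transcript stays close to zero, and small positive values come only from sampling noise.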

  8. Transcriptional Information Map [Figure: SNPs X1–X9 and transcripts Y1–Y9]
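A map of this kind can be sketched as a graph whose edges are the SNP–transcript pairs with enough mutual information. This is a minimal version, assuming the Gaussian MI approximation Σₖ pₖ · ½ log(σ²/σₖ²) on log-expression, genotypes coded 0/1/2, and a hypothetical cutoff `min_mi`. The slides do not say how edges are selected, and `mi_matrix` and `tim_edges` are illustrative names:

```python
import numpy as np


def mi_matrix(genotypes: np.ndarray, log_expr: np.ndarray) -> np.ndarray:
    """Approximate MI for every SNP-transcript pair.

    genotypes: (n_samples, n_snps) minor-allele counts 0/1/2.
    log_expr:  (n_samples, n_genes) log-transformed expression levels.
    Returns an (n_snps, n_genes) array of sum_k p_k * 0.5 * log(var / var_k).
    """
    n = genotypes.shape[0]
    total_var = log_expr.var(axis=0)  # overall variance per transcript
    mi = np.zeros((genotypes.shape[1], log_expr.shape[1]))
    for j in range(genotypes.shape[1]):
        for k in (0, 1, 2):
            mask = genotypes[:, j] == k
            if mask.sum() < 2:
                continue  # skip genotype groups too small for a variance
            p_k = mask.sum() / n
            mi[j] += p_k * 0.5 * np.log(total_var / log_expr[mask].var(axis=0))
    return mi


def tim_edges(mi: np.ndarray, min_mi: float) -> list[tuple[int, int]]:
    """Edges (snp_index, gene_index) whose MI exceeds the hypothetical cutoff."""
    snps, genes = np.nonzero(mi > min_mi)
    return list(zip(snps.tolist(), genes.tolist()))


# Simulated data: 3 SNPs, 2 transcripts; only SNP 1 modulates transcript 0.
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(200, 3))
L = rng.normal(0.0, 0.5, size=(200, 2))
L[:, 0] += 1.2 * G[:, 1]
edges = tim_edges(mi_matrix(G, L), min_mi=0.1)
print(edges)
```

The cutoff controls the density of the map. A real analysis would need a principled choice, for example one calibrated by permutation, and this sketch does not attempt that.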

  9. ALL Transcriptional Information Map of Chr21

  10. Cluster Genes and SNPs into Networks [Figure: SNPs X1–X9 and transcripts Y1–Y9 grouped into clusters]
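One simple way to group linked SNPs and transcripts is to take the connected components of the map, so that two nodes share a cluster when a chain of edges joins them. This is a sketch, assuming clusters are defined as connected components (the slides do not state the clustering rule). `clusters_from_edges` is a hypothetical name, and the edge list below is made up:

```python
from collections import defaultdict


def clusters_from_edges(edges):
    """Group SNPs and transcripts that are linked directly or through other nodes.

    edges: iterable of (snp_name, gene_name) pairs from the information map.
    Returns a list of sets, one set per connected component.
    """
    adjacency = defaultdict(set)
    for snp, gene in edges:
        adjacency[snp].add(gene)
        adjacency[gene].add(snp)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical edges; the real map's edges are not given in the text.
edges = [("X1", "Y1"), ("X3", "Y1"), ("X3", "Y2"), ("X4", "Y2"),
         ("X4", "Y9"), ("X8", "Y9"), ("X2", "Y3"), ("X5", "Y3")]
comps = clusters_from_edges(edges)
print(comps)
```

With these made-up edges, one component contains Y1, Y2, Y9 and X1, X3, X4, X8, which has the same shape as the cluster on the next slide. The other contains X2, X5 and Y3.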

  11. Cluster Genes and SNPs into Networks [Figure: a cluster containing transcripts Y1, Y2, Y9 and SNPs X1, X3, X4, X8] • We can further infer the optimal modulation patterns using Bayesian networks.

  12. Bayesian Networks [Figure: arcs A → X, B → X, X → Z, Y → Z] • Bayesian networks are directed acyclic graphs: • Nodes correspond to random variables. • Directed arcs encode the conditional probabilities of the target nodes given the source nodes. • p(X | A, B): X depends on its parents (A, B). • p(Z | X, Y): given its parents X and Y, Z is independent of (A, B).
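The independence statement on this slide can be checked numerically. This sketch assumes all five variables are binary and uses made-up conditional probability tables for the network A → X ← B, X → Z ← Y. It builds the joint distribution as the product of each node's table given its parents, then compares P(Z = 1 | X, Y) with and without also conditioning on A and B:

```python
from itertools import product

# Made-up CPTs for A -> X <- B, X -> Z <- Y (all variables binary).
p_a = {0: 0.7, 1: 0.3}
p_b = {0: 0.6, 1: 0.4}
p_y = {0: 0.5, 1: 0.5}
p_x1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}   # P(X=1 | A, B)
p_z1 = {(0, 0): 0.2, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.95}  # P(Z=1 | X, Y)


def bern(p1: float, value: int) -> float:
    """Probability of a binary value when P(value = 1) = p1."""
    return p1 if value == 1 else 1.0 - p1


def joint(a: int, b: int, x: int, y: int, z: int) -> float:
    """Joint probability as the product of each node's CPT given its parents."""
    return (p_a[a] * p_b[b] * p_y[y]
            * bern(p_x1[(a, b)], x) * bern(p_z1[(x, y)], z))


def cond_z1(**given: int) -> float:
    """P(Z=1 | given) by summing the joint over all assignments consistent with given."""
    names = ("a", "b", "x", "y", "z")
    num = den = 0.0
    for values in product((0, 1), repeat=5):
        assignment = dict(zip(names, values))
        if any(assignment[k] != v for k, v in given.items()):
            continue
        p = joint(**assignment)
        den += p
        if assignment["z"] == 1:
            num += p
    return num / den


print(cond_z1(x=1, y=0), cond_z1(x=1, y=0, a=0, b=1))  # equal: A, B add nothing given X, Y
print(cond_z1(a=0), cond_z1(a=1))                      # differ: Z depends on A through X
```

Both conditional probabilities in the first line equal the table entry P(Z=1 | X=1, Y=0) = 0.4 (up to floating-point rounding). Marginally, however, Z still depends on A because the influence passes through X.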

  13. Infer Bayesian Networks in Individual Clusters [Figure: cluster with transcripts Y1, Y2, Y9 and SNPs X1, X3, X4, X8] • Step 1: Use the transcriptional information map (TIM) as the initial network. • Step 2: Bayesian network learning infers SNP–SNP connections.
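The two steps can be sketched as greedy structure search that starts from the map's SNP → transcript arcs and adds SNP → SNP arcs only while they improve a score. Several assumptions here are ours and not stated on the slide. All variables are discrete with three levels, with expression discretized as mentioned for eQTL scans. The score is BIC. The search is greedy hill climbing that only adds arcs. `local_bic`, `reaches`, `add_snp_edges` and `STATES` are hypothetical names:

```python
import math
from collections import Counter, defaultdict

import numpy as np

STATES = 3  # assumed levels: genotypes 0/1/2, expression discretized into 3 bins


def local_bic(data, node, parents):
    """BIC contribution of one discrete node given a list of parent names."""
    n = len(data[node])
    keys = list(zip(*(data[p] for p in parents))) if parents else [()] * n
    joint = Counter(zip(keys, data[node]))
    parent_counts = Counter(keys)
    loglik = sum(c * math.log(c / parent_counts[k]) for (k, _), c in joint.items())
    penalty = 0.5 * math.log(n) * (STATES - 1) * STATES ** len(parents)
    return loglik - penalty


def reaches(parents, start, target):
    """True if target can be reached from start by following parent -> child arcs."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return False


def add_snp_edges(data, snps, tim_edges):
    """Step 1: start from TIM arcs. Step 2: greedily add SNP -> SNP arcs that raise BIC."""
    parents = {v: set() for v in data}
    for snp, gene in tim_edges:
        parents[gene].add(snp)
    while True:
        best_gain, best_edge = 0.0, None
        for u in snps:
            for v in snps:
                # Skip self-loops, existing arcs, and arcs that would close a cycle.
                if u == v or u in parents[v] or reaches(parents, v, u):
                    continue
                gain = (local_bic(data, v, sorted(parents[v] | {u}))
                        - local_bic(data, v, sorted(parents[v])))
                if gain > best_gain:
                    best_gain, best_edge = gain, (u, v)
        if best_edge is None:
            return parents
        parents[best_edge[1]].add(best_edge[0])


# Simulated cluster: X2 mostly copies X1, X3 is unrelated, Y1 is driven by X1.
rng = np.random.default_rng(2)
n = 500
x1 = rng.binomial(2, 0.4, n)
x2 = np.where(rng.random(n) < 0.9, x1, rng.binomial(2, 0.4, n))
x3 = rng.binomial(2, 0.4, n)
y1 = np.clip(x1 + (rng.random(n) < 0.2), 0, 2)
data = {"X1": x1, "X2": x2, "X3": x3, "Y1": y1}
net = add_snp_edges(data, ["X1", "X2", "X3"], tim_edges=[("X1", "Y1")])
print({node: sorted(ps) for node, ps in net.items()})
```

The strongly dependent pair X1, X2 gains a SNP–SNP arc, whose direction BIC cannot distinguish. The TIM arc X1 → Y1 is kept unchanged. A fuller search would also consider deleting and reversing arcs.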

  14. A Bayesian Network Inferred from Chr21 TIM

  15. Information Theoretic Network Analysis • Find hubs, motifs, guilds, etc. • Abstract edges. • Global patterns -> local patterns. • Reveal emergent properties. • Information-theoretic approach using data compression: Alterovitz G and Ramoni MF, “Discovering biological guilds through topological abstraction,” AMIA Annu Symp Proc, pp. 1-5, 2006.

  16. Identified Fundamental Components Reference: Alterovitz and Ramoni, AMIA Annu Symp Proc, pp. 1-5, 2006.

  17. Identification of Cis- and Trans-eQTLs: RIPK4 • RIPK4 (21q22.3) is related to Down syndrome. • RIPK4 has 5 trans SNPs in q11.2 (shown in blue in the figure) affecting its expression.

  18. Identification of Cis- and Trans-eQTLs: CYYR1 • CYYR1 (21q21.1) was recently discovered and encodes a cysteine- and tyrosine-rich protein. • A recent study found a correlation with neuroendocrine tumors. • The TIM shows CYYR1 modulated by SNPs across the q arm of chromosome 21. • DSCAM is related to Down syndrome. • Does a DSCAM–CYYR1 interaction lead to ALL?

  19. Complete TIM Algorithm • Input: genetic variants and transcripts. • Compute transcriptional information. • Group linked SNPs and transcripts into clusters 1, …, N. • Infer a network in each individual cluster. • Analyze and summarize the network topology of each cluster.

  20. Transcriptional Information Maps • Make large multimodal genetic datasets amenable to transcriptional analysis. • Identify: • modulation patterns between genetic variants and transcripts; • cis- and trans-eQTLs. • Analysis of pediatric ALL helps generate biological hypotheses regarding a connection to Down syndrome.

  21. Questions? Thanks to Prof. Marco F. Ramoni, Dr. Hsun-Hsien Chang, Dr. Gil Alterovitz, Children’s Hospital Informatics Program, Brigham and Women’s Hospital
