
3D Chromosome Organization: Statistical challenges and opportunities for analyzing Hi-C data



  1. 3D Chromosome Organization: Statistical challenges and opportunities for analyzing Hi-C data. Zhaohui Steve Qin, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University

  2. Transcription regulation

  3. However … • Long-range chromosomal interactions • Transcriptional factory • Chimeric events

  4. Chromosome folding: How can a two-meter-long polymer fit into a nucleus ten micrometers (10^-5 m = 0.00001 m) in diameter?

  5. Chromosome folding http://en.wikipedia.org/wiki/Chromosome

  6. “… deep things in science are not found because they are useful; they are found because it was possible to find them” -- Robert Oppenheimer

  7. Chromosome Conformation Capture (3C). Dekker et al., Science 2002; Naumova and Dekker, J of Cell Science 2010. Fine scale: (0-kb)

  8. 3C-on-chip/Circular 3C (4C) and 5C. Naumova and Dekker, J of Cell Science 2010. Intermediate: (0-Mb). Fine scale: (0-kb)

  9. Naumova and Dekker, J of Cell Science 2010. Whole genome. Intermediate: (0-Mb). Fine scale: (0-kb)

  10. What are the main findings?

  11. In Lieberman-Aiden et al. • Genomes can be decomposed into compartments A and B. • Fractal globule, not equilibrium globule.

  12. In Sexton et al. • Genome partitioned into physical domains. • Domain structure highly connected with epigenetic activities.

  13. In Dixon et al. • Topological domains. • Stable across cell types. • Highly conserved across species. • Domain boundaries enriched with insulators.

  14. In Hou et al. • Differences between domain boundary and interior, in terms of gene density, TF and epigenetic factor concentration.

  15. Challenges • Quality control and pre-processing of the reads. • Is there any bias in the data, and if so, how should it be normalized? • Is it possible to infer the three-dimensional chromosomal structure from Hi-C data, and if so, how?

  16. Hi-C data preprocessing (Imakaev et al. 2012). Diagram labels: restriction enzyme cutting site, random break, restriction enzyme cut fragment. Read categories: • PCR amplification reads • Random breaking reads • Self-ligation reads • Dangling reads • Valid reads (passed on to downstream analysis)
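The read categories on this slide can be illustrated with a small filter. This is a minimal sketch, not the pipeline of Imakaev et al.: the `ReadPair` fields, the label names, and the rules are assumptions chosen for illustration. Here a pair whose two ends fall on the same restriction fragment is called a dangling read if the ends face each other and a self-ligation read if they face away, and a repeated pair of coordinates is treated as a PCR duplicate. Random-break filtering, which needs distances to the nearest cut site, is left out.

```python
from collections import namedtuple

# Hypothetical record for one mapped paired-end read; frag1/frag2 are the
# indices of the restriction fragments that each end maps to.
ReadPair = namedtuple(
    "ReadPair", "chrom1 pos1 strand1 frag1 chrom2 pos2 strand2 frag2"
)


def classify_pairs(pairs):
    """Label each read pair with an assumed category name."""
    seen = set()
    labels = []
    for p in pairs:
        # Identical coordinates on both ends: treat as a PCR duplicate.
        key = (p.chrom1, p.pos1, p.strand1, p.chrom2, p.pos2, p.strand2)
        if key in seen:
            labels.append("pcr_duplicate")
            continue
        seen.add(key)

        same_fragment = p.chrom1 == p.chrom2 and p.frag1 == p.frag2
        if not same_fragment:
            labels.append("valid")
            continue

        # Order the two ends by position to read off their orientation.
        (_, left_strand), (_, right_strand) = sorted(
            [(p.pos1, p.strand1), (p.pos2, p.strand2)]
        )
        if left_strand == "+" and right_strand == "-":
            labels.append("dangling")        # ends face each other
        elif left_strand == "-" and right_strand == "+":
            labels.append("self_ligation")   # ends face away from each other
        else:
            labels.append("same_fragment_other")
    return labels
```

Only pairs labelled `valid` would be binned into a contact matrix in this sketch; a real pipeline would apply further filters.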

  17. Systematic biases in the data • Restriction enzyme • GC content • Mappability (Yaffe and Tanay, 2011)

  18. Methods for Hi-C Bias Reduction
      • Normalization (equal 'visibility', no assumption on biases)
          • Iterative correction and eigenvector decomposition (ICE) (Imakaev et al., 2012)
          • Sequential component normalization (SCN) (Cournac et al., 2012)
      • Correction (posit a statistical model on biases)
          • Yaffe & Tanay's method (Yaffe & Tanay, 2011): fragment level (4 kb, 10^12), 420 parameters
          • HiCNorm (Hu et al., 2012): any resolution level; at 1 Mb, 10^6, 3 parameters
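The "equal visibility" idea behind the normalization methods can be sketched as iterative matrix balancing: repeatedly divide each entry by the product of its row and column coverage factors until all bins have the same total. The code below is a simplified pure-Python illustration of that idea, not the published ICE implementation; the function name, defaults, and stopping rule are assumptions.

```python
def iterative_correction(matrix, n_iter=50, tol=1e-6):
    """Balance a symmetric contact matrix so non-empty rows share one sum.

    Returns the corrected matrix and the accumulated per-bin bias factors.
    """
    n = len(matrix)
    # Work on a float copy so the input is left untouched.
    m = [[float(v) for v in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(n_iter):
        row_sums = [sum(row) for row in m]
        nonzero = [s for s in row_sums if s > 0]
        if not nonzero:
            break
        mean_sum = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins are left alone.
        factor = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factor[i] * factor[j]
        bias = [b * f for b, f in zip(bias, factor)]
        # Stop once no bin needed more than a negligible adjustment.
        if max(abs(f - 1.0) for f in factor) < tol:
            break
    return m, bias
```

For example, `iterative_correction([[0, 2, 4], [2, 0, 8], [4, 8, 0]])` returns a matrix whose rows all sum to nearly the same value, together with bias factors roughly proportional to 1 : 2 : 4. The sketch does not remove the diagonal or filter low-coverage bins, and it can fail to converge on strongly diagonal-dominant matrices.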

  19. Motivation and the key assumption: the number of paired-end reads spanning two loci is inversely proportional to the 3D spatial distance between them (distances obtained from fluorescence in situ hybridization (FISH)). Lieberman-Aiden et al., 2009

  20. Bayesian statistical model • $u_{ij}$: number of reads between loci $i$ and $j$. • $d_{ij}$: 3D Euclidean distance between loci $i$ and $j$. • $e_i$: number of enzyme cut sites in locus $i$. • $g_i$: mean GC content in locus $i$. • $m_i$: mean mappability score in locus $i$.
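The model formula itself did not survive in this transcript. One way to combine the listed quantities, consistent with the inverse relationship between read count and distance stated on the previous slide, is a Poisson model with a log-linear rate. The sketch below assumes that form: the read count between loci i and j is Poisson with rate theta, and log(theta) = b0 + b1*log(d) + b_enz*log(e_i*e_j) + b_gcc*log(g_i*g_j) + b_map*log(m_i*m_j). The functional form, the function names, and the coefficient names are assumptions for illustration, not taken from the slide.

```python
import math


def expected_reads(d, e_i, e_j, g_i, g_j, m_i, m_j, beta):
    """Poisson rate for one locus pair under the assumed log-linear model.

    d        : 3D Euclidean distance between the two loci
    e_*      : number of enzyme cut sites in each locus
    g_*      : mean GC content in each locus
    m_*      : mean mappability score in each locus
    beta     : dict with hypothetical keys "b0", "b1", "enz", "gcc", "map"
    """
    log_theta = (
        beta["b0"]
        + beta["b1"] * math.log(d)
        + beta["enz"] * math.log(e_i * e_j)
        + beta["gcc"] * math.log(g_i * g_j)
        + beta["map"] * math.log(m_i * m_j)
    )
    return math.exp(log_theta)


def poisson_log_pmf(u, theta):
    """Log-probability of observing u reads when the Poisson rate is theta."""
    return u * math.log(theta) - theta - math.lgamma(u + 1)
```

With `b1 = -1` and the bias coefficients set to zero, the expected count is exactly inversely proportional to distance; non-zero bias coefficients adjust the rate for cut-site density, GC content, and mappability. Summing `poisson_log_pmf` over all locus pairs gives a log-likelihood that could be used to infer the distances.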

  21. Real Hi-C data from Lieberman-Aiden et al. 2009 d(L2, L4) = 1.4042, d(L2, L3) = 1.9755, significant

  22. mESC: HindIII vs. NcoI

  23. Two compartment model

  24. Whole Chromosome Model Lieberman-Aiden, et al, 2009 Naumova and Dekker, 2010

  25. Other Features (Chromosome 2) • Compartment • Gene density • Gene expression • Chromatin accessibility • RNA polymerase II • DNA replication time • H3K36me3 • H3K27me3 • H3K9me3 • H4K20me3 • H3K4me3 • Lamina interaction

  26. References • Hu M, Deng K, Selvaraj S, Qin ZS, Ren B, Liu JS. (2012) HiCNorm: removing biases in Hi-C data via Poisson regression. Bioinformatics. 28. 3131-3133. http://www.people.fas.harvard.edu/~junliu/HiCNorm/ • Hu M, Deng K, Qin ZS, Dixon J, Selvaraj S, Fang J, Ren B, Liu JS. (2012) Bayesian inference of three-dimensional chromosomal organization. PLoS Computational Biology. 9(1):e1002893. http://www.people.fas.harvard.edu/~junliu/BACH/ • Hou C, Li L, Qin ZS, Corces VG. (2012) Gene Density, Transcription and Insulators Contribute to the Partition of the Drosophila Genome into Physical Domains. Mol Cell. 48. 471-484 (with preview article: Xu and Felsenfeld (2012) Order from Chaos in the Nucleus. Mol Cell. 48. 327-328). • Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B. (2012) Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 485. 376-380.

  27. Acknowledgements • Ming Hu • Ke Deng • Jun S. Liu • Jesse Dixon • Siddarth Selvaraj • Bing Ren • Li Li • Chunhui Hou • Victor Corces
