
ChIP-seq

ChIP-seq. Xiaole Shirley Liu, STAT115, STAT215. ChIP-chip/seq Technology: Chromatin ImmunoPrecipitation + microarray or high-throughput sequencing. Detect genome-wide in vivo location of TF and other DNA-binding proteins. Find all the DNA sequences bound by TF-X?


Presentation Transcript


  1. ChIP-seq Xiaole Shirley Liu STAT115, STAT215

  2. ChIP-chip/seq Technology • Chromatin ImmunoPrecipitation + microarray or high throughput sequencing • Detect genome-wide in vivo location of TF and other DNA-binding proteins • Find all the DNA sequences bound by TF-X? • Cook all the dishes with cinnamon • Can learn the regulatory mechanism of a transcription factor or DNA-binding protein much better and faster

  3. Sonication (~500bp)

  4. Immunoprecipitation

  5. Reverse Crosslink and DNA Purification

  6. ChIP-Seq • Sequence millions of 30-mer ends of fragments • Map 30-mers back to the genome • [Figure labels: ChIP-DNA, Noise]

  7. MACS: Model-based Analysis for ChIP-Seq • Use confident peaks to model shift size • [Figure label: Binding]
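The shift-size idea on this slide can be sketched in a few lines. This is a minimal illustration, not the MACS implementation: it assumes each confident peak has already been reduced to a pair of summit positions, one from plus-strand tags and one from minus-strand tags, and it uses the median of the pairwise distances as d (the median is an assumption made here). The function names `estimate_shift` and `shift_tag` are hypothetical.

```python
from statistics import median


def estimate_shift(plus_summits, minus_summits):
    """Return d/2, where d is the median distance between the plus-strand
    and minus-strand summits of the same confident peaks (paired by index)."""
    if not plus_summits or len(plus_summits) != len(minus_summits):
        raise ValueError("need one plus and one minus summit per peak")
    d = median(m - p for p, m in zip(plus_summits, minus_summits))
    return d / 2


def shift_tag(position, strand, shift):
    """Move a tag toward its 3' end: plus-strand tags move right,
    minus-strand tags move left."""
    return position + shift if strand == "+" else position - shift


shift = estimate_shift([100, 1000], [200, 1120])
print(shift, shift_tag(100, "+", shift), shift_tag(200, "-", shift))
```

After shifting, plus-strand and minus-strand tags from the same binding event pile up near the same position, which is what the later enrichment test relies on.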

  8. Peak Calls • Tag distribution along the genome ~ Poisson distribution (λBG = total tags / genome size) • ChIP-Seq shows local biases in the genome • Chromatin and sequencing bias

  9. Peak Calls • Tag distribution along the genome ~ Poisson distribution (λBG = total tags / genome size) • ChIP-Seq shows local biases in the genome • Chromatin and sequencing bias • 200-300bp control windows have too few tags • But can look further • Dynamic λlocal = max(λBG, [λctrl, λ1k,] λ5k, λ10k) • [Figure: ChIP and Control tracks with 300bp, 1kb, 5kb and 10kb windows] • http://liulab.dfci.harvard.edu/MACS/ • Zhang et al, Genome Bio, 2008
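A worked sketch of the dynamic λlocal idea, under stated assumptions: every rate is rescaled to the expected tag count in a candidate peak of `peak_width` bp, the control tag counts for the larger windows are passed in as a plain dictionary, and the p-value is the Poisson upper tail computed by direct summation. The names `dynamic_lambda` and `poisson_sf` are hypothetical, and the actual MACS code may differ in scaling and numerics.

```python
import math


def dynamic_lambda(total_ctrl_tags, genome_size, peak_width, ctrl_counts):
    """Expected tags in a peak_width-bp window under the null.

    ctrl_counts maps a window size in bp (e.g. 5000) to the number of
    control tags in a window of that size centred on the candidate peak."""
    lam_bg = total_ctrl_tags / genome_size * peak_width
    local = [count / size * peak_width for size, count in ctrl_counts.items()]
    # Take the largest estimate so local bias is never under-corrected.
    return max([lam_bg] + local)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by summing the first k terms.

    Fine for a sketch; loses precision for very small p-values."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


lam = dynamic_lambda(1_000_000, 100_000_000, 300, {5000: 100, 10000: 120})
print(lam, poisson_sf(20, lam))
```

In this example the genome-wide rate gives 3 expected tags per 300bp, but the 5kb control window gives 6, so the candidate peak is tested against the stricter local rate.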

  10. Peak Call Statistics • P-value and FDR? • Simulation: random sampling of reads? • FDR = A / B, where A is the number of Ctrl/ChIP peaks (control called against ChIP, all FPs) and B is the number of ChIP/Ctrl peaks • Q-value?
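The empirical FDR on this slide reduces to a ratio of two peak counts at the same cutoff. The sketch below assumes each peak carries a score where larger means more significant (for example -log10 of the p-value) and that the control peaks come from calling peaks with ChIP and control swapped. The function name `empirical_fdr` is hypothetical.

```python
def empirical_fdr(chip_scores, ctrl_scores, cutoff):
    """FDR = A / B at a score cutoff.

    A: peaks found when control is called against ChIP (all false positives).
    B: peaks found when ChIP is called against control."""
    a = sum(score >= cutoff for score in ctrl_scores)
    b = sum(score >= cutoff for score in chip_scores)
    if b == 0:
        return 0.0  # no ChIP peaks pass, so nothing is reported
    return min(1.0, a / b)


print(empirical_fdr([10, 20, 30, 40], [5, 12], cutoff=10))
```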

  11. ChIP-seq Downstream Analysis

  12. Target Gene Assignment • Yeast TF Regulatory Network • [Figure labels: Protein, Transcribe, Regulate, Gene]

  13. Human TF Binding Distribution • Most TF binding sites are outside promoters • How to assign targets? • Nearest distance? • Binding within 10KB? • Number of binding sites? • Other knowledge?
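Two of the candidate rules above, nearest distance and a 10KB cutoff, can be combined in a small sketch. It assumes a single chromosome, a list of (TSS position, gene name) pairs already sorted by position, and a peak represented by its summit coordinate. The function name `assign_nearest_gene` and the toy coordinates are hypothetical.

```python
import bisect


def assign_nearest_gene(summit, sorted_tss, max_dist=10_000):
    """Return the gene whose TSS is nearest to the summit, or None if the
    nearest TSS is farther than max_dist bp away."""
    positions = [pos for pos, _ in sorted_tss]
    i = bisect.bisect_left(positions, summit)
    # Only the TSS just left and just right of the summit can be nearest.
    candidates = sorted_tss[max(0, i - 1):i + 1]
    if not candidates:
        return None
    pos, gene = min(candidates, key=lambda t: abs(t[0] - summit))
    return gene if abs(pos - summit) <= max_dist else None


tss = [(1_000, "A"), (50_000, "B")]
print(assign_nearest_gene(4_000, tss), assign_nearest_gene(25_000, tss))
```

A hard cutoff like this leaves distal sites unassigned, which is one reason the slide also asks about the number of binding sites and other knowledge.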

  14. Binding ≠ Functional • Binding has an effect on up-regulated genes at all hours, but only has an effect on down-regulated genes at 12 hours

  15. Stronger sites more functional? • Stronger sites are not closer to differentially regulated genes (not necessarily more functional) • Tang et al, Cancer Res 2011

  16. Peak Conservation • Evolutionary conservation • Can be used for ChIP QC • Conserved sites more functional? • Majority of functional sites not conserved Odom et al, Nat Genet 2007

  17. Higher Order Chromatin Interactions • Chromatin conformation capture

  18. Hi-C • Interactions follow exponential decay with distance • Lieberman-Aiden et al, Science 2009

  19. Direct Target Identification • Binary decision? • Rank product of: regulatory potential (default λ 100kb) and differential expression
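A minimal sketch of the rank-product idea. The slide does not give the formula for regulatory potential, so the exponential form below, summing exp(-distance / λ) over a gene's nearby peaks with λ = 100kb, is an assumption chosen to match the exponential decay with distance mentioned for Hi-C. The function names and the toy scores are hypothetical.

```python
import math


def regulatory_potential(tss, peak_positions, decay=100_000):
    """Sum of distance-weighted contributions from each peak (assumed form)."""
    return sum(math.exp(-abs(p - tss) / decay) for p in peak_positions)


def rank_descending(scores):
    """Map each gene to its rank, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def rank_product(potential, diff_expr):
    """Product of the two normalised ranks; smaller means a likelier target."""
    n = len(potential)
    rp_rank = rank_descending(potential)
    de_rank = rank_descending(diff_expr)
    return {g: (rp_rank[g] / n) * (de_rank[g] / n) for g in potential}


potential = {"A": 2.0, "B": 1.0, "C": 0.5}
diff_expr = {"A": 3.0, "B": 1.0, "C": 2.0}
scores = rank_product(potential, diff_expr)
print(min(scores, key=scores.get))
```

Ranking instead of thresholding avoids the binary decision questioned on the slide: a gene scores well only if it ranks high on both binding and expression change.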

  20. ChIP-chip/seq Motif Finding • ChIP-chip gives 10-5000 binding regions ~200-1000bp long. Precise binding motif? • Raw data is like perfect clustering, plus enrichment values • MDscan • High ChIP ranking => true targets, contain more sites • Search TF motif from highest ranking targets first (high signal / background ratio) • Refine candidate motifs with all targets

  21. Similarity Defined by m-match • For a given w-mer and any other random w-mer, count the matched positions • m-matches for the 8-mer TGTAACGT: • TGTAACGT matched 8 • AGTAACGT matched 7 • TGCAACAT matched 6 • TGACACGG matched 5 • AATAACAG matched 4 • Pick a reasonable m to call two w-mers similar
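The m-match count is the number of positions at which two w-mers of equal length agree. A short sketch that reproduces the counts listed on the slide; the function names `match_count` and `is_m_match` are hypothetical.

```python
def match_count(a, b):
    """Number of positions at which two equal-length w-mers agree."""
    if len(a) != len(b):
        raise ValueError("w-mers must have the same length")
    return sum(x == y for x, y in zip(a, b))


def is_m_match(a, b, m):
    """True if the two w-mers agree in at least m positions."""
    return match_count(a, b) >= m


seed = "TGTAACGT"
for other in ["TGTAACGT", "AGTAACGT", "TGCAACAT", "TGACACGG", "AATAACAG"]:
    print(other, match_count(seed, other))
```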

  22. MDscan Seeds • A 9-mer, e.g. ATTGCAAAT, TTTGCGAAT, TTGCAAATC • ChIP-chip selected upstream sequences, higher enrichment • Seed motif pattern • [Figure: 9-mers grouped with their m-matches] ATTGCAAAT TTTGCGAAT TTTGCAAAT GCCACCGT ACCACCGT ACCACGGT GCCACGGC … GCAAATCCA GCAAATTCG GCAAATCCA GGAAATCCA GGAAATCCT TTGCAAATC TTGCGAATA TTGCAAATT TTGCCCATC TTTGCAAAT CAAATCCAA CAAATCCAA GAAATCCAC TGCAAATCC TGCAAATTC

  23. Update Motifs With Remaining Seqs • [Figure: Seed1 m-matches; sequences ordered from extreme high rank down to all ChIP-selected targets]

  24. Refine the Motifs • [Figure: Seed1 m-matches; sequences ordered from extreme high rank down to all ChIP-selected targets]

  25. Further Refine Motifs • Could also be used to examine known motif enrichment • Is motif enrichment correlated with ChIP-seq enrichment? • Is motif more enriched in peak summits than peak flanks? • Motif analysis could identify transcription factor partners of ChIP-seq factors

  26. Estrogen Receptor (ER) • Carroll et al, Cell 2005 • Overactive in > 70% of breast cancers • Where does it go in the genome? • ChIP-chip on chr21/22, motif and expression analysis found its “pioneering factor” FoxA1 • [Figure labels: ER, TF??]

  27. Estrogen Receptor (ER) Cistrome in Breast Cancer • Carroll et al, Nat Genet 2006 • ER may function far away (100-200KB) from genes • Only 20% of ER sites have PhastCons > 0.2 • ER has different effects depending on its collaborators • [Figure labels: ER, AP1, NRIP]

  28. Estrogen Receptor (ER) Cistrome in Breast Cancer • Carroll et al, Nat Genet 2006 • ER may function far away (100-200KB) from genes • Only 20% of ER sites have PhastCons > 0.2 • ER has different effects depending on its collaborators • [Figure labels: ER, NRIP, AP1]

  29. Cell Type-Specific Binding • The same TF binds to very different locations in different tissues and conditions. Why? • TF concentration? • Collaborating factors, especially pioneering factors • Interesting observations about pioneering factors

  30. Summary • ChIP-seq identifies genome-wide in vivo protein-DNA interaction sites • ChIP-seq peak calling shifts reads and calculates enrichment and FDR correctly • Functional analysis of ChIP-seq data: • Strong vs weak binding, conserved vs non-conserved • Target identification • Motif analysis • Cell type-specific binding → Epigenetics
