ChIP-seq. Xiaole Shirley Liu STAT115, STAT215. ChIP-chip/seq Technology. Chromatin ImmunoPrecipitation + microarray or high throughput sequencing Detect genome-wide in vivo location of TF and other DNA-binding proteins Find all the DNA sequences bound by TF-X?
ChIP-seq Xiaole Shirley Liu STAT115, STAT215
ChIP-chip/seq Technology • Chromatin ImmunoPrecipitation + microarray or high throughput sequencing • Detect genome-wide in vivo location of TF and other DNA-binding proteins • Find all the DNA sequences bound by TF-X? • Cook all the dishes with cinnamon • Can learn the regulatory mechanism of a transcription factor or DNA-binding protein much better and faster
ChIP-Seq • Sequence millions of 30-mer ends of fragments • Map 30-mers back to the genome (figure labels: ChIP-DNA, Noise)
MACS: Model-based Analysis of ChIP-Seq • Use confident peaks to model shift size (figure label: Binding)
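The shift-size idea can be sketched in a few lines. This is a simplified illustration, not the MACS implementation: it assumes that around each confident peak the forward-strand tags pile up upstream of the binding site and the reverse-strand tags downstream, estimates the offset d between the two piles from their medians (an assumption made here for brevity), and shifts every tag by d/2 toward its 3' end. The function names `estimate_shift` and `shift_tags` are hypothetical.

```python
from statistics import median

def estimate_shift(confident_peaks):
    """Estimate the tag shift (d/2) from confident peaks.

    confident_peaks: list of (plus_positions, minus_positions) pairs, one per
    peak, holding the 5' coordinates of forward- and reverse-strand tags.
    Using medians of the strand piles is a simplification for illustration.
    """
    offsets = []
    for plus, minus in confident_peaks:
        if plus and minus:
            offsets.append(median(minus) - median(plus))
    if not offsets:
        raise ValueError("no peak has tags on both strands")
    d = median(offsets)          # typical distance between the strand piles
    return d / 2

def shift_tags(plus, minus, shift):
    """Move forward tags right and reverse tags left by `shift` bp."""
    return sorted([p + shift for p in plus] + [m - shift for m in minus])

peaks = [([100, 110, 120], [200, 210, 220]), ([1000, 1010], [1090, 1100])]
shift = estimate_shift(peaks)
print(shift)
print(shift_tags([100], [200], 50))
```

After shifting, tags from both strands stack on the likely binding position, which is what makes the later tag counts per window meaningful.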
Peak Calls • Tag distribution along the genome ~ Poisson distribution (λBG = total tag / genome size) • ChIP-Seq shows local biases in the genome • Chromatin and sequencing bias • 200-300bp control windows have too few tags • But can look further • Dynamic λlocal = max(λBG, [λctrl, λ1k,] λ5k, λ10k) (figure: ChIP and Control tracks with 300bp, 1kb, 5kb and 10kb windows) • http://liulab.dfci.harvard.edu/MACS/ • Zhang et al, Genome Bio, 2008
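The dynamic λ rule can be illustrated with a small sketch. It is a hedged example, not the MACS code: every λ is expressed as the expected tag count in a candidate peak of `peak_width` bp, control counts from the wider windows are rescaled to that width, and `scale` (ChIP-to-control sequencing depth ratio) is an assumption added here. The names `local_lambda`, `poisson_sf` and `peak_pvalue` are hypothetical, and the optional λctrl and λ1k terms are handled the same way as the others.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # Sum far enough past max(k, lam) that the remaining tail is negligible.
    top = int(max(k, lam) + 12 * math.sqrt(max(k, lam)) + 50)
    log_lam = math.log(lam)
    total = sum(math.exp(-lam + i * log_lam - math.lgamma(i + 1))
                for i in range(k, top + 1))
    return min(total, 1.0)

def local_lambda(lambda_bg_per_bp, ctrl_counts, peak_width, scale=1.0):
    """Expected tags in the peak under the most conservative background.

    ctrl_counts maps a window size in bp (e.g. 1000, 5000, 10000) to the
    number of control tags in a window of that size centred on the peak.
    """
    candidates = [lambda_bg_per_bp * peak_width]
    for window_bp, count in ctrl_counts.items():
        candidates.append(scale * count * peak_width / window_bp)
    return max(candidates)

def peak_pvalue(chip_count, lam):
    """Poisson p-value of seeing at least chip_count tags in the peak."""
    return poisson_sf(chip_count, lam)

# Hypothetical numbers: 10,000 tags on a 1 Mb genome, 300 bp candidate peak.
lam = local_lambda(10_000 / 1_000_000, {1000: 5, 5000: 100, 10000: 120}, 300)
print(lam, peak_pvalue(20, lam))
```

Taking the maximum means a region with locally elevated control signal needs more ChIP tags to be called, which is how the local chromatin and sequencing biases are absorbed.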
Peak Call Statistics • P-value and FDR? • Simulation: random sampling of reads? • FDR = A / B, where A = Ctrl/ChIP peaks (all FPs) and B = ChIP/Ctrl peaks • Q-value?
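The empirical FDR = A / B can be written as a short sketch. It assumes peaks have already been called twice, once as ChIP over control and once with the samples swapped, and that each peak carries a p-value; the function name `empirical_fdr` and the example p-values are hypothetical.

```python
def empirical_fdr(chip_pvalues, ctrl_pvalues, cutoff):
    """FDR at a p-value cutoff as (control peaks) / (ChIP peaks).

    chip_pvalues: p-values of peaks called ChIP over control (B).
    ctrl_pvalues: p-values of peaks called control over ChIP (A),
                  all of which are treated as false positives.
    """
    a = sum(p <= cutoff for p in ctrl_pvalues)
    b = sum(p <= cutoff for p in chip_pvalues)
    if b == 0:
        return 0.0            # no ChIP peaks pass, so nothing to be wrong about
    return min(1.0, a / b)

chip = [1e-10, 1e-8, 1e-6, 1e-5, 1e-3]
ctrl = [1e-6, 1e-2]
print(empirical_fdr(chip, ctrl, 1e-5))
```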
Target Gene Assignment • Yeast TF Regulatory Network (figure labels: Protein, Transcribe, Regulate, Gene)
Human TF Binding Distribution • Most TF binding sites are outside promoters • How to assign targets? • Nearest distance? • Binding within 10KB? • Number of binding sites? • Other knowledge?
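The two simplest assignment rules, nearest gene and binding within 10KB, might look like the sketch below. It assumes a single chromosome, one transcription start site (TSS) per gene, and hypothetical gene names and coordinates; `nearest_gene` is an illustrative helper, not part of any tool named in these slides.

```python
def nearest_gene(peak_pos, tss_by_gene, max_dist=None):
    """Return the gene whose TSS is closest to the peak.

    If max_dist is given (e.g. 10_000 for "binding within 10KB"), return
    None when even the closest TSS is farther away than that.
    """
    gene, dist = min(((g, abs(peak_pos - tss)) for g, tss in tss_by_gene.items()),
                     key=lambda pair: pair[1])
    if max_dist is not None and dist > max_dist:
        return None
    return gene

tss = {"geneA": 1_000, "geneB": 50_000}    # hypothetical coordinates
print(nearest_gene(3_000, tss))            # closest TSS wins
print(nearest_gene(20_000, tss, 10_000))   # no TSS within 10KB
```

Because most sites lie outside promoters, a hard 10KB cutoff leaves many peaks unassigned, which is one reason the slides go on to ask about the number of sites and other knowledge.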
Binding ≠ Functional • Binding affects up-regulated genes at all time points, but affects down-regulated genes only at 12 hours
Stronger sites more functional? • Stronger sites are not closer to differentially regulated genes (not necessarily more functional) • Tang et al, Cancer Res 2011
Peak Conservation • Evolutionary conservation • Can be used for ChIP QC • Conserved sites more functional? • Majority of functional sites not conserved Odom et al, Nat Genet 2007
Higher Order Chromatin Interactions • Chromatin conformation capture
Hi-C • Interactions follow exponential decay with distance • Lieberman-Aiden et al, Science 2009
Direct Target Identification • Binary decision? • Rank product of: • Regulatory potential (default λ 100kb) • Differential expression
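One way to read the rank-product idea is sketched below. It rests on assumptions that the slides do not spell out: each binding site contributes exp(-distance / λ) to a gene's regulatory potential, with λ = 100kb interpreted as the decay distance, and the two rankings are combined as a product of normalised ranks. All function names, gene names and scores are hypothetical.

```python
import math

def regulatory_potential(tss, peak_positions, decay_bp=100_000):
    """Sum of exponentially decaying contributions from every peak."""
    return sum(math.exp(-abs(p - tss) / decay_bp) for p in peak_positions)

def rank_desc(scores):
    """Map each gene to its rank, 1 = highest score (ties keep input order)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: i + 1 for i, gene in enumerate(ordered)}

def rank_product(potential, diff_expr):
    """Product of normalised ranks; smaller means a more likely direct target.

    Both dicts must share the same gene keys; diff_expr holds a magnitude of
    differential expression (larger = more strongly changed).
    """
    rp, de = rank_desc(potential), rank_desc(diff_expr)
    n = len(rp)
    return {g: (rp[g] / n) * (de[g] / n) for g in rp}

potential = {"A": 2.0, "B": 0.5, "C": 1.0}
diff_expr = {"A": 3.0, "B": 4.0, "C": 0.1}
scores = rank_product(potential, diff_expr)
print(min(scores, key=scores.get))
```

A gene scores well only if it ranks high on both lists, which avoids a binary call based on binding or expression alone.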
ChIP-chip/seq Motif Finding • ChIP-chip gives 10-5000 binding regions ~200-1000bp long. Precise binding motif? • Raw data is like perfect clustering, plus enrichment values • MDscan • High ChIP ranking => true targets, contain more sites • Search TF motif from highest ranking targets first (high signal / background ratio) • Refine candidate motifs with all targets
m-matches for TGTAACGT • Similarity defined by m-match: for a given w-mer and any other random w-mer, count the matching positions • 8-mer TGTAACGT vs: • TGTAACGT matched 8 • AGTAACGT matched 7 • TGCAACAT matched 6 • TGACACGG matched 5 • AATAACAG matched 4 • Pick a reasonable m to call two w-mers similar
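The m-match definition is a position-by-position comparison, which a short sketch makes concrete. The function names `count_matches` and `is_m_match` are illustrative; the sequences are the ones from the slide.

```python
def count_matches(a, b):
    """Number of positions at which two w-mers carry the same base."""
    if len(a) != len(b):
        raise ValueError("w-mers must have the same length")
    return sum(x == y for x, y in zip(a, b))

def is_m_match(a, b, m):
    """True if the two w-mers agree at m or more positions."""
    return count_matches(a, b) >= m

seed = "TGTAACGT"
for other in ["TGTAACGT", "AGTAACGT", "TGCAACAT", "TGACACGG", "AATAACAG"]:
    print(other, count_matches(seed, other))
```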
MDscan Seeds • A 9-mer (e.g. ATTGCAAAT) from the ChIP-chip selected upstream sequences with higher enrichment defines a seed motif pattern • Example seed groups from the figure: • ATTGCAAAT, TTTGCGAAT, TTTGCAAAT • TTGCAAATC, TTGCGAATA, TTGCAAATT, TTGCCCATC • GCAAATCCA, GCAAATTCG, GCAAATCCA, GGAAATCCA, GGAAATCCT • CAAATCCAA, CAAATCCAA, GAAATCCAC • TGCAAATCC, TGCAAATTC • GCCACCGT, ACCACCGT, ACCACGGT, GCCACGGC …
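Seed construction can be sketched as follows. This is a simplified reading of the seed step, not the MDscan program: it treats every w-mer in the top-ranked sequences as a candidate seed and groups with it all w-mers that are m-matches. It ignores reverse complements and any scoring of the resulting groups, and the helper names are hypothetical.

```python
def wmers(seq, w):
    """All overlapping w-mers of a sequence, left to right."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def seed_groups(top_seqs, w, m):
    """Group w-mers from the top-ranked sequences around each seed.

    Returns {seed: [w-mers sharing at least m positions with the seed]}.
    """
    pool = [k for s in top_seqs for k in wmers(s, w)]
    groups = {}
    for seed in pool:
        if seed in groups:
            continue          # identical w-mer already used as a seed
        groups[seed] = [k for k in pool
                        if sum(x == y for x, y in zip(seed, k)) >= m]
    return groups

top = ["ATTGCAAAT", "TTTGCGAAT", "TTTGCAAAT"]   # 9-mers taken from the slide
print(seed_groups(top, w=9, m=7)["ATTGCAAAT"])
```

Each group can then be turned into a candidate motif and, as the later slides describe, updated and refined using the remaining ChIP-selected targets.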
Update Motifs With Remaining Seqs (figure: Seed1 m-matches; targets ordered from extreme high rank to all ChIP-selected targets)
Refine the Motifs (figure: Seed1 m-matches; targets ordered from extreme high rank to all ChIP-selected targets)
Further Refine Motifs • Could also be used to examine known motif enrichment • Is motif enrichment correlated with ChIP-seq enrichment? • Is motif more enriched in peak summits than peak flanks? • Motif analysis could identify transcription factor partners of ChIP-seq factors
Estrogen Receptor (figure labels: ER, TF??) • Carroll et al, Cell 2005 • Overactive in > 70% of breast cancers • Where does it go in the genome? • ChIP-chip on chr21/22, motif and expression analysis found its “pioneering factor” FoxA1
Estrogen Receptor (ER) Cistrome in Breast Cancer (figure labels: ER, AP1, NRIP) • Carroll et al, Nat Genet 2006 • ER may function far away (100-200KB) from genes • Only 20% of ER sites have PhastCons > 0.2 • ER has different effects depending on its collaborators
Cell Type-Specific Binding • The same TF binds to very different locations in different tissues and conditions, why? • TF concentration? • Collaborating factors, esp. pioneering factors • Interesting observations about pioneering factors
Summary • ChIP-seq identifies genome-wide in vivo protein-DNA interaction sites • ChIP-seq peak calling shifts reads and calculates correct enrichment and FDR • Functional analysis of ChIP-seq data: • Strong vs weak binding, conserved vs non-conserved • Target identification • Motif analysis • Cell type-specific binding Epigenetics