120 likes | 278 Vues
BOWTIE + MACS. Enrique Blanco 2013. 0. SOURCES. 1. RAW FORMAT: FASTQ. Example:. @1-ILLUMINA-GA:90515:1:1:28:2023 GGAGAAGTAGCCCGGAATGGTCGAGTAGGGAAGAGAAAGT + a^Z]HY]]SZaV]a`VLOOWNYT_`BBBBBBBBBBBBB @1-ILLUMINA-GA:90515:1:1:28:1470 AAGGGCGCGAAGCCCGTTACCAAGCCCGCCAAGGGAACCG +
E N D
BOWTIE + MACS Enrique Blanco 2013
1. RAW FORMAT: FASTQ Example: @1-ILLUMINA-GA:90515:1:1:28:2023 GGAGAAGTAGCCCGGAATGGTCGAGTAGGGAAGAGAAAGT + a^Z]HY]]SZaV]a`VL\OOWNYT_\`BBBBBBBBBBBBB @1-ILLUMINA-GA:90515:1:1:28:1470 AAGGGCGCGAAGCCCGTTACCAAGCCCGCCAAGGGAACCG + QYFQR__TXNUU]U]VGFFOOZTFZT_`]HNVJV\G__BB @1-ILLUMINA-GA:90515:1:1:28:2017 CGATACGTTCGCGTTCGTGTAACTTTGTGTTTGCAAATCA + aa]a^aa_aa``^_aaaaaa_Z`baaa`a```aa^`^a^T @1-ILLUMINA-GA:90515:1:1:28:931 ACTGCCCCACATCAGCACACCAAACACGTCCACCGCAGCT + INMG[TXXZYMFPTXN]NN]NZ\X_Z_ZN[XYBBBBBBBB …
2. INDEXES FOR READ MAPPING WITH BOWTIE Drosophila_melanogaster/UCSC/dm3/Sequence/BowtieIndex/ genome.1.ebwt genome.2.ebwt genome.3.ebwt genome.4.ebwt genome.rev.1.ebwt genome.rev.2.ebwt Copy only those files into a single folder to export it: export FLY="/usr/local/molbio/indexes/dm3"
3. FROM FASTQ TO READ MAPS IN dm3: ChIPseq (treatment and control) 2 ChIPseq experiments: (treatment) H3K4me3 and Control (input) Number of processes (threads) SAM output Info about running time mapping Indexes (dm3) Only reads with 1 hit Input file Output file FASTQ input > bowtie -t -p 4 -m 1 -S -q $FLY/genome rawfiles/H3K4me3_EID.fq maps/H3K4me3_EID.sam Time loading forward index: 00:00:00 Time loading mirror index: 00:00:00 Seeded quality full-index search: 00:33:46 # reads processed: 50624300 # reads with at least one reported alignment: 26219230 (51.79%) # reads that failed to align: 21122867 (41.72%) # reads with alignments suppressed due to -m: 3282203 (6.48%) Reported 26219230 alignments to 1 output stream(s) Time searching: 00:33:46 Overall time: 00:33:46 > bowtie -t -p 4 -m 1 -S -q $FLY/genome rawfiles/Control_EID.fq maps/Control_EID.sam Time loading forward index: 00:00:00 Time loading mirror index: 00:00:00 # reads processed: 25651696 # reads with at least one reported alignment: 15469848 (60.31%) # reads that failed to align: 6722832 (26.21%) # reads with alignments suppressed due to -m: 3459016 (13.48%) Reported 15469848 alignments to 1 output stream(s) Time searching: 00:17:44 Overall time: 00:17:44
4. FROM SAM TO BAM: READ MAPS SAM example 1-ILLUMINA-GA:90515:1:88:1105:333 0 chr2L 7732 50 40M * 0 0 CGCCAGTATACACGCAGCACCAGGTGCGCAATGAAGCCCC a```````a_\a`][^`Z_P]```Z^QDT_``\\\VS\]\ XA:i:0 MD:Z:40 NM:i:0 XS:A:+ NH:i:1 (01) 1-ILLUMINA-GA:90515:1:88:1105:333 <------ NAME (02) 0 <------ FLAG: (TOPHAT) 0 means mapped in FWD; 16 means mapped in RVS (03) chr2L <------ CHROMOSOME (04) 7732 <------ 1-based POSITION IN CHROMOSOME 50 <------ MAPPING quality 40M <------ CIGAR alignment * 0 0 (10) CGCCAGTATACACGCAGCACCAGGTGCGCAATGAAGCCCC <------ SEQUENCE a```````a_\a`][^`Z_P]```Z^QDT_``\\\VS\]\ <------ PHRED quality score XA:i:0 MD:Z:40 NM:i:0 XS:A:+ NH:i:1 SAM to BAM (compression) > samtools view -S -b -o maps/H3K4me3_EID.bam maps/H3K4me3_EID.sam > samtools view -S -b -o maps/Control_EID.bam maps/Control_EID.sam
5. CREATING THE PROFILE AND CALLING PEAKS (A) The program has to find the appropriate fragment length out > macs14 -t maps/H3K4me3_EID.bam -c maps/Control_EID.bam -g dm -n peaks_test1 -B -S Control sample Read map Treatment sample Read map Genome size Output folder Generate profile in BEDGRAPH, in one single file (B) The user sets the fragment length to 162x2=324 bp > macs14 -t maps/H3K4me3_EID.bam -c maps/Control_EID.bam -g dm -n peaks_test2 -B -S --nomodel --shiftsize 162 Do not run the fragment length estimation Use this value for fragment length (C) The program has to find the fragment size, the peaks and the subpeaks > macs14 -t maps/H3K4me3_EID.bam -c maps/Control_EID.bam -g dm -n peaks_test3 -B --call-subpeaks Run the PeakSplitter program to find subpeaks
6. MACS OUTPUT INFO @ Mon, 01 Jul 2013 15:21:33: # ARGUMENTS LIST: # name = peaks_test1 # format = AUTO # ChIP-seq file = maps/H3K4me3_EID.bam # control file = maps/Control_EID.bam # effective genome size = 1.20e+08 # band width = 300 # model fold = 10,30 # pvalue cutoff = 1.00e-05 # Large dataset will be scaled towards smaller dataset. # Range for calculating regional lambda is: 1000 bps and 10000 bps INFO @ Mon, 01 Jul 2013 15:21:33: #1 read tag files... INFO @ Mon, 01 Jul 2013 15:21:33: #1 read treatment tags... INFO @ Mon, 01 Jul 2013 15:21:33: Detected format is: BAM INFO @ Mon, 01 Jul 2013 15:21:33: tag size: 36 INFO @ Mon, 01 Jul 2013 15:21:44: 1000000 … INFO @ Mon, 01 Jul 2013 15:30:43: 50000000 INFO @ Mon, 01 Jul 2013 15:31:15: #1.2 read input tags... INFO @ Mon, 01 Jul 2013 15:31:15: Detected format is: BAM INFO @ Mon, 01 Jul 2013 15:31:26: 1000000 … INFO @ Mon, 01 Jul 2013 15:35:55: 25000000 INFO @ Mon, 01 Jul 2013 15:36:16: #1 tag size is determined as 36 bps INFO @ Mon, 01 Jul 2013 15:36:16: #1 tag size = 36 INFO @ Mon, 01 Jul 2013 15:36:16: #1 total tags in treatment: 26219230 INFO @ Mon, 01 Jul 2013 15:36:16: #1 user defined the maximum tags... INFO @ Mon, 01 Jul 2013 15:36:16: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) INFO @ Mon, 01 Jul 2013 15:36:25: #1 tags after filtering in treatment: 9070120 INFO @ Mon, 01 Jul 2013 15:36:25: #1 Redundant rate of treatment: 0.65 INFO @ Mon, 01 Jul 2013 15:36:25: #1 total tags in control: 15469848 INFO @ Mon, 01 Jul 2013 15:36:25: #1 user defined the maximum tags... INFO @ Mon, 01 Jul 2013 15:36:25: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) INFO @ Mon, 01 Jul 2013 15:36:30: #1 tags after filtering in control: 7966937 INFO @ Mon, 01 Jul 2013 15:36:30: #1 Redundant rate of control: 0.49 INFO @ Mon, 01 Jul 2013 15:36:30: #1 finished! INFO @ Mon, 01 Jul 2013 15:36:30: #2 Build Peak Model... INFO @ Mon, 01 Jul 2013 15:36:45: #2 number of paired peaks: 2518 INFO @ Mon, 01 Jul 2013 15:36:55: #2 finished! INFO @ Mon, 01 Jul 2013 15:36:55: #2 predicted fragment length is 350 bps INFO @ Mon, 01 Jul 2013 15:36:55: #2.2 Generate R script for model : peaks_test1_model.r INFO @ Mon, 01 Jul 2013 15:36:55: #3 Call peaks... INFO @ Mon, 01 Jul 2013 15:36:55: #3 shift treatment data INFO @ Mon, 01 Jul 2013 15:36:59: #3 merge +/- strand of treatment data INFO @ Mon, 01 Jul 2013 15:37:02: #3 save the shifted and merged tag counts into bedGraph file... INFO @ Mon, 01 Jul 2013 15:37:02: write to a bedGraph file INFO @ Mon, 01 Jul 2013 15:46:55: compress the bedGraph file using gzip... INFO @ Mon, 01 Jul 2013 15:47:21: #3 call peak candidates INFO @ Mon, 01 Jul 2013 15:50:40: #3 shift control data INFO @ Mon, 01 Jul 2013 15:50:40: #3 merge +/- strand of control data INFO @ Mon, 01 Jul 2013 15:50:46: #3 save the shifted and merged tag counts into bedGraph file... INFO @ Mon, 01 Jul 2013 15:50:46: write to a bedGraph file INFO @ Mon, 01 Jul 2013 16:00:02: compress the bedGraph file using gzip... INFO @ Mon, 01 Jul 2013 16:00:29: #3 call negative peak candidates INFO @ Mon, 01 Jul 2013 16:04:38: #3 use control data to filter peak candidates... INFO @ Mon, 01 Jul 2013 16:04:53: #3 Finally, 3678 peaks are called! INFO @ Mon, 01 Jul 2013 16:04:53: #3 find negative peaks by swapping treat and control INFO @ Mon, 01 Jul 2013 16:05:28: #3 Finally, 6303 peaks are called! INFO @ Mon, 01 Jul 2013 16:05:28: #4 Write output xls file... peaks_test1_peaks.xls INFO @ Mon, 01 Jul 2013 16:05:28: #4 Write peak bed file... peaks_test1_peaks.bed INFO @ Mon, 01 Jul 2013 16:05:28: #4 Write summits bed file... peaks_test1_summits.bed INFO @ Mon, 01 Jul 2013 16:05:28: #4 Write output xls file for negative peaks... peaks_test1_negative_peaks.xls INFO @ Mon, 01 Jul 2013 16:05:28: #5 Done! Check the output files!
7. MACS RESULTS track type=bedGraph name="peaks_test1_treat_all" description="Extended tag pileup from MACS version 1.4.2 20120305" chr2L 5058 5077 1 chr2L 5077 5078 2 chr2L 5078 5126 3 … track type=bedGraph name="peaks_test1_control_all" description="Extended tag pileup from MACS version 1.4.2 20120305" chr2L 5552 5555 1 chr2L 5555 5556 2 chr2L 5556 5866 3 FOLDER peaks_test1/ peaks_test1_model.r peaks_test1_MACS_bedGraph/ peaks_test1_peaks.bed peaks_test1_summits.bed peaks_test1_peaks.xls peaks_test1_negative_peaks.xls chr2L 66232 68723 MACS_peak_1 1062.92 chr2L 71865 74845 MACS_peak_2 1671.86 chr2L 84954 88053 MACS_peak_3 814.30 chr start end length summit tags -10*log10(pvalue) fold_enrichment FDR(%) chr2L 66233 68723 2491 952 591 1062.92 8.65 24.59 chr2L 71866 74845 2980 1002 795 1671.86 12.91 14.81 chr2L 84955 88053 3099 2107 611 814.30 9.04 33.29 …
8. BEDGRAPH PROFILES AND BED PEAKS IN THE UCSC GENOME BROWSER (1) (test1) Fsize (test3) Fsize + subpeaks
9. BEDGRAPH PROFILES AND BED PEAKS IN THE UCSC GENOME BROWSER (2) (test2) Fsize=324 (test1) Fsize