1 / 12

BOWTIE + MACS

BOWTIE + MACS. Enrique Blanco 2013. 0. SOURCES. 1. RAW FORMAT: FASTQ. Example:. @1-ILLUMINA-GA:90515:1:1:28:2023 GGAGAAGTAGCCCGGAATGGTCGAGTAGGGAAGAGAAAGT + a^Z]HY]]SZaV]a`VLOOWNYT_`BBBBBBBBBBBBB @1-ILLUMINA-GA:90515:1:1:28:1470 AAGGGCGCGAAGCCCGTTACCAAGCCCGCCAAGGGAACCG +

faunus
Télécharger la présentation

BOWTIE + MACS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. BOWTIE + MACS Enrique Blanco 2013

  2. 0. SOURCES

  3. 1. RAW FORMAT: FASTQ Example: @1-ILLUMINA-GA:90515:1:1:28:2023 GGAGAAGTAGCCCGGAATGGTCGAGTAGGGAAGAGAAAGT + a^Z]HY]]SZaV]a`VL\OOWNYT_\`BBBBBBBBBBBBB @1-ILLUMINA-GA:90515:1:1:28:1470 AAGGGCGCGAAGCCCGTTACCAAGCCCGCCAAGGGAACCG + QYFQR__TXNUU]U]VGFFOOZTFZT_`]HNVJV\G__BB @1-ILLUMINA-GA:90515:1:1:28:2017 CGATACGTTCGCGTTCGTGTAACTTTGTGTTTGCAAATCA + aa]a^aa_aa``^_aaaaaa_Z`baaa`a```aa^`^a^T @1-ILLUMINA-GA:90515:1:1:28:931 ACTGCCCCACATCAGCACACCAAACACGTCCACCGCAGCT + INMG[TXXZYMFPTXN]NN]NZ\X_Z_ZN[XYBBBBBBBB …

  4. 2. INDEXES FOR READ MAPPING WITH BOWTIE Drosophila_melanogaster/UCSC/dm3/Sequence/BowtieIndex/ genome.1.ebwt genome.2.ebwt genome.3.ebwt genome.4.ebwt genome.rev.1.ebwt genome.rev.2.ebwt Copy only those files into a single folder to export it: export FLY="/usr/local/molbio/indexes/dm3"

  5. 3. FROM FASTQ TO READ MAPS IN dm3: ChIPseq (treatment and control) 2 ChIPseq experiments: (treatment) H3K4me3 and Control (input) Number of processes (threads) SAM output Info about running time mapping Indexes (dm3) Only reads with 1 hit Input file Output file FASTQ input > bowtie -t -p 4 -m 1 -S -q $FLY/genome rawfiles/H3K4me3_EID.fq maps/H3K4me3_EID.sam Time loading forward index: 00:00:00 Time loading mirror index: 00:00:00 Seeded quality full-index search: 00:33:46 # reads processed: 50624300 # reads with at least one reported alignment: 26219230 (51.79%) # reads that failed to align: 21122867 (41.72%) # reads with alignments suppressed due to -m: 3282203 (6.48%) Reported 26219230 alignments to 1 output stream(s) Time searching: 00:33:46 Overall time: 00:33:46 > bowtie -t -p 4 -m 1 -S -q $FLY/genome rawfiles/Control_EID.fq maps/Control_EID.sam Time loading forward index: 00:00:00 Time loading mirror index: 00:00:00 # reads processed: 25651696 # reads with at least one reported alignment: 15469848 (60.31%) # reads that failed to align: 6722832 (26.21%) # reads with alignments suppressed due to -m: 3459016 (13.48%) Reported 15469848 alignments to 1 output stream(s) Time searching: 00:17:44 Overall time: 00:17:44

  6. 4. FROM SAM TO BAM: READ MAPS SAM example 1-ILLUMINA-GA:90515:1:88:1105:333 0 chr2L 7732 50 40M * 0 0 CGCCAGTATACACGCAGCACCAGGTGCGCAATGAAGCCCC a```````a_\a`][^`Z_P]```Z^QDT_``\\\VS\]\ XA:i:0 MD:Z:40 NM:i:0 XS:A:+ NH:i:1 (01) 1-ILLUMINA-GA:90515:1:88:1105:333 <------ NAME (02) 0 <------ FLAG: (TOPHAT) 0 means mapped in FWD; 16 means mapped in RVS (03) chr2L <------ CHROMOSOME (04) 7732 <------ 1-based POSITION IN CHROMOSOME 50 <------ MAPPING quality 40M <------ CIGAR alignment * 0 0 (10) CGCCAGTATACACGCAGCACCAGGTGCGCAATGAAGCCCC <------ SEQUENCE a```````a_\a`][^`Z_P]```Z^QDT_``\\\VS\]\ <------ PHRED quality score XA:i:0 MD:Z:40 NM:i:0 XS:A:+ NH:i:1 SAM to BAM (compression) > samtools view -S -b -o maps/H3K4me3_EID.bam maps/H3K4me3_EID.sam > samtools view -S -b -o maps/Control_EID.bam maps/Control_EID.sam

  7. 5. CREATING THE PROFILE AND CALLING PEAKS (A) The program has to find the appropriate fragment length out > macs14 -t maps/H3K4me3_EID.bam -c maps/Control_EID.bam -g dm -n peaks_test1 -B -S Control sample Read map Treatment sample Read map Genome size Output folder Generate profile in BEDGRAPH, in one single file (B) The user sets the fragment length to 162x2=324 bp > macs14 -t maps/H3K4me3_EID.bam -c maps/Control_EID.bam -g dm -n peaks_test2 -B -S --nomodel --shiftsize 162 Do not run the fragment length estimation Use this value for fragment length (C) The program has to find the fragment size, the peaks and the subpeaks > macs14 -t maps/H3K4me3_EID.bam -c maps/Control_EID.bam -g dm -n peaks_test3 -B --call-subpeaks Run the PeakSplitter program to find subpeaks

  8. 6. MACS OUTPUT INFO @ Mon, 01 Jul 2013 15:21:33: # ARGUMENTS LIST: # name = peaks_test1 # format = AUTO # ChIP-seq file = maps/H3K4me3_EID.bam # control file = maps/Control_EID.bam # effective genome size = 1.20e+08 # band width = 300 # model fold = 10,30 # pvalue cutoff = 1.00e-05 # Large dataset will be scaled towards smaller dataset. # Range for calculating regional lambda is: 1000 bps and 10000 bps INFO @ Mon, 01 Jul 2013 15:21:33: #1 read tag files... INFO @ Mon, 01 Jul 2013 15:21:33: #1 read treatment tags... INFO @ Mon, 01 Jul 2013 15:21:33: Detected format is: BAM INFO @ Mon, 01 Jul 2013 15:21:33: tag size: 36 INFO @ Mon, 01 Jul 2013 15:21:44: 1000000 … INFO @ Mon, 01 Jul 2013 15:30:43: 50000000 INFO @ Mon, 01 Jul 2013 15:31:15: #1.2 read input tags... INFO @ Mon, 01 Jul 2013 15:31:15: Detected format is: BAM INFO @ Mon, 01 Jul 2013 15:31:26: 1000000 … INFO @ Mon, 01 Jul 2013 15:35:55: 25000000 INFO @ Mon, 01 Jul 2013 15:36:16: #1 tag size is determined as 36 bps INFO @ Mon, 01 Jul 2013 15:36:16: #1 tag size = 36 INFO @ Mon, 01 Jul 2013 15:36:16: #1 total tags in treatment: 26219230 INFO @ Mon, 01 Jul 2013 15:36:16: #1 user defined the maximum tags... INFO @ Mon, 01 Jul 2013 15:36:16: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) INFO @ Mon, 01 Jul 2013 15:36:25: #1 tags after filtering in treatment: 9070120 INFO @ Mon, 01 Jul 2013 15:36:25: #1 Redundant rate of treatment: 0.65 INFO @ Mon, 01 Jul 2013 15:36:25: #1 total tags in control: 15469848 INFO @ Mon, 01 Jul 2013 15:36:25: #1 user defined the maximum tags... INFO @ Mon, 01 Jul 2013 15:36:25: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) INFO @ Mon, 01 Jul 2013 15:36:30: #1 tags after filtering in control: 7966937 INFO @ Mon, 01 Jul 2013 15:36:30: #1 Redundant rate of control: 0.49 INFO @ Mon, 01 Jul 2013 15:36:30: #1 finished! INFO @ Mon, 01 Jul 2013 15:36:30: #2 Build Peak Model... INFO @ Mon, 01 Jul 2013 15:36:45: #2 number of paired peaks: 2518 INFO @ Mon, 01 Jul 2013 15:36:55: #2 finished! INFO @ Mon, 01 Jul 2013 15:36:55: #2 predicted fragment length is 350 bps INFO @ Mon, 01 Jul 2013 15:36:55: #2.2 Generate R script for model : peaks_test1_model.r INFO @ Mon, 01 Jul 2013 15:36:55: #3 Call peaks... INFO @ Mon, 01 Jul 2013 15:36:55: #3 shift treatment data INFO @ Mon, 01 Jul 2013 15:36:59: #3 merge +/- strand of treatment data INFO @ Mon, 01 Jul 2013 15:37:02: #3 save the shifted and merged tag counts into bedGraph file... INFO @ Mon, 01 Jul 2013 15:37:02: write to a bedGraph file INFO @ Mon, 01 Jul 2013 15:46:55: compress the bedGraph file using gzip... INFO @ Mon, 01 Jul 2013 15:47:21: #3 call peak candidates INFO @ Mon, 01 Jul 2013 15:50:40: #3 shift control data INFO @ Mon, 01 Jul 2013 15:50:40: #3 merge +/- strand of control data INFO @ Mon, 01 Jul 2013 15:50:46: #3 save the shifted and merged tag counts into bedGraph file... INFO @ Mon, 01 Jul 2013 15:50:46: write to a bedGraph file INFO @ Mon, 01 Jul 2013 16:00:02: compress the bedGraph file using gzip... INFO @ Mon, 01 Jul 2013 16:00:29: #3 call negative peak candidates INFO @ Mon, 01 Jul 2013 16:04:38: #3 use control data to filter peak candidates... INFO @ Mon, 01 Jul 2013 16:04:53: #3 Finally, 3678 peaks are called! INFO @ Mon, 01 Jul 2013 16:04:53: #3 find negative peaks by swapping treat and control INFO @ Mon, 01 Jul 2013 16:05:28: #3 Finally, 6303 peaks are called! INFO @ Mon, 01 Jul 2013 16:05:28: #4 Write output xls file... peaks_test1_peaks.xls INFO @ Mon, 01 Jul 2013 16:05:28: #4 Write peak bed file... peaks_test1_peaks.bed INFO @ Mon, 01 Jul 2013 16:05:28: #4 Write summits bed file... peaks_test1_summits.bed INFO @ Mon, 01 Jul 2013 16:05:28: #4 Write output xls file for negative peaks... peaks_test1_negative_peaks.xls INFO @ Mon, 01 Jul 2013 16:05:28: #5 Done! Check the output files!

  9. 7. MACS RESULTS track type=bedGraph name="peaks_test1_treat_all" description="Extended tag pileup from MACS version 1.4.2 20120305" chr2L 5058 5077 1 chr2L 5077 5078 2 chr2L 5078 5126 3 … track type=bedGraph name="peaks_test1_control_all" description="Extended tag pileup from MACS version 1.4.2 20120305" chr2L 5552 5555 1 chr2L 5555 5556 2 chr2L 5556 5866 3 FOLDER peaks_test1/ peaks_test1_model.r peaks_test1_MACS_bedGraph/ peaks_test1_peaks.bed peaks_test1_summits.bed peaks_test1_peaks.xls peaks_test1_negative_peaks.xls chr2L 66232 68723 MACS_peak_1 1062.92 chr2L 71865 74845 MACS_peak_2 1671.86 chr2L 84954 88053 MACS_peak_3 814.30 chr start end length summit tags -10*log10(pvalue) fold_enrichment FDR(%) chr2L 66233 68723 2491 952 591 1062.92 8.65 24.59 chr2L 71866 74845 2980 1002 795 1671.86 12.91 14.81 chr2L 84955 88053 3099 2107 611 814.30 9.04 33.29 …

  10. 8. BEDGRAPH PROFILES AND BED PEAKS IN THE UCSC GENOME BROWSER (1) (test1) Fsize (test3) Fsize + subpeaks

  11. 9. BEDGRAPH PROFILES AND BED PEAKS IN THE UCSC GENOME BROWSER (2) (test2) Fsize=324 (test1) Fsize

  12. 10. COMPARISON WITH ANOTHER PEAK FINDING PROGRAM: PEAKSEQ

More Related