1 / 13

Tuning Tophat2

Tuning Tophat2. Belinda Giardine. Tophat2. Aligns reads from RNA to the genome Ribonucleic acid (RNA) is a ubiquitous family of large biological molecules that perform multiple vital roles in the coding, decoding, regulation, and expression of genes.

bob
Télécharger la présentation

Tuning Tophat2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Tuning Tophat2 Belinda Giardine

  2. Tophat2 • Aligns reads from RNA to the genome • Ribonucleic acid (RNA) is a ubiquitous family of large biological molecules that perform multiple vital roles in the coding, decoding, regulation, and expression of genes. • Adds on dealing with gaps in the alignments by breaking the reads into small pieces ~20 bases and reassembling the reads after mapping. • Though the new version is more parallel still slow (more than 4 days for recent runs) • It uses Bowtie to do the actual mapping

  3. RNA-seq image from wikipedia fastq file, a single read: @DGM97JN1:330:C3EW0ACXX:1:1101:2723:1993 1:N:0:NAAGGC GAATGCCCCCGGCCGTCCCTCTTAATCATGGCCTCAGTTCCGAAAACCANCAAAATAGAACCGCGGTCCTATTNN + CCCFFFFFHHHHGIIFGIIIIJJIIJIFGIJEHIIJIGHIJHAGHHFEE#,;;BACEEDDDDDD@B>BBDCDC##

  4. Tophat2 • Pipeline written in C++ (34,351 lines of code in 63 files) • Wrapper written in Python • 3 of the programs use Boost pthreads • long_spanning_reads.cpp • segment_juncs.cpp • tophat_reports.cpp • Programs are compiled as one unit under autoconfig and automake, communication between programs with temporary files. • Many prerequisites: zlib, Boost, samtools, Bowtie, this and the amount of file IO makes running on MIC only not feasible.

  5. Data files • Reads in fastq format, 20–200 million reads (2 x 20gb for my test) • Reference sequence and indexes used for mapping 6gb for mouse • Final output 14gb for my test

  6. Work from last time • Compiling • start with gcc • then icc • then add –mmic (this failed in trying to get all the prerequisites) • Test run on host, using Tophat’s log of run for time. • Run on biostar(Xeon) using 8 threads took 26 hours • Run on stampede (host) using 16 threads took 19 hours, 40 mins • Run on stampede (host) using 32 threads took 24 hours

  7. New work • Python wrapper and long run times makes gprof and vtune difficult to profile code with. • Going from my experience in Biostar, I am starting with segment_juncs executable. • Keeping the temporary files that are used for passing data between programs, I ran just segment_juncs. • Time for segment_junctions run alone: • 8 threads 2 hours 13 minutes • 16 threads 1 hour 15 minutes (2 ½ out of 19 ½ hours total) • of this 76% is spent in the parallel section • 32 threads 2 hours 12 minutes

  8. Failed attempts • Run vtune onsegment_juncs • times out of full data • license errors • Check loops inpar_reportthat are assumed dependencies. • lines of code indicated not loops or in loops? • contradictory lines • Offloading threaded section of code in segment_juncs.cpp. Will it actually improve speed or too much file IO? • Lots of variables to copy • File IO

  9. Hardison Lab

  10. vec_report3 segment_juncs.cpp(135): (col. 32) remark: loop was not vectorized: existence of vector dependence. segment_juncs.cpp(135): (col. 32) remark: vector dependence: assumed ANTI dependence between r.92068 line 135 and r.92068 line 135. segment_juncs.cpp(135): (col. 32) remark: vector dependence: assumed FLOW dependence between r.92068 line 135 and r.92068 line 135. Line 135: left_seg.left = max(0, T.right() - 2);

  11. opt_report • REMOVED VAR left_mismatches.201433.0_V$78b • REMOVED PACK left_mismatches.201433.0 • REMOVED VAR right_mismatches.201433.0_V$78d • REMOVED PACK right_mismatches.201433.0

  12. gprof output for segment_juncs • Each sample counts as 0.01 seconds. • % cumulative self self total • time seconds seconds calls Ts/call Ts/call name • 100.01 0.01 0.01 extend_from_seeds(std::vector<SeedExtension, std::allocator<SeedExtension> >&, PackedSplice const&, std::vector<std::vector<ReadHit, std::allocator<ReadHit> >, std::allocator<std::vector<ReadHit, std::allocator<ReadHit> > > > const&, std::string const&, std::string const&, unsigned long, unsigned long, int) • 0.00 0.01 0.00 89528 0.00 0.00 pack_splice(std::string const&, int, int, unsigned int) • 0.00 0.01 0.00 3 0.00 0.00 __do_global_dtors_aux • 0.00 0.01 0.00 2 0.00 0.00 pack_right_splice_half(std::string const&, unsigned int, unsigned int)

  13. Parallel section of code: vector<boost::thread*> threads; for (inti = 0; i < num_threads; ++i) { SegmentSearchWorker worker; worker.rt = &rt; worker.reads_fname = left_reads_fname; worker.segmap_fnames = &left_segmap_fnames; worker.partner_reads_map_fname = right_reads_map_fname; worker.seg_partner_reads_map_fname = right_seg_fname_for_segment_search; worker.juncs = &vseg_juncs[i]; worker.deletions = &vdeletions[i]; worker.insertions = &vinsertions[i]; worker.fusions = &vfusions[i]; worker.read = READ_LEFT; worker.partner_hit_offset = 0; worker.seg_partner_hit_offset = 0; if (i == 0) { worker.begin_id = 0; worker.seg_offsets = vector<int64_t>(left_segmap_fnames.size(), 0); worker.read_offset = 0; } else { worker.begin_id = read_ids[i-1]; worker.seg_offsets.insert(worker.seg_offsets.end(), offsets[i-1].begin()+1, offsets[i-1].end()); worker.read_offset = offsets[i-1][0]; if (partner_offsets.size() > 0) worker.partner_hit_offset = partner_offsets[i-1]; if (seg_partner_offsets.size() > 0) worker.seg_partner_hit_offset = seg_partner_offsets[i-1]; } worker.end_id = (i+1 < num_threads) ? read_ids[i] : std::numeric_limits<uint64_t>::max(); //Geo debug: //fprintf(stderr, "Worker %d: begin_id=%lu, end_id=%lu\n", i, worker.begin_id, worker.end_id); if (num_threads > 1 && i + 1 < num_threads) threads.push_back(newboost::thread(worker)); else worker(); }

More Related