1 / 17

Fast Exact String Matching On the GPU

Fast Exact String Matching On the GPU. Michael C. Schatz and Cole Trapnell May 8, 2007 CMSC 740 Computer Graphics. String Matching Applications. A very common problem in computational biology is to find all occurrences (or approximate occurrences) of one string in another string

johana
Télécharger la présentation

Fast Exact String Matching On the GPU

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fast Exact String Matching On the GPU Michael C. Schatz and Cole Trapnell May 8, 2007 CMSC 740 Computer Graphics

  2. String Matching Applications • A very common problem in computational biology is to find all occurrences (or approximate occurrences) of one string in another string • Genome Assembly, Gene Finding, Comparative Genomics, Functional analysis of proteins, Motif discovery, SNP analysis, Phylogenetic analysis, Primer Design • Short Read Resequencing: 200 Million 50bp reads • … • Sequence databases are huge, and growing exponentially • We need ever faster methods for string matching

  3. Suffix Trees to the Rescue • Tree of all suffixes of string S • Suffix i encoded on path to leaf i • Nodes: positions where suffixes diverge • Edges: substrings of S • Leaves: starting position of suffix • Suffix Links: traverse to next suffix • O(n) Construction • Ukkonen’s Algorithm • Exploits inter-suffix relationships and suffix links • O(k) Query Match • Every substring S[i,j] is a prefix of suffix i. • Walk from root following the characters in the query Q. • One leaf for each occurrence of Q in T. Suffix tree of “ACATAC$” *858E Algorithms for Biosequence Analysis

  4. Suffix Tree Search TAC$ $ A C 7 4 TAC$ $ ATAC$ C 3 6 2 ATAC$ $ 5 1 Searching for “ATA”… Suffix tree of “ACATAC$”

  5. Suffix Tree Search TAC$ $ A C 7 4 TAC$ $ ATAC$ C 3 6 2 ATAC$ $ 5 1 Searching for “ATA”… Suffix tree of “ACATAC$”

  6. Suffix Tree Search TAC$ $ A C 7 4 TAC$ $ ATAC$ C 3 6 2 ATAC$ $ 5 1 Searching for “ATA”… Suffix tree of “ACATAC$”

  7. Suffix Tree Search TAC$ $ A C 7 4 TAC$ $ ATAC$ C 3 6 2 ATAC$ $ 5 1 Searching for “ATA” found at position 3! Suffix tree of “ACATAC$”

  8. Suffix Tree Search TAC$ $ A C 7 4 TAC$ $ ATAC$ C 3 6 2 ATAC$ $ 5 1 Searching for “AC” found at positions 1 & 5 Searching for “ACT” “falls off” tree => Not in S Suffix tree of “ACATAC$”

  9. GPGPU Programming • Utilize the highly parallel SIMD architecture of the GPU • Nominally used for in parallel triangle rendering, texture application • Each processor executes same kernel • Dramatic runtime improvement for scientific applications • CUDA Architecture • API and runtime library to implement C style programming of stream processors • nVidia GeForce 8800 GTX (G80) • 16 multiprocessors w/ 8 processors • 128 stream processors @ 1.35 GHz • 768 MB total on board RAM • 2D Texture Cache for large readonly data *Image from CUDA Programming Guide

  10. Cmatch GPU Algorithm • Load Reference String • Create Suffix Tree • Load Query Strings • Transfer data to GPU • Execute Query Kernel • Up to 128 simultaneous matches on GPU • Fetch Results from GPU • Output results

  11. Data Structures on the GPU • Suffix tree nodes => 2D Texture • Encode node information & children pointers as RGBA color of texel • Arrange nodes in 32x32 blocks along space filling curve • Optimize near root for inter-thread caching, further down for an individual thread. • Reference String => 2D Texture • Access many successive characters along edge • Query Strings => On Board RAM • |Q| array with offsets in a large array of strings • Results buffer => On Board RAM • |Q| array with id of last visited node for query i

  12. Experimental Protocol • Comparing running time of (serial) CPU versus (parallel) GPU programs • CPU: 3.0 GHz Intel Xeon • GPU: nVidia GeForce 8800 GTX (128 processors @ 1.35 GHz) • Simulate short read resequencing projects by extracting substrings of reference sequences • References • Genome of Bacillus anthracis (5.20 Mbp) • Genome of Yersinia pestis (4.6 Mbp) • BAC-sized portion of Human Chromosome 2 (200 kbp) • Query sets (250 Mbp total) • 10 million x 25 bp • 5 million x 50 bp • 1.25 million x 200 bp • 312,500 x 800 bp

  13. Query Time Results Speedup of the GPU match kernel versus CPU match program.

  14. Long Read Query Time Results Future work to improve cache hit rate for longer reads.

  15. Processing Time GPU Cmatch is bounded by time to construct suffix tree and IO processing time

  16. Conclusions • We have reduced the computation processing time for short read resequencing from hours to minutes. • Make sure you have sufficient cooling available • Low arithmetic intensity GPGPU programs can have dramatic performance improvements (35x) over CPU execution • Utilizing the texture cache with careful node placement and minimizing register use were essential to high performance • A single GPU can supply same processing power as a small computer cluster at a fraction of the cost • Installing GPUs into an existing cluster can provide an order of magnitude increase in computing capacity. • More information: • http://www.cbcb.umd.edu/software/cmatch

  17. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 Texture Space filling curve • Texture cache organized in 2x2 blocks. • Try to place all children of a node are in the same cache block

More Related