Elevator Speech for Oracle Interview

Elevator Speech for Oracle Interview. Zhaoming Yin, Jan 16, 2014. Summary: work on GPU algorithms for sequence alignment, using the GPU to parallelize the HMM and CRF algorithms for sequence alignment; work on algorithms for the genome rearrangement problem.

Presentation Transcript


  1. Elevator Speech for Oracle Interview Zhaoming Yin Jan 16, 2014

  2. Summary
     • Work on GPU algorithms for sequence alignment
       - Using the GPU to parallelize the HMM and CRF algorithms for sequence alignment.
     • Work on algorithms for the genome rearrangement problem
       - Algorithm engineering that accelerates the median algorithm by more than two orders of magnitude.
       - A new algorithm to deal with unequal-content data.
       - A new software package to construct trees from unequal-content data.
     • Work on parallelizing optimization problems
       - Problems such as the knapsack problem and the exemplar distance problem (both NP-hard).

  3. GPU Algorithms for Sequence Alignment (HMM & CRF) Wavefront algorithm: the matrix is filled the way a wave front advances, and each block's value is computed from the values of previously computed blocks.
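A minimal sketch of the wavefront fill order, under stated assumptions: the slides apply it to HMM and CRF alignment on a GPU, while this example swaps in a plain longest-common-subsequence recurrence on the CPU, because it has the same dependency pattern (each cell needs its left, upper, and upper-left neighbours). The function name `wavefront_lcs` is hypothetical. Cells on one anti-diagonal do not depend on each other, which is what lets a GPU compute them in parallel.

```python
def wavefront_lcs(a, b):
    """Length of the longest common subsequence, filled by anti-diagonals.

    Stand-in recurrence (an assumption, not the HMM/CRF scoring from the
    slides): it only illustrates the wavefront traversal order.
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]

    # Anti-diagonal d holds every cell (i, j) with i + j == d.
    for d in range(2, n + m + 1):
        lo = max(1, d - m)   # keeps j = d - i <= m
        hi = min(n, d - 1)   # keeps j = d - i >= 1
        # Every cell in this loop reads only diagonals d-1 and d-2,
        # so the iterations are independent of one another.
        for i in range(lo, hi + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


print(wavefront_lcs("ACGT", "AGT"))
```

On a GPU the inner loop would typically become one kernel launch (or one synchronization step) per anti-diagonal.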

  4. GPU Algorithms for Sequence Alignment (HMM & CRF) Streaming algorithm: transferring data between host and device.

  5. Genome Rearrangement (algorithm engineering) Genome rearrangements observed in Drosophila polytene chromosomes. Dobzhansky, T., and A. H. Sturtevant, 1938. Inversions in the chromosomes of Drosophila pseudoobscura. Genetics 23: 28-64.

  6. Experimental Results (Time)

  7. Experimental Results (Space)

  8. Genome Rearrangement (New Algorithm)
     Traditional algorithm:
       1 2 3 4 5 6 7 8 9 10
       1 3 -2 4 6 7 -9 -10 -8
       1 2 -3 -7 -6 -9 -8 -10 4
       …………
     New algorithm (figure labels: deletion, duplication):
       1 2 4 5 6 7 8 9 10
       1 3 -2 -2 4 6 6 7 -9 -10 -8
       1 2 -3 -7 -6 -9 -8 -10 4
       …………

  9. Experimental Result: close distance estimation; close median estimation. http://ai.stanford.edu/~serafim/CS374_2006/presentations/lecture17.ppt

  10. Genome Rearrangement. In the 1980s, Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of cabbage and turnip. The genes were 99% similar, yet these nearly identical gene sequences differed in gene order. This study helped pave the way to analyzing genome rearrangements in molecular evolution.
      Original order: 1 2 3 4 5 6 7 8 9 10
      Inversion: 1 2 -6 -5 -4 -3 7 8 9 10
      Transposition: 1 2 7 8 3 4 5 6 9 10
      Inverted transposition: 1 2 7 8 -6 -5 -4 -3 9 10
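The three rearrangement operations on this slide can be reproduced on signed permutations with a few list slices. This is an illustrative sketch only: the function names and the half-open index convention (`i`, `j`, `k` are 0-based positions, with the moved segment being `genome[i:j]`) are assumptions made for the example, not notation from the slides.

```python
def inversion(genome, i, j):
    """Reverse genome[i:j] and flip the sign of every gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome, i, j, k):
    """Move the segment genome[i:j] so that it follows genome[j:k] (k >= j)."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def inverted_transposition(genome, i, j, k):
    """Transposition in which the moved segment is also inverted."""
    moved = [-g for g in reversed(genome[i:j])]
    return genome[:i] + genome[j:k] + moved + genome[k:]


identity = list(range(1, 11))
print(inversion(identity, 2, 6))
print(transposition(identity, 2, 6, 8))
print(inverted_transposition(identity, 2, 6, 8))
```

With the segment 3 4 5 6 selected, the three calls produce the three gene orders listed on the slide.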

  11. Genome Median Computation [Figure: graph diagrams with vertices labeled 1 to 6]

  12. Genome Median Computation. Genomes: (1, 2, 3), (1, -3, -2), (-2, -1, 3). Candidate medians: 1, 2, 3 = 2 moves; 2, -1, 3 = 5 moves; ….. [Figure: graph diagram with numbered vertices]

  13. Step 1: Spectral Partition

  14. Step 2: Compute MP Tree for Each Sub-Disk

  15. Step 2-1: How to Compute Median (BNB) [Figure: sequence of diagrams with vertices labeled 1 to 8]

  16. Step 2-2: How to Compute Median (LK) …………………. stop

  17. Step 2-2: How to Evaluate Median
      Genome 1: 1, 2, 3, 4, 3, 6, 5
      Genome 2: 1, 2, 3, 4, 6, 3, 5
      Genome 3: 1, 2, 5, 4, 6, 3, 3
      Median (med): 1, 2, 3, 3, 4, 6, 5
      Score: Dis(m,1) + Dis(m,2) + Dis(m,3)
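The median score on this slide is simply the sum of the distances from the candidate median to the three input genomes. The sketch below shows that summation with the distance passed in as a function. The distance used here, a position-by-position mismatch count, is a placeholder assumption chosen only so the example runs; the slides use a rearrangement distance (mapping, loss completion, DCJ) that is not reproduced here, and the names `median_score` and `mismatch_distance` are hypothetical.

```python
def mismatch_distance(x, y):
    """Placeholder distance (an assumption): number of positions that differ.

    Only defined for equal-length genomes; it is NOT the rearrangement
    distance used in the slides.
    """
    if len(x) != len(y):
        raise ValueError("placeholder distance needs equal-length genomes")
    return sum(1 for a, b in zip(x, y) if a != b)


def median_score(median, genomes, dist):
    """Dis(m,1) + Dis(m,2) + ... for a candidate median m."""
    return sum(dist(median, g) for g in genomes)


genomes = [
    [1, 2, 3, 4, 3, 6, 5],
    [1, 2, 3, 4, 6, 3, 5],
    [1, 2, 5, 4, 6, 3, 3],
]
candidate = [1, 2, 3, 3, 4, 6, 5]
print(median_score(candidate, genomes, mismatch_distance))
```

Keeping the distance as a parameter means a proper rearrangement distance could be substituted without touching the scoring loop.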

  18. Step 2-2: How to Evaluate Median
      1, 2, 3, 3, 4, 6, 5 / 1, 2, 3, 4, 3, 5
      Find a mapping first (NP-hard), dis = 1
      1, 2, 3, 3, 4, 6, 5 / -2, -1, 3, 3, 4, 5
      Complete the loss (polynomial), dis = 2
      1, 2, 3, 4, 6, 5 / -2, -1, 3, 4, 6, 5
      Compute DCJ (polynomial), dis = 3
      1, 2, 3, 4, 6, 5 / 1, 2, 3, 4, 6, 5

  19. Step 3: Merge Disks
      • Decomposition of the disks
      • Construct a tree for each disk
      • Merge the trees using a specific consensus method: strict, majority, etc.
      • Disambiguation
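To make the difference between the strict and majority consensus methods concrete, here is a small sketch. It assumes a simplified setting that the slides do not specify: every tree is given as a set of clades (each clade a `frozenset` of leaf names) and all trees share the same leaf set. The function names are hypothetical, and merging trees built on different disks would need more than this.

```python
from collections import Counter


def strict_consensus(trees):
    """Clades present in every tree."""
    return set.intersection(*(set(t) for t in trees))


def majority_consensus(trees):
    """Clades present in more than half of the trees."""
    counts = Counter(clade for t in trees for clade in set(t))
    return {clade for clade, c in counts.items() if c > len(trees) / 2}


ab, abc, abd = frozenset("ab"), frozenset("abc"), frozenset("abd")
trees = [{ab, abc}, {ab, abd}, {ab, abc}]

print(strict_consensus(trees) == {ab})
print(majority_consensus(trees) == {ab, abc})
```

The strict method keeps only what all trees agree on, so it is more conservative; the majority method keeps the clade `abc` because two of the three trees contain it.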

  20. Step 4: Initialization. Init by insertion, which is local. Init by prospection, which is global. [Figure: tree diagrams with numbered leaves and nodes labeled b, c, d, e, X]

  21. Step 5: Iterative Refinement [Figure: tree with leaves 1 to 4 and internal nodes a, b]

  22. Review
      • Step 1: Spectral partition
      • Step 2: Subtree construction
      • Step 3: Supertree merge
      • Step 4: Initialization of the complete tree using the General Adequate Subgraph (GAS) method
      • Step 5: Iterative refinement until the complete tree converges

  23. Result – Simulated Data. Seed: #Theta + #gamma + #phi operations. We grow our own tree. We know the total number of evolutionary events in the model tree.

  24. Result – Accuracy. % of duplication: 0.1. % of loss: 0.1. Theta is the % of inversion. There are 8 species, so the tree has 2*8 - 3 = 13 edges. The average accuracy is ~90%.
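The edge count on this slide follows from a standard fact about unrooted binary trees: n leaves require n - 2 internal nodes, and a tree with N nodes has N - 1 edges, giving 2n - 3. A short check of that arithmetic, with a hypothetical function name:

```python
def unrooted_binary_tree_edges(num_leaves):
    """Edges in an unrooted binary tree with the given number of leaves."""
    if num_leaves < 3:
        raise ValueError("an unrooted binary tree needs at least 3 leaves")
    internal_nodes = num_leaves - 2
    total_nodes = num_leaves + internal_nodes
    return total_nodes - 1  # any tree has one edge fewer than nodes


print(unrooted_binary_tree_edges(8))
```

For the 8 species on the slide this gives 13 edges, matching 2*8 - 3.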

  25. Result – Real Data SCRaMbLE Matrix • We can represent a SCRaMbLEd strain by its vector. • The sign gives the orientation. • The color encodes the position in the synthetic chromosome.

  26. Result – Real Data #inversion:#insertion/deletion:#duplication

  27. Parallel Method [Bader 05]: load balancing; parallel search

  28. Experimental Results (Parallel)

  29. Why Many-core BnB?
      • Many distributed-memory MIP BnB frameworks already exist (PICO, PEBBL, ALPS, COIN-OR).
      • Load balance in distributed BnB relies heavily on ramp-up; run-time load balancing is not efficient.
      • Today's petaflop machines are mostly hybrid systems (distributed memory plus many-core processors or accelerators).
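For readers unfamiliar with branch and bound (BnB), here is a minimal sequential sketch for the 0/1 knapsack problem, the problem used in the Intel Phi experiment. It is not the many-core implementation from the slides: the function name, the depth-first stack, and the fractional-relaxation bound are assumptions chosen for illustration. The explicit stack of open subproblems is the structure that a parallel BnB would split among workers, which is where load balancing comes in.

```python
def knapsack_bnb(values, weights, capacity):
    """Best total value for 0/1 knapsack via depth-first branch and bound.

    Assumes all weights are positive.
    """
    # Sort by value density so the fractional bound is easy to compute.
    items = sorted(zip(values, weights),
                   key=lambda vw: vw[0] / vw[1], reverse=True)

    def bound(idx, value, room):
        """Upper bound: fill greedily, allowing one fractional item."""
        for v, w in items[idx:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    best = 0
    stack = [(0, 0, capacity)]  # (next item index, value so far, room left)
    while stack:
        idx, value, room = stack.pop()
        best = max(best, value)
        # Prune: no items left, or this subtree cannot beat the incumbent.
        if idx == len(items) or bound(idx, value, room) <= best:
            continue
        v, w = items[idx]
        stack.append((idx + 1, value, room))              # branch: skip item
        if w <= room:
            stack.append((idx + 1, value + v, room - w))  # branch: take item
    return best


print(knapsack_bnb([60, 100, 120], [10, 20, 30], 50))
```

How much of the tree gets pruned depends on how early a good incumbent is found, so the amount of work per subproblem is irregular and hard to predict in advance.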

  30. Experimental Results (Intel Phi knapsack)
