
Weekly Report Start learning GPU

Weekly Report Start learning GPU. Ph.D. Student: Leo Lee Supervisor: Dr. Xiaowen Chu Date: Sep. 11, 2009. Outline. Protein identification and pFind GPU and data mining Research Plan. Protein identification and pFind. Background Identify flow Challenges


Presentation Transcript


  1. Weekly Report: Start learning GPU • Ph.D. Student: Leo Lee • Supervisor: Dr. Xiaowen Chu • Date: Sep. 11, 2009

  2. Outline • Protein identification and pFind • GPU and data mining • Research Plan

  3. Protein identification and pFind • Background • Identify flow • Challenges • Could GPU be used?

  4. Protein identification and pFind • Background • Identify flow • Challenges • Could GPU be used?

  5. The Human Genome Project: China 1%

  6. Same gene, different protein

  7. Human Plasma Proteome Project, USA Human Disease Glycomics/Proteome Initiative (HGPI), Japan Human Proteome Program: China in charge of liver

  8. Characteristics of the Proteome

  9. Protein identification and pFind • Background • Identify flow • Challenges • Could GPU be used?

  10. Mass Spectrometry Based Protein Identification • Workflow: mixed proteins → digest → mixed peptides → LC-MS/MS (tandem MS) → data analysis • Protein sequence: >ipi|IPI00243451|IPI00243451.6 MDQHQHLNKTAESASSEKKKTRRCNGFKMFLAALSFSYIAKALGGIIMKISITQIERRFD… • Peptide sequences: TAESASSEK, MFLAALSFSYIAK, … • Identified peptides are merged back into proteins
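
The digest step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not pFind's code: the slide does not name the enzyme, so the sketch assumes trypsin (cleave after K or R unless the next residue is P), which is consistent with the two peptides shown. The function name `tryptic_digest` is hypothetical, and the protein string is only the prefix visible on the slide before the "…".

```python
def tryptic_digest(protein):
    """Split a protein into peptides, cutting after K or R (assumed trypsin rule).

    No cut is made when the next residue is P.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Trailing piece that does not end in K or R.
        peptides.append(protein[start:])
    return peptides


# Visible prefix of the IPI00243451 sequence from the slide (truncated there).
protein = "MDQHQHLNKTAESASSEKKKTRRCNGFKMFLAALSFSYIAKALGGIIMKISITQIERRFD"
peptides = tryptic_digest(protein)
print(peptides)
```

Under this assumption the output contains both TAESASSEK and MFLAALSFSYIAK, the two peptides listed on the slide. Real search engines also allow missed cleavages, which this sketch leaves out.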

  11. Web search engine

  12. Protein identification SE • pFind: query → search the sequence database (…KFDTGIPDGFAGFFGHYAQGGITFRHEWTRJQIDF…) → score candidate peptides (TAESA, MFLAALS, …, FSYIAK) • [Figure: example spectra, axis ticks from 200 to 1200]

  13. Protein identification SE • Protein sequence database (>IQPSKANME TEPDQ… >DEAVPPPAL QLQFN… …) → digestion → peptides indexed by mass: 400.15 EVDG; 400.15 AAEE; 400.15 PSTD; …; 698.48 SVKKKK; 699.78 TLKHLK; 699.78 WDRDL; … • Query window: lower bound of mass 699.70, upper bound of mass 699.90 • Query results • [Figure: spectrum, axis ticks from 200 to 1200]
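
The mass-window lookup on this slide can be sketched as a binary search over a mass-sorted peptide list. This is a minimal sketch under assumptions: the masses are the values printed on the slide, the window is read as 699.70 to 699.90, and the names `peptide_index` and `candidates_in_window` are hypothetical, not taken from pFind.

```python
from bisect import bisect_left, bisect_right

# (mass, peptide) pairs as printed on the slide, kept sorted by mass.
peptide_index = sorted([
    (400.15, "EVDG"),
    (400.15, "AAEE"),
    (400.15, "PSTD"),
    (698.48, "SVKKKK"),
    (699.78, "TLKHLK"),
    (699.78, "WDRDL"),
])
masses = [mass for mass, _ in peptide_index]


def candidates_in_window(lower, upper):
    """Return peptides whose mass lies in [lower, upper] via two binary searches."""
    first = bisect_left(masses, lower)
    last = bisect_right(masses, upper)
    return [peptide for _, peptide in peptide_index[first:last]]


hits = candidates_in_window(699.70, 699.90)
print(hits)  # ['TLKHLK', 'WDRDL']
```

Each query costs two binary searches plus the size of the result, so the sorted index only has to be built once per database.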

  14. Protein identification SE • MS spectra • Protein database: >IQPSKANME TEPDQ… >DEAVPPPAL QLQFN… >RQRAILKVM NTIGGE… … • [Figure: spectra, axis ticks from 200 to 1200]

  15. Protein identification SE • Protein database (>IQPSKANME TEPDQ… >DEAVPPPAL QLQFN… >RQRAILKVM NTIGGE… …) → digest → peptides with masses: 400 EVDG; 400 AAEE; 400 PSTD; 698 SVKKKK; 699 TLKHLK; 699 WDRDL; … • MS spectra are matched against these peptides

  16. Protein identification and pFind • Background • Identify flow • Challenges • Could GPU be used?

  17. Challenges of PISE • Flow: protein database (>IQPSKANME TEPDQ… >DEAVPPPAL QLQFN… >RQRAILKVM NTIGGE…) → digest → peptides (EVDG, AAEE, PSTD, SVKKKK, TLKHLK, WDRDL, …) → matching against MS • The number of proteins increases exponentially • The data generation speed keeps increasing • PTMs (post-translational modifications) lead to a huge number of candidate peptides

  18. E.g. Phosphorylation • May occur on amino acids S, T and Y (HPO3, 80 Da) • EMSVPSCQYILSATNR has five candidate sites (a PO3 on each), giving 2^5 = 32 possible modification patterns
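
To see why PTMs inflate the search space, the phosphorylation patterns for the example peptide can be enumerated directly. This is an illustrative sketch: it uses the nominal 80 Da shift from the slide, treats every S, T and Y as an independent optional site, and the function name `phospho_variants` is hypothetical.

```python
from itertools import product

PHOSPHO_SHIFT_DA = 80.0  # nominal HPO3 mass shift as given on the slide


def phospho_variants(peptide):
    """List (modified_positions, total_mass_shift) for every on/off site pattern."""
    sites = [i for i, residue in enumerate(peptide) if residue in "STY"]
    variants = []
    for pattern in product((False, True), repeat=len(sites)):
        modified = tuple(pos for pos, on in zip(sites, pattern) if on)
        variants.append((modified, PHOSPHO_SHIFT_DA * len(modified)))
    return variants


variants = phospho_variants("EMSVPSCQYILSATNR")
print(len(variants))  # 32
```

The peptide has S, T or Y at five positions, so one database peptide becomes 32 candidates for this single modification type; the count doubles with every additional site.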

  19. Identification of PTM • Protein database (>IQPSKANME TEPDQ… >DEAVPPPAL QLQFN… >RQRAILKVM NTIGGE… …) → peptides with masses: 400 EVDG; 400 AAEE; 400 PSTD; 631 EMSVPS; 699 TLKHLK; 699 WDRDL; …

  20. Protein identification and pFind • Background • Identify flow • Challenges • Could GPU be used? • http://bioinformatics.oxfordjournals.org/cgi/content/full/25/15/1937

  21. Protein identification on GPU • One thread per MS spectrum • One thread per score • One thread per “query” • V1 match V2 • Seems valuable to think further!
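
The "one thread per score" option can be illustrated with a CPU-side emulation of the thread-index mapping. Everything here is an assumption made for illustration: the slide does not define the score, so a dot product of two binned vectors stands in for "V1 match V2", and `score_kernel`, `launch` and the toy vectors are hypothetical. On a real GPU the two loops in `launch` would be replaced by a CUDA kernel launch.

```python
def score_kernel(thread_id, spectra, candidates, scores):
    """Work done by one emulated thread: score one (spectrum, candidate) pair."""
    spectrum_idx, candidate_idx = divmod(thread_id, len(candidates))
    if spectrum_idx >= len(spectra):
        return  # surplus thread in the last block, nothing to do
    v1 = spectra[spectrum_idx]
    v2 = candidates[candidate_idx]
    scores[thread_id] = sum(a * b for a, b in zip(v1, v2))


def launch(n_blocks, threads_per_block, kernel, *args):
    """Serial stand-in for a GPU grid: visit every (block, thread) index once."""
    for block in range(n_blocks):
        for thread in range(threads_per_block):
            kernel(block * threads_per_block + thread, *args)


# Toy binned vectors (hypothetical data): 2 spectra, 3 candidate peptides.
spectra = [[1, 0, 1, 1], [0, 1, 1, 0]]
candidates = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 1, 1]]
scores = [0] * (len(spectra) * len(candidates))

threads_per_block = 4
n_blocks = (len(scores) + threads_per_block - 1) // threads_per_block
launch(n_blocks, threads_per_block, score_kernel, spectra, candidates, scores)
print(scores)  # [2, 1, 3, 0, 2, 2]
```

Mapping one thread to one spectrum instead would move the loop over candidates inside the kernel; which granularity works better depends on how many spectra and candidates there are.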

  22. Outline • Protein identification and pFind • GPU and data mining • Research Plan

  23. Google 2009.09.11

  24. GPU and data mining • Characteristics of the GPU • GPU vs. CPU • CUDA • Data mining on GPU

  25. GPU vs. CPU • Based on slide 7 of S. Green, “GPU Physics,” SIGGRAPH 2007 GPGPU Course. http://www.gpgpu.org/s2007/slides/15-GPGPU-physics.pdf

  26. Design philosophies are different. • [Figure: CPU vs. GPU chip layout, showing Control, ALUs, Cache and DRAM] • The GPU is specialized for compute-intensive, massively data parallel computation (exactly what graphics rendering is about) • So, more transistors can be devoted to data processing rather than data caching and flow control • The fast-growing video game industry exerts strong economic pressure for constant innovation

  27. What is the GPU Good at? • The GPU is good at data-parallel processing • The same computation executed on many data elements in parallel – low control flow overhead with high SP (single-precision) floating point arithmetic intensity • Many calculations per memory access • Currently also need high floating point to integer ratio • High floating-point arithmetic intensity and many data elements mean that memory access latency can be hidden with calculations instead of big data caches – Still need to avoid bandwidth saturation!

  28. CUDA - No more shader functions. • CUDA integrated CPU+GPU application C program • Serial or modestly parallel C code executes on CPU • Highly parallel SPMD kernel C code executes on GPU • Execution alternates: CPU serial code → GPU parallel kernel (Grid 0): KernelA<<< nBlk, nTid >>>(args); → CPU serial code → GPU parallel kernel (Grid 1): KernelB<<< nBlk, nTid >>>(args);

  29. CUDA • Basic • Memory • Threads • Application performance

  30. Data mining on GPU • K-means • k-NN • Apriori • SVM

  31. K-means on GPU • A team at University of Virginia, led by Professor Skadron • HKUST and MSRA • GPUMiner • HP Labs

  32. Experiments - GPUMiner

  33. Experiments - HPL

  34. Data mining on GPU • The speed-up factor depends heavily on the implementation • Data transfer • Memory • CPU-GPU cooperation
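
A back-of-the-envelope model shows why data transfer matters so much for the observed speed-up. This is a simplified sketch under stated assumptions: transfer and kernel time do not overlap, the whole CPU workload moves to the GPU, and all numbers and the name `effective_speedup` are hypothetical, not measurements from the experiments above.

```python
def effective_speedup(cpu_seconds, kernel_speedup, transfer_seconds):
    """End-to-end speed-up when the GPU run also pays for host-device transfers."""
    gpu_seconds = cpu_seconds / kernel_speedup + transfer_seconds
    return cpu_seconds / gpu_seconds


# Hypothetical workload: 10 s on the CPU, kernel itself runs 50x faster on the GPU.
with_transfer = effective_speedup(10.0, 50.0, 1.0)
without_transfer = effective_speedup(10.0, 50.0, 0.0)
print(round(with_transfer, 2), round(without_transfer, 2))
```

In this toy case one second of transfer cuts a 50x kernel speed-up to roughly 8x overall, which is why keeping data resident on the GPU and sharing work between CPU and GPU appear on the slide alongside the kernel implementation.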

  35. Outline • Protein identification and pFind • GPU and data mining • Research Plan

  36. Research Plan • Keep reading related papers • GPU, data mining • Development • Read our k-means program • Try to speed it up • Try protein identification on GPU

  37. Time schedule • Courses • Thu. 6:30-9:30pm, data mining • TA • Tue. 11:30am-12:20pm, Network security • Fri. 9:30-11:30am, Network security

  38. Thank you for listening
