Advanced Computational Biology Project Presentation

Presentation Transcript

  1. Advanced Computational Biology Project Presentation • Team Members: Joshua Wu (11174269), Shuyu (Christine) Xu (11161640)

  2. OVERVIEW • Project Description • Introduction • Motivation • Bioinformatics Application • Explicit vs Implicit • Problem Analysis • Implement Files • Experimental Results • Conclusion • Possible Future Work

  3. Project Description • Explicit Suffix Trees: suppose that we want to store explicitly all strings that are edge labels of a suffix tree. • The main question of this project is how much space explicit suffix trees require compared to implicit suffix trees. • Implement a suffix tree algorithm and run it on substrings of real data.

  5. Introduction • Any string of length m can be decomposed into m suffixes, and these suffixes can be stored in a suffix tree. • Construction time: O(m), where m is the length of the string • Search time: O(n), where n is the length of the pattern
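
To make the suffix decomposition and the O(n) search concrete, here is a minimal Python sketch. It is an illustration only, not the project's implementation: it builds an uncompressed suffix trie by naive insertion, which takes O(m^2) time rather than the O(m) of a linear-time suffix tree construction. The function names and the sample string are assumptions made for this example.

```python
def suffixes(text):
    """Return all suffixes of text, longest first (one per start position)."""
    return [text[i:] for i in range(len(text))]


def build_suffix_trie(text):
    """Insert every suffix into a nested-dict trie (naive, O(m^2) time)."""
    root = {}
    for suffix in suffixes(text):
        node = root
        for ch in suffix:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Follow one trie edge per pattern character: O(n) for pattern length n."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


text = "ABCABC$"  # hypothetical input; "$" marks the end of the string
trie = build_suffix_trie(text)
print(len(suffixes(text)))                           # one suffix per position
print(contains(trie, "CAB"), contains(trie, "CBA"))  # substring lookups
```

The search cost depends only on the pattern length because every substring of the text is a prefix of some suffix, so a lookup never needs to scan the text itself.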

  7. Motivation • "Suffix trees are widely used in the computer field... Recent improvements in the method have cut the memory requirement to 17 bytes per letter, which brings the method to the verge of practicality [for bioinformatics applications]" -- Nat Goodman (Genome Technology).

  9. Bioinformatics Application • multiple genome alignment (Michael Hohl et al., 2002) • selection of signature oligonucleotides for DNA arrays (Kaderali and Schliep, 2002) • identification of sequence repeats (Kurtz and Schleiermacher, 1999)

  11. Explicit vs Implicit • String: ABC$ (positions 1 2 3 4) • Explicit edge labels: ABC$, BC$, C$, $ • Implicit edge labels (start,end): 1,4 2,4 3,4 4,4
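
The relationship between the two labelings on this slide can be sketched in a few lines of Python. The pairs are the 1-based, inclusive (start, end) positions from the slide; the helper name `to_explicit` and the counting of one unit per stored character or number are assumptions for illustration.

```python
text = "ABC$"

# Implicit labels: 1-based, inclusive (start, end) positions into the text.
implicit = [(1, 4), (2, 4), (3, 4), (4, 4)]


def to_explicit(text, pair):
    """Expand an implicit (start, end) label into the substring it denotes."""
    start, end = pair
    return text[start - 1:end]


explicit = [to_explicit(text, pair) for pair in implicit]
print(explicit)

# Assumed cost model: one unit per character (explicit) or per number (implicit).
explicit_units = sum(len(label) for label in explicit)
implicit_units = 2 * len(implicit)
print(explicit_units, implicit_units)
```

An explicit label costs as much as the substring is long, while an implicit label always costs two numbers, which is why the comparison depends on how long the edge labels are.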

  13. Problem Analysis • Best case for explicit and implicit suffix trees: all characters are different • The best case is unlikely with DNA inputs, whose alphabet has only 4 characters • Worst case: the same character throughout

  14. Assumptions • In implicit trees, each number is counted as one unit of storage, regardless of its size (e.g., the number 10 counts as 1 unit) • Sequences contain only alphabetic characters

  15. Example: all different characters • String: ABCD$ (positions 1 2 3 4 5) • Implicit edge labels: 1,5 2,5 3,5 4,5 5,5 • N: string length • N = 5 • Memory = 10 • Best case

  16. Example • String: ABCABC$ (positions 1 2 3 4 5 6 7) • Implicit edge labels: 7,7 from the root; 1,3 2,3 6,6 to internal nodes, each with child edges 4,7 and 7,7 • N: string length • N = 7 • Memory = 20

  17. Example: all same character • String: AAAA$ (positions 1 2 3 4 5) • Implicit edge labels: 1,1 5,5 2,2 5,5 3,3 5,5 4,5 5,5 • N: string length • N = 5, 6, 7 • Memory = 16, 20, 24 • Memory = 4N-4 • Worst case
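
The memory figures in the three examples above can be checked with a short Python sketch. It is not the project's implementation: the tree is built by a naive recursive grouping of suffixes rather than a linear-time algorithm, and the function names are assumptions. Implicit memory is counted as two numbers per edge, following the stated assumption that each number costs one unit; explicit memory is assumed here to be the total number of characters across all edge labels.

```python
def suffix_tree_edges(text):
    """Return suffix-tree edges of text as 1-based inclusive (start, end) pairs.

    Naive recursive construction: suffixes are grouped by their next
    character, and each edge is extended while every suffix in the group
    agrees on the following character.
    """
    n = len(text)
    edges = []

    def split(starts, depth):
        groups = {}
        for s in starts:
            if s + depth < n:
                groups.setdefault(text[s + depth], []).append(s)
        for group in groups.values():
            length = 1
            while True:
                following = {
                    text[s + depth + length] if s + depth + length < n else None
                    for s in group
                }
                if len(following) != 1 or None in following:
                    break
                length += 1
            first = group[0]
            edges.append((first + depth + 1, first + depth + length))
            if len(group) > 1:  # internal node: recurse into its children
                split(group, depth + length)

    split(list(range(n)), 0)
    return edges


def implicit_memory(text):
    """Two numbers (start, end) per edge, one unit each."""
    return 2 * len(suffix_tree_edges(text))


def explicit_memory(text):
    """Total characters stored when every edge label is written out."""
    return sum(end - start + 1 for start, end in suffix_tree_edges(text))


for s in ("ABCD$", "ABCABC$", "AAAA$"):
    print(s, implicit_memory(s), explicit_memory(s))
```

The start position reported for an edge may differ from the one drawn on a slide when the same label occurs at several places in the string (for example 3,3 instead of 6,6 for the edge labeled C in ABCABC$); the edge count, and therefore the memory figure, is unaffected.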

  18. Program Input Data • DNA for all kinds of creatures: Homo sapiens, monkeys, chickens, …

  20. Sample input: Homo sapiens • cagctcctgagactgctggcatgaaggggagccgtgccctcctgctggtggccctcaccctgttctgcatctgccggatggccacaggggaggacaacgatgagtttttcatggacttcctgcaaacactactggtggggaccccagaggagctctatgaggggaccttgggcaagtacaatgtcaacgaagatgccaaggcagcaatgactgaactcaagtcctgcagagatggcctgcagccaatgcacaaggcggagctggtcaagctgctggtgcaagtgctgggcagtcaggacggtgcctaagtggacctcagacatggctcagccataggacctgccacacaagcagccgtggacacaacgcccactaccacctcccacatggaaatgtatcctcaaaccgtttaatcaataa

  21. Sample result

  22. Sample input 2: plants • EARPIVVGPPPPLSGGLPGTENSDQARDGTLPYTKDRFYLQPLPPTEAAQRAKVSASEILNVKQFIDRKAWPSLQNDLRLRASYLRYDLKTVISAKPKDEKKSLQELTSKLFSSIDNLDHAAKIKSPTEAEKYYGQTVSNINEVLAKLG

  23. Sample output:

  25. Homo sapiens

  26. Sample Input: Homo Sapiens • atgaaggggagccgtgccctcctgctggtggccctcaccctgttctgcatctgccggatggccacaggggaggacaacgatgagtttttcatggacttcctgcaaacactactggtggggaccccagaggagctctatgaggggaccttgggcaagtacaatgtcaacgaagatgccaaggcagcaatgactgaactcaagtcctgcagagatggcctgcagccaatgcacaaggcggagctggtcaagctgctggtgcaagtgctgggcagtcaggacggtgcctaa

  27. Comparisons: Homo Sapiens

  28. Comparisons: Homo Sapiens

  29. Monkey Virus

  30. Sample Input: Monkey Virus • GGSCFKCGKKGHFAKNCHEHAHNNAEPKVPGLCPRCKRGKHWANECKSKTDNQGNPIPPH

  31. Monkey Virus

  32. Plants

  33. Sample Input: Plants • EARPIVVGPPPPLSGGLPGTENSDQARDGTLPYTKDRFYLQPLPPTEAAQRAKVSASEILNVKQFIDRKAWPSLQNDLRLRASYLRYDLKTVISAKPKDEKKSLQELTSKLFSSIDNLDHAAKIKSPTEAEKYYGQTVSNINEVLAKLG

  34. Plants

  35. Tobacco

  36. Sample input: tobacco • SYSITTPSQFVFLSSAWADPIELINLCTNALGNQFQTQQARTVVQRQFSEVWKPSPQVTVRFPDSDFKVYRYNAVLDPLVTALLGAFDTRNRIIEVENQANPTTAETLDATRRVDDATVAIRSAINNLIVELIRGTGSYNRSSFESSSGLVWTSGPAT

  37. Tobacco

  38. Insects

  39. Sample Input: Insects • DCLSGRYKGPCAVWDNETCRRVCKEEGRSSGHCSPSLKCWCEGC

  40. Insects

  41. Birds

  42. Sample Input: Birds • IDTCRLPSDRGRCKASFERWYFNGRTCAKFIYGGCGGNGNKFPTQEACMKRCAKA

  43. Birds

  44. SARS

  45. Sample Input: SARS • ALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEV

  46. SARS

  47. Fish

  48. Sample Input: Fish • GHHHHHHLEDPSGGTPYIGSKISLISKAEIRYEGILYTIDTENSTVALAKVRSFGTEDRPTDRPIAPRDETFEYIIFRGSDIKDLTVCEPPKPIM

  49. Fish

  50. Chicken