Implementation and Analysis of Parallel Motif Finding Algorithms for Bioinformatics WMU CS 6260 Parallel Computations II Spring 2013 Presentation #1 about Semester Project Feb/18/2013 Professor: Dr. de Doncker Name: Sandino Vargas Xuanyu Hu
Outline • Our Team of This Semester Project • Project Topic Background: Bioinformatics • Parallel Algorithms in Bioinformatics • Problem We Want to Solve • Solution in a Sequential Way • Demo of Sequential Program • How to Parallelize It? (paper) • Conclusion • Reference • Questions?
Our Team of This Semester Project • Members of our team: • Sandino Vargas • Xuanyu Hu • We have taken the same courses: • WMU_CS6030_Bioinformatics (Summer II 2012) • WMU_CS5260_Parallel Computation I • WMU_CS6260_Parallel Computation II (Spring 2013) • Our professor, Dr. de Doncker, will be teaching the interesting course CS6030 "Biomedical Informatics" again next semester: Summer I (2013).
Project Topic Background: Bioinformatics • Bioinformatics is an interdisciplinary field that develops and improves upon methods for storing, retrieving, organizing and analyzing biological data. • A major activity in bioinformatics is to develop software tools to generate useful biological knowledge.
Subjects in Bioinformatics • Mapping DNA • Sequencing DNA • Comparing Sequences • Predicting Genes • Finding Signals • Identifying Proteins • Repeat Analysis • DNA Arrays • Genome Rearrangements • Molecular Evolution
Sequencing DNA • After reading the course materials and searching online, we agreed that DNA sequencing is the best subject for implementing and analyzing parallel computations. • There are 3 major questions: • What is DNA sequencing? • Why do we need DNA sequencing? • How is DNA sequenced?
What Is DNA Sequencing? • DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. It includes any method or technology used to determine the order of the four bases (adenine, guanine, cytosine, and thymine) in a strand of DNA. • The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.
One Thing We Need to Know • [Figure: a model of the DNA double helix] • A DNA molecule has 2 strands held together by 4 kinds of base pairs • The orientation of a pair matters: A – T is not the same as T – A • A – T • T – A • G – C • C – G
Why Do We Need DNA Sequencing? • DNA sequencing may be used to determine the sequence of individual genes, larger genetic regions, full chromosomes or entire genomes. • Depending on the methods used, sequencing may provide the order of nucleotides in DNA or RNA isolated from cells of animals, plants, bacteria, or any other source of genetic information.
Why Do We Need DNA Sequencing? • The resulting sequences may be used by researchers in molecular biology or genetics to further scientific progress, or by medical personnel to make treatment decisions or aid in genetic counseling. • Function = DNA pattern • Uses of DNA function: • Estimate the function of a new kind of virus • Kill the virus’s function by starvation
Another Thing We Need to Know • In genetics, a mutation is a change of the nucleotide sequence of the genome of an organism, virus, or extra-chromosomal genetic element. • Mutations may or may not produce changes in the observable characteristics of an organism. • Mutations play a part in both normal and abnormal biological processes, including evolution, cancer, and the development of the immune system.
How Is DNA Sequenced? • There are many methods: • Maxam-Gilbert sequencing • Chain-termination methods • Shotgun sequencing • Bridge PCR • Polony sequencing • 454 Pyrosequencing • Ion semiconductor sequencing • DNA nanoball sequencing
Problem We Want to Solve • We have several DNA strands, some of which may contain mutations. • The strands share the same function, or they come from the same species. • We want to find the DNA pattern (motif) that gives them the same function. • DNA is made of 2 strands, and each strand is made of A, T, G, and C. If we know one of the 2 strands, we can easily derive the other.
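Because each base pairs with exactly one partner, the second strand can be computed directly from the first. A minimal Python sketch, assuming standard Watson-Crick pairing and input made only of uppercase A, T, G, and C; the names `complement` and `reverse_complement` are our own illustrative choices. The reversal reflects that the two strands run antiparallel, so the partner strand read in its own 5'→3' direction is the reverse of the base-by-base complement.

```python
# Watson-Crick pairing (assumption: input uses only uppercase A, T, G, C).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Partner strand, aligned base by base with the input strand."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Partner strand read in its own 5'->3' direction (strands are antiparallel)."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGCAACT"))          # TACGTTGA
    print(reverse_complement("ATGCAACT"))  # AGTTGCAT
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check for any implementation.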
Examples • ATGCAACT is the DNA pattern we want to find. • Lowercase letters mark mutated bases.
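To make "pattern with mutations" concrete, one common measure is the Hamming distance: the number of positions where a window of the sequence differs from the pattern. A minimal sketch, assuming mutations are substitutions only (no insertions or deletions); the three sequences below are hypothetical stand-ins written for illustration, not the ones from the original slide, and `hamming` and `best_match` are our own helper names.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ, ignoring case."""
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def best_match(pattern: str, seq: str) -> tuple[int, int]:
    """Return (start, mismatches) of the window in seq closest to pattern.

    Ties go to the leftmost window.
    """
    k = len(pattern)
    candidates = ((i, hamming(pattern, seq[i:i + k])) for i in range(len(seq) - k + 1))
    return min(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    # Hypothetical sequences; lowercase letters mark mutated bases.
    dna = ["TTATGCAACTGA", "ATGgAACTCCGT", "CGATGCtACTAA"]
    for seq in dna:
        print(seq, best_match("ATGCAACT", seq))
    # Expected: (2, 0), (0, 1), (2, 1)
```

Here we already know the pattern; the motif-finding problem is harder because the pattern itself is unknown and must be inferred from the sequences.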
Solution in a Sequential Way • Greedy Algorithm
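The details of the greedy algorithm from the original slide are not available here, so the following is a sketch of one common textbook formulation (profile-based greedy motif search), not necessarily the variant we presented. Assumptions: sequences are uppercase, the motif length `k` is known, and profiles use a pseudocount of 1 so that unseen bases never get probability zero. Each k-mer of the first sequence seeds a motif set; every later sequence contributes its most probable k-mer under the profile built so far, and the lowest-scoring set wins.

```python
from collections import Counter

BASES = "ACGT"


def score(motifs: list[str]) -> int:
    """Total mismatches against the column-wise majority (lower is better)."""
    return sum(len(motifs) - max(Counter(col).values()) for col in zip(*motifs))


def profile(motifs: list[str], pseudocount: int = 1) -> list[dict[str, float]]:
    """Per-column base frequencies, smoothed with pseudocounts (our assumption)."""
    total = len(motifs) + 4 * pseudocount
    columns = [Counter(col) for col in zip(*motifs)]
    return [{b: (c[b] + pseudocount) / total for b in BASES} for c in columns]


def most_probable(seq: str, k: int, prof: list[dict[str, float]]) -> str:
    """k-mer of seq with the highest probability under prof (leftmost on ties)."""
    best, best_p = seq[:k], -1.0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        p = 1.0
        for j, base in enumerate(kmer):
            p *= prof[j][base]
        if p > best_p:
            best, best_p = kmer, p
    return best


def greedy_motif_search(dna: list[str], k: int) -> list[str]:
    best = [seq[:k] for seq in dna]
    first = dna[0]
    for i in range(len(first) - k + 1):
        # Seed with one k-mer from the first sequence, then extend greedily.
        motifs = [first[i:i + k]]
        for seq in dna[1:]:
            motifs.append(most_probable(seq, k, profile(motifs)))
        if score(motifs) < score(best):
            best = motifs
    return best


if __name__ == "__main__":
    dna = ["TTATGCAACTGA", "ATGGAACTCCGT", "CGATGCTACTAA"]  # hypothetical input
    print(greedy_motif_search(dna, 8))
```

The greedy approach is fast because it commits to one choice per sequence, but it offers no guarantee of finding the best-scoring motif set.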
Solution in a Sequential Way • Brute Force
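A brute-force search tries every combination of starting positions, one per sequence, and keeps the combination whose motifs score best. A minimal sketch, assuming the same mismatch-to-majority score as above and a known motif length `k`; the function name is our own. With t sequences of length n there are (n - k + 1)^t combinations, so the running time grows exponentially with the number of sequences. Each combination is scored independently of the others, which is what makes this search a natural candidate for splitting across processors.

```python
from collections import Counter
from itertools import product


def score(motifs: list[str]) -> int:
    """Total mismatches against the column-wise majority (lower is better)."""
    return sum(len(motifs) - max(Counter(col).values()) for col in zip(*motifs))


def brute_force_motif_search(dna: list[str], k: int) -> tuple[tuple[int, ...], int]:
    """Try every combination of start positions; return the best starts and score."""
    ranges = [range(len(seq) - k + 1) for seq in dna]
    best_starts, best_score = None, None
    for starts in product(*ranges):  # (n - k + 1) ** t combinations
        motifs = [seq[p:p + k] for seq, p in zip(dna, starts)]
        s = score(motifs)
        if best_score is None or s < best_score:
            best_starts, best_score = starts, s
    return best_starts, best_score


if __name__ == "__main__":
    dna = ["TTATGCAACTGA", "ATGGAACTCCGT", "CGATGCTACTAA"]  # hypothetical input
    print(brute_force_motif_search(dna, 8))
```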
Solution in a Sequential Way • Branch and Bound
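The slide's branch-and-bound details are not available here, so this sketch assumes one standard variant: a branch-and-bound median string search. Instead of enumerating start positions, it builds candidate patterns letter by letter and computes each candidate's total distance (the sum over sequences of its smallest Hamming distance to any window). Minimizing this total distance is known to be equivalent to minimizing the motif score. The bound is that extending a prefix can never lower its total distance, so any prefix that is already no better than the best full pattern found so far is pruned along with all of its extensions. Function names are our own.

```python
BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def total_distance(pattern: str, dna: list[str]) -> int:
    """Sum over sequences of the smallest Hamming distance to any window."""
    k = len(pattern)
    return sum(
        min(hamming(pattern, seq[i:i + k]) for i in range(len(seq) - k + 1))
        for seq in dna
    )


def branch_and_bound_median(dna: list[str], k: int) -> tuple[str, int]:
    best_pattern, best_dist = "", float("inf")

    def search(prefix: str) -> None:
        nonlocal best_pattern, best_dist
        d = total_distance(prefix, dna) if prefix else 0
        # Bound: a prefix's distance is a lower bound for all its extensions.
        if d >= best_dist:
            return
        if len(prefix) == k:
            best_pattern, best_dist = prefix, d
            return
        for base in BASES:  # branch on the next letter
            search(prefix + base)

    search("")
    return best_pattern, best_dist


if __name__ == "__main__":
    dna = ["TTATGCAACTGA", "ATGGAACTCCGT", "CGATGCTACTAA"]  # hypothetical input
    print(branch_and_bound_median(dna, 8))
```

In the worst case this still visits all 4^k patterns, but pruning usually skips large parts of the search tree once a good pattern has been found.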
Conclusion • Topics covered in our semester project so far: • Bioinformatics • Sequencing DNA • Parallel Algorithms in Bioinformatics • Problem We Want to Solve • Solution in a Sequential Way • How to Parallelize It (paper)
Questions? • We would be very happy to answer any questions you have.
Reference • http://en.wikipedia.org/wiki/Bioinformatics • http://en.wikipedia.org/wiki/DNA_sequencing • http://www.gpugrid.net/ • http://meseec.ce.rit.edu/756-projects/spring2006/d2/1/Bioinformatics.pdf • http://www.ac.uma.es/~ots/papers/survey.pdf • http://eprints.ru.ac.za/162/1/Akhurst_MSc.pdf • http://en.wikipedia.org/wiki/Mutation • http://www.cs.washington.edu/education/courses/cse527/04au/proj/lyons-talk.pdf • http://www.cs.washington.edu/education/courses/cse527/04au/proj/lyons-paper.pdf