
Computational Methods to study Sequencing data


Presentation Transcript


  1. Computational Methods to study Sequencing data -Meenakshi Sharma

  2. Outline • Bioinformatics • Genomics • Motivation • Challenges • Next-Generation-Sequencing Pipeline • Sequencing • Mapping • Assembly • Blast

  3. Introduction • Bioinformatics draws on: Biology • Computer Science • Data Mining • Statistics • Applied Mathematics • Applied Chemistry • Applied Physics

  4. Definition • Definition of bioinformatics by the Bioinformatics Definition Committee (National Institute of Mental Health), released on July 17, 2000: “Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.”

  5. Genomics • Determine the complete DNA sequence of all genetic material contained in an organism • Analyze and compare the entire genomes of one or more species • Genome: the complete set of genetic material, including all genes, possessed by an organism

  6. Genome

  7. Motivation • Understand gene and genome organization • Study protein structure and function • Study metabolic pathways • Study ecology and environment • Find potential pathogens

  8. Challenges

  9. Challenges

  10. Challenges • Knowledge acquisition and knowledge management • Methods for Information and Knowledge Processing • Information retrieval • Statistical data analysis • High-performance and large-scale computing • Applications of new devices and emerging hardware technologies • Visualization of data and knowledge • Legal issues, policy issues, history, ethics

  11. Next-Generation-Sequencing Pipeline

  12. Sequencing • Workflow: Healthy Tissue and Infected Tissue → Library Preparation → Illumina Sequencer → Reads from Healthy Sample and Reads from Infected Sample • Example reads: ATGCGACTC ACCATGGCG ACTAGGGCA ATTATGTAG ATGGGTGAA TTCATGCGG ACTTCGCGT ATGATCCGA

  13. Mapping • Reads from the infected sample and reads from the healthy sample are each aligned to the reference sequence NC_000018 • Example reads: ATGGGTGAA TTCATGCGG ACTTCGCGT ATGATCCGA ATGCGACTC ACCATGGCG ACTAGGGCA ATTATGTAG • Reference excerpts: ATGATGATGATGATGCGACTCTACCGGCGTA and ATGATGATGATGATACTTCGCGTTCTCGCGTA • Per-position coverage values shown for the two alignments: 0000000 2 2 1 5 0 0000000000 3 … and 0000000 10 20 12 45 10 0000000000 10 … • Example of an aligned read: ATGCGACTC
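
The mapping step on this slide can be illustrated with a minimal sketch. It assumes exact-match placement only, which is far simpler than what a real short-read aligner does, and the function name `map_reads` and the second read `GCGACTCTA` are made up for illustration. The reference excerpt and the read `ATGCGACTC` are taken from the slide.

```python
def map_reads(reference, reads):
    """Return per-position coverage using exact-match read placement.

    Toy illustration only: real aligners tolerate mismatches and gaps
    and use an index instead of repeated string searches.
    """
    coverage = [0] * len(reference)
    for read in reads:
        start = reference.find(read)
        while start != -1:
            # Every reference position spanned by the read gains one count.
            for pos in range(start, start + len(read)):
                coverage[pos] += 1
            start = reference.find(read, start + 1)
    return coverage


# Reference excerpt and first read come from the slide;
# the last read is a hypothetical extra read for illustration.
reference = "ATGATGATGATGATGCGACTCTACCGGCGTA"
reads = ["ATGCGACTC", "ATGCGACTC", "GCGACTCTA"]
coverage = map_reads(reference, reads)
print(coverage)
```

Positions that no read covers stay at 0, which mirrors the long runs of zeros in the coverage rows on the slide.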

  14. Comparing coverages in 2 samples • [Figure: coverage value for Healthy Tissue versus Infected Tissue]
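
One common way to compare coverage between two samples is a per-position log2 ratio. The sketch below assumes that approach; the slides do not say which statistic was used. The numbers are the non-zero coverage values from the Mapping slide, and which row belongs to the healthy or infected sample is an assumption. The pseudocount and the threshold of 2 are illustrative choices.

```python
import math


def log2_fold_change(healthy, infected, pseudocount=1.0):
    """Per-position log2 ratio of infected to healthy coverage.

    The pseudocount avoids division by zero at uncovered positions.
    """
    if len(healthy) != len(infected):
        raise ValueError("coverage vectors must have the same length")
    return [
        math.log2((i + pseudocount) / (h + pseudocount))
        for h, i in zip(healthy, infected)
    ]


# Assumed assignment of the two coverage rows to the two samples.
healthy = [0, 2, 2, 1, 5, 0, 3]
infected = [0, 10, 20, 12, 45, 10, 10]

changes = log2_fold_change(healthy, infected)
# Flag positions where infected coverage is at least about 4x higher.
flagged = [pos for pos, value in enumerate(changes) if value >= 2.0]
print(flagged)
```

In practice the two samples would first be normalized for sequencing depth, since a sample with more total reads has higher coverage everywhere.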

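  15. Assembly • Reads (sample 1): ATGCGACTC ACCATGGCG ACTAGGGCA ATTATGTAG … • Assembled contigs (sample 1): ATGCGAACCATG ACTAGATTATGTTTCGCGA ACTCCCTATCGA GATTATGTTTCGCGA ATGTTTCGCGAGGTGT … • Reads (sample 2): ATGGGTTTA TTCATGTCG ACTTGTCAG ATGATCTAA … • Assembled contigs (sample 2): ATGGGTATTCATG TCTTTGTATGATCTA ATGGGTAATG GTGTGTATGATCTA … • Short overlapping 6-base fragments shown on the slide: ATGAAA TGAAAA TGAAAA GAAATA ATGCGA TGCGAG TGCGAT TGCGAG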
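
The idea of joining overlapping reads into contigs can be sketched with a greedy overlap merge. This is a teaching simplification and not necessarily the assembler used in the presentation; production assemblers typically use overlap graphs or de Bruijn graphs and handle sequencing errors. The three reads are hypothetical fragments cut from the slide contig ACTAGATTATGTTTCGCGA, and `min_len=3` is an arbitrary choice.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the longest such overlap is shorter than min_len.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ] + [merged]


# Hypothetical reads cut from the slide contig ACTAGATTATGTTTCGCGA.
reads = ["ACTAGATTAT", "GATTATGTTTC", "TGTTTCGCGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

Greedy merging can make wrong joins when the genome contains repeats, which is one reason real assemblers use graph-based methods.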

  16. Blast • Contigs from sample 1 and their matches: ATGCGAACCATG | papilloma virus • ACTAGATTATGTTTCGCGA | E. coli • ACTCCCTATCGA | human mitochondria • GATTATGTTTCGCGA | human chr 12 • ATGTTTCGCGAGGTGT | polio virus … • Contigs from sample 2 and their matches: ATGGGTATTCATG | small pox virus • TCTTTGTATGATCTA | human chr 21 • ATGGGTAATG | growth factor gene • GTGTGTATGATCTA | human mitochondria …
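
The lookup on this slide can be illustrated by counting shared k-mers between a query contig and each database sequence. This is not BLAST itself: BLAST uses short word matches only as seeds, then extends and scores alignments statistically. The database entries `subject_A` and `subject_B` are invented toy sequences, not real organism sequences, and `k=6` is an arbitrary choice. The two query contigs are taken from the slide.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hit(query, database, k=6):
    """Return (name, shared_kmer_count) for the best-matching entry.

    Returns (None, 0) when no database entry shares a k-mer with the query.
    """
    query_kmers = kmers(query, k)
    best_name, best_shared = None, 0
    for name, subject in database.items():
        shared = len(query_kmers & kmers(subject, k))
        if shared > best_shared:
            best_name, best_shared = name, shared
    return best_name, best_shared


# Hypothetical toy database; the names and sequences are made up.
database = {
    "subject_A": "TTATGCGAACCATGGA",
    "subject_B": "CCACTCCCTATCGATT",
}

hit_1 = best_hit("ATGCGAACCATG", database)
hit_2 = best_hit("ACTCCCTATCGA", database)
print(hit_1, hit_2)
```

A query that shares no k-mer with any entry returns `(None, 0)`, which corresponds to a contig with no database match.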

  17. Sequencing • Pipeline summary (assembly branch): Sequencing reads → Assembly → Assembled Contigs → Blast → Matched genes and Organisms • Pipeline summary (mapping branch): Sequencing reads → Mapping → Coverage Values → Coverage Analysis → Differential Coverage • Sequencing reads: ATGCGA ACCATG ACTAG ATTATGTA and ATGGGTA TTCATG ACTTGT ATGATCTA • Assembled contigs: ATGCGAACCATG ACTAGATTATGTTTCGCGA GATTATGTTTCGCGA ATGTTTCGCGAGGTGT and ATGGGTATTCATG TCTTTGTATGATCTA ATGGGTAATG GTGTGTATGATCTA • Mapping reference: NC_989231 ATGTAATCTAGTAGATGAGATGATAGATCGCAT, with reads ACTAG, ACTTGT, TGAGAT, TAGATC, ATGTAA and TCGCAT shown against it

  18. References • Gibas, C. and Jambeck, P., Developing Bioinformatics Computer Skills, O'Reilly & Associates, Inc., April 2001. Web. 13 February 2012. • Kahn, Scott D., On the Future of Genomic Data, Science 331, 728 (2011); DOI: 10.1126/science.1197891 • Wetterstrand, K. A., DNA Sequencing Costs: Data from the NHGRI Large-Scale Genome Sequencing Program. Available at: www.genome.gov/sequencingcosts. Accessed 13 February 2012.

  19. Thank you!
