
Canadian Bioinformatics Workshops

Canadian Bioinformatics Workshops. www.bioinformatics.ca. Informatics on High Throughput Sequencing Data. Introduction to next-gen sequencing. Francis Ouellette francis@oicr.on.ca July 25th 2008. Outline. Sequencing DNA Next Generation Technologies Solexa SOLiD 454 Helicos


Presentation Transcript


  1. Canadian Bioinformatics Workshops www.bioinformatics.ca

  2. Informatics on High Throughput Sequencing Data Introduction to next-gen sequencing Francis Ouellette francis@oicr.on.ca July 25th 2008

  3. Outline • Sequencing DNA • Next Generation Technologies • Solexa • SOLiD • 454 • Helicos • AB’s color space • What next, & things to keep in mind!

  4. Adapted from John McPherson, OICR Biological Research

  5. History of DNA Sequencing
     • 1870 Miescher: Discovers DNA
     • 1940 Avery: Proposes DNA as ‘Genetic Material’
     • 1953 Watson & Crick: Double Helix Structure of DNA
     • 1965 Holley: Sequences Yeast tRNA(Ala)
     • 1970 Wu: Sequences λ Cohesive End DNA
     • 1977 Sanger: Dideoxy Chain Termination; Gilbert: Chemical Degradation
     • 1980 Messing: M13 Cloning
     • 1986 Hood et al.: Partial Automation
     • 1990 Cycle Sequencing; Improved Sequencing Enzymes; Improved Fluorescent Detection Schemes
     • 2002 to 2008 Next Generation Sequencing; Improved enzymes and chemistry; Improved image processing
     Efficiency (bp/person/year), increasing along the timeline: 1; 15; 150; 1,500; 15,000; 25,000; 50,000; 200,000; 50,000,000; 100,000,000,000
     Adapted from Eric Green, NIH; Adapted from Messing & Llaca, PNAS (1998)

  6. Basics of the “old” technology • Clone the DNA. • Generate a ladder of labeled (colored) molecules that are different by 1 nucleotide. • Separate mixture on some matrix. • Detect fluorochrome by laser. • Interpret peaks as string of DNA. • Strings are 500 to 1,000 letters long • 1 machine generates 57,000 nucleotides/run • Assemble all strings into a genome.

  7. Basics of the “new” technology • Get DNA. • Attach it to something. • Extend and amplify signal with some color scheme. • Detect fluorochrome by microscopy. • Interpret series of spots as short strings of DNA. • Strings are 30-300 letters long • Multiple images are interpreted as 0.4 to 1.2 GB/run (1,200,000,000 letters/day). • Map or align strings to one or many genomes.
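To put the throughput figures on slides 6 and 7 side by side, here is a rough back-of-the-envelope sketch. It takes 57,000 nucleotides/run and 1,200,000,000 letters/day straight from the slides; the 3 Gb genome size is an assumption borrowed from slide 10, and the function names are made up for illustration. Note that one figure is per run and the other per day, so the ratio is only indicative.

```python
# Figures quoted on slides 6 and 7.
OLD_BASES_PER_RUN = 57_000           # "old" technology, one machine, one run
NEW_BASES_PER_DAY = 1_200_000_000    # "new" technology, upper figure per day

# Assumption: a 3 Gb genome, the size that appears on slide 10.
GENOME_SIZE = 3_000_000_000


def fold_increase(new_bases: int, old_bases: int) -> float:
    """Ratio of new to old output (per day vs per run, so only indicative)."""
    return new_bases / old_bases


def days_for_coverage(coverage: float, genome_size: int, bases_per_day: int) -> float:
    """Days of sequencing needed to produce `coverage` times the genome size."""
    return coverage * genome_size / bases_per_day


if __name__ == "__main__":
    ratio = fold_increase(NEW_BASES_PER_DAY, OLD_BASES_PER_RUN)
    print(f"new vs old output: about {ratio:,.0f}x")
    for cov in (1, 30):
        days = days_for_coverage(cov, GENOME_SIZE, NEW_BASES_PER_DAY)
        print(f"{cov}x coverage of a 3 Gb genome: {days:.1f} days")
```

This ignores read quality, unmappable reads, and run set-up time, so real projects would need more sequencing than the raw arithmetic suggests.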

  8. From Debbie Nickerson, Department of Genome Sciences, University of Washington, http://tinyurl.com/6zbzh4

  9. Differences between the various platforms: • Nanotechnology used. • Resolution of the image analysis. • Chemistry and enzymology. • Signal to noise detection in the software • Software/images/file size/pipeline • Cost $$$

  10. Next Generation DNA Sequencing Technologies [figure; surviving label: 3 Gb ==] Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk

  11. Solexa

  12. Solexa-based Whole Genome Sequencing. Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk

  13. Solexa-based Whole Genome Sequencing. Solexa flow cell: ~50M clusters are sequenced per flow cell. Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk

  14. Debbie Nickerson, Department of Genome Sciences, University of Washington, http://tinyurl.com/6zbzh4

  15. 454

  16. Roche / 454 : GS FLX • Real Time Sequencing by Synthesis • Chemiluminescence detection in pico titer plates • Amplification: emulsion PCR • Pyrosequencing • up to 400,000 reads / run • on average 250 bases / read (and longer) • up to 100 Mb / run
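The three numbers on this slide are consistent with each other, which a quick calculation shows. This is only a sketch: it assumes "Mb" means one million bases, and `run_yield_bases` is a name invented here.

```python
def run_yield_bases(reads_per_run: int, mean_read_length: int) -> int:
    """Total bases produced in one run: number of reads times average read length."""
    return reads_per_run * mean_read_length


if __name__ == "__main__":
    # Slide figures: up to 400,000 reads per run, 250 bases per read on average.
    total = run_yield_bases(400_000, 250)
    print(f"{total / 1_000_000:.0f} Mb per run")
```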

  17. Roche / 454 : GS FLX • Made for de novo sequencing. • Too expensive for resequencing. • For example, this platform will be used a lot by laboratories doing new bacterial genomes. • Baylor Genome Center is involved in the Sea Urchin, Bee, and Platypus genomes: they have a number of 454 machines.

  18. Helicos

  19. Single Molecule Sequencing (Helicos Biosciences Corp.) • Figure labels: microscope slide; single DNA molecule; primer; dNTP-Cy3; super-cooled TIRF microscope. Adapted from: Barak Cohen, Washington University, Bio5488 http://tinyurl.com/6zttuq http://tinyurl.com/6k26nh

  20. Helicos Approximate Data Production per Run at Current Peak Throughput (1 strand/µm²)

                                  Single Pass     Dual Pass
                                  (7 day run)     (14 day run)
      • Image Data:               35 TB           60 TB
      • Diagnostic Images:        350 GB          600 GB
      • Object Table:             3.5 TB          6 TB
      • Sequence Data:            350 GB          600 GB
      • Log Files:                350 GB          600 GB
      • Total (w/o full image stack):  ~4.5 TB    ~7.8 TB
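The totals on this slide can be checked by adding up the components. The sketch below assumes 1 TB = 1000 GB (the slide does not say which convention is used), and the dictionary layout and function name are illustrative only.

```python
GB_PER_TB = 1000  # assumption: decimal units

# Component sizes from the slide, all converted to GB.
RUN_DATA_GB = {
    "single pass (7 day run)": {
        "image data": 35_000,
        "diagnostic images": 350,
        "object table": 3_500,
        "sequence data": 350,
        "log files": 350,
    },
    "dual pass (14 day run)": {
        "image data": 60_000,
        "diagnostic images": 600,
        "object table": 6_000,
        "sequence data": 600,
        "log files": 600,
    },
}


def total_tb(components: dict, include_images: bool = False) -> float:
    """Sum component sizes in TB, optionally leaving out the full image stack."""
    total_gb = sum(
        size for name, size in components.items()
        if include_images or name != "image data"
    )
    return total_gb / GB_PER_TB


if __name__ == "__main__":
    for run, components in RUN_DATA_GB.items():
        without = total_tb(components)
        with_images = total_tb(components, include_images=True)
        print(f"{run}: {without:.2f} TB without images, {with_images:.2f} TB with images")
```

Under that assumption the sums without the image stack match the "~4.5 TB" and "~7.8 TB" totals on the slide, and they show how much the raw images dominate storage.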

  21. ABI SOLiD

  22. File management

  23. SOLiD color space
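The transcript keeps only the title of this slide, so as a reminder of the idea, here is a minimal sketch of the commonly described SOLiD di-base encoding: each color (0 to 3) represents a pair of adjacent bases, and with A=0, C=1, G=2, T=3 the color is the XOR of the two base codes. The function names are invented here, and the sketch ignores adapters, quality values, and the vendor file formats.

```python
from typing import List

# Two-bit codes for the bases; the color of a base pair is the XOR of the two codes.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = {code: base for base, code in BASE_TO_CODE.items()}


def encode_colors(seq: str) -> List[int]:
    """Turn a base sequence into colors, one per adjacent pair of bases."""
    codes = [BASE_TO_CODE[base] for base in seq.upper()]
    return [left ^ right for left, right in zip(codes, codes[1:])]


def decode_colors(first_base: str, colors: List[int]) -> str:
    """Rebuild the base sequence from a known first base and a list of colors."""
    code = BASE_TO_CODE[first_base.upper()]
    bases = [CODE_TO_BASE[code]]
    for color in colors:
        code ^= color  # each color tells us how to step from one base to the next
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


if __name__ == "__main__":
    colors = encode_colors("ATGGC")
    print(colors)
    print(decode_colors("A", colors))
```

Because every base after the first is decoded relative to the previous one, a single wrong color changes all the bases downstream of it. That is one reason color-space reads are usually aligned in color space rather than translated to bases first.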

  24. It’s more complicated! • Get files with quality scores • Get files with mismatches • Need to align them to a reference genome • Multiple tools do this today … and there will be more later. • What do you do? Do it all!
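To make "align them to a reference genome" with mismatches concrete, here is a deliberately naive sketch: slide the read along the reference and keep every position where the number of mismatches is within a limit. This is not how production aligners work (they use indexes and also handle gaps and quality scores); the function names and the toy sequences are made up for illustration.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, max_mismatches: int = 2) -> List[Tuple[int, int]]:
    """Return (position, mismatches) for every placement within the mismatch limit."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = hamming(read, window)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"  # toy reference, not real data
    for pos, mismatches in map_read("CGTT", reference, max_mismatches=1):
        print(f"position {pos}: {mismatches} mismatch(es)")
```

Even this toy example returns more than one candidate position for a short read, which is the ambiguity real tools have to resolve, often with the help of the quality scores.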

  25. Things to keep in mind • Everyone is learning: if you don’t know, ask. They probably won’t know either, and you can figure it out together! • The technology is changing – this workshop next year will be totally different! • We can only do so much in two days – you will need to find things, find people who can help you, and you will need to teach your friends!

  26. Other factors • Changing technology • New and disappearing companies? • Changing price structure • Cost of machine • Cost of operation (reagents/people) • Service from the company • 1 machine vs (2 or 3 machines) vs 40 machines. • Changing software and processing

  27. Pacific Biosciences (PacBio)

  28. Questions? • Coffee break!

  29. Day 1
