This module provides an essential overview of next-generation sequencing (NGS) technologies. It discusses the evolution of DNA sequencing, comparing traditional methods with NGS advantages such as increased efficiency and resolution. Participants will learn about different sequencing platforms, data analysis, and the significance of NGS in modern genomics, including its role in personalized medicine and cancer research. Get acquainted with the technical aspects of sequencing, from DNA extraction to image processing and data interpretation.
Canadian Bioinformatics Workshops www.bioinformatics.ca
Module 1: Introduction to next-gen sequencing. Francis Ouellette, Informatics on High Throughput Sequencing Data, July 2009
Overview • “next-gen” or “next-next-gen”: why are we here? • What kinds of sequencing are we doing? • How does DNA sequencing work? • We try to stay away from vendor-specific challenges, but can we really? • Where next?
History of DNA Sequencing (adapted from Eric Green, NIH; adapted from Messing & Llaca, PNAS (1998))
• 1870 Miescher: discovers DNA
• 1940 Avery: proposes DNA as the ‘genetic material’
• 1953 Watson & Crick: double helix structure of DNA
• 1965 Holley: sequences yeast tRNA(Ala)
• 1970 Wu: sequences cohesive-end DNA
• 1977 Sanger: dideoxy chain termination; Gilbert: chemical degradation
• 1980 Messing: M13 cloning
• 1986 Hood et al.: partial automation
• 1990 Cycle sequencing, improved sequencing enzymes, improved fluorescent detection schemes
• 2009 Next generation sequencing: improved enzymes and chemistry, new image processing
Efficiency (bp/person/year) rose from 1 (1965) through 15, 150, 1,500, 15,000, 25,000, 50,000 and 200,000 to 50,000,000 (2002) and 100,000,000,000 (2009).
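Taking the first and last efficiency values on the timeline, 1 bp/person/year in 1965 and 100,000,000,000 in 2009, gives a rough sense of the pace. As a worked calculation only, and assuming (our simplification, not the source's) smooth exponential growth between those two points, a 10^11-fold gain over 44 years is about 36.5 doublings, or one doubling roughly every 1.2 years. The helper name is hypothetical:

```python
import math

def doubling_time(eff_start: float, eff_end: float,
                  year_start: int, year_end: int) -> float:
    """Average years per doubling, assuming smooth exponential growth."""
    doublings = math.log2(eff_end / eff_start)
    return (year_end - year_start) / doublings

# First and last efficiency values on the timeline (bp/person/year).
years = doubling_time(1, 100_000_000_000, 1965, 2009)
print(f"{years:.2f} years per doubling")  # 1.20 years per doubling
```

The real curve is far from smooth (most of the gain arrives with next-gen instruments), so this average hides the step change the timeline is meant to show.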
Why are we sequencing? • Before Next-generation: • Reductionist perspective on life • DNA, RNA, (proteins), (populations), sampling, averages, consensus • Problems: sampling, averages, consensus. • After Next-generation: • We are still reductionist, but better • Genome sequence and structure • Less cloning/PCR • Single molecules (for some)
Basics of the “old” technology • Clone the DNA. • Generate a ladder of labeled (colored) molecules that are different by 1 nucleotide. • Separate mixture on some matrix. • Detect fluorochrome by laser. • Interpret peaks as string of DNA. • Strings are 500 to 1,000 letters long • 1 machine generates 57,000 nucleotides/run • Assemble all strings into a “whole”.
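The ladder-and-read steps above can be mimicked in a few lines. This is a toy sketch, not a model of the chemistry: we assume exactly one dye-labelled fragment terminates at every position, so neighbouring fragments differ by 1 nucleotide, and we treat the input string as the newly synthesized strand (the complement of the cloned template). Sorting by length stands in for separation on the matrix, and reading each fragment's terminal base stands in for laser detection of its fluorochrome. Function names are hypothetical:

```python
import random

def terminated_fragments(strand: str) -> list[tuple[int, str]]:
    """One (length, terminal base) pair per dideoxy-terminated fragment."""
    # Toy assumption: a fragment ending at every position is present.
    return [(i + 1, strand[i]) for i in range(len(strand))]

def read_ladder(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by size (the separation step) and report each dye."""
    by_size = sorted(fragments, key=lambda frag: frag[0])
    return "".join(base for _, base in by_size)

mixture = terminated_fragments("GATTACA")
random.shuffle(mixture)       # the reaction yields an unordered mixture
print(read_ladder(mixture))   # GATTACA
```

Real traces are peaks with noise and uneven spacing rather than clean labels, which is why the slide lists "interpret peaks" as its own step.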
Differences between the various platforms: • Nanotechnology used. • Resolution of the image analysis. • Chemistry and enzymology. • Signal-to-noise detection in the software • Software/images/file size/pipeline • Cost $$$
Next Generation DNA Sequencing Technologies (adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome”, http://tinyurl.com/5f3alk)
Next-gen sequencers (from John McPherson, OICR). Chart of bases per machine run (1 Mb to 100 Gb) against read length (10 bp to 1,000 bp):
• AB/SOLiD v3 and Illumina/GAII short-read sequencers: 10+ Gb in 50-100 bp reads, >100M reads, 4-8 days
• 454 GS FLX pyrosequencer: 100-500 Mb in 100-400 bp reads, 0.5-1M reads, 5-10 hours
• ABI capillary sequencer: 0.04-0.08 Mb in 450-800 bp reads, 96 reads, 1-3 hours
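The yields on this chart follow from simple arithmetic: bases per run is roughly the number of reads times the read length. A minimal sketch, with a hypothetical helper that ignores reads lost to quality filtering, checks two of the quoted figures:

```python
def bases_per_run(reads: int, read_length_bp: int) -> int:
    """Raw yield of one run: reads x read length (no quality filtering)."""
    return reads * read_length_bp

# ABI capillary: 96 reads of 450-800 bp.
low = bases_per_run(96, 450)
high = bases_per_run(96, 800)
print(low / 1e6, high / 1e6)   # 0.0432 0.0768 (Mb)

# Short-read instruments: >100M reads of 50-100 bp.
print(bases_per_run(100_000_000, 100) / 1e9)  # 10.0 (Gb)
```

The capillary range lands inside the quoted 0.04-0.08 Mb, and 100M reads of 100 bp give the 10 Gb lower bound for the short-read machines.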
2009/10 promises? (from John McPherson, OICR). Chart of bases per machine run (1 Mb to 100 Gb) against read length (10 bp to 1,000 bp):
• AB SOLiD v3: 120 Gb, 100 bp reads
• Illumina GAII: 90 Gb, 175 bp reads
• 454 GS FLX Titanium: 0.4-0.6 Gb, 100-400 bp reads
• ABI capillary sequencer: 0.04-0.08 Mb, 450-800 bp reads
Solexa-based Whole Genome Sequencing Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk
From Debbie Nickerson, Department of Genome Sciences, University of Washington, http://tinyurl.com/6zbzh4
Sample AB (SOLiD) data: lab exercise
>443_1087_001_F3
T12111121313231331100020021211112211
>443_1087_002_F3
T01121100201303232033213132212320123
>443_1087_003_F3
T21333200110101330330011101121132111
>443_1087_004_F3
T21322103331203331001002121021323111
>443_1088_005_F3
T32311301011311231133321301012223110
>443_1088_006_F3
T13211113031122103020002220012122101
>443_1088_007_F3
T21112301301221022023212000311310313
>443_1088_008_F3
T12133033210200001231010301011012031
>443_1088_009_F3
T23330012121212103111123012012320300
>443_1088_010_F3
T10213330331021322130123311011312110
• Get a sequence assignment from the instructor.
• Work with the people at your table.
• Use the information from the lecture notes (Panel E).
• BLAST the sequence at NCBI.
• What is it?
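To BLAST these reads you first need them in base space. In SOLiD two-base encoding each color describes a pair of adjacent bases; if A, C, G and T are coded 0 to 3, the color of a pair is the bitwise XOR of the two codes, so a read can be decoded one base at a time starting from the known leading base. A minimal sketch, assuming (as for these F3 reads) that the leading T is the last base of the adapter and not part of the read, and simply rejecting missing calls such as '.' rather than handling them. The function names are hypothetical:

```python
# 2-bit base codes: A=0, C=1, G=2, T=3 (so each color = XOR of two codes).
BASES = "ACGT"

def decode_colorspace(read: str) -> str:
    """Convert a csfasta read (adapter base + colors) to base space."""
    prev = BASES.index(read[0])      # known leading base from the adapter
    decoded = []
    for color in read[1:]:
        if color not in "0123":
            raise ValueError(f"cannot decode color call {color!r}")
        prev ^= int(color)           # next base code = previous code XOR color
        decoded.append(BASES[prev])
    return "".join(decoded)          # the adapter base itself is dropped

def encode_colorspace(seq: str, adapter_base: str = "T") -> str:
    """Inverse of decode_colorspace, useful for checking the round trip."""
    codes = [BASES.index(b) for b in adapter_base + seq]
    return adapter_base + "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))

print(decode_colorspace("T12111121313231331100020021211112211"))
```

Because each decoded base depends on every color before it, a single color-calling error changes all downstream bases; this is why SOLiD-aware aligners typically compare reads in color space instead of decoding first.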
Roche / 454 : GS FLX • Also known as “pyrosequencing” • http://www.454.com/products-solutions/system-features.asp • 500 million bp/run • 10 hr run • 400-500 bp/read & > 1 M reads
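In pyrosequencing, nucleotides are flowed over the beads one type at a time, and the light emitted in each flow is roughly proportional to how many bases were incorporated, so a homopolymer run shows up as a stronger signal in a single flow. The sketch below builds an idealized, noise-free flowgram and calls bases by rounding each signal. The cyclic flow order TACG and the function names are assumptions for illustration; real base callers also correct for noise and incomplete extension:

```python
FLOW_ORDER = "TACG"  # assumed cyclic flow order, for illustration only

def flowgram(seq: str, n_flows: int, order: str = FLOW_ORDER) -> list[int]:
    """Ideal signal per flow: number of bases incorporated by that flow."""
    signals, pos = [], 0
    for i in range(n_flows):
        nucleotide = order[i % len(order)]
        run = 0
        while pos < len(seq) and seq[pos] == nucleotide:
            run += 1  # a whole homopolymer is extended in a single flow
            pos += 1
        signals.append(run)
    return signals

def call_bases(signals: list[float], order: str = FLOW_ORDER) -> str:
    """Round each intensity to a run length and emit that many bases."""
    return "".join(order[i % len(order)] * round(s)
                   for i, s in enumerate(signals))

print(flowgram("TTAGG", 4))              # [2, 1, 0, 2]
print(call_bases([1.9, 1.1, 0.1, 2.2]))  # TTAGG
```

The rounding step is where long homopolymers become error-prone: telling 5 incorporated bases from 6 means resolving a small relative change in light intensity, so insertion and deletion errors in homopolymers are characteristic of this platform.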
Roche / 454 : GS FLX • Made for de novo sequencing. • Too expensive for resequencing. • For example, this platform will be used heavily by laboratories sequencing new bacterial genomes. • Baylor Genome Center, involved in the sea urchin, bee and platypus genomes, has a number of 454 machines.
It’s more complicated! • Get files with quality scores • Get files with mismatches • Need to align reads to a reference genome • Multiple tools do this today … and there will be more later. • What do you do? Do it all!
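Real aligners use indexes, search both strands, allow gaps and weigh the quality scores listed above. As a deliberately naive sketch of the core idea only, the following slides a read along a reference and reports the best ungapped, forward-strand placement with at most a given number of mismatches. The names and the default threshold are illustrative assumptions:

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_hit(read: str, reference: str, max_mismatches: int = 2):
    """(position, mismatches) of the best forward-strand match, or None."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[pos:pos + len(read)])
        # Keep the first placement with the fewest mismatches.
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best

reference = "ACGTACGTTAGCCGATA"
print(best_hit("TAGCCG", reference))  # (8, 0)
print(best_hit("TAGACG", reference))  # (8, 1)
```

This costs on the order of reference length × read length per read, which is hopeless for hundreds of millions of reads against a 3 Gb genome; that gap is exactly why dedicated short-read aligners exist.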
Things to keep in mind • Everyone is learning: if you don’t know, ask. They probably won’t know either, and you can figure it out together! • The technology is changing: this workshop next year will be totally different! • We can only do so much in two days: you will need to find things, find people who can help you, and teach your friends!
Other factors • Changing technology • New and disappearing companies? • Changing price structure • Cost of machine • Cost of operation (reagents/people) • Service from the company • 1 machine vs. 2 or 3 machines vs. 40 machines • Changing software and processing