Genomics

Genomics

DNA arrays andchips • (semi) qRT-PCR • Northern blot + hybrid. • Transkriptional fusions • 2D electrophoresis • Mass spectrometry • Protein sequencing • Translational fusional • Immunodetection • Enzyme activities Regulation • Chromatography • Mass spectrometry • NMR Gene expression • Genome maping • Genome sequencing • Genome annotations Structural genomics Nucleus DNA (Genome) pre-mRNA Cytoplasm mRNA Functional genomics mRNA (Transcriptome) Proteins(Proteome) Metabolites (Metabolome)

History of genomes sequencing • 1977 bacteriophage øX174 (5386bp, 11 genes) • 1981 mitochondrial genome (16,568bp; 13 prots; 2 rRNAs; 22 tRNAs • 1986 chloroplast genome (120,000-200,000bp) • 1992 Saccharomyces chromosome III (315kb; 182 ORFs) • 1995 Haemophilus influenzae (1.8Mb • 1996 Saccharomyces whole genome (12.1Mb;over 600 people 100 laboratories) • 1997 E. coli (4.6Mb; 4200 proteins) • 1998 Caenorhabditiselegans (97 Mb; 19,000 genů) • 2000 Arabidopsis thaliana (115Mb, 25-30,000 genů) • 2001 mouse (1 year!) • 2001 Homo sapiens (2 projekty) • 2005 Pan, rice • 2006 Populus Technological improvements

DNA sequencing – principle(Sanger’s method) Polymeration from primer in the presence of low concentration of terminator (dideoxy) ddNTP primer Random termination on all positions with occurance of the nucleotide

A T C G • Original arrangement • sequence • - RI labelled primer • 4 separated reactions • - with individual ddNTP • - ddNTP:dNTP (cca 1:20 – (100)) • - PAGE separation C T G G A T C T A G C Separation by size

Automated sequencing with fluorescence-labelled ddNTP • Every ddNTP labelled with different fluorescent dye – all together in one reaction • Separation by size in capillary – fluorescence detection

Genom sequencing is more than sequencing of DNA • 1 sequencing reaction 300 – 800 bp • Typical genom hunderts of millions to billions bp How to manage?

Strategies of genome sequencing • Classical strategy(Map-Based Assembly): - minimal quantity of DNA sequencing – sorting of big DNA fragments, successive reading (human genome sequencing – original strategy) - scaffold for genome sequence assemble - time consuming • Whole genome shotgun (WGS) – random (7-9x redundant) sequencing – sorting of sequence data (Haemophilus) - problems with repetitive DNA • Combination – „hierarchical shotgun“, „chromosome shotgun“

Hierarchical shotgun sequencing Whole-genome shotgun sequencing Green (2001) Nature Reviews Genetics 2: 573-583 Production of over-lapping clones (e.g. BACs, YACs) and construction of physical map Shearing of DNA and sequencing of subclones Assembly

Hierarchical shotgun sequencing First step: library of big DNA inserts (= genome fragments) • phage (l) vectors: 30 kb • cosmids: 50 kb • BACs (bacterial artificial chromosomes): 100-300 kb • YACs (yeast artificial chromosomes): cca 0.5-1Mb

Physical„BAC“ map of genome • Arrangement (position, orientation) of individual BAC in the genome • Fundamental for classical sequencing • Very usefull for assembly of „shotgun“ sequences How to make the map from BACs with unknown sequence?

Map construction - BAC fingerprinting Sequencing of DNA ends Restriction sites • 10-20x more bp in BACs than in the genome for map construction (Arabidopsis – 20 000, rice - 70 000)

BAC fingerprinting ANIMATION of HIERARCHICAL SHOTGUN: http://www.weedtowonder.org/sequencing.html

Minimum tiling path= the lowest possible set of BACs covering the whole sequence physical map arrangement and mapping and clone selection - by restriction fragment analysis - using terminal sequences and hybridization - by hybridization with markers with known position in genetic map

random cleavage + direct sequencing (NGS) Shotgun sequencing BAC/chromosome/whole genome Cosmids (40 Kbp): sequencing of clone ends (known distance between) ~500 bp ~500 bp

Genome (chromosome, BAC...) assembly • Looking for overlaps in primary sequences • Assembly to contigs to get short consensus sequences • Assebly to supercontigs using the information of sequence pairs(ends + distance) 4. Complete consensus sequence ..ACGATTACAATAGGTT..

repetition Repetitive sequences and contig assembly ? ? Repetitions are serious problem in assembly, if they are conserved and longer than sequencing run

Use of markers for whole genome assembly(STS – sequence tagged sites = short sequences with known position on chromosoms) Supecontigs with scaffold (BAC-end sequences with known distance)

X Filling of gaps: shorter clones are better • optimal – libraries with different insert sizes (2, 10, a 50 kbp) • sequencing the linker clone = filling the gap

What to do with the genome sequence? To annotate! • Searching for genes: • Automatic prediction of coding seq. • Prediction of introns/exons • Prediction according to related seq. • Confirmation by cDNA and EST • Prediction of function – from experimentally characterized homologues

Fragment of GenBank BAC clone annotation

Graphical interface of BAC annotation

Large genomes alternative strategies of sequencing:- isolation of individual chromosomes e.g. wheat – allows assembly of homeologous chromosomes (allohexaploid)- shotgun sequencing of non-methylated DNA (maize)- sequencing of ESTs (potato)

Expressed Sequence Tags (ESTs) • short sequenced regions of cDNA(300-600 nt) • usually gene fragments (primarilly originate from mRNA) • highly redundant, but also incomplete! • problems: - no regulatory sequences (promotors, introns,...) • only transcripts of certain genes

Expressed Sequence Tags (ESTs) Preparation of EST library • - mRNA • - RT with oligoT primer  cDNA • cleavage of RNA from heteroduplex • RNAseH • - 2nd strand cDNA synthesis • - cleavage with restriction endonuclease • - adaptor ligation cloning sequencing

Assembly of EST contigs - Unigenes

Next generation sequencing • - faster and cheaper!!! • - parallel sequencing of high numbers of sequences! • no handling with individual sequences! • Examples of recently developed or developing technologies: • 454 sequencing – pyrosequencing (Roche) • - complementary strand synthesis • Illumina – sequencing by synthesis • - complementary strand synthesis • SOLiD - Sequencing by Oligonucleotide Ligation and Detection • - ligation of labelled oligonucleotides • Oxford nanopore technology • - exonuclease degradation, el. current changes detection

NGS – comparison of basic parameters http://en.wikipedia.org/wiki/DNA_sequencing

454 technology - pyrosequencing up to 1 mil reads (lenght 700 - 1000 bp) one day (23 hour procedure) = 500-800 Mbp

454 technology - pyrosequencing

454 technology

Illumina – sequencing by synthesis (Solexa)

Illumina – seqencing by synthesis (Solexa)

SOLiD™ System (Applied Biosystems) 2 Base Encoding Sequencing by Oligonucleotide Ligation and Detection • reads up to 75 b • 20-30 Gb for a day! • high accuracy up to 99,99 % • initial step – clonal multiplication (similar to 454) http://appliedbiosystems.cnpg.com/Video/flatFiles/699/index.aspx

SOLiD™ System Mix of 1024 octamers (number of variations NNN = 64) x 16 known dinucleotides Z = nucleotides universally pairing with any nucleotide (prolongation) – cleaved out after ligation labelling: 4 fluorescent dyes – each for 256 octamers (with just 4 known middle dinucleotides) -

5 independent reactions = each 10 – 15 times repeated ligations of labelled octamers starting from a primer with shifted end

Knowledge of the first nucleotide allows translation of color sequence to nucleotide sequence } alternative translation with different 1st nucleotide A A T G C A G G C A T G C C G T A C

Oxford nanopore technologies – direct sequencing of one DNA strand http://www.nanoporetech.com/sequences • protein nanopore in membrane • (alpha-hemolysin) • covalently bound exonuclease • monitoring specific decrease • in current (metC!)

Genomics

Genomics

Presentation Transcript

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

Genomics

GENOMICS