A New Approach to Fragment Assembly in DNA Sequencing

A New Approach to Fragment Assembly in DNA Sequencing • Fei Wu • April 24, 2006

Preface • Introduce the author • The background of the paper • The history of DNA Sequencing

Traditional DNA Sequencing • Shear DNA into millions of small fragments • Read 500–700 nucleotides at a time from the small fragments (Sanger method)

Fragment Assembly • Computational challenge: assemble individual short fragments (reads) into a single genomic sequence (“superstring”) • Until the late 1990s, shotgun fragment assembly of the human genome was viewed as an intractable problem

Shortest Superstring Problem • Problem: Given a set of strings, find a shortest string that contains all of them • Input: Strings s1, s2, …, sn • Output: A string s that contains all strings s1, s2, …, sn as substrings, such that the length of s is minimized • Complexity: NP-complete • Note: this formulation does not take into account sequencing errors
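The problem statement above can be made concrete with a brute-force sketch. This is only an illustration for a handful of short strings, not a practical assembler: it tries every ordering of the input, so its running time grows factorially. The function names `overlap` and `shortest_superstring` are chosen here for illustration and do not come from the slides.

```python
from itertools import permutations


def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring(strings):
    """Exhaustive search for a shortest superstring (tiny inputs only)."""
    unique = set(strings)
    # A string contained in another input string adds no constraint.
    reads = sorted(s for s in unique
                   if not any(s != t and s in t for t in unique))
    if not reads:
        return ""
    best = None
    for order in permutations(reads):
        merged = order[0]
        for nxt in order[1:]:
            # Append only the part of nxt that is not already covered.
            merged += nxt[overlap(merged, nxt):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


print(shortest_superstring(["ATG", "TGC", "GCA"]))  # ATGCA
```

As on the slide, the sketch assumes error-free strings; a single sequencing error would prevent the exact-match overlaps from being found.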

Reducing SSP to TSP • Define overlap ( si, sj ) as the length of the longest prefix of sj that matches a suffix of si. (Figure: the string aaaggcatcaaatctaaaggcatcaaa shown three times at different offsets to illustrate overlaps.) • Construct a graph with n vertices representing the n strings s1, s2, …, sn. • Insert edges of length overlap ( si, sj ) between vertices si and sj. • Find the path that visits every vertex exactly once and maximizes the total overlap (equivalently, the shortest such path when each edge costs –overlap). This is the Traveling Salesman Problem (TSP), which is also NP-complete.
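A minimal sketch of this reduction follows. It builds the overlap graph and scores every path that visits each vertex exactly once. The three reads are made up for illustration (they tile TATGGTGC), and the helper names `overlap_graph` and `best_hamiltonian_path` are assumptions of this sketch rather than anything defined in the slides.

```python
from itertools import permutations


def overlap(si, sj):
    """Length of the longest prefix of sj that matches a suffix of si."""
    for k in range(min(len(si), len(sj)), 0, -1):
        if si[-k:] == sj[:k]:
            return k
    return 0


def overlap_graph(reads):
    """Directed edge (i, j) weighted by overlap(reads[i], reads[j])."""
    return {(i, j): overlap(a, b)
            for i, a in enumerate(reads)
            for j, b in enumerate(reads) if i != j}


def best_hamiltonian_path(reads):
    """Brute-force search for the vertex order with maximum total overlap."""
    graph = overlap_graph(reads)
    best_order, best_total = None, -1
    for order in permutations(range(len(reads))):
        total = sum(graph[(a, b)] for a, b in zip(order, order[1:]))
        if total > best_total:
            best_order, best_total = order, total
    return best_order, best_total


reads = ["TATGG", "GGTGC", "ATGGT"]  # hypothetical reads
order, total = best_hamiltonian_path(reads)
print(order, total)                          # (0, 2, 1) 7
print(sum(len(r) for r in reads) - total)    # superstring length: 8
```

The superstring length equals the total read length minus the total overlap along the path, which is why maximizing overlap and minimizing superstring length are the same objective.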

De Bruijn graph • Properties: for an n-dimensional de Bruijn graph over m symbols, if n = 1 then the condition for any two vertices forming an edge holds vacuously, and hence all the vertices are connected, forming a total of m^2 edges. • Each vertex has exactly m incoming and m outgoing edges.
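These properties can be checked on small cases. The sketch below assumes the standard construction, in which vertices are all length-n strings over an alphabet of m symbols and each vertex v has an edge to v[1:] + a for every symbol a. The function name `de_bruijn_graph` is illustrative.

```python
from collections import Counter
from itertools import product


def de_bruijn_graph(alphabet, n):
    """Return (vertices, edges) of the n-dimensional de Bruijn graph."""
    vertices = ["".join(p) for p in product(alphabet, repeat=n)]
    # Shift each vertex left by one symbol and append every possible symbol.
    edges = [(v, v[1:] + a) for v in vertices for a in alphabet]
    return vertices, edges


for n in (1, 2, 3):
    vertices, edges = de_bruijn_graph("ACGT", n)
    out_degree = Counter(u for u, _ in edges)
    in_degree = Counter(v for _, v in edges)
    m = 4
    assert all(out_degree[v] == m and in_degree[v] == m for v in vertices)
    print(n, len(vertices), len(edges))
```

For n = 1 over four symbols the loop reports 4 vertices and 16 = m^2 edges, including self-loops, which matches the first property.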

Sequencing by Hybridization

l-mer (l-tuple) composition • Spectrum ( s, l ): the unordered multiset of all (n – l + 1) l-mers in a string s of length n • The order of individual elements in Spectrum ( s, l ) does not matter • For s = TATGGTGC all of the following are equivalent representations of Spectrum ( s, 3 ): {TAT, ATG, TGG, GGT, GTG, TGC} {ATG, GGT, GTG, TAT, TGC, TGG} {TGG, TGC, TAT, GTG, GGT, ATG}
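A short sketch of the spectrum as a multiset, using Python's `collections.Counter` so that order is ignored but multiplicity is kept. The function name `spectrum` mirrors the slide's notation and is otherwise an illustrative choice.

```python
from collections import Counter


def spectrum(s, l):
    """Multiset of all len(s) - l + 1 substrings of length l in s."""
    return Counter(s[i:i + l] for i in range(len(s) - l + 1))


spec = spectrum("TATGGTGC", 3)
# The three listings on the slide describe the same multiset.
assert spec == Counter(["TAT", "ATG", "TGG", "GGT", "GTG", "TGC"])
assert spec == Counter(["ATG", "GGT", "GTG", "TAT", "TGC", "TGG"])
assert spec == Counter(["TGG", "TGC", "TAT", "GTG", "GGT", "ATG"])
print(sum(spec.values()))  # 6, that is n - l + 1 with n = 8 and l = 3
```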

SBH: Eulerian Path Approach • S = { ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT } • Vertices correspond to ( l – 1 )-mers: { AT, TG, GC, GG, GT, CA, CG } • Edges correspond to l-mers from S • (Figure: directed graph on these vertices; the highlighted path visits every EDGE once.)
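The construction can be sketched as follows. Each l-mer becomes an edge from its ( l – 1 )-mer prefix to its ( l – 1 )-mer suffix, and an Eulerian path is found with Hierholzer's algorithm, a standard method that the slides do not name. The input is the set of eight 3-mers in the example, including TGG, which both reconstructed sequences on the next slide contain. The function names are illustrative, and the sketch assumes the graph is connected.

```python
from collections import defaultdict


def eulerian_path(kmers):
    """Return a vertex sequence that uses every k-mer edge exactly once."""
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree
    for kmer in kmers:
        u, v = kmer[:-1], kmer[1:]
        graph[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # Start at the vertex with one extra outgoing edge, if there is one.
    starts = [v for v, b in balance.items() if b == 1]
    start = starts[0] if starts else kmers[0][:-1]
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(kmers) + 1:
        raise ValueError("no Eulerian path uses all k-mers")
    return path


def spell(path):
    """Spell the sequence along a path of overlapping (l-1)-mers."""
    return path[0] + "".join(v[-1] for v in path[1:])


S = ["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]
print(spell(eulerian_path(S)))
```

Which of the valid reconstructions is printed depends on the order in which edges are explored, so the sketch does not claim a particular output.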

The graph on vertices { AT, TG, GC, GG, GT, CA, CG } contains two different Eulerian paths, so the same spectrum corresponds to two different sequences: ATGGCGTGCA and ATGCGTGGCA.
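The ambiguity can be reproduced by enumerating every sequence that uses each l-mer exactly once. The sketch below uses plain backtracking, which is exponential in general and intended only for toy inputs. It takes the eight 3-mers shared by the two sequences above, and the function name `all_reconstructions` is illustrative.

```python
from collections import Counter


def all_reconstructions(kmers):
    """All sequences whose l-mer multiset equals kmers (assumes l >= 2)."""
    l = len(kmers[0])
    remaining = Counter(kmers)
    total = len(kmers)
    results = set()

    def extend(seq, used):
        if used == total:
            results.add(seq)
            return
        suffix = seq[-(l - 1):]
        for kmer in list(remaining):
            # Follow an unused edge whose prefix matches the current suffix.
            if remaining[kmer] > 0 and kmer[:-1] == suffix:
                remaining[kmer] -= 1
                extend(seq + kmer[-1], used + 1)
                remaining[kmer] += 1

    for first in list(remaining):
        remaining[first] -= 1
        extend(first, 1)
        remaining[first] += 1
    return sorted(results)


S = ["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]
print(all_reconstructions(S))  # ['ATGCGTGGCA', 'ATGGCGTGCA']
```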

Error Correction or Data Corruption • EULER's error correction sometimes introduces errors. • It accepts these errors in exchange for reducing the complexity of the de Bruijn graph. • Reducing the de Bruijn graph eliminates false edges. • For example, in the N. meningitidis sequencing project, orphan elimination corrects 234,410 errors and introduces 1,452 errors.

Observations of the EULER

Conclusions • Finishing is a bottleneck in large-scale DNA sequencing. • EULER has excellent scaling potential. • The complexity of EULER is mainly defined by the number of tangles rather than by the number of repeats or the length of the genomes.

RESULTS AND DISCUSSION • The general performance of SEA on the benchmark • Prediction ambiguity improves alignment quality • Alignment quality versus local structure prediction ambiguity

CONCLUSION

Any Questions?
