Advanced RNA Assembly Method with Extended Tag Overlaps
A comprehensive overview of RNA assembly using an innovative extending method to address old and new challenges, optimizing accuracy and efficiency. Explore techniques, advantages, and comparisons for enhanced assembly results.
RNA Assembly Using an Extending Method. Wei Xueliang, 2010-04-07
Overview • Why abandon de Bruijn. • Why abandon extended de Bruijn. • Introduction to the current method. • Handle the old problem. • The new problem. • To do.
Why abandon de Bruijn. • De Bruijn graph's (dis)advantages: • Very fast. • Results depend heavily on the coverage distribution and the K value. • Key: coverage is not uniformly distributed in RNA assembly. • So there is no single best K value.
Why abandon de Bruijn. • The length of the red part (in the slide's figure) is 27.
Why abandon de Bruijn. • Key: coverage is not uniformly distributed in RNA assembly. • No best K value. • Can we run the program many times with different K values? • This is not the de novo assembler's job. • Time. • Provide highly accurate contigs within limited time. • Scaffolding programs.
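A minimal sketch of why K matters, assuming the usual de Bruijn construction in which nodes are (K-1)-mers: two reads can only be joined if they share at least one node, so reads from a thinly covered transcript (short overlaps) connect only at small K. The function names and the example read layout are hypothetical, chosen for illustration.

```python
def kmer_nodes(read, k):
    """Return the set of (k-1)-mers of a read, i.e. its de Bruijn graph nodes."""
    node_len = k - 1
    return {read[i:i + node_len] for i in range(len(read) - node_len + 1)}


def reads_connected(read_a, read_b, k):
    """Two reads can be joined in the graph only if they share a node."""
    return bool(kmer_nodes(read_a, k) & kmer_nodes(read_b, k))


# Hypothetical low-coverage case: two 25-base reads overlapping by only 10 bases.
transcript = "ACGTTGCAAGCTTAGGCTAACCGGTATCGATTCGAGCTAG"
read_a, read_b = transcript[:25], transcript[15:]

for k in (8, 16):
    print(k, reads_connected(read_a, read_b, k))
```

A small K keeps such reads connected but is more likely to merge unrelated sequence in highly covered regions, which is one way to read the "no best K value" point.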
Why abandon extended de Bruijn. • My extended de Bruijn method: • Use two or more K values at the same time.
Why abandon extended de Bruijn. • Coverage changes faster than I expected, so many K values are needed. • Converting between different K values is difficult. • Memory problem for large K: when K > 32, each K-index needs > 50G (for a 10G data set). • So throw K away.
Introduction to the new method • Based on Pramila's genome assembly method. • Start from any tag and correct it. • If the correction succeeds, continue.
Introduction to the new method • Find all tags that overlap by at least 24 bp (a magic number). • Use these overlapping tags to extend the base, then continue adding more tags.
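The extension loop can be sketched as follows. This is a simplified illustration, not the presented implementation: it uses exact matching only, always takes the single tag with the longest overlap, and the names `suffix_prefix_overlap` and `extend` are hypothetical. Only the 24 bp threshold comes from the slide.

```python
MIN_OVERLAP = 24  # the slide's "magic number"


def suffix_prefix_overlap(base, tag, min_overlap=MIN_OVERLAP):
    """Length of the longest exact overlap between the end of base and the
    start of tag, or 0 if it is shorter than min_overlap."""
    longest = min(len(base), len(tag) - 1)  # the tag must add at least one base
    for length in range(longest, min_overlap - 1, -1):
        if base.endswith(tag[:length]):
            return length
    return 0


def extend(base, tags, min_overlap=MIN_OVERLAP):
    """Greedily append the unused tag with the longest overlap until none qualifies."""
    unused = list(tags)
    while True:
        best_tag, best_len = None, 0
        for tag in unused:
            length = suffix_prefix_overlap(base, tag, min_overlap)
            if length > best_len:
                best_tag, best_len = tag, length
        if best_tag is None:
            return base
        base += best_tag[best_len:]  # add only the non-overlapping tail
        unused.remove(best_tag)
```

Calling `extend(first_tag, remaining_tags)` grows the base to the right only; extending to the left would need the same loop on reversed input.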
Introduction to the new method • How to find the overlapping tags quickly while allowing mismatches? • Index and union: {Tag3}, {Tag2, Tag3}, {Tag3, Tag4}; union => {Tag1, Tag2, Tag3, Tag4}
Introduction to the new method • How to find the next overlapping tags quickly while allowing mismatches? • V1 <= U3 • V2 <= (U1 << 1) + 0 • V3 <= (U2 << 1) + 0
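One common way to realize "index and union" is seed lookup. The sketch below assumes the 24-base overlap window is split into three 8-base seeds; a tag whose first 24 bases differ from the window in at most two positions must match at least one seed exactly, so the union of the per-seed hits contains every such tag. The seed length, the mismatch limit, and all function names are assumptions for illustration; the slides do not state how the index is built.

```python
from collections import defaultdict

OVERLAP = 24  # overlap window length, from the slides
SEED = 8      # assumed seed length: three seeds cover the window


def build_index(tags):
    """Map (offset, seed) -> ids of tags whose first OVERLAP bases
    contain that seed at that offset."""
    index = defaultdict(set)
    for tag_id, tag in enumerate(tags):
        for offset in range(0, OVERLAP, SEED):
            index[(offset, tag[offset:offset + SEED])].add(tag_id)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def tags_matching_window(window, tags, index, max_mismatches=2):
    """Union the per-seed hits, then keep tags whose prefix is close to window."""
    candidates = set()
    for offset in range(0, OVERLAP, SEED):
        candidates |= index.get((offset, window[offset:offset + SEED]), set())
    return sorted(tag_id for tag_id in candidates
                  if hamming(window, tags[tag_id][:OVERLAP]) <= max_mismatches)
```

The verification step matters: the union can include tags that share one seed but differ widely elsewhere, and those are filtered by the Hamming check.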
Handle the old problem. • What if the overlapping part is shorter than 24?
Handle the old problem. • Check the tags one by one, in descending order of overlap length.
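The ordering step above can be sketched as a fallback for overlaps shorter than 24. This is an illustration under assumptions: the minimum overlap of 8, the mismatch limit, the `accept` callback (a placeholder for whatever check decides that a tag fits), and the function names are all hypothetical.

```python
def longest_overlap(base, tag, max_mismatches=1, min_len=8):
    """Longest length L (min_len <= L < len(tag)) such that the last L bases
    of base match the first L bases of tag within max_mismatches, else 0."""
    for length in range(min(len(base), len(tag) - 1), min_len - 1, -1):
        mismatches = sum(a != b for a, b in zip(base[-length:], tag[:length]))
        if mismatches <= max_mismatches:
            return length
    return 0


def candidates_by_overlap(base, tags, **kwargs):
    """Return (overlap_length, tag) pairs, longest overlap first."""
    scored = [(longest_overlap(base, tag, **kwargs), tag) for tag in tags]
    scored = [pair for pair in scored if pair[0] > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def extend_with_short_overlap(base, tags, accept=lambda base, tag, length: True,
                              **kwargs):
    """Try tags one by one in descending overlap order; use the first accepted."""
    for length, tag in candidates_by_overlap(base, tags, **kwargs):
        if accept(base, tag, length):
            return base + tag[length:]
    return base  # nothing acceptable: leave the base unchanged
```

Trying the longest overlap first means the least ambiguous candidate is checked before shorter, riskier ones.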
Handle the old problem. • Degree of approximation.
Handle the old problem. • Fewer tips. • No bubbles. • Because we overlap with mismatches. • Use whole tags.
The new problem. • Speed. • The tail of a tag often has more errors. • Reverse extending problem.
To do • Handle the reverse extending problem. • Speed. • Finish the comparison between the de Bruijn method (Velvet) and my method. • Paired-end tags.