
AutoEditor

AutoEditor: an automated base caller error correction tool. Slides courtesy of Pawel Gajer, Ph.D. Base calling in the context of a single chromatogram is hard, but finding base-calling “mistakes” in a multiple alignment is easy.



Presentation Transcript


  1. AutoEditor: automated base caller error correction tool. Slides courtesy of Pawel Gajer, Ph.D.

  2. Base calling in the context of a single chromatogram is hard… but finding base-calling “mistakes” in a multiple alignment is easy.

  3. • Principal and secondary aims of AutoEditor
     • AutoEditor as a higher-level base caller
     • Tiling discrepancy types
     • Base caller error types
     • Resolving discrepancies of the form B…B*
     • Resolving discrepancies of the form *…*B
     • AutoEditor statistics

  4. A principal goal of AutoEditor is to automatically correct a majority of tiling discrepancies, reducing human editing effort to the most problematic discrepancy types. A tiling discrepancy is any deviation from the homogeneous coverage of a consensus base.
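The definition above can be illustrated with a minimal sketch. It is not AutoEditor's code: the function name `find_discrepancies`, the list-of-strings input, and the use of a space for "read does not cover this column" are assumptions made for illustration; `*` denotes a gap, as on the later slides.

```python
def find_discrepancies(reads, consensus):
    """Return the column indices whose coverage is not homogeneous.

    reads: aligned read strings, all the same length as `consensus`;
           a space marks a column the read does not cover (assumption).
    """
    positions = []
    for col, cons_base in enumerate(consensus):
        # Bases (or gaps '*') from every read that covers this column.
        covering = [r[col] for r in reads if r[col] != " "]
        # A tiling discrepancy: at least one covering base differs from the consensus.
        if any(base != cons_base for base in covering):
            positions.append(col)
    return positions


reads = ["GAATC", "GA*TC", "GAATC"]
print(find_discrepancies(reads, "GAATC"))  # column 2 is a single deletion
```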

  5. AutoEditor as a higher-level base caller
     • Base caller: single-read trace data → nucleotide sequence
     • AutoEditor: tiling of reads, tiling discrepancies, and multiple-read trace data → list of corrected discrepancies

  6. Other applications:
     • Clear range editing (read expansion)
     • SNP detection

  7. Clear range editing
     Diagram labels: trimming algorithm; single-read quality value data; trimmed read; less stringently trimmed reads; assembler; autoEditor; tiling of reads.

  8. SNP detection
     Diagram labels: alignment data of genome 1; alignment data of genome 2; combined genomes alignment data; list of putative SNPs; autoEditor; list of putative SNPs that pass autoEditor error screening.

  9. Tiling discrepancy types
     • Single deletion: [figure]
     • Single insertion: [figure]

  10. Single insertion and single deletion are extreme cases of insertion/deletion discrepancies:
        A A A A
        A A A *
        A A * *
        A * * *
        * * * *
      The above sequence of discrepancies can be represented schematically as an edge in a two-vertex graph: A ---- *

  11. The configuration space of all tiling discrepancy types can be schematically represented as a 4-dimensional simplex with vertices G, *, A, C, T.
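One way to read the simplex picture, offered as a sketch rather than as the slides' own construction: the bases covering a consensus position can be mapped to the fractions of A, C, G, T and `*` among them. The five fractions sum to 1, which leaves four degrees of freedom, matching a 4-dimensional simplex. Homogeneous coverage sits at a vertex, and mixtures of A and `*` lie on the A to `*` edge. The function name `simplex_coordinates` is hypothetical.

```python
from collections import Counter

VERTICES = ("A", "C", "G", "T", "*")


def simplex_coordinates(column):
    """Map the bases covering one consensus position to barycentric coordinates."""
    counts = Counter(column)
    total = sum(counts[v] for v in VERTICES)
    if total == 0:
        raise ValueError("column has no A, C, G, T or * symbols")
    # Fraction of each symbol; the five values always sum to 1.
    return {v: counts[v] / total for v in VERTICES}


# A single deletion in four-fold coverage: a point on the A to * edge, close to A.
print(simplex_coordinates("AAA*"))
```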

  12. Re-calling individual bases
      Figure: signal peak parameters: (a) amplitude, (b) support, (c) minimum difference between amplitude and local minimum. Open dots on the signal curve indicate local maxima and open circles indicate local minima.
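A rough sketch of how two of these peak parameters might be computed from a sampled trace follows. It is an illustration under assumptions, not AutoEditor's method: the function name `peak_parameters` is hypothetical, parameter (c) is interpreted as the smaller of the drops from a peak to its adjacent local minima, and "support" is left out because the slide does not define it.

```python
def peak_parameters(signal):
    """List local maxima with their amplitude and drop to the nearest local minima."""
    n = len(signal)
    maxima = [i for i in range(1, n - 1) if signal[i - 1] < signal[i] >= signal[i + 1]]
    minima = [i for i in range(1, n - 1) if signal[i - 1] > signal[i] <= signal[i + 1]]
    peaks = []
    for i in maxima:
        left = [m for m in minima if m < i]
        right = [m for m in minima if m > i]
        # Values of the closest local minimum on each side, where one exists.
        neighbours = ([signal[left[-1]]] if left else []) + ([signal[right[0]]] if right else [])
        # Assumed reading of parameter (c): smallest drop from the peak to an
        # adjacent local minimum; None when no local minimum is adjacent.
        drop = min(signal[i] - v for v in neighbours) if neighbours else None
        peaks.append({"index": i, "amplitude": signal[i], "min_drop": drop})
    return peaks


for peak in peak_parameters([0, 5, 2, 6, 1, 0]):
    print(peak)
```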

  13. Base caller error types
      • Missed signal
      • Signal shift
      • Unresolved peaks

  14. Resolving a single deletion discrepancy
      • Compute the discrepancy’s read multiplicity, mult.
      • If mult = 0, check for a missed signal error.
      • If |mult| > 0, check for a signal shift error.
      • If it is not a signal shift error, then it is an unresolved peaks error. To resolve it, find two other reads with well-resolved peaks over the unresolved-peaks bases.
      A discrepancy’s read multiplicity is the number of bases to the right or left (negative sign) of the discrepancy position that are equal to the consensus base covering the discrepancy.
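The read multiplicity and the first branch of the procedure above can be sketched as follows. This is a hedged illustration, not AutoEditor's implementation: the function names are hypothetical, the read is assumed to be an aligned string with `*` for a gap, and the tie-break (prefer the run to the right, otherwise use the run to the left with a negative sign) is an assumption, since the slide does not say what happens when both sides match.

```python
def read_multiplicity(read, pos, consensus_base):
    """Signed count of consensus_base copies adjacent to position pos in an aligned read."""
    # Count the run of consensus bases immediately to the right (positive sign).
    right = 0
    i = pos + 1
    while i < len(read) and read[i] == consensus_base:
        right += 1
        i += 1
    if right:
        return right
    # Otherwise count the run immediately to the left (negative sign).
    left = 0
    i = pos - 1
    while i >= 0 and read[i] == consensus_base:
        left += 1
        i -= 1
    return -left


def first_check_for_single_deletion(mult):
    """Name the first error type to test for, following the steps on the slide."""
    return "missed signal" if mult == 0 else "signal shift"


mult = read_multiplicity("GAA*G", 3, "A")
print(mult, first_check_for_single_deletion(mult))
```

If the signal-shift check fails, the slide's procedure treats the discrepancy as an unresolved peaks error; that check needs trace data and is not modeled here.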

  15. Resolving a single insertion discrepancy
      • Compute the discrepancy’s read multiplicity, mult.
      • If mult = 0, check whether the signal parameters are within allowable ranges.
      • If |mult| > 0, check whether the insertion base is part of |mult| + 1 well-resolved signal peaks.
      • If not, find two other reads whose traces have exactly |mult| well-resolved signal peaks between the bases flanking the discrepancy position.

  16. • mult = 0: weak signal error
      • mult = -2: unresolved peaks error, with two other reads having exactly 2 signal peaks between the Gs flanking AA*

  17. Missed-signal (MS) and signal-shift (SS) correction errors
      AutoEditor version 1.1 from Nov 12, 2002
      Test set: the first 10 contigs of Mycoplasma arthritidis

      asmbl_id  size (kb)  # corrections  # autoEdit errors  # errors in newer autoEdit
      1         132        124            3                  0
      2         64         78             4                  1
      3         40         55             3                  0
      4         53         45             2                  1
      5         16         15             0                  0
      6         22         29             1                  0
      7         23         19             0                  0
      8         51         48             1                  0
      9         26         33             1                  0
      10        15         15             0                  0
      ----------------------------------------------------------------------
      Total:    442        461            15 (~3.25%)        2 (~0.43%)
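The totals and error rates in the table above can be rechecked with a few lines of arithmetic. The tuple layout and variable names are chosen here for illustration; the numbers are copied from the slide.

```python
# (asmbl_id, size_kb, corrections, autoedit_errors, errors_in_newer_autoedit)
rows = [
    (1, 132, 124, 3, 0), (2, 64, 78, 4, 1), (3, 40, 55, 3, 0),
    (4, 53, 45, 2, 1), (5, 16, 15, 0, 0), (6, 22, 29, 1, 0),
    (7, 23, 19, 0, 0), (8, 51, 48, 1, 0), (9, 26, 33, 1, 0),
    (10, 15, 15, 0, 0),
]

size_kb = sum(r[1] for r in rows)
corrections = sum(r[2] for r in rows)
errors = sum(r[3] for r in rows)
errors_newer = sum(r[4] for r in rows)

# Error rate = erroneous corrections / all corrections, as a percentage.
rate = 100 * errors / corrections
rate_newer = 100 * errors_newer / corrections

print(f"{size_kb} kb, {corrections} corrections")
print(f"{errors} errors ({rate:.2f}%), {errors_newer} in newer version ({rate_newer:.2f}%)")
```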

  18. AutoEditor version 1.2 correcting all single deletion errors
      Test set: the first 10 contigs of Mycoplasma arthritidis

      asmbl_id  size (kb)  #disc   #corr   %corr
      1         132        3390    3266    96%
      2         64         2195    2142    98%
      3         40         1344    1325    99%
      4         53         1304    1242    95%
      5         16         508     487     96%
      6         22         777     757     97%
      7         23         624     613     98%
      8         51         1303    1232    95%
      9         26         783     760     97%
      10        15         437     423     97%
      --------------------------------------------------------------------
      Total:    442        12665   12065   95%

      where
      #disc is the total number of discrepancies in the given contig,
      #corr is the number of corrected discrepancies,
      %corr is the percentage of corrected discrepancies.
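As a quick consistency check, %corr can be recomputed as #corr / #disc for each contig. In the sketch below (variable names are illustrative, numbers copied from the slide) the per-contig percentages reproduce the slide's values, and the #disc column sums to the stated 12665. The #corr column, however, sums to 12247 (about 97%) rather than the stated total of 12065 (95%); the sketch only reports this and does not decide which figure is the typo.

```python
# (asmbl_id, discrepancies, corrected)
rows = [
    (1, 3390, 3266), (2, 2195, 2142), (3, 1344, 1325), (4, 1304, 1242),
    (5, 508, 487), (6, 777, 757), (7, 624, 613), (8, 1303, 1232),
    (9, 783, 760), (10, 437, 423),
]

# Per-contig percentage of corrected discrepancies, rounded to whole percent.
pcts = [round(100 * corr / disc) for _, disc, corr in rows]
for (asmbl_id, disc, corr), pct in zip(rows, pcts):
    print(f"contig {asmbl_id}: {corr}/{disc} = {pct}%")

total_disc = sum(disc for _, disc, _ in rows)
total_corr = sum(corr for _, _, corr in rows)
print(f"row sums: {total_corr}/{total_disc} = {100 * total_corr / total_disc:.1f}%")
```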

  19. AutoEditor accuracy

  20. AutoEditor accuracy
