Aaron R. Quinlan (quinlaaa@bc.edu) and Gabor T. Marth (marth@bc.edu)

A Novel Approach To Diploid Base Calling

Aaron R. Quinlan (quinlaaa@bc.edu) and Gabor T. Marth (marth@bc.edu), Department of Biology, Boston College, Chestnut Hill, MA 02467
http://bioinformatics.bc.edu/marthlab/

We are integrating diploid base calling (heterozygote detection) into our SNP detection method, which detects SNPs across clonal reads based on base composition and quality.

Objective: To enhance the method with an accurate diploid base calling algorithm.

[Figure: SNP detection across clonal reads. Labels: Base Call/Quality, Base Composition, Depth of Alignment, Polymorphism Rate, Probability of polymorphism, SNP (A/G) Found Across Multiple Clonal Reads.]

[Figure: PCR-based sequences of diploid individuals, C/T example. Labels: Each Possible Diploid Base Call/Probability, Base Composition, Depth of Alignment, Prior Probability of Each Diploid Genotype, Observed Diploid Variations/Probabilities, Probability of Each Genotype From a diploid base call.]

Probability of each possible diploid base call (AA, CC, GG, TT, AC, AG, AT, CG, CT, GT). For Calls = (CC, CT, TT), the figure shows the priors P(CC) = .045, P(CT) = .9, P(TT) = .045, P(Others) = .01, and the per-read values P(TT|R) = .9991, P(CT|R) = .96, P(CC|R) = .9995, P(CT|R) = .003.

Method for Diploid Base Calling (Support Vector Machine-based)

SVMs learn a function to distinguish between positive and negative examples based on the statistics of the features in the training examples.

Collect heterozygous and homozygous training examples.
Calculate indicative features that separate heterozygotes from homozygotes.
The trained SVM can separate unseen homozygotes and heterozygotes ("unseen" alignments).
Convert the SVM score to P(Het): a positive score is Het, a negative score is Hom, and P(Het) ranges from 0 to 1.
Make diploid base calls on unseen alignments.

P(S1S2|R) = probability of the allelic combination given the read. Figure examples for SNP 1, SNP 2, … SNP N: P(CT|R) = .34, P(CT|R) = .01, P(AC|R) = .999, P(AT|R) = .001.

Utilizing multiple reads per individual, we can make an individual genotype call.

Rationale: The accuracy of the consensus diploid base call for an individual increases with the number of reads available for that individual.
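The poster does not spell out how per-read probabilities are merged into one call per individual. As a minimal sketch only, the function below assumes a naive Bayes combination: each per-read genotype probability is treated as proportional to the read's likelihood (that is, computed under a flat prior), reads are treated as independent given the genotype, and the product is weighted by the genotype prior. The function name `combine_reads`, the two-genotype toy input, and the "other" bucket are all hypothetical; only the values .98, .87 and .34 come from the poster's forward/reverse read example, and the sketch is not claimed to reproduce the poster's reported result.

```python
from math import prod


def combine_reads(prior, per_read):
    """Combine per-read genotype probabilities into one individual call.

    Assumptions (not taken from the poster): every per-read probability was
    computed under a flat prior, so it is proportional to the likelihood of
    that read, and reads are independent given the genotype.
    """
    # Prior times the product of per-read probabilities, per genotype.
    unnormalized = {
        genotype: prior[genotype] * prod(read[genotype] for read in per_read)
        for genotype in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all genotypes have zero probability")
    # Normalize so the genotype probabilities sum to 1.
    return {genotype: value / total for genotype, value in unnormalized.items()}


# Hypothetical toy input: GT versus everything else lumped into "other".
prior = {"GT": 0.34, "other": 0.66}
reads = [
    {"GT": 0.98, "other": 0.02},
    {"GT": 0.87, "other": 0.13},
]
call = combine_reads(prior, reads)
best_genotype = max(call, key=call.get)
```

Under these assumptions, each additional concordant read multiplies the evidence for the same genotype, which is one way to read the rationale that accuracy increases with the number of reads.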
[Figure: two reads from one individual (Forward Read and Reverse Read) with P(GT | Read) = .98 and P(GT | Read) = .87; Prior(GT Frequency) = .34; Individual Genotype Call: P(GT) = .993.]

Assessing the Genotyping Accuracy of the Initial Prototype

Summary:
Number of Alignments Analyzed: 993
Total Number of Read Positions: 231874
Total Number of Heterozygotes: 31411
Total Number of Homozygotes: 143370

Note: Polyphred was tested on alignments created by PolyBayes. This allowed Polyphred to analyze a larger fraction of reads, as compared to Phrap alignments.

• We built a diploid base calling prototype from the ground up. The initial prototype's performance is similar to Polyphred 5.
• We are currently compiling a larger example set to improve accuracy.
• Our method incorporates information from multiple reads for a given individual in a statistically rigorous fashion.
• This prototype represents the first major expansion of our SNP detection method.
• We are currently working to expand the prototype to a production-ready application.

Polyphred 5 was tested with the following settings: quality = 21, score = 99, source, ref_comp

[Figure: Accuracy by P(Het) Score. Labels: Data, Accuracy, Sensitivity, 21851.]
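The poster reports accuracy by P(Het) score but does not say how the SVM score is converted to P(Het). One common option, shown here purely as an assumption, is a logistic (Platt-style) mapping. The parameters `a` and `b`, the 0.5 threshold, and both function names are hypothetical; in practice `a` and `b` would be fit on held-out labeled examples.

```python
import math


def svm_score_to_p_het(score, a=-1.0, b=0.0):
    """Map a signed SVM score to P(Het) with a logistic curve.

    a and b are hypothetical parameters (assumption, not from the poster).
    With a < 0, positive scores (Het) map above 0.5 and negative scores
    (Hom) map below 0.5.
    """
    z = -(a * score + b)
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def call_zygosity(score, threshold=0.5):
    """Return ("het" or "hom", P(Het)) for one SVM score."""
    p_het = svm_score_to_p_het(score)
    label = "het" if p_het >= threshold else "hom"
    return label, p_het
```

For example, `call_zygosity(2.0)` yields a "het" call and `call_zygosity(-2.0)` a "hom" call under the default parameters. Raising the threshold trades sensitivity for accuracy on heterozygote calls.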


