This paper proposes a Bayesian framework for determining the time of divergence between two aligned sequences. By modeling DNA mutations as a Markov process, it accounts for the varying probabilities of divergence points. Utilizing PAM matrices (similar to Dayhoff's) with uniform mutation rates, the study develops a scoring scheme for sequence alignment considering the effect of transitions and transversions. The methodology includes calculating probabilities of mismatches and employing Bayes' theorem, enabling better assessments of evolutionary distances and alignment quality.
Bayesian Evolutionary Distance. P. Agarwal and D.J. States. Bayesian evolutionary distance. Journal of Computational Biology 3(1):1–17, 1996.
Determining time of divergence • Goal: Determine when two aligned sequences X and Y diverged from a common ancestor • Example: AGTTGAC aligned with ACTTGCC • Model: • Mutation only • Independence • Markov process
Divergence points have different probabilities • [Figure: sequences X and Y descending from a common ancestor, with probability plotted against time]
DNA PAM matrices • Similar to Dayhoff PAM matrices • PAM 1 corresponds to 1% mutation • 1% change ≈ 10 million years • Simplification: uniform mutation rates among nucleotides: • $m_{ij} = \alpha$ if $i = j$ • $m_{ij} = \beta$ if $i \neq j$, with $\alpha + 3\beta = 1$ so that each row sums to 1 (for PAM 1, $\alpha = 0.99$ and $\beta = 0.01/3$) • Can modify to handle different transition/transversion rates • Transitions (A↔G or C↔T) have higher probability than transversions • PAM x = (PAM 1)^x
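DNA PAM 1 (rows and columns ordered A, G, T, C; under the uniform model with 1% mutation, each off-diagonal entry is 0.01/3 ≈ 0.0033)
$$\text{PAM 1} = \begin{pmatrix} 0.99 & 0.0033 & 0.0033 & 0.0033 \\ 0.0033 & 0.99 & 0.0033 & 0.0033 \\ 0.0033 & 0.0033 & 0.99 & 0.0033 \\ 0.0033 & 0.0033 & 0.0033 & 0.99 \end{pmatrix}$$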
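DNA PAM x (rows and columns ordered A, G, T, C; diagonal entries $\alpha(x)$, off-diagonal entries $\beta(x)$)
$$\text{PAM } x = \begin{pmatrix} \alpha(x) & \beta(x) & \beta(x) & \beta(x) \\ \beta(x) & \alpha(x) & \beta(x) & \beta(x) \\ \beta(x) & \beta(x) & \alpha(x) & \beta(x) \\ \beta(x) & \beta(x) & \beta(x) & \alpha(x) \end{pmatrix}$$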
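The relation PAM x = (PAM 1)^x can be checked numerically. The following is a minimal sketch, assuming the uniform model with a 1% total mutation probability per site split equally among the three other nucleotides; the names `pam1`, `mat_mul`, and `pam` are illustrative and do not come from the paper.

```python
NUCLEOTIDES = "AGTC"  # row/column order used on the slides


def pam1(mutation_rate=0.01):
    """Uniform PAM 1 (assumed): a site is unchanged with probability
    1 - rate and moves to each other nucleotide with probability rate / 3."""
    off = mutation_rate / 3
    return [[1 - mutation_rate if i == j else off for j in range(4)]
            for i in range(4)]


def mat_mul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def pam(x, mutation_rate=0.01):
    """PAM x = (PAM 1)^x, computed by square-and-multiply."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    base = pam1(mutation_rate)
    while x > 0:
        if x & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        x >>= 1
    return result


if __name__ == "__main__":
    for x in (1, 10, 100, 1000):
        m = pam(x)
        # Diagonal (match) and off-diagonal (mismatch) entries of PAM x.
        print(x, round(m[0][0], 4), round(m[0][1], 4))
```

Each row of the result stays a probability distribution, and for large x every entry approaches 1/4.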
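DNA PAM x • As $x \to \infty$, $\alpha(x)$ and $\beta(x) \to 1/4$, where $\alpha(x)$ is the diagonal (match) entry and $\beta(x)$ the off-diagonal (mismatch) entry of PAM x • Assume $p_i = 1/4$ for $i \in \{A, C, T, G\}$ • Leads to simple match/mismatch scoring scheme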
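One way to see the limit is through a closed form for the powers of the uniform matrix. This is a sketch under the uniform-rate assumption, writing $\beta(1)$ for the off-diagonal entry of PAM 1, $I$ for the identity, and $J$ for the all-ones $4 \times 4$ matrix; $\lambda$ is a symbol introduced here for illustration.

```latex
\begin{align*}
\text{PAM 1} &= \lambda I + \beta(1)\, J, \qquad \lambda = 1 - 4\beta(1), \qquad J^2 = 4J \\
\text{PAM } x = (\text{PAM 1})^x &= \lambda^x I + \frac{1 - \lambda^x}{4}\, J \\
\alpha(x) &= \frac{1}{4} + \frac{3}{4}\lambda^x, \qquad
\beta(x) = \frac{1}{4} - \frac{1}{4}\lambda^x
\end{align*}
```

Since $0 < \lambda < 1$, $\lambda^x \to 0$ as $x \to \infty$, so both $\alpha(x)$ and $\beta(x)$ tend to 1/4. For $x = 1$ and $\beta(1) = 0.01/3$ the formulas give back $\alpha(1) = 0.99$.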
DNA PAM x: Scoring • Match score: $s(x) = \log 4\alpha(x)$ • Mismatch score: $r(x) = \log 4\beta(x)$ • Log-odds score of alignment of length n with k mismatches: $S(x) = (n-k)\,s(x) + k\,r(x)$ • Odds score of same alignment: $(4\alpha(x))^{n-k}\,(4\beta(x))^{k}$
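The two scores can be computed directly. The sketch below assumes the uniform model with 1% mutation for PAM 1, uses the closed form of (PAM 1)^x for that model, and takes logarithms base 2 (the slides do not state the base). The function names are illustrative.

```python
import math


def alpha_beta(x, mutation_rate=0.01):
    """Match and mismatch entries of PAM x under the uniform model.
    Closed form of (PAM 1)^x with lam = 1 - 4 * (rate / 3)."""
    lam_x = (1 - 4 * mutation_rate / 3) ** x
    return 0.25 + 0.75 * lam_x, 0.25 - 0.25 * lam_x


def log_odds_score(n, k, x):
    """(n - k) * log2(4 * alpha(x)) + k * log2(4 * beta(x)).
    Base 2 is an assumption made for this sketch."""
    a, b = alpha_beta(x)
    return (n - k) * math.log2(4 * a) + k * math.log2(4 * b)


def odds_score(n, k, x):
    """(4 * alpha(x))^(n - k) * (4 * beta(x))^k."""
    a, b = alpha_beta(x)
    return (4 * a) ** (n - k) * (4 * b) ** k


if __name__ == "__main__":
    # Alignment of length 25 with 2 mismatches, scored at several distances.
    for x in (1, 10, 25, 50):
        print(x, round(log_odds_score(25, 2, x), 3), odds_score(25, 2, x))
```

The odds score is 2 raised to the log-odds score, so either can be recovered from the other.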
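Probability of k mismatches at distance x • $\Pr(k \mid x)$ is proportional to the odds score $(4\alpha(x))^{n-k}\,(4\beta(x))^{k}$, with a factor that does not depend on x • Note: Need odds score here, not log-odds!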
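A sketch of why the odds score is the relevant quantity, assuming independent sites, background frequencies $p_i = 1/4$, and PAM x entries $\alpha(x)$ (match) and $\beta(x)$ (mismatch). For one fixed alignment of length n with k mismatches:

```latex
\begin{align*}
\Pr(\text{alignment} \mid x)
  &= \left(\tfrac{1}{4}\,\alpha(x)\right)^{n-k} \left(\tfrac{1}{4}\,\beta(x)\right)^{k} \\
  &= 16^{-n}\, \bigl(4\alpha(x)\bigr)^{n-k} \bigl(4\beta(x)\bigr)^{k}
\end{align*}
```

The factor $16^{-n}$ does not depend on x, so it cancels when probabilities at different distances are compared or normalized. The log-odds score cannot be used in its place, because probabilities must be summed over distances and logarithms of probabilities do not add that way.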
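Conditional expectation • Expected evolutionary distance given k mismatches, summed over all distances: $E[x \mid k] = \sum_x x \,\Pr(x \mid k)$ • By Bayes' Thm: $\Pr(x \mid k) = \dfrac{\Pr(k \mid x)\,\Pr(x)}{\sum_y \Pr(k \mid y)\,\Pr(y)}$ • $\Pr(k \mid x)$ comes from the odds scores; the prior $\Pr(x)$ still has to be chosen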
Assumptions • Consider only a finite number of values of x; e.g., 1, 10, 25, 50, etc. • In theory, could consider any number of values • "Flat prior": All values of x are equally likely • If M values are considered, Pr(x) = 1/M
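Calculating the distance • $E[x \mid k] = \sum_x x \cdot \dfrac{\Pr(k \mid x)}{\sum_y \Pr(k \mid y)}$ • With a flat prior, Pr(x) = 1/M cancels from numerator and denominator • The fraction is the share of the probability of k mismatches that comes from assuming distance is x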
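The calculation can be sketched in a few lines. This example assumes the uniform model with 1% mutation for PAM 1, a flat prior over a finite list of candidate distances, and the odds score as the likelihood up to a constant factor. The function names and the candidate list are illustrative.

```python
def alpha_beta(x, mutation_rate=0.01):
    """Match and mismatch entries of PAM x under the uniform model
    (closed form of (PAM 1)^x)."""
    lam_x = (1 - 4 * mutation_rate / 3) ** x
    return 0.25 + 0.75 * lam_x, 0.25 - 0.25 * lam_x


def odds_score(n, k, x):
    """Odds score of an alignment of length n with k mismatches at PAM x."""
    a, b = alpha_beta(x)
    return (4 * a) ** (n - k) * (4 * b) ** k


def expected_distance(n, k, distances):
    """E[x | k] under a flat prior over `distances`.
    Each weight is proportional to Pr(k | x); the flat prior 1/M cancels."""
    weights = [odds_score(n, k, x) for x in distances]
    total = sum(weights)
    return sum(x * w for x, w in zip(distances, weights)) / total


if __name__ == "__main__":
    candidates = [1, 10, 25, 50, 100]  # hypothetical grid of PAM distances
    for k in (0, 2, 5, 10):
        print(k, round(expected_distance(25, k, candidates), 2))
```

More mismatches shift the posterior weight toward larger distances, so the expected distance grows with k. The result also depends on which candidate distances are included, since the flat prior is defined over that list.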
Ungapped local alignments • An ungapped local alignment of sequences X and Y is a pair of equal-length substrings of X and Y • Only matches and mismatches, no gaps
Ungapped local alignments • A: 23 matches, 2 mismatches • B: 34 matches, 11 mismatches
Which alignment is better? • Answer depends on evolutionary distance
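The dependence on distance can be illustrated by scoring both alignments at several PAM distances. This sketch takes A as 23 matches and 2 mismatches and B as 34 matches and 11 mismatches, and assumes the uniform model with 1% mutation for PAM 1 and base-2 logarithms. The function names are illustrative.

```python
import math


def alpha_beta(x, mutation_rate=0.01):
    """Match and mismatch entries of PAM x under the uniform model
    (closed form of (PAM 1)^x)."""
    lam_x = (1 - 4 * mutation_rate / 3) ** x
    return 0.25 + 0.75 * lam_x, 0.25 - 0.25 * lam_x


def log_odds_score(matches, mismatches, x):
    """matches * log2(4 * alpha(x)) + mismatches * log2(4 * beta(x))."""
    a, b = alpha_beta(x)
    return matches * math.log2(4 * a) + mismatches * math.log2(4 * b)


# (matches, mismatches) for the two ungapped local alignments.
ALIGNMENTS = {"A": (23, 2), "B": (34, 11)}


def best_alignment(x):
    """Name of the alignment with the higher log-odds score at PAM x."""
    return max(ALIGNMENTS, key=lambda name: log_odds_score(*ALIGNMENTS[name], x))


if __name__ == "__main__":
    for x in (1, 10, 25, 50, 100):
        scores = {name: round(log_odds_score(m, k, x), 2)
                  for name, (m, k) in ALIGNMENTS.items()}
        print(x, scores, "best:", best_alignment(x))
```

Under these assumptions A scores higher at PAM 1, where mismatches are heavily penalized, while B scores higher at PAM 100, where mismatches cost little and the extra matches dominate.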