Pairwise Sequence Analysis-V

Pairwise Sequence Analysis-V • Relatedness • “Not just important, but everything” • Modeling Alignment Scores • Coin Tosses • Unit Distributions • Extreme Value Distribution • Lambda and K revealed • Loose Ends Lecture 6 CS566

Modeling Expectation • Reduced model: Coin tosses • Given: • N coin tosses • Probability of heads p • Problem: • What is the average number of longest run of heads? • Solution: • Experimental: Perform several repetitions and count • Theoretical: E(Runmax) = log1/pN • For example, for fair coin and 64 tosses, E(Runmax) = 6 Lecture 6 CS566

Random alignment as Coin tosses • Head = Match • Assume • Score = Run of matches • Maximum score = Longest run of matches • Therefore • Same model of expectation • For example: For DNA sequences of length N, • E(matchlengthmax) = Expected longest run of matches = log1/pN Lecture 6 CS566

Local alignment as Coin tosses • Assume • Score in local alignment = Run of matches • Maximum score = Longest run of matches • Therefore • Similar model of expectation • For DNA sequences of length n & m • E(Matchlengthmax) ~ log1/p(nm) (Why not just n or m?) ~ log1/p(K’nm) • Var(Matchlengthmax) = C (i.e., Independent of sample space) Lecture 6 CS566

Refining Model • S = AS matrix based scoring between unrelated sequences • E(S) ~ log1/p(K’nm) ~ [ln(Knm)]/ (where  = loge 1/p) • Holy Grail: Need P(S > x), probability of a score between unrelated sequences exceeding x Lecture 6 CS566

Poisson distribution estimate of P(S > x) • Consider Coin Toss Example Given [x >> E(Runmax)] Define Success = (Runmax  x) Define Pn = Probability of n successes Define y = E[Success],i.e., Average no. of successes Then, probability of n successes follows Poisson dist. Pn = (e-yyn)/n! Probability of 0 successes (No score exceeding x) is given by P0 = e-y. Then, probability of at least one score exceeding x, P(S > x) = i 0 Pi = (1 - P0) = 1 - e-y For Poisson distribution, y = Kmne-x. Therefore, P(S > x) = 1 – exp (-Kmne-x) Lecture 6 CS566

Unit Distributions • Normalize Gaussian and EVD • Area under curve = 1 • Curve maximum at 0 • Then • For Gaussian • Mean = 0; SD = 1 • P(S > x) = 1 – exp (-e-x) • For EVD • Mean = 0.577 (Euler cons); Variance = 2/6 = 1.645 • P(S > x) = 1 – exp (-e-(x-u)) • Z-score representation in terms of SDs • P (Z > z) = 1 – exp(-e-1.2825z – 0.5772) Lecture 6 CS566

Lambda and K •  = Scale factor for scoring system • Effectively converts AS matrix values to actual natural log likelihoods • K = Scale factor that reduces search space to compensate for non-independence of local alignments • Esimated by fitting to Poisson approximation or equation for E(S) Lecture 6 CS566

Treasure Trove of Probabilities • Probability distribution of scores between unrelated sequences P(Sunrel) • Probability distribution of number of scores from P(Sunrel) exceeding some cut-off, mean represents number of scores exceeding cut-off observed on average • Probability of observing score x occurring between unrelated sequences P(S  x) Lecture 6 CS566

Loose Ends • What about gap parameters? • Short answer: No formal theory • Long answer: • Found empirically • Choice of parameters can be used to convert local alignment algorithm into a global alignment • What about gapped alignment? • Not formally proven, but simulations show statistical behavior similar to ungapped alignment • Effective sequence length n’ = n – E(matchLengthmax) Lecture 6 CS566

Pairwise Sequence Analysis-V

Pairwise Sequence Analysis-V

Presentation Transcript

Pairwise sequence alignments

Pairwise Sequence Alignment

Pairwise Sequence Alignment

Pairwise sequence Alignment

Pairwise Sequence Alignment

Pairwise Sequence Alignment

Pairwise sequence Alignment

Pairwise sequence alignment

Pairwise Sequence Alignment

Pairwise Sequence Alignments

Pairwise Sequence Comparison

Pairwise Sequence Analysis-I

Pairwise Sequence Alignment

Pairwise sequence alignment

Pairwise Sequence Alignment

Pairwise sequence comparison

Pairwise sequence alignments

Pairwise sequence Alignment

Pairwise sequence alignments

Pairwise Sequence Alignment

Pairwise Sequence Alignment

Pairwise sequence alignment