
Life Science 20001396 IIS Lab 이청재


Presentation Transcript


  1. Peptide Sequence Pairwise Alignment Using a Hidden Markov Model: Requirement Analysis
     Life Science 20001396, IIS Lab, 이청재

  2. Pairwise Alignment
     Seq X: HEAGAWGHEEE
     Seq Y: PAWHEAE
     How similar are the two sequences, and how can we tell? A peptide sequence alignment can be scored by identity scoring, by chemical similarity scoring, or from observed substitutions.
     • Peptide sequences are composed of 20 amino acids. Related sequences have diverged from a common ancestor through mutation and selection.
     • Substitution: changing residues (symbols) in a sequence → scoring schemes
     • Insertion, deletion: adding or removing residues → gap penalty
     Key issues
     • Scoring schemes or weight matrices: the substitution matrix; the gap penalty (opening vs. extension)
     • Techniques of alignment: global vs. local alignment; which algorithm to use (dynamic programming, HMM, …)
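The following is a minimal sketch of identity scoring, assuming a linear gap score. The hypothetical helper `score_alignment` walks an already-gapped alignment column by column. It adds s(a, b) for each pair of aligned residues and a fixed gap score for every gap column; the default of -1 is an arbitrary choice. Swapping `identity` for a chemical-similarity or observed-substitution table changes the scoring scheme without touching the loop.

```python
def identity(a, b):
    """Identity scoring: 1 if the two residues are the same, else 0."""
    return 1 if a == b else 0


def score_alignment(top, bottom, s, gap=-1):
    """Score a gapped pairwise alignment column by column.

    top, bottom: equal-length aligned rows; '-' marks a gap.
    s(a, b): substitution score for an aligned residue pair.
    gap: score added for every column that contains a gap (linear penalty).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        total += gap if "-" in (a, b) else s(a, b)
    return total


# Hypothetical toy alignment: five identical columns and one gap column.
print(score_alignment("HEAGAW", "HE-GAW", identity))  # 4
```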

  3. Scoring Scheme
     BLOSUM100 substitution matrix
     • We need a score for each aligned residue pair. The scores s(a,b) can be arranged in a symmetric 20 × 20 matrix.
     • How to build this matrix: derive the scores used by pairwise alignment algorithms from probabilities.
     • How to estimate the probabilities: from a known dataset. The BLOSUM matrices were derived from a set of aligned, ungapped regions of protein families called the BLOCKS database.
     S(xi,yj): the substitution score between xi and yj, where xi is the i-th amino acid of sequence x and yj is the j-th amino acid of sequence y.
     e.g., for the alignment HEAGAWGHE--E over ----P--AW--HEAE, S(A,P) = -2.
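The idea of deriving scores from probabilities is usually expressed as a log-odds score, $s(a,b) = \log \frac{p_{ab}}{q_a q_b}$. Here $p_{ab}$ is the frequency of the pair (a, b) in trusted alignments and $q_a$ is the background frequency of residue a. The snippet below estimates this from a hypothetical toy set of ungapped columns. It leaves out the sequence clustering and weighting used to build the real BLOSUM matrices. The function name and the `scale` parameter (2 gives half-bit units) are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(columns, scale=2.0):
    """Estimate s(a,b) = scale * log2(p_ab / (q_a * q_b)) from ungapped alignment columns.

    columns: strings, each holding the residues seen in one aligned column.
    Pairs never observed get no entry (their log-odds would be -infinity).
    """
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            # count both orders so that p_ab == p_ba and the matrix is symmetric
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())
    p = {pair: n / total for pair, n in pair_counts.items()}
    q = Counter()
    for (a, _b), prob in p.items():
        q[a] += prob  # background frequency q_a = sum over b of p_ab
    return {(a, b): round(scale * math.log2(prob / (q[a] * q[b])))
            for (a, b), prob in p.items()}


# Hypothetical toy "block": two conserved columns and one A/P column.
m = log_odds_matrix(["AAAA", "PPPP", "AP"])
print(m[("A", "A")], m[("A", "P")])  # 2 -5
```

Conserved pairs come out positive and rare substitutions come out negative, which is the pattern a substitution matrix encodes.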

  4. Techniques of Alignment
     • Global alignment vs. local alignment
       Seq X: HEAGAWGHEEE
       Seq Y: PAWHEAE
       Global alignment: HEAGAWGHE--E over ----P--AW--HEAE
       Local alignment: AWGHE over AW--HE
     • Gap penalty: opening vs. extension. For a gap of length g:
       Linear score: $\gamma(g) = -gd$
       Affine score: $\gamma(g) = -d - (g-1)e$, where d is the gap-open penalty and e is the gap-extension penalty; generally d > e.
     • Alignment algorithms: Needleman-Wunsch, Smith-Waterman, hidden Markov models, …
     Global alignment: the Needleman-Wunsch algorithm (dynamic programming)
     • Initialize: F(0,0) = 0; at i = 0, F(0,j) = -jd; at j = 0, F(i,0) = -id.
     • Fill: each cell is reached from its diagonal, upper, or left neighbour:
       $F(i,j) = \max\{F(i-1,j-1) + s(x_i,y_j),\ F(i-1,j) - d,\ F(i,j-1) - d\}$
     • Traceback: store a pointer to the previous cell that gave the maximum, then follow the pointers back from the last cell.
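The two gap-score formulas can be compared directly. In this small sketch the function names are hypothetical. The value d = 8 is taken from the example table, and e = 2 is an arbitrary choice. The affine score charges one long gap much less than several short gaps of the same total length. The linear score cannot tell the two cases apart.

```python
def linear_gap(g, d):
    """Linear gap score for a gap of length g: gamma(g) = -g * d."""
    return -g * d


def affine_gap(g, d, e):
    """Affine gap score: gamma(g) = -d - (g - 1) * e (open once, then extend)."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e


# Hypothetical penalties: d = 8, e = 2.
print(linear_gap(3, 8))          # -24
print(affine_gap(3, 8, 2))       # -12: one gap of length 3
print(3 * affine_gap(1, 8, 2))   # -24: three separate gaps of length 1
```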

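  5. Dynamic Programming
     Seq A: HEAGAWGHEEE (across the top of the table)
     Seq B: PAWHEAE (down the side)
     With d = 8, the first row and column hold multiples of -8: F(0,j) = -8j and F(i,0) = -8i, i.e., 0, -8, -16, -24, -32, …
     The first inner cell compares P with H:
     $F(1,1) = \max\{F(0,0) + S(P,H),\ F(0,1) - d,\ F(1,0) - d\} = \max\{0 - 2,\ -8 - 8,\ -8 - 8\} = -2$
     The cell also stores a pointer to the cell that produced the maximum (here the diagonal), for traceback.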
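The initialization, fill and traceback steps can be sketched as a short Needleman-Wunsch implementation with a linear gap penalty. This is an illustrative version, not the author's code. Here `s` is any substitution-score function passed in by the caller; a real run would look scores up in a matrix such as BLOSUM. Ties are broken in favour of the diagonal move.

```python
def needleman_wunsch(x, y, s, d):
    """Global alignment with a linear gap penalty d.

    F[i][j] is the best score for aligning x[:i] with y[:j]; x indexes the rows,
    y the columns. Returns (score, aligned_x, aligned_y).
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]  # traceback pointers
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = -i * d, "up"    # F(i,0) = -id
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = -j * d, "left"  # F(0,j) = -jd
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (F[i - 1][j - 1] + s(x[i - 1], y[j - 1]), "diag"),
                (F[i - 1][j] - d, "up"),    # x[i-1] against a gap
                (F[i][j - 1] - d, "left"),  # y[j-1] against a gap
            ]
            # max() keeps the first maximum, so ties prefer diag, then up
            F[i][j], ptr[i][j] = max(candidates, key=lambda c: c[0])
    # Traceback: follow the pointers from F(n,m) back to F(0,0)
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))
```

Take one residue from each sequence and a stand-in score of -2 for S(P,H). Then `needleman_wunsch("P", "H", lambda a, b: -2, 8)` returns `(-2, "P", "H")`, matching F(1,1) in the table.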

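  6. Dynamic Programming with a More Complex Model: FSA Alignment with an Affine Gap Penalty
     Three states, each labelled with how far it advances (i, j):
     • M (+1,+1): xi is aligned to yj, scoring s(xi,yj)
     • Ix (+1,+0): a symbol of x is aligned to a gap, i.e., there is a gap at that position of sequence y
     • Iy (+0,+1): a symbol of y is aligned to a gap in sequence x
     Moving from M into Ix or Iy costs the gap-open penalty d. Staying in Ix or Iy costs the gap-extension penalty e.
     $M(i,j) = \max\{M(i-1,j-1) + s(x_i,y_j),\ I_x(i-1,j-1) + s(x_i,y_j),\ I_y(i-1,j-1) + s(x_i,y_j)\}$
     $I_x(i,j) = \max\{M(i-1,j) - d,\ I_x(i-1,j) - e\}$
     $I_y(i,j) = \max\{M(i,j-1) - d,\ I_y(i,j-1) - e\}$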
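The following is a score-only sketch (no traceback) of this slide's M/Ix/Iy recurrences. It assumes a caller-supplied substitution function `s` and penalties `d` and `e`; the function name and the +1/-1 scores in the example are hypothetical. M, Ix and Iy are kept as three separate tables, and a cell that cannot be reached holds minus infinity. Ix and Iy are entered only from M or from the start, so a gap in one sequence is never directly followed by a gap in the other.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, s, d, e):
    """Best global alignment score of x and y with gap score -d - (g - 1) * e.

    M[i][j]:  best score with x[i-1] aligned to y[j-1]
    Ix[i][j]: best score ending with x[i-1] against a gap in y
    Iy[i][j]: best score ending with y[j-1] against a gap in x
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0  # both prefixes empty: the start
    for i in range(1, n + 1):
        Ix[i][0] = -d - (i - 1) * e  # x[:i] against one leading gap in y
    for j in range(1, m + 1):
        Iy[0][j] = -d - (j - 1) * e  # y[:j] against one leading gap in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s(x[i - 1], y[j - 1])
            Ix[i][j] = max(M[i - 1][j] - d, Ix[i - 1][j] - e)  # open or extend a gap in y
            Iy[i][j] = max(M[i][j - 1] - d, Iy[i][j - 1] - e)  # open or extend a gap in x
    return max(M[n][m], Ix[n][m], Iy[n][m])


def match_mismatch(a, b):
    """Hypothetical scores: +1 for identical residues, -1 otherwise."""
    return 1 if a == b else -1


# One gap of length 2 (2 - 3 - 1) beats two gaps of length 1 (2 - 3 - 3).
print(affine_global_score("AAAA", "AA", match_mismatch, 3, 1))  # -2
```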

  7. Hidden Markov Model
     Markov chain: “A probabilistic model of sequences of events where the probability of an event occurring depends upon the fact that a preceding event occurred”
     Hidden Markov model
     • “The real sequence of states is invisible.”
     • We can see only the visible symbols, that is, the sequences or measurements.
     • It is more complicated than a simple Markov chain, with:
       - (hidden) states, and transition probabilities between them
       - (visible) symbols, and emission probabilities: the probability that a certain symbol is emitted from a given state
     Decoding
     • From the observed sequence, decode the sequence of underlying states with the Viterbi algorithm, a dynamic programming algorithm like the alignment recurrences.
     • The most probable state path $\pi^*$: the state sequence that most probably produced a symbol sequence whose states are unknown.
     • $v_k(i)$: the probability of the most probable path ending in state k with observation i.
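The following is a minimal log-space sketch of Viterbi decoding. It uses the recurrence $v_l(i+1) = e_l(x_{i+1}) \max_k v_k(i)\, a_{kl}$, where $a_{kl}$ is a transition probability and $e_l$ an emission probability. Like the alignment DP, it keeps one pointer per state for traceback. The two-state model and its probabilities are hypothetical. Working with logarithms avoids numerical underflow on long sequences.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path pi* for a non-empty observation sequence.

    start_p[k], trans_p[k][l] and emit_p[k][symbol] are probabilities.
    v[k] holds log v_k(i): the log probability of the most probable path
    that ends in state k after emitting obs[i].
    """
    v = {k: _log(start_p[k]) + _log(emit_p[k][obs[0]]) for k in states}
    back = []  # back[i][l]: best predecessor of state l at position i + 1
    for symbol in obs[1:]:
        ptr, new_v = {}, {}
        for l in states:
            best_k = max(states, key=lambda k: v[k] + _log(trans_p[k][l]))
            ptr[l] = best_k
            new_v[l] = v[best_k] + _log(trans_p[best_k][l]) + _log(emit_p[l][symbol])
        back.append(ptr)
        v = new_v
    # Traceback from the best final state
    last = max(states, key=lambda k: v[k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[last]


# Hypothetical two-state model: S1 mostly emits 'a', S2 mostly emits 'b'.
states = ("S1", "S2")
start = {"S1": 0.5, "S2": 0.5}
trans = {"S1": {"S1": 0.9, "S2": 0.1}, "S2": {"S1": 0.1, "S2": 0.9}}
emit = {"S1": {"a": 0.9, "b": 0.1}, "S2": {"a": 0.1, "b": 0.9}}
path, logp = viterbi("aaabbb", states, start, trans, emit)
print(path)  # ['S1', 'S1', 'S1', 'S2', 'S2', 'S2']
```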

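  8. Pairwise Alignment Using an HMM
     Pair HMMs: the match state M emits an aligned pair (xi, yj) with probability $p_{x_i y_j}$. The insert states Ix and Iy each emit a single residue, xi or yj, with probability $q_{x_i}$ or $q_{y_j}$. Begin and End are silent. The transition probabilities use δ for gap opening, ε for gap extension and γ for ending:
       From Begin or M: to M 1 - 2δ - γ, to Ix δ, to Iy δ, to End γ
       From Ix: to Ix ε, to M 1 - ε - γ, to End γ
       From Iy: to Iy ε, to M 1 - ε - γ, to End γ
     Probabilistic model: further study.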
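Under these transition probabilities, the probability of one particular alignment is the product of the transition and emission probabilities along its state path. The sketch below computes that product in log space. The emission functions `p` and `q` and all parameter values in the example are hypothetical. Full decoding, a Viterbi pass over M, Ix and Iy, would follow the same three-state structure as the affine-gap recurrences; it is not shown here.

```python
import math


def pair_hmm_transitions(delta, eps, gamma):
    """Transition probabilities of the pair HMM (B = Begin, E = End).

    Assumes 2*delta + gamma < 1 and eps + gamma < 1 so every probability is positive.
    """
    from_m = {"M": 1 - 2 * delta - gamma, "Ix": delta, "Iy": delta, "E": gamma}
    return {
        "B": dict(from_m),  # Begin has the same outgoing transitions as M
        "M": from_m,
        "Ix": {"Ix": eps, "M": 1 - eps - gamma, "E": gamma},
        "Iy": {"Iy": eps, "M": 1 - eps - gamma, "E": gamma},
    }


def alignment_log_prob(top, bottom, p, q, delta, eps, gamma):
    """Log probability that the pair HMM emits this particular gapped alignment.

    top, bottom: equal-length aligned rows; '-' marks a gap.
    p(a, b): emission probability of the pair (a, b) in state M.
    q(a): emission probability of a single residue in state Ix or Iy.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    trans = pair_hmm_transitions(delta, eps, gamma)
    state, logp = "B", 0.0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if b == "-":
            nxt, emit = "Ix", q(a)      # x residue against a gap in y
        elif a == "-":
            nxt, emit = "Iy", q(b)      # y residue against a gap in x
        else:
            nxt, emit = "M", p(a, b)    # aligned residue pair
        if nxt not in trans[state]:
            raise ValueError(f"transition {state} -> {nxt} is not in the model")
        logp += math.log(trans[state][nxt]) + math.log(emit)
        state = nxt
    return logp + math.log(trans[state]["E"])


def p(a, b):
    """Hypothetical match-state emissions."""
    return 0.1 if a == b else 0.01


def q(a):
    """Hypothetical insert-state emissions."""
    return 0.05


# Begin->M (0.5) * p(A,A) (0.1) * M->Ix (0.2) * q(B) (0.05) * Ix->End (0.1)
print(math.exp(alignment_log_prob("AB", "A-", p, q, 0.2, 0.1, 0.1)))  # ≈ 5e-05
```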

  9. References
     1. Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 2000. Lecture notes: http://bi.snu.ac.kr/Courses/bio02/bio02_2.html
     2. Website for bioinformatics: http://bioinfo.sarang.net/wiki/BioinfoWiki
     3. Programs for biosequence analysis: http://www.dina.dk/~sestoft/bsa.html
     4. Dynamic programming: http://no-smok.net/nsmk/DynamicProgramming
