
Profile HMMs for sequence families and Viterbi equations


Presentation Transcript


  1. Profile HMMs for sequence families and Viterbi equations Linda Muselaars and Miranda Stobbe

  2. Example alignment
     HBA_HUMAN   -HGSAQVKGHGKKVADALTNAVAHV-
     HBB_HUMAN   VMGNPKVKAHGKKVLGAFSDGLAHL-
     MYG_PHYCA   MKASEDLKKHGVTVLTALGAILKK--
     GLB3_CHITP  IKGTAPFETHANRIVGFFSKIIGEL-
     GLB5_PETMA  LKKSADVRWHAERIINAVNDAVASM-
     LGB2_LUPLU  PQNNPELQAHAGKVFKLVYEAAIQLQ
     GLB1_GLYDI  ---DPGVAALGAKVLAQIGVAVSHL-

  3. Overview chapter 5 • Ungapped score matrices. • Adding insert and delete states to obtain profile HMMs. • Deriving profile HMMs from multiple alignments. • Searching with profile HMMs. • Profile HMM variants for non-global alignments. • More on estimation of probabilities. • Optimal model construction. • Weighting training sequences.


  5. Key issues • Identifying the relationship of an individual sequence to a sequence family. • How to build a profile HMM. • Using profile HMMs to detect potential membership in a family. • Using profile HMMs to align a sequence to the family.

  6. Key issues (2) Lollipops for a valuable contribution to this lecture (the speakers decide what counts as valuable).

  7. Needed theory • Emission probabilities. • Silent states. • Pair HMMs. • The Viterbi algorithm. • The Forward algorithm.

  8. Contents • Ungapped score matrices. • Adding insert and delete states to obtain profile HMMs. • Deriving profile HMMs from multiple alignments. • Non-probabilistic profiles. • Basic profile HMM parameterisation. • Searching with profile HMMs. • Profile HMM variants for non-global alignments.

  9. Example alignment
     HBA_HUMAN   -HGSAQVKGHGKKVADALTNAVAHV-
     HBB_HUMAN   VMGNPKVKAHGKKVLGAFSDGLAHL-
     MYG_PHYCA   MKASEDLKKHGVTVLTALGAILKK--
     GLB3_CHITP  IKGTAPFETHANRIVGFFSKIIGEL-
     GLB5_PETMA  LKKSADVRWHAERIINAVNDAVASM-
     LGB2_LUPLU  PQNNPELQAHAGKVFKLVYEAAIQLQ
     GLB1_GLYDI  ---DPGVAALGAKVLAQIGVAVSHL-
     (On the slide, 21 columns are marked with *.)

  10. Ungapped regions • Gaps tend to line up. • We can consider models for ungapped regions. • Specify independent probabilities e_i(a) for each position i. • Score a sequence by its log-odds ratio against the background model. • This gives a position-specific score matrix (PSSM).
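The ungapped scoring idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy DNA block, the uniform background q_a and the add-one pseudocount (used so that no probability is zero) are not taken from the slides, and the function names `build_pssm` and `score` are hypothetical.

```python
import math
from collections import Counter

def build_pssm(sequences, alphabet, background, pseudocount=1.0):
    """Return one dict per column mapping residue a to log(e_i(a) / q_a)."""
    pssm = []
    for column in zip(*sequences):  # sequences: ungapped, all the same length
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            a: math.log((counts[a] + pseudocount) / total / background[a])
            for a in alphabet
        })
    return pssm

def score(pssm, x):
    """Log-odds score of x: sum over positions i of log(e_i(x_i) / q_{x_i})."""
    return sum(col[a] for col, a in zip(pssm, x))

alphabet = "ACGT"
background = {a: 0.25 for a in alphabet}  # assumed uniform background q_a
pssm = build_pssm(["AC", "AC", "AG"], alphabet, background)  # toy ungapped block
print(score(pssm, "AC"))  # positive: fits the block better than the background
print(score(pssm, "TT"))  # negative: fits worse than the background
```

Because the positions are treated as independent, the total score is simply a sum of per-column terms, which is what makes the matrix form possible.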

  11. Drawbacks • Multiple alignments do have gaps. • These need to be accounted for. • For example: the BLOCKS database, which combines scores of ungapped regions. • We will develop a single probabilistic model for the whole extent of the alignment.

  12. Contents • Ungapped score matrices. • Adding insert and delete states to obtain profile HMMs. • Deriving profile HMMs from multiple alignments. • Non-probabilistic profiles. • Basic profile HMM parameterisation. • Searching with profile HMMs. • Profile HMM variants for non-global alignments.

  13. Short review • Emission probabilities: the probability that a certain symbol is seen when in a given state k. • Silent states: states in an HMM that do not emit symbols.

  14. Building the model (1) • We need position-sensitive gap scores. • HMM with a repetitive structure of (match) states. • Transitions of probability 1. • Emission probabilities: e_Mi(a). [Figure: chain of match states Mj between Begin and End]

  15. Building the model (2) • Deal with insertions: a set of new states I_i. • Each I_i has an emission distribution e_Ii(a). • This is set to the background distribution q_a. [Figure: insert state Ij added to the chain Begin, Mj, End]

  16. Building the model (3) • Deal with deletions. • One possibility: forward jumps between match states. • For arbitrarily long gaps: silent states D_j. [Figure: delete state Dj added to the chain Begin, Mj, End]

  17. Costs for additional states • States for insertions: the sum of the costs of the transitions and emissions (one M→I transition, a number of I→I transitions, one I→M transition). • States for deletions: the sum of the costs of an M→D transition, a number of D→D transitions and a D→M transition.
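Written out, the two costs on this slide could look as follows. This is a sketch in the usual profile HMM notation (a for transition probabilities), assuming that insert emissions equal the background distribution, so that they contribute nothing to the log-odds score, and that a deletion skips match positions j+1 to j+h.

```latex
% Insertion of h residues after match state M_j:
\log a_{M_j I_j} + (h-1)\,\log a_{I_j I_j} + \log a_{I_j M_{j+1}}

% Deletion of h match positions, j+1 through j+h:
\log a_{M_j D_{j+1}} + \sum_{k=1}^{h-1} \log a_{D_{j+k} D_{j+k+1}} + \log a_{D_{j+h} M_{j+h+1}}
```

The insertion cost has the same shape as an affine gap penalty, while the deletion cost can differ at every position because each D→D transition has its own probability.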

  18. Full model [Figure: profile HMM with states Begin, Mj, Ij, Dj and End]

  19. Comparison with pair HMM [Figure: pair HMM with states Begin, M (emitting pairs with probability p_xiyj), X (emitting with q_xi), Y (emitting with q_yj) and End]

  20. Contents • Ungapped score matrices. • Adding insert and delete states to obtain profile HMMs. • Deriving profile HMMs from multiple alignments. • Non-probabilistic profiles. • Basic profile HMM parameterisation. • Searching with profile HMMs. • Profile HMM variants for non-global alignments.

  21. Non-probabilistic profiles • Profiles without an underlying probabilistic model. • Scores are set to averages of standard substitution scores. • Anomalies: • Conservation of columns is not taken into account. • Scores for gaps do not behave properly.

  22. Example
     HBA_HUMAN   ...VGA--HAGEY...
     HBB_HUMAN   ...V----NVDEV...
     MYG_PHYCA   ...VEA--DVAGH...
     GLB3_CHITP  ...VKG------D...
     GLB5_PETMA  ...VYS--TYETS...
     LGB2_LUPLU  ...FNA--NIPKH...
     GLB1_GLYDI  ...IAGADNGAGV...
                    ***  *****
     Column 1 contains five V, one F and one I, so the score for residue a in column 1 would be set to: (5/7) s(V, a) + (1/7) s(F, a) + (1/7) s(I, a), where s is the standard substitution score.

  23. Basic profile HMM parameterisation • Objective: make the probability distribution peak around members of the family. • Available parameters: • Length of the model. • Transition and emission probabilities.

  24. Length of the model • Which multiple alignment columns do we assign to match states? • And which to insert states? • Heuristic rule: columns in which more than 50% of the characters are gaps should be modeled by insert states.
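The heuristic rule can be tried on the alignment fragment from slide 22. The sketch below assumes `-` is the gap character; the function name `match_columns` and its parameters are hypothetical.

```python
def match_columns(alignment, gap="-", max_gap_fraction=0.5):
    """Return 0-based indices of columns assigned to match states.

    A column becomes an insert column when more than max_gap_fraction
    of its characters are gaps; all other columns become match columns.
    """
    n_seqs = len(alignment)
    return [
        i for i, column in enumerate(zip(*alignment))
        if column.count(gap) / n_seqs <= max_gap_fraction
    ]

# Alignment fragment from slide 22, without the surrounding "..."
fragment = [
    "VGA--HAGEY",  # HBA_HUMAN
    "V----NVDEV",  # HBB_HUMAN
    "VEA--DVAGH",  # MYG_PHYCA
    "VKG------D",  # GLB3_CHITP
    "VYS--TYETS",  # GLB5_PETMA
    "FNA--NIPKH",  # LGB2_LUPLU
    "IAGADNGAGV",  # GLB1_GLYDI
]
print(match_columns(fragment))  # [0, 1, 2, 5, 6, 7, 8, 9]
```

Columns 3 and 4 (0-based) have gaps in six of seven sequences and are left to insert states, which agrees with the starred columns on slide 22.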

  25. Probability parameters • Transition probability: a_kl = (# of transitions from state k to state l) / (# of transitions from state k to any state). • Emission probability: e_k(a) = (# of emissions of symbol a from state k) / (# of emissions of any symbol from state k). • In the limit of many training sequences this gives an accurate and consistent estimate. • Pseudocount method: Laplace's rule (add one to each count).
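In symbols, the estimates on this slide can be written as below. The count symbols A_kl (observed transitions from k to l) and E_k(a) (observed emissions of a from k) are a notational assumption, not taken from the slide.

```latex
% Maximum likelihood estimates from counts
a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad
e_k(a) = \frac{E_k(a)}{\sum_{a'} E_k(a')}

% Laplace's rule: add one to every count before normalising
a_{kl} = \frac{A_{kl} + 1}{\sum_{l'} \left(A_{kl'} + 1\right)}, \qquad
e_k(a) = \frac{E_k(a) + 1}{\sum_{a'} \left(E_k(a') + 1\right)}
```

Without pseudocounts, any transition or residue that is absent from the training alignment gets probability zero, which would forbid it entirely when scoring new sequences.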

  26. Example

  27. Example continued
     Emission probabilities (four distributions in the order listed on the slide, presumably for M1 to M4):
            A     C     G     T
       1    5/8   1/8   1/8   1/8
       2    3/7   1/7   2/7   1/7
       3    1/8   5/8   1/8   1/8
       4    1/7   1/7   4/7   1/7
     Transition probabilities from M1: a_M1M2 = 4/7, a_M1D2 = 2/7, a_M1I1 = 1/7.
     [Figure: profile HMM with states Begin, M1 to M4, I0 to I4, D1 to D4 and End]
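The fractions on this slide are what Laplace's rule produces from small counts. The example alignment itself did not survive in the transcript, so the counts below are hypothetical: they are simply the smallest counts consistent with the values shown (three M1→M2 transitions, one M1→D2, no M1→I1, and four A residues in the first match column).

```python
from fractions import Fraction

def laplace(counts):
    """Add one to every count, then normalise (Laplace's rule)."""
    total = sum(counts.values()) + len(counts)
    return {key: Fraction(count + 1, total) for key, count in counts.items()}

# Hypothetical counts, chosen to be consistent with the slide's probabilities.
transitions_from_m1 = laplace({"M2": 3, "D2": 1, "I1": 0})
emissions_first_match = laplace({"A": 4, "C": 0, "G": 0, "T": 0})

print(transitions_from_m1)    # M2: 4/7, D2: 2/7, I1: 1/7
print(emissions_first_match)  # A: 5/8, C: 1/8, G: 1/8, T: 1/8
```

Using `Fraction` keeps the results exact, which makes them easy to compare with the values on the slide.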

  28. Contents • Ungapped score matrices. • Adding insert and delete states to obtain profile HMMs. • Deriving profile HMMs from multiple alignments. • Non-probabilistic profiles. • Basic profile HMM parameterisation. • Searching with profile HMMs. • Profile HMM variants for non-global alignments.

  29. Searching with profile HMMs • Obtaining significant matches of a sequence to the profile HMM: • Viterbi algorithm: P(x, π* | M). • Forward algorithm: P(x | M). • Give an alignment of a sequence to the family. • Highest scoring, or Viterbi, alignment.

  30. Viterbi equations • Log-odds score of the best path matching subsequence x_1…i to the submodel up to state j, ending with x_i being emitted by state M_j: • Log-odds score of the best path ending in x_i being emitted by I_j: • The best path ending in state D_j: • Pair HMM:
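The equations themselves are missing from the transcript. In the standard textbook form for profile HMMs they read as follows; this is a reconstruction under the usual notation (V for the log-odds Viterbi variables, e for emission and a for transition probabilities, q for the background), not a copy of the slide.

```latex
V^{M}_{j}(i) = \log\frac{e_{M_j}(x_i)}{q_{x_i}} + \max
\begin{cases}
V^{M}_{j-1}(i-1) + \log a_{M_{j-1}M_j} \\
V^{I}_{j-1}(i-1) + \log a_{I_{j-1}M_j} \\
V^{D}_{j-1}(i-1) + \log a_{D_{j-1}M_j}
\end{cases}

V^{I}_{j}(i) = \log\frac{e_{I_j}(x_i)}{q_{x_i}} + \max
\begin{cases}
V^{M}_{j}(i-1) + \log a_{M_j I_j} \\
V^{I}_{j}(i-1) + \log a_{I_j I_j} \\
V^{D}_{j}(i-1) + \log a_{D_j I_j}
\end{cases}

V^{D}_{j}(i) = \max
\begin{cases}
V^{M}_{j-1}(i) + \log a_{M_{j-1}D_j} \\
V^{I}_{j-1}(i) + \log a_{I_{j-1}D_j} \\
V^{D}_{j-1}(i) + \log a_{D_{j-1}D_j}
\end{cases}
```

The delete recursion has no emission term because D_j is silent, and when insert emissions equal the background distribution the emission term of the insert recursion is zero.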

  31. Viterbi equations
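A compact Python sketch of the profile HMM Viterbi recursion in log-odds form is given below. Several things are assumptions made for illustration: states are named by tuples such as ("M", j), Begin is treated as M_0 and End as M_{L+1}, insert states emit with the background distribution (log-odds 0), all match emission probabilities are strictly positive, and the toy model built by the hypothetical helper `toy_transitions` uses invented numbers.

```python
import math

NEG_INF = float("-inf")

def viterbi_log_odds(x, match_emit, background, log_a):
    """Best-path log-odds score of x against a profile HMM with L match states.

    match_emit[j-1][c]: emission probability of residue c in M_j (j = 1..L).
    background[c]: background probability q_c (also used by insert states).
    log_a[(src, dst)]: log transition probability; missing means impossible.
    """
    L, n = len(match_emit), len(x)

    def t(src, dst):
        return log_a.get((src, dst), NEG_INF)

    # V[s][j][i]: best score with x[:i] consumed, ending in state s_j.
    V = {s: [[NEG_INF] * (n + 1) for _ in range(L + 1)] for s in "MID"}
    V["M"][0][0] = 0.0  # Begin, treated as M_0

    def best_into(dst, j_prev, i_prev):
        # Maximise over the three predecessor state types.
        return max(V[s][j_prev][i_prev] + t((s, j_prev), dst) for s in "MID")

    for i in range(n + 1):
        for j in range(L + 1):
            if i > 0:  # insert: consumes x_i, stays in column j, log-odds 0
                V["I"][j][i] = best_into(("I", j), j, i - 1)
            if j > 0:  # delete: silent, advances the column only
                V["D"][j][i] = best_into(("D", j), j - 1, i)
            if i > 0 and j > 0:  # match: consumes x_i and advances the column
                c = x[i - 1]
                emit = math.log(match_emit[j - 1][c] / background[c])
                V["M"][j][i] = emit + best_into(("M", j), j - 1, i - 1)
    return best_into(("M", L + 1), L, n)  # transition into End

def toy_transitions(L, p_match=0.8, p_ins=0.1, p_del=0.1, p_end=0.9):
    """Position-independent toy transitions (invented numbers)."""
    log_a = {}
    for j in range(L + 1):
        for s in "MID":
            src = (s, j)
            if j < L:
                log_a[(src, ("M", j + 1))] = math.log(p_match)
                log_a[(src, ("I", j))] = math.log(p_ins)
                log_a[(src, ("D", j + 1))] = math.log(p_del)
            else:
                log_a[(src, ("M", L + 1))] = math.log(p_end)
                log_a[(src, ("I", L))] = math.log(1 - p_end)
    return log_a

background = {c: 0.25 for c in "ACGT"}
match_emit = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},  # M_1 prefers A
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},  # M_2 prefers C
]
log_a = toy_transitions(len(match_emit))
print(viterbi_log_odds("AC", match_emit, background, log_a))  # positive score
print(viterbi_log_odds("GT", match_emit, background, log_a))  # negative score
```

Filling the table column by column within each sequence position guarantees that every value a recursion needs has already been computed. Recovering the alignment itself would additionally require storing back-pointers, which this sketch omits.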

  32. Forward algorithm
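The equations for this slide are not in the transcript. In the standard textbook form, the Forward recursions mirror the Viterbi ones with the maximum replaced by a sum over predecessors; the notation (F for the log-odds forward variables) is an assumption.

```latex
F^{M}_{j}(i) = \log\frac{e_{M_j}(x_i)}{q_{x_i}} + \log\Big[
a_{M_{j-1}M_j}\, e^{F^{M}_{j-1}(i-1)} +
a_{I_{j-1}M_j}\, e^{F^{I}_{j-1}(i-1)} +
a_{D_{j-1}M_j}\, e^{F^{D}_{j-1}(i-1)} \Big]

F^{I}_{j}(i) = \log\frac{e_{I_j}(x_i)}{q_{x_i}} + \log\Big[
a_{M_j I_j}\, e^{F^{M}_{j}(i-1)} +
a_{I_j I_j}\, e^{F^{I}_{j}(i-1)} +
a_{D_j I_j}\, e^{F^{D}_{j}(i-1)} \Big]

F^{D}_{j}(i) = \log\Big[
a_{M_{j-1}D_j}\, e^{F^{M}_{j-1}(i)} +
a_{I_{j-1}D_j}\, e^{F^{I}_{j-1}(i)} +
a_{D_{j-1}D_j}\, e^{F^{D}_{j-1}(i)} \Big]
```

Summing over all paths gives the full log-odds score for P(x | M) instead of the score of the single best path.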

  33. Initialisation and termination • Viterbi algorithm: • Initialisation: • Termination: • Forward algorithm: • Initialisation: • Termination:
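The formulas for this slide are missing from the transcript. One common convention, assumed here, is to treat Begin as M_0 and End as M_{L+1}, for a model with L match states and a sequence of length n:

```latex
% Viterbi
V^{M}_{0}(0) = 0, \qquad
V_{\mathrm{End}} = \max\Big\{
V^{M}_{L}(n) + \log a_{M_L\,\mathrm{End}},\;
V^{I}_{L}(n) + \log a_{I_L\,\mathrm{End}},\;
V^{D}_{L}(n) + \log a_{D_L\,\mathrm{End}} \Big\}

% Forward
F^{M}_{0}(0) = 0, \qquad
F_{\mathrm{End}} = \log\Big[
a_{M_L\,\mathrm{End}}\, e^{F^{M}_{L}(n)} +
a_{I_L\,\mathrm{End}}\, e^{F^{I}_{L}(n)} +
a_{D_L\,\mathrm{End}}\, e^{F^{D}_{L}(n)} \Big]
```

All other entries start as minus infinity, so that only paths leaving Begin contribute.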

  34. Alternative to log-odds scoring • Log-likelihood score (LL score). • Strongly length dependent. • Solutions: • Divide by sequence length. • Z-score. • Which method is preferred?


  36. Demo

  37. Part of the profile HMM

  38. Scoring

  39. Part of the multiple alignment

  40. Relative frequencies

  41. Contents • Ungapped score matrices. • Adding insert and delete states to obtain profile HMMs. • Deriving profile HMMs from multiple alignments. • Non-probabilistic profiles. • Basic profile HMM parameterisation. • Searching with profile HMMs. • Profile HMM variants for non-global alignments.

  42. Flanking model states • Used to model the sequence flanking the actual profile match. • Extra probabilities needed: • Emission probability: q_a. • 'Looping' transition probability: (1 - η). • Transition probability from the left flanking state: depends on the application.

  43. Model for local alignment (Smith-Waterman style) [Figure: profile HMM with states Begin, Mj, Ij, Dj and End, plus flanking states Q]

  44. Model for overlap matches [Figure: profile HMM with states Begin, Mj, Ij, Dj and End, plus flanking states Q]

  45. Model for repeat matches [Figure: profile HMM with states Begin, Mj, Ij, Dj and End, plus a flanking state Q]

  46. Summary • Construction of a profile HMM for different kinds of alignments. • Use profile HMMs to detect potential membership in a family. • Use profile HMMs to give an alignment of a sequence to the family.

  47. Discussion subject: BLAST versus profile HMMs
