Weighting training sequences

1. Weighting training sequences
   • Why do we want to weight training sequences?
   • Many different proposals:
     • Based on trees
     • Based on the 3D position of the sequences
     • Interested only in classifying family membership
     • Maximizing entropy

2. Why do we want to weight training sequences?
   • Some of the training sequences can be closely related to each other, and these do not deserve the same influence in the estimation process as a sequence that is highly diverged.
   • Phylogenetic trees
   • Sequences: AGAA, CCTC, AGTC
   [Figure: phylogenetic tree with leaves AGAA, CCTC and AGTC]

3. Weighting schemes based on trees
   • Thompson, Higgins & Gibson (1994): represents the weights as electric currents, calculated by Kirchhoff's laws
   • Gerstein, Sonnhammer & Chothia (1994)
   • Root weights from Gaussian parameters (Altschul-Carroll-Lipman weights for a three-leaf tree, 1989)

4. Thompson, Higgins & Gibson
   • Electric network of voltages, currents and resistances
   [Figure: three-leaf tree drawn as an electric network, leaves labelled 1, 2, 3]

5. Thompson, Higgins & Gibson
   [Figure: currents flowing from the root to leaves 1, 2 and 3, computed with Kirchhoff's laws; see the sketch below]
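A minimal Python sketch of the Kirchhoff-current calculation for a three-leaf tree follows. The slide's figure is not preserved, so the edge lengths used here (2 to each of leaves 1 and 2, 3 on the internal edge, 4 to leaf 3) are assumptions, chosen so that the currents reproduce the 1:1:2 ratio quoted on slide 13.

```python
# Thompson-Higgins-Gibson weights for a three-leaf tree: treat edge
# lengths as resistances, apply a voltage at the root, ground the leaves,
# and take each leaf's current (Kirchhoff's laws) as its weight.
# The edge lengths below are assumed, not taken from the slide's figure.

t1, t2 = 2.0, 2.0   # edges from the internal node down to leaves 1 and 2
t4 = 3.0            # edge from the root down to the internal node
t3 = 4.0            # edge from the root directly down to leaf 3

V = 1.0             # arbitrary root voltage; only the current ratios matter

# t1 and t2 are in parallel below the internal node, in series with t4;
# t3 is a separate branch from the root.
r_parallel = (t1 * t2) / (t1 + t2)
i_internal = V / (t4 + r_parallel)    # current entering the internal node
v_internal = V - i_internal * t4      # voltage at the internal node

i1 = v_internal / t1
i2 = v_internal / t2
i3 = V / t3

total = i1 + i2 + i3
print([i / total for i in (i1, i2, i3)])  # -> [0.25, 0.25, 0.5], i.e. 1:1:2
```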

6. Gerstein, Sonnhammer & Chothia
   • Works up the tree, incrementing the weights
   • Initially, the weights are set to the edge lengths (the resistances in the previous example)

7. Gerstein, Sonnhammer & Chothia
   [Figure: worked example on the three-leaf tree, showing the leaf weights being incremented while moving up the tree; see the sketch below]
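A sketch of the Gerstein-Sonnhammer-Chothia upward pass, on the same assumed three-leaf tree as in the previous sketch; it reproduces the 7:7:8 ratio quoted on slide 13.

```python
# Gerstein-Sonnhammer-Chothia weights: each leaf weight starts at the
# length of its leaf edge; moving up the tree, the length of each internal
# edge is shared out among the leaves below it in proportion to their
# current weights. Same assumed three-leaf tree as in the previous sketch.

weights = {1: 2.0, 2: 2.0, 3: 4.0}    # initial weights = leaf edge lengths

# The only internal node has leaves 1 and 2 below it and an edge of
# length 3 above it; that length is split in proportion to the weights.
edge_above = 3.0
below = [1, 2]
total_below = sum(weights[k] for k in below)
for k in below:
    weights[k] += edge_above * weights[k] / total_below

print(weights)  # -> {1: 3.5, 2: 3.5, 3: 4.0}, i.e. the 7:7:8 of slide 13
```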

8. Gerstein, Sonnhammer & Chothia
   • A small difference from Thompson, Higgins & Gibson?
   [Figure: two-leaf tree, leaves 1 and 2]

9. Root weights from Gaussian parameters
   • Continuous instead of discrete members of an alphabet
   • Probability density instead of a substitution matrix
   • Example: Gaussian

10. Root weights from Gaussian parameters

11. Root weights from Gaussian parameters
   • Altschul-Carroll-Lipman weights for a tree with three leaves

12. Root weights from Gaussian parameters
   [Figure: three-leaf tree, leaves 1, 2 and 3, used in the worked example; see the sketch below]
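In the Gaussian case, the root value is estimated as a precision-weighted average of the leaf values, and those averaging coefficients are the Altschul-Carroll-Lipman weights. Below is a sketch for the same assumed three-leaf tree as in the earlier sketches, with edge lengths acting as variances.

```python
# Altschul-Carroll-Lipman root weights, Gaussian version: evolution along
# an edge of length t adds Gaussian noise of variance t, so the root value
# is estimated as a precision-weighted average of the leaves. The
# averaging coefficients are the sequence weights. Tree as assumed above.

t1, t2 = 2.0, 2.0   # edges to leaves 1 and 2
t4 = 3.0            # edge from the root to the internal node
t3 = 4.0            # edge from the root to leaf 3

# Combine leaves 1 and 2 into one effective observation at the internal
# node, then propagate its variance up the internal edge.
w1_in_cluster = (1 / t1) / (1 / t1 + 1 / t2)   # leaf 1's share in the cluster
w2_in_cluster = 1 - w1_in_cluster
var_cluster = t4 + (t1 * t2) / (t1 + t2)       # variance of the cluster estimate

# At the root, average the cluster estimate and leaf 3 by precision.
p_cluster, p3 = 1 / var_cluster, 1 / t3
w_cluster = p_cluster / (p_cluster + p3)

w1 = w_cluster * w1_in_cluster
w2 = w_cluster * w2_in_cluster
w3 = 1 - w_cluster
print(w1, w2, w3)  # -> 0.25 0.25 0.5, the 1:1:2 quoted on slide 13
```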

13. Weighting schemes based on trees
   • Thompson, Higgins & Gibson (electric current): 1:1:2
   • Gerstein, Sonnhammer & Chothia: 7:7:8
   • Altschul-Carroll-Lipman weights for a tree with three leaves: 1:1:2

14. Weighting scheme using 'sequence space'
   • Voronoi weights: w_k = V_k / Σ_j V_j, where V_k is the volume of the region of sequence space lying closer to sequence k than to any other training sequence (estimated by sampling in the sketch below)
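The cell volumes V_k are usually estimated by sampling rather than computed exactly. The sketch below is a Monte Carlo version applied to the example sequences from slide 2; the sampling scheme (residues drawn uniformly from those observed in each column), the Hamming distance, and the equal splitting of ties are assumptions of this sketch, not necessarily the original implementation.

```python
import random

# Monte Carlo estimate of Voronoi weights: draw random sequences using,
# in each column, the residues observed there; find the nearest training
# sequence by Hamming distance (ties split equally); weight each training
# sequence by the share of samples it wins.

seqs = ["AGAA", "CCTC", "AGTC"]
columns = [sorted(set(col)) for col in zip(*seqs)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

random.seed(0)
counts = [0.0] * len(seqs)
for _ in range(100_000):
    sample = "".join(random.choice(col) for col in columns)
    dists = [hamming(sample, s) for s in seqs]
    best = min(dists)
    winners = [k for k, d in enumerate(dists) if d == best]
    for k in winners:
        counts[k] += 1.0 / len(winners)

total = sum(counts)
print([round(c / total, 3) for c in counts])  # roughly [0.365, 0.365, 0.271]
```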

15. More weighting schemes
   • Maximum discrimination weights
   • Maximum entropy weights
     • Based on averaging
     • Based on maximum 'uniformity' (entropy)

16. Maximum discrimination weights
   • Does not try to maximize the likelihood or the posterior probability
   • Instead, it directly optimizes the decision of whether a sequence is a member of the family

17. Maximum discrimination weights
   • Discrimination D = Π_k P(M|x^k): the product, over all training sequences x^k, of the posterior probability of membership under the model M
   • Maximizing D puts the emphasis on distant or difficult members

18. Maximum discrimination weights
   • Differences from the previous schemes: it is an iterative method (sketched below)
     • The initial weights give rise to a model
     • The newly calculated posterior probabilities P(M|x) give rise to new weights, and hence a new model, until convergence is reached
   • It optimizes performance for what the model is designed for: classifying whether a sequence is a member of a family
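A toy sketch of the iteration follows. A real application would use a profile HMM; here an independent-column frequency model, a uniform background model, the prior P(M) = 0.5, the pseudocount, and the specific update w_k ∝ 1 − P(M|x^k) are all assumptions of this sketch, made only to show the re-weight/re-fit loop.

```python
import math

# Sketch of maximum discrimination weighting with a toy independent-column
# model standing in for a real profile HMM. D = prod_k P(M | x^k); the loop
# re-weights the sequences the current model finds hardest (assumed update:
# w_k proportional to 1 - P(M | x^k)), refits, and repeats to convergence.

SEQS = ["AGAA", "CCTC", "AGTC"]
ALPHABET = "ACGT"
PRIOR_M = 0.5        # assumed prior P(M)
PSEUDO = 0.5         # assumed pseudocount

def fit_model(weights):
    """Weighted residue frequencies per column."""
    model = []
    for col in zip(*SEQS):
        freq = {a: PSEUDO for a in ALPHABET}
        for w, a in zip(weights, col):
            freq[a] += w
        norm = sum(freq.values())
        model.append({a: freq[a] / norm for a in ALPHABET})
    return model

def posterior(model, x):
    """P(M | x) against a uniform background model."""
    p_m = math.exp(sum(math.log(col[a]) for col, a in zip(model, x)))
    p_bg = (1.0 / len(ALPHABET)) ** len(x)
    return PRIOR_M * p_m / (PRIOR_M * p_m + (1 - PRIOR_M) * p_bg)

w = [1.0 / len(SEQS)] * len(SEQS)
for _ in range(100):
    post = [posterior(fit_model(w), x) for x in SEQS]
    new = [1.0 - p for p in post]
    s = sum(new)
    new = [v / s for v in new]
    if max(abs(a - b) for a, b in zip(new, w)) < 1e-10:
        break
    w = new

print([round(v, 3) for v in w])  # hardest (most diverged) sequences weigh most
```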

19. More weighting schemes
   • Maximum discrimination weights
   • Maximum entropy weights
     • Based on averaging
     • Based on maximum 'uniformity' (entropy)

20. Maximum entropy weights
   • Entropy: a measure of the average uncertainty of an outcome (maximal when we are maximally uncertain about the outcome)
   • Averaging: in column i, a residue a occurring m_ia times among k_i distinct residue types contributes 1/(k_i · m_ia), and the weight of a sequence is the sum of its contributions over all columns

21. Maximum entropy weights
   • Sequences: AGAA, CCTC, AGTC (worked out in the sketch below)
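The averaging rule from slide 20, applied to these three sequences in Python; the normalized weights come out as 3/8, 3/8, 1/4, with the intermediate sequence AGTC down-weighted.

```python
from collections import Counter

# The "averaging" rule: in column i with k_i distinct residue types, a
# residue occurring m_ia times contributes 1 / (k_i * m_ia); a sequence's
# weight is the sum of its contributions over the columns.

seqs = ["AGAA", "CCTC", "AGTC"]

weights = [0.0] * len(seqs)
for col in zip(*seqs):
    counts = Counter(col)
    k = len(counts)                    # distinct residue types in the column
    for j, a in enumerate(col):
        weights[j] += 1.0 / (k * counts[a])

total = sum(weights)
print([w / total for w in weights])   # -> [0.375, 0.375, 0.25]
```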

22. Maximum entropy weights
   • 'Uniformity': choose the weights (non-negative, summing to one) that maximize the total entropy of the weighted residue distributions, summed over the columns of the alignment

23. Maximum entropy weights
   • Sequences: AGAA, CCTC, AGTC

24. Maximum entropy weights
   • Solving the equations leads to w_AGAA = w_CCTC = 1/2 and w_AGTC = 0: AGTC is intermediate between the other two sequences, so every column's entropy is already maximal when the two extreme sequences split all the weight equally
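A brute-force check of that solution: maximize the total column entropy directly over a grid on the weight simplex. For three sequences this is cheap, and it recovers the weights quoted on slide 24.

```python
import math

# Brute-force check of the "uniformity" scheme: search a grid on the
# weight simplex for the weights maximizing the total entropy of the
# weighted residue distributions, column by column.

seqs = ["AGAA", "CCTC", "AGTC"]

def total_entropy(w):
    h = 0.0
    for col in zip(*seqs):
        p = {}
        for wk, a in zip(w, col):
            p[a] = p.get(a, 0.0) + wk   # weighted residue distribution
        h -= sum(q * math.log(q) for q in p.values() if q > 0)
    return h

best_w, best_h = None, -1.0
steps = 200
for i in range(steps + 1):
    for j in range(steps + 1 - i):
        w = (i / steps, j / steps, (steps - i - j) / steps)
        h = total_entropy(w)
        if h > best_h:
            best_w, best_h = w, h

print(best_w)  # -> (0.5, 0.5, 0.0): AGTC gets weight zero
```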

25. Summary of the entropy methods
   • Maximum entropy weights (averaging)
   • Maximum entropy weights ('uniformity')

26. Conclusion
   • Many different methods
   • Which one to use depends on the problem
   • Questions?
