
Patterns, Profiles, and Multiple Alignment


mitch

Presentation Transcript


  1. Patterns, Profiles, and Multiple Alignment

  2. OUTLINE • Profiles and Sequence Logos • Profile Hidden Markov Models • Aligning Profiles • Multiple Sequence Alignments by Gradual Sequence Addition • Other Ways of Obtaining Multiple Alignments • Sequence Pattern Discovery

  4. Profiles and Sequence Logos • Previously we looked at techniques for aligning two sequences. • Very often, similar sequences (those sharing a common ancestral sequence) have similar properties at equivalent regions. • We may therefore want to align a sequence to a set of similar sequences. • To do so, we need a way to represent the general properties of the set of sequences.

  5. Profiles and Sequence Logos A new sequence can be aligned to this representation. Such a representation is called a PROFILE.

  6. Profiles and Sequence Logos Position-specific scoring matrices are an extension of substitution scoring matrices. The previous techniques applied a substitution scoring matrix. With a substitution matrix, the alignment of residues a and b always receives the score $s_{a,b}$, regardless of where in the sequences it occurs.

  7. Profiles and Sequence Logos Position-specific scoring matrices are an extension of substitution scoring matrices. One common use of database searching is to discover all known sequences that belong to the same sequence family as the query sequence. To find all family members: discover an initial set of family members (first database search), then include their position-specific preferences in the scoring scheme as a position-specific scoring matrix (PSSM).

  8. Profiles and Sequence Logos Position-specific scoring matrices are an extension of substitution scoring matrices. A PSSM: is specific to each family of sequences; is a matrix; weights sequences according to the observed diversity specific to the family of interest.

  9. Profiles and Sequence Logos Position-specific scoring matrices are an extension of substitution scoring matrices. PSSM:

  10. Profiles and Sequence Logos Position-specific scoring matrices are an extension of substitution scoring matrices. Generating a PSSM: a set of sequences is required. Suppose we have an alignment of $N_{seq}$ sequences with $L_{aln}$ positions (alignment columns); the PSSM for this alignment will then also have $L_{aln}$ columns.
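The starting point for any PSSM is a table of residue counts per alignment column. A minimal sketch of that counting step follows; the function name `column_counts` and the toy alignment are made up for illustration and are not taken from the slides.

```python
from collections import Counter


def column_counts(alignment):
    """Return one Counter of residue counts per alignment column."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*alignment) walks the alignment column by column.
    return [Counter(column) for column in zip(*alignment)]


# Hypothetical toy alignment: N_seq = 3 sequences, L_aln = 4 columns.
alignment = ["ACDA", "ACEA", "GCDA"]
counts = column_counts(alignment)

print(len(counts))  # number of columns, equal to L_aln
print(counts[0])    # residue counts in the first column
```

The result has one entry per alignment column, which is why the PSSM built from it has the same number of columns as the alignment.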

  11. Profiles and Sequence Logos PSSM:

  12. Profiles and Sequence Logos Generating a PSSM: If we have a multiple alignment of $N_{seq}$ sequences, $n_{u,b}$ is the number of residues of type b at column u, and $m_{u,a}$ is the score associated with row a and column u.
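The formula on this slide did not survive in the transcript. One simple way to turn the counts into scores, assumed here for illustration, is to average the substitution scores over the residues observed in the column: $m_{u,a} = \sum_b \frac{n_{u,b}}{N_{seq}} s_{a,b}$. The sketch below uses a made-up four-letter alphabet and a made-up match/mismatch score in place of a real substitution matrix.

```python
from collections import Counter

ALPHABET = "ACDE"  # hypothetical toy alphabet, not the 20 amino acids


def toy_score(a, b):
    """Made-up substitution score s_{a,b}: +2 for a match, -1 otherwise."""
    return 2 if a == b else -1


def average_score_pssm(alignment, alphabet=ALPHABET, score=toy_score):
    """Assumed formula: m[u][a] = sum over b of (n[u][b] / N_seq) * s(a, b)."""
    n_seq = len(alignment)
    pssm = []
    for column in zip(*alignment):
        n = Counter(column)  # n[b] is the count of residue b in this column
        pssm.append(
            {a: sum(n[b] / n_seq * score(a, b) for b in alphabet) for a in alphabet}
        )
    return pssm


pssm = average_score_pssm(["ACD", "ACE", "ACD", "CCD"])
for u, scores in enumerate(pssm):
    print(u, scores)
```

In the fully conserved second column the conserved residue receives the full match score, while in the first column the score for A is pulled down by the single C.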

  13. Profiles and Sequence Logos Generating a PSSM (cont.): To give extra support to conserved residues, use a logarithmic form of weighting. The ratio of the logs varies between 0 and 1, but in this case residues present in a smaller fraction of the sequences are relatively under-weighted.
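The exact weighting shown on the slide is missing from the transcript. One ratio of logarithms with the stated properties, assumed here, is $\ln(1 - n_{u,b}/(N_{seq}+1)) \,/\, \ln(1/(N_{seq}+1))$, which replaces the linear fraction $n_{u,b}/N_{seq}$. The sketch compares the two weights for a column of a hypothetical four-sequence alignment.

```python
import math


def log_weight(n_ub, n_seq):
    """Assumed logarithmic weight for a residue seen n_ub times in n_seq sequences.

    It is 0 when the residue is absent and 1 when the column is fully conserved.
    """
    return math.log(1 - n_ub / (n_seq + 1)) / math.log(1 / (n_seq + 1))


n_seq = 4  # hypothetical alignment depth
for n_ub in range(n_seq + 1):
    linear = n_ub / n_seq
    print(n_ub, round(linear, 3), round(log_weight(n_ub, n_seq), 3))
```

For every partially conserved count the logarithmic weight is below the linear fraction, so rarer residues contribute relatively less and the fully conserved case stands out.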

  14. Profiles and Sequence Logos Generating a PSSM (cont.): Using the log-odds ratio: similar to the construction of common substitution matrices, if sufficient data are available:

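  15. Profiles and Sequence Logos Generating a PSSM (cont.): Using the log-odds ratio: with $q_{u,a}$ the estimated probability of residue a in column u, a value of $q_{u,a} = 0$ (residue a never observed in column u) will cause problems, because the logarithm of zero is undefined.

  16. Profiles and Sequence Logos Generating a PSSM (cont.): Using the log-odds ratio: $q_{u,a} = 0$ will cause problems. SOLUTION: pseudocounts.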

  17. Profiles and Sequence Logos Generating a PSSM (cont.): Using the log-odds ratio: A better formula:
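The formulas on the log-odds slides are not preserved in the transcript, so the following is only a sketch under stated assumptions: the score is taken as $\log_2(q_{u,a}/p_a)$, with $p_a$ a background frequency, and $q_{u,a}$ is estimated with a simple add-one pseudocount. This is not necessarily the "better formula" the slide refers to. The nucleotide alphabet and uniform background are made up to keep the example short.

```python
import math
from collections import Counter


def log_odds_pssm(alignment, background, pseudocount=1.0):
    """Log-odds PSSM with additive pseudocounts (assumed form).

    q[u][a] = (n[u][a] + pseudocount) / (N_seq + pseudocount * K),
    where K is the alphabet size; the score is log2(q[u][a] / p[a]).
    """
    n_seq = len(alignment)
    k = len(background)
    pssm = []
    for column in zip(*alignment):
        n = Counter(column)
        scores = {}
        for a, p_a in background.items():
            q = (n[a] + pseudocount) / (n_seq + pseudocount * k)
            scores[a] = math.log2(q / p_a)  # finite, because q is never zero
        pssm.append(scores)
    return pssm


# Hypothetical uniform background over a four-letter alphabet.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pssm = log_odds_pssm(["ACG", "ACG", "ATG", "ACG"], background)
for u, scores in enumerate(pssm):
    print(u, {a: round(s, 2) for a, s in scores.items()})
```

Residues that never occur in a column still get a finite negative score instead of minus infinity, which is the purpose of the pseudocounts.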

  18. Profiles and Sequence Logos Representing a profile as a logo: Entropy: is a measure of uncertainty; usually refers to the Shannon entropy (information theory); quantifies the expected value of the information contained in a message, usually in units such as bits.

  20. Profiles and Sequence Logos Representing a profile as a logo (cont.): Entropy: suppose that there are n alternative events, each possible event is labeled $x_i$, and each event has probability $P(x_i)$; then,

  21. Profiles and Sequence Logos Representing a profile as a logo (cont.): Entropy: the Shannon entropy is defined as: $H = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i)$ (in bits, taking $0 \log_2 0 = 0$).

  22. Profiles and Sequence Logos Representing a profile as a logo (cont.): Entropy: Example: consider the amino acid that occurs at a particular position in a protein sequence. The possible events are the 20 amino acids, and the uncertainty about which event will occur depends on $P(x_i)$.

  23. Profiles and Sequence Logos Representing a profile as a logo (cont.): Entropy: Example (cont.): if one of the possible events has a probability of 1 (is certain to occur), all others have a probability of zero; the entropy will be zero in this case.

  24. Profiles and Sequence Logos Representing a profile as a logo (cont.): Entropy: Example (cont.): if all the events have equal probability, the entropy will be at its maximum in this case.
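The two extreme cases can be checked numerically. The sketch below assumes the standard Shannon entropy in bits, $H = -\sum_i P(x_i) \log_2 P(x_i)$, with zero-probability events contributing nothing; the function name is chosen here for illustration.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability events contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# One amino acid is certain, the other 19 never occur.
certain = [1.0] + [0.0] * 19
# All 20 amino acids are equally likely.
uniform = [1 / 20] * 20

print(shannon_entropy(certain))  # no uncertainty
print(shannon_entropy(uniform))  # maximum uncertainty, log2(20) bits
```

For 20 equally likely amino acids the maximum is $\log_2 20$, roughly 4.32 bits.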

  25. Profiles and Sequence Logos Representing a profile as a logo (cont.): Entropy: Information present in the pattern at position u, $I_u$:
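The formula for $I_u$ is missing from the transcript. A common convention for sequence logos, assumed here, is $I_u = \log_2 20 - H_u$, where $H_u$ is the Shannon entropy of column u, with each letter drawn at a height of its frequency times $I_u$. The sketch ignores the small-sample correction that some logo tools apply, and the example columns are made up.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Return (I_u, letter heights) for one alignment column.

    Assumes I_u = log2(alphabet_size) - H_u and height = frequency * I_u.
    """
    n = len(column)
    freqs = {a: count / n for a, count in Counter(column).items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    info = math.log2(alphabet_size) - entropy
    heights = {a: f * info for a, f in freqs.items()}
    return info, heights


# Hypothetical columns: one fully conserved, one variable.
conserved_info, conserved_heights = column_information("LLLL")
variable_info, variable_heights = column_information("LLIV")

print(round(conserved_info, 3), conserved_heights)
print(round(variable_info, 3), variable_heights)
```

A fully conserved column reaches the maximum information, so its single letter fills the whole stack, while a variable column produces a shorter stack split among its residues.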

  26. Profiles and Sequence Logos Representing a profile as a logo (cont.):

  27. Profiles and Sequence Logos

  28. References • M. Zvelebil, J. O. Baum, “Understanding Bioinformatics”, 2008, Garland Science • Andreas D. Baxevanis, B.F. Francis Ouellette, “Bioinformatics: A practical guide to the analysis of genes and proteins”, 2001, Wiley.
