Chapter 6 - Profiles


Presentation Transcript


  1. Chapter 6 - Profiles. Assume we have a family of sequences. To search for other sequences in the family, we can:
     • Search with a single sequence from the family
     • Search with several sequences from the family together, using one of:
       • Consensus sequences (regular expressions). Example: A-[FR]-X(2,3)-M, which matches in GARCCMH and LCAFARLMLMA
       • Weight matrices, or position-specific scoring matrices (gaps not considered)
       • Profiles
       • Profiles as hidden Markov models
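The consensus pattern on slide 1 can be tried out directly. The sketch below assumes a hand translation of the PROSITE-style pattern A-[FR]-X(2,3)-M into Python regular-expression syntax (X(2,3) becomes .{2,3}); the helper name find_matches is made up for illustration and is not from the slides.

```python
import re

# Pattern from slide 1, A-[FR]-X(2,3)-M, translated by hand to Python regex syntax.
PATTERN = re.compile(r"A[FR].{2,3}M")

def find_matches(seq):
    """Return (start, matched_text) for each non-overlapping match in seq."""
    return [(m.start(), m.group()) for m in PATTERN.finditer(seq)]

# The two example sequences from the slide.
for seq in ("GARCCMH", "LCAFARLMLMA"):
    print(seq, find_matches(seq))
```

Both example sequences contain a match: ARCCM in the first and AFARLM in the second. Because finditer reports non-overlapping matches only, the overlapping match ARLMLM in the second sequence is not listed.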

  2. Search with a family of sequences
     • Align the sequences (multiple alignment)
     • Make a profile from part of the alignment
     • Search the database with the profile
     • Optionally, revise the profile and search again (iteratively)
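The middle two steps of slide 2 (make a profile, search with it) can be sketched as follows. This is a deliberately simplified illustration, not the scoring scheme of the slides: the profile is assumed to hold plain column frequencies, the search is gapless, and the function names and the toy database are hypothetical. The four block rows are the first eight columns of MOUSE1, RAT1, MOUSE2 and RABIT from the example alignment on slide 4.

```python
from collections import Counter

def make_profile(block):
    """Column-wise relative frequencies for an ungapped alignment block."""
    n = len(block)
    return [{aa: c / n for aa, c in Counter(col).items()} for col in zip(*block)]

def best_window(profile, seq):
    """Slide the profile along seq without gaps; return (best_score, start)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        best = max(best, (score, start))
    return best

def search(profile, database):
    """Rank database entries by their best window score, highest first."""
    hits = [(name, *best_window(profile, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Ungapped block taken from the slide-4 alignment; the database is made up.
block = ["PQVEQLEL", "PQVPQLEL", "PQVAQLEL", "LQVGQAEL"]
database = {"hit": "MMPQVAQLELGG", "miss": "KKKKKKKKKKKK"}
results = search(make_profile(block), database)
print(results)
```

The optional fourth step would add the best-scoring hits to the alignment, rebuild the profile, and call search again until the hit list stops changing.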

  3. Multiple alignments and profiles. What weight does amino acid a have in position r of the profile?

  4. Example. Clustal X (1.64b) multiple sequence alignment:

     XENLA1  ALVSGPQD------NELDG--MQL
     XENLA2  AQVNGPQD------NELDG--MQF
     MOUSE1  PQVEQLEL------GGSP---GDL
     RAT1    PQVPQLEL------GGGPEA-GDL
     MOUSE2  PQVAQLEL------GGGPGA-GDL
     RAT2    PQVAQLEL------GGGPGA-GDL   (removed)
     CRILO   PQVAQLEL------GGGPGA-DDL
     RABIT   LQVGQAEL------GGGPGA-GGL
     BOVIN   PQVGALEL------AGGPG-----
     SHEEP   PQVGALEL------AGGPG-----   (removed)
     PIG     PQAGAVEL------GGGLGG---L
     CANFA   LQVRDVEL------AGAPGE-GGL
     HUMAN   LQVGQVEL------GGGPGA-GSL
     CHICK   P-LVSSPL------RGEAGV-LPF
     ORENI   LLGFLPPKAGGAVVQGGEN---EV
     VERMO   LLGFLPAKSGGAAAGG-ENEVAEF
             12345678******567890*234

     * means removed

     Resulting profile (rows are the retained alignment columns, numbered as in the ruler above; the last row, marked *, has no Gap and Le entries):

     Pos Cons   A   B   C   D   E   F   G   H   I   K   L   M   N   P   Q   R   S   T   V   W   X   Y   Z Gap  Le
       1 P      1   0 -18 -17 -12 -14 -21 -13  -3 -10   1  -2 -15  26  -6 -12  -3  -2  -1 -32   0 -18   0 100 100
       2 q     -4   0 -18  -5   2 -10 -17   2  -3   3   0   1  -3  -7  11   3  -4  -3  -4 -17   0 -10   0  50 100
       3 V      1   0  -5 -23 -17  -6 -15 -17  15 -15   9   7 -17 -16 -13 -17  -7  -3  18 -26   0 -14   0 100 100
       4 G      0   0 -12  -8  -7 -14   0  -5 -13  -6 -14 -10  -2  -9  -5  -6  -1  -3  -8 -22   0 -11   0 100 100
       5 Q      2   0 -15   1   1 -25   4  -3 -17  -1 -15 -11   1  -7   3  -2   3  -1 -12 -30   0 -20   0 100 100
       6 P      1   0 -13 -17 -11 -14 -21 -13   0 -10   0  -1 -13  18  -7 -13  -1   0   3 -32   0 -17   0 100 100
       7 E      0   0 -29  12  19 -36 -10   0 -25   7 -24 -19   3  20  13   2   2   0 -17 -41   0 -26   0 100 100
       8 L     -8   0 -20 -15 -10  -1 -29 -10   7  -7  14   9 -13 -17  -6 -10 -12  -8   3 -20   0  -8   0 100 100
       5 g      3   0 -16   5   2 -36  21   0 -28   3 -28 -21  10  -8   4   5   4  -2 -20 -32   0 -25   0  34  34
       6 G      4   0 -21   6   0 -49  51 -10 -41  -6 -40 -32   4 -13  -4  -7   3  -9 -30 -40   0 -37   0 100 100
       7 G      3   0 -16  -3  -4 -31  23 -11 -22  -8 -20 -16  -2 -12  -5  -9   0  -6 -16 -33   0 -27   0 100 100
       8 P      3   0 -24   7   6 -32 -10  -5 -21  -1 -20 -17   0  27   2  -6   2   0 -14 -43   0 -25   0 100 100
       9 g      3   0 -19   5  -2 -45  49  -8 -39  -6 -38 -30   9 -13  -5  -6   4  -7 -28 -37   0 -33   0  50  78
       0 a      5   0  -3  -2   0 -12   0  -5  -3  -3  -6  -3  -2  -3  -1  -4   1   0   0 -19   0 -12   0  50  78
       2 g     -1   0 -11  -9  -9 -12   7  -9  -6  -9  -4   0  -6 -13  -7 -10  -4  -6  -6 -18   0 -14   0  50  78
       3 q      0   0 -22  13  11 -33   4   0 -26   3 -25 -19   6   6   7   0   3   0 -19 -36   0 -23   0  50  78
       4 L    -12   0 -10 -37 -28  28 -42 -13  22 -22  29  21 -27 -24 -17 -23 -20 -12  15   1   0  10   0 100 100
       *       17   0   0  10  17   3  52   0   0   1  36   2   4  22  21   2   5   0  16   0   0   0   0

  5. What should be taken into account when creating a profile?
     1. The amino acids observed in position r of the alignment.
     2. The number of independent ‘observations’ that have been used to construct position r of the alignment (for example, the number of different amino acids in the column).
     3. The similarity of a to the amino acids observed in column r, to allow for amino acids not yet observed. Amino acid a is more likely to occur in unknown family members if the known sequences contain many amino acids similar to a. Thus a ‘background’ scoring matrix should be used.
     4. The background (a priori) distribution of the amino acids.
     5. The diversity and similarity of the sequences, which determine the importance (or weight) of each sequence. The known sequences are normally not uniformly distributed in the ‘family space’, and should have different weights in the calculation.
     6. The number of gaps in column r and in the neighbouring columns.
     These points are not independent. How they are treated varies between the different methods for profile construction.
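As an illustration of how points 1 and 4 above can be combined, the sketch below computes log-odds scores for a single column, mixing the observed counts with background-based pseudocounts. This is one common textbook construction and is not claimed to be the method used in the slides; the function name, the pseudo_weight parameter and the uniform background are assumptions for the example. Points 3 and 5 (a background scoring matrix and sequence weights) are not covered.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_log_odds(column, background, pseudo_weight=1.0):
    """Log-odds score (base 2) per amino acid for one alignment column.

    column        -- residues observed in the column; gaps '-' are ignored
    background    -- dict: amino acid -> a priori probability (sums to 1)
    pseudo_weight -- total pseudocount mass distributed according to background
    """
    residues = [aa for aa in column if aa != "-"]
    counts = Counter(residues)
    n = len(residues)
    scores = {}
    for aa in AMINO_ACIDS:
        # Estimated probability: observed counts plus background pseudocounts.
        p = (counts[aa] + pseudo_weight * background[aa]) / (n + pseudo_weight)
        scores[aa] = math.log2(p / background[aa])
    return scores

# Assumed uniform background, for illustration only.
uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
scores = column_log_odds("PPPL-P", uniform)
print(round(scores["P"], 2), round(scores["L"], 2), round(scores["W"], 2))
```

Observed amino acids get positive scores and unobserved ones negative scores; a column consisting only of gaps falls back to the background and scores 0 everywhere. A larger pseudo_weight pulls all scores towards 0, which is one way to express that a column with few independent observations (point 2) deserves less confidence.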

  6. Database search with a profile

  7. Notations

  8. Position weight. No sequence weights are considered for now.
     • All a.a. in the column count equally
     • A.a. occurring many times are favored
     • A.a. occurring many times are ‘punished’

  9. PSI-BLAST

  10. Hidden Markov Model