
1-month Practical Course Genome Analysis 2008 Lecture 3: Profiles: representing sequence alignment

Centre for Integrative Bioinformatics VU (IBIVU)



Presentation Transcript


  1. 1-month Practical Course Genome Analysis 2008. Lecture 3: Profiles: representing sequence alignment
     Centre for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, The Netherlands
     ibivu.nl, heringa@cs.vu.nl

  2. Alignment input parameters: scoring alignments
     A number of different schemes have been developed to compile residue exchange matrices. However, there are no formal concepts to calculate corresponding gap penalties. Empirically determined values are recommended for PAM250, BLOSUM62, etc.
     Inputs: amino acid exchange matrix (20 × 20); gap penalties (open, extension), shown as 10 and 1.

  3. But how can we align blocks of sequences?
     [Figure: sequences labelled A, B, C, D and E grouped into blocks]
     • The dynamic programming algorithm performs well for pairwise alignment (two axes).
     • So we should try to treat the blocks as a “single” sequence …?
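As a reminder of what "two axes" means for pairwise alignment, here is a minimal sketch of global dynamic programming (Needleman-Wunsch) that returns only the optimal score. It is a simplification, not the course's implementation: the function name, the match/mismatch values and the single linear gap penalty are assumptions made for illustration, whereas a real run would use an exchange matrix and separate open/extension penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of sequences a and b.

    Assumptions for illustration: identity scoring (match/mismatch)
    and a linear gap penalty, i.e. every gap position costs `gap`.
    """
    # prev[j] holds the best score for aligning the rows processed so far
    # against b[:j]; the first row aligns an empty prefix of a, so it is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # a[i-1] aligned to a gap
                curr[j - 1] + gap,   # b[j-1] aligned to a gap
            ))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACD", "ACD"))
print(global_alignment_score("ACD", "AD"))
```

Each cell depends only on its three neighbours, which is why the method needs exactly one axis per sequence. Treating a block of sequences as a "single" sequence keeps the problem on two axes.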

  4. How to represent a block of sequences
     • Historically: consensus sequence, a single sequence that best represents the amino acids observed at each alignment position.
     • Modern methods: alignment profile, a representation that retains the information about the frequencies of amino acids observed at each alignment position.

  5. Consensus sequence
     • Problem: loss of information
     • For larger blocks of sequences it “punishes” more distant members
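The loss of information can be seen in a small sketch. The block of four aligned sequences below is hypothetical (invented for illustration), and the `consensus` helper is an assumed name. It takes the most frequent residue per column, which is one simple way to define a consensus.

```python
from collections import Counter


def consensus(block):
    """Most frequent residue at each column of an aligned block."""
    # zip(*block) transposes the block: one tuple of residues per position.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*block))


# Hypothetical block; the third sequence is the most distant member.
block = ["ACDW", "ACEW", "AHDY", "GCDW"]
print(consensus(block))
```

The H and Y of the third sequence and the G of the fourth leave no trace in the result, so a sequence related mainly to those members would score poorly against the consensus.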

  6. Alignment profiles
     • Advantage: full representation of the sequence alignment (more information retained)
     • Not only used in alignment methods, but also in sequence-database searching (to detect distant homologues)
     • Also called PSSM (position-specific scoring matrix) in BLAST

  7. Multiple alignment profiles
     [Figure: an alignment with a core region, a gapped region and a second core region, and the profile derived from it]
     • Columns: alignment positions i
     • Rows: amino acids A, C, D, …, W, Y, each holding the frequencies fA, fC, fD, …, fW, fY at that position
     • Last row (-): position-dependent gap penalties (Gapo, gapx) for each position

  8. Profile building
     • Example: each aa is represented as a frequency and gap penalties as weights.

     i (position)       1      2      3
     A                 0.5    0.3    0
     C                 0      0.1    0.5
     D                 0      0      0.2
     …
     W                 0      0.3    0.1
     Y                 0.5    0.3    0.2
     Gap penalties     1.0    0.5    1.0

     The gap penalties are position dependent.
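A minimal sketch of how such frequencies can be derived from a block of aligned sequences. The ten sequences below are hypothetical, chosen only so that the resulting frequencies match the example values above. The `build_profile` name is an assumption, and gap handling and gap-penalty weights are left out.

```python
from collections import Counter


def build_profile(block):
    """Return one dict per alignment column, mapping residue -> frequency.

    Assumes an ungapped block of equal-length sequences (illustration only).
    """
    n_seqs = len(block)
    profile = []
    for col in zip(*block):  # one tuple of residues per alignment position
        counts = Counter(col)
        profile.append({aa: n / n_seqs for aa, n in counts.items()})
    return profile


# Hypothetical block of ten aligned sequences, three positions each.
block = ["AAC", "AAC", "AAC", "ACC", "AWC",
         "YWD", "YWD", "YYW", "YYY", "YYY"]

for position, freqs in enumerate(build_profile(block), start=1):
    print(position, sorted(freqs.items()))
```

Residues that never occur at a position are simply absent from that column's dict, which corresponds to the zero entries in the table.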

  9. Profile-sequence alignment
     [Figure: a sequence aligned against a profile; labels: sequence, ACD……VWY]

  10. Sequence to profile alignment
      Alignment column: A, A, V, V, L, giving the profile position 0.4 A, 0.2 L, 0.4 V.
      Score of amino acid L in a sequence that is aligned against this profile position:
      Score = 0.4 * s(L, A) + 0.2 * s(L, L) + 0.4 * s(L, V)
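The score above can be evaluated numerically once an exchange matrix is chosen. The sketch below uses three values intended to match BLOSUM62 (an assumption: check them against the matrix you actually use). The helper names `s` and `residue_vs_column` are illustrative.

```python
# Exchange values for the three pairs needed here, intended to match
# BLOSUM62 (assumption: verify against your own copy of the matrix).
EXCHANGE = {("L", "A"): -1, ("L", "L"): 4, ("L", "V"): 1}


def s(x, y):
    """Symmetric lookup in the (partial) exchange matrix."""
    return EXCHANGE[(x, y)] if (x, y) in EXCHANGE else EXCHANGE[(y, x)]


def residue_vs_column(residue, column):
    """Frequency-weighted score of one residue against one profile position."""
    return sum(freq * s(residue, aa) for aa, freq in column.items())


column = {"A": 0.4, "L": 0.2, "V": 0.4}  # the profile position from the slide
score = residue_vs_column("L", column)
print(round(score, 2))  # 0.4*(-1) + 0.2*4 + 0.4*1 = 0.8
```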

  11. Profile-profile alignment
      [Figure: two profiles aligned against each other; labels: profile (A C D . . Y), profile (ACD……VWY)]

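  12. General function for profile-profile scoring
      [Figure: Profile 1 and Profile 2, each with rows A C D . . Y]
      • At each position (column) we have different residue frequencies for each amino acid (rows).
      • For pairwise alignment we would simply say S = s(aa1, aa2).
      • For comparing two profile positions we instead take the frequency-weighted sum over all amino acid pairs:
        Score = Σ_x Σ_y f1(x) * f2(y) * s(x, y)
        where f1(x) and f2(y) are the frequencies of amino acids x and y at the two profile positions.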

  13. Profile to profile alignment
      Profile position 1: 0.4 A, 0.2 L, 0.4 V. Profile position 2: 0.75 G, 0.25 S.
      Match score of these two alignment columns, using the amino acid frequencies at the corresponding profile positions:
      Score = 0.4*0.75*s(A,G) + 0.2*0.75*s(L,G) + 0.4*0.75*s(V,G)
            + 0.4*0.25*s(A,S) + 0.2*0.25*s(L,S) + 0.4*0.25*s(V,S)
      s(x,y) is the value in the amino acid exchange matrix (e.g. PAM250, BLOSUM62) for amino acid pair (x,y).
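The same match score can be sketched as a double loop over the two columns. The exchange values are intended to match BLOSUM62 (an assumption: verify them against the matrix you use), and the names `s` and `column_vs_column` are illustrative, not taken from the lecture.

```python
# Exchange values for the six pairs needed here, intended to match
# BLOSUM62 (assumption: verify against your own copy of the matrix).
EXCHANGE = {
    ("A", "G"): 0, ("L", "G"): -4, ("V", "G"): -3,
    ("A", "S"): 1, ("L", "S"): -2, ("V", "S"): -2,
}


def s(x, y):
    """Symmetric lookup in the (partial) exchange matrix."""
    return EXCHANGE[(x, y)] if (x, y) in EXCHANGE else EXCHANGE[(y, x)]


def column_vs_column(col1, col2):
    """Sum of f1(x) * f2(y) * s(x, y) over all residue pairs of two columns."""
    return sum(f1 * f2 * s(x, y)
               for x, f1 in col1.items()
               for y, f2 in col2.items())


col1 = {"A": 0.4, "L": 0.2, "V": 0.4}
col2 = {"G": 0.75, "S": 0.25}
score = column_vs_column(col1, col2)
print(round(score, 2))  # 0 - 0.6 - 0.9 + 0.1 - 0.1 - 0.2 = -1.7
```

With these values the two columns score negatively, so a dynamic programming alignment of the two profiles would be discouraged from matching these positions.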
