Tutorial 4: Comparing Protein Sequences
Today's menu:
• PAM and BLOSUM scoring matrices
• PSI-BLAST
• PHI-BLAST
PAM & BLOSUM
PAM matrices are based on global alignments of closely related proteins. PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence (one accepted point mutation per 100 residues). Other PAM matrices are extrapolated from PAM1: PAM250, for example, corresponds to the PAM1 mutation probability matrix multiplied by itself 250 times.
BLOSUM matrices are based on local alignments (ungapped blocks of conserved regions). BLOSUM62 is calculated from comparisons of sequences with at most 62% identity within the blocks. All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.
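The extrapolation step can be sketched as a matrix power. This is a minimal sketch, assuming a made-up 3-letter alphabet (`TOY_PAM1` is hypothetical, not Dayhoff's data) in which each residue changes with 1% probability. Row i, column j holds the probability that residue i is replaced by residue j after one PAM unit. Real PAM scoring matrices additionally convert these probabilities into log-odds scores.

```python
from typing import List

Matrix = List[List[float]]

def mat_mult(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def extrapolate_pam(pam1: Matrix, n: int) -> Matrix:
    """Return PAM1^n: the mutation probabilities after n PAM units."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result

# Hypothetical alphabet of 3 residues: each one is kept with probability 0.99
# and replaced by each of the other two with probability 0.005 (1% change).
TOY_PAM1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

pam250 = extrapolate_pam(TOY_PAM1, 250)
# Probability that a residue is still unchanged after 250 PAM units.
print(pam250[0][0])
```

Each row still sums to 1. In this toy model, the chance that a residue is unchanged falls from 0.99 for PAM1 to roughly 0.35 for PAM250. This is why high-numbered PAM matrices are suited to divergent sequences.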
Use recommendations (top: closely related sequences; bottom: highly divergent sequences)
PAM100 ~ BLOSUM90
PAM120 ~ BLOSUM80
PAM160 ~ BLOSUM60
PAM200 ~ BLOSUM52
PAM250 ~ BLOSUM45
Example
• Query: >ADRM1_HUMAN (a glycosylated plasma membrane protein that promotes cell adhesion)
• Database: nr on the human genome
• BLAST program: BLASTP
• Matrices: PAM30, BLOSUM45
What differences do we observe?
• With BLOSUM45 we found both related and divergent sequences.
• With PAM30 we found only closely related sequences.
(Screenshots: BLOSUM45 results; PAM30 results)
With BLOSUM45 we can discover interesting relations between proteins, for example Mucin-13: a glycosylated membrane protein that protects the cell by binding to pathogens.
(Screenshots: PAM30 results; BLOSUM45 results)
Using different scoring matrices can produce slightly different alignments:
(Screenshots: alignment with PAM30; alignment with BLOSUM45)
The same pair of sequences can often be aligned in several alternative ways, especially when using a matrix for highly divergent sequences (BLOSUM45):
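Both points above can be sketched with a small global alignment (Needleman-Wunsch) that takes the scoring function as a parameter and also counts how many alignments tie for the best score. This is an illustration under simplifying assumptions: it uses a linear gap penalty and the toy match/mismatch score `simple_score` (hypothetical), rather than real PAM30 or BLOSUM45 values and BLAST's affine gaps.

```python
from typing import Callable, List, Tuple

def align_and_count(s: str, t: str, score: Callable[[str, str], int],
                    gap: int) -> Tuple[int, int]:
    """Return (best global alignment score, number of optimal alignments)."""
    n, m = len(s), len(t)
    best: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    count: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    count[0][0] = 1
    for i in range(1, n + 1):          # first column: s aligned to gaps only
        best[i][0], count[i][0] = i * gap, 1
    for j in range(1, m + 1):          # first row: t aligned to gaps only
        best[0][j], count[0][j] = j * gap, 1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            moves = [
                (best[i - 1][j - 1] + score(s[i - 1], t[j - 1]), count[i - 1][j - 1]),
                (best[i - 1][j] + gap, count[i - 1][j]),   # gap in t
                (best[i][j - 1] + gap, count[i][j - 1]),   # gap in s
            ]
            top = max(value for value, _ in moves)
            best[i][j] = top
            # Every move that reaches the top score contributes its paths.
            count[i][j] = sum(c for value, c in moves if value == top)
    return best[n][m], count[n][m]

def simple_score(a: str, b: str) -> int:
    """Hypothetical scoring: +1 for identical residues, -1 otherwise."""
    return 1 if a == b else -1

print(align_and_count("AT", "TA", simple_score, gap=-1))
```

For "AT" against "TA", two different alignments ("AT-"/"-TA" and "-AT"/"TA-") share the best score of -1. Swapping in another `score` function, for instance one built from a different substitution matrix, can change which alignment wins.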
PSI-BLAST: Position-Specific Iterated BLAST
We will analyze the following uncharacterized archaeal protein:
>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLGSVTENVIKKSNKPVLVVKRKNS
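PSI-BLAST replaces the fixed scoring matrix with a position-specific scoring matrix (PSSM) built from the hits it aligns to the query. The sketch below shows the core idea only. It assumes equal sequence weights, a flat pseudocount, and a uniform background frequency of 1/20; the real program uses more elaborate weighting and pseudocounts. The aligned `hits` are hypothetical.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds PSSM (in bits) from equal-length aligned sequences.

    Simplifying assumptions: every sequence has weight 1, the same
    pseudocount is added to each amino acid, and the background is uniform.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]  # ignore gap characters
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((residues.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score_segment(pssm: List[Dict[str, float]], segment: str) -> float:
    """Score an ungapped segment against the PSSM, position by position."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

# Hypothetical hits aligned to a 3-residue stretch of a query.
hits = ["KLV", "KLI", "RLV", "KLV"]
pssm = build_pssm(hits)
print(score_segment(pssm, "KLV") > score_segment(pssm, "WWW"))
```

Residues that are conserved among the hits get positive scores at their position, and unseen residues get negative ones. This is how later iterations can pick up distant homologs that share the conserved positions.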
E-value threshold for the initial BLAST search (default: 10)
E-value threshold for inclusion in PSI-BLAST iterations (default: 0.005)
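The two thresholds play different roles. The first decides which hits are reported. The second, much stricter one decides which hits are used to build the profile for the next iteration, and the search repeats until no new sequences pass it. A minimal sketch with hypothetical hit names and E-values, assuming a hit passes when its E-value is below the threshold:

```python
from typing import List, Tuple

def split_hits(hits: List[Tuple[str, float]],
               report_evalue: float = 10.0,
               inclusion_evalue: float = 0.005) -> Tuple[List[str], List[str]]:
    """Separate hits that are only reported from those used to build the PSSM."""
    reported = [name for name, evalue in hits if evalue < report_evalue]
    included = [name for name, evalue in hits if evalue < inclusion_evalue]
    return reported, included

# Hypothetical hits: (sequence name, E-value).
hits = [("hit_a", 1e-40), ("hit_b", 2e-6), ("hit_c", 0.03),
        ("hit_d", 4.0), ("hit_e", 25.0)]
reported, included = split_hits(hits)
print(reported)
print(included)
```

Here hit_c and hit_d are shown in the results but do not shape the next profile. Keeping the inclusion threshold strict limits the risk of pulling unrelated sequences into the PSSM.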
The hits include:
• The query itself
• Orthologous sequences in two other archaeal species
• Other homologous sequences
Is MJ0577 a filament protein? …
Is MJ0577 a cationic amino acid transporter? …
Is MJ0577 a universal stress protein? …
PHI-BLAST: Pattern-Hit Initiated BLAST
Example pattern: A-T-x-[AVG]-R-S
Pattern symbols
[ ] = groups the amino acids that are allowed at a given position
( ) = holds a number: how many times a residue (or group of residues) is repeated
x = any amino acid
- = separates consecutive positions
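These symbols map directly onto regular expressions. The sketch below translates a pattern that uses only the symbols listed above into a Python regex and rejects anything else. The function name `prosite_to_regex` is our own and is not part of any library.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a pattern using [ ], ( ), x and - into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        # One position: a single residue, a [group], or x, optionally followed by (n).
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, repeat = m.groups()
        token = "." if residue in ("x", "X") else residue  # x = any amino acid
        if repeat:
            token += "{" + repeat + "}"  # (n) = repeated n times
        parts.append(token)
    return "".join(parts)

dead_box = prosite_to_regex("[LIVM](2)-D-E-A-D-[RKEN]-x-[LI]")
print(dead_box)
print(bool(re.fullmatch(dead_box, "IMDEADEFL")))
```

The DEAD box pattern becomes `[LIVM]{2}DEAD[RKEN].[LI]`, and A-T-x-[AVG]-R-S becomes `AT.[AVG]RS`.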
Making a pattern
Aligned fragments: …LIDEADKTT… …IMDEADEFL… …LLDEADKCL… …ILDEADRIL… …VVDEADNFI… …LVDEADKGI… …LMDEADEFL… …MLDEADRSI… …LIDEADKML… …MLDEADNWI… …LVDEADRFL…
Pattern: [LIVM](2)-D-E-A-D-[RKEN]-x-[LI]
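A first draft of such a pattern can be produced column by column. The sketch below collects the residues seen at each position of the aligned fragments. It writes a single residue as-is, a small set as a [group], and a highly variable column as x, then merges repeated neighbouring elements into (n). The cut-off `max_group=4` is an arbitrary assumption; a curated pattern also reflects expert judgement.

```python
from typing import List

def column_pattern(fragments: List[str], max_group: int = 4) -> str:
    """Naive pattern from equal-length, ungapped aligned fragments."""
    elements = []
    for column in zip(*fragments):
        residues = sorted(set(column))
        if len(residues) == 1:
            elements.append(residues[0])
        elif len(residues) <= max_group:
            elements.append("[" + "".join(residues) + "]")
        else:
            elements.append("x")  # too variable: allow any amino acid
    merged = []
    i = 0
    while i < len(elements):
        j = i
        while j < len(elements) and elements[j] == elements[i]:
            j += 1
        run = j - i  # length of a run of identical elements
        merged.append(elements[i] if run == 1 else f"{elements[i]}({run})")
        i = j
    return "-".join(merged)

fragments = ["LIDEADKTT", "IMDEADEFL", "LLDEADKCL", "ILDEADRIL", "VVDEADNFI",
             "LVDEADKGI", "LMDEADEFL", "MLDEADRSI", "LIDEADKML", "MLDEADNWI",
             "LVDEADRFL"]
print(column_pattern(fragments))
```

This prints `[ILMV](2)-D-E-A-D-[EKNR]-x-[ILT]`, which is the slide's pattern apart from residue order and the T in the last position. That T comes only from LIDEADKTT. The hand-made pattern leaves it out, so it matches the other ten fragments but not that one.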
Example
>gi|71154193|sp|P0A9P6|DEAD_ECOLI Cold-shock DEAD box protein A (ATP-dependent RNA helicase deaD)
MAEFETTFADLGLKAPILEALNDLGYEKPSPIQAECIPHLLNGRDVLGMAQTGSGKTAAFSLPLLQNLDP
ELKAPQILVLAPTRELAVQVAEAMTDFSKHMRGVNVVALYGGQRYDVQLRALRQGPQIVVGTPGRLLDHL
KRGTLDLSKLSGLVLDEADEMLRMGFIEDVETIMAQIPEGHQTALFSATMPEAIRRITRRFMKEPQEVRI
QSSVTTRPDISQSYWTVWGMRKNEALVRFLEAEDFDAAIIFVRTKNATLEVAEALERNGYNSAALNGDMN
QALREQTLERLKDGRLDILIATDVAARGLDVERISLVVNYDIPMDSESYVHRIGRTGRAGRAGRALLFVE
NRERRLLRNIERTMKLTIPEVELPNAELLGKRRLEKFAAKVQQQLESSDLDQYRALLSKIQPTAEGEELD
LETLAAALLKMAQGERTLIVPPDAPMRPKREFRDRDDRGPRDRNDRGPRGDREDRPRRERRDVGDMQLYR
IEVGRDDGVEVRHIVGAIANEGDISSRYIGNIKLFASHSTIELPKGMPGEVLQHFTRTRILNKPMNMQLL
GDAQPHTGGERRGGGRGFGGERREGGRNFSGERREGGRGDGRRFSGERREGRAPRRDDSTGRRRFGGDA
The DEAD box pattern: [LIVM](2)-D-E-A-D-[RKEN]-x-[LI]
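Before a PHI-BLAST search, it helps to confirm where the pattern occurs in the query. This sketch hard-codes the regex translation of the DEAD box pattern and reports 1-based start positions. It removes whitespace first so that a motif split across FASTA line breaks is still found. `finditer` returns non-overlapping matches only, which is enough here.

```python
import re
from typing import List, Tuple

# Regex form of [LIVM](2)-D-E-A-D-[RKEN]-x-[LI]
DEAD_BOX = re.compile(r"[LIVM]{2}DEAD[RKEN].[LI]")

def find_pattern_hits(sequence: str, pattern: re.Pattern = DEAD_BOX) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping hit."""
    clean = "".join(sequence.split()).upper()  # drop FASTA line breaks/spaces
    return [(m.start() + 1, m.group()) for m in pattern.finditer(clean)]

# Third line of the DEAD_ECOLI sequence above (residues 141-210).
fragment = "KRGTLDLSKLSGLVLDEADEMLRMGFIEDVETIMAQIPEGHQTALFSATMPEAIRRITRRFMKEPQEVRI"
print(find_pattern_hits(fragment))
```

On this line the hit is VLDEADEML at position 14. With all sequence lines above joined, the same function should report that hit at residue 154, since the two preceding lines hold 140 residues.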