What can (many) sequences tell us? Vriend’s first rule of sequence analysis: if it is conserved, it is important.
Vriend’s first rule of sequence analysis: if it is conserved, it is important. Regulation is most important, and thus most conserved. Second most conserved is the location of function, third is function, and fourth is structure. Sequence is the least conserved in evolution. However, sequence conservation is the easiest to determine, so that is what people do research on...
Vriend’s second rule of sequence analysis: if it is very conserved, it is very important.
Consequences: if something is conserved in each sub-family, it is involved in a sub-family-specific function.
What is CMA (correlated mutation analysis)? Function is never just one residue.
QWERTYASDFGRGH
QSLMTYLNDFHRPM
QAGTTNMKDTRRKC
QPRSTNRGDTRRVW
Red = conserved, green = variable, blue = correlated.
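The colour coding does not survive in text, so here is a minimal Python sketch of the idea on the four toy sequences above. It assumes a deliberately simple rule of thumb that is not the CMA algorithm itself: a column is conserved when it holds a single residue type, and two columns co-vary when neither is conserved nor completely variable and the residue in one column always predicts the residue in the other (and vice versa). The function names are hypothetical.

```python
from itertools import combinations

# The four toy sequences from the CMA slide.
ALIGNMENT = [
    "QWERTYASDFGRGH",
    "QSLMTYLNDFHRPM",
    "QAGTTNMKDTRRKC",
    "QPRSTNRGDTRRVW",
]

def column(alignment, j):
    """Residues in column j (0-based) as a tuple."""
    return tuple(seq[j] for seq in alignment)

def conserved_columns(alignment):
    """1-based indices of columns that contain a single residue type."""
    width = len(alignment[0])
    return [j + 1 for j in range(width) if len(set(column(alignment, j))) == 1]

def covarying_pairs(alignment):
    """1-based index pairs of columns that co-vary perfectly.

    Only columns that are neither conserved nor fully variable are
    candidates; two candidates co-vary perfectly when the residue in one
    column always predicts the residue in the other, and vice versa.
    """
    n = len(alignment)
    width = len(alignment[0])
    candidates = [j for j in range(width)
                  if 1 < len(set(column(alignment, j))) < n]
    pairs = []
    for a, b in combinations(candidates, 2):
        col_a, col_b = column(alignment, a), column(alignment, b)
        joint = set(zip(col_a, col_b))
        # One-to-one mapping: no more joint states than states in either column.
        if len(joint) == len(set(col_a)) == len(set(col_b)):
            pairs.append((a + 1, b + 1))
    return pairs

print(conserved_columns(ALIGNMENT))  # [1, 5, 9, 12]
print(covarying_pairs(ALIGNMENT))    # [(6, 10)]
```

Columns 6 and 10 come out as a pair (Y goes with F, N goes with T). Completely variable columns are excluded because any two columns with a different residue in every sequence trivially "predict" each other; practical correlated mutation analysis scores correlations statistically over many sequences instead.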
Part of the big alignment: we see correlations between columns and between ‘things’.
Correlations: residues can correlate with residues, and when that happens we have found a function, regardless of conservation or variability. Residues that have a function correlate with that function.
Example correlation: Which cysteines form a pair in this protein family? Shown are aligned peptides from five different bacteria.
ASDFGCHIKLMCNPQRSCTVW
YSDYGCNIKLFCQPQRSCT--
ATDYPVQIKLMCNPQKSCSMW
YTDFGCHVKLLVQPNRSVTVW
-TDFGVHVKLMCNPQKSCSFW
(Wilma Kuipers, thesis)
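A toy check for this example, assuming that the two cysteines of a disulphide bridge are kept or lost together, so their presence/absence patterns across the sequences should be identical. This is an illustrative heuristic, not the method from the thesis; the function names are hypothetical.

```python
from itertools import combinations

# The five aligned bacterial peptides from the cysteine example.
PEPTIDES = [
    "ASDFGCHIKLMCNPQRSCTVW",
    "YSDYGCNIKLFCQPQRSCT--",
    "ATDYPVQIKLMCNPQKSCSMW",
    "YTDFGCHVKLLVQPNRSVTVW",
    "-TDFGVHVKLMCNPQKSCSFW",
]

def cysteine_columns(alignment):
    """1-based indices of columns where at least one sequence has a Cys."""
    width = len(alignment[0])
    return [j + 1 for j in range(width) if any(s[j] == "C" for s in alignment)]

def candidate_pairs(alignment):
    """Cys column pairs whose Cys presence/absence pattern is identical."""
    # For each Cys column: a tuple of booleans, one per sequence.
    pattern = {j: tuple(s[j - 1] == "C" for s in alignment)
               for j in cysteine_columns(alignment)}
    return [(a, b) for a, b in combinations(sorted(pattern), 2)
            if pattern[a] == pattern[b]]

print(cysteine_columns(PEPTIDES))  # [6, 12, 18]
print(candidate_pairs(PEPTIDES))   # [(12, 18)]
```

Columns 12 and 18 are both cysteine in four peptides and both valine in the fourth one, whereas the cysteine in column 6 follows a different pattern.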
Summary correlation: if it’s conserved, it’s important; if it’s important, it remains conserved. If residue positions show correlation with ‘something’, they are involved in that ‘something’. ‘Something’ can be any of a very large number of functions (optimal wavelength of an opsin; cellular localisation; binding an ion; binding over an interface; involvement in the same internal motion; collaborating to bind the substrate; etc.).
Conserved or very conserved? Recalcitrant.
VT1V1TVC11TRC1RT1C?VV
ASDFGCHIKLMCNPQRSCTVW
YSDYGCNIKLFCNPQRSCT--
ATDLPVQIKLMANPQKSCSVW
LSDFGCHIKLMCNPQRSCTVW
YTDFGCHVKLLVQPNRSVAFW
-SDAGVHVKLMVQPNKSVSF-
YTDFGCHVKLLVQPNRSVVFW
-TDSGVHVKLMIQPDKSVSFW
V = variable / not important; T = conserved type; 1 = conserved; C = correlated; ? = no idea; R = recalcitrant.
The left R is certainly recalcitrant; the other one may or may not be. What is the concept?
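The "consequences" rule (conserved in each sub-family points at a sub-family-specific function) can be checked mechanically on this alignment. A minimal sketch, assuming that rows 1-4 and rows 5-8 form two sub-families (the slide does not say so explicitly): it reports the columns that are invariant inside each sub-family but differ between them. The names are hypothetical.

```python
# The eight aligned sequences from the "conserved or very conserved?" slide.
ALIGNMENT = [
    "ASDFGCHIKLMCNPQRSCTVW",
    "YSDYGCNIKLFCNPQRSCT--",
    "ATDLPVQIKLMANPQKSCSVW",
    "LSDFGCHIKLMCNPQRSCTVW",
    "YTDFGCHVKLLVQPNRSVAFW",
    "-SDAGVHVKLMVQPNKSVSF-",
    "YTDFGCHVKLLVQPNRSVVFW",
    "-TDSGVHVKLMIQPDKSVSFW",
]

# Assumption: rows 1-4 and rows 5-8 form two sub-families.
SUBFAMILIES = [ALIGNMENT[:4], ALIGNMENT[4:]]

def subfamily_specific_columns(subfamilies):
    """1-based columns that are invariant (and ungapped) inside every
    sub-family, but carry a different residue in different sub-families."""
    width = len(subfamilies[0][0])
    hits = []
    for j in range(width):
        # The set of residue types seen in column j, one set per sub-family.
        per_family = [{seq[j] for seq in family} for family in subfamilies]
        if all(len(types) == 1 for types in per_family):
            residues = set().union(*per_family)
            # More than one residue overall: conserved per sub-family,
            # but not across the whole family.
            if len(residues) > 1 and "-" not in residues:
                hits.append(j + 1)
    return hits

print(subfamily_specific_columns(SUBFAMILIES))  # [8, 13, 18]
```

The three columns it finds are the ones the annotation line marks with C. The two R columns (12 and 15) are not reported, because neither is invariant inside both groups.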
Entropy and variability: so far we have seen that conservation and correlation can help us find functionally important residues. Can variability patterns also tell us something?
Entropy: the sequence entropy $E_i$ at position $i$ is calculated from the frequencies $p_{i,a}$ of the twenty amino acid types $a$ at position $i$ (with $0 \ln 0$ taken as 0):
$$E_i = -\sum_{a=1}^{20} p_{i,a} \ln p_{i,a}$$
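The entropy formula translates directly into code. A sketch assuming that gaps and non-standard symbols are simply left out when the frequencies are computed (the slide does not specify how gaps are treated); the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def column_entropy(residues):
    """Sequence entropy E_i of one alignment column (natural logarithm).

    Assumption: only the 20 standard amino acids are counted; gaps and any
    other symbols are left out when the frequencies p_{i,a} are computed.
    """
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Absent types are never summed, which matches the convention 0 ln 0 = 0.
    return sum(-(c / total) * math.log(c / total) for c in counts.values())

def alignment_entropy(alignment):
    """E_i for every column of a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]

print(column_entropy("DDDD"))            # 0.0
print(round(column_entropy("YYNN"), 4))  # 0.6931
```

A fully conserved column gives 0, a column split evenly over two types gives ln 2, and the maximum, ln 20 (about 3.00), is reached when all twenty types are equally frequent.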
Variability: the sequence variability $V_i$ is the number of amino acid types observed at position $i$ in more than 0.5% of all sequences.
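A sketch of the variability count under two stated assumptions: gaps are not counted as an amino acid type, but gapped sequences still count towards "all sequences". The comparison is done in integers so that a type seen in exactly 0.5% of the sequences is not counted. The function and its threshold parameter are hypothetical names.

```python
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def column_variability(residues, threshold_per_mille=5):
    """Sequence variability V_i of one alignment column.

    Counts the amino acid types that occur in more than
    threshold_per_mille / 1000 of all sequences (default 5/1000 = 0.5%).
    Assumption: gaps are not a type, but every sequence counts towards
    the total, gapped or not.
    """
    n = len(residues)
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    # count / n > threshold / 1000, rewritten to avoid floating point
    return sum(1 for c in counts.values() if c * 1000 > threshold_per_mille * n)

print(column_variability("A" * 996 + "G" * 4))  # 1 (G is in only 0.4%)
print(column_variability("A" * 994 + "G" * 6))  # 2 (G is in 0.6%)
```

In a small alignment every observed type passes the threshold, so $V_i$ is simply the number of distinct residues; a type seen in only one sequence is dropped once the alignment holds 200 or more sequences.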
Summary variability analysis: variability patterns hold information. Entropy and variability are two of the ways to measure variability patterns. Entropy and variability patterns can say something about the type of function, and thus add detail to correlation studies.
Conclusions: Data is difficult, but we need it (sic); life would be so nice if we could do without it. PDB files are the worst. Nomenclature is not homogeneous. Ontologies… Much data has been carefully hidden in the literature, where it can only be found again with great difficulty. Residue numbering is difficult but very necessary. Variability-entropy analysis is powerful, but requires very ‘good’ alignments.
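Residue numbering is one place where alignments cause trouble: as soon as a sequence has gaps, alignment column numbers and residue numbers diverge. A minimal sketch of that bookkeeping for one aligned sequence, using the sixth sequence of the "conserved or very conserved?" example; it assumes "-" is the only gap symbol, and the function and its `first_residue_number` parameter are hypothetical.

```python
def column_to_residue_number(aligned_seq, first_residue_number=1):
    """Map 1-based alignment columns to residue numbers in one sequence.

    Gap characters ('-') map to None: that column has no residue in this
    sequence. first_residue_number (hypothetical) lets the numbering start
    where the real sequence starts, e.g. for a fragment.
    """
    mapping = {}
    number = first_residue_number
    for col, residue in enumerate(aligned_seq, start=1):
        if residue == "-":
            mapping[col] = None
        else:
            mapping[col] = number
            number += 1
    return mapping

numbers = column_to_residue_number("-SDAGVHVKLMVQPNKSVSF-")
print(numbers[2], numbers[18], numbers[21])  # 1 17 None
```

The offset parameter matters when the aligned sequence is a fragment, such as a domain whose first residue is not residue 1 of the full-length protein.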