Protein function and classification

Presentation Transcript


  1. Protein function and classification. www.ebi.ac.uk/interpro. Hsin-Yu Chang, www.ebi.ac.uk

  2. Greider and Blackburn discovered telomerase in 1984 and were awarded the Nobel Prize in 2009. Which model organism did they use for this study? 1. Tetrahymena 2. Saccharomyces cerevisiae 3. Mouse 4. Human

  3. Telomerase timeline: 1985, discovery of telomerase (Greider and Blackburn); 1989, telomere hypothesis of cell senescence (Szostak); 1995, hTR cloned; 1995/1997, hTERT cloned; 1997, telomerase knockout mouse; 1998, ectopic expression of telomerase in normal fibroblasts and epithelial cells bypasses the Hayflick limit; 1999/2000…, telomerase/telomere dysfunctions and cancer. A single Tetrahymena cell has 40,000 telomeres, whereas a human cell has only 92 (46 chromosomes × 2 ends). Gilson and Ségal-Bendirdjian, Biochimie, 2010.

  4. Therefore, protein classification can help scientists gain information about protein function.

  5. In the lab, what do we usually do to analyse protein sequences and find out their functions?

  6. What I used to do: • Protein BLAST • Publications (textbooks or papers) • UniProt • PDB • Specialized protein databases such as SGD, the Human Protein Atlas, etc.

  7. BLAST it? • Drawbacks: • Sometimes struggles with multi-domain proteins • Less useful for weakly similar sequences (e.g. divergent homologues) • Advantages: • Relatively fast • User friendly • Very good at recognising similarity between closely related sequences

  8. Using BLAST to find clues to protein function: when it goes well

  9. Pairwise alignment of two proteins: CD4 from two closely related species

  10. Using BLAST to find clues to protein function: when it does not give you much information

  11. Using BLAST to find clues to protein function: when it does not give you much information

  12. Because BLAST performs local pairwise alignment, it cannot encode the information found in a multiple sequence alignment, which shows you conserved sites.
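Local pairwise alignment, the kind BLAST approximates heuristically, can be sketched with the Smith-Waterman dynamic programme below. This is a minimal illustration, not BLAST itself: BLAST uses seed-and-extend heuristics with a substitution matrix such as BLOSUM62 and affine gap costs, whereas the match, mismatch and gap scores here are hypothetical toy values. Note that the result is a single pair of aligned strings; nothing in it records which positions are conserved across a whole family.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment of a and b using toy scores and a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # Fill the matrix; scores are floored at 0 so a local alignment can start anywhere.
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell ends the alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("ACDEFGHIK", "ACDEGHIK"))  # (14, 'ACDEFGHIK', 'ACDE-GHIK')
```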

  13. 60S acidic ribosomal protein P0: multiple sequence alignment. Using pairwise alignment alone could miss conserved residues.
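Reading conservation off a multiple sequence alignment amounts to scoring each column. The sketch below assumes gap-free toy input and reuses the four motif sequences from the patterns slide (not the P0 alignment, whose sequences are not reproduced here); it reports the fraction of sequences sharing each column's most common residue. The helper names are hypothetical.

```python
from collections import Counter
from typing import List


def column_conservation(alignment: List[str]) -> List[float]:
    """For each column, the fraction of sequences carrying its most common residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # gap characters never count as conserved
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(column))
    return scores


def fully_conserved(alignment: List[str]) -> List[int]:
    """1-based positions where every sequence has the same residue."""
    return [i + 1 for i, s in enumerate(column_conservation(alignment)) if s == 1.0]


msa = ["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"]
print(fully_conserved(msa))                       # [3]
print(fully_conserved(["ALVKLISG", "CHVRDLSC"]))  # [3, 7]
```

With only two sequences, position 7 looks invariant; the four-sequence alignment shows it is not, so a single pairwise comparison is a weaker guide to conserved sites than the whole alignment.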

  14. An alternative approach: protein signature search • Model the pattern of conserved amino acids at specific positions within a multiple sequence alignment • Use these models to infer relationships with the characterised sequences (from which the alignment was constructed) • This is the approach taken by protein signature databases

  15. Three different protein signature approaches • Patterns: single-motif methods • Profiles & HMMs (hidden Markov models): full-alignment methods • Fingerprints: multiple-motif methods

  16. Patterns. Patterns are usually directed against functional sequence features such as active sites, binding sites, etc. Sequence alignment → motif (pattern sequences: ALVKLISG, AIVHESAT, CHVRDLSC, CPVESTIS) → regular expression [AC]-x-V-x(4)-{ED} → pattern signature (PS00000)

  17. Patterns • Advantages: • Can anchor the match to the N- or C-terminus of a sequence, e.g. <M-R-[DE]-x(2,4)-[ALT]-{AM} • Strict: a pattern with very little variability and forbidden residues can produce highly accurate matches • Drawbacks: • Simple but less flexible
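The pattern syntax on these two slides maps closely onto ordinary regular expressions. Below is a minimal sketch covering only the elements shown here (x, [allowed], {forbidden}, repeat counts, and the < and > terminal anchors); real PROSITE patterns have further rules, and prosite_to_regex is a hypothetical helper, not part of any library. Hyphens are written as plain ASCII "-".

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    regex = ""
    for element in pattern.strip().rstrip(".").split("-"):
        if element.startswith("<"):        # '<' anchors the match at the N-terminus
            regex += "^"
            element = element[1:]
        c_anchor = element.endswith(">")  # '>' anchors the match at the C-terminus
        if c_anchor:
            element = element[:-1]
        # Split an optional repeat count such as x(4) or x(2,4) off the residue spec.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, low, high = m.groups()
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"  # forbidden residues
        else:
            token = core                     # single residue or [allowed] set
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        regex += token + ("$" if c_anchor else "")
    return regex


motif_regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
print(motif_regex)  # [AC].V.{4}[^ED]
for seq in ["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS", "ALVKLISE"]:
    print(seq, bool(re.search(motif_regex, seq)))

anchored = prosite_to_regex("<M-R-[DE]-x(2,4)-[ALT]-{AM}")
print(anchored)  # ^MR[DE].{2,4}[ALT][^AM]
print(bool(re.search(anchored, "MRDKKLG")), bool(re.search(anchored, "AMRDKKLG")))  # True False
```

The last sequence in the loop ends in a forbidden E and so fails, and the anchored pattern only matches when M is the first residue.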

  18. Fingerprints: a multiple-motif approach. Sequence alignment → define motifs (Motif 1, Motif 2, Motif 3) → motif sequences → weight matrices → fingerprint signature (PR00000)

  19. The significance of motif context • Identify small conserved regions in proteins • Several motifs characterise a family • Offer improved diagnostic reliability over single motifs by virtue of the biological context provided by motif neighbours (the order of motifs 1, 2 and 3 and the intervals between them)
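Motif context can be illustrated by requiring several motifs to appear in a fixed order and reporting the intervals between them. This is a deliberately simplified sketch: real fingerprints score each motif with weight matrices derived from the aligned motif sequences, whereas here each motif is a fixed-length regular expression, and both the motifs and the test sequence are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def match_in_order(sequence: str, motifs: List[str]) -> Optional[List[Span]]:
    """Find each motif after the previous one; return their (start, end) spans, or None.

    Taking the leftmost hit for each motif is sufficient because every motif is
    assumed to have a fixed length, so an earlier start also means an earlier end.
    """
    spans: List[Span] = []
    position = 0
    for motif in motifs:
        hit = re.compile(motif).search(sequence, position)
        if hit is None:
            return None
        spans.append(hit.span())
        position = hit.end()
    return spans


def intervals(spans: List[Span]) -> List[int]:
    """Number of residues between the end of one motif and the start of the next."""
    return [nxt[0] - prev[1] for prev, nxt in zip(spans, spans[1:])]


toy_motifs = ["GH.L", "C[DE]DG", "S.Y"]  # hypothetical motifs 1, 2 and 3
toy_seq = "AAGHWLAAAACDDGAAASAYR"          # hypothetical protein sequence

found = match_in_order(toy_seq, toy_motifs)
print(found)             # [(2, 6), (10, 14), (17, 20)]
print(intervals(found))  # [4, 3]
print(match_in_order(toy_seq, ["C[DE]DG", "GH.L"]))  # None, because the order is wrong
```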

  20. Fingerprints • Good at modelling the often small differences between closely related proteins • Distinguish individual subfamilies within protein families, allowing functional characterisation of sequences at a high level of specificity

  21. Profiles & HMMs. Sequence alignment → define coverage (entire domain or whole protein) → use the entire alignment of the domain or protein family → build model → profile or HMM signature

  22. Profiles • Start with a multiple sequence alignment • Amino acids at each position in the alignment are scored according to the frequency with which they occur • Scores are weighted according to evolutionary distance using a BLOSUM matrix • Good at identifying homologues
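Scoring each alignment column by residue frequency can be sketched as a log-odds position-specific scoring matrix. Assumptions, stated plainly: a uniform background of 1/20 per amino acid, one pseudocount per residue type, no gaps, and no BLOSUM weighting (real profile methods mix in substitution-matrix information and also score insertions and deletions). The alignment is the four toy motif sequences from the patterns slide, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Profile = List[Dict[str, float]]


def build_profile(alignment: List[str], pseudocount: float = 1.0) -> Profile:
    """Log-odds score (in bits) for every amino acid at every alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    profile: Profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile: Profile, sequence: str) -> float:
    """Sum the column scores for an ungapped sequence of the same length as the profile."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match the profile length")
    return sum(column[aa] for column, aa in zip(profile, sequence))


profile = build_profile(["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"])
print(round(profile[2]["V"], 2))  # invariant V column: log2((4 + 1) / 24 * 20), about 2.06
print(score(profile, "ALVKLISG") > score(profile, "GGGGGGGG"))  # True
```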

  23. HMMs • Start with a multiple sequence alignment • The amino acid frequencies at each position in the alignment, and the transition probabilities between positions, are encoded • Insertions and deletions are also modelled • Can model very divergent regions of an alignment • Very good at identifying evolutionarily distant homologues
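A profile HMM adds insert and delete states alongside per-column match states, so a query with extra or missing residues still receives a finite score. The sketch below is a toy under loudly stated assumptions: transition probabilities are fixed and hypothetical rather than estimated from the alignment, insert states emit a uniform background, there is no local entry or exit, and scoring uses the standard Viterbi (single best path) recursion rather than the algorithms of any particular tool.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background, also emitted by insert states
NEG_INF = float("-inf")

# Hypothetical transition probabilities shared by every node of the model.
# "M" as a target also stands for the End state after the last column.
TRANSITIONS = {
    ("M", "M"): 0.90, ("M", "I"): 0.05, ("M", "D"): 0.05,
    ("I", "M"): 0.60, ("I", "I"): 0.40,
    ("D", "M"): 0.60, ("D", "D"): 0.40,
}


def log_t(src: str, dst: str) -> float:
    p = TRANSITIONS.get((src, dst), 0.0)
    return math.log(p) if p > 0 else NEG_INF


def match_emissions(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Emission probabilities of each match state, from column frequencies plus pseudocounts."""
    emissions = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        emissions.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return emissions


def viterbi_log_odds(seq: str, emissions: List[Dict[str, float]]) -> float:
    """Best-path log-odds (nats) of seq under the profile HMM versus the background model."""
    k, n = len(emissions), len(seq)
    # VM/VI/VD[j][i]: best log-probability of being in M_j/I_j/D_j after emitting seq[:i]
    VM = [[NEG_INF] * (n + 1) for _ in range(k + 1)]
    VI = [[NEG_INF] * (n + 1) for _ in range(k + 1)]
    VD = [[NEG_INF] * (n + 1) for _ in range(k + 1)]
    VM[0][0] = 0.0  # the Begin state plays the role of M_0
    for j in range(k + 1):
        for i in range(n + 1):
            if j > 0 and i > 0:  # match state j emits seq[i-1]
                VM[j][i] = math.log(emissions[j - 1][seq[i - 1]]) + max(
                    VM[j - 1][i - 1] + log_t("M", "M"),
                    VI[j - 1][i - 1] + log_t("I", "M"),
                    VD[j - 1][i - 1] + log_t("D", "M"))
            if i > 0:  # insert state j emits seq[i-1] from the background
                VI[j][i] = math.log(BACKGROUND) + max(
                    VM[j][i - 1] + log_t("M", "I"),
                    VI[j][i - 1] + log_t("I", "I"))
            if j > 0:  # delete state j skips column j without emitting
                VD[j][i] = max(VM[j - 1][i] + log_t("M", "D"),
                               VD[j - 1][i] + log_t("D", "D"))
    best = max(VM[k][n] + log_t("M", "M"),
               VI[k][n] + log_t("I", "M"),
               VD[k][n] + log_t("D", "M"))
    return best - n * math.log(BACKGROUND)


emissions = match_emissions(["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"])
for query in ["ALVKLISG", "ALVKWWLISG", "ALVLISG"]:  # member, 2-residue insertion, 1-residue deletion
    print(query, round(viterbi_log_odds(query, emissions), 2))
```

The gapped queries score lower than the member but still get finite scores, whereas an ungapped pattern or profile of fixed length would simply fail to match them.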

  24. Three different protein signature approaches • Patterns: single-motif methods • Profiles & HMMs (hidden Markov models): full-alignment methods • Fingerprints: multiple-motif methods

  25. www.ebi.ac.uk/interpro

  26. The aim of InterPro

  27. What is InterPro? • InterPro is an integrated sequence analysis resource • It combines predictive models (known as signatures) from different databases to provide functional analysis of protein sequences by classifying them into families and predicting domains and important sites

  28. Facts about InterPro • First release in 1999 • 11 partner databases • Forms part of the automated system that adds annotation to UniProtKB/TrEMBL • Provides matches to over 80% of UniProtKB • Source of >60 million Gene Ontology (GO) mappings to >17 million distinct UniProtKB sequences • 50,000 unique visitors to the website per month • >2 million sequences searched online per month, plus offline searches with the downloadable version of the software

  29. HAMAP • Profiles • Protein features (sites) • Functional annotation of families/domains • Structural domains • Patterns • Fingerprints • Hidden Markov models

  30. InterPro signature integration process • Signatures are provided by member databases • They are scanned against the UniProt database to see which sequences they match • Curators manually inspect the matches before integrating the signatures into InterPro • Signatures representing the same entity are integrated together • Relationships between entries are traced, where possible • Curators add literature-referenced abstracts, cross-references to other databases, and GO terms

  31. http://www.ebi.ac.uk/interpro/

  32. Search using protein sequences

  33. Family

  34. Type

  35. InterPro entry types • Family: proteins that share a common evolutionary origin, as reflected in their related functions, sequences or structure • Domain: distinct functional, structural or sequence units that may exist in a variety of biological contexts • Repeats: short sequences typically repeated within a protein • Active site • Binding site • Conserved site • PTM site

  36. Type, name, identifier, contributing signatures, description, references, GO terms

  37. Type, name, identifier, contributing signatures, relationships, description, references

  38. InterPro family and domain relationships

  39. Family relationships in InterPro: Interleukin-15/Interleukin-21 family > Interleukin-15 > Interleukin-15, mammal; Interleukin-15, fish; Interleukin-15, avian

  40. Relationships

  41. InterPro relationships: domains. Protein kinase-like domain > Protein kinase catalytic domain > Tyrosine kinase catalytic domain; Serine/threonine kinase catalytic domain
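Both relationship slides describe parent/child trees, which can be held as a mapping from each entry to its children and printed by a depth-first walk. A minimal sketch: entry names are taken from the slides, accessions are omitted, and the nesting of the two kinase catalytic domains under the protein kinase catalytic domain is an assumption read from the slide order.

```python
from typing import Dict, List

# Parent -> children, using entry names from the relationship slides (accessions omitted).
HIERARCHIES: Dict[str, List[str]] = {
    "Interleukin-15/Interleukin-21 family": ["Interleukin-15"],
    "Interleukin-15": ["Interleukin-15, mammal", "Interleukin-15, fish", "Interleukin-15, avian"],
    "Protein kinase-like domain": ["Protein kinase catalytic domain"],
    "Protein kinase catalytic domain": [  # assumed nesting of the two child domains
        "Tyrosine kinase catalytic domain",
        "Serine/threonine kinase catalytic domain",
    ],
}


def render(entry: str, depth: int = 0) -> List[str]:
    """Depth-first walk that indents each child entry two spaces under its parent."""
    lines = ["  " * depth + entry]
    for child in HIERARCHIES.get(entry, []):
        lines.extend(render(child, depth + 1))
    return lines


for root in ("Interleukin-15/Interleukin-21 family", "Protein kinase-like domain"):
    print("\n".join(render(root)))
```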

  42. A brief diversion into the Gene Ontology...

  43. Gene Ontology • Unify the representation of gene and gene product attributes across species • Allow cross-species and/or cross-database comparisons

  44. The Gene Ontology • A way to capture biological knowledge in a written and computable form • A set of concepts and their relationships to each other, arranged as a hierarchy from less specific to more specific concepts (strictly a directed acyclic graph, so a concept can have more than one parent) • www.ebi.ac.uk/QuickGO
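Because GO concepts form a directed acyclic graph, annotating a gene product with a specific term implicitly annotates it with every ancestor term. A minimal sketch of that propagation follows; the term names come from the GO concepts slide, but the parent links are a simplified, hypothetical fragment, not the actual GO graph, and GO identifiers are deliberately omitted.

```python
from typing import Dict, List, Set

# Hypothetical, simplified child -> parents links; not the real GO graph.
PARENTS: Dict[str, List[str]] = {
    "insulin receptor activity": ["protein kinase activity", "signaling receptor activity"],
    "protein kinase activity": ["molecular_function"],
    "signaling receptor activity": ["molecular_function"],
    "cell cycle": ["biological_process"],
    "microtubule cytoskeleton organisation": ["biological_process"],
}


def ancestors(term: str) -> Set[str]:
    """All terms reachable by following parent links; a shared ancestor is visited once."""
    seen: Set[str] = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


print(sorted(ancestors("insulin receptor activity")))
# ['molecular_function', 'protein kinase activity', 'signaling receptor activity']
```

The term with two parents reaches the shared root by two routes, which is exactly what a strict tree could not express.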

  45. The Concepts in GO 1. Molecular Function • protein kinase activity • insulin receptor activity 2. Biological Process • Cell cycle • Microtubule cytoskeleton organisation 3. Cellular Component
