Terrapon N., Ghouila A., Gascuel O., Maréchal E., Laouini D., Bréhélin L.

Identification of Novel Protein Domains in Plasmodium and Leishmania Species. Terrapon N., Ghouila A., Gascuel O., Maréchal E., Laouini D., Bréhélin L. ISCB’09, Bamako, Mali. Outline: Background; Protein domains; Plasmodium & Leishmania species; Detection by Co-Occurrence; Website.

Presentation Transcript


  1. Identification of Novel Protein Domains in Plasmodium and Leishmania Species
     Terrapon N., Ghouila A., Gascuel O., Maréchal E., Laouini D., Bréhélin L.
     ISCB’09, Bamako, Mali

  2. Outline
     • Background
       • Protein domains
       • Plasmodium & Leishmania species
     • Detection by Co-Occurrence
     • Website
     • Experiments

  3. Protein domains
     • Domains are structural and functional subunits of proteins
     • Predicting the domain composition of a protein helps to predict its function
     • Domain family databases
       • Prosite, Pfam, Superfamily, SMART, etc.
       • InterPro domain metadatabase: gathers information from 10 different domain databases

  4. Pfam [Finn 08]
     • Hidden Markov Models (HMMs): a powerful tool for protein domain identification
     • One domain, one HMM: 10 340 models (v23.0)
     • Each hit gets a score reflecting the similarity of the sequence to the model
     • Pfam provides score thresholds used to assert domain presence

  5. Pfam [Finn 08]
     • Hidden Markov Models (HMMs): a powerful tool for protein domain identification
     • One domain, one HMM: 10 340 models (v23.0)
     • Each hit gets a score reflecting the similarity of the sequence to the model
     • Pfam provides score thresholds used to assert domain presence
     • Problem: in divergent sequences, some domains score below the threshold and are missed

  6. Divergent organisms
     • Plasmodium falciparum
       • Agent of malaria; genome sequenced [Gardner02]
       • ~500 million clinical cases and ~2 million deaths each year
     • Leishmania major
       • Agent of leishmaniasis; genome sequenced [Ivens05]
       • ~2 million clinical cases (visceral and cutaneous) and ~50 thousand deaths each year
     • Pfam domains in these organisms
       • Very low variety of domain types
       • 50% of proteins have no detected domain (yeast: 24%)

  7. Outline
     • Background
     • Detection by Co-Occurrence
       • Principle
       • Illustration
     • Website
     • Experiments

  8. Detection by Co-Occurrence
     • Principle
       • Relax the Pfam thresholds: more detections, but numerous false positives
       • Filter the extra detections using domain co-occurrence
     • Domain co-occurrence
       • Each domain tends to appear with only a few favorite partner domains
       • In UniProt proteins: 20 000 domain pairs observed out of ~12.5 million possible pairs (1.6‰)

  9. Detection by Co-Occurrence
     • Conditionally dependent pairs (CDPs): statistically significant co-occurrences in UniProt
       • CDP list in this illustration: (A, C), (A, D), (C, D)

  10. Detection by Co-Occurrence
      • Conditionally dependent pairs (CDPs): statistically significant co-occurrences in UniProt
        • CDP list in this illustration: (A, C), (A, D), (C, D)
      • Given a protein sequence
        • Identify the known domains
      [Figure: protein sequence carrying known domains A and C]

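  11. Detection by Co-Occurrence
      • Conditionally dependent pairs (CDPs): statistically significant co-occurrences in UniProt
        • CDP list in this illustration: (A, C), (A, D), (C, D)
      • Given a protein sequence
        • Identify the known domains
        • Relax the Pfam thresholds: potential domains
      [Figure: protein sequence carrying known domains A and C and potential domains B and D]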

  12. Detection by Co-Occurrence
      • Conditionally dependent pairs (CDPs): statistically significant co-occurrences in UniProt
        • CDP list in this illustration: (A, C), (A, D), (C, D)
      • Given a protein sequence
        • Identify the known domains
        • Relax the Pfam thresholds: potential domains
        • Check all (known, potential) pairs against the CDP list
      [Figure: protein sequence carrying known domains A and C and potential domains B and D]

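  13. Detection by Co-Occurrence
      • Conditionally dependent pairs (CDPs): statistically significant co-occurrences in UniProt
        • CDP list in this illustration: (A, C), (A, D), (C, D)
      • Given a protein sequence
        • D is certified: it forms a CDP with the known domains A and C
        • B is not: it appears in no CDP with a known domain
      [Figure: protein sequence carrying known domains A and C, certified domain D and rejected domain B]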
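
The illustration on slides 9 to 13 can be sketched in a few lines of Python. This is only a minimal sketch, not the authors' implementation: the function name `certify` and the set-based data layout are assumptions made for the example, and it covers only certification by known domains.

```python
from typing import Iterable, Set, Tuple


def certify(known: Set[str],
            potential: Set[str],
            cdp: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the potential domains that form a CDP with at least one known domain.

    `certify` is a hypothetical helper written for this illustration.
    """
    # Store pairs as frozensets so that (A, D) and (D, A) are the same pair.
    cdp_set = {frozenset(pair) for pair in cdp}
    return {
        p for p in potential
        if any(frozenset((k, p)) in cdp_set for k in known)
    }


# Toy data taken from the slides: CDP list, known domains, potential domains.
cdp_list = [("A", "C"), ("A", "D"), ("C", "D")]
certified = certify(known={"A", "C"}, potential={"B", "D"}, cdp=cdp_list)
print(sorted(certified))
```

With the toy data from the slides, only D passes the filter, which matches slide 13.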

  14. Detection by Co-Occurrence
      • Different types of certification
        • By known InterPro domains: more reliable
        • By potential Pfam domains: allows finding domains in proteins where no domain is already known
      • Control of the error rate
        • Shuffling procedure
        • False Discovery Rate (FDR) estimation
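
The slide names a shuffling procedure and an FDR estimate without giving details. The sketch below shows one common way such an estimate can be built, under stated assumptions: potential domains are permuted across proteins, the certification filter is re-applied, and the mean number of domains certified by chance is divided by the number certified on the real data. The function names, the data layout and the exact shuffling scheme are all hypothetical and may differ from what the authors did.

```python
import random
from typing import FrozenSet, List, Set, Tuple

Protein = Tuple[Set[str], List[str]]  # (known domains, potential domains)


def count_certified(proteins: List[Protein], cdp_set: Set[FrozenSet[str]]) -> int:
    """Count potential domains that form a CDP with a known domain of the same protein."""
    return sum(
        1
        for known, potential in proteins
        for p in potential
        if any(frozenset((k, p)) in cdp_set for k in known)
    )


def estimate_fdr(proteins: List[Protein],
                 cdp_set: Set[FrozenSet[str]],
                 n_shuffles: int = 100,
                 seed: int = 0) -> float:
    """Estimate the FDR as (mean certifications after shuffling) / (observed certifications)."""
    observed = count_certified(proteins, cdp_set)
    if observed == 0:
        return 0.0
    rng = random.Random(seed)
    # Pool every potential domain, then deal them back so each protein keeps its count.
    pool = [p for _, potential in proteins for p in potential]
    null_total = 0
    for _ in range(n_shuffles):
        rng.shuffle(pool)
        start = 0
        shuffled: List[Protein] = []
        for known, potential in proteins:
            shuffled.append((known, pool[start:start + len(potential)]))
            start += len(potential)
        null_total += count_certified(shuffled, cdp_set)
    return min(1.0, (null_total / n_shuffles) / observed)


# Toy example (invented data): two proteins, one CDP.
toy_cdp = {frozenset(("A", "D"))}
toy_proteins = [({"A"}, ["D"]), ({"X"}, ["Y"])]
print(estimate_fdr(toy_proteins, toy_cdp))
```

Relaxing the thresholds further raises both counts, so in practice one would pick the loosest threshold whose estimated FDR stays under a target such as 10%.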

  15. Outline
      • Background
      • Detection by Co-Occurrence
      • Website
        • Plasmodium species (falciparum, vivax, yoelii): http://www.lirmm.fr/~terrapon/codd/
        • Leishmania species (major, infantum, braziliensis): http://www.lirmm.fr/~terrapon/leishmania/
      • Experiments
      30/11/2009

  16. Outline
      • Background
      • Detection by Co-Occurrence
      • Website
      • Experiments
        • Statistics
        • Biological analysis

  17. Certified domains (FDR < 10%)
      • High agreement with orthologous proteins in the closest species
        • Plasmodium: P. vivax 78%, P. yoelii 64%
        • Leishmania: L. infantum 92%, L. braziliensis 85%

  18. Over-represented GO terms
      • Known domains
        • L. major: DNA binding; hydrolase activity; transferase activity
        • P. falciparum: DNA binding; translation initiation factor activity
      • Predicted domains
        • L. major: ATP-dependent 3'-5' DNA helicase activity; chromatin binding; DNA replication; intracellular transport; RNA binding; transcription factor activity
        • P. falciparum: DNA binding; DNA repair; intracellular transport; RNA processing; response to DNA damage stimulus

  19. Domains of major interest
      • Plasmodium falciparum
        • Vitamin synthesis (cobalamin and folate)
        • Drought-resistance-related domain, specific to the plant kingdom
      • Leishmania major
        • Bacteria-specific domains (bacterial transcription regulation and receptor domains)
        • Domains related to cell cycle regulation and invasion mechanisms

  20. Categories of predicted domains
      • In Leishmania major:
      [Figure: categories of predicted domains in Leishmania major; not included in the transcript]

  21. Conclusion
      • A method to improve the sensitivity of Pfam domain detection
      • New functional annotations
      • Interesting results on divergent proteomes
      • Predictions for Plasmodium species: http://www.lirmm.fr/~terrapon/codd/
      • Predictions for Leishmania species: http://www.lirmm.fr/~terrapon/leishmania/

  22. Future Work
      • New hypotheses for understanding:
        • Transcription and regulation mechanisms
        • Parasite invasion strategies
      • Application to other organisms:
        • Arabidopsis thaliana, Saccharomyces cerevisiae: done
        • All sequenced organisms: in progress…
      • Integrate all results into a proper database
      • Improvement: combine results from the closest species

  23. THANKS FOR YOUR ATTENTION!

  24. Selecting CDPs
      • For each domain pair (A, B), compute the probability of observing at least as many proteins containing both A and B under the null hypothesis of independence.
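
The slide does not say which distribution is used for this probability. One standard choice, assumed here for illustration only, is the hypergeometric tail: with `n_total` proteins, `n_a` containing A and `n_b` containing B, it gives the chance of seeing at least `n_ab` proteins with both domains if A and B were placed independently. The function name and the example counts below are hypothetical.

```python
from math import comb


def cooccurrence_pvalue(n_total: int, n_a: int, n_b: int, n_ab: int) -> float:
    """P(X >= n_ab) where X ~ Hypergeometric(n_total, n_a, n_b).

    X counts proteins containing both A and B when the n_b proteins
    containing B are drawn at random among the n_total proteins.
    """
    upper = min(n_a, n_b)  # X cannot exceed the smaller of the two counts
    denominator = comb(n_total, n_b)
    # math.comb returns 0 when k > n, so impossible terms vanish on their own.
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_ab, upper + 1)
    )
    return tail / denominator


# Invented counts: 1000 proteins, 50 with A, 40 with B, 10 with both.
p_value = cooccurrence_pvalue(n_total=1000, n_a=50, n_b=40, n_ab=10)
print(p_value)
```

A pair would then be kept as a CDP when this probability falls below a chosen significance level, ideally after correcting for the millions of pairs tested.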

  25. Most Frequent Domain Certified in Leishmania and Plasmodium Species
      • TPR2
        • Known in 30 proteins
        • Discovered in 37 others
      • Mediates protein-protein interactions (PPI)
      • Involved in cell cycle regulation, transcriptional control, mitochondrial and peroxisomal protein transport, neurogenesis and protein folding
