Automatic and Reliable Functional Annotation of Proteins

kaveri

Presentation Transcript


  1. Automatic and Reliable Functional Annotation of Proteins

  2. The Target Database. Your data: uncharacterized data of any kind (protein sequences, gene sequences, etc.). Our target: TrEMBL. [Diagram: Target]

  3. The External Database (XDB): a collection of conditions such as sequence patterns, profiles, HMMs, E.C. numbers and protein clusters. Examples: PROSITE, Pfam. [Diagram: XDB, Target]

  4. Direct Transfer: search the target and transfer annotation to the target database. Example: look up the E.C. number and add the recommended enzyme name. [Diagram: XDB, Target]
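The direct transfer of slide 4 can be sketched in Perl, the language of the rule examples on slides 16 and 17. This is a minimal sketch under stated assumptions: the ENZYME lookup is a small hypothetical in-memory hash (`%recommended_name`), the EC number is read from an "(EC n.n.n.n)" suffix on the entry's DE line, and `transfer_enzyme_name` is an illustrative name, not part of the original system.

```perl
use strict;
use warnings;

# Hypothetical excerpt of the ENZYME database: EC number => recommended name.
my %recommended_name = (
    '1.1.1.27' => 'L-LACTATE DEHYDROGENASE',
    '5.1.1.1'  => 'ALANINE RACEMASE',
);

# Direct transfer (sketch): read the EC number from the entry's DE line,
# look it up in the external database and rewrite the DE line with the
# recommended name. $entry is one flat-file entry as a string.
sub transfer_enzyme_name {
    my ($entry) = @_;
    my ($ecno) = $entry =~ /^DE.*\(EC ([\d.]+)\)/m;
    return $entry unless defined $ecno;            # no EC number on the entry
    my $name = $recommended_name{$ecno};
    return $entry unless defined $name;            # EC number unknown to the XDB
    $entry =~ s/^DE.*$/DE   $name (EC $ecno)./m;   # replaces the first DE line only
    return $entry;
}

my $entry = "DE   PUTATIVE DEHYDROGENASE (EC 1.1.1.27).\nGN   LDH.\n";
print transfer_enzyme_name($entry);
# prints:
# DE   L-LACTATE DEHYDROGENASE (EC 1.1.1.27).
# GN   LDH.
```

Entries without an EC number, or with one the lookup does not know, are returned unchanged, so the transfer never guesses.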

  5. Multiple Sources: usually more than one external database is used, and the different results must be combined. [Diagram: XDB, Target]

  6. Conflicts
     • Contradiction
     • Inconsistencies
     • Synonyms
     • Redundancy

  7. Translation: use a translator to map the XDB language to the target language. [Diagram: XDB, Target]

  8. Translation Examples
     • ENZYME "CA L-ALANINE=D-ALANINE" → TrEMBL "CC -!- CATALYTIC ACTIVITY: L-ALANINE=D-ALANINE."
     • PROSITE "/SITE=3,heme_iron" → TrEMBL "FT METAL IRON"
     • Pfam "FT DOMAIN zf_C3HC4" → TrEMBL "FT ZN_FING C3HC4-TYPE"
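A translator like the one on slides 7 and 8 can be pictured as a lookup table from external-database statements to TrEMBL lines. The sketch below hard-codes only the three examples of slide 8; `%translation` and `translate` are hypothetical names, and a production translator would presumably parse the source formats rather than match whole strings.

```perl
use strict;
use warnings;

# Hypothetical translation table: source database => { source statement => TrEMBL line }.
# The rows mirror the three translation examples.
my %translation = (
    ENZYME  => { 'CA L-ALANINE=D-ALANINE' => 'CC   -!- CATALYTIC ACTIVITY: L-ALANINE=D-ALANINE.' },
    PROSITE => { '/SITE=3,heme_iron'      => 'FT   METAL IRON' },
    Pfam    => { 'FT DOMAIN zf_C3HC4'     => 'FT   ZN_FING C3HC4-TYPE' },
);

# Translate one statement from an external database into target syntax.
# Returns undef when no translation is defined, so untranslatable statements
# are skipped instead of being copied verbatim into the target.
sub translate {
    my ($source_db, $statement) = @_;
    return unless exists $translation{$source_db};   # avoid autovivifying a new key
    return $translation{$source_db}{$statement};
}

print translate('PROSITE', '/SITE=3,heme_iron'), "\n";   # prints "FT   METAL IRON"
```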

  9. Automatic Translation: introduce a standard/reference database. It must be highly reliable and well curated. Example: SWISS-PROT. [Diagram: XDB, Standard, Target]

  10. Extract Reference Entries: use the XDB to extract entries from the standard database.
      Example: Pfam PF00509 (Hemagglutinin) selects these SWISS-PROT entries (ID/accession):
        HEMA_IAVI7/P03435  HEMA_IANT6/P03436  HEMA_IAAIC/P03437  HEMA_IAX31/P03438
        HEMA_IAME2/P03439  HEMA_IAEN7/P03440  HEMA_IABAN/P03441  HEMA_IADU3/P03442
        HEMA_IADA1/P03443  HEMA_IADMA/P03444  HEMA_IADM1/P03445  HEMA_IADA2/P03446
        HEMA_IASH5/P03447
      [Diagram: Pfam, SWISS-PROT, TrEMBL]

  11. Extract Common Annotation
      132 entries read; each line below is preceded by the number of entries that contain it:

      131  ID   HEMA_XXXXX
      125  DE   HEMAGGLUTININ PRECURSOR.
        6  DE   HEMAGGLUTININ.
      131  GN   HA
      130  CC   -!- FUNCTION: HEMAGGLUTININ IS RESPONSIBLE FOR ATTACHING THE
      130  CC   VIRUS TO CELL RECEPTORS AND FOR INITIATING INFECTION.
      125  CC   -!- SUBUNIT: HOMOTRIMER. EACH OF THE MONOMER IS FORMED BY TWO
      125  CC   CHAINS (HA1 AND HA2) LINKED BY A DISULFIDE BOND.
       75  DR   HSSP; P03437; 1HGD.
       31  DR   HSSP; P03437; 1DLH.
      131  KW   HEMAGGLUTININ; GLYCOPROTEIN; ENVELOPE PROTEIN
      102  KW   SIGNAL
        1  KW   COAT PROTEIN; POLYPROTEIN; 3D-STRUCTURE
      130  FT   CHAIN HA1 CHAIN.
      107  FT   CHAIN HA2 CHAIN.
      102  FT   SIGNAL
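The counting behind slide 11 can be sketched as follows: for each distinct annotation line, count how many reference entries contain it, then keep the lines shared by a large enough fraction of the entries. The sub name `common_annotation` and the 0.9 default threshold are hypothetical, and FT lines are assumed to have their position columns removed beforehand so that they compare equal across entries, as they appear on the slide.

```perl
use strict;
use warnings;

# Sketch: return the annotation lines found in at least $threshold
# (a fraction between 0 and 1) of the reference entries.
# $entries is an array ref of flat-file entries, one string each.
sub common_annotation {
    my ($entries, $threshold) = @_;
    $threshold //= 0.9;                 # hypothetical default threshold
    my $n = @$entries;
    return () unless $n;
    my %count;
    for my $entry (@$entries) {
        # distinct annotation lines of this entry, so each entry counts once per line
        my %seen;
        $seen{$_} = 1 for grep { /^(?:DE|GN|CC|KW|FT)\s/ } split /\n/, $entry;
        $count{$_}++ for keys %seen;
    }
    # keep lines shared by enough entries, in a stable (sorted) order
    return grep { $count{$_} / $n >= $threshold } sort keys %count;
}
```

Applied to the counts of slide 11 with, say, a 0.95 threshold, lines found in 130 or more of the 132 entries (such as GN HA and the FUNCTION comment) would be kept, while lines such as KW SIGNAL (102 entries) would be dropped.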

  12. Store Common Annotation: store the pattern used and the extracted common annotation in a separate database. [Diagram: XDB, Standard, Target, Common]

  13. Add Annotation to Target: extract entries from the target and add the common annotation to them. [Diagram: XDB, Standard, Target, Common]
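Slides 12 and 13 store the common annotation under the condition that produced it and later add it to matching target entries. A minimal sketch, assuming the "Common" database is an in-memory hash keyed by Pfam accession (`%common`, hypothetical, filled with a few lines from slide 11) and that the condition is a `DR   Pfam;` cross-reference in the TrEMBL entry; `add_common_annotation` is likewise an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical "Common" store: condition (a Pfam accession) => annotation
# lines previously extracted from the standard database.
my %common = (
    'PF00509' => ['DE   HEMAGGLUTININ PRECURSOR.', 'GN   HA', 'KW   SIGNAL'],
);

# Add the stored common annotation to a target entry that satisfies the
# condition, i.e. has a DR line pointing to that Pfam entry. Lines already
# present are not duplicated. New lines are appended at the end for
# simplicity; a real writer would keep the flat-file line order.
sub add_common_annotation {
    my ($entry, $pfam_ac) = @_;
    return $entry unless $entry =~ /^DR\s+Pfam; \Q$pfam_ac\E;/m;
    $entry .= "\n" unless $entry =~ /\n\z/;      # make sure we append whole lines
    for my $line (@{ $common{$pfam_ac} || [] }) {
        $entry .= "$line\n" unless $entry =~ /^\Q$line\E$/m;
    }
    return $entry;
}
```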

  14. Modelling of the Rules
     • Definition of condition types
     • Definition of action types
     • Encoding the logic
     • Storage and retrieval of the rules
     • Version control
     • Monitoring the results

  15. Formal Language for the Rules
     • #Comment
         #RULE RU000001
         #DATE 1997-04-23
     • ?Condition
         ?PSAC PS00057
         ?SPOC PLANTA
     • !Action
         !SPDE L-LACTATE DEHYDROGENASE
         !ECNO 1.1.1.27
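The rule format of slide 15 is line oriented: the first character says whether a line is a comment ('#'), a condition ('?') or an action ('!'), followed by a type code and its argument. A minimal parsing sketch; the sub `parse_rule` and the hash-of-arrays it returns are hypothetical choices, not the RuleBase storage format.

```perl
use strict;
use warnings;

# Parse one rule: each line is <sigil><TYPE> <argument>, where the sigil is
# '#' (comment/header), '?' (condition) or '!' (action). Lines that do not
# fit this shape are ignored.
sub parse_rule {
    my ($text) = @_;
    my %rule = (comments => {}, conditions => [], actions => []);
    for my $line (split /\n/, $text) {
        next unless $line =~ /^([#?!])(\w+)\s+(.*?)\s*$/;
        my ($sigil, $type, $arg) = ($1, $2, $3);
        if    ($sigil eq '#') { $rule{comments}{$type} = $arg }
        elsif ($sigil eq '?') { push @{ $rule{conditions} }, [$type, $arg] }
        else                  { push @{ $rule{actions} },    [$type, $arg] }
    }
    return \%rule;
}
```

Conditions and actions are kept as ordered [TYPE, ARGUMENT] pairs, so a rule can list the same type more than once.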

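  16. Implementation of Condition Types
     • Every condition type must be implemented.
     • Example: a Perl routine for '?PSAC': does the protein have a link to a given PROSITE entry? The entry text is expected in $_.

         sub condition_PSAC {
             my $ac = shift;
             # true if a DR line cross-references PROSITE accession $ac;
             # \Q...\E and the trailing ';' require an exact accession match
             return /^DR\s+PROSITE; \Q$ac\E;/m;
         }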

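  17. Implementation of Action Types
     • Every action type must be implemented.
     • Example: add the enzyme code to the entry, either in the flat file (entry text in $_)

         sub action_ECNO {
             my $ecno = shift;
             # append " (EC n.n.n.n)" to the first DE line of the entry
             s/^(DE.*)$/$1 (EC $ecno)/m;
         }

       or in a relational table, with acc and ecno standing for the entry's accession and the EC number:

         insert into Trembl2Enzyme values (acc, ecno);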
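Once every condition and action type is implemented as a routine, a rule can be executed by looking its type codes up in dispatch tables. The sketch below reuses the `$_` convention of slides 16 and 17; the tables `%condition` and `%action` and the sub `apply_rule` are hypothetical, only the PSAC and ECNO types are covered, and all conditions of a rule are combined with AND (OR is the subject of slide 18).

```perl
use strict;
use warnings;

# Condition and action routines in the style of slides 16 and 17:
# the entry text is in $_, the rule argument is the first parameter.
sub condition_PSAC { my $ac = shift;   return /^DR\s+PROSITE; \Q$ac\E;/m }
sub action_ECNO    { my $ecno = shift; s/^(DE.*)$/$1 (EC $ecno)/m }

# Hypothetical dispatch tables: rule type code => implementing routine.
my %condition = (PSAC => \&condition_PSAC);
my %action    = (ECNO => \&action_ECNO);

# Apply one rule (conditions and actions as [TYPE, ARG] pairs) to an entry:
# run the actions only if every condition holds. Unknown types die, since
# every condition and action type must be implemented.
sub apply_rule {
    my ($entry, $conditions, $actions) = @_;
    local $_ = $entry;                      # routines work on $_
    for my $c (@$conditions) {
        my $test = $condition{ $c->[0] } or die "no condition type $c->[0]\n";
        return $entry unless $test->($c->[1]);
    }
    for my $act (@$actions) {
        my $do = $action{ $act->[0] } or die "no action type $act->[0]\n";
        $do->($act->[1]);
    }
    my $result = $_;                        # copy before local $_ is restored
    return $result;
}
```

A rule parsed from the format of slide 15 could be passed in as its list of conditions and list of actions.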

  18. Encoding the Logic
     • Any logical expression such as a AND (b OR c) BUT NOT d can be written without brackets (in disjunctive normal form, with AND binding more tightly than OR) as a AND b AND NOT d OR a AND c AND NOT d
     • Rules can be identified by their conditions: "a&b&-d|a&c&-d"
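Because the bracket-free condition strings of slide 18 are in disjunctive normal form, evaluating one only needs two levels of splitting: on '|' for the alternatives and on '&' for the conditions within each alternative. A minimal sketch, with `eval_dnf` a hypothetical name and the conditions that hold for an entry passed as a hash reference.

```perl
use strict;
use warnings;

# Evaluate a condition string such as "a&b&-d|a&c&-d":
# '|' separates alternatives (OR), '&' joins conditions (AND),
# and a leading '-' negates a single condition.
# $true is a hash ref of the condition names that hold for the entry.
sub eval_dnf {
    my ($expr, $true) = @_;
    for my $clause (split /\|/, $expr) {
        my $all = 1;
        for my $lit (split /&/, $clause) {
            my ($neg, $name) = $lit =~ /^(-?)(\w+)$/ or die "bad literal '$lit'\n";
            my $holds = $true->{$name} ? 1 : 0;
            $holds = !$holds if $neg;
            unless ($holds) { $all = 0; last }
        }
        return 1 if $all;        # one satisfied alternative is enough
    }
    return 0;
}
```

For example, with a and c true and d false, "a&b&-d|a&c&-d" is true through its second alternative, as a AND (b OR c) BUT NOT d requires.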

  19. Automatic Annotation of TrEMBL: extract conditions from the XDB; group SWISS-PROT by conditions; extract the common annotation; group TrEMBL by conditions; add the common annotation to TrEMBL. [Diagram: ENZYME, Pfam, PROSITE, SWISS-PROT, TrEMBL, RuleBase]

  20. Results: RuleBase
     • Source: PROSITE patterns
     • 262 rules
     • 597 conditions
     • 1099 actions
     • Result:
         2951 of 29330 new TrEMBL 5 entries
         1443 of 15078 new TrEMBL 6 entries
         9658 of 106330 existing TrEMBL 5 entries
         3254 of 140635 existing TrEMBL 6 entries
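Expressed as fractions, the result counts above correspond to the following shares of each TrEMBL set (simple division of the reported figures, rounded to one decimal place), which fits the "low coverage" point in the Discussion:

```latex
\frac{2951}{29330} \approx 10.1\,\%, \qquad
\frac{1443}{15078} \approx 9.6\,\%, \qquad
\frac{9658}{106330} \approx 9.1\,\%, \qquad
\frac{3254}{140635} \approx 2.3\,\%
```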

  21. Results: Keywords in TrEMBL

  22. Results: TrEMBL Annotation

  23. Discussion
     • Stable and reliable: successfully added 68,000 lines to TrEMBL
     • Carefully set thresholds, hence low coverage
     • Restricted language better than free text
     • Feedback loop between SWISS-PROT and TrEMBL
     • Rules may be implemented in a set-oriented language
     • Position-specific annotation may be improved by alignments
     • Independent of hierarchy
     • Based on multiple entries

  24. Dynamic Updates

  25. Where to get TrEMBL: ftp.ebi.ac.uk/pub/databases/sp_tr_nrdb/

  26. Credits
     • SWISS-PROT at EBI: Rolf Apweiler, Sergio Contrino, Wolfgang Fleischmann, Henning Hermjakob, Viv Junker, Fiona Lang, Claire O'Donovan, Michele Magrane, Maria Jesus Martin, Nicoletta Mitaritonna, Steffen Moeller, Stephanie Kappus
     • Collaborators: Amos Bairoch, Alain Gateau, Jean-Jacques Codani, Keith Tipton, MGD, Flybase, Pfam, and a network of more than 200 external experts