
Tomer Altman Bioinformatics Research Group SRI International taltman@ai.sri



Presentation Transcript


  1. And now for our ‘Feature’ presentation: Automatic Loading of Protein Sequence Annotation Data from UniProt to Pathway / Genome PGDBs
     Tomer Altman, Bioinformatics Research Group, SRI International, taltman@ai.sri.com

  2. Protein Features in Pathway Tools
     • Represent annotations along a polypeptide sequence
     • Can represent anything from active sites to secondary structure
     • Are defined by a set of classes rooted at '|Protein-Features|
     • Are found in the 'FEATURES slot of '|Proteins| instances

  3. Protein Features Displayed

  4. BioWarehouse UniProt Loader
     • Parses the XML versions of the SwissProt and TrEMBL databases
     • Loads the Feature table with the corresponding sequence annotation entries
     • BioWarehouse is open-source software
     • Currently being extended to support alternate sequences and sequence annotation citations
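The parsing step on this slide can be illustrated with a small Python sketch. This is not the BioWarehouse loader itself; it only shows how feature annotations might be pulled out of UniProt-style XML into flat rows. The sample entry, the accession `P00000`, the namespace URI, and the function name `parse_features` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; check the root element of the UniProt XML release in use.
NS = {"u": "http://uniprot.org/uniprot"}

# Hypothetical, hand-written entry in the general shape of UniProt XML.
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>P00000</accession>
    <feature type="transmembrane region" description="Helical">
      <location><begin position="12"/><end position="32"/></location>
    </feature>
    <feature type="metal ion-binding site" description="Zinc">
      <location><position position="45"/></location>
    </feature>
  </entry>
</uniprot>"""


def parse_features(xml_text):
    """Return (accession, feature type, start, end) rows for every feature."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("u:entry", NS):
        accession = entry.findtext("u:accession", namespaces=NS)
        for feat in entry.findall("u:feature", NS):
            loc = feat.find("u:location", NS)
            point = loc.find("u:position", NS)
            if point is not None:
                # Single-residue feature: start and end coincide.
                start = end = int(point.get("position"))
            else:
                # Range feature. Real data may omit the position attribute
                # for uncertain boundaries; this sketch does not handle that.
                start = int(loc.find("u:begin", NS).get("position"))
                end = int(loc.find("u:end", NS).get("position"))
            rows.append((accession, feat.get("type"), start, end))
    return rows


rows = parse_features(SAMPLE)
```

Each row corresponds to one sequence annotation entry of the kind the slide describes being loaded into the Feature table.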

  5. Extensions to the Pathway Tools Schema
     • Rooted as a sub-class under '|Protein-Segments|
     • Mirrors protein features available from the UniProt controlled vocabulary
     • Makes distinctions between variants due to human activity, variants within an organism, and variants across a strain population

  6. UniProt Feature Importer
     • PGDB proteins are mapped to entries in UniProt via UniProt Accession Numbers
     • A protein feature is imported from UniProt if it does not already exist in the PGDB
     • Identity is based on the associated protein object, the '|Protein-Features| sub-class, and the location along the protein
     • If a previously imported protein feature has been deleted from UniProt, it is removed from the PGDB
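The import and removal rules on this slide amount to a set difference over feature identities. The sketch below is a minimal illustration under that reading, not the importer's actual code; `FeatureKey`, `sync_features`, the protein identifier, and the example coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureKey:
    """Identity of a feature: protein object, feature sub-class, location."""
    protein: str
    feature_class: str
    start: int
    end: int


def sync_features(previously_imported, uniprot_current):
    """Compare two sets of FeatureKey and return (to_add, to_remove).

    Only previously imported features are candidates for removal, so
    manually created features are never passed in and never touched.
    """
    to_add = uniprot_current - previously_imported
    to_remove = previously_imported - uniprot_current
    return to_add, to_remove


# Hypothetical example data.
old = {
    FeatureKey("PROT-1", "Transmembrane-Regions", 12, 32),
    FeatureKey("PROT-1", "Metal-Binding-Sites", 45, 45),
}
new = {
    FeatureKey("PROT-1", "Transmembrane-Regions", 12, 32),
    FeatureKey("PROT-1", "Mutagenesis-Variants", 60, 60),
}
to_add, to_remove = sync_features(old, new)
```

Making the key a frozen dataclass gives it value equality and hashing, so the same protein, sub-class, and location always compare as the same feature.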

  7. Current Statistics for EcoCyc
     • 19032 total '|Protein-Features| instances (out of 75537 total frames in EcoCyc)
       • 2130 manually created instances
       • 16902 imported from UniProt
     • 5586 '|Transmembrane-Regions|
     • 1939 '|Metal-Binding-Sites|
     • 1647 '|Mutagenesis-Variants|
     • 1146 '|Conserved-Regions|
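The figures on this slide are internally consistent: the manual and imported counts sum to the total, and protein features make up about a quarter of all EcoCyc frames. A short Python check of that arithmetic, using only the numbers from the slide (the variable names are illustrative):

```python
# Counts as reported on the slide.
total_features = 19032
total_frames = 75537
manual = 2130
imported = 16902
listed_classes = {
    "Transmembrane-Regions": 5586,
    "Metal-Binding-Sites": 1939,
    "Mutagenesis-Variants": 1647,
    "Conserved-Regions": 1146,
}

# Manual and imported instances should account for every feature.
assert manual + imported == total_features

share_of_frames = total_features / total_frames   # features as a fraction of all frames
share_imported = imported / total_features        # fraction of features from UniProt
listed_total = sum(listed_classes.values())       # instances in the four listed classes

print(f"{share_of_frames:.1%} of frames are protein features")
print(f"{share_imported:.1%} of protein features were imported")
print(f"{listed_total} instances fall in the four listed classes")
```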

  8. Current Work
     • Extending the UniProt Loader to import variant sequence information and citations
     • Adding an interface to the UniProt Feature Importer from Pathway Tools
     • Creating databases on PublicHouse (a publicly accessible BioWarehouse instance) to allow our users to import protein features into their own PGDBs

  9. Acknowledgements
     • Alex Shearer
     • Suzanne Paley
     • Ingrid Keseler
     • Valerie Wagner
     • EcoCyc.org
