UniProt: Universal Protein Resource. Central Resource of Protein Sequence and Function • International Consortium: PIR at Georgetown University Medical Center (GUMC), European Bioinformatics Institute (EBI), Swiss Institute of Bioinformatics (SIB) • Unifies the PIR-PSD, Swiss-Prot, and TrEMBL protein sequence databases • http://www.uniprot.org
UniProt Databases • UniParc: Comprehensive Sequence Archive with Sequence History • UniRef: Non-redundant Reference Databases for Sequence Search • UniProtKB: Knowledgebase with Full Classification and Functional Annotation
UniProt Archive (UniParc) • An archive for tracking protein sequences • Comprehensive: All published protein sequences • Non-Redundant: Merge identical sequence strings • Traceable: Versioned, with ‘Active’ or ‘Obsolete’ status tag • Concise: no annotation of function, species, tissue, etc. • 5 million unique entries from 13 million source-database entries
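To make the merge-and-track behavior concrete, here is a minimal Python sketch, assuming a sequence checksum is the merge key (SHA-256 here, which is not necessarily what UniParc itself uses). The class and field names, the UPI-prefixed identifier format, and the database labels are hypothetical. Each unique sequence string gets one entry; each source-database record attached to it is a versioned cross-reference that turns obsolete when a newer version or a deletion supersedes it.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SourceXref:
    database: str        # source database label (illustrative)
    accession: str
    version: int
    active: bool = True  # False once the source record changes or is deleted


@dataclass
class ArchiveEntry:
    upi: str       # hypothetical UniParc-style identifier
    sequence: str  # the only content: no function/species/tissue annotation
    xrefs: list = field(default_factory=list)


class SequenceArchive:
    """Non-redundant archive: identical sequence strings share one entry."""

    def __init__(self):
        self.entries = {}  # sequence checksum -> ArchiveEntry
        self._live = {}    # (database, accession) -> currently active SourceXref
        self._next_id = 1

    @staticmethod
    def checksum(seq):
        return hashlib.sha256(seq.encode("ascii")).hexdigest()

    def retire(self, database, accession):
        """Mark the current version of a source record as obsolete."""
        xref = self._live.pop((database, accession), None)
        if xref is not None:
            xref.active = False

    def add(self, sequence, database, accession, version):
        seq = sequence.upper()
        key = self.checksum(seq)
        entry = self.entries.get(key)
        if entry is None:  # first time this exact sequence string is seen
            entry = ArchiveEntry(f"UPI{self._next_id:010X}", seq)
            self.entries[key] = entry
            self._next_id += 1
        # A new version of a source record supersedes (obsoletes) the old one.
        self.retire(database, accession)
        xref = SourceXref(database, accession, version)
        entry.xrefs.append(xref)
        self._live[(database, accession)] = xref
        return entry.upi
```

Because an entry holds only the sequence and its cross-references, annotation such as function, species, or tissue has to be looked up in the source records, which mirrors the "Concise" point on the slide.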
[Figure: Sub-fragments; Splice variants] UniProt Reference Clusters (UniRef) • Non-Redundant Reference Clusters for Sequence Searching • UniRef100 for Comprehensive Sequence Similarity Search • 100% sequence identity across all species, with sub-fragments merged • Derived from UniProtKB, with splice variants kept as separate entries • Plus sequences from additional UniParc sources (e.g., Ensembl, IPI, EMBL_WGS)
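The UniRef100 rule (identical sequences and sub-fragments merged, regardless of species) can be sketched as a greedy pass over sequences sorted longest first. This is an illustration, not the production UniRef procedure: the function name, the exact-substring test for "sub-fragment", and the `min_fragment_len` default of 11 residues are assumptions, and the pairwise scan is quadratic.

```python
def uniref100_like(records, min_fragment_len=11):
    """Cluster (id, sequence) pairs: an identical sequence or a sub-fragment
    joins the cluster of the longest sequence that contains it."""
    # Longest first, ties broken by id, so containing sequences seed clusters.
    ordered = sorted(records, key=lambda r: (-len(r[1]), r[0]))
    clusters = []  # each: representative id, its sequence, member ids
    for rec_id, seq in ordered:
        home = None
        for cluster in clusters:
            rep_seq = cluster["sequence"]
            # Hypothetical rule: exact match, or a long-enough exact substring.
            if seq == rep_seq or (len(seq) >= min_fragment_len and seq in rep_seq):
                home = cluster
                break
        if home is None:
            clusters.append({"representative": rec_id, "sequence": seq,
                             "members": [rec_id]})
        else:
            home["members"].append(rec_id)
    return clusters
```

A splice variant with an internal insertion is not a substring of the canonical sequence, so in this sketch it stays in its own cluster, consistent with keeping splice variants as separate entries.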
[Chart: Database Size, Release 4.4 (03/29/05)] UniProt Reference Clusters (UniRef) • UniRef90/50 for Faster Searches Using Reduced Data Sets • UniRef90: 90% sequence identity (35% reduction from UniRef100) • UniRef50: 50% sequence identity (65% reduction) • Representative Sequence for each cluster
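UniRef90 and UniRef50 relax the identity requirement, and each cluster is reported through one representative. The sketch below swaps in a greedy, CD-HIT-style clustering: the longest remaining sequence becomes a representative, and each later sequence joins the first representative it matches at or above the threshold. Assumptions: identity is approximated as longest-common-subsequence length divided by the shorter sequence's length, which ignores gap penalties and overestimates identity compared with a real alignment; all function names are hypothetical. The `reduction` helper shows the slide's arithmetic: a 35% reduction means UniRef90 holds about 0.65 times as many clusters as UniRef100.

```python
def lcs_length(a, b):
    """Longest common subsequence length (one-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch == other else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def identity(a, b):
    # Crude proxy for percent identity: matched residues / shorter length.
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))


def greedy_cluster(seqs, threshold):
    """seqs: dict id -> sequence. Returns clusters as lists of ids;
    the first id in each list is the cluster's representative."""
    clusters = []
    for sid in sorted(seqs, key=lambda s: (-len(seqs[s]), s)):
        for cluster in clusters:
            if identity(seqs[sid], seqs[cluster[0]]) >= threshold:
                cluster.append(sid)
                break
        else:  # no representative matched: start a new cluster
            clusters.append([sid])
    return clusters


def reduction(n_before, n_after):
    """Fractional size reduction, e.g. 100 -> 65 clusters is 0.35."""
    return 1 - n_after / n_before
```

Running `greedy_cluster` again at 0.5 over only the representatives from the 0.9 pass would give a coarser, nested level in this sketch; the trade-off is that sequences are compared only against representatives, never against every cluster member.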
UniProt Knowledgebase (UniProtKB) • Objective: Stable, Comprehensive, Fully Classified, Richly and Accurately Annotated • A single record describes all protein products derived from a given gene in a given species • Information Content: • Isoform Presentation: Alternatively Spliced Forms, Proteolytic Cleavage, and Post-Translational Modification (each with its own feature identifier, FTid) • Nomenclature: Gene/Protein Names (from Nomenclature Committees) • Family Classification and Domain Identification: InterPro and PIRSF • Functional Annotation: Function, Functional Site, Developmental Stage, Catalytic Activity, Modification, Regulation, Induction, Pathway, Tissue Specificity, Subcellular Location, Disease, Process
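The "one record per gene per species" model can be sketched as a small data structure. Everything here is hypothetical (class names, the `(gene, species)` key, the example gene name and FTid strings); it only shows that all products of a gene, such as splice isoforms or cleaved chains, hang off one record as features, each with its own unique FTid, alongside family classifications and functional comments.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    ftid: str   # feature identifier (hypothetical format, e.g. "VSP_TEST1")
    kind: str   # e.g. "splice variant", "chain", "modified residue"
    start: int  # 1-based, inclusive
    end: int

    def __post_init__(self):
        if not 1 <= self.start <= self.end:
            raise ValueError(f"bad range {self.start}-{self.end}")


@dataclass
class KBRecord:
    gene: str
    species: str
    protein_name: str = ""
    families: list = field(default_factory=list)  # e.g. InterPro / PIRSF ids
    comments: dict = field(default_factory=dict)  # topic -> text, e.g. "Pathway"
    features: dict = field(default_factory=dict)  # FTid -> Feature

    def add_feature(self, feature):
        # FTids must be unique within a record so each product is addressable.
        if feature.ftid in self.features:
            raise ValueError(f"duplicate FTid {feature.ftid}")
        self.features[feature.ftid] = feature


class Knowledgebase:
    """One record per (gene, species) pair."""

    def __init__(self):
        self.records = {}

    def record_for(self, gene, species, protein_name=""):
        key = (gene, species)
        if key not in self.records:
            self.records[key] = KBRecord(gene, species, protein_name)
        return self.records[key]
```

Keying on the pair means the same gene in two species yields two records, while every isoform of the gene within one species stays in a single record.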
UniProtKB Report (II) http://www.pir.uniprot.org/cgi-bin/upEntry?id=PH4H_HUMAN