
Bioinformatic Databases


Presentation Transcript


    1. Bioinformatic Databases Norie de la Cruz, PhD

    2. Take home: The internet is a powerful resource containing a large volume of data and tools to manipulate them. Unfortunately, connecting data between them can sometimes be tricky.
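One reason linking is tricky is that each database tends to key the same entity under its own identifier, so records can only be joined through a cross-reference table that may be incomplete. The following is a minimal sketch of that situation, not any real database's API; the two record sets, the identifiers, and the `link_records` helper are all hypothetical and invented for illustration.

```python
# Hypothetical records from two databases; every identifier here is invented.
db_a = {"A:0001": {"symbol": "Tp53"}, "A:0002": {"symbol": "Brca1"}}
db_b = {"B:9001": {"length": 1176}, "B:9003": {"length": 5592}}

# Hypothetical cross-reference table mapping database A ids to database B ids.
xrefs = {"A:0001": "B:9001", "A:0002": "B:9002"}


def link_records(db_a, db_b, xrefs):
    """Join records through the xref table and report links that cannot be resolved."""
    linked, unresolved = {}, []
    for id_a, record_a in db_a.items():
        id_b = xrefs.get(id_a)
        # A link fails if there is no xref, or the xref points at a missing record.
        if id_b is None or id_b not in db_b:
            unresolved.append(id_a)
            continue
        linked[id_a] = {**record_a, **db_b[id_b], "xref": id_b}
    return linked, unresolved


linked, unresolved = link_records(db_a, db_b, xrefs)
print(linked)
print(unresolved)
```

Here the second record has a cross-reference, but it points at an identifier that database B does not hold, so it ends up in the unresolved list rather than silently disappearing.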

    3. Overview: Whirlwind tour of Web databases. The Rat Genome Database: data, tools, and operations.

    4. Bioinformatic databases on the WWW: Loose definition of database here. Vary widely in terms of offerings, data, tools, and specialization. Vary widely in terms of data collection methodologies.

    5. Some classifications per NAR: Major sequence repositories; Gene expression; Comparative genomics; Gene identification and structure; Genetic and physical maps; Genomic databases; Intermolecular interactions; Metabolic pathways and cellular regulation; Mutation databases; Pathology.

    6. Some classifications per NAR: Protein databases; Protein sequence motifs; Proteome resources; Retrieval systems; RNA sequences; Structure; Transgenics; Varied biomedical content.

    7. Major Sequence Repositories: GenBank, RefSeq, DDBJ, Ensembl, UniGene. Collection of sequence data: genomic, markers, genes, proteins. Some provide tools to expedite access: BLAST search, alignment tools, translation tools, etc. Varying degrees of quality control: machine data upload; human curation and QC.

    8. Major Sequence Repositories: GenBank All known nucleotide and protein sequences. Provides submission system for various authors. Little QC.

    9. Major Sequence Repositories: RefSeq Non redundant collection of naturally occurring biological molecules Human QC Comprehensive, integrated set of sequences for major research organisms Provides a stable reference for further characterization of sequences including comparative analyses, mutations, expression, etc.

    10. Major Sequence Repositories: UniGene Attempts to cluster GenBank sequences into gene-oriented clusters. Each cluster contains sequences that represent one gene. Provides a stable reference for further characterization of sequences including comparative analyses, mutations, expression, etc.

    11. Major Sequence Repositories: DDBJ (DNA Data Bank of Japan) Japanese equivalent to NCBI efforts. Attempting to gather all known nucleotide and protein sequences. Part of the International Nucleotide Sequence Database Collaboration.

    12. Major Sequence Repositories: EMBL Nucleotide Sequence Database European equivalent to NCBI efforts. Attempting to gather all known nucleotide and protein sequences. Part of the International Nucleotide Sequence Database Collaboration.

    13. Major Sequence Repositories: UCSC Genome Browser Visual representation of genome and sequence data Run by University of California at Santa Cruz

    14. Comparative Genomics

    15. Comparative Genomics: Microbial Genome Database for Comparative Analysis

    16. Comparative Genomics: Some specialized sites

    17. Comparative Genomics: Clusters of Orthologous Groups Phylogenetic classification of the proteins encoded in complete genomes Proteins grouped according to sequence by a program called COGNITOR Must be represented in at least three species in a group of 43 species representing phylogenetic lineages Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.
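The membership rule on this slide (a group must span at least three lineages, with paralogs from one lineage counting once) can be sketched as a simple check. This is only an illustration of that rule under assumed inputs; it is not how COGNITOR works, and the function name, protein ids, and lineage labels are hypothetical.

```python
def passes_lineage_rule(members, min_lineages=3):
    """Check whether a candidate group spans enough distinct lineages.

    members: iterable of (protein_id, lineage) pairs for one candidate group.
    Paralogs from the same lineage contribute a single lineage to the count.
    """
    lineages = {lineage for _, lineage in members}
    return len(lineages) >= min_lineages


# Hypothetical candidate group: two paralogs in lineage_A plus one protein in lineage_B.
candidate = [("p1", "lineage_A"), ("p2", "lineage_A"), ("p3", "lineage_B")]
print(passes_lineage_rule(candidate))

# Adding a protein from a third lineage satisfies the rule.
extended = candidate + [("p4", "lineage_C")]
print(passes_lineage_rule(extended))
```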

    18. Gene Expression

    19. Gene Expression: ArrayExpress

    20. Gene Expression: Edinburgh Mouse Atlas Project

    21. Gene Expression: HugeIndex (Human Gene Expression Index)

    22. Gene Expression: Other specialized sites

    23. Gene Identification and Structure

    24. Gene Identification and Structure: SNP Consortium database

    25. Gene Identification and Structure: Alternative Splicing Annotation Project (ASAP)

    26. Gene Identification and Structure: PromEC

    27. Gene Identification and Structure: Some other specialized sites

    28. Genetic and physical maps Repository for marker information Data on gene locations within the genome Map of cloned sequences Tools to integrate information across genomes

    29. Genetic and Physical Maps: HuGeMap

    30. Genetic and Physical Maps: GeneMap99

    31. Genomic Databases: Data repositories for research results on various model organisms (rat, human, fruit fly, worm, Arabidopsis, some other rodent). Linking information across databases. Tools to organize and integrate information.

    32. Genomic Databases: The Rat Genome Database

    33. Genomic Databases: FlyBase

    34. Genomic Databases: EcoGene

    35. Genomic Databases: Some other examples

    36. Mutation Databases: Allele distributions in populations. Inherited genetic diseases. Mutations in proteins implicated in disease development.

    37. Mutation Databases: ALFRED designed to make allele frequency data on anthropologically defined human population samples readily available to the scientific community link these polymorphism data to the molecular genetics-human genome databases
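As a reminder of what an allele frequency is, the sketch below derives frequencies from genotype counts in a single population sample, counting two alleles per individual. The counts are invented, the `allele_frequencies` helper is hypothetical, and nothing here reflects ALFRED's actual data format or interface.

```python
def allele_frequencies(genotype_counts):
    """Compute allele frequencies from diploid genotype counts.

    genotype_counts maps an (allele, allele) tuple to the number of individuals
    carrying that genotype; each individual contributes two alleles.
    """
    allele_counts = {}
    for (allele_1, allele_2), n_individuals in genotype_counts.items():
        allele_counts[allele_1] = allele_counts.get(allele_1, 0) + n_individuals
        allele_counts[allele_2] = allele_counts.get(allele_2, 0) + n_individuals
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: count / total for allele, count in allele_counts.items()}


# Invented sample of 100 individuals at one biallelic site.
sample = {("A", "A"): 30, ("A", "G"): 50, ("G", "G"): 20}
freqs = allele_frequencies(sample)
print(freqs)
```

For this sample, allele A is seen 110 times out of 200 alleles and G 90 times, giving frequencies of 0.55 and 0.45.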

    38. Mutation Databases: Human Gene Mutation Database an attempt to collate known (published) gene lesions responsible for human inherited disease provides information of practical diagnostic importance to researchers and diagnosticians in human molecular genetics physicians interested in a particular inherited condition in a given patient or family genetic counsellors.

    39. Mutation Databases: Online Mendelian Inheritance in Man (OMIM) catalog of human genes and genetic disorders contains textual information, pictures, and reference information

    40. Mutation Databases: Other examples Atlas of Genetics and Cytogenetics in Oncology and Haematology Database of Germline p53 Mutations SV40 Large T-Antigen Mutant Database KinMutBase Disease causing kinase mutations

    41. Protein Databases Protein sequences collection Clustering of protein data into families Specialized protein sites Organism Function Large variety of enzymes

    42. Protein Databases: InterPro a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences. It amalgamates the major protein signature databases, whose data have been manually integrated and curated: PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs.

    43. Protein Databases: ProtoNet provides global classification of the proteins, from the SWISS-PROT database into hierarchical clusters clustering is based on an all-against-all BLAST similarity search
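To give a feel for clustering from all-against-all similarity scores, here is a minimal sketch that groups proteins by single linkage at a chosen score threshold, using a union-find structure. This is a deliberately simplified stand-in, not ProtoNet's actual algorithm; the protein ids, the scores, and the `single_linkage_clusters` helper are hypothetical.

```python
def single_linkage_clusters(items, scores, threshold):
    """Group items so that any pair scoring at or above the threshold shares a cluster.

    scores maps an (item, item) pair to a similarity score (higher = more similar).
    """
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), set()).add(item)
    return sorted(sorted(members) for members in clusters.values())


# Invented pairwise similarity scores between four hypothetical proteins.
proteins = ["p1", "p2", "p3", "p4"]
scores = {("p1", "p2"): 0.9, ("p2", "p3"): 0.8, ("p3", "p4"): 0.2, ("p1", "p4"): 0.1}

strict = single_linkage_clusters(proteins, scores, threshold=0.5)
loose = single_linkage_clusters(proteins, scores, threshold=0.15)
print(strict)
print(loose)
```

Lowering the threshold merges clusters that were separate at the stricter cut, which is the intuition behind reading a hierarchy of clusters off a single set of pairwise scores.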

    44. Protein Databases: iProClass an integrated resource that provides comprehensive family relationships and structural/functional features of proteins currently consists of non-redundant PIR and SwissProt/TrEMBL proteins 36,200 PIR superfamilies 145,300 families 5720 domains 1300 motifs 280 post-translational modification sites links to over 50 biological databases.

    45. Protein Databases: Other Examples Nuclear Protein Database Proteins localized in the nucleus PLANT-Pls Plant protease inhibitors SWISS-PROT/TrEMBL Curated protein sequences SENTRA Sensory signal transduction proteins Ribonuclease P Database

    46. Protein Sequence Motifs Alignment of protein sequences Organization of proteins into families

    47. Protein Sequence Motifs: BLOCKS multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins Tools: Block Searcher -- compare a protein or DNA sequence to a database of protein blocks Get Blocks -- retrieve blocks Block Maker -- create new blocks

    48. Protein Sequence Motifs: Pfam a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families. For each family in Pfam you can: Look at multiple alignments View protein domain architectures Examine species distribution Follow links to other databases View known protein structures

    49. Protein Sequence Motifs: PROSITE database of protein families and domains. It consists of biologically characterized sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs currently contains patterns and profiles specific for more than a thousand protein families or domains. each of these signatures comes with documentation providing background information on the structure and function of these proteins
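PROSITE patterns use a compact syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, `-` between positions) that maps closely onto regular expressions. The sketch below shows one way to translate a pattern and scan a sequence with it. The `prosite_to_regex` helper is hypothetical, the example patterns and test sequence are illustrative only, and the translation does not handle anchors placed inside square brackets.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    pattern = pattern.rstrip(".")  # patterns are conventionally terminated by a period
    parts = []
    for element in pattern.split("-"):
        # {ABC} means "any residue except A, B, C"; do this before creating new braces.
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repeat count.
        element = element.replace("(", "{").replace(")", "}")
        # Lowercase x is the wildcard; residue codes are uppercase, so this is safe.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- and C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


# Illustrative patterns written in PROSITE syntax.
regex_a = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
regex_b = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex_a)
print(regex_b)

# Made-up sequence constructed to contain one match for the first pattern.
sequence = "MK" + "CAACAAALAAAAAAAAHAAAH" + "GG"
match = re.search(regex_a, sequence)
print(match.start(), match.group())
```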

    50. Protein Sequence Motifs: Other Examples ASC Active Sequence Collection Biologically active oligopeptides ClusTr Automatic classification of SWISS-PROT and TrEMBL proteins TMPDB Experimentally-characterized transmembrane topology O-GLYCBASE O- and C- linked glycosylation sites in proteins

    51. RNA Sequences Repository of RNA sequences RNA structure data RNA metabolism information Specialized site by organism, function, etc

    52. RNA Sequences: HyPaLib contains annotated structural elements characteristic for certain classes of structural and/or functional RNAs developing software tools that allow a user to search sequence databases for any pattern in HyPaLib

    53. RNA Sequences: Rfam a collection of multiple sequence alignments and covariance models representing non-coding RNA families allow the user to search a query sequence against a library of covariance models, and view multiple sequence alignments and family annotation

    54. RNA Sequences: tRNA sequences compilation of tRNA Sequences and Sequences of tRNA genes

    55. RNA Sequences: Other Examples 16S and 23S Ribosomal RNA Mutation Database ACTIVITY functional DNA/RNA site activity PLANTncRNAs Plant non-coding RNAs RNA Modification Database Naturally modified nucleosides in RNA

    56. Structure Information on protein structure derived from physical data crystallography, NMR Classification of proteins according to tertiary structures Specialized site for specific proteins

    57. Structure: ASTRAL provides databases and tools useful for analyzing protein structures and their sequences Partially derived from the SCOP database (Structural Classification of Proteins)

    58. Structure: SCOP Comprehensive ordering of proteins of known structure based on their evolutionary and structural relationships. Protein domains are grouped into species and hierarchically classified into families, superfamilies, folds, and classes.

    59. Structure: PDB Structure data determined by X-ray crystallography and NMR

    60. Structure: Other Examples CADB conformation angles of protein structures, with associated crystallographic data Database of Macromolecular Movements DSDBase Disulfide Bonds in proteins PSSH alignment between sequences and tertiary structures SUPERFAMILY Assignments of proteins to structural superfamilies

    61. Other Databases Intermolecular Interactions Metabolic Pathways and Cellular Regulation Pathology Proteome Resources Retrieval Systems and Database Structure Transgenics Varied Medical Content

    62. Other Databases: Intermolecular Interactions BIND Molecular interactions, complexes and pathways DIP (Database of Interacting Proteins) Experimentally determined protein-protein interactions KDBI Kinetic data on biomolecular interactions

    63. Other Databases: Metabolic Pathways and Cellular Regulation KEGG Kyoto Encyclopedia of Genes and Genomes MetaCyc Metabolic Pathways and Enzymes from Various organisms PathDB EcoCyc E. coli K-12 genome and pathway data PRODORIC gene regulation and regulatory networks in prokaryotes

    64. Other Databases: Pathology BayGenomics cardiovascular and pulmonary disease INFEVERS hereditary inflammatory disorder GOLD.db lipid-associated disorders Mouse Tumor Biology Database

    65. Other Databases: Proteome Resources GELBANK 2D gel data repository REBASE Restriction enzymes and associated methylases SWISS-2DPAGE Annotated two-dimensional gel electrophoresis database

    66. Other Databases: Retrieval Systems and Database Structure TESS Transcription Element search system Virgil Database interconnectivity

    67. Other Databases: Transgenics Cre Transgenic database: Cre transgenic mouse lines. Transgenic/targeted mutation database: information on transgenic animals and targeted mutations.

    68. Other Databases: Varied Medical Content Tree of Life: phylogeny and biodiversity. PubMed: biomedical literature. NCBI Taxonomy Browser: organisms with at least one sequence deposited in the database. PharmGKB: pharmacogenomics and variations in drug response based on human variation.

    69. The Rat Genome Database Data Tools Operations

    70. The Rat Genome Database: data Genes Maps and Markers QTLs Strains Homologs

    71. The Rat Genome Database: tools VCMap Mapserver Meta Gene Genome Scanner Ontology Browser

    72. The Rat Genome Database: operations Curation Data QC and Loading Data development Tool development

    73. The Rat Genome Database Operations: Curation Information gathering from peer-reviewed work. Coordination with other model organism databases. Data quality policy development and assessment.

    74. The Rat Genome Database Operations: data development Development of data integration strategies Development of ontology annotation protocols Some development of curation policies Outreach Ontology development

    75. The Rat Genome Database Operations: tool development Ontology system development Systems analysis Tool integration Tool building Software system migration
