1 / 39

From Databases to Dynamics

From Databases to Dynamics. Dr. Raquell M Holmes Center for Computational Science Boston University. Computational Biology. Computational Biology  Bioinformatics More than sequences, database searches, statistics or image analysis. A part of Computational Science

shima
Télécharger la présentation

From Databases to Dynamics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. From Databases to Dynamics Dr. Raquell M Holmes Center for Computational Science Boston University

  2. Computational Biology • Computational Biology Bioinformatics • More than sequences, database searches, statistics or image analysis. • A part of Computational Science • Using mathematical modeling, simulation and visualization • Complementing theory and experiment

  3. Overview • Databases: • Publicly available • Why so many? • Metabolic Pathway databases • How do we find and visualize information? • How did they get it? • Beyond information to behaviors • Expression data • Dynamic behaviors

  4. Biologists search for function at all levels Chromosomal structures Genomic organization RNA secondary structures Genomic information/mapping: RicBase http://igs-server.cnrs-mrs.fr/mgdb/Rickettsia/ Center for biotechnology http://www.biotech.unl.edu/home.html

  5. Protein properties: folds, motifs, active sites Protein interactions: network control Pathways http://ca.expasy.org/cgi-bin/get-sw3d-entry?P02568 Jeong et al 2001. Nature 411,41 www.kegg.org

  6. Databases • Databases enable biologists to access large amounts of research data. • Commercial, publicly available • Downloads, web accessible • 100’s to 1000’s

  7. Publicly available databases • Web Interfaces • Content • Search methods: Navigation • Analysis tools

  8. Database Content Over 11 Data Categories • Sequence: DNA, RNA, protein • Structure: genomics, protein, carbohydrate • Networks: metabolic enzymes and pathways (signaling) • Organisms: human/vertebrate, human genes and diseases • Expression: mircroarry and gene expression, proteomics • Nucleic Acid Research • Dedicated to review of databases (548-’04, 386-’03) • http://nar.oupsjournal.org • Most discussion focus on sequence databases.

  9. Protein databases Why so many? Where do they get their data?

  10. Many types of Protein Databases • Sequence (103): • publication, organism, sequence type, function • Protein properties (80): • Motifs, active sites, individual families, localization • Structure (15) • Resolution, experiment type, type of chain, space group. Nucl. Ac. Res. 32, D3 2004

  11. Where data comes from…Exp.Why do we care? • Experiments • Sequencing data: Gene or protein identity • Enzymatic assays: Biochemical properties • Expression studies:Localization, putative cellular function, regulation patterns • Protein interactions: complexes and networks

  12. Where data comes from…Comp.Why do we care? • Annotated sequences • Align sequences (full/partial) • Homology to other genes, Identity • Pattern recognition, property predictions • Biochemical properties • Motifs, profiles, families • Enzyme activity and structure prediction • Pathways • Homologous function

  13. Today’s journey • From protein sequence/name • To Metabolic pathway databases • Metabolic behaviors • Modeling glycolysis

  14. Discussing a familiar pathway Karp: Cell and Molecular Biology Textbook

  15. Generic Protein Seq. Record • Sequence: hexokinase • E.C. #: 2.7.1.1 • Publication: Stachelek et al 1986 • Organism: Yeast • Function: main glucose phosphorylating enzyme • Links: other databases or tools • Pre-determined Properties: • families, folds… • Swissprot example for enzyme.

  16. From protein (enzyme) sequence • Conserved domains, active sites, folding patterns • How do we find the related pathway?

  17. What is a metabolic pathway? Contains a series of reactions. Reactions: Metabolites (substrate, product), enzyme, co-factors.

  18. From enzyme to reaction • Are there databases of metabolites? • How can we get from enzyme to pathways?

  19. Databases on Molecular Networks Metabolic Pathways from NAR (5): • EcoCyc: http://www.ecocyc.org/ • Began with E. coli Genes and Metabolism • BioCyc includes additional genomes • KEGG: http://www.genome.ad.jp/kegg/ • Encyclopedia of genes and genomes • Others: WIT2, PathDB, UMBDD

  20. Starting out… • Metabolism pathway databases • Search by name or sequence • Compounds, Reactions, Pathway, Genes • Associated information • Formulas, Names, Synonyms, Links to other databases

  21. Example results KEGG: search for compound or enzyme by key word. Glucose, 82 hits 1. cpd:C00029 UDPglucose; UDP-D-glucose; UDP-glucose; Uridine diophosphate glucose 2. cpd:C00031 D-Glucose; Grape sugar; Dextrose 3. cpd:C00092 D-Glucose 6-phosphate; Glucose 6-phosphate; Robison ester 79. cpd:C11911 dTDP-D-desosamine; dTDP-3-dimethylamino-3,4,6-trideoxy-D-glucose 80. cpd:C11915 dTDP-3-methyl-4-oxo-2,6-dideoxy-L-glucose 81. cpd:C11922 dTDP-4-oxo-2,6-dideoxy-D-glucose 82. cpd:C11925 dTDP-3-amino-3,6-dideoxy-D-glucose Grape sugar: FORMULA C6H1206 NAME D-Glucose grape sugar Dextrose List of reactions involving compound List of enzymes acting on compound

  22. Navigation bar • list of categories • Compounds • Hunt and peck • Oligosaccharides • Polysaccharides • Fructans • Glycogens • Large-branched-glucans • Long-linear-glucans • Pectin • Short-glucans • Starch • All-Carbohydrates • a sugar • a sugar phosphate • a sugar-1-phosphate • Aldonic-Acids • Carbohydrate-Derivatives • Carbohydrates • Disaccharides

  23. Keyword search • Results are a list of data • Proteins, compounds, reactions, pathways • Hunt and peck

  24. Visualizing Data

  25. Visualization: Data levels EcoCyc draws multi-level views Based on the pathway Metabolite perspective

  26. Additional detail

  27. E. coli K-12 Reaction: 4.1.2.13 Comment Species Citations Gene reaction schematic { Reaction

  28. Different species http://biocyc.org/server.html

  29. E.coli Pseudomona

  30. Summary I • Protein ->enzyme->pathway database • Pathway Database Content: • Species specific (BioCyc), general (KEGG) • Known and proposed enzymes, co-factors, metabolites. • Searches provide similar results (compounds, reactions…) with different appearance.

  31. Populating metabolic databases

  32. Where do the pathways come from? Databases: KEGG, WIT, BioCyc Resources: Experimental data based on literature Genomic data from other databases Determination of metabolic pathway Comparison to known pathways.

  33. Karp et al 1999, TIBT

  34. Linking Genome to Reactions: EC EC # provides information on catalyzed reaction and synonyms. Search for EC# in EcoCyc database. Assign reactions and possible pathway association. How correct is the pathway assignment?

  35. PathLogic/EcoCyc scoring new species pathway X=# reactions for pathway 4 Y=# reactions found 2 Z=# found in other pathways 1 Karp etal., 1999. TIBT

  36. Probably, possibly, not • Probability score depends on X:Y ratio • 4,2,1 has a 4:2 ratio which equals 0.5 • Probable X:Y>= 0.5 • Possible 0< X:Y< 0.5 • Not Y=0

  37. Pathway Evidence Glyph Homo sapien glycolysis pathway E.Coli pathway • Key to edge colors: • green: reactions in which the enzyme is present • black: reactions for which the enzyme is not identified in this organism • orange: reactions in which the enzyme is unique to this pathway • magenta: reactions that are spontaneous, or edges that do not represent reactions at all (e.g. in polymerization pathways)

  38. Summary II • Genomes used to create database of reactions • EC# link gene product to enzyme in reactions • Pathways in the database vary in degrees of probability.

  39. From Data to Dynamics • Database information is static • Just the facts • What is the behavior of the pathway? • Expression data • Dynamics

More Related