Current Abstractions


Presentation Transcript


  1. Current Abstractions • Sequence: GDKNADGWIEFEEL • Database of Sequences • Analysis: String Theory • BLAST

  2. Pathways and Interaction Databases • Sequence databases teach us about biological “similarity”, how things are related. • The 1st wave of Bioinformatics... • An interaction database should likewise teach us about “specificity”, how things work. • The 2nd wave of Bioinformatics...

  3. New Abstractions • Interaction pair: “A binds B” • Database of Interactions • Analysis: Graph Theory • “PATHFIND” • Goodsell

  4. A Quick Tour of BIND • http://bioinfo.mshri.on.ca

  5. A simple BIND INTERACTION record • A: 1. Short label, 2. Type of molecule, 3. Database identifier, 4. Origin • B: 5. Short label, 6. Type of molecule, 7. Database identifier, 8. Origin • 9. Publication reference
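
The nine fields of the simple record can be pictured as a small data structure. This is only a sketch: the class and attribute names are invented for illustration, the assignment of fields 1-4 to A and 5-8 to B is an assumption, and the values are placeholders rather than real BIND data or the actual BIND specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """One interaction partner: the four per-molecule fields on the slide."""
    short_label: str      # short label
    molecule_type: str    # type of molecule, e.g. "protein" or "DNA"
    database_id: str      # identifier in an external database
    origin: str           # origin of the molecule


@dataclass(frozen=True)
class Interaction:
    """A simple 'A binds B' record plus its publication reference (field 9)."""
    a: Molecule
    b: Molecule
    publication: str

    def label(self) -> str:
        # A::B shorthand for a record, as used on the data model slide.
        return f"{self.a.short_label}::{self.b.short_label}"


# Placeholder values only; these are not real BIND entries.
record = Interaction(
    a=Molecule("A", "protein", "db:0001", "example organism"),
    b=Molecule("B", "protein", "db:0002", "example organism"),
    publication="example citation",
)
print(record.label())
```

Making the records immutable (`frozen=True`) is a design choice of this sketch, so that a record can be used as a dictionary key or stored in a set.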

  6. Understanding the BIND data model All cellular processes can be represented by a set of connected records where each record describes a biomolecular interaction and its associated consequences. A::B C::D E::F ? ? ?
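
To illustrate what "connected records" buys, the sketch below treats each A::B record as an edge in a graph and searches for a chain of interactions between two molecules. It is a plain breadth-first search, not the “PATHFIND” method named earlier (the slides give no details of it), and the linking records B::C and D::E are hypothetical additions to the slide's A::B, C::D, E::F.

```python
from collections import defaultdict, deque


def build_graph(pairs):
    """Turn (A, B) interaction pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(graph.get(node, ())):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical records: B::C and D::E are invented to connect the others.
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
graph = build_graph(pairs)
print(find_path(graph, "A", "F"))
```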

  7. Understanding the BIND data model • [Diagram: enzyme reaction with E, S and P; E + S and E-S shown as an INTERACTION record; S and P annotated with CHEMICAL STATE data and CHEMICAL ACTION data]

  8. What BIND can encode... • Simple binding interactions • Enzymes, substrates and complete metabolic pathways including mechanisms • Restriction enzymes, Transcription factors • Limited proteolysis (insulin, clotting cascade, complement) • Reversible phosphorylation • Glycosylation • Intron splicing, tRNA base modifications • Ubiquitin mediated protein degradation • Viral life cycles in host cells

  9. What BIND cannot encode • bulk phenomena • membrane potentials • gradients • calcium waves • water • “perfect” cellular localization • (4-D time-development/organism axis).

  10. BIND Data Submission

  11. BIND data flow • Internet clients: BIND Visitor (Query), BIND Submitter (Data Entry), Java Chemistry Tool • BIND servers: Text Query CGI, Flash Visual CGI, Text Data Entry CGI • API and specification • SeqHound • Backfilling and Import • BIND data

  12. BIND Software A Visual Future...

  13. BIND Visualization, Consider... • How do we draw fast, high-quality, interactive pictures of pathways and mechanisms from BIND and support thousands or more simultaneous web clients? • keep “canned drawings”: long history (Metabolic Maps, 1968); curators keep re-drawing…; large numbers of interactions; model may not scale well • generate drawings “on the fly”: BIND data > symbolic interactions; graph theory (edge and vertex); need consistent symbolic language for pathways; never been done for biological processes

  14. “On The Fly” Visualization Strategy • Algorithmic generation of pathway drawings • User asks, “draw me a picture of ...” • Server queries database for binding partners, assembles an image, and sends it to the user • Define the symbolism in a creative and novel way: continuous line-symbols for domains; “mate-able”; we already have a library of about 500-1000 symbols
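
The request cycle on this slide (query for binding partners, assemble a drawing, return it) might be sketched as follows. Everything here is an assumption made for illustration: the in-memory list stands in for the database, the function names are invented, and the output is Graphviz DOT text rather than the line-symbol library the slide describes.

```python
def binding_partners(interactions, molecule):
    """Return the sorted partners of `molecule` in a list of (A, B) pairs."""
    partners = set()
    for a, b in interactions:
        if a == molecule:
            partners.add(b)
        elif b == molecule:
            partners.add(a)
    return sorted(partners)


def draw_request(interactions, molecule):
    """Assemble a Graphviz DOT description for one 'draw me ...' request."""
    lines = ["graph pathway {"]
    for partner in binding_partners(interactions, molecule):
        lines.append(f'  "{molecule}" -- "{partner}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical in-memory stand-in for the interaction database.
INTERACTIONS = [("A", "B"), ("C", "A"), ("C", "D")]
print(draw_request(INTERACTIONS, "A"))
```

A real server would still need to render the description to an image and cache nothing, since each drawing is regenerated from the current records.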

  15. Hand-drawn depiction of putative computer generated pathway graphic...

  16. Algorithmic Visualization • length mapped to sequence • mapping of sequence feature tables • legends automatically generated • can scale to the expected number of interactions/pathways • implementation is already underway...

  17. Electronics CAD software shows schematics alongside physical representations... both Structure and Function

  18. Electronics CAD systems are also database driven...

  19. BIND Proposal • We propose a GenBank-style public interaction database: public submissions of interactions; active software development; close ties to active proteomics and bioinformatics research • We propose a distributed collaboration for managing indexing and database distribution.

  20. BIND - Data Quality Assurances • Two-tiered expert indexing and validation • professional indexers • public data submission • backfilling of literature data • validation by active “interaction” scientists

  21. BIND - Decentralized by Design • Indexing can be run at several sites • enabling technology is a unique key server • Indexing “nodes” should coincide with pockets of expertise

  22. BIND - Hybrid Data Ownership Model • Like Entrez • Some data is owned by databases (SWISSPROT) • Other data is owned by submitters • Ownership implies right to “edit” • Curated/Backfilled - BIND owns the record • Submitted - Submitter owns the record • Redundant records are allowed (different citations) • Dispute records may be entered

  23. Data From Existing Literature • The “Backfilling” problem • How do we go through the literature and put in the relevant interactions into a new database? • Joel Martin (NRC-IIT, Ottawa) • PubMed abstracts can be classified by SVM into • protein-protein interactions (95% accuracy) • protein-DNA interactions (99% accuracy) • 2 seconds analysis time per abstract
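
The slide names SVM classification of abstracts but gives no features, kernel or training procedure. As a rough illustration only, the sketch below trains a linear SVM on bag-of-words counts with a Pegasos-style sub-gradient method. The technique, the function names and the four made-up training sentences are all assumptions, and nothing here reproduces the accuracy or timing figures quoted above.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words counts over lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())


def train_linear_svm(examples, lam=0.1, epochs=50):
    """Pegasos-style sub-gradient training of a linear SVM (no bias term).

    `examples` is a list of (text, label) pairs with label +1 or -1.
    Examples are visited in a fixed cyclic order, so training is deterministic.
    """
    data = [(featurize(text), label) for text, label in examples]
    weights = {}
    step = 0
    for _ in range(epochs):
        for features, label in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = label * sum(
                weights.get(w, 0.0) * c for w, c in features.items()
            )
            # Regularisation shrink: 1 - eta * lam simplifies to 1 - 1/step.
            shrink = 1.0 - 1.0 / step
            for w in weights:
                weights[w] *= shrink
            # Hinge-loss sub-gradient step when the margin is violated.
            if margin < 1.0:
                for w, c in features.items():
                    weights[w] = weights.get(w, 0.0) + eta * label * c
    return weights


def predict(weights, text):
    """Return +1 (interaction abstract) or -1 (other) for a piece of text."""
    score = sum(weights.get(w, 0.0) * c for w, c in featurize(text).items())
    return 1 if score > 0 else -1


# Invented toy sentences: +1 = describes an interaction, -1 = does not.
TRAIN = [
    ("protein a binds protein b", 1),
    ("kinase interacts with receptor", 1),
    ("we measured membrane potential", -1),
    ("patients received the drug", -1),
]
weights = train_linear_svm(TRAIN)
print(predict(weights, "receptor binds kinase"))
print(predict(weights, "patients received the drug"))
```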

  24. Semi-Automated Backfilling • Automated text classification identifies papers describing interactions • Entrez-spiders find and cluster sequences of related papers • Backfilling indexers are presented with a “probable BIND record”

  25. BIND Database Features • Provides for precise descriptions of biochemical mechanisms and function. • Provides a mapping of interaction space to graph theory. • Tightly linked to the Entrez system.

  26. A Dynamic Data Specification Ready for change, suggestions and evolution to a mature data model...

  27. BIND interaction • Date • Updates • Accession • Molecule A • Molecule B • Description: place, binding conditions, binding sites, chemical mechanism, kinetics • Source (literature) • Molecule A and Molecule B, each: Short Label; ID and DB reference; Origin/Cell Stage; Sequence (NCBI Seq); Structure (NCBI Biostruc); Text Description

  28. Rapid Application Development • NCBI’s ASN.1 to C compiler generates bug-free code for each specified object: memory allocation; freeing; read from file (stream); write to file (stream) • This has saved us 2-4 person-years • Allows us to rapidly test changes to the BIND spec. • We are leveraging work already paid for!

  29. The data is the database... • BIND has “exchange” types • lists of BIND ASN.1 records • Self-contained, extracted by an ASN.1 parser • Automated rules derived from the specification. • ASN.1 to XML via XER. • BIND data can be fed into any DBMS, on any platform. • We use a royalty-free DBMS allowing us to maintain distributed BIND indexing sites.
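
The idea that a self-contained list of records can be handed to any DBMS can be illustrated with a small round trip. This is a sketch under several assumptions: the XML element names are invented and are neither the XER encoding nor the BIND ASN.1 specification, SQLite is used only as a convenient stand-in (the slides do not name the royalty-free DBMS), and the records are placeholders.

```python
import sqlite3
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Serialise a list of record dicts into one self-contained XML string."""
    root = ET.Element("InteractionSet")
    for rec in records:
        node = ET.SubElement(root, "Interaction")
        for field, value in rec.items():
            ET.SubElement(node, field).text = value
    return ET.tostring(root, encoding="unicode")


def load_into_sqlite(xml_text, connection):
    """Parse the exchange document and insert each record as a table row."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS interaction (a TEXT, b TEXT, source TEXT)"
    )
    for node in ET.fromstring(xml_text).iter("Interaction"):
        row = (node.findtext("a"), node.findtext("b"), node.findtext("source"))
        connection.execute("INSERT INTO interaction VALUES (?, ?, ?)", row)
    connection.commit()


# Placeholder records; these are not real BIND data.
records = [
    {"a": "A", "b": "B", "source": "example citation 1"},
    {"a": "C", "b": "D", "source": "example citation 2"},
]
exchange = records_to_xml(records)
db = sqlite3.connect(":memory:")
load_into_sqlite(exchange, db)
print(db.execute("SELECT a, b FROM interaction ORDER BY a").fetchall())
```

Because the exchange document carries every field of every record, the loader needs no access to the system that produced it, which is the property the slide relies on for distributed indexing sites.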

  30. BIND the fine print...

  31. Required Interaction Database Submissions… (when one is funded and ready to go) • At the discretion of the participating granting bodies, mandate that an interaction “accession” be required for publication, as for: sequences (GenBank); structures (PDB) • This ensures the growth and use of an interaction database and protects the investment in its development.

  32. BIND Personnel • Software Developers • System Administrators • Help/Training • Database Specialists • BIND Indexers • On-callers (validation) • Rotation, check entries for consistency, content • Resource for indexers to consult with

  33. http://bioinfo.mshri.on.ca • BIND@mshri.on.ca • Hogue Lab: Gary Bader, Ian Donaldson, Katerina Michalickova, Adrian Heilbut, Kiran Deol • Submitters and Volunteers...: Tony Pawson, Berivan Baskin • BIND Collaborators: Francis Ouellette (CMMT UBC, Vancouver); Joel Martin (IIT-NRC, Ottawa); Christoph Sensen (IMB-NRC, Halifax)