
BioMAS: A Multi-Agent System for Automated Genomic Annotation

Keith Decker, Department of Computer and Information Sciences, University of Delaware. Salim Khan, Ravi Makkena, Gang Situ, Computer & Information Sciences. Dr. Carl Schmidt, Heebal Kim, Animal & Food Sciences.


Presentation Transcript


  1. BioMAS: A Multi-Agent System for Automated Genomic Annotation • Keith Decker, Department of Computer and Information Sciences, University of Delaware • Salim Khan, Ravi Makkena, Gang Situ, Computer & Information Sciences • Dr. Carl Schmidt, Heebal Kim, Animal & Food Sciences

  2. Outline • General class of problems and MAS solution approach • BioMAS: Automated Genomic Annotation • HVDB: HerpesVirus Database • ChickDB: Gallus gallus Database • GOFigure! • CoPrDom • Signal Transduction Pathway Discovery

  3. What problems are we addressing? • Huge, dynamic “Primary Source” Databases • Highly distributed, overlapping • Heterogeneous content, structure, curation • Multitude of analysis algorithms • Different interfaces, output formats • Create contingent process plans chaining many analyses together • Individual PIs, working on non-model organisms • Learn, then hand-navigate sea of DBs and analysis tools • Easily overwhelmed by new sequence and EST data • Struggle to make results available usefully to others

  4. Approach: Multi-Agent Information Gathering • Software agents for information retrieval, filtering, integration, analysis, and display • Embody heterogeneous database technology (wrappers, mediators, …) • Deal with dynamic data and changing data sources • Efficient and robust distributed computation (for both info retrieval and analysis) • Deal with issues of data organization and ownership • Natural approach to providing integrated information • To humans via web • To other agents via semantic markup [XML/OIL/DAML]

  5. Example: Multi-Agent System for Automated Herpesvirus Annotation • Input: raw sequence data • Output: an annotated database that allows fairly complex queries • BLAST homologs • Motifs • Protein domains [Prodomain records] • PSORT sub-cellular location predictions • GO [Gene Ontology] electronic annotation • Example query: “Show me all the genes in Marek’s Disease virus with a tyrosine phosphorylation motif and a transmembrane domain value ≥ 2”
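The example query on this slide can be read as a filter over annotated gene records. The following is a minimal sketch, not the BioMAS query agent: the record layout (`gene_id`, `organism`, `motifs`, `transmembrane_domains`), the reading of "transmembrane domain value" as a domain count, and the sample records are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GeneAnnotation:
    # Hypothetical record layout; not the actual BioMAS knowledgebase schema.
    gene_id: str
    organism: str
    motifs: set = field(default_factory=set)
    transmembrane_domains: int = 0  # assumed meaning of "transmembrane domain value"


def query(records, organism, motif, min_tm):
    """Return IDs of genes from `organism` that carry `motif` and have
    at least `min_tm` transmembrane domains."""
    return [
        r.gene_id
        for r in records
        if r.organism == organism
        and motif in r.motifs
        and r.transmembrane_domains >= min_tm
    ]


# Invented sample data, for illustration only.
records = [
    GeneAnnotation("geneA", "Marek's disease virus", {"tyrosine_phosphorylation"}, 3),
    GeneAnnotation("geneB", "Marek's disease virus", {"tyrosine_phosphorylation"}, 1),
    GeneAnnotation("geneC", "Marek's disease virus", {"n_glycosylation"}, 4),
    GeneAnnotation("geneD", "Herpes simplex virus 1", {"tyrosine_phosphorylation"}, 2),
]

hits = query(records, "Marek's disease virus", "tyrosine_phosphorylation", 2)
```

Only `geneA` satisfies all three conditions in this sample; the other records each fail one of them.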

  6. How does this help? • Automates collection of information from various primary source databases • If the info changes, it can be updated automatically and the PI can be notified • Allows various analyses to be done automatically • Can encode complex (contingent) sequences of info retrieval and linked analyses, and report interesting results only • New data sources, annotations, and analyses can be applied automatically as they are developed (open system) • Results can be made available on the internet to others, or kept as private data • Much more sophisticated queries than keyword search • Dynamic menu of keys • Concept hierarchies (“ontology”) allow more concise queries • Query planning (e.g., time, resource usage) • Can search across multiple databases (i.e., from other researchers)

  7. RETSINA-style Multi-Agent Organization: How does it work? [Diagram] • Interface Agents: Sequence Addition Applet, User Query Applet • Domain-Independent Task Agents: Proxy Agent, Matchmaker Agent, Agent Name Server Agent • Task Agents: Query Processing Agent, Sequence Source Processing Agent, Annotation Agent, Local Knowledgebase Management Agent (shown three times) • Information Extraction Agents: GenBank Info Extraction Agent, SwissProt/ProSite Info Extraction Agent, ProDomain Info Extraction Agent, Psort Analysis Wrapper

  8. DECAF: A multi-agent system toolkit • Focus on programming agents, not designing internal architecture • Programming at the multi-agent level • Value-added architecture • Support for persistent, flexible, robust actions

  9. DECAF • Focus on programming agents, not designing internal architecture • Avoiding the API approach • DECAF as agent “operating system”, programmers have strictly limited access • Communication, planning, scheduling, [coordination], execution • Graphical dataflow plan editor

  10. DECAF • Programming at the multi-agent level • Standardized, domain-independent, reusable “middle agents” • Agent Name Server (white pages) • Matchmaker (yellow pages/directory service) • Brokers (managers) • Information extraction (learning [STALKER] + knowledgebase [PARKA]) • Proxy (web interfaces) • [Agent Management Agent (debugging, demos, external control)] • Note: heterogeneous architectures are OK!
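The white-pages and yellow-pages roles of the two middle agents can be sketched in a few lines. This is an illustration of the directory idea only, not DECAF's API: the class names, method names, agent names, capability strings, and addresses below are all hypothetical.

```python
class AgentNameServer:
    """White pages: maps an agent's name to its network address."""

    def __init__(self):
        self._addresses = {}

    def register(self, name, address):
        self._addresses[name] = address

    def unregister(self, name):
        self._addresses.pop(name, None)

    def lookup(self, name):
        return self._addresses.get(name)


class Matchmaker:
    """Yellow pages: maps an advertised capability to the agents offering it."""

    def __init__(self):
        self._providers = {}

    def advertise(self, name, capability):
        self._providers.setdefault(capability, set()).add(name)

    def withdraw(self, name):
        for names in self._providers.values():
            names.discard(name)

    def find(self, capability):
        return sorted(self._providers.get(capability, set()))


# Hypothetical agents and capabilities, for illustration only.
ans = AgentNameServer()
mm = Matchmaker()
ans.register("genbank_extractor", ("localhost", 9001))
ans.register("psort_wrapper", ("localhost", 9002))
mm.advertise("genbank_extractor", "sequence_lookup")
mm.advertise("psort_wrapper", "localization_prediction")

# A requester first asks *who* can do the job, then *where* that agent lives.
providers = mm.find("localization_prediction")
address = ans.lookup(providers[0])
```

Keeping the two lookups separate means an agent can move to a new address without re-advertising its capabilities, and can change what it offers without re-registering its address.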

  11. DECAF • Value-added architecture • Taking care of details (social/individual) • ANS registration/dereg (eventually MM) • Standard behaviors (AMA, error, FIPA, libraries) • Message dispatching (ontology, conversation) • Coordination (GPGP) • Efficient use of computational resources • Highly threaded: internally + domain actions • Memory efficient (ran systems for weeks, hundreds of thousands of messages)

  12. DECAF • Support for persistent, flexible, robust actions • HTN-style programming • Task alternatives and contingencies • RETSINA-style dataflow • Provisions/Parameters determine task activation • Multiple outcomes, Loops • TÆMS-style task network annotations • Dynamic overall utility: Quality, cost, duration task characteristics • Explicit representation of non-local tasks • Example: Time/Quality tradeoff
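The time/quality tradeoff mentioned on this slide can be illustrated with task alternatives annotated by quality, cost, and duration. The sketch below uses a deliberately simple selection rule (highest quality that fits the deadline, ties broken by lower cost); it is an assumption for illustration and not the DECAF scheduler or the TÆMS utility model. The alternative names and numbers are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    # TAEMS-style annotations; the values used below are invented.
    name: str
    quality: float
    cost: float
    duration: float


def choose(alternatives, deadline):
    """Pick the highest-quality alternative that finishes by `deadline`,
    breaking ties by lower cost. Returns None if nothing fits."""
    feasible = [a for a in alternatives if a.duration <= deadline]
    if not feasible:
        return None
    return max(feasible, key=lambda a: (a.quality, -a.cost))


alternatives = [
    Alternative("full_blast_search", quality=0.9, cost=5.0, duration=120.0),
    Alternative("cached_lookup", quality=0.6, cost=1.0, duration=5.0),
]

relaxed = choose(alternatives, deadline=200.0)  # enough time for the better method
rushed = choose(alternatives, deadline=30.0)    # only the fast method fits
```

With a generous deadline the higher-quality alternative is selected; with a tight one the agent falls back to the faster, lower-quality alternative.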

  13. DECAF Architecture [Diagram] • Inputs: Plan file, Incoming KQML/FIPA messages • Processing stages: Agent Initialization, Dispatcher, Planner, Scheduler, Executor • Queues: Incoming Message Queue, Objectives Queue, Task Queue, Agenda Queue, Pending Action Queue, Action Results Queue [concurrent] • Shared data: Task Templates Hash Table, Domain Facts and Beliefs • Action Modules (several shown) • Output: Outgoing KQML/FIPA messages

  14. Plan Editor

  15. Expanding the Genomic Annotation System

  16. Functional Annotation Suborganization • Gene Ontology Consortium (www.geneontology.org) • Biological process • Molecular function • Cellular component

  17. Co-present Domain Networks (CoPrDom) • Proteins can be viewed as conserved sets of domains • Vertex = domain, edge = two domains co-present in some protein, edge weight = number of proteins in which both domains are co-present • Network constructed from InterPro domain markup of proteins in 10 species (human, Drosophila, C. elegans, S. cerevisiae among them) • Functional characterization via InterPro to GO mapping • Network constructed per organism per functional group, e.g., apoptosis regulation in human

  18. Uses for CoPrDom • Functional characterization of unknown domains • Identification of core domains/groups in a functional group • Tracking domain evolution through species evolution • Predicting protein-protein interaction by identifying evolutionary merging of domain groups
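The network definition on this slide (vertex = domain, edge = co-presence, weight = number of proteins) maps directly onto a pair counter. The following is a minimal sketch under stated assumptions: the function name `build_coprdom` and the input layout are hypothetical, and the domain labels are illustrative names rather than InterPro accessions.

```python
from collections import Counter
from itertools import combinations


def build_coprdom(protein_domains):
    """Build a co-present domain network.

    protein_domains: dict mapping a protein ID to an iterable of domain IDs.
    Returns a Counter mapping each sorted (domain_a, domain_b) pair to the
    number of proteins that contain both domains.
    """
    edges = Counter()
    for domains in protein_domains.values():
        # De-duplicate so a repeated domain in one protein counts once.
        for a, b in combinations(sorted(set(domains)), 2):
            edges[(a, b)] += 1
    return edges


# Invented domain markup, for illustration only.
proteins = {
    "P1": ["SH2", "SH3"],
    "P2": ["SH2", "SH3", "Kinase"],
    "P3": ["Kinase", "SH2"],
    "P4": ["SH3"],
}

network = build_coprdom(proteins)
```

Here `SH2` and `SH3` co-occur in two proteins, so that edge has weight 2, while the single-domain protein `P4` contributes no edges. Building one such network per organism and per functional group would amount to calling the function on the corresponding subset of proteins.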

  19. Biological Pathway Discovery through AI Planning Techniques • AI planning is a computational method for developing complex plans of action from a representation of the initial states, the actions that manipulate those states, and the specified goal states. • Initial States: The initial state representation of objects in the "plan world" • Actions: Logical descriptions of preconditions and effects • Goals: The end states desired • HTN (Hierarchical Task Network) Planning proceeds by task decomposition of networks, and a successful plan is one that satisfies a task network.

  20. Uses of the Signal Transduction Planner • To produce computer-interpretable plans capturing relevant qualitative information regarding signal transduction pathways. • To produce testable hypotheses regarding gaps in knowledge of the pathway, and drive future signal transduction research in an ordered manner. • To identify key nodes where many pathways are regulated by a node with only 1 functional protein serving as a critical checkpoint. • To perform in silico experiments of hyper expression and deletion mutation. • To enable pathway visualization tools by providing human- and machine-readable pathway descriptions.
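HTN-style task decomposition can be sketched as a recursive expansion of compound tasks into primitive actions. The sketch below is a stripped-down illustration, not O-Plan: it ignores state, preconditions, and ordering constraints, and simply tries each method for a task until one decomposes fully. The task names and methods are invented labels.

```python
def decompose(task, methods, primitives):
    """Expand `task` into a flat list of primitive actions.

    methods: dict mapping a compound task to a list of alternative
             subtask lists (tried in order).
    primitives: set of tasks that need no further decomposition.
    Returns a list of primitive actions, or None if no method works.
    """
    if task in primitives:
        return [task]
    for subtasks in methods.get(task, []):
        plan = []
        for sub in subtasks:
            sub_plan = decompose(sub, methods, primitives)
            if sub_plan is None:
                break  # this method fails; try the next alternative
            plan.extend(sub_plan)
        else:
            return plan  # every subtask decomposed successfully
    return None


# Invented task network, for illustration only.
methods = {
    "activate_ras": [["bind_ligand", "recruit_adaptor", "exchange_gdp_for_gtp"]],
    "recruit_adaptor": [["unknown_step"], ["bind_grb2", "bind_sos"]],
}
primitives = {"bind_ligand", "bind_grb2", "bind_sos", "exchange_gdp_for_gtp"}

plan = decompose("activate_ras", methods, primitives)
```

The first method for `recruit_adaptor` refers to a step with no known decomposition, so the planner falls back to the second alternative. A method that cannot be completed in this way is the kind of gap that could be surfaced as a hypothesis about missing pathway knowledge.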

  21. Advantages of Planning • Operator schema: Abstracted axiomatic definitions of sub-cellular processes, understandable to human + computer • Task abstraction: Decomposition of complex task into simpler, interchangeable actions. • Reduces search space, conflicts • Modeling of pathways at different levels of biochemical detail • Search conducted in Plan Space: Most planners perform bi-directional search (vs. Pathway Tools, Prolog implementations, etc.) • Partial-order Planning: Succinct representation of multiple pathways helps identify key causal relationships

  22. Advantages of Planning (contd.) • Conditional effects can be used to model special cases ("exceptions") when applying operator schema • Resource Utilization can be used to model quantitative aspects such as amplification of a signal, feedback and feed-forward loops • Plan re-use: Old plans can be successfully inserted into new ones (if initial and final conditions are met) without additional computation

  23. (ontologically driven) Operator Schema Example: Transport (action: transport :parameters (?mol - macromolecule, ?compfrom, ?compto - compartment) :condition (and (in ?mol ?compfrom) (open ?compfrom ?compto)) :effects (and (in ?mol ?compto) (not (in ?mol ?compfrom))))
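The transport schema above has two preconditions and two effects, which can be applied to a state represented as a set of ground facts. The following is a minimal sketch of that semantics, not the planner's implementation: the tuple encoding of facts and the function name `transport` are assumptions, and the molecule and compartment names are illustrative.

```python
def transport(state, mol, comp_from, comp_to):
    """Apply the transport operator to `state` (a frozenset of ground facts).

    Preconditions: (in mol comp_from) and (open comp_from comp_to).
    Effects: add (in mol comp_to), delete (in mol comp_from).
    Returns the new state, or None if a precondition does not hold.
    """
    preconditions = {("in", mol, comp_from), ("open", comp_from, comp_to)}
    if not preconditions <= state:
        return None
    return (state - {("in", mol, comp_from)}) | {("in", mol, comp_to)}


# Illustrative initial state.
state = frozenset({
    ("in", "ERK", "cytoplasm"),
    ("open", "cytoplasm", "nucleus"),
})

after = transport(state, "ERK", "cytoplasm", "nucleus")
blocked = transport(state, "ERK", "nucleus", "cytoplasm")  # preconditions fail
```

The operator returns a new state rather than mutating the old one, so a planner can keep earlier states around while exploring alternative action sequences.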

  24. RTK-MAPK pathway: Activation of Ras following binding of a hormone (e.g., EGF) to a receptor

  25. RTK-MAPK pathway step: O-Plan output • Phosphorylation of GRB2 at domain SH2 by the RTK receptor

  26. Summary • Bioinformatics has many features amenable to a multi-agent information gathering approach • BioMAS: Automated Analysis: EST processing to functional annotation ontologies • DECAF / RETSINA / TÆMS • GOFigure! and electronic GO annotation • CoPrDom: Co-Present Domain Analysis • Signal Transduction Pathway Discovery

  27. BioMAS Future Work • Sophisticated queries are possible, but how can they be made available to biologists? • “Show me all glycoproteins in Marek’s Disease virus with a tyrosine phosphorylation motif and a transmembrane domain value ≥ 2 that are expressed in feather follicles” • Robustness, efficiency, scale, data materialization issues • Automating and integrating more complex analysis processes (using existing software!) • Estimating physical location of genes by synteny • Integrate new data sources • Microarray and other gene expression data • And thus, more analyses: QTL mapping, metabolic pathway learning • New off-site organism databases and analysis agents • Links: http://udgenome.ags.udel.edu/ http://www.cis.udel.edu/~decaf/
