Emerging Standards for Interoperable Biological Systems


Presentation Transcript


  1. Emerging Standards for Interoperable Biological Systems • Technology for Life: North Carolina Symposium on Biotechnology and Bioinformatics • Dr. Marty McClelland

  2. Standards: Why do we care? • IEEE standards for plugs, outlets and wiring – I can buy an appliance and use it (most of the time) • Any international traveler will tell you that standards vary around the world

  3. Without Standards - • Custom builds by experts • Build once – use once • Need expertise in specific domain • Expensive • Most of us – still using candles

  4. Standards and Software • World Wide Web • Plug and play • Plug-in / modular components • XML: Extensible Markup Language • Web Services • Federated Search • Grid Services

  5. My Standards Journey • Middleware to integrate learning systems with enterprise resource planning systems • IMS / IEEE learning technology standards – learning object metadata • National Science Digital Library – STEM LOM repository • NCCU BBRI Cardiovascular Study – similar issues

  6. Bioinformatics Community • Embraced open source • Philosophy of sharing data and tools • Community involvement yields a foundation for standards development

  7. Emerging Standards • tools/middleware – web services for harvesting – federated searches • grid computing • ontologies – developing controlled vocabularies • analysis – standards for sharing results, e.g. microarray analysis • models – Systems Biology – standards for interchange

  8. Sharing Data, Tools, and Middleware • XML: see http://www.w3.org/XML/ • Specifications for data interchange in biology applications (XML schemas) • Web services • Define WSDL for biology applications
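
To make the idea of XML data interchange concrete, here is a minimal Python sketch of reading one XML record into a dictionary. The record layout, the element names (`geneRecord`, `symbol`, `organism`, `sequence`) and the values are invented for illustration; they do not follow any of the biology schemas named in this talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names and values are made up for this sketch
# and do not come from any real biology XML schema.
RECORD = """
<geneRecord id="example-1">
  <symbol>GENE1</symbol>
  <organism>Homo sapiens</organism>
  <sequence>ATGGCC</sequence>
</geneRecord>
"""


def parse_gene_record(xml_text):
    """Turn one XML record into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "symbol": root.findtext("symbol"),
        "organism": root.findtext("organism"),
        "sequence": root.findtext("sequence"),
    }


record = parse_gene_record(RECORD)
print(record["symbol"], len(record["sequence"]))
```

Because both sides agree on the element names, the receiving tool needs no knowledge of how the sending database stores its data.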

  9. AnatML, CellML, BIOML, GEML, MSAML, GeneXML, MAGE-ML, BSML, CDISC, and HL7 XML for data exchange

  10. Virginia Bioinformatics Institute • ToolBus • PathPort • Middleware for web services • query multiple databases • facilitate decision making and data interpretation • http://staff.vbi.vt.edu/pathport/services/

  11. BioMOBY • simple extensible protocols • Web services for interoperable databases • http://biomoby.org/

  12. Grid Computing • user authentication and authorization (e.g. X.509 certificates) • Open Grid Computing Environment (OGCE) portal toolkit • Open Grid Services Architecture (OGSA) • Globus Toolkit

  13. Grid Applications • iNquiry – commercial product • NC BioGrid prototype / planning stages • statewide Bioinformatics Portal being created by the University of North Carolina at Chapel Hill • GridNexus project

  14. Ontologies • Controlled vocabulary • Crosswalks between controlled vocabularies • Interoperability • Browse and search services across disparate repositories • www.geneontology.org
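
A crosswalk between two controlled vocabularies can be sketched as a lookup through shared identifiers. In the Python sketch below, both vocabularies and the `TERM:` identifiers are hypothetical; a real crosswalk would use the identifiers of an actual ontology.

```python
# Two hypothetical vocabularies that map their own terms to shared identifiers.
# The "TERM:" identifiers are invented for this sketch.
VOCABULARY_A = {"heart attack": "TERM:0001", "high blood pressure": "TERM:0002"}
VOCABULARY_B = {"myocardial infarction": "TERM:0001", "hypertension": "TERM:0002"}


def translate(term, source, target):
    """Map a term from one vocabulary to the other via the shared identifier."""
    shared_id = source.get(term.lower())
    if shared_id is None:
        return None  # term is not in the source vocabulary
    for target_term, target_id in target.items():
        if target_id == shared_id:
            return target_term
    return None  # no equivalent term in the target vocabulary


print(translate("Heart attack", VOCABULARY_A, VOCABULARY_B))
```

A search service can apply the same lookup to a query term before sending it to each repository, which is what lets one query work across repositories that use different vocabularies.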

  15. Data Analysis • MIAME: Minimum Information About a Microarray Experiment • http://mged.sourceforge.net/ontologies/index.php

  16. Systems Biology • Historically – many custom, small-scale models with little reuse • The goal of Systems Biology is to construct systems from modular models, with data supplied via web service queries to databases

  17. Model Integration • Systems Biology Workbench (SBW) strives to support model integration through • Systems Biology Markup Language (SBML) – XML to represent biochemical networks – common framework to document models • SBW provides a framework for interoperation across heterogeneous modeling tools • http://sbml.org/index.psp

  18. Implications • expose databases with web services • construct queries to locate the data • standards for grid services • community developed XML schemas for sharing biological data

  19. GridNexus

  20. UNCW Grid Initiative: GridNexus • The UNCW Grid Computing Project is a two-year collaborative project among a multi-discipline, multi-investigator core research team at UNCW and several discipline-focused researchers at partner institutions: NCSU, WCU, NCCU, ECU, and CFCC. The research areas and institutional interests of this project are: • Advanced Grid Software Development (UNCW) • Computational Chemistry (UNCW and ECU) • Bioinformatics (UNCW, NCSU, and NCCU) • Combinatorics (UNCW) • Business Computing (UNCW and NCCU) • Education and Training (UNCW, WCU, CFCC) • This project proposes to develop a Grid interface that is easy to use and can serve a wide range of applications and users. We have developed an innovative graphical user interface (GUI) for grid applications. In particular, we have introduced a new scripting language (JXPL) designed for web-based services and a GUI for creating scripts, and have demonstrated the use of these tools with grid services.

  21. GridNexus • This initiative grew in part out of a need for HPC resources following the closure of the NCSC in June 2003, coupled with the availability of faculty with software programming expertise and others with computing applications that could benefit from use of a Grid. • The UNC-OP funded UNCW’s proposal for $557,634 over two years to develop Grid portals (GUI middleware to allow users to access software on computers on a Grid).

  22. Resources of UNCW Grid • Beowulf cluster – 16 PIII processors in Computer Sciences Department • Fire and FireDev servers plus disk storage devices • PQS Quantum Cube – 8-CPU cluster with PQS and Gaussian 03 computational chemistry software, plus TCP-Linda environment. • An 8-processor IBM blade cluster with 0.5 TB disk storage will be added soon. • Other computers may be added, including the possibility of using all computing lab computers, or possibly even all faculty/staff computers (when not in use).

  23. GridNexus • The objective is to make accessing HPC resources (wherever they may be located) easy for scientists who are not computer savvy. • Most computation involves doing various mathematical operations on a dataset. • A GUI approach is employed, in which the user, after a single login that checks authentication and authorization, can create a ‘workflow’ of functions/operations graphically by connecting boxes dragged from a series of lists of options, then applying that series of steps to a dataset. • Such a ‘workflow’ can be saved for subsequent application to another dataset.
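
The notion of a saved workflow that is later applied to another dataset can be sketched in Python as a reusable chain of functions. The step functions `drop_missing` and `normalize` and the helper `make_workflow` are hypothetical; GridNexus itself builds workflows graphically rather than in code.

```python
from functools import reduce


def make_workflow(*steps):
    """Bundle a fixed series of steps into one reusable function."""
    def run(dataset):
        # Feed the output of each step into the next one, in order.
        return reduce(lambda data, step: step(data), steps, dataset)
    return run


def drop_missing(values):
    """Hypothetical step: discard missing measurements."""
    return [v for v in values if v is not None]


def normalize(values):
    """Hypothetical step: scale values so the largest becomes 1.0."""
    top = max(values)
    return [v / top for v in values]


# Define the workflow once, then apply it to two different datasets.
workflow = make_workflow(drop_missing, normalize)
first = workflow([2, None, 4])
second = workflow([5, 10, None, 20])
print(first, second)
```

The workflow captures only the series of steps, so nothing about the first dataset carries over to the second run.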

  24. GridNexus • Job submission: Ideally in a grid, the grid middleware should select the ‘best’ resource – those computers that are available, capable, and have the software needed to handle the job. • The user need not select – nor know – where the computation is taking place. In fact, the job may even be passed from one computer to another for various aspects of the calculation. • The output is returned to the user’s workstation or account, rather than the user having to access and download the output file from a remote computer.
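
The selection rule described here (available, capable, and has the needed software) can be sketched as a simple filter. The `Resource` fields, the machine names, and the tie-break of preferring the most free CPUs are assumptions made for this sketch; real grid middleware uses richer scheduling policies.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical description of one computer on the grid."""
    name: str
    available: bool
    free_cpus: int
    software: set = field(default_factory=set)


def select_resource(resources, needed_cpus, needed_software):
    """Pick a resource that is available, capable, and has the software."""
    candidates = [
        r for r in resources
        if r.available
        and r.free_cpus >= needed_cpus
        and needed_software <= r.software  # all needed packages installed
    ]
    if not candidates:
        return None
    # Assumed tie-break: prefer the candidate with the most free CPUs.
    return max(candidates, key=lambda r: r.free_cpus)


grid = [
    Resource("cluster-a", available=True, free_cpus=16, software={"blast"}),
    Resource("cluster-b", available=True, free_cpus=8, software={"gaussian"}),
    Resource("cluster-c", available=False, free_cpus=32, software={"gaussian"}),
]
chosen = select_resource(grid, needed_cpus=4, needed_software={"gaussian"})
print(chosen.name if chosen else "no suitable resource")
```

The user supplies only the job requirements; which machine runs the job is decided by the selection function.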

  25. GridNexus • GridNexus is a GUI that allows the user to create/edit/run workflows • Based on Ptolemy II http://ptolemy.eecs.berkeley.edu/ptolemyII. Ptolemy provides the GUI and workflow features. We have extended it to provide the functionality we want (JXPL and GridServices) • Release 1.0.0 download available www.gridnexus.org

  26. Getting Started • The right frame is the palette for building workflows • The upper left frame provides the library of modules • The lower left is a thumbnail of the entire workflow

  27. The Basics • Sources produce data without needing input • Sinks consume data but may have side effects (such as displaying results) • All workflows must start with sources and end with sinks

  28. Simple Example 1 • Click and drag the “Const” source to the workflow. • Click and drag the “JxplDisplay” sink to the workflow

  29. Simple Example 1 • Double click on the Const module • Change its value to 10 • Click commit • The new value is shown on the icon

  30. Simple Example 1 • Input ports are on the left-hand side and output ports are on the right-hand side of each module • Click and drag from the output port of the Const module to the JxplDisplay

  31. Simple Example 1 • A link (or relation) is created between the two modules • The output of Const is consumed by the JxplDisplay

  32. Simple Example 1 • Click the run button • The JxplDisplay evaluates the input and produces a display window to show the results. • Notice the output is in XML (actually JXPL)

  33. Simple Example 2 • Transformers are modules that take input, transform it, and produce new output • This example computes the expression: (23 + 6) * -2

  34. Simple Example 2 • The Multiplication module takes the result of the addition (its first input) and multiplies that by -2 (its second input) • The result is consumed by JxplDisplay
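
The wiring in this example (sources feeding transformers feeding a sink) can be imitated in Python. The classes below are a hypothetical stand-in that computes the value directly; they are not GridNexus or Ptolemy II code, and GridNexus generates a JXPL script rather than doing the arithmetic itself.

```python
class Const:
    """Source: produces a value without needing input."""
    def __init__(self, value):
        self.value = value

    def output(self):
        return self.value


class Add:
    """Transformer: adds the outputs of its two inputs."""
    def __init__(self, first, second):
        self.first, self.second = first, second

    def output(self):
        return self.first.output() + self.second.output()


class Multiply:
    """Transformer: multiplies the outputs of its two inputs."""
    def __init__(self, first, second):
        self.first, self.second = first, second

    def output(self):
        return self.first.output() * self.second.output()


class Display:
    """Sink: consumes its input and shows it as a side effect."""
    def __init__(self, source):
        self.source = source

    def run(self):
        result = self.source.output()
        print(result)
        return result


# (23 + 6) * -2: the addition is the first input of the multiplication.
workflow = Display(Multiply(Add(Const(23), Const(6)), Const(-2)))
result = workflow.run()  # prints -58
```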

  35. What's Going On? • The workflow is not actually performing the operations. Instead it is creating a script (JXPL) that, when executed, produces the result • The JxplDisplay is evaluating the script and displaying the results

  36. What's Going On? • Double-click on the JxplDisplay and deselect the “Evaluate Jxpl” parameter • This parameter tells JxplDisplay whether or not to evaluate the script that is generated

  37. What's Going On? • Now when we run it, we see the actual script that is produced by the workflow • The script is written in XML using a language developed at UNCW called JXPL

  38. A Little Bit about JXPL • JXPL is based on LISP • The LISP corresponding to the JXPL on the right looks like: (* (+ 23 6) -2)

  39. A Little Bit about JXPL • Why? • XML is used to transport data between web/grid services • XML opening/closing tags <-> LISP opening/closing parens • Everything is either an atom or a list (functions, Data Structures)
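
The correspondence between XML tags and LISP parentheses can be illustrated with a small Python sketch that turns a nested list into XML and then evaluates the XML. The element names `list` and `atom` are placeholders chosen for this sketch; they are not claimed to be the actual JXPL vocabulary, and only `+` and `*` on integers are handled.

```python
import xml.etree.ElementTree as ET


def to_xml(expr):
    """Encode a nested Python list as XML: lists become <list>, the rest <atom>."""
    if isinstance(expr, list):
        node = ET.Element("list")  # placeholder tag, not real JXPL
        for item in expr:
            node.append(to_xml(item))
        return node
    node = ET.Element("atom")  # placeholder tag, not real JXPL
    node.text = str(expr)
    return node


OPERATORS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def evaluate(node):
    """Evaluate the XML form: the first child of a list names the operator."""
    if node.tag == "atom":
        return int(node.text)
    operator = node[0].text
    left, right = (evaluate(child) for child in node[1:])
    return OPERATORS[operator](left, right)


# The LISP expression (* (+ 23 6) -2) written as a nested list.
expression = ["*", ["+", 23, 6], -2]
script = to_xml(expression)
script_text = ET.tostring(script, encoding="unicode")
value = evaluate(script)
print(script_text)
print(value)  # prints -58
```

Building the script and evaluating it are separate steps here, which mirrors the talk's point that the workflow first produces a script and only then is the script evaluated.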

  40. GridNexus and JXPL with Grid Services • create workflows that can make use of web and grid services • implement primitives in JXPL that are generic web and grid clients • inspect the WSDL of the service to determine its interface

  41. GSClient module • With the GSClient module, the user can specify the factory URL, the instance name of the service, the stub class, and the port type • The primitive uses the OGSIServiceGridLocator to find the grid service and invoke the appropriate method with the arguments

  42. GridNexus and OGSA-DAI • OGSA-DAI Grid Data Services are designed so that the output of one can be delivered to another • GridNexus allows non-programmers to create JXPL to control GDS interaction in a graphical environment

  43. Using OGSA-DAI grid service clients

  44. Molecular biology workflow created in GridNexus

  45. Molecular chemistry workflow in GridNexus

  46. Build the Library • Identify tasks in scientific workflows • Investigate existing open source modules for possible integration with GridNexus • Design for reuse incorporating appropriate standards • Implement library module in GridNexus

  47. GridNexus • Release 1.0.0 download available www.gridnexus.org

  48. Acknowledgments • UNC-OP for funding the UNCW Grid Initiative Proposal: “Fostering Undergraduate Research Partnerships through a Graphical User Environment for the North Carolina Computing Grid,” Dr. Ron Vetter, PI • Co-PIs: Dr. Rebecca S. Boston, NCSU; Dr. Anthony Wilkinson, WCU; Dr. Marilyn McClelland, NCCU; Dr. Libero Bartolotti, ECU; Ms. Judy Porter, CFCC. • UNCW Participants: Computer Science: Dr. Ron Vetter, Dr. Clayton Ferner, Dr. David Berman, and Dr. Tom Hudson. Information Technology Systems: Dr. Bob Tyndall and Mr. Bobby Miller. Mathematics and Statistics: Dr. Jeff Brown. Chemistry and Biochemistry: Dr. Ned H. Martin. Biological Sciences: Dr. Ann Stapleton. Information Systems and Operations Management: Dr. Tom Janicki. • UNCW Computer Science students working on the Chemistry portal: Tristan Carland, Jerry Martin, Andrew Martin

  49. Acknowledgments • Grid Computing: Harnessing Underutilized Resources, Dr. Ned H. Martin • GridNexus UNCW GUI for Workflow Management, Dr. Clayton Ferner • GridNexus: A Grid Services Scientific Workflow System, Jeffrey L. Brown, Clayton S. Ferner, Thomas C. Hudson, Ann E. Stapleton, Ronald J. Vetter, Andrew Martin, Jerry Martin, Allen Rawls, William J. Shipman, and Michael Wood
