Macromolecular Structure Middleware OpenMMS An Ontology Driven Architecture
Overview • The mmCIF Ontology • OpenMMS Toolkit • Macromolecular Structure (MMS) Metamodel • Parser, XML • SQL / Corba Servers and Clients • Corba • UML and the future...
How do we “Enable” Science? • Promote well defined Macromolecular Structure (MMS) Specifications • Distribution – Open Interfaces • Now: • flat files • W3 browsing and searching • Future: • XML, SQL, CORBA
Why OpenMMS? • Allows programmers to more easily create efficient, high-performance and robust applications. • A Java-only toolkit that creates XML, CORBA and relational DB representations of the mmCIF Macromolecular Structure data. • Source code is publicly available, so users can easily modify the metamodel or create an entirely new one.
What Do We Mean by an Ontology Driven Architecture? What do we mean by an Ontology? A bridge between Our World of Natural Language and the World of Machines.
mmCIF Dictionary and Data Files • Based on Ontology for Macromolecular Structure defined by the International Union of Crystallography • Replaces the older 80-Column PDB files • mmCIF Dictionary contains over 140 Category and 1600 Item definitions • Open, Extensible • Provides a well-defined reference standard for data distribution
OpenMMS Toolkit Data Flow [Diagram: mmCIF Data Files (Reference Standard), mmCIF Parsers, XML Files, Relational Database, Corba Server, Applications]
Metamodel Information Flow [Diagram: mmCIF Dictionary, mmCIF Ontology, Metamodel, Metamodel Framework, Corba IDL, SQL Schema, XML DTD, Java Data Loaders, JDBC Loaders]
What can OpenMMS do? • The PDBase program loads any or all PDB files into any SQL-92 compatible database (Oracle, MySQL, Sybase...) • Translates any PDB file into an XML file. • Contains two Corba servers: • The Reference server caches and serves data read from PDB flat files. • The DB server caches and serves data read from a SQL database (very quickly...) • All source code is written in Java and publicly available.
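The "cache and serve" behaviour of the two servers can be pictured with a small sketch. This is only an illustration under stated assumptions: the class `CachingEntrySource`, its `loader` function and the `loads` counter are hypothetical names, not part of the OpenMMS source, and the loader stands in for whatever reads a flat file or queries the database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache-and-serve wrapper; not the OpenMMS server code.
class CachingEntrySource {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // stand-in for a flat-file or SQL read
    int loads = 0; // counts loader calls, for demonstration only (not thread-safe)

    CachingEntrySource(Function<String, String> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the loader only on the first request per id.
    String get(String entryId) {
        return cache.computeIfAbsent(entryId, id -> {
            loads++;
            return loader.apply(id);
        });
    }
}

class CachingDemo {
    public static void main(String[] args) {
        CachingEntrySource source = new CachingEntrySource(id -> "data for " + id);
        source.get("4hhb");
        source.get("4hhb"); // second request is served from the cache
        System.out.println("loader calls: " + source.loads);
    }
}
```

The expensive read happens once per entry; every later request is answered from memory.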
Some Advantages of Using an Ontology Driven Architecture • Scales to very large Ontologies • More reliable and maintainable code • Transfer between representations • Scientific Correctness of representation • Help in maintaining backward compatibility
How does one actually represent an ontology? (OpenMMS Internal Metamodel Overview) [Diagram: Root, Module, Interface, Struct and Field nodes, with a Visitor Abstract Class and a Visitor Subclass]
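A minimal sketch of how such a metamodel tree and its visitors could fit together. It assumes a Module contains Structs and a Struct contains Fields (the Interface node is left out for brevity). All class and method names here (`ModuleNode`, `StructNode`, `FieldNode`, `MetaVisitor`, and so on) are hypothetical and follow the slide labels, not the actual OpenMMS classes. Two visitor subclasses walk the same tree, one emitting an IDL-like struct and one emitting a SQL table definition.

```java
import java.util.ArrayList;
import java.util.List;

// Visitor abstract class: no-op hooks, subclasses override what they need.
abstract class MetaVisitor {
    void beginModule(ModuleNode m) {}
    void endModule(ModuleNode m) {}
    void beginStruct(StructNode s) {}
    void endStruct(StructNode s) {}
    void visitField(FieldNode f) {}
}

class FieldNode {
    final String name, type; // type is simplified to "string" or "float"
    FieldNode(String name, String type) { this.name = name; this.type = type; }
    void accept(MetaVisitor v) { v.visitField(this); }
}

class StructNode {
    final String name;
    final List<FieldNode> fields = new ArrayList<>();
    StructNode(String name) { this.name = name; }
    StructNode add(FieldNode f) { fields.add(f); return this; }
    void accept(MetaVisitor v) {
        v.beginStruct(this);
        for (FieldNode f : fields) f.accept(v);
        v.endStruct(this);
    }
}

class ModuleNode {
    final String name;
    final List<StructNode> structs = new ArrayList<>();
    ModuleNode(String name) { this.name = name; }
    ModuleNode add(StructNode s) { structs.add(s); return this; }
    void accept(MetaVisitor v) {
        v.beginModule(this);
        for (StructNode s : structs) s.accept(v);
        v.endModule(this);
    }
}

// Visitor subclass that writes IDL-like text.
class IdlVisitor extends MetaVisitor {
    final StringBuilder out = new StringBuilder();
    void beginModule(ModuleNode m) { out.append("module ").append(m.name).append(" {\n"); }
    void endModule(ModuleNode m) { out.append("};\n"); }
    void beginStruct(StructNode s) { out.append("  struct ").append(s.name).append(" {\n"); }
    void endStruct(StructNode s) { out.append("  };\n"); }
    void visitField(FieldNode f) {
        out.append("    ").append(f.type).append(' ').append(f.name).append(";\n");
    }
}

// Visitor subclass that writes one CREATE TABLE statement per struct.
class SqlVisitor extends MetaVisitor {
    final StringBuilder out = new StringBuilder();
    private boolean first;
    void beginStruct(StructNode s) {
        out.append("CREATE TABLE ").append(s.name).append(" (");
        first = true;
    }
    void endStruct(StructNode s) { out.append(");\n"); }
    void visitField(FieldNode f) {
        if (!first) out.append(", ");
        first = false;
        out.append(f.name).append(' ').append(f.type.equals("float") ? "REAL" : "VARCHAR(80)");
    }
}

class MetamodelDemo {
    static ModuleNode sample() {
        return new ModuleNode("Mms").add(new StructNode("atom_site")
                .add(new FieldNode("id", "string"))
                .add(new FieldNode("occupancy", "float")));
    }
    public static void main(String[] args) {
        IdlVisitor idl = new IdlVisitor();
        SqlVisitor sql = new SqlVisitor();
        sample().accept(idl);
        sample().accept(sql);
        System.out.print(idl.out);
        System.out.print(sql.out);
    }
}
```

Because every output format is just another visitor over one tree, adding a new representation does not touch the model classes.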
mmCIF Parsers • General Purpose, Low-level access to data • Parsers available in many languages • OpenMMS toolkit includes Java Parser • Uses “Builder” Design Pattern • An application subclasses Abstract Builder class and stores data into its data structures
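The Builder arrangement described above can be sketched as follows. This is a toy under loud assumptions: `AbstractCifBuilder`, `ToyCifParser` and `MapBuilder` are hypothetical names rather than the OpenMMS parser API, and the parser handles only single-line `_category.item value` pairs, not `loop_` blocks, quoting or multi-line values of real mmCIF. The input values are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical callback class; the real OpenMMS builder is not shown in the slides.
abstract class AbstractCifBuilder {
    /** Called once per "_category.item value" pair found in the input. */
    abstract void item(String category, String item, String value);
}

// Toy parser: recognises only single-line "_category.item value" pairs.
class ToyCifParser {
    static void parse(String text, AbstractCifBuilder builder) {
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (!line.startsWith("_")) continue;      // skip data_ headers, comments, blanks
            String[] parts = line.split("\\s+", 2);   // tag, then the rest as the value
            if (parts.length < 2) continue;
            int dot = parts[0].indexOf('.');
            if (dot < 0) continue;
            builder.item(parts[0].substring(1, dot), parts[0].substring(dot + 1), parts[1]);
        }
    }
}

// The application's builder stores data into its own structure: here a flat map.
class MapBuilder extends AbstractCifBuilder {
    final Map<String, String> values = new LinkedHashMap<>();
    void item(String category, String item, String value) {
        values.put(category + "." + item, value);
    }
}

class BuilderDemo {
    public static void main(String[] args) {
        String text = "data_demo\n_entry.id DEMO\n_cell.length_a 10.0\n";
        MapBuilder builder = new MapBuilder();
        ToyCifParser.parse(text, builder);
        System.out.println(builder.values);
    }
}
```

The parser never decides how data is stored; an application that wants arrays, objects or database rows only swaps in a different builder subclass.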
MMS in XML • Large Flat Files (open and close tags) • Tables can be grouped by rows or columns • XML from SQL Query • Many requests from Web browsers don’t really need or want all the data • SW available from DB Vendors and ISVs for creating XML files from SQL result sets • Smaller files load faster
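To make "grouped by rows or columns" concrete, here is a hedged sketch that serialises the same small table both ways. The element names (`row`, and one element per column) are assumptions for illustration, not the OpenMMS XML layout, and the column-wise form assumes values contain no spaces.

```java
import java.util.Arrays;
import java.util.List;

class XmlTableSketch {
    // Minimal escaping for text content.
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Row-wise: one <row> per record, one child element per column.
    static String byRows(String table, List<String> cols, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<" + table + ">");
        for (List<String> r : rows) {
            sb.append("<row>");
            for (int c = 0; c < cols.size(); c++) {
                sb.append('<').append(cols.get(c)).append('>').append(esc(r.get(c)))
                  .append("</").append(cols.get(c)).append('>');
            }
            sb.append("</row>");
        }
        return sb.append("</").append(table).append('>').toString();
    }

    // Column-wise: one element per column holding all its values, space separated.
    static String byColumns(String table, List<String> cols, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<" + table + ">");
        for (int c = 0; c < cols.size(); c++) {
            sb.append('<').append(cols.get(c)).append('>');
            for (int r = 0; r < rows.size(); r++) {
                if (r > 0) sb.append(' ');
                sb.append(esc(rows.get(r).get(c)));
            }
            sb.append("</").append(cols.get(c)).append('>');
        }
        return sb.append("</").append(table).append('>').toString();
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("id", "type_symbol");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "N"), Arrays.asList("2", "C"));
        System.out.println(byRows("atom_site", cols, rows));
        System.out.println(byColumns("atom_site", cols, rows));
    }
}
```

The row-wise form repeats every tag once per record, which is where the bulk of a large XML file comes from; the column-wise form pays for each tag once per table.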
Relational DB Expression • SQL-92 Compatible • Schemas for all the standard DB vendors • Fast and Flexible Keyword searches • PDBase loader allows structures to be selectively loaded • Oracle Instance Tested • 14,556 Structures • 16GB, 88 Million Atom Records
A very high-level (and very rough) classification of communication • Person-to-Person communication • email • Person-to-Machine communication • HTTP/HTML • Machine-to-Machine communication • CORBA, SQL, .NET, SOAP • Not Communications -> Data Formats • XML, mmCIF (STAR), many more …
What is CORBA? Common Object Request Broker Architecture • Defines a family of open software interface specifications for distributed object computing. http://www.omg.org
What is an Object? • “A Data Structure with an Attitude” • Programs = Algorithms + Data Structures • Object Oriented Programming Principle: partition the parts of algorithms with the data structures they use
Side View of a Distributed Application [Diagram: Client (e.g. a Java Applet), IDL, Middleware, Network: Internet (TCP/IP), Middleware, IDL, Server (e.g. Mainframe Computer)]
The “Hourglass” view of the Internet (layers, top to bottom) • Applications • HTTP, Corba, .NET: OO High-Level Interface • TCP, RTP, ...: Reliable Bitstream • IP: Unreliable Datagrams • Copper, Glass, Radio Spectrum (ATM, Ethernet, V.90, SONET...)
Where is Corba? • Inside every Java Runtime Environment. • Commonly used in middle tier and backend (e.g. database) connections. • Open Source and Commercial Implementations Available • Usually buried deep inside the software • Difficult or impossible to tell when it is being used
What is Distributed Object Computing? • Extends the benefits of object-oriented technology across process and machine boundaries to encompass entire networks. • Attempts to make remote objects appear to programmers as if they were local objects in the same process. This is called location transparency.
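Location transparency can be illustrated without a real ORB. In this sketch the client code depends only on an interface; a local object and a hand-written stub both satisfy it. The names `EntryLookup`, `StubEntryLookup` and `Skeleton` are hypothetical, the request format is invented for the example, and the "network" is a plain function call standing in for IIOP.

```java
import java.util.function.Function;

// The interface the client programs against (in CORBA this would come from IDL).
interface EntryLookup {
    String title(String id);
}

// An ordinary object living in the client's own process.
class LocalEntryLookup implements EntryLookup {
    public String title(String id) { return "entry " + id; }
}

// Stand-in for a generated stub: marshals the call into a request string,
// hands it to a transport, and returns the reply.
class StubEntryLookup implements EntryLookup {
    private final Function<String, String> transport;
    StubEntryLookup(Function<String, String> transport) { this.transport = transport; }
    public String title(String id) { return transport.apply("title:" + id); }
}

// Stand-in for the server side: unmarshals the request and calls the real object.
class Skeleton {
    private final EntryLookup target;
    Skeleton(EntryLookup target) { this.target = target; }
    String handle(String request) {
        String[] parts = request.split(":", 2);
        if (parts.length != 2 || !parts[0].equals("title")) {
            throw new IllegalArgumentException("unknown request: " + request);
        }
        return target.title(parts[1]);
    }
}

class TransparencyDemo {
    // Client code: identical whether the object is local or behind a stub.
    static String describe(EntryLookup lookup) {
        return lookup.title("4hhb");
    }

    public static void main(String[] args) {
        EntryLookup local = new LocalEntryLookup();
        Skeleton server = new Skeleton(new LocalEntryLookup());
        EntryLookup remote = new StubEntryLookup(server::handle);
        System.out.println(describe(local));
        System.out.println(describe(remote));
    }
}
```

In a real deployment the stub and skeleton are generated from IDL and the transport crosses process and machine boundaries, but the client method stays the same.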
Advantages of Distributed Object Computing • Easier (and faster) for programmers to create distributed applications • Increases Reliability • Increases Maintainability • Increases Portability • Increases Extensibility
The Alphabet Soup • OMG = Object Management Group: a consortium of 800+ companies founded in 1989. • IDL = Interface Definition Language
Boundaries, Interfaces • The key is to focus on boundaries, interfaces, and how things fit together • Not on the internal details of how they’re built; assume those will be diverse and changing • Shape of boundary is defined in IDL
Boundaries, Interfaces • The glue that binds parts together is the ORB • The interface to an object can be distributed over a network • Shape of boundary is defined in IDL
Corba Independence • Open Standard for Distributed Object Oriented Design • Independent of Hardware Platform • Independent of Operating System • Independent of Programming Language • Independent of Object Location
Object Request Broker • ORBs mediate between objects and the things that use them (clients) [Diagram: Client, IDL, Object Request Broker, IDL, Object]
Terminology • IIOP • The Internet Inter-ORB Protocol, defined in the Spec as a vendor-independent, wire-level network protocol on top of TCP/IP. This allows ORB implementations of different vendors to interoperate.
ORBs: Medium for Integration [Diagram: programs in Java, C++, Perl, C, Ada, VB and ActiveX attached to ORBs, which are linked by Corba / IIOP (Internet Inter-ORB Protocol)]
Corba Facilities: Industry Standards in Vertical Markets • Manufacturing • Finance • Life Sciences Research • C4I • Many others...
Using Corba to access Macromolecular Structure Data • No Parsing of Flat Files • Direct Access to Binary Data Structures • Strongly Typed Data • Granularity of Access • Indices and Presence Flags Pre-computed • Highest Performance
OMG/LSR Macromolecular Structure Adoption Process • August 1999 RFP issued • March 2000 Initial Submission • September 2000 Revised Submission • February 2001 Adopted Spec by the OMG • 4Q 2001 OpenMMS LSR/MMS1.0 compliant implementation source code publicly available • February 2002 Approved as a Formal OMG Available Specification.
Using the CORBA MMS Server
An excerpt from the ATOM records of a legacy PDB formatted file (4hhb.ent):
...
ATOM 6 CG1 VAL A 1 7.009 20.127 5.418 6.00 61.79 ...
ATOM 7 CG2 VAL A 1 5.246 18.533 5.681 6.00 80.12 ...
ATOM 8 N LEU A 2 9.096 18.040 3.857 7.00 26.44 ...
ATOM 9 CA LEU A 2 10.600 17.889 4.283 6.00 26.32 ...
ATOM 10 C LEU A 2 11.265 19.184 5.297 6.00 32.96 ...
ATOM 11 O LEU A 2 10.813 20.177 4.647 8.00 31.90 ...
ATOM 12 CB LEU A 2 11.099 18.007 2.815 6.00 29.23 ...
ATOM 13 CG LEU A 2 11.322 16.956 1.934 6.00 37.71 ...
ATOM 14 CD1 LEU A 2 11.468 15.596 2.337 6.00 39.10 ...
ATOM 15 CD2 LEU A 2 11.423 17.268 .300 6.00 37.47 ...
...
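For contrast with the typed CORBA access, this is roughly what a client has to do with a flat-file record. The sketch assumes the whitespace-separated form shown in the excerpt, with the trailing elision marks removed. Real PDB files use fixed character columns, so production code should slice by column instead. The class `PdbAtom` is hypothetical, and the field names follow the usual PDB ATOM record layout, in which the last two numeric columns are occupancy and temperature factor.

```java
// Hypothetical holder for one ATOM record; not part of OpenMMS.
class PdbAtom {
    final int serial;
    final String name, resName, chain;
    final int resSeq;
    final double x, y, z, occupancy, tempFactor;

    PdbAtom(int serial, String name, String resName, String chain, int resSeq,
            double x, double y, double z, double occupancy, double tempFactor) {
        this.serial = serial; this.name = name; this.resName = resName;
        this.chain = chain; this.resSeq = resSeq;
        this.x = x; this.y = y; this.z = z;
        this.occupancy = occupancy; this.tempFactor = tempFactor;
    }

    // Splits on whitespace; every field arrives as text and must be converted by hand.
    static PdbAtom parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length != 11 || !t[0].equals("ATOM")) {
            throw new IllegalArgumentException("not an ATOM record: " + line);
        }
        return new PdbAtom(Integer.parseInt(t[1]), t[2], t[3], t[4], Integer.parseInt(t[5]),
                Double.parseDouble(t[6]), Double.parseDouble(t[7]), Double.parseDouble(t[8]),
                Double.parseDouble(t[9]), Double.parseDouble(t[10]));
    }
}

class PdbAtomDemo {
    public static void main(String[] args) {
        PdbAtom a = PdbAtom.parse("ATOM 15 CD2 LEU A 2 11.423 17.268 .300 6.00 37.47");
        System.out.println(a.serial + " " + a.name + " " + a.resName + " z=" + a.z);
    }
}
```

Every client repeats this tokenising and type conversion, and each one can get it subtly wrong; a typed interface moves that work to the server once.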
LSR/MMS “ATOM Record”
DsLSRMacromolecularStructure.idl excerpt:
struct AtomSite {
    string id;
    IndexId type_symbol;
    AtomIndex label;
    IndexId label_entity;
    VectorXYZ cartn;
    float occupancy;
    float b_iso_or_equiv;
};
Example Code and Resulting Output
Entry e = entryFactory.get_entry_from_id("4hhb");
AtomSite[] a = e.get_atom_site_list();
for (int i = 0; i < a.length; i++) {
    System.out.println(a[i].id + " " + a[i].type_symbol.id + " (" + a[i].cartn.x + ", " + a[i].cartn.y + ", " + a[i].cartn.z + ")");
}
produces:
1 N (11.065, 7.352, 9.598)
2 C (12.436, 7.764, 9.902)
3 C (12.883, 7.09, 11.208)
4 O (12.088, 7.0, 12.147)
5 C (12.611, 9.264, 10.06)
...
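The loop above needs a running CORBA server. To try the same access pattern without one, the following sketch uses in-memory stand-ins. The classes `VectorXYZ`, `IndexId` and `AtomSite` here are hand-written approximations of what an IDL compiler would generate (the IDL-to-Java mapping turns a struct into a final class with public fields). The `AtomIndex` members are omitted because their definition is not shown, and the assumption that `IndexId.id` is a string comes only from the printed output. The coordinates are copied from the first two output lines.

```java
// Hand-written stand-ins for IDL-generated classes; assumptions, not the real mapping output.
final class VectorXYZ {
    public float x, y, z;
    VectorXYZ(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

final class IndexId {
    public String id; // assumed to be a string, based on the printed "N", "C", "O"
    IndexId(String id) { this.id = id; }
}

final class AtomSite {
    public String id;
    public IndexId type_symbol;
    public VectorXYZ cartn;
    public float occupancy;
    public float b_iso_or_equiv;
    AtomSite(String id, String symbol, float x, float y, float z) {
        this.id = id;
        this.type_symbol = new IndexId(symbol);
        this.cartn = new VectorXYZ(x, y, z);
    }
}

class AtomSiteDemo {
    // Same string layout as the println in the example code.
    static String format(AtomSite a) {
        return a.id + " " + a.type_symbol.id + " (" + a.cartn.x + ", "
                + a.cartn.y + ", " + a.cartn.z + ")";
    }

    public static void main(String[] args) {
        AtomSite[] a = {
            new AtomSite("1", "N", 11.065f, 7.352f, 9.598f),
            new AtomSite("2", "C", 12.436f, 7.764f, 9.902f),
        };
        for (int i = 0; i < a.length; i++) {
            System.out.println(format(a[i]));
        }
    }
}
```

Fields are read directly as typed values (`float`, `String`); there is no tokenising or number parsing on the client side.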
What are the alternatives to Corba? • TCP/IP Sockets - Byte stream • DCOM, COM+, OLE, .NET (Microsoft Only) • DCOM-to-Corba bridges are available from several vendors • SOAP (Simple Object Access Protocol) • XML Based
Unified Modeling Language (UML): What do all those arrows and boxes mean? • Schematic Language for Defining SW • Graphical Representations • UML = Things, Relations and Diagrams • 9 types of Diagrams • The most commonly used diagram is the “Class Diagram”
UML Class Diagram Example [Diagram: classes Identifier, EntryId, EntryIdList, ModificationDateList; EntryFactory with methods get_version(), get_entry_id_list(), get_entry_modification_dates(), native_formats_supported(), get_native_entry_representation(); ModificationDate with attributes Entry_id : EntryId and date : TimeBase::TimeT; two associations with multiplicity *]
UML Class Diagram Basics • Class_Name: underlined for class instances, italics for abstract classes • Variables: var1: Type, var2: Type • Methods: method1(), method2(), method3() • Details may be omitted if not important
UML Relationships • Dependency • Association (multiplicity labels such as 0..1 and *) • Generalization (Inheritance) • Aggregation (multiplicity label *)
UML Example [Diagram: classes Identifier, EntryId, EntryIdList, ModificationDateList; EntryFactory with methods get_version(), get_entry_id_list(), get_entry_modification_dates(), native_formats_supported(), get_native_entry_representation(); ModificationDate with attributes Entry_id : EntryId and Date : TimeBase::TimeT; two associations with multiplicity *]
XMI: XML Metadata Interchange • UML is a graphical representation; need some way to exchange UML models between applications • XMI is used to store and transmit UML models • XML based • Defines XML tags for classes, relationships between classes etc.
OMG MDA • Platform Independent Models (PIMs) that define the interface are defined in UML • The PIMs are translated to Platform Specific Models (PSMs) such as Corba, SOAP, .NET or XML Schemas • The Corba servers and clients may be the same, but now the interface is defined in UML and the IDL is then generated from the UML
MDA Platform Independent to Platform Dependent Translation [Diagram: UML translated to .NET, Corba, SOAP, XML]
Thanks and Acknowledgments • Phil Bourne • John Westbrook • David Benton • Karl Konnerth • Lynn TenEyck