XML Metadata Services

SKG06, http://www.culturegrid.net/SKG2006/
Guilin, China, November 3, 2006
Mehmet S. Aktas, Sangyoon Oh, Geoffrey C. Fox and Marlon Pierce
Presented by Geoffrey Fox: Computer Science, Informatics, Physics
Pervasive Technology Laboratories, Indiana University, Bloomington IN 47401
gcf@indiana.edu
http://www.infomall.org

Different Metadata Systems
• There are many WS-* specifications addressing meta-data defined broadly
  • WS-MetadataExchange
  • WS-RF
  • UDDI
  • WS-ManagementCatalog
  • WS-Context
  • ASAP
  • WBEM
  • WS-GAF
• And many different implementations, from (extended) UDDI through MCAT of the Storage Resource Broker
• And of course representations including RDF and OWL
• Further, there is system metadata (such as UDDI for core services) and metadata catalogs for each application domain, such as WFS (Web Feature Service) for GIS (Geographical Information Systems)
• They have different scope and different QoS trade-offs
  • e.g. Distributed Hash Tables (Chord) to achieve scalability in large scale networks

Different Trade-offs
• It has never been clear how a poor lonely service is meant to know where to look up meta-data, or whether that meta-data is meant to be thought of as a database (UDDI, WS-Context) or as the contents of a message (WS-RF, WS-MetadataExchange)
• We identified two very distinct QoS trade-offs
  • 1) Large scale, relatively static metadata, as in a (UDDI) catalog of all the world’s services
  • 2) Small scale, highly dynamic metadata, as in dynamic workflows for sensor integration and collaboration
    • Fault-tolerance and ability to support dynamic changes with a delay of a few milliseconds
    • But only a modest number of involved services (up to 1000’s in a session)
    • Need Session NOT Service/Resource meta-data, so don’t use WS-RF

Hybrid WS-Context Service Architecture and Prototype

WS-Context compliant XML Metadata Services
• We designed and built a WS-Context compliant XML Metadata Service supporting distributed or central paradigms. This service
  • supports the extensive metadata requirements of rich interacting systems, such as
    • correlating activities of widely distributed services, Ex: workflow style GIS Service Oriented Architectures, AND
    • optimizing Grid/Web Service messaging performance, Ex: mobile computing environments, AND
    • managing dynamic events, especially in multimedia collaboration, Ex: collaboration Grid/Web service applications, AND
    • providing information to enable session failure recovery capabilities.

Context as Service Metadata
• We define all metadata (static, semi-static, dynamic) relevant to a service as “Context”.
• Context can be associated with a single service, a session (service activity) or both.
• Context can be independent of any interaction
  • slowly varying, quasi-static context
  • Ex: type or endpoint of a service, less likely to change
• Context can be generated as a result of service interactions
  • dynamic, highly updated context
  • information associated with an activity or session
  • Ex: session-id, URI of the coordinator of a workflow session

Hybrid XML Metadata Services –> WS-Context + extended UDDI
• We combine the functionality of two services, WS-Context AND extended UDDI, in one hybrid service to manage Context (service metadata).
  • WS-Context controlling a workflow
  • (Extended) UDDI supporting semantic service discovery
• This approach enables uniform query capabilities on the service metadata catalog.
• http://www.opengrids.org/wscontext/index.html

Distributed Hybrid WS-Context XML Metadata Services
[Figure: Replica Server-1, Replica Server-2, ..., Replica Server-N. Each Replica Server runs a Hybrid-WSContext Service and an Extended UDDI Service, exposed through WSDL and backed by a Database through JDBC. Clients connect through WSDL over HTTP(S). Each replica acts as Publisher and Subscriber on a Topic Based Publish-Subscribe Messaging System.]
Note that all Replica Servers are identical in their capabilities. This figure illustrates the system from the perspective of one Replica Server.
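The replication idea in this architecture, where identical Replica Servers stay consistent by exchanging updates on a shared topic, can be sketched roughly as follows. This is an illustrative toy, not the service's code: the `Topic` and `ReplicaServer` classes and their method names are hypothetical, and delivery here is synchronous and in-process, whereas the real system uses an asynchronous messaging system between hosts.

```python
class Topic:
    """Hypothetical in-process stand-in for a topic-based pub-sub broker."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        # Deliver the event to every subscriber, including the publisher itself.
        for callback in list(self._subscribers):
            callback(event)


class ReplicaServer:
    """Hypothetical replica: every server is both publisher and subscriber."""

    def __init__(self, name, topic):
        self.name = name
        self.store = {}
        self._topic = topic
        topic.subscribe(self._on_update)

    def set_context(self, key, value):
        # Publish the update; all replicas (this one too) apply it on delivery.
        self._topic.publish((key, value))

    def _on_update(self, event):
        key, value = event
        self.store[key] = value

    def get_context(self, key):
        return self.store.get(key)


topic = Topic()
replicas = [ReplicaServer(f"replica-{i}", topic) for i in range(1, 4)]
replicas[0].set_context("session-1", "coordinator-uri")
values = [r.get_context("session-1") for r in replicas]
```

Because every replica holds the same data after delivery, a client can send an inquiry to any of them.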

Key Features
• Publish-Subscribe exploited to support replicated storage, e.g.
  • Initial storage of context
  • Update to make copies consistent
  • Access context
• Use of a JavaSpaces cache running in memory on each WS-Context node
  • Naturally supports Get Context by name requests
  • Backed up every ~30 milliseconds to a MySQL database
  • If a query can be satisfied by the JavaSpaces cache, it can be answered in < 1 ms plus the few milliseconds of Web service overhead
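The cache-plus-backup pattern described above can be sketched as a small write-behind cache. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, a plain dictionary stands in for the JavaSpaces cache, and an in-memory SQLite database stands in for MySQL. The `flush` method is called by hand here; a real service would call it from a timer at the backup interval.

```python
import sqlite3


class WriteBehindContextCache:
    """Hypothetical in-memory context cache with periodic database backup."""

    def __init__(self, db):
        self._cache = {}      # name -> context value, served from memory
        self._dirty = set()   # names changed since the last backup
        self._db = db
        db.execute(
            "CREATE TABLE IF NOT EXISTS context (name TEXT PRIMARY KEY, value TEXT)"
        )

    def set_context(self, name, value):
        # Writes touch memory only; persistence is deferred to flush().
        self._cache[name] = value
        self._dirty.add(name)

    def get_context(self, name):
        # Fast path: answer from memory.
        if name in self._cache:
            return self._cache[name]
        # Slow path: fall back to the database and warm the cache.
        row = self._db.execute(
            "SELECT value FROM context WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            return None
        self._cache[name] = row[0]
        return row[0]

    def flush(self):
        # Back up every entry changed since the previous flush.
        rows = [(name, self._cache[name]) for name in self._dirty]
        self._db.executemany(
            "INSERT OR REPLACE INTO context (name, value) VALUES (?, ?)", rows
        )
        self._db.commit()
        self._dirty.clear()
        return len(rows)


db = sqlite3.connect(":memory:")
cache = WriteBehindContextCache(db)
cache.set_context("session-42", "<context>coordinator-uri</context>")
hit = cache.get_context("session-42")
flushed = cache.flush()
stored = db.execute(
    "SELECT value FROM context WHERE name = ?", ("session-42",)
).fetchone()[0]
```

The trade-off is the one the slide names: reads and writes avoid the database, at the price of losing at most one backup interval of updates if a node fails.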

TupleSpaces-Based Caching Strategies
• TupleSpaces is a communication paradigm
  • asynchronous communication
  • pioneered by David Gelernter
  • first described in the Linda project in 1982 at Yale
  • communication units are tuples
    • a data structure consisting of one or more typed fields
• Hybrid WS-Context Service employs/extends TupleSpaces:
  • all accesses are in memory, so overhead is negligible (less than 1 msec for inquiries)
  • data sharing - mutually exclusive access to tuples
  • associative lookup - content based search, appropriate for key-based caching
  • temporal and spatial uncoupling of communicating parties
  • e.g. a tuple: ("context_id", Context). This indicates a tuple with two fields: a) a string, "context_id" and b) a Java object, "Context".
  • back-up at frequent time intervals for fault-tolerance
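To make associative lookup and mutually exclusive access concrete, here is a minimal tuple space sketch. It is written in Python for brevity and is not the JavaSpaces API: the `TupleSpace` class, its `write`/`read`/`take` methods, and the use of `None` as a wildcard field are assumptions chosen for illustration.

```python
import threading


class TupleSpace:
    """Toy in-memory tuple space with template-based (associative) lookup."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()  # gives mutually exclusive access

    def write(self, tup):
        with self._lock:
            self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(template, tup):
        # A template matches when lengths agree and every non-None field is equal.
        return len(template) == len(tup) and all(
            want is None or want == have for want, have in zip(template, tup)
        )

    def read(self, template):
        # Return a matching tuple without removing it, or None if there is none.
        with self._lock:
            for tup in self._tuples:
                if self._matches(template, tup):
                    return tup
        return None

    def take(self, template):
        # Return a matching tuple and remove it from the space.
        with self._lock:
            for index, tup in enumerate(self._tuples):
                if self._matches(template, tup):
                    return self._tuples.pop(index)
        return None


space = TupleSpace()
space.write(("context_id", {"session": "s-1", "coordinator": "http://example.org/wf"}))
found = space.read(("context_id", None))   # lookup by content, not by address
taken = space.take(("context_id", None))   # removes the tuple
gone = space.read(("context_id", None))    # nothing left to match
```

The writer and reader never reference each other, only the space, which is the temporal and spatial uncoupling mentioned above.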

A general performance evaluation on the most recent implementation of the Hybrid WS-Context Service

Prototype Evaluation - I
• Performance Experiment: We investigate the practical usefulness of the system by exploring the following research questions.
  • What is the baseline performance of the hybrid WS-Context Service implementation for given standard operations?
  • What is the effect of network latency on the baseline performance of the system?
  • How does the performance compare with previous metadata management solutions?

PERFORMANCE TEST
[Figure: four test configurations, each run with 1 user/1000 transactions, single threaded. Components shown: WS-Context Client, Hybrid-WSContext Service (Expeditor, Publishing and Querying Modules, JDBC Handler), extended UDDI Client, Extended UDDI Server Engine, Client, Dummy Server, all connected through WSDL.]
• Test-1. Dummy Server
• Test-2. Hybrid-WSContext inquiry/publication without database access
• Test-3. Hybrid-WSContext inquiry/publication with database access
• Test-4. extended UDDI inquiry/publication

• The difference (Test 2 - Test 1) is the JavaSpaces overhead.
• The experimental study indicates that, for standard operations, the proposed system provides performance comparable to existing metadata management services.

Prototype Evaluation - II
• Scalability Experiment: We investigate the scalability of the system by finding answers to the following research questions.
  • What is the performance degradation of the system for standard operations under increasing message sizes?
  • What is the performance degradation of the system for standard operations under increasing message rates?
  • What is the scalability gain (both in numbers and in performance) of moving from a centralized system to a distributed system under the same workload?

SCALABILITY TEST
[Figure: test setup. Components shown: WS-Context Client, Thread Pools, Hybrid-WSContext Service and Hybrid FTHPIS-WSContext Service (Expeditor, Publishing and Querying Modules, JDBC Handler), connected through WSDL over HTTP(S). Workload labels: 1 user/100 transactions, single threaded; 5 clients distributed to cluster nodes 1 to 5, each running 1 to 15 threads.]
• TEST-1 - Hybrid-WSContext inquiry/publication with increasing message sizes
• TEST-2 - Hybrid-WSContext inquiry/publication with increasing message rates (# of messages per second)

• OGSA-DAI results are from http://www.ogsadai.org.uk/documentation/scenarios/-performance
• Both the OGSA-DAI and the WS-Context test cases were conducted on a tightly coupled network.
• The results indicate that the cost of inquiry and publication operations remains the same as the context’s payload size increases from 100 bytes up to 10 KBytes.
• We also see that the hybrid WS-Context service performs better than the OGSA-DAI approach, although the latter technology is more powerful.

• The results indicate that the proposed system can scale up to 940 simultaneous querying clients or 222 simultaneous publishing clients, where each client sends one query per second, for small context payloads with a 30 millisecond fault-tolerance backup interval.
• Multi-core hosts will improve performance dramatically.

[Figure: multi-core performance chart. Legend labels as extracted: 1 Chip 8 Core/chip; 2 Chips 1 Core/chip; 1 Chip 6 Core/chip; Opteron; 2 Chips 2 Core/chip; Xeon]
• 4 cores is 3000 messages per second
• About one message per millisecond per core for Opteron; one message per 2 ms for a Sun Niagara core

DISTRIBUTION TEST
[Figure: 5 clients distributed to cluster nodes 1 to 5, each running 1 to 15 threads (thread pools) and firing messages through WSDL over HTTP(S) to randomly selected servers. Numbered rectangles 1 to 5 show FTHPIS configurations built from node-1 through node-5.]
• We investigate scalability when moving from a centralized server to a distributed one under heavy workloads.
• The numbered rectangles correspond to an N-node FTHPIS system with various Publish-Subscribe topologies (the topology does NOT affect performance).
• Five different FTHPIS systems were tested, with N ranging from 1 to 5, under the same workload.
• In each test case, the same volume of data is evenly distributed among the nodes.

• The caching algorithm is non-optimal, as it does the database access BEFORE the Publish-Subscribe step. Reversing this choice should lead to throughput that is linear in the number of nodes.
• Pub-Sub overhead ~ 2 ms
• The results indicate that the scalability of the metadata store can be increased by moving from a centralized service to a distributed system.

Prototype Evaluation - III
• Fault Tolerance Experiment: We investigate the empirical cost of fault-tolerance by finding answers to the following research questions.
  • What is the cost of fault-tolerance in terms of execution time of standard operations on a tight cluster?
  • How does the cost of fault-tolerance change when the replica servers are separated by significant network distances?

FAULT-TOLERANCE TEST

FAULT-TOLERANCE EXPERIMENT TEST BED

FAULT-TOLERANCE TEST RESULTS
• The results point to the inevitable trade-off between fault-tolerance (degree of replication, or high availability of data) and performance. The lower the level of fault-tolerance, the higher the performance of publication operations.
• The results also indicate that a high degree of replication can be achieved without increasing the cost of fault-tolerance by using an asynchronous communication model such as the publish-subscribe paradigm.

An Application Case Scenario and an application-specific performance evaluation of the Hybrid WS-Context Service

Application – Context Store usage in communication of mobile Web Services
• Handheld Flexible Representation (HHFR) is open source software for fast communication in mobile Web Services. HHFR supports:
  • streaming messages, separation of message contents, and usage of a context store.
  • http://www.opengrids.org/hhfr/index.html
• We use the WS-Context service as a context-store for the redundant parts of SOAP messages.
  • redundant data is the static XML fragments encoded in every SOAP message
  • redundant metadata is stored as context associated with the service conversation in place
• The empirical results show that we save 83% in message size and on average 41% in transit time by using the WS-Context service.
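The idea of moving static message parts into a context store can be illustrated with a toy example. This is not HHFR code: the dictionary standing in for the WS-Context service, the `{payload}` placeholder, the `|` separator, and all function names are assumptions made for the sketch. The size saving it computes depends entirely on the sample message and says nothing about the measured 83% figure.

```python
# Hypothetical stand-in for the WS-Context service: context id -> static template.
context_store = {}

PLACEHOLDER = "{payload}"


def save_template(context_id, template):
    # Done once per conversation, before streaming starts.
    context_store[context_id] = template


def to_compact(context_id, payload):
    # Send only the context id and the part of the message that changes.
    return context_id + "|" + payload


def from_compact(message):
    # The receiver rebuilds the full message from the stored static parts.
    context_id, payload = message.split("|", 1)
    return context_store[context_id].replace(PLACEHOLDER, payload)


template = (
    '<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">'
    "<soap:Header><session>demo</session></soap:Header>"
    "<soap:Body><reading>{payload}</reading></soap:Body>"
    "</soap:Envelope>"
)
save_template("ctx-1", template)

full = template.replace(PLACEHOLDER, "21.5")   # what plain SOAP would send
compact = to_compact("ctx-1", "21.5")          # what the optimized stream sends
restored = from_compact(compact)
saving = 1 - len(compact) / len(full)          # fraction of bytes saved
```

The one-time cost of `save_template` is why the approach pays off only for conversational or streaming exchanges with enough messages.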

Optimizing Grid/Web Service Messaging Performance The performance and efficiency of Web Services can be greatly increased in conversational and streaming message exchanges by removing the redundant parts of the SOAP message.

Performance with and without Context-store
Summary of the Round Trip Time (TRTT)
• Experiments ran over HHFR
• Optimized messages were exchanged over HHFR after saving redundant/unchanging parts to the Context-store
• Saves on average 83% of message size and 41% of transit time

System Parameters
• Taccess: time to access a Context-store (i.e. save a context to or retrieve a context from the Context-store) from a mobile client
• TRTT: Round Trip Time to exchange a message through an HHFR channel
• N: number of simultaneous streams supported, summed over ALL mobile clients
• Twsctx: time to process the setContext operation
• Taxis: time consumed by Axis processing
• Ttrans: transmission time through the network
• Tstream: stream length

Context-store: System Parameters

Summary of Taxis and Twsctx measurements
• Taccess = Twsctx + Taxis + Ttrans
• Data binding overhead at the Web Service container is the dominant factor in message processing

Performance Model and Measurements
• Chhfr = n × thhfr + Oa + Ob
• Csoap = n × tsoap
• Breakeven point: nbe × thhfr + Oa + Ob = nbe × tsoap
• Oa: overhead for accessing the Context-store Service; Oa(WS) is roughly 20 milliseconds
• Ob: overhead for negotiation
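Solving the breakeven relation above for the number of messages gives nbe = (Oa + Ob) / (tsoap - thhfr), which a few lines of code can evaluate. The function names below are hypothetical, and the per-message times and the negotiation overhead in the example are made-up values chosen for easy arithmetic; only the roughly 20 ms value for Oa comes from the slide.

```python
def cost_hhfr(n, t_hhfr, o_a, o_b):
    # Chhfr = n * thhfr + Oa + Ob: per-message cost plus one-time overheads.
    return n * t_hhfr + o_a + o_b


def cost_soap(n, t_soap):
    # Csoap = n * tsoap: no setup overhead, but a larger per-message cost.
    return n * t_soap


def breakeven_messages(t_soap, t_hhfr, o_a, o_b):
    # nbe = (Oa + Ob) / (tsoap - thhfr); only defined when HHFR is faster per message.
    if t_soap <= t_hhfr:
        raise ValueError("no breakeven: HHFR must be faster per message than SOAP")
    return (o_a + o_b) / (t_soap - t_hhfr)


# Illustrative values in milliseconds (assumed, except o_a of about 20 ms).
n_be = breakeven_messages(t_soap=10.0, t_hhfr=6.0, o_a=20.0, o_b=12.0)
```

For streams longer than nbe messages the HHFR channel is cheaper in total; for shorter streams the one-time overheads Oa and Ob dominate and plain SOAP wins.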

String Concatenation
• Chart annotation: nbe (breakeven point)
• Measure the total time to process a stream
• Independent variables
  • Number of messages per stream
  • Size of the message
