

Context. High-level data access and integration services are needed if applications that have data with complex structure and complex semantics are to benefit from the Grid.





Presentation Transcript

  1. Context
     • (1) High-level data access and integration services are needed if applications that have data with complex structure and complex semantics are to benefit from the Grid.
     • (2) Standards for data access are emerging, and middleware products that are reference implementations of such standards are already available.
     • Distributed query processing technology is one approach to delivering (1), given the availability of (2).
     EDBT'04: Service-Based Distributed Query Processing for the Grid (M N Alpdemir)

  2. OGSA-DQP goals
     • To benefit from homogeneous access to heterogeneous data sources [OGSA-DAI].
     • To benefit from Grid abstractions for on-demand, transparent allocation of resources required for a task [OGSA/OGSI/GT3].
     • To provide transparent, implicit parallelism and distribution [Polar*].
     • To orchestrate the composition of data retrieval and analysis services using query mechanisms.
     • To expose this orchestration capability as a Grid data service.

  3. OGSA-DQP innovations
     • OGSA-DQP dynamically allocates evaluators to do work on behalf of the mediator.
     • All available nodes can be allocated for query evaluation (not just the nodes with data sources).
     • A distributed query execution plan is resourced on the fly.
     • This allows runtime circumstances to be taken into account when the optimiser decides how to partition and schedule.
     • The query plan is the outcome of optimising a declarative service orchestration expressed as a query.
     • OGSA-DQP uses a parallel physical algebra; most mediator-based query processors do not.
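To make "parallel physical algebra" concrete, here is a minimal sketch of one common form of intra-operator parallelism: a hash-partitioned join whose partitions are evaluated side by side. It is an illustration only, not OGSA-DQP's code. The function names, the use of threads in place of evaluator nodes, and the sample rows are all assumptions introduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, n):
    """Split rows into n buckets so that equal key values share a bucket."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def local_join(left, right, lkey, rkey):
    """Plain hash join over one pair of buckets (the work of one evaluator)."""
    index = {}
    for r in right:
        index.setdefault(r[rkey], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[lkey], [])]


def parallel_join(left, right, lkey, rkey, evaluators=2):
    """Join two inputs by partitioning both on the join key.

    Threads stand in for evaluator nodes (an assumption of this sketch).
    """
    lparts = hash_partition(left, lkey, evaluators)
    rparts = hash_partition(right, rkey, evaluators)
    with ThreadPoolExecutor(max_workers=evaluators) as pool:
        results = pool.map(
            lambda pair: local_join(pair[0], pair[1], lkey, rkey),
            zip(lparts, rparts),
        )
        return [row for part in results for row in part]


# Hypothetical sample rows, for illustration only.
proteins = [{"ORF": "YBL064C", "length": 261}, {"ORF": "YAL001C", "length": 1160}]
links = [{"orf_ref": "YBL064C", "term": "GO:0000001"}]
joined = parallel_join(proteins, links, "ORF", "orf_ref")
```

Because matching keys always hash to the same bucket, each pair of buckets can be joined independently, which is what lets the work spread over however many evaluators are available.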

  4. OGSA-DQP provides two grid services
     • Exposes to clients Grid Distributed Query Services (GDQSs) that: interact with clients; find and retrieve service descriptions; parse, compile, partition and schedule the query execution over a union of distributed data sources.
     • Coordinates the GQESs into executing the plan. The query plan is an orchestration of GQESs.
     • Coordinates transparently Grid Query Evaluation Services (GQESs) that: implement the physical query algebra; implement the query execution model and semantics; run a partition of a query execution plan generated by a GDQS; interact with other GQESs/GDSs/WSs but not with clients.

  5. Brief tour: an illustration

     <?xml version="1.0" encoding="UTF-8"?>
     <databaseSchema xmlns="">
       <logicalSchema>
         <table name="goterm">
           <column fullName="goterm_id" length="32" name="id">
             <sqlTypeName>varchar</sqlTypeName>
             <sqlJavaTypeID>12</sqlJavaTypeID>
           </column>
           <column fullName="goterm_type" length="55" name="type">
             <sqlTypeName>varchar</sqlTypeName>
             <sqlJavaTypeID>12</sqlJavaTypeID>
           </column>
           <column fullName="goterm_name" length="255" name="name">
             <sqlTypeName>varchar</sqlTypeName>
             <sqlJavaTypeID>12</sqlJavaTypeID>
           </column>
           <primaryKey>
             <columnFullName>id</columnFullName>
           </primaryKey>
         </table>
       </logicalSchema>
       <physicalSchema>
         <hostMachine></hostMachine>
         <database join_buffer_size="131072" max_join_size="4294967295">
           <physTable avgRowLength="67" dataLength="766784" indexLength="126976" name="goterm" rowFormat="Dynamic" rows="11369"/>
         </database>
       </physicalSchema>
       <GDSFHandle>http://phoebus.cs.man.ac.uk:9090/ogsa/services/ogsadai/GridDataServiceFactory</GDSFHandle>
     </databaseSchema>

     <?xml version="1.0" encoding="UTF-8"?>
     <Partitions>
       <Partition>
         <evaluatorURI></evaluatorURI>
         <Operator operatorID="0" operatorType="TABLE_SCAN">
           <tupleType>
             <type>goterm</type> <name>goterm.OID</name>
             <type>string</type> <name>goterm.id</name>
             <type>string</type> <name>goterm.type</name>
             <type>string</type> <name>goterm.name</name>
           </tupleType>
           <TABLE_SCAN>
             <dataResourceName> goterms </dataResourceName>
             <GDSHandle></GDSHandle>
             <tableName> goterms </tableName>
             <predicateExpr>
               <predicate>
                 <comparativeOperator>LIKE</comparativeOperator>
                 <leftOperand name=" goterm.id" type="13"/>
                 <rightOperand name=" GO:0000%" type="16"/>
               </predicate>
             </predicateExpr>
           </TABLE_SCAN>
         </Operator>
         . . .
       </Partition>
       . . .
     </Partitions>

     <?xml version="1.0" encoding="UTF-8"?>
     <GDQDataSourceList xmlns="http://dqp.ogsadai.org.uk/schema/gdqs">
       <importedDataSource>
         <GDSFactoryHandle>http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
         <GDSFactoryHandle>http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
         <GDSFactoryHandle>http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
       </importedDataSource>
       <importedService>
         <wsdlURL>http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL</wsdlURL>
       </importedService>
     </GDQDataSourceList>
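As a rough illustration of how a client-side tool might read the GDQDataSourceList document from this slide, the sketch below lists the imported factory handles and WSDL URLs with Python's standard library. The XML is copied from the slide; the function name `list_imports` is hypothetical, and nothing here is part of the OGSA-DQP API.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the GDQDataSourceList element in the slide.
NS = {"gdqs": "http://dqp.ogsadai.org.uk/schema/gdqs"}

CONFIG = b"""<?xml version="1.0" encoding="UTF-8"?>
<GDQDataSourceList xmlns="http://dqp.ogsadai.org.uk/schema/gdqs">
  <importedDataSource>
    <GDSFactoryHandle>http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
  </importedDataSource>
  <importedService>
    <wsdlURL>http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL</wsdlURL>
  </importedService>
</GDQDataSourceList>"""


def list_imports(xml_bytes):
    """Return (factory handles, WSDL URLs) found in a GDQDataSourceList document."""
    root = ET.fromstring(xml_bytes)
    factories = [
        e.text.strip()
        for e in root.findall("gdqs:importedDataSource/gdqs:GDSFactoryHandle", NS)
    ]
    services = [
        e.text.strip()
        for e in root.findall("gdqs:importedService/gdqs:wsdlURL", NS)
    ]
    return factories, services


factories, services = list_imports(CONFIG)
for handle in factories:
    print("data source factory:", handle)
for url in services:
    print("analysis service WSDL:", url)
```

The element paths must carry the namespace prefix because the document declares a default namespace; an unqualified `findall("importedDataSource")` would match nothing.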

  6. The Demonstration: Configuring the DQP
     • Select DQP Factory
     • Select Data Sources
     • Select Web Services
     • Import Metadata

  7. The Demonstration: Example Query
     • Given two DBMSs and one analysis tool (e.g., a Web service): Goterm, a GO (Gene Ontology) database running as a remote MySQL DB; proteinSequence, yeast protein sequences; EntropyAnalyser, an information content analyser.
     • We can obtain the information content of protein sequences of a certain kind, specified by certain Gene Ontology terms:

         select p.ORF, go.id, calculateEntropy(p.sequence)
         from p in protein_sequences, go in goterms, pg in protein_goterms
         where go.id=pg.GOTermIdentifier and p.ORF=pg.ORF
           and p.ORF like "YBL06%" and go.id like "GO:0000%";

     • Then, OGSA-DQP acts as an enactor of a declarative orchestration of services on the Grid.
     [Figure: query plan with partition boundaries marked; part of the plan is parallelized on nodes 1 & 2]
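To show what the example query computes, here is a small in-memory sketch of the same three-way join and filter in plain Python. It assumes that `calculateEntropy` returns the Shannon entropy of the sequence, which the slide does not state. The sample rows and the shortened sequences are hypothetical and exist only so the sketch runs.

```python
import math
from collections import Counter


def calculate_entropy(sequence):
    """Shannon entropy in bits per symbol (an assumed stand-in for EntropyAnalyser)."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical rows standing in for the three collections named in the query.
protein_sequences = [
    {"ORF": "YBL064C", "sequence": "MKKA"},
    {"ORF": "YAL001C", "sequence": "MNIF"},
]
goterms = [{"id": "GO:0000001"}, {"id": "GO:0005739"}]
protein_goterms = [
    {"ORF": "YBL064C", "GOTermIdentifier": "GO:0000001"},
    {"ORF": "YBL064C", "GOTermIdentifier": "GO:0005739"},
    {"ORF": "YAL001C", "GOTermIdentifier": "GO:0000001"},
]


def run_query():
    """Evaluate the join as a naive nested loop over all three inputs."""
    return [
        (p["ORF"], go["id"], calculate_entropy(p["sequence"]))
        for p in protein_sequences
        for go in goterms
        for pg in protein_goterms
        if go["id"] == pg["GOTermIdentifier"]
        and p["ORF"] == pg["ORF"]
        and p["ORF"].startswith("YBL06")    # p.ORF like "YBL06%"
        and go["id"].startswith("GO:0000")  # go.id like "GO:0000%"
    ]


for row in run_query():
    print(row)
```

The nested loop is the simplest possible evaluation strategy. A distributed query processor would instead push the two `like` filters down to the data sources and call the analysis service only on rows that survive the joins.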

  8. Where to find out more: software
     • OGSA-DQP: Grid middleware to query distributed data sources. www.ogsadai.org.uk/dqp
     • OGSA-DAI: Grid middleware to interface with data(bases). www.ogsadai.org.uk/
     • Globus Toolkit: Open-source implementation of OGSA/OGSI. www.globustoolkit.org/
