
Toward Replication in Grids for Digital Libraries with Freshness and Correctness Guarantees *

Fuat Akal, Heiko Schuldt and Hans-Jörg Schek, <fuat.akal¦heiko.schuldt>@unibas.ch, schek@inf.ethz.ch. University of Basel, Computer Science Department, Bernoullistr. 16, CH-4056 Basel, Switzerland


Presentation Transcript


  1. Toward Replication in Grids for Digital Libraries with Freshness and Correctness Guarantees*
     Fuat Akal, Heiko Schuldt and Hans-Jörg Schek, <fuat.akal¦heiko.schuldt>@unibas.ch, schek@inf.ethz.ch
     University of Basel, Computer Science Department, Bernoullistr. 16, CH-4056 Basel, Switzerland
     3rd VLDB Workshop on Data Management in Grids, Wien, Austria, 23 September 2007
     * The work has been partly supported by the EU in the 6th Framework Programme within the project DILIGENT (DIgital Library Infrastructure on Grid ENabled Technology, contract No. IST-2003-004260).

  2. Example Scenario: Earth Observation
     • Satellite pictures of the Mediterranean Sea are continuously taken and stored as complex documents in a Digital Library (DL).
     • Each document carries storage properties, image features, and metadata as XML documents, for example:
       <DIMAP_DOCUMENT>
         <DATASET>MER_RR__2P</DATASET>
         <INSTRUMENT>MER</INSTRUMENT>
         …
         <LONMIN_INT>17000</LONMIN_INT>
         <LATMIN_INT>12000</LATMIN_INT>
         <LONMAX_INT>22000</LONMAX_INT>
         <LATMAX_INT>13500</LATMAX_INT>
         <COVER_REGIONS>World</COVER_REGIONS>
         <OVERLAP_REGIONS> World Europe Bigger_Europe Smaller_Europe Mediterranean Iberia North_Atlantic Africa North_Africa Middle_East Portugal </OVERLAP_REGIONS>
         ...
       </DIMAP_DOCUMENT>
     • Queries range from simple Boolean queries to image similarity queries.
     • A typical activity is to generate periodic reports.

  3. Watching the Environment Closely
     • Monitoring of the Mediterranean Sea: there are some busy oil terminals in the region, oil tankers are constantly at sea, and an oil spill into the sea is possible.
     • Earth observation data (satellite images, metadata, image features, ...) is held in a Data Grid.
     • Scientist 1 in Athens, Greece: "I am interested in Greek coasts as of last week."
     • Scientist 2 in Antalya, Turkey: "Fresh Turkish water, please."
     • Both are extremely concerned about the environment!

  4. Desired Replica Management in the Grid
     • Assumption: all data is collected at a single node, e.g. ESA in Italy.
     • Replication at a higher level, e.g. collections and subcollections.
     • Dynamic decision on when and where to create replicas, e.g. when sn1 becomes a hot spot.
     • Automatic selection of the best replica from the user's location.
     • Scientists may also 1) write back their reports and/or 2) create versions of documents or annotate them.
     • Freshness and correctness guarantees on accessed data are ensured, e.g. "I want up-to-date data".
     • A sophisticated replication mechanism is required!
     [Figure labels: Data Grid; storage node 0 holding the EntireMediterranean collection (satellite images, metadata, image features, ...); storage nodes sn1, sn2 and sn3 with GreekCoasts and TurkishCoasts replicas; a "Create Replica" step; Scientist 1 in Athens, Greece; Scientist 2 in Antalya, Turkey; Scientist 3 in Thessaloniki, Greece]

  5. Outline
     • Digital Library built atop a grid middleware
     • Rich variety, structure, and volume of data, e.g. traditional documents, complex multimedia objects
     • Simple Boolean queries as well as sophisticated multi-feature similarity queries
     • Consistent access to up-to-date data may be essential
     • Rest of the talk:
       • Replication in a DB Cluster
       • Transition from a DB cluster to the Grid
       • DILIGENT Replication Architecture
       • Conclusions and Outlook

  6. Replication in a DB Cluster (PDBREP)
     • Available replication solutions for grid environments do not meet all of the desired properties just mentioned, e.g. freshness and correctness.
     • In our previous work [VLDB2005], we devised a replication protocol for database clusters named PDBREP.
     • It already provides some properties of what we call desired replica management in the Grid, e.g. freshness and a higher replication granularity.
     • Our approach in this work is to start with this protocol and adapt it to the grid.
     PDBREP stands for PowerDB Replication, which was a project conducted at ETH Zurich, partially supported by Microsoft.

  7. Replication in a DB Cluster (PDBREP)
     [Figure: PDBREP architecture]
     • Clients send updates, U: update(a), and queries, Q: query(a, b, fr), to the coordination middleware.
     • fr is the freshness requirement, e.g. "I am fine with 2-minute-old data", "I want fresh data", etc.
     • Update node(s) execute the writes, e.g. w(a), and record them in a global log. The figure shows data items a, b, c, d at the update node(s).
     • Updates are continuously broadcast to the local update queues of the read-only nodes, which the figure shows holding (a,c), (b,c), (d) and (b,d).
     • Queries are executed in a distributed fashion on the read-only nodes, e.g. r(a) and r(b).
     • Continuous update propagation transactions run only when the node is idle; refresh transactions run on demand.
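The on-demand refresh on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PDBREP itself: the names `Update`, `ReadOnlyNode`, `refresh` and `query` are hypothetical, the freshness requirement fr is modelled as a maximum staleness in seconds, and the local update queue is assumed to already hold every committed update in global-log order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Update:
    commit_time: float  # commit timestamp at the update node (assumed to exist)
    key: str
    value: str


@dataclass
class ReadOnlyNode:
    data: dict = field(default_factory=dict)
    queue: deque = field(default_factory=deque)  # local update queue
    fresh_as_of: float = 0.0  # node reflects all updates committed up to here

    def enqueue(self, update: Update) -> None:
        """Receive a broadcast update; it is queued, not applied immediately."""
        self.queue.append(update)

    def refresh(self, required_time: float) -> int:
        """Refresh transaction: apply queued updates committed up to required_time."""
        applied = 0
        while self.queue and self.queue[0].commit_time <= required_time:
            update = self.queue.popleft()
            self.data[update.key] = update.value
            applied += 1
        self.fresh_as_of = max(self.fresh_as_of, required_time)
        return applied

    def query(self, keys, now: float, max_staleness: float) -> dict:
        """Answer a read after making the node at least as fresh as requested."""
        required_time = now - max_staleness
        if self.fresh_as_of < required_time:
            self.refresh(required_time)
        return {key: self.data.get(key) for key in keys}


node = ReadOnlyNode()
for u in (Update(10, "a", "v1"), Update(50, "a", "v2"), Update(100, "b", "w1")):
    node.enqueue(u)

relaxed = node.query(["a", "b"], now=120, max_staleness=60)  # tolerates 60 s old data
fresh = node.query(["a", "b"], now=120, max_staleness=0)     # wants fresh data
```

The relaxed query applies only the updates it needs and leaves the newest one queued, while the strict query drains the queue first. Idle-time propagation transactions would call the same `refresh` in the background.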

  8. Transition to the Grid
     • We still distinguish update and read-only nodes.
     • Potentially several update nodes.
     • We still assume that all updates are serialized into a global log.
     • Broadcasting updates is not feasible; replicas subscribe to changes instead.
     • Service-Oriented Architecture.
     • More nodes, which are heterogeneous.
     • Failures are more likely to happen.
     [Figure labels: Updates, Queries, Coordination Middleware, Global Log, Update Node(s), Read-only Nodes]
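The move from broadcast to subscription can be illustrated with a small Python sketch. It is an assumption-laden toy, not the DILIGENT implementation: `GlobalLog`, `subscribe` and `append` are hypothetical names, and each replica's update queue is modelled as a plain list.

```python
class GlobalLog:
    """Serializes all updates and forwards each one only to subscribed replicas."""

    def __init__(self):
        self.entries = []      # the serialized global log
        self.subscribers = {}  # data set name -> list of replica update queues

    def subscribe(self, dataset, queue):
        """A replica registers its update queue for changes to one data set."""
        self.subscribers.setdefault(dataset, []).append(queue)

    def append(self, dataset, change):
        """Record an update and deliver it to the subscribers of that data set."""
        seq = len(self.entries) + 1  # position in the global serialization order
        entry = (seq, dataset, change)
        self.entries.append(entry)
        for queue in self.subscribers.get(dataset, ()):
            queue.append(entry)
        return seq


log = GlobalLog()
greek_queue, turkish_queue = [], []
log.subscribe("GreekCoasts", greek_queue)
log.subscribe("TurkishCoasts", turkish_queue)

log.append("GreekCoasts", "img-1")
log.append("TurkishCoasts", "img-2")
log.append("GreekCoasts", "img-3")
```

Each queue receives only the entries of the data set it subscribed to, but the sequence numbers still reflect the single global order.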

  9. Replication Granularity
     • The unit of replication is called a DataSet (DS).
     • A DataSet can be a collection of documents, a subcollection, or as small as a single document.
     • Rule-based definition: information on a specific region, documents not older than 30 days, documents created between date1 and date2, etc.
     [Figure: the EntireMediterranean collection of satellite images and their metadata, with subcollections GreekCoasts (Subcollection 1) and TurkishCoasts (Subcollection 2) and DataSets DS1 and DS2]
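A rule-based DataSet definition of this kind might look as follows in Python. This is only a sketch: the `Document` fields and the rule helpers `covers_region`, `not_older_than` and `created_between` are hypothetical, chosen to mirror the three example rules on the slide.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Document:
    doc_id: str
    regions: frozenset  # regions the image overlaps (assumed metadata field)
    created: date


# Each rule is a predicate over (document, today).
def covers_region(region):
    return lambda doc, today: region in doc.regions


def not_older_than(days):
    return lambda doc, today: today - doc.created <= timedelta(days=days)


def created_between(start, end):
    return lambda doc, today: start <= doc.created <= end


def select_dataset(docs, rules, today):
    """A DataSet is the set of documents that satisfy every rule."""
    return [doc.doc_id for doc in docs if all(rule(doc, today) for rule in rules)]


today = date(2007, 9, 23)
docs = [
    Document("img-1", frozenset({"Mediterranean", "Greece"}), date(2007, 9, 20)),
    Document("img-2", frozenset({"Mediterranean", "Turkey"}), date(2007, 9, 1)),
    Document("img-3", frozenset({"Mediterranean", "Greece"}), date(2007, 7, 1)),
]

greek_recent = select_dataset(docs, [covers_region("Greece"), not_older_than(30)], today)
late_summer = select_dataset(
    docs, [created_between(date(2007, 8, 1), date(2007, 9, 30))], today
)
```

Because rules are plain predicates, the same mechanism covers a whole collection (no rules), a subcollection (a region rule), or a single document (a rule on `doc_id`).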

  10. DILIGENT Grid Replication Architecture
      [Figure: DILIGENT grid replication architecture]
      • Replica Catalog: DS1 : 1; DS2 : 2, 3; DS3 : 5; DS4 : 4
      • Freshness Repository: DS1 : <1, 0.7>; DS2 : <2, 0.6>, <3, 0.7>; DS3 : <5, 0.6>; DS4 : <4, 0.6>
      • Load Repository: SN1 : 50%; SN2 : 25%; SN3 : 60%; SN4 : 30%; SN5 : 50%
      • Storage nodes sn 1 to sn 5 hold the DataSets DS1 to DS4; each storage node has an update queue with entries of the form TSx, Wx, DSy.
      • Other labelled components: RSS, RMS, FTS, Access History; updates reach the storage nodes by subscription and continuous propagation.
      • Numbered steps: (1) the client issues Read(DS2(x), DS4(y), 0.6); (2.1) Locate best replicas; (2.2); (2.3); (3) Read Data; (4) Log
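The "locate best replicas" step can be sketched with the numbers shown on this slide. Two things here are assumptions rather than facts from the talk: each Freshness Repository entry is read as a <storage node, freshness> pair, and "best" is taken to mean the least-loaded node whose freshness meets the request. The function names are hypothetical.

```python
# Values copied from the slide; the <node, freshness> reading is an assumption.
FRESHNESS = {
    "DS1": {1: 0.7},
    "DS2": {2: 0.6, 3: 0.7},
    "DS3": {5: 0.6},
    "DS4": {4: 0.6},
}
LOAD = {1: 0.50, 2: 0.25, 3: 0.60, 4: 0.30, 5: 0.50}


def best_replica(dataset, required_freshness, freshness=FRESHNESS, load=LOAD):
    """Return the least-loaded node that is fresh enough, or None if none qualifies."""
    candidates = [
        node
        for node, value in freshness.get(dataset, {}).items()
        if value >= required_freshness
    ]
    if not candidates:
        return None  # a real system would have to refresh a replica first
    return min(candidates, key=lambda node: load[node])


def plan_read(datasets, required_freshness):
    """Pick one storage node per requested DataSet."""
    return {ds: best_replica(ds, required_freshness) for ds in datasets}


plan = plan_read(["DS2", "DS4"], 0.6)         # mirrors Read(DS2(x), DS4(y), 0.6)
strict_plan = plan_read(["DS2", "DS4"], 0.7)  # a stricter freshness requirement
```

Under these assumptions the slide's request is served from storage nodes 2 and 4. Raising the requirement to 0.7 moves DS2 to the more heavily loaded node 3 and leaves DS4 without a qualifying replica.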

  11. Conclusions & Outlook
      • We presented the first steps of our ongoing work, whose ultimate goal is a fully integrated and self-managing replication subsystem for the Grid.
      • We want to adapt an existing database replication mechanism, PDBREP, from database clusters to data grids.
      • This looks feasible:
        • Infrastructure-related assumptions, such as broadcasting changes to replicas, can easily be replaced by a subscription mechanism.
        • The additional components presented in the envisioned architecture to facilitate query scheduling can be included in PDBREP without requiring major changes.
      • Implementation of the DILIGENT replication on top of gLite is still ongoing.

  12. Thank you! Questions?

  13. References
      • DILIGENT: A DIgital Library Infrastructure on Grid ENabled Technology. http://www.diligentproject.org/. IST-2003-004260.
      • F. Akal, C. Türker, H.-J. Schek, Y. Breitbart, T. Grabs, and L. Veen. Fine-Grained Replication and Scheduling with Freshness and Correctness Guarantees. In VLDB, pages 565–576, 2005.
