OGSA-DAI: data access and integration
NERC GridGIS workshop, eSI, 1 February 2006
Overview
• The Data Deluge
  • challenges of increasing data availability
  • benefits of bringing data together
• OGSA-DAI
  • overview
  • use as a data integration base layer
NERC GridGIS workshop - 1 February 2006
Data Services: challenges to management
• Scale
  • Many sites, large collections, many uses
• Longevity
  • Research requirements outlive technical decisions
• Diversity
  • No “one size fits all” solutions will work
  • Primary Data, Data Products, Meta Data, Administrative data, …
• Many Data Resources
  • Independently owned & managed
  • No common goals
  • No common design
  • Work hard for agreements on foundation types and ontologies
  • Autonomous decisions change data, structure, policy, …
  • Geographically distributed
• and I haven’t even mentioned security yet!
What is a data service?
• An interface to a stored collection of data
  • e.g. Google and Amazon
  • web services
• But the data could be:
  • replicated
  • shared
  • federated
  • virtual
  • incomplete
• Don’t care about the underlying representation
  • do care about the information it represents
• Adding a service layer to existing data sources can improve composability
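To make the "don't care about the underlying representation" point concrete, here is a minimal sketch in Python. Everything in it is hypothetical and chosen for illustration only: the `DataService` interface, the two implementations, the table and column names, and the sample values. It is not the OGSA-DAI API.

```python
import csv
import io
import sqlite3
from typing import Iterable, Protocol


class DataService(Protocol):
    """Hypothetical interface: callers see records, not storage details."""

    def records(self) -> Iterable[dict]: ...


class CsvService:
    """Serves records held as CSV text."""

    def __init__(self, text: str) -> None:
        self._text = text

    def records(self) -> Iterable[dict]:
        return list(csv.DictReader(io.StringIO(self._text)))


class SqlService:
    """Serves the same kind of records from a relational table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def records(self) -> Iterable[dict]:
        cur = self._conn.execute("SELECT site, rainfall_mm FROM obs")
        cols = [c[0] for c in cur.description]
        return [dict(zip(cols, row)) for row in cur]


def total_rainfall(services: Iterable[DataService]) -> float:
    """Composes any number of services without knowing how each stores data."""
    return sum(float(r["rainfall_mm"]) for s in services for r in s.records())


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (site TEXT, rainfall_mm REAL)")
conn.execute("INSERT INTO obs VALUES ('Leith', 12.5)")
csv_text = "site,rainfall_mm\nCramond,7.5\n"

total = total_rainfall([CsvService(csv_text), SqlService(conn)])
```

The caller, `total_rainfall`, depends only on the shape of the records, which is the sense in which a service layer can improve composability.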
Use Cases for Data Services
• Data Filtering:
  • Single source producing large amounts of data distributed to many sites downstream
• Data Discovery:
  • many sources, many query entry points in a linked system
• Data Translation:
  • source to sink, conversion of data model / structure
• Data Federation:
  • many sources, linked to provide view as a single source
• Data Replication
  • full or partial copies to improve throughput
• Data Integration (model aggregation)
  • e.g. integration of time variant data, streams, files
• Data Integration (knowledge expansion)
  • forming links between databases to increase knowledge
OGSA-DAI In One Slide
• An extensible framework for data access and integration.
• Expose heterogeneous data resources to a grid through web services.
• Interact with data resources:
  • Queries and updates.
  • Data transformation / compression
  • Data delivery.
• Customise for your project using
  • Additional Activities
  • Client Toolkit APIs
  • Data Resource handlers
• A base for higher-level services
  • federation, mining, visualisation,…
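The query, transformation / compression, and delivery steps listed above can be pictured as a chain of small steps, each consuming the previous step's output. The sketch below is a plain-Python analogy under that assumption. The function names (`sql_query`, `to_csv`, `gzip_bytes`, `run_pipeline`) and the `station` table are hypothetical and are not OGSA-DAI activity names or APIs.

```python
import gzip
import sqlite3
from functools import reduce
from typing import Any, Callable, Sequence

# Hypothetical: an "activity" here is any function from input data to output data.
Activity = Callable[[Any], Any]


def sql_query(conn: sqlite3.Connection, sql: str) -> Activity:
    """Query step: ignores its input and emits result rows."""
    def run(_: Any) -> list:
        return conn.execute(sql).fetchall()
    return run


def to_csv(rows: list) -> str:
    """Transformation step: rows to CSV-like text."""
    return "\n".join(",".join(str(v) for v in row) for row in rows)


def gzip_bytes(text: str) -> bytes:
    """Compression step: text to gzip-compressed bytes, ready for delivery."""
    return gzip.compress(text.encode("utf-8"))


def run_pipeline(activities: Sequence[Activity], seed: Any = None) -> Any:
    """Feed each step's output into the next step."""
    return reduce(lambda data, act: act(data), activities, seed)


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO station VALUES (?, ?)", [(1, "Leith"), (2, "Cramond")])

payload = run_pipeline([
    sql_query(conn, "SELECT id, name FROM station ORDER BY id"),
    to_csv,
    gzip_bytes,
])
```

Adding a project-specific step in this analogy means writing one more function and placing it in the list, which is roughly the role the slide gives to "Additional Activities".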
The OGSA-DAI Framework
[Figure: layered architecture diagram. Labels, top to bottom: Application; Client Toolkit; OGSA-DAI service; Engine; Activities (XPath, readFile, SQLQuery, XSLT, GZip, GridFTP); Data Resources (JDBC, XMLDB, File); Databases (MySQL, SQL Server, DB2, XIndice, SWISS PROT).]
Intermediary
• Simple intermediary
  • potential to accelerate development, logging, or filtering
• Persistent intermediary
  • e.g. to allow efficient local indexing
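As a rough illustration of a simple intermediary, the sketch below wraps a stand-in service so that every request is logged and every response is filtered before it reaches the client. The `backend` function, the record fields, and the `intermediary` helper are all hypothetical and serve only to show the pattern.

```python
from typing import Callable, List

# Hypothetical: a service maps a query string to a list of records.
Service = Callable[[str], List[dict]]


def backend(query: str) -> List[dict]:
    """Stand-in for a real data service; returns records for one region."""
    data = [
        {"site": "Leith", "region": "Lothian", "public": True},
        {"site": "Cramond", "region": "Lothian", "public": False},
    ]
    return [r for r in data if r["region"] == query]


def intermediary(service: Service, log: List[str],
                 allow: Callable[[dict], bool]) -> Service:
    """Wrap a service: record each query, then filter what is returned."""
    def handle(query: str) -> List[dict]:
        log.append(query)
        return [r for r in service(query) if allow(r)]
    return handle


log: List[str] = []
public_only = intermediary(backend, log, lambda r: r["public"])
result = public_only("Lothian")
```

A persistent intermediary would additionally keep state between requests, for example a local index over records it has already seen.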
Redirector, Coordinator, Network
• Allowing composition and decentralisation
Extensibility Example
[Figure: diagram labels: OGSA-DAI service, GDS, Engine, SQLQuery (twice), Multiple SQL, SQL and JDBC (several each), MySQL.]
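One plausible reading of this slide is an added activity that sends the same SQL statement to several relational resources and merges the results. The sketch below illustrates that idea only: in-memory SQLite databases stand in for the JDBC resources, and `make_resource`, `multiple_sql_query`, and the `sample` table are hypothetical names, not part of OGSA-DAI.

```python
import sqlite3
from typing import Iterable, List, Sequence, Tuple


def make_resource(rows: Iterable[Tuple[str, float]]) -> sqlite3.Connection:
    """Create one stand-in data resource with a shared, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (site TEXT, value REAL)")
    conn.executemany("INSERT INTO sample VALUES (?, ?)", rows)
    return conn


def multiple_sql_query(resources: Sequence[sqlite3.Connection],
                       sql: str, params: tuple = ()) -> List[tuple]:
    """Run the same query on every resource and concatenate the rows."""
    results: List[tuple] = []
    for conn in resources:
        results.extend(conn.execute(sql, params).fetchall())
    return results


# Illustrative data only.
resources = [
    make_resource([("Leith", 1.5), ("Cramond", 0.2)]),
    make_resource([("Dunbar", 2.0)]),
]
rows = multiple_sql_query(
    resources,
    "SELECT site, value FROM sample WHERE value > ? ORDER BY site",
    (1.0,),
)
```

Note that `ORDER BY` applies within each resource; the merged list keeps resource order, so a global ordering would need a final sort on the client side.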
Map Retrieval: Current
[Figure: diagram labels: browser, Internet, EDINA, OGC Service, GIS, Oracle.]
Map Retrieval: Grid Prototype
• Basic client to demonstrate proof of concept
[Figure: diagram labels: Client, OGSA-DAI 1, SO-OGC, OGC, GIS, Oracle, EDINA.]
Map Retrieval: Security
• Exploit NGS infrastructure to provide secure access layer
[Figure: diagram labels: Portlet, NGS Authentication, Allowed users dn, ODS 1, SO-OGC, OGC, GIS, Oracle, EDINA.]
Map Retrieval: Integration
• Exploit OGSA-DAI extensibility to add e.g. overlay
[Figure: diagram labels: Portlet, NGS Authentication, ODS 1, ODS 2, ODS 3, JDBC, SQL/XML, SO-OGC (twice), OGC, GIS, Oracle, Oracle Census, Application data.]
OGSA-DAI / EDINA prototyping work
• Stage 1: Using existing OGSA-DAI technology
• Stage 2: Extending OGSA-DAI
[Figure: diagram labels: GIS Client, OGSA-DAI service, Input Parameters, URL, DeliverFrom URL, GIS Activities, Image/XML File, HTTP Request, HTTP Response, HTTP Data Resource, WMS Server.]
Distributed Query Processing
• Higher level services building on OGSA-DAI
  • specialised metadata extraction
• Execute queries in parallel over multiple data resources
• Queries mapped to algebraic expressions for evaluation
• Parallelism represented by partitioning queries
  • Use exchange operators
• Equality based joins in current release
  • supported types: long, integer, string, double and float
[Figure: example query plan with operators table_scan (protein), table_scan (proteinTerm, termID=S92), reduce, exchange, hash_join (proteinId) and op_call (Blast), spread over partitions 1, 2 and 3,4.]
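The partitioning and equality-join ideas above can be sketched in a few lines of Python. This is a single-process toy, not the DQP implementation: `exchange` here only routes rows to partitions by hashing the join key, threads stand in for separate evaluation nodes, and the sample rows and `sequence` field are invented for illustration. The table and column names follow the example plan (protein, proteinTerm, proteinId, termID).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Row = Dict[str, str]


def exchange(rows: List[Row], key: str, n: int) -> List[List[Row]]:
    """Route each row to one of n partitions by hashing its join key."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Equality join: build a hash index on left, probe it with right."""
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]


def parallel_join(left: List[Row], right: List[Row], key: str, n: int = 2) -> List[Row]:
    """Partition both inputs the same way, then join each partition pair."""
    lparts, rparts = exchange(left, key, n), exchange(right, key, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        chunks = list(pool.map(hash_join, lparts, rparts, [key] * n))
    return [row for chunk in chunks for row in chunk]


# Illustrative data only.
protein = [
    {"proteinId": "P1", "sequence": "MKV"},
    {"proteinId": "P2", "sequence": "GAT"},
]
protein_term = [
    {"proteinId": "P1", "termID": "S92"},
    {"proteinId": "P2", "termID": "S10"},
    {"proteinId": "P3", "termID": "S92"},
]

# table_scan with the predicate termID = S92, then the partitioned join.
scanned = [r for r in protein_term if r["termID"] == "S92"]
joined = parallel_join(protein, scanned, "proteinId")
```

Because both inputs are partitioned with the same hash on the join key, matching rows always land in the same partition, so each partition pair can be joined independently.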
DQP architecture
Contributing to OGSA-DAI
• Additional functionality:
  • Provide activities which implement specific functionality
  • Provide extra client functionality
  • Provide different security mechanisms
  • Provide higher level components and applications
• Different levels of contributions
  • Based on OGSA-DAI?
  • Works with OGSA-DAI?
  • Part of OGSA-DAI?
In the near future
• A new version of the OGSA-DAI Engine
  • should look mostly the same externally
  • better support for concurrency, sessions and monitoring
• Implementing new versions of specifications
  • DAIS Specifications
• Key things that we will be addressing:
  • Performance
  • A Security Model which can be applied across platforms
  • Full Transactions framework, distributed transactions
  • More data integration facilities
  • Better abstraction over DBMS variation
  • Application centric queries
  • collaborating with other projects
• Research projects looking at:
  • schema mapping
  • extended data resources
Associated Meetings and Workshops
• DIALOGUE Workshops (http://www.datagrids.org)
  • Data Integration Applications: Linking Organisations to Gain Understanding and Experience
  • Bringing together Data Integration middleware and application providers with users
  • Next one at NeSC: 9-10th February 2006
  • http://www.nesc.ac.uk/esi/events/636/
• Next Generation Distributed Data Management (HPDC15, Paris)
  • http://www.isi.edu/~annc/distributedDataWorkshop.html
• Data Management on Grids (VLDB’06, Seoul)
Conclusions
• The benefits of trying to integrate data are hindered by challenges such as heterogeneity, scale and distribution
• A common data service layer should make data integration easier
• OGSA-DAI provides an extensible, data service based framework which makes it easier to implement data integration
• GIS data is amenable to integration using data services
Further information
• The OGSA-DAI Project Site:
  • http://www.ogsadai.org.uk
• The DAIS-WG site:
  • http://forge.gridforum.org/projects/dais-wg/
• OGSA-DAI Users Mailing list
  • users@ogsadai.org.uk
  • General discussion on grid DAI matters
• Formal support for OGSA-DAI releases
  • http://bugs.ogsadai.org.uk/
• OGSA-DAI training courses