OGSA-DAI Requirements Gathering Exercise



Presentation Transcript


  1. OGSA-DAI Requirements Gathering Exercise 2nd DIALOGUE workshop eSI, 9-10 February 2006

  2. OGSA-DAI Requirements Gathering
     • Aims
       • learn more about the data access and integration challenges that other projects are facing
       • use this information to inform the future development of the OGSA-DAI software
     • Timescale
       • Nov 2005 – Jan 2006
     • Gatherers
       • Ally Hume
       • Amy Krause
       • Tom Sugden

  3. Projects
     • AstroGrid (www.astrogrid.org): distributed queries over large astronomy databases.
     • AutoMed and ISpider (www.doc.ic.ac.uk/automed/ and www.ispider.man.ac.uk): model-based data integration and a Grid-based informatics platform for proteomics.
     • CancerGrid (www.cancergrid.org): storage and analysis of distributed clinical trial and laboratory data.
     • ESSC (www.nerc-essc.ac.uk): environmental and atmospheric simulations.
     • GOLD (www.goldproject.ac.uk): infrastructure for virtual organisations.
     • NTRAC (www.ntrac.org.uk): similar to CancerGrid.

  4. Structure of Meeting Reports
     • Data: the kind of data that the project is concerned with, including the structure, quantity and types of data resource.
     • Queries: the types of queries that are performed against this data, including the query languages used and the typical size of result sets.
     • The problem: the main problems that the project is currently facing with regard to data access and integration.
     • What Can OGSA-DAI Provide?: the functionality that the project would like OGSA-DAI to provide.
     • Checklist: a summary of the importance of various aspects of data access and integration for the project.

  5. AstroGrid
     • Works with a number of distributed databases, each of which contains astronomical data captured from different modalities.
     • Almost all the tables in these databases contain a spatial coordinate for each feature and some numerical attributes associated with that feature.
     • The project wants to run distributed queries using its own algorithmic, domain-specific joins.
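As an illustration of what a domain-specific join over spatial coordinates might look like, the sketch below pairs features from two catalogues when their sky positions fall within a small angular radius. It is not AstroGrid's algorithm: the column names (`id`, `ra`, `dec`), the sample rows, and the 2-arcsecond radius are all assumptions made for the example, and a nested loop stands in for whatever indexed or distributed strategy a real system would use.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for very small separations.
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def cross_match(left, right, radius_deg):
    """Nested-loop spatial join: pair rows whose positions lie within radius_deg."""
    matches = []
    for l_row in left:
        for r_row in right:
            sep = angular_separation_deg(l_row["ra"], l_row["dec"],
                                         r_row["ra"], r_row["dec"])
            if sep <= radius_deg:
                matches.append((l_row["id"], r_row["id"]))
    return matches

# Hypothetical rows from two catalogues captured in different modalities.
optical = [{"id": "opt-1", "ra": 10.0000, "dec": 20.0000, "mag": 17.2},
           {"id": "opt-2", "ra": 150.0, "dec": -5.0, "mag": 19.1}]
radio = [{"id": "rad-1", "ra": 10.0001, "dec": 20.0001, "flux": 3.4},
         {"id": "rad-2", "ra": 200.0, "dec": 45.0, "flux": 0.8}]

# Match within 2 arcseconds (an assumed tolerance).
pairs = cross_match(optical, radio, radius_deg=2 / 3600)
print(pairs)
```

The join predicate here is a computed distance rather than an equality on a key, which is why such joins are hard to express in plain SQL and hard to push down to each remote database.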

  6. AutoMed and ISpider
     • Middleware to transform schemas from different data sources (relational databases, XML documents, etc.) and evaluate distributed queries expressed in their own IQL language.
     • By creating a path of schema transformations, it is possible to federate multiple data sources so that they appear as a single data source to the user.
     • Open problems:
       • how to optimise distributed queries using metadata such as data size, occurrence of indexes, performance rates, etc.
       • how to fit AutoMed into a grid architecture

  7. CancerGrid
     • By analysing laboratory data and correlating it with hospital and trials data, it is hoped that new subsets of patients can be discovered who respond best to particular treatments.
     • Security is a major concern because many of the owners of data are aware of the value of their data and consequently are concerned about who has access to it.
     • A good means of transforming trial forms (XML documents) into a format suitable for automatic insertion into relational tables is required.
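The last requirement on the CancerGrid slide, turning XML trial forms into rows for relational tables, can be sketched with the Python standard library. The form layout (`trialForm`, `patient`, `field` elements), the table name `trial_entry`, and its columns are hypothetical, chosen only to make the example concrete; real trial forms would need a mapping driven by their actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical trial form; element and attribute names are assumptions.
FORM_XML = """
<trialForm trial="T-001">
  <patient id="P-17">
    <field name="age">54</field>
    <field name="tumour_grade">2</field>
  </patient>
  <patient id="P-18">
    <field name="age">61</field>
    <field name="tumour_grade">3</field>
  </patient>
</trialForm>
"""

def form_to_rows(xml_text):
    """Flatten one trial form into (trial, patient_id, age, tumour_grade) tuples."""
    root = ET.fromstring(xml_text)
    trial = root.get("trial")
    rows = []
    for patient in root.findall("patient"):
        # Collect the named fields of this patient into a dictionary.
        values = {f.get("name"): f.text for f in patient.findall("field")}
        rows.append((trial, patient.get("id"),
                     int(values["age"]), int(values["tumour_grade"])))
    return rows

# An in-memory SQLite database stands in for the relational back-end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trial_entry "
             "(trial TEXT, patient_id TEXT, age INTEGER, tumour_grade INTEGER)")
rows = form_to_rows(FORM_XML)
conn.executemany("INSERT INTO trial_entry VALUES (?, ?, ?, ?)", rows)
conn.commit()

stored = conn.execute(
    "SELECT patient_id, age FROM trial_entry ORDER BY patient_id").fetchall()
print(stored)
```

Parameterised inserts (`?` placeholders) keep form content from being interpreted as SQL, which matters when the forms come from many sites.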

  8. ESSC
     • Deals with large data sets of between 2 and 3 terabytes, stored mostly on a single machine. The user requests portions of data, often assembled from various files.
     • Uniform web service interfaces are provided for accessing data sets using the standard APIs associated with the binary data file formats that are used (netCDF, GRIB, HDF, etc.).
     • The queries used by ESSC are currently synchronous, which causes request timeout problems when the resulting data sets are large.
     • The project is sceptical of current WS-Notification implementations that require open ports on client machines.
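One common way around both problems on the ESSC slide (synchronous timeouts, and notification schemes that need inbound ports on the client) is a submit-then-poll pattern: the client receives a job identifier immediately and asks for status over ordinary outbound requests. The sketch below simulates this in a single process. `JobService`, `fetch`, the query string, and the timings are all hypothetical; this is not an OGSA-DAI or ESSC interface.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

class JobService:
    """Stand-in for a data service that runs long requests in the background."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._jobs = {}

    def submit(self, query):
        """Start the request and return a job id straight away."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._pool.submit(self._run, query)
        return job_id

    def status(self, job_id):
        return "DONE" if self._jobs[job_id].done() else "RUNNING"

    def result(self, job_id):
        return self._jobs[job_id].result()

    def shutdown(self):
        self._pool.shutdown(wait=True)

    @staticmethod
    def _run(query):
        time.sleep(0.2)  # simulate assembling a large subset from several files
        return f"subset for {query}"

def fetch(service, query, poll_interval=0.05, max_wait=5.0):
    """Client side: submit, then poll until the job finishes or max_wait passes."""
    job_id = service.submit(query)
    deadline = time.monotonic() + max_wait
    while service.status(job_id) != "DONE":
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} did not finish in {max_wait}s")
        time.sleep(poll_interval)
    return service.result(job_id)

service = JobService()
data = fetch(service, "sea_surface_temp[2005]")
service.shutdown()
print(data)
```

Each individual call returns quickly, so no single request is held open long enough to time out; the cost is extra round trips while the client polls.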

  9. GOLD
     • Aims to develop an infrastructure to facilitate collaboration within virtual organisations (VOs).
     • Data storage services will be used for capturing interactions amongst parties of a VO in order to facilitate auditing and VO playback.
     • Data analysis services will be used for performing particular types of analysis of data existing mostly in relational database back-ends.
     • The primary concern is dynamically managing security policies and service access rights for different types of user.

  10. NTRAC
     • Aims to build platforms that bring different systems together.
     • Many of the data resources that they are accessing are stored in private networks (e.g. NHS patient information) with no open gateway to the public.
     • Researchers want to mine the data to find people to recruit into studies.

  11. Prioritised Requirements

  12. Notes on requirements
     • Requirements were prioritised based on a judgement of their importance to the various projects that were investigated.
     • Whether or not they are within the scope of the OGSA-DAI project, or have already been satisfied by OGSA-DAI, is not considered here.
     • Frequent mention of the non-functional requirement: ease of use.
     • Some concern that installation and configuration remain too complex when compared with typical WAR-based web service deployment.
     • We hope to publish the full document in the near future; let me know if you want a copy.

  13. Conclusions
     • Efficient transportation of large quantities of data between heterogeneous data resources is a crucial requirement for several projects from distinct domains.
       • This is also an implicit requirement for projects requiring data federation and distributed query processing.
       • If we could solve this problem, it would be of great benefit to these projects, and also to higher-level middleware projects such as OGSA-DQP.
     • Security remains a major concern because of the commercial and sensitive nature of much of the data.
       • Projects want a generalised, role-based mechanism for exposing different views of data resources to different users, and for managing these views dynamically.
       • Is this outside the scope of data integration middleware?
     • While we were previously aware of most of the requirements described in this document, associating them with actual projects can help with prioritisation.

  14. Further information
     • The OGSA-DAI Project Site:
       • http://www.ogsadai.org.uk
     • The DAIS-WG site:
       • http://forge.gridforum.org/projects/dais-wg/
     • OGSA-DAI Users Mailing list
       • users@ogsadai.org.uk
       • General discussion on grid DAI matters
     • Formal support for OGSA-DAI releases
       • http://bugs.ogsadai.org.uk/
     • OGSA-DAI training courses
