Overview

Sessions 43 & 44: Accessing data using a common interface: OGSA-DAI as an example. Elias Theocharopoulos and Tilaye Alemu, ISSGC '09, Sophia Antipolis, Tuesday, 14th July 2009. Overview: • The problem: sharing data in a grid • What is OGSA-DAI? • Data-centric workflows • Key OGSA-DAI terms

Presentation Transcript


  1. Sessions 43 & 44: Accessing data using a common interface: OGSA-DAI as an example. Elias Theocharopoulos and Tilaye Alemu, ISSGC '09, Sophia Antipolis, Tuesday, 14th July 2009

  2. Overview • The problem: Sharing data in a grid • What is OGSA-DAI? • Data-centric workflows • Key OGSA-DAI terms • The OGSA-DAI client toolkit • Use cases and extensibility points • Pros and cons

  3. The problem: Sharing and accessing data in a grid

  4. Distributed data resources

  5. How about a central server? (Diagram: the client sends an FR query to a central server and receives FR data.)

  6. Central server pros and cons • Access to up-to-date data • Single point of access • Data in common format • Database can handle joins • Initial overhead in terms of time, effort and cost • Keeping data up to date • Loss of control by data providers (assuming they even let go) • Security and trust

  7. How about providing direct access? (Diagram: the client sends separate queries to the IA, UK and ES data resources, receives IA, UK and ES data, and must translate and join the results itself.)

  8. Direct access pros and cons • Access to up-to-date data • Fast access • Data providers retain control • Fat clients • Heterogeneity and inconsistency of data, databases, connections and security • Security overheads for data providers: managing firewalls and usernames/passwords for multiple clients • Hard to use in grid/web service workflows

  9. How about providing a ZIP on the web? (Diagram: the client downloads a ZIP of UK, ES and IA data from each provider with an HTTP GET, then must unzip, translate and join the data.)

  10. ZIP on the web pros and cons • Fast access • Data providers retain control • Very large downloads even if client only needs subset • Providers have to select and ZIP their data • Client has to install data into a local database • Static snapshot

  11. Sharing distributed heterogeneous resources with OGSA-DAI (Diagram: the client sends an FR query to OGSA-DAI and receives FR data; OGSA-DAI sends UK, ES, IA and FR queries to the corresponding data resources, then translates and joins the data.)

  12. Motivation • Grid is about sharing resources • Need to share structured data resources

  13. What is OGSA-DAI? • Open Grid Services Architecture Data Access and Integration • A framework that executes workflows • Workflows are data-centric • Workflow components are designed for data access, integration, transformation and delivery • Can access heterogeneous data resources • Web service interface • Intended as a toolkit for building higher-level, application-specific data services

  14. OGSA-DAI’s vision • Sharing data resources to enable collaboration • Data access • Structured data in distributed heterogeneous data resources • Data integration • e.g. expose multiple databases to users as a single virtual database • Data transformation • e.g. expose data in schema X to users as data in schema Y • Data delivery • To where it’s needed by the most appropriate means • e.g. web service, e-mail, HTTP, FTP, GridFTP

  15. OGSA-DAI and data-centric workflows

  16. OGSA-DAI workflow • Executes workflows • Workflows contain activities • Well-defined functional units • Data goes in, something is done, data comes out • Equivalent to programming language methods • Workflows are submitted by clients • To an OGSA-DAI web service
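As a rough illustration of "data goes in, something is done, data comes out", the sketch below models an activity as a named function over a list of data blocks and chains activities so that each output feeds the next input. It is a minimal, hypothetical model, not the OGSA-DAI API: `Activity`, `runChain` and `demo` are invented names, and the "query" stage returns fixed sample rows rather than contacting a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch, not the OGSA-DAI API: an "activity" is modelled as a
// named function from a list of input blocks to a list of output blocks.
public class ActivityChainSketch {

    /** A named unit of functionality: data goes in, something is done, data comes out. */
    record Activity(String name, Function<List<String>, List<String>> body) {
        List<String> process(List<String> input) {
            return body.apply(input);
        }
    }

    /** Feeds the output of each activity into the input of the next one. */
    static List<String> runChain(List<Activity> activities, List<String> input) {
        List<String> data = input;
        for (Activity activity : activities) {
            data = activity.process(data);
        }
        return data;
    }

    static List<String> demo() {
        // Stand-in "query" activity: ignores its input and returns fixed sample rows.
        Activity query = new Activity("query", in -> List.of("paris,france", "madrid,spain"));
        // "Transform" activity: upper-cases every row it receives.
        Activity transform = new Activity("transform", in -> {
            List<String> out = new ArrayList<>();
            for (String row : in) {
                out.add(row.toUpperCase(Locale.ROOT));
            }
            return out;
        });
        return runChain(List.of(query, transform), List.of());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [PARIS,FRANCE, MADRID,SPAIN]
    }
}
```

In OGSA-DAI itself the activities run concurrently and stream blocks to one another; this sequential chain only shows the input/output contract.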

  17. An OGSA-DAI workflow: a simple analogy (Diagram: the client's French query SELECT Pays, Capital FROM Pays is converted from French to English (SELECT Country, Capital FROM Countries) and from French to Spanish (SELECT País, Capital FROM Países); each SQL query is run, the resulting data is converted from English to French and from Spanish to French, and the data is joined.)
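The analogy can be sketched in a few lines of Java, assuming each source returns rows as column-name/value maps. The class and method names and the sample rows are hypothetical; only column names are translated here (values such as "España" are left as-is), and "join" is reduced to a simple union of the three result sets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the analogy: each source names its columns in its own
// language, so rows are renamed into the client's French schema before combining.
public class TranslateAndCombineSketch {

    /** Returns copies of the rows with column names translated via the mapping. */
    static List<Map<String, String>> translate(List<Map<String, String>> rows, Map<String, String> mapping) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            Map<String, String> translated = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : row.entrySet()) {
                // Columns without a mapping (e.g. "Capital") keep their name.
                translated.put(mapping.getOrDefault(e.getKey(), e.getKey()), e.getValue());
            }
            out.add(translated);
        }
        return out;
    }

    static List<Map<String, String>> demo() {
        // Made-up sample rows standing in for each source's query results.
        List<Map<String, String>> french = List.of(Map.of("Pays", "France", "Capital", "Paris"));
        List<Map<String, String>> english = List.of(Map.of("Country", "United Kingdom", "Capital", "London"));
        List<Map<String, String>> spanish = List.of(Map.of("Pa\u00eds", "Espa\u00f1a", "Capital", "Madrid"));

        // Combine (here a simple union) after translating into the French schema.
        List<Map<String, String>> combined = new ArrayList<>(french);
        combined.addAll(translate(english, Map.of("Country", "Pays")));
        combined.addAll(translate(spanish, Map.of("Pa\u00eds", "Pays")));
        return combined;
    }

    public static void main(String[] args) {
        for (Map<String, String> row : demo()) {
            System.out.println(row.get("Pays") + " -> " + row.get("Capital"));
        }
    }
}
```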

  18. How it appears to the client (Diagram: the client submits workflow(SELECT Pays, Capital FROM Pays) to OGSA-DAI.)

  19. Data integration with OGSA-DAI workflows • Across OGSA-DAI services

  20. Key OGSA-DAI terms: activities, resources, workflows

  21. OGSA-DAI: Key Term Activity • An activity is a named unit of functionality • A well-defined workflow unit • Pluggable • Composable • An activity can have 0 or more named inputs and 0 or more named outputs • Blocks of data flow from an activity's output into another activity's input

  22. OGSA-DAI: Key Term Activity (cont.) Example activities include: • Execute an SQL query • ZIP a batch of data • List the files in a directory • Execute an XSL transform on an XML document • Deliver data to an FTP server
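To make one of these activities concrete, here is a hypothetical sketch of what a "ZIP a batch of data" step might do, written with the JDK's `java.util.zip` classes. The class name, the `block-N` entry naming and the round-trip helper are assumptions for illustration, not OGSA-DAI's own ZIP activity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of a "ZIP a batch of data" activity body, using only the JDK.
public class ZipBatchSketch {

    /** Writes each block of the batch as its own ZIP entry; returns the archive bytes. */
    static byte[] zipBatch(List<byte[]> blocks) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (int i = 0; i < blocks.size(); i++) {
                zip.putNextEntry(new ZipEntry("block-" + i)); // hypothetical entry naming
                zip.write(blocks.get(i));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The archive is complete only after the ZipOutputStream has been closed.
        return buffer.toByteArray();
    }

    /** Reads the archive back and returns each entry's contents as a string (for checking). */
    static List<String> unzipToStrings(byte[] archive) {
        List<String> contents = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            while (in.getNextEntry() != null) {
                contents.add(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return contents;
    }

    public static void main(String[] args) {
        byte[] archive = zipBatch(List.of(
                "row 1".getBytes(StandardCharsets.UTF_8),
                "row 2".getBytes(StandardCharsets.UTF_8)));
        System.out.println(unzipToStrings(archive)); // prints [row 1, row 2]
    }
}
```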

  23. OGSA-DAI: Key Term Activity (cont.) • Activity connections: all required inputs must be connected; all outputs must be connected; optional inputs may be left unconnected • Inputs: literal or streamed • Types
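The connection rules can be expressed as a small validation pass. This is a hypothetical sketch, assuming an activity is described by sets of required inputs, optional inputs and outputs; the names `expression`, `timeout` and `results` are illustrative and not taken from any particular OGSA-DAI activity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the connection rules; the activity and
// input/output names are illustrative, not taken from OGSA-DAI.
public class ConnectionCheckSketch {

    /** Declares an activity's required inputs, optional inputs and outputs. */
    record ActivitySpec(String name, Set<String> requiredInputs, Set<String> optionalInputs, Set<String> outputs) {}

    /** Lists rule violations for the inputs and outputs a workflow has connected. */
    static List<String> check(ActivitySpec spec, Set<String> connectedInputs, Set<String> connectedOutputs) {
        List<String> problems = new ArrayList<>();
        for (String input : spec.requiredInputs()) {
            if (!connectedInputs.contains(input)) {
                problems.add(spec.name() + ": required input '" + input + "' is not connected");
            }
        }
        for (String output : spec.outputs()) {
            if (!connectedOutputs.contains(output)) {
                problems.add(spec.name() + ": output '" + output + "' is not connected");
            }
        }
        for (String input : connectedInputs) {
            if (!spec.requiredInputs().contains(input) && !spec.optionalInputs().contains(input)) {
                problems.add(spec.name() + ": unknown input '" + input + "'");
            }
        }
        // Optional inputs that are left unconnected are not reported.
        return problems;
    }

    static final ActivitySpec QUERY = new ActivitySpec(
            "query", Set.of("expression"), Set.of("timeout"), Set.of("results"));

    public static void main(String[] args) {
        System.out.println(check(QUERY, Set.of("expression"), Set.of("results"))); // prints []
        System.out.println(check(QUERY, Set.of(), Set.of()).size()); // prints 2
    }
}
```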

  24. Connecting activities - examples

  25. Data grouping: Lists (Diagram: ReadFromFileActivity receives the filenames f1, f2 and outputs [byte[]…], [byte[]…], one list of byte arrays per file.) • Special blocks are used to mark the beginning and the end of a list. • A list groups related data as one unit. • For example, ReadFromFileActivity can dynamically take any number of filenames as input. • Without a way to group the output byte arrays, we would have no way to distinguish the binary data of file f1 from that of file f2. • Streaming is preserved, since for each file a number of byte arrays is produced and forwarded to downstream activities.
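A minimal sketch of list grouping, assuming marker blocks are plain Java objects compared by identity and that a `List<Object>` stands in for the block stream between two activities. `LIST_BEGIN`, `LIST_END`, `emit` and `regroup` are hypothetical names; OGSA-DAI defines its own list marker blocks.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of list grouping: marker blocks delimit the byte[] chunks
// that belong to each file, so a downstream activity can tell f1's data from f2's.
public class ListMarkerSketch {

    // Illustrative marker objects, compared by identity so no data block can collide with them.
    static final Object LIST_BEGIN = new Object();
    static final Object LIST_END = new Object();

    /** Producer side: emits each file's chunks wrapped in begin/end markers. */
    static List<Object> emit(List<List<byte[]>> chunksPerFile) {
        List<Object> stream = new ArrayList<>();
        for (List<byte[]> chunks : chunksPerFile) {
            stream.add(LIST_BEGIN);
            stream.addAll(chunks); // chunks stay separate blocks, so they can be streamed one by one
            stream.add(LIST_END);
        }
        return stream;
    }

    /** Consumer side: regroups the flat block stream into one list per file. */
    static List<List<byte[]>> regroup(List<Object> stream) {
        List<List<byte[]>> groups = new ArrayList<>();
        List<byte[]> current = null;
        for (Object block : stream) {
            if (block == LIST_BEGIN) {
                current = new ArrayList<>();
            } else if (block == LIST_END) {
                groups.add(current);
                current = null;
            } else if (current != null) {
                current.add((byte[]) block);
            } else {
                throw new IllegalStateException("data block outside a list");
            }
        }
        return groups;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // f1 produced two chunks and f2 one chunk (made-up contents).
        List<Object> stream = emit(List.of(
                List.of(bytes("f1 part 1"), bytes("f1 part 2")),
                List.of(bytes("f2 part 1"))));
        List<List<byte[]>> groups = regroup(stream);
        System.out.println(groups.size() + " lists, sizes " + groups.get(0).size() + " and " + groups.get(1).size());
    }
}
```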

  26. Passing data internally: OGSA-DAI Tuple (Diagram: SqlQuery executing SELECT city, temp FROM weather;) • A special type of data passed between activities • A Tuple is a data representation similar to a row of relational data; each element of a Tuple represents a column. • Tuples are normally grouped in lists and are preceded by a metadata block.
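The tuple layout can be sketched as follows, assuming a metadata block that carries only column names (real metadata would also describe column types) and made-up sample rows for `SELECT city, temp FROM weather`. The `Metadata` and `Tuple` records here are hypothetical stand-ins, not OGSA-DAI classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a list holds a metadata block (column names) followed by
// tuples, each with one element per column, like rows of "SELECT city, temp FROM weather".
public class TupleSketch {

    record Metadata(List<String> columnNames) {}

    record Tuple(List<Object> elements) {}

    /** Extracts one named column from a block stream of metadata followed by tuples. */
    static List<Object> column(List<Object> blocks, String columnName) {
        Metadata metadata = null;
        List<Object> values = new ArrayList<>();
        for (Object block : blocks) {
            if (block instanceof Metadata m) {
                metadata = m;
            } else if (block instanceof Tuple t) {
                if (metadata == null) {
                    throw new IllegalStateException("tuple before metadata block");
                }
                int index = metadata.columnNames().indexOf(columnName);
                if (index < 0) {
                    throw new IllegalArgumentException("no column " + columnName);
                }
                values.add(t.elements().get(index));
            }
            // Other blocks (e.g. list markers) are ignored in this sketch.
        }
        return values;
    }

    static List<Object> sampleBlocks() {
        // Made-up rows; plain strings stand in for the list begin/end markers.
        return List.of(
                "LIST_BEGIN",
                new Metadata(List.of("city", "temp")),
                new Tuple(List.of("Sophia Antipolis", 24)),
                new Tuple(List.of("Edinburgh", 15)),
                "LIST_END");
    }

    public static void main(String[] args) {
        System.out.println(column(sampleBlocks(), "city")); // prints [Sophia Antipolis, Edinburgh]
    }
}
```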

  27. An interesting activity: Tee • Some activities operate at the level of blocks and are not concerned with the type or values of the data they handle, e.g. TeeActivity. (Diagram: TeeActivity with 2 outputs receives [A,B,C,D] and emits [A,B,C,D] on each output.)
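Because Tee never looks inside a block, its core logic fits in a few lines. This hypothetical sketch copies each block reference to every output list; a real activity would write to its outputs as blocks arrive rather than build whole lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a block-level activity like TeeActivity: every incoming
// block is passed to each output without inspecting its type or value.
public class TeeSketch {

    static List<List<Object>> tee(List<?> input, int numberOfOutputs) {
        if (numberOfOutputs < 1) {
            throw new IllegalArgumentException("need at least one output");
        }
        List<List<Object>> outputs = new ArrayList<>();
        for (int i = 0; i < numberOfOutputs; i++) {
            outputs.add(new ArrayList<>());
        }
        for (Object block : input) {
            for (List<Object> output : outputs) {
                output.add(block); // same block reference on every output, no deep copy
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        System.out.println(tee(List.of("A", "B", "C", "D"), 2)); // prints [[A, B, C, D], [A, B, C, D]]
    }
}
```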

  28. OGSA-DAI: Key Term Resource • Data request execution resource (DRER) • Data resources • Data sources • Data sinks • Sessions: a state container associated with a set of workflows; one workflow can lodge state and a subsequent workflow can retrieve it • Requests: one per workflow submitted to a DRER; used to access request status
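Sessions can be pictured as a server-side map from a session id to a store of named values. The sketch below is hypothetical (`Session`, `lodge`, `retrieve`, `findSession` and the `rowsLoaded` key are invented names) and ignores expiry, persistence and security, which a real server would need.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a session as a state container shared by a set of
// workflows; class and method names are illustrative, not OGSA-DAI's.
public class SessionSketch {

    static final class Session {
        private final String id = UUID.randomUUID().toString();
        private final Map<String, Object> state = new ConcurrentHashMap<>();

        String id() { return id; }

        /** Called by one workflow to lodge state (null values are not allowed). */
        void lodge(String key, Object value) { state.put(key, value); }

        /** Called by a later workflow to retrieve it. */
        Optional<Object> retrieve(String key) { return Optional.ofNullable(state.get(key)); }
    }

    /** Server-side registry so that a later request can find a session by its id. */
    private static final Map<String, Session> SESSIONS = new ConcurrentHashMap<>();

    static Session createSession() {
        Session session = new Session();
        SESSIONS.put(session.id(), session);
        return session;
    }

    static Optional<Session> findSession(String id) {
        return Optional.ofNullable(SESSIONS.get(id));
    }

    static Optional<Object> demo() {
        // Workflow 1: creates a session and lodges an intermediate result (made-up value).
        String sessionId = createSession().id();
        findSession(sessionId).orElseThrow().lodge("rowsLoaded", 42);
        // Workflow 2: submitted later with the same session id, retrieves the state.
        return findSession(sessionId).flatMap(s -> s.retrieve("rowsLoaded"));
    }

    public static void main(String[] args) {
        System.out.println(demo().orElse("nothing")); // prints 42
    }
}
```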

  29. OGSA-DAI: Key Term Workflow • A workflow can contain: • Activities: resource-based (e.g. SQLQuery) or non-resource (transformation and delivery) • Resources, targeted by activities • Other workflows: sub-workflows and other types of workflow

  30. OGSA-DAI: Key Term Workflow (cont.) • OGSA-DAI can be used as a workflow processing system designed to stream data through a set of activities in a pipelined manner. • In a Query->Transform->Deliver workflow, if the activities are well defined, all three will be processing different portions of the data stream concurrently.
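The pipelined behaviour can be sketched with one thread per activity connected by small bounded queues, so each stage works on a different part of the stream at the same time. This is a hypothetical illustration using only `java.util.concurrent`; the upper-casing "transform" and the end-of-stream sentinel are assumptions, not OGSA-DAI internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of Query -> Transform -> Deliver running as a pipeline:
// each stage is its own thread and rows stream through small bounded queues.
public class PipelineSketch {

    // End-of-stream sentinel, compared by identity so no real row can match it.
    private static final String END = new String("END");

    static List<String> runPipeline(List<String> sourceRows) {
        BlockingQueue<String> queryToTransform = new ArrayBlockingQueue<>(2);
        BlockingQueue<String> transformToDeliver = new ArrayBlockingQueue<>(2);
        List<String> delivered = new ArrayList<>(); // written only by the deliver thread

        Thread query = new Thread(() -> {
            try {
                for (String row : sourceRows) {
                    queryToTransform.put(row); // blocks while the queue is full
                }
                queryToTransform.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread transform = new Thread(() -> {
            try {
                for (String row = queryToTransform.take(); row != END; row = queryToTransform.take()) {
                    transformToDeliver.put(row.toUpperCase(Locale.ROOT));
                }
                transformToDeliver.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread deliver = new Thread(() -> {
            try {
                for (String row = transformToDeliver.take(); row != END; row = transformToDeliver.take()) {
                    delivered.add(row);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        query.start();
        transform.start();
        deliver.start();
        try {
            query.join();
            transform.join();
            deliver.join(); // join() makes the deliver thread's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(runPipeline(List.of("nice,24", "edinburgh,15", "madrid,30"))); // prints [NICE,24, EDINBURGH,15, MADRID,30]
    }
}
```

The queue capacity of 2 is arbitrary; it forces the stages to overlap, since the query stage cannot run far ahead of the delivery stage.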

  31. OGSA-DAI: Key Term Workflow (cont.) • A pipeline workflow consists of a set of chained activities that are executed in parallel, with data flowing between the activities. • In a sequence workflow, all the sub-workflows added to it are executed in sequence. For example, the 1st sub-workflow in a sequence creates a table and the 2nd bulk loads transformed data into this table. • In a parallel workflow, all the sub-workflows added to it are executed in parallel.
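A hypothetical sketch of the sequence and parallel composites, assuming each sub-workflow can be modelled as a `Runnable`. It uses a JDK `ExecutorService` for the parallel case and is not the OGSA-DAI client API; pipeline workflows, where activities stream data to each other, are not modelled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of sequence vs parallel composition of sub-workflows,
// each sub-workflow modelled as a Runnable; not the OGSA-DAI client API.
public class CompositeWorkflowSketch {

    /** Sequence: each sub-workflow starts only after the previous one has finished. */
    static void runSequence(List<Runnable> subWorkflows) {
        for (Runnable subWorkflow : subWorkflows) {
            subWorkflow.run();
        }
    }

    /** Parallel: all sub-workflows are started together; returns when all have finished. */
    static void runParallel(List<Runnable> subWorkflows) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subWorkflows.size()));
        try {
            List<Future<?>> running = new ArrayList<>();
            for (Runnable subWorkflow : subWorkflows) {
                running.add(pool.submit(subWorkflow));
            }
            for (Future<?> future : running) {
                future.get(); // rethrows any failure as ExecutionException
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    static List<String> sequenceDemo() {
        List<String> log = new ArrayList<>();
        // The table must exist before the transformed data is bulk loaded into it.
        List<Runnable> steps = List.of(() -> log.add("create table"), () -> log.add("bulk load"));
        runSequence(steps);
        return log;
    }

    static int parallelDemo() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        List<Runnable> steps = List.of(() -> log.add("query A"), () -> log.add("query B"));
        runParallel(steps);
        return log.size();
    }

    public static void main(String[] args) {
        System.out.println(sequenceDemo()); // prints [create table, bulk load]
        System.out.println(parallelDemo()); // prints 2
    }
}
```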

  32. Getting to the first practical: The OGSA-DAI client toolkit.

  33. OGSA-DAI client toolkit • OGSA-DAI client toolkit • Construct and submit requests in Java not XML • Toolkit manages interaction with web services via SOAP over HTTP; it handles SOAP request construction and response parsing. • Provides Java abstractions of • Services • OGSA-DAI resources and properties • Requests • Activities

  34. The client toolkit • The workflow description is sent to the OGSA-DAI server as an XML document. • Application developer does not need to worry about creating this document. • The client toolkit provides ways of assembling activity workflows programmatically. • We will see how to use the client toolkit during the hands-on session.

  35. Service/resource model (Diagram labels: a client; a Data Request Execution Service exposing the Data Request Execution Resource MyDRER, with sessions; data resources One, Two and Three, each holding data; a Request Management Service exposing the request MyRequest123456.)

  36. Client Toolkit Activities • One client activity per server activity • Same input and output names • Plus some convenience methods For example: • Retrieve results as a JDBC ResultSet from a TupleToWebRowSet activity. • Retrieve update count as an Integer from a SQLUpdate activity

  37. Step-by-step guide for writing clients • Create activities • There's a corresponding client toolkit activity for each server-side activity:
      DeliverToFTP deliver = new DeliverToFTP();
      ReadFromFile readFile = new ReadFromFile();

  38. Connecting activities • Set inputs for each activity (e.g. parameters) • Every input can either be a literal input or streamed from another activity • Literal inputs, e.g. for constant parameters:
      deliver.addFilename("results1.txt");
      deliver.addHost("anonymous@test.ogsadai.org.uk:21");
    • Connect an input to the output of another activity to stream data:
      deliver.connectDataInput(readFile.getDataOutput());

  39. Gaining access to the results • If the output of an activity can be provided in a user-friendly type, there are methods to access the results: • Check whether there are more results to be retrieved • Get the next result in a convenient type
      boolean hasNext = sqlUpdate.hasNextResult();
      int count = sqlUpdate.getNextResult();

  40. Build and execute the workflow request • Create a workflow and add the activities to it • A data service executes the workflow and returns a response (or an error!) • The response may contain data (depending on the activities) • Each client toolkit activity provides utility methods for retrieving its response data

  41. First hands-on session: go to http://homepages.nesc.ac.uk/~elias/issgc09/html/practical.html

  42. Extensibility points & components

  43. Extending OGSA-DAI: what? • OGSA-DAI is a framework and is extensible • Out of the box, it provides the basics • Different applications have different needs: new sources of data, new functionality

  44. Extending OGSA-DAI: Overview (Diagram: a presentation layer (GT, Axis, OMII, gLite, UNICORE, WS-DAI, embedded, new message frameworks) above the OGSA-DAI core (persistence and configuration, workflow execution engine, sessions, request, activity framework); activities such as SQLQuery, XPathQuery, XSLTransform, DeliverToURL and MyOwnActivity as new functionality; data resources, data sources and data sinks as new types of data.)

  45. Extending OGSA-DAI: Activities • Activities do some unit of work: • Specific transformation, e.g. data format: SWISS-PROT to format X • Delivery: deliver to a target service • Data analysis and integration: combine data from different sources

  46. Extending OGSA-DAI: Resources • New resources: why? New products, new applications, specialised access • Required: DataResource, DataResourceState, ResourceAccessor

  47. Extending OGSA-DAI: Remote Resource • Accessing resources on a remote OGSA-DAI • Avoids replication of resources • Security issues: devolved to the local OGSA-DAI; security between OGSA-DAI deployments

  48. SQL views • Define a drPatient view • SELECT id, name, age, sex, doctor.name as drName FROM patient, doctor WHERE patient.DrID = doctor.ID; • Client runs SELECT * FROM drPatient; • Shorthand for complex query results • Data access control, e.g. users of drPatient • Cannot access a patient's ZIP code • Are unaware of the doctor or patient tables

  49. OGSA-DAI SQL views • OGSA-DAI SQL views data resource • Represents a view across a database exposed by an OGSA-DAI relational resource • SQLQuery activity • Parses query • Splices in view definition • Submits transformed query to database • Can define views for read-only databases • Schema transformation • Map a logical schema to a physical schema
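The "splice in view definition" step can be illustrated with a deliberately naive sketch that rewrites a reference to a view after `FROM` or `JOIN` into a derived table. It is hypothetical: OGSA-DAI parses the query, whereas this uses a regular expression and would mishandle, for example, a view name inside a string literal. The view definition is the drPatient example from the previous slide.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, deliberately naive sketch of view splicing: a view name that
// follows FROM or JOIN is replaced by its definition as a derived table.
// A real implementation would parse the SQL rather than use a regular expression.
public class ViewSpliceSketch {

    static String splice(String query, Map<String, String> views) {
        String result = query;
        for (Map.Entry<String, String> view : views.entrySet()) {
            String name = view.getKey();
            // Drop a trailing semicolon so the definition can be nested.
            String definition = view.getValue().trim().replaceAll(";\\s*$", "");
            Pattern reference = Pattern.compile(
                    "\\b(FROM|JOIN)\\s+" + Pattern.quote(name) + "\\b", Pattern.CASE_INSENSITIVE);
            // quoteReplacement stops '$' or '\' in the definition being read as group references.
            result = reference.matcher(result).replaceAll(match ->
                    Matcher.quoteReplacement(match.group(1) + " (" + definition + ") AS " + name));
        }
        return result;
    }

    static final String DR_PATIENT = "SELECT id, name, age, sex, doctor.name as drName "
            + "FROM patient, doctor WHERE patient.DrID = doctor.ID;";

    public static void main(String[] args) {
        System.out.println(splice("SELECT * FROM drPatient;", Map.of("drPatient", DR_PATIENT)));
    }
}
```

Running `main` prints the client's `SELECT * FROM drPatient;` with the view body nested as `FROM (SELECT id, ... WHERE patient.DrID = doctor.ID) AS drPatient;`, which is the transformed query the database would then receive.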

  50. Distributed query processing • OGSA-DQP • Developed by the Universities of Manchester and Newcastle • Refactored for OGSA-DAI 3.0 by EPCC as part of the NextGrid project • OGSA-DAI DQP package • Multiple tables on multiple databases are exposed to clients as multiple tables in one “virtual database” • Clients are unaware of the multiple databases • Databases can be exposed • EITHER within one OGSA-DAI server • OR via multiple remote OGSA-DAI servers
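The "virtual database" idea can be sketched as a catalogue that maps each exposed table name to the database holding it, so clients name tables without knowing where they live. Everything here is hypothetical: the databases are in-memory maps with made-up rows modelled on the patient/doctor example, and the join is a local nested loop, whereas DQP plans and distributes the query across servers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "virtual database" idea: a catalogue maps each
// exposed table to the database that holds it, so clients name tables only.
// This sketch fetches both tables and joins them locally; it does no query planning.
public class VirtualDatabaseSketch {

    /** Stand-in for one physical database: named tables of rows. */
    record Database(String name, Map<String, List<Map<String, Object>>> tables) {}

    static final class VirtualDatabase {
        private final Map<String, Database> catalogue = new HashMap<>();

        void expose(Database db) {
            for (String table : db.tables().keySet()) {
                if (catalogue.putIfAbsent(table, db) != null) {
                    throw new IllegalArgumentException("table exposed twice: " + table);
                }
            }
        }

        /** Clients ask for a table by name; the hosting database is resolved here. */
        List<Map<String, Object>> scan(String table) {
            Database db = catalogue.get(table);
            if (db == null) {
                throw new IllegalArgumentException("unknown table: " + table);
            }
            return db.tables().get(table);
        }
    }

    /** Nested-loop equi-join; output columns are prefixed with their table name. */
    static List<Map<String, Object>> join(VirtualDatabase vdb, String left, String leftKey,
                                          String right, String rightKey) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> l : vdb.scan(left)) {
            for (Map<String, Object> r : vdb.scan(right)) {
                if (l.get(leftKey) != null && l.get(leftKey).equals(r.get(rightKey))) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    l.forEach((k, v) -> row.put(left + "." + k, v));
                    r.forEach((k, v) -> row.put(right + "." + k, v));
                    result.add(row);
                }
            }
        }
        return result;
    }

    static List<Map<String, Object>> demo() {
        // Made-up rows; the two tables live in different (in-memory) databases.
        List<Map<String, Object>> patients = List.of(
                Map.<String, Object>of("id", 1, "name", "P. One", "DrID", 10),
                Map.<String, Object>of("id", 2, "name", "P. Two", "DrID", 11));
        List<Map<String, Object>> doctors = List.of(Map.<String, Object>of("ID", 10, "name", "Dr Ten"));
        VirtualDatabase vdb = new VirtualDatabase();
        vdb.expose(new Database("siteA", Map.of("patient", patients)));
        vdb.expose(new Database("siteB", Map.of("doctor", doctors)));
        return join(vdb, "patient", "DrID", "doctor", "ID");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```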
