Web Services and Grid Services in Grid Computing
Peter Brezany
Institut für Softwarewissenschaften, Universität Wien (Institute for Software Science, University of Vienna)
Media That Radically Changed Society
1500s printing press, 1840s Penny Post, 1850s telegraph, 1920s telephone, 1930s radio, 1950s TV, 1990s Web, 20xx Grid
"The Internet is about getting computers to talk together; Grid computing is about getting computers to work together." Tom Hawk, IBM's general manager of Grid computing Grid Computing Vision
Grid Computing Vision (2)
Asked "What did you have in mind when you first developed the Web?", Tim Berners-Lee replied: "The dream behind the Web is of a common information space in which we communicate by sharing information."
Applied to Grid computing, this sentence can be rephrased as: "The dream behind Grid computing is a common resource space in which we can work together using shared resources."
Web Compared with the Grid (source: Norman Paton)
[Figure: a diagram with two axes, "richer semantics" and "more computation". More computation leads from the Classical Web to the Classical Grid and from the Semantic Web to the Semantic Grid; richer semantics leads from the Classical Web to the Semantic Web and from the Classical Grid to the Semantic Grid.]
Learning Objectives
• Motivation for Grids
• Basic concepts
• Existing architectures
• New developments
• From Web Services to Grid Services
• Further development and integration of Web Services and Grid Services
• Grid solutions
Examples and Logical Consequences
• Example: water supply
– Formerly: a private spring or well
– Today: water collection point, pipes, tap
• Example: energy supply
– Formerly: own generator
– Today: a "large generator", power lines, wall socket (the power grid)
• Logical consequence: Grid computing (computational Grid), e.g. NASA's "Information Power Grid" (www.ipg.nasa.gov)
– Computing power (and much more) "from the wall socket"
– Many computers connected into one large network
• Advantages:
– Completely new possibilities for collaboration between companies
– Hardware savings through "renting" (compare generator / spring)
– "Renting" expensive software instead of buying it
– Offering resources yourself, e.g. computing power
Grid Computing - Definition
Definition according to www.globus.org [1]: "The Grid" is an infrastructure that enables the integrated, collaborative use of resources. Resources include not only computing power and storage; entire devices of any kind can also be shared in the Grid, for example high-performance computers, networks, databases, telescopes, microscopes, and even electron accelerators. The goal of the Grid is to let users access devices as if they owned them, without having to buy them.
Characteristics of Grid applications:
• Large data volumes
• High computational cost
• Secure resource sharing between independent organizations
• Formation of Virtual Organizations (VOs)
[1] Practically all of the most important Grid projects build on the Globus middleware (1998: Globus 1, 2001: Globus 2, 2003: Globus 3).
VO Example
A car manufacturer wants scenario analyses carried out for a new factory (or site) and commissions:
– an application service provider (ASP) for the financial forecast
– a storage service provider (SSP) for the (historical) data
– cycle providers for the computing power needed for the analysis
VO Example (2)
Figure: An actual organization can participate in one or more VOs by sharing some or all of its resources. We show three actual organizations (the ovals), and two VOs: P, which links participants in an aerospace design consortium, and Q, which links colleagues who have agreed to share spare computing cycles, for example to run ray tracing computations. The organization on the left participates in P, the one to the right participates in Q, and the third is a member of both P and Q. The policies governing access to resources (summarized in “quotes”) vary according to the actual organizations, resources, and VOs involved.
Definitions: Protocol, Service, API, SDK
• Protocol:
– A set of rules that the endpoints of telecommunication systems follow to exchange information
– A standard protocol ensures interoperability
• Service:
– A network-enabled entity with a particular capability
– Defined by a protocol and by the reaction to a protocol message (service = protocol + behavior)
• Application Program Interface (API):
– A standard interface for accessing functionality (one protocol can have several APIs)
– Enables portability
• Software Development Kit (SDK):
– Implements an API
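The distinction between protocol, service, and API can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the one-line `ADD a b` / `OK n` text protocol, the `math_service` function, and the `MathClient` class are invented for this example and are not part of any Grid toolkit.

```python
# Hypothetical illustration of "service = protocol + behavior".
# The text protocol and all names below are assumptions made for this sketch.


def math_service(request: str) -> str:
    """Service: reacts to a protocol message with a defined behavior."""
    parts = request.split()
    # Protocol rule: a request is "ADD <int> <int>"; anything else is an error.
    if len(parts) != 3 or parts[0] != "ADD":
        return "ERR malformed request"
    try:
        a, b = int(parts[1]), int(parts[2])
    except ValueError:
        return "ERR operands must be integers"
    return f"OK {a + b}"


class MathClient:
    """API: a programming interface that hides the protocol from the caller."""

    def __init__(self, transport):
        # transport is any callable that carries a request string to the
        # service and returns the reply string (here: a direct function call).
        self._transport = transport

    def add(self, a: int, b: int) -> int:
        reply = self._transport(f"ADD {a} {b}")
        status, _, payload = reply.partition(" ")
        if status != "OK":
            raise RuntimeError(payload)
        return int(payload)


client = MathClient(math_service)
result = client.add(2, 3)
```

A second client class with a different method style could speak the same protocol, which is the sense in which one protocol can have several APIs.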
Grid Protocol Architecture vs. Internet Protocol Architecture
Grid layers, top to bottom, with the corresponding Internet protocol layer:
• Application (Internet: Application)
• Collective: "Coordinating multiple resources": ubiquitous infrastructure services, app-specific distributed services (Internet: Application)
• Resource: "Sharing single resources": negotiating access, controlling use (Internet: Application)
• Connectivity: "Talking to things": communication (Internet protocols) & security (Internet: Transport, Internet)
• Fabric: "Controlling things locally": access to, and control of, resources (Internet: Link)
Grid Architecture (2)
• Fabric:
– Computers, file systems, archives, networks, sensors, ... with basic operations (open, read, write, close, ...)
– Hardly any restrictions at the low level, as long as the interfaces are satisfied
• Connectivity:
– Communication (IP, DNS, routing, ...)
– Security (Grid Security Infrastructure, GSI): uniform authentication, single sign-on, delegation, public-key technology
Grid Architecture (3)
• Resource layer:
– Grid Resource Allocation Management (GRAM): allocation, reservation, monitoring, and control of computing resources
– GridFTP protocol (FTP extensions): high-speed data access and transport
– Grid Resource Information Service (GRIS): access to structure and status information
– Network reservation, monitoring, and control
– Builds on the Connectivity layer (GSI & IP)
Grid Architecture (4)
• Collective layer:
– Global protocols and services
– Builds on the "neck" and is completely "independent" of the resources
– Directory services
– Monitoring and diagnostic services
– Data replication services
– etc.
• Applications:
– Use services from any layer
Data Grid
• Original motivation: scientific applications are very data-intensive, and an enormous number of researchers from all over the world want fast access to these data.
State of the Art in 2002
• The concepts discussed so far are implemented by several SDKs, e.g. Globus (U.S.), Unicorn (EU project), European Data Grid (EU project), etc.
• Well known only in scientific circles, with a focus on "big-science" applications.
• Almost no integration of database technologies; data is kept in "flat files".
• Need to move closer to "everyday life" (e-business, medicine, etc.).
• Web developments, i.e. Web Service technologies, were ignored.
• Large companies (IBM, Sun, Microsoft, etc.) are now beginning to take part as well.
Grid and Web Services: Convergence?
[Figure: timeline from 1991 to 2004. Grid (GT1, GT2, OGSI) and Web (HTTP; WSDL, WS-*; WSDL 2, WSDM) started far apart in applications and technology and have been converging; whether they meet is marked with a question mark.]
GT = Globus Toolkit, OGSI = Open Grid Services Infrastructure
However, despite enthusiasm for OGSI, adoption within the Web community turned out to be problematic.
Grid Service – OGSA – OGSI – GT3
OGSA – Open Grid Services Architecture
Grid Service – OGSA – OGSI – GT3 (2)
• Grid Services are defined by OGSA. The Open Grid Services Architecture (OGSA) aims to define a new common and standard architecture for grid-based applications. Right at the center of this new architecture is the concept of a Grid Service. OGSA defines what Grid Services are, what they should be capable of, and what types of technologies they should be based on, but it doesn't give a detailed technical specification (which would be needed to implement a Grid Service).
• Grid Services are specified by OGSI. The Open Grid Services Infrastructure (OGSI) is a formal and technical specification of the concepts described in OGSA, including Grid Services.
• The Globus Toolkit 3 is an implementation of OGSI. GT3 is a usable implementation of everything that is specified in OGSI (and, therefore, of everything that is defined in OGSA).
• Grid Services are based on Web Services. Grid Services are an extension of Web Services. We'll see what Web Services are in the next page, and what Grid Services are in the page after that.
I still don't get it: What is the difference between OGSA, OGSI, and GT3?
Consider the following simple example. Suppose you want to build a new house. The first thing you need to do is to hire an architect to draw up all the plans, so you can get an idea of what your house will look like. Once you're happy with the architect's job, it's time to hire an engineer who will make detailed blueprints that specify construction details (like where to put the master beams, the power cables, the plumbing, etc.). The engineer then passes all those blueprints to qualified professional workers (construction workers, electricians, plumbers, etc.) who will actually build the house. We could say that OGSA (the definition) is the architect, OGSI (the specification) is the engineer, and GT3 (the implementation) is the workers.
GT 3 Architecture I
• Grid Services, which we have already seen, are the 'GT3 Core' layer. Let's take a look at the rest of the layers from the bottom up:
• GT3 Security Services: Security is an important factor in grid-based applications. GT3 Security Services can help us restrict access to our Grid Services, so only authorized clients can use them. For example, we said that only our New York, Los Angeles, and Seattle offices could access MathService. We want to make sure only those offices have access to MathService and, of course, we want all the data exchanged between MathService and clients to be encrypted so we can keep malicious users from intercepting our data. Besides the usual security measures (putting the web server behind a firewall, etc.), GT3 gives us one more layer of security with technologies such as SSL and X.509 digital certificates.
• GT3 Base Services: This layer actually includes a whole lot of interesting services:
• Managed Job Service: Suppose some particular operation in MathService might take hours or even days to complete. Of course, we don't want to simply stand in front of a computer waiting for the result to arrive (especially if, after 8 hours of waiting, all we get might simply be an error message!). We need to be able to check on the progress of the operation periodically, and have some control over it (pause it, stop it, etc.). This is usually called job management (in this case, the term 'job' is used instead of 'operation'). The Managed Job Service allows us to treat our invocations like jobs, and manage them accordingly.
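The idea of job management (start a long operation, check on its progress, stop it) can be sketched with plain Python threads. This is only an illustration under stated assumptions: the `ManagedJob` class and its methods are hypothetical names invented here, they are not the GT3 Managed Job Service interface, and pausing is left out to keep the sketch short.

```python
import threading
import time


class ManagedJob:
    """Hypothetical job wrapper: run work in the background, poll it, stop it."""

    def __init__(self, steps, step_fn):
        self.steps = steps
        self.step_fn = step_fn
        self.done_steps = 0
        self.state = "pending"   # pending -> running -> done | stopped | failed
        self.error = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.state = "running"
        self._thread.start()

    def stop(self):
        # Ask the job to end before it begins its next step.
        self._stop.set()

    def wait(self, timeout=None):
        self._thread.join(timeout)

    def progress(self):
        # Fraction of steps completed so far, between 0.0 and 1.0.
        return self.done_steps / self.steps if self.steps else 1.0

    def _run(self):
        try:
            for i in range(self.steps):
                if self._stop.is_set():
                    self.state = "stopped"
                    return
                self.step_fn(i)
                self.done_steps += 1
            self.state = "done"
        except Exception as exc:  # report the failure instead of hanging
            self.error = exc
            self.state = "failed"


# A short job that is allowed to finish.
job = ManagedJob(steps=5, step_fn=lambda i: time.sleep(0.01))
job.start()
job.wait()

# A long job that the client stops early.
long_job = ManagedJob(steps=1000, step_fn=lambda i: time.sleep(0.01))
long_job.start()
long_job.stop()
long_job.wait()
```

A client would normally call `progress()` periodically rather than block in `wait()`; the blocking calls are used here only to keep the example deterministic.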
GT 3 Architecture II
• Index Service: Remember from "A short introduction to Web Services" that we usually know what type of Web Service we need, but we have no idea where it is. This also happens with Grid Services: we might know we need a Grid Service which meets certain requirements, but we have no idea what its location is. While this was solved in Web Services with UDDI, GT3 has its own Index Service. For example, we could have several dozen MathServices all around the country, each with different characteristics (some might be better suited for statistical analysis, while others might be better for performing simulations). The Index Service will allow us to query which MathService meets our particular requirements.
• Reliable File Transfer (RFT) Service: This service allows us to perform large file transfers between the client and the Grid Service. For example, suppose we have an operation in MathService which has to crunch several gigabytes of raw data (for a statistical analysis, for example). Of course, we're not going to send all that information as parameters. We'll be able to send it as a file. Furthermore, RFT guarantees the transfer will be reliable (hence its name). For example, if a file transfer is interrupted (due to a network failure, for example), RFT allows us to restart the file transfer from the point where it broke off, instead of starting all over again.
• GT3 Data Services: This layer includes Replica Management, which is very useful in applications that have to deal with very big sets of data. When working with large amounts of data, we're usually not interested in downloading the whole thing; we just want to work with a small part of all that data. Replica Management keeps track of those subsets of data we will be working with.
• Other Grid Services: Other non-GT3 services can run on top of the GT3 Architecture.
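The "restart from where it broke off" behavior described for RFT can be illustrated with a local file copy that continues from the number of bytes already delivered. This is a minimal sketch, not the RFT protocol: the function `resume_copy` and its `fail_after` test hook are hypothetical, and the sketch assumes the partial destination file is an intact prefix of the source.

```python
import os
import tempfile


def resume_copy(src_path, dst_path, chunk_size=64 * 1024, fail_after=None):
    """Copy src to dst, continuing from however many bytes dst already holds.

    fail_after is a hypothetical test hook: raise after that many bytes have
    been written in this call, to simulate a network failure.
    """
    offset = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)  # skip what an earlier attempt already delivered
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
            if fail_after is not None and written >= fail_after:
                raise ConnectionError("simulated network failure")
    return offset, written


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raw.dat")
    dst = os.path.join(tmp, "copy.dat")
    with open(src, "wb") as f:
        f.write(os.urandom(10_000))

    # First attempt breaks off after 3000 bytes.
    try:
        resume_copy(src, dst, chunk_size=1_000, fail_after=3_000)
    except ConnectionError:
        pass

    # Second attempt continues at byte 3000 instead of starting over.
    resumed_from, sent = resume_copy(src, dst, chunk_size=1_000)

    with open(src, "rb") as a, open(dst, "rb") as b:
        identical = a.read() == b.read()
```

A production transfer service would also verify the already-delivered part (for example with checksums) before trusting it; that step is omitted here.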
Challenge: Advanced Grid Applications
Example: Knowledge Discovery in Grid Databases
Motivation
[Figure: a "data and data exploration" cloud surrounded by business, medicine, scientific experiments, simulations, and Earth observations.]
The Knowledge Discovery Process
[Figure: process stages from bottom to top: Cleaning and Integration; Data Warehouse; Selection and Transformation; Data Mining; Evaluation and Presentation; Knowledge. Also shown: OLAP, OLAP queries, and Online Analytical Mining.]
The GridMiner Project in Vienna
GridMiner: a knowledge discovery Grid infrastructure (http://www.gridminer.org/)
• OGSA-based architecture
• Workflow management
• Grid-aware data preprocessing and data mining services
• Data mediation service
• OLAP service
• GUI
• Implementation on top of Globus Toolkit 3.0
• Application: management of patients with traumatic brain injuries
GridMiner Architecture
[Figure: layered architecture, top to bottom]
• GridMiner Workflow: GM DSCE (Dynamic Service Control)
• GridMiner Core: GMDIS (Integration), GMPPS (Preprocessing), GMDMS (Data Mining), GMOMS (OLAM), GMPRS (Presentation)
• GridMiner Base: GMMS (Mediation), GMIS (Information), GMRB (Resource Broker), GMCMS (OLAP / Cubes)
• Grid Core: Grid Core Services (Security, File and Database Access Service, Replica Management)
• Fabric: Grid Resources, Data Source
Collaboration of GM-Services Example 3:
The Control Layer
• Provision of the whole knowledge discovery process to a client
• Knowledge discovery process in GridMiner:
– services to execute not known
– order of service execution
– sequential and concurrent execution
• Approaches investigated:
– Data Mining Query Language
– Standard workflow orchestration approach (BPEL4WS, WSFL, GSFL, …)
– Our approach: Dynamic Service Control
The Control Layer: Standard Service Orchestration Approach (BPEL4WS)
Workflow Models
• Composition by service publisher
• Composition by service consumer
The Control Layer - Approaches: Dynamic Service Control
• Dynamic Service Control Language (DSCL)
– based on XML
– easy to use
– supports OGSA Grid Services
– specially designed to support knowledge discovery processes
• Dynamic Service Control Engine (DSCE)
– processes workflow according to DSCL
[Figure: the client sends a DSCL document to the DSCE and can start, stop, resume, (re)connect, and query results; the client subscribes a notification sink and the DSCE notifies it; the DSCE controls the OGSA Grid Services A, B, C, and D.]
Dynamic Service Control Language (DSCL): Features
• Control flow
– concurrent execution of activities
– sequential execution of activities
• Activities
– creation of new Grid Service instances
– invoking operations on Grid Service instances
– querying information of Grid Service instances
– destroying Grid Service instances
DSCL - Structure
[Figure: tree with root dscl and two children, variables and composition; composition contains repeated groups of the activities createService, invoke, and query SDE.]
DSCL - Variables
Initializing by a simple type value:
<variable name="intvar">
  <ns1:value xsi:type="xsd:int"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:ns1="http://ogsa.globus.org">4711</ns1:value>
</variable>
Initializing by arrays:
<variable name="arrayvar">
  <ns1:value xsi:type="soapenc:Array" soapenc:arrayType="xsd:int[2]"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:ns1="http://ogsa.globus.org">
    <soapenc:item>23</soapenc:item>
    <soapenc:item>-112</soapenc:item>
  </ns1:value>
</variable>
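The two variable declarations above can be read with Python's standard `xml.etree.ElementTree` module. This is a sketch for illustration, not the DSCE implementation: the wrapping `<variables>` element, the function name `read_variables`, and the restriction to `xsd:int` values are assumptions made here.

```python
import xml.etree.ElementTree as ET

# The two declarations from the slide, wrapped in an assumed <variables> root.
DSCL = """
<variables>
  <variable name="intvar">
    <ns1:value xsi:type="xsd:int"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:ns1="http://ogsa.globus.org">4711</ns1:value>
  </variable>
  <variable name="arrayvar">
    <ns1:value xsi:type="soapenc:Array" soapenc:arrayType="xsd:int[2]"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:ns1="http://ogsa.globus.org">
      <soapenc:item>23</soapenc:item>
      <soapenc:item>-112</soapenc:item>
    </ns1:value>
  </variable>
</variables>
"""

# Prefixes used in the lookups below, mapped to the namespaces in the slide.
NS = {
    "ogsa": "http://ogsa.globus.org",
    "soapenc": "http://schemas.xmlsoap.org/soap/encoding/",
}
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def read_variables(xml_text):
    """Return {name: value}; only xsd:int scalars and int arrays are decoded."""
    result = {}
    for var in ET.fromstring(xml_text).findall("variable"):
        value = var.find("ogsa:value", NS)
        # Simplification: the xsi:type value is compared as a literal string
        # rather than resolved as a namespace-qualified name.
        if value.get(XSI_TYPE) == "soapenc:Array":
            result[var.get("name")] = [
                int(item.text) for item in value.findall("soapenc:item", NS)
            ]
        else:
            result[var.get("name")] = int(value.text)
    return result


variables = read_variables(DSCL)
```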
DSCL - Variables
Initializing by a complex type value:
<xsd:schema targetNamespace="http://www.gridminer.org/test/"
            xmlns:tns="http://www.gridminer.org/test/" ... >
  <xsd:complexType name="address">
    <xsd:sequence>
      <xsd:element name="country" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="number" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  ...
</xsd:schema>
<variable name="address-var">
  <ns1:value xmlns:ns1="http://ogsa.globus.org">
    <ns1:country xmlns:ns1="http://www.gridminer.org/test/">Austria</ns1:country>
    <ns1:zip xmlns:ns1="http://www.gridminer.org/test/">1090</ns1:zip>
    <ns1:city xmlns:ns1="http://www.gridminer.org/test/">Vienna</ns1:city>
    <ns1:street xmlns:ns1="http://www.gridminer.org/test/">Liechtensteinstr.</ns1:street>
    <ns1:number xmlns:ns1="http://www.gridminer.org/test/">18</ns1:number>
  </ns1:value>
</variable>
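A complex-typed variable such as `address-var` maps naturally onto a dictionary. The following sketch, using only Python's standard library, shows one way to read it; the function names `local_name` and `complex_value` are hypothetical, and the schema itself is not validated.

```python
import xml.etree.ElementTree as ET

# The address variable from the slide, copied unchanged.
ADDRESS_VAR = """
<variable name="address-var">
  <ns1:value xmlns:ns1="http://ogsa.globus.org">
    <ns1:country xmlns:ns1="http://www.gridminer.org/test/">Austria</ns1:country>
    <ns1:zip xmlns:ns1="http://www.gridminer.org/test/">1090</ns1:zip>
    <ns1:city xmlns:ns1="http://www.gridminer.org/test/">Vienna</ns1:city>
    <ns1:street xmlns:ns1="http://www.gridminer.org/test/">Liechtensteinstr.</ns1:street>
    <ns1:number xmlns:ns1="http://www.gridminer.org/test/">18</ns1:number>
  </ns1:value>
</variable>
"""


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def complex_value(xml_text):
    """Return the fields of the variable's value element as a dict of strings."""
    value = ET.fromstring(xml_text).find("{http://ogsa.globus.org}value")
    return {local_name(child.tag): child.text for child in value}


address = complex_value(ADDRESS_VAR)
```

All fields stay strings, matching the `xsd:string` element types declared in the schema.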
DSCL Control Flow
[Figure: act1 is followed by act2.1 and act2.2, which run in parallel.]
Element outline, in slide order:
dscl
variables
composition
sequence
createService activityID="act1" …
parallel
invoke activityID="act2.1" …
invoke activityID="act2.2" …
sequence …
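How an engine might process such a composition of sequence and parallel blocks can be sketched as a small recursive interpreter. This is an assumption-laden illustration, not the DSCE: the nested-tuple representation, the `run` function, and the use of a thread pool are choices made for this example, and each activity only records its ID instead of calling a Grid Service.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run(node, log, lock):
    """Execute a composition tree given as nested tuples.

    ("sequence", [children])   runs the children one after another
    ("parallel", [children])   runs the children concurrently, waits for all
    ("activity", activity_id)  stands in for createService / invoke / query
    """
    kind, body = node
    if kind == "activity":
        with lock:  # the log is shared between threads
            log.append(body)
    elif kind == "sequence":
        for child in body:
            run(child, log, lock)
    elif kind == "parallel":
        with ThreadPoolExecutor(max_workers=max(1, len(body))) as pool:
            futures = [pool.submit(run, child, log, lock) for child in body]
            for future in futures:
                future.result()  # re-raise any exception from a branch
    else:
        raise ValueError(f"unknown node kind: {kind}")


# act1 first, then act2.1 and act2.2 concurrently, as in the slide's figure.
composition = ("sequence", [
    ("activity", "act1"),
    ("parallel", [("activity", "act2.1"), ("activity", "act2.2")]),
])

log = []
run(composition, log, threading.Lock())
```

After the run, `act1` is always the first log entry, while the relative order of `act2.1` and `act2.2` depends on thread scheduling.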
Grid and Web Services: Convergence: Yes!
[Figure: Grid (GT1, GT2, OGSI) and Web (HTTP; WSDL, WS-*; WSDL 2, WSDM) started far apart in applications and technology, have been converging, and meet in WSRF.]
WSRF = Web Services Resource Framework
The definition of WSRF means that the Grid and Web communities can move forward on a common base.
First publications on WSRF: January 2004
Literature
• F. Berman, G. Fox, T. Hey (Eds.): Grid Computing: Making the Global Infrastructure a Reality. Wiley, 2003
• www.globus.org
• www.gridminer.org (our research project)
• Many documents on the Web