1 / 64

Grids for Real-time and Streaming Applications

This presentation explores how Grids can support high-performance streaming information for sensors and instruments. It discusses the WS-* specifications and showcases three projects from the Community Grids Laboratory, including NaradaBrokering, CrisisGrid, and SERVOGrid. The presentation emphasizes the integration and management of multiple Internet-connected resources, such as people, sensors, computers, and data systems.

abagwell
Télécharger la présentation

Grids for Real-time and Streaming Applications

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Grids for Real-time and Streaming Applications TIWDC 2005 CNIT Tyrrhenian International Workshop on Digital Communications July 4-6 2005 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401 http://grids.ucs.indiana.edu/ptliupages/presentations/cnitjuly5-05.ppt gcf@indiana.edu http://www.infomall.org

  2. Summary • This presentation describes how Grids can support sensors or instruments with high performance streaming information • Grids, Web Services and P2P Networks are all though of as services connected by messages • We describe the WS-* specifications and their role • We illustrate with three projects from the Community Grids Laboratory • NaradaBrokering: Software Overlay Network • CrisisGrid and SERVOGrid: GIS and Sensor Grids • GlobalMMCS: Service oriented collaboration Grid

  3. Internet Scale Distributed Services • Grids use Internet technology and are distinguished by managing or organizing sets of network connected resources • Classic Web allows independent one-to-one access to individual resources • Grids integrate together and manage multiple Internet-connected resources: People, Sensors, computers, data systems • Organization can be explicit as in • TeraGrid which federates many supercomputers; • Deep Web Technologies Information Retrieval Grid which federates multiple data resources; • CrisisGrid which federates first responders, commanders, sensors, GIS, (Tsunami) simulations, science/public data • Organization can be implicit as in Internet resources such as curated databases and simulation resources that “harmonize a community”

  4. Different Visions of the Grid • Grid just refers to the technologies • Or Grids represent the full system/Applications • In the USA, DoD’s vision of Network Centric Computing can be considered a Grid (linking sensors, warfighters, commanders, backend resources) and they are building the GIG (Global Information Grid) • Utility Computing or X-on-demand (X=data, computer ..) is major computer Industry interest in Grids and this is key part of enterprise or campus Grids • e-Science or Cyberinfrastructure are virtual organization Grids supporting global distributed science (note sensors, instruments are people are all distributed) • Skype (Kazaa) VOIP system is a Peer-to-peer Grid (and VRVS/GlobalMMCS like Internet A/V conferencing are Collaboration Grids) • Commercial 3G Cell-phones and ad-hoc network initiatives are forming mobile Grids

  5. e-moreorlessanything and the Grid • e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world. • The growing use of outsourcing is one example • e-Science is the similar vision for scientific research with international participation in large accelerators, satellites or distributed gene analyses. • The Grid integrates the best of the Web, traditional enterprise software, high performance computing and Peer-to-peer systems to provide the information technology e-infrastructure for e-moreorlessanything. • A deluge of data of unprecedented and inevitable size must be managed and understood. • People, computers, data and instruments must be linked. • On demand assignment of experts, computers, networks and storage resources must be supported

  6. Some Important Styles of Grids • Computational Grids were origin of concepts and link computers across the globe – high latency stops this from being used as parallel machine • Typically Compute/File Grids where information (messages) exchanged by writing and reading files • Knowledge and Information Grids link sensors and information repositories as in Virtual Observatories or BioInformatics • Education Grids link teachers, learners, parents as a VO with learning tools, distant lectures etc. • e-Science Grids link multidisciplinary researchers across laboratories and universities • Community Grids focus on Grids involving large numbers of peers rather than focusing on linking major resources – links Grid and Peer-to-peer network concepts • Semantic Grid links Grid, and AI community with Semantic web (ontology/meta-data enriched resources) and Agent concepts • Collaboration Grids support the linkage of multiple people and electronic resources (often peer-to-peer architecture)

  7. Information/Knowledge Grids • Distributed (10’s to 1000’s) of data sources (instruments, file systems, curated databases …) • Data Deluge: 1 (now) to 100’s petabytes/year (2012) • Moore’s law for Sensors • Possible filters assigned dynamically (on-demand) • Run image processing algorithm on telescope image • Run Gene sequencing algorithm on compiled data • Needs decision support front end with “what-if” simulations • Metadata (provenance) critical to annotate data • Integrate across experiments as in multi-wavelength astronomy Data Deluge comes from pixels/year available

  8. Field Trip Data Database ? GISGrid Discovery Services RepositoriesFederated Databases Streaming Data Sensors Database Sensor Grid Database Grid Research Education SERVOGrid Compute Grid Customization Services From Researchto Education Data FilterServices ResearchSimulations Analysis and VisualizationPortal EducationGrid Computer Farm Grid of Grids: Research Grid and Education Grid

  9. Virtual Observatory Astronomy GridIntegrate Experiments Radio Far-Infrared Visible Dust Map Visible + X-ray Galaxy Density Map

  10. Databases and/or Sensors Data Data Filter Filter Filter Data Filter Data OGSA-DAIGrid Services AnalysisControl Visualize Grid Data Filter This Type of Grid integrates with Parallel computing Multiple HPC facilities but only use one at a time Many simultaneous data sources and sinks HPC Simulation Grid Data Assimilation Other Gridand Web Services Distributed Filters massage data For simulation SERVOGrid (Complexity) Computing Model

  11. Sources of Grid Technology • Grids support distributed collaboratories or virtual organizations integrating concepts from • The Web • Agents • Distributed Objects(CORBA Java/Jini COM) • Globus, Legion, Condor, NetSolve, Ninf and other High Performance Computing activities • Peer-to-peer Networks • With perhaps the Web and P2P networks being the most important for “Information Grids” and Globus for “Compute/File Grids”

  12. The Essence of Grid Technology? • We will start from the Web view and assert that basic paradigm is • Meta-data rich Web Services communicating via messages • These have some basic support from some runtime such as .NET, Jini (pure Java), Apache Tomcat+Axis (Web Service toolkit), Enterprise JavaBeans, WebSphere (IBM) or GT3/4 (Globus Toolkit 3/4) • These are the distributed equivalent of operating system functions as in UNIX Shell • Called Hosting Environment or platform • W3C standard WSDL defines IDL (Interface standard) for Web Services

  13. PortalService Security Catalog A typical Web Service • In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, totally compiled away (inlining) • The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python PaymentCredit Card Web Services WSDL interfaces Warehouse Shipping control WSDL interfaces Web Services

  14. Database Database Classic Grid Architecture Resources Content Access Composition Middle TierBrokers Service Providers Netsolve Security Collaboration Computing Middle Tier becomes Web Services Clients Users and Devices

  15. Database Database Event/MessageBrokers Event/MessageBrokers Event/MessageBrokers Peer to Peer Grid Peers Service FacingWeb Service Interfaces Peers User FacingWeb Service Interfaces A democratic organization Peer to Peer Grid

  16. What is Happening? • Grid ideas are being developed in (at least) four communities • Web Service – W3C, OASIS, (DMTF) • Grid Forum (High Performance Computing, e-Science) • Enterprise Grid Alliance (Commercial “Grid Forum” with a near term focus) • Service Standards are being debated • Grid Operational Infrastructure is being deployed • Grid Architecture and core software being developed • Apache has several important projects as do academia; large and small companies • Particular System Services are being developed “centrally” – OGSA framework for this in GGF; WS-* for OASIS/W3C/Microsoft-IBM • Lots of fields are setting domain specific standards and building domain specific services • USA started Grids but now Europe is probably in the lead and Asia will soon catch USA if momentum (roughly zero for USA) continues

  17. Technical Activities of Note • Look at different styles of Grids such as Autonomic(Robust Reliable Resilient) • New Grid architectures hard due to investment required • Program the Grid – Workflow • Access the Grid – Portals, Grid Computing Environments • Critical Services Such as • Security – build message based not connection based • Notification – event services • Metadata – Use Semantic Web, provenance • Fabric and Service Management • Databases and repositories – instruments, sensors • Computing – Submit job, scheduling, distributed file systems • Visualization, Computational Steering • Network performance Low Level WS-* High Level e.g. OGSA

  18. Web services • Web Services build loosely-coupled, distributed applications, (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles. • Web Services interact by exchanging messages in SOAPformat • The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.

  19. Philosophy of Web Service Grids • Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines • This leads to the distributed object (DO) model represented by Java and CORBA • RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java • Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly • Distributed Objects Replaced by Services • Note CORBA was considered too complicated in both organization and proposed infrastructure • and Java was considered as “tightly coupled to Sun” • So there were other reasons to discard • Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages

  20. H1 H2 H3 H4 Body Service F1 F2 F3 F4 Container Workflow Container Handlers SOAP Message Structure I • SOAP Message consists of headers and a body • Headers could be for Addressing, WSRM, Security, Eventing etc. • Headers are processed by handlers or filters controlled by container as message enters or leaves a service • Body processed by Service itself • The header processing defines the “Web Service Distributed Operating System” • Containers queue messages; control processing of headers and offer convenient (for particular languages) service interfaces • Handlers are really the core Operating system services as they receive and give back messages like services; they just process and perhaps modify different elements of SOAP Message

  21. Merging the OSI Levels • All messages pass through multiple operating systems and each O/S thinks of message as a header and a body • Important message processing is done at • Network • Client (UNIX, Windows, J2ME etc) • Web Service Header • Application • EACH is < 1ms (except forsmall sensor clients andexcept for complex security) • But network transmissiontime is often 100ms or worse • Thus no performance reasonnot to mix up places processingdone IP TCP App SOAP

  22. Consequences of Rule of the Millisecond • Useful to remember critical time scales • 1) 0.000001 ms – CPU does a calculation • 2a) 0.001 to 0.01 ms – Parallel Computing MPI latency • 2b) 0.001 to 0.01 ms – Overhead of a Method Call • 3) 1 ms – wake-up a thread or process (do simple things on a PC) • 4) 10 to 1000 ms – Internet delay • 2a), 4) implies geographically distributed metacomputing can’t in general compete with parallel systems • 3) << 4) implies a software overlay network is possible without significant overhead • We need to explain why it adds value of course! • 2b) versus 3) and 4) describes regions where method and message based programming paradigms important

  23. Plethora of Web Service Standards • Javais very powerful partly due to its many “frameworks” that generalize libraries e.g. • Java Media Framework JMF • Java Database Connectivity JDBC • Web Services have a correspondingly collections of specifications that represent critical features of its distributed operating systems for • About 60 WS-* specifications introduced in last 2-3 years • These are low level with higher level standards such as access database (OGSA-DAI) or “Submit a job” built on top of these • Many battles both between standard bodies and between companies as each tries to set standards they consider best; thus there are multiple standards for many of key Web Service functionalities • Microsoft a key player and stands to benefit as Web Services open up enterprise software space to all participants • e.g. MQSeries (IBM) and Tibco have to change their messaging systems to support new open standards

  24. WS-I Interoperability • Critical underpinning of Grids and Web Services is the gradually growing set of specifications in the Web Service Interoperability Profiles • Web Services Interoperability (WS-I) Interoperability Profile 1.0a." http://www.ws-i.org. gives us XSD, WSDL1.1, SOAP1.1, UDDIin basic profile and parts of WS-Security in their first security profile. • We imagine the “60 Specifications” being checked out and evolved in the cauldron of the real world and occasionally best practice identifies a new specification to be added to WS-I which gradually increases in scope • Note only 4.5 out of 60 specifications have “made it” in this definition

  25. Application Specific Grids Generally Useful Services and Grids Workflow WSFL/BPEL Service Management (“Context etc.”) Service Discovery (UDDI) / Information Service Internet Transport  Protocol Service Interfaces WSDL Higher Level Services ServiceContext ServiceInternet Base Hosting Environment Protocol HTTP FTP DNS … Presentation XDR … Session SSH … Transport TCP UDP … Network IP … Data Link / Physical Bit level Internet (OSI Stack) Layered Architecture for Web Services and Grids

  26. WS-* implies the Service Internet • We have the classic (CISCO, Juniper ….) Internet routing the flood of ordinary packets in OSI stack architecture • Web Services build the “Service Internet” or IOI (Internet on Internet) with • Routing via WS-Addressing not IP header • Fault Tolerance (WS-RM not TCP) • Security (WS-Security/SecureConversation not IPSec/SSL) • Data Transmission by WS-Transfer not HTTP • Information Services (UDDI/WS-Context not DNS/Configuration files) • At message/web service level and not packet/IP address level • Software-based Service Internet possible as computers “fast” • Familiar from Peer-to-peer networks and built as a software overlay network defining Grid (analogy is VPN) • SOAP Header contains all information needed for the “Service Internet” (Grid Operating System) with SOAP Body containing information for Grid application service

  27. What is a Simple Service? • Take any system – it has multiple functionalities • We can implement each functionality as an independent distributed service • Or we can bundle multiple functionalities in a single service • Whether functionality is an independent service or one of many method calls into a “glob of software”, we can always make them as Web services by converting interface to WSDL • Simple services are gotten by taking functionalities and making as small as possible subject to “rule of millisecond” • Distributed services incur messaging overhead of one (local) to 100’s (far apart) of milliseconds to use message rather than method call • Use scripting or compiled integration of functionalities ONLY when require <1 millisecond interaction latency • Apache web site has many (pre Web Service) projects that are multiple functionalities presented as (Java) globs and NOT (Java) Simple Services • Makes it hard to integrate “globs” sharing common security, user profile, file access .. services

  28. CPUs Clusters Compute Resource Grids Overlay and Compose Grids of Grids MPPs Methods Services Component Grids Federated Databases Databases Data Resource Grids Sensor Sensor Nets Grids of Grids of Simple Services • Link via methods  messages  streams • Services and Grids are linked by messages • Internally to service, functionalities are linked by methods • A simple service is the smallest Grid • We are familiar with method-linked hierarchyLines of Code  Methods  Objects  Programs  Packages

  29. Component Grids • So we build collections of Web Services which we package as component Grids • Visualization Grid • Sensor Grid • Management Grid • Utility Computing Grid • Collaboration Grid • Earthquake Simulation Grid • Control Room Grid • Crisis Management Grid • Intelligence Data-mining Grid • We build bigger Grids by composing component Grids using the Service Internet

  30. Gas CIGrid Flood CIGrid … … Gas Servicesand Filters Flood Servicesand Filters Electricity CIGrid Portals Collaboration Grid Visualization Grid Sensor Grid GIS Grid Compute Grid Data Access/Storage Registry Metadata Core Grid Services Physical Network Security Notification Workflow Messaging Critical Infrastructure (CI) Grids built as Grids of Grids

  31. The WS-* Infrastructure • Core Grid Services build on and/or extend the 60 or so WS-* Infrastructure specifications which define • Container Model, XML, WSDL … • Service Internet ( (Reliable) Messaging, Addressing) including extensions for high performance transport and representation. This is natural basis for streaming applications • Notification (WS-Notification and WS-Eventing) • Workflow and Transactions • Security • Service Discovery • Metadata and State including lifetime • Management (service interactions) • Policy, Agreements • Portals and User Interfaces

  32. Role of WS-* specifications • Service Internet and Notification define a distributed publish-subscribe software overlay network • SOAP Intermediaries give you needed routing • Workflow specifies how the services (sensors and their aggregators and transformers) are linked together • WS-Security gives you message based security • Service discovery, metadata and state define several types of metadata catalogs (MDS, UDDI, Chord …) • Management defines the protocol and information exchange between services or between a service and a proxy • Intel will support plug & play by putting WS-Management on every chip • WS-Policy allows one to specify HOW to do things • WSRP specifies how users interact with services

  33. Core (Commonly Used) Grid Services (OGSA) • Higher level discovery and metadata – Registries, catalogs, Semantic Grid, Provenance • Higher level security with fine grain authorization, session level security etc. • Higher level execution services building on workflow and including job management for simulations • Infrastructure services giving common interfaces to heterogeneous resource such as storage and computers • Data and Information services including federated databases and use of CIM • Self and distributed (resource) management for autonomic features, configuration • Collaboration, Sensors, Visualization, GIS Not in OGSA

  34. 3 Real-time Grid Projects in Community Grids Laboratory • htpp://www.naradabrokering.org provides Web service (and JMS) compliant distributed publish-subscribe messaging (software overlay network) • htpp://www.globlmmcs.org is a service oriented (Grid) collaboration environment (audio-video conferencing) including mobile clients • http://www.crisisgrid.org is an OGC (open geospatial consortium) Geographical Information System (GIS) compliant GIS and Sensor Grid • The work is still in progress but NaradaBrokering is quite mature • All software is open source and freely available

  35. NaradaBrokering Queues Stream NB supports messages and streams NB role for Grid is Similar to MPI role for MPP

  36. Traditional NaradaBrokering Features

  37. Features for mid 2005 Releases • Production implementations of WS-Eventing, WS-RM and WS-Reliability. • WS-Notification when specification agreed • SOAP message supportand NaradaBrokers viewed as SOAP Intermediaries • Active replay support: Pause and Replay live streams. • Stream Linkage: can link permanently multiple streams – using in annotating real-time video streams • Replicated storage support for fault tolerance and resiliency to storage failures. • Management: HPSearch Scripting Interface to streams and services • Broker Discovery: Locate appropriate brokers

  38. Mean transit delay for message samples in NaradaBrokering: Different communication hops 9 hop-2 hop-3 8 hop-5 7 hop-7 6 5 Transit Delay (Milliseconds) 4 3 2 1 0 100 1000 Message Payload Size (Bytes) Pentium-3, 1GHz, 256 MB RAM 100 Mbps LAN JRE 1.3 Linux

  39. Possible NaradaBrokering Futures • Support for replicated storages within the system. • In a system with N replicas the scheme can sustain the loss of N-1 replicas. • Clarification and expansion of NB Broker to act as a WS container • Integration with Axis 2.0 as Message Oriented Middleware infrastructure • Support for High Performance transport and representation for Web Services • Needs Context catalog under development • Performance based routing • The broker network will dynamically respond to changes in the network based on metrics gathered at individual broker nodes. • Replicated publishers for fault tolerance • Pure client P2P implementation (originally we linked to JXTA) • Security Enhancements for fine-grain topic authorization, multi-cast keys, Broker attacks

  40. H1 H2 H3 H4 Body hp1 hp2 hp3 hp4 hp5 bp3 bp1 bp2 SOAP Message Structure II • Content of individual headers and the body is defined by XML Schema associated with WS-* headers and the service WSDL • SOAP Infoset captures header and body structure • XML Infoset for individual headers and the body capture the details of each message part • Web Service Architecture requires that we capture Infoset structure but does not require that we represent XML in angle bracket <content>value</content> notation Infoset representssemantic structure of message and itsparts

  41. High Performance XML I • There are many approaches to efficient “binary” representations of XML Infosets • MTOM, XOP, Attachments, Fast Web Services • DFDL is one approach to specifying a binary format • Assume URI-S labels Scheme and URI-R labels realization of Scheme for a particular message i.e. URI-R defines specific layout of information in each message • Assume we are interested in conversations where a stream of messages is exchanged between two services or between a client and a service i.e. two end-points • Assume that we need to communicate fast between end-points that understand scheme URI-S but must support conventional representation if one end-point does not understand URI-S

  42. F1 F2 F3 F4 Container Handlers High Performance XML II • First Handler Ft=F1 handles Transport protocol; it negotiates with other end-point to establish a transport conversation which uses either HTTP (default) or a different transport such as UDP with WSRM implementing reliability • URI-T specifies transport choice • Second Handler Fr=F2handles representation and it negotiates a representation conversation with scheme URI-S and realization URI-R • Negotiation identifies parts of SOAP header that are present in all messages in a stream and are ONLY transmitted ONCE • Fr needs to negotiate with Service and other handlers illustrated by F3 and F4 below to decide what representation they will process

  43. H1 H2 H3 H4 Body Ft Fr F3 F4 Container Handlers High Performance XML III • Filters controlled by Conversation Context convert messages between representations using permanent context (metadata) catalog to hold conversation context • Different message views for each end point or even for individual handlers and service within one end point • Conversation Context is fast dynamic metadata service to enable conversions • NaradaBrokering will implement Fr and Ft using its support of multiple transports, fast filters and message queuing; Conversation ContextURI-S, URI-R, URI-T Replicated Message Header Transported Message Handler Message View ServiceMessage View Service

  44. GIS and Sensor Grids • OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors • GML Geography Markup language defines specification of geo-referenced data • SensorML and O&M (Observation and Measurements) define meta-data and data structure for sensors • Services like Web Map Service, Web Feature Service, Sensor Collection Service define services interfaces to access GIS and sensor information • Grid workflow links services that are designed to support streaming input and output messages • We are building Grid (Web) service implementations of these specifications for NASA’s SERVOGrid

  45. Raw to GML via NaradaBrokering • The Scripps Orbit and Permanent Array Center (SOPAC) GPS station network data published in RYO format is converted to ASCII and GML

  46. A Screen Shot From the WMS Client

  47. WMS uses WFS that uses data sources <gml:featureMember> <fault> <name> Northridge2 </name> <segment> Northridge2 </segment> <author> Wald D. J.</author> <gml:lineStringProperty> <gml:LineStringsrsName="null"> <gml:coordinates> -118.72,34.243 -118.591,34.176 </gml:coordinates> </gml:LineString> </gml:lineStringProperty> </fault> </gml:featureMember>

  48. Role of WS-Context • There are many WS-* specifications addressing meta-data and both many approaches and many trade-offs • We heard about Distributed Hash Tables (Chord) to achieve scalability in large scale networks • Managed dynamic workflows as in sensor integration and collaboration require • Fault-tolerance • Ability to support dynamic changes with few millisecond delay • But only a modest number of involved services (up to 1000’s) • We are building a WS-Context compliant metadata catalog supporting distributed or central paradigms • Use for OGC Web catalog service with UDDI for slowly varying meta-data

  49. Controlling Streaming Data • NaradaBrokering capabilities can be created by messages (as in WS-*) and by a scripting interface that allows topics to be created and linked to external services • Firewall traversal algorithms and network link performance data can be accessed • HPSearch offers this via JavaScript • This scripting engine provides a simple workflow environment that is useful for setting up Sensor Grids • Should be made compatible with Web Service workflow (BPEL) and streaming workflow models Triana and Kepler • Also link to WS-Management

More Related