
SERVOGrid and Grids for Real-time and Streaming Applications

Grid School, Vico Equense, July 21 2005. Geoffrey Fox, Computer Science, Informatics, Physics, Pervasive Technology Laboratories, Indiana University, Bloomington IN 47401





  1. SERVOGrid and Grids for Real-time and Streaming Applications Grid School Vico Equense July 21 2005 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401 http://grids.ucs.indiana.edu/ptliupages/presentations/GridSchool2005/ gcf@indiana.edu http://www.infomall.org

  2. Thank you • SERVOGrid and iSERVO are major collaborations • In the USA, the JPL-led project involves UC Davis, UC Irvine, USC, and Indiana University • Australia, China, Japan, and the USA are the current international partners • This talk takes material from talks by • Andrea Donnellan • Marlon Pierce • John Rundle • Thank you!

  3. Grid and Web Service Institutional Hierarchy • 4: Application or Community of Interest Specific Services such as “Run BLAST” or “Look at Houses for sale” (OGSA and other GGF/W3C/… specifications) • 3: Generally Useful Services and Features such as “Access a Database”, “Submit a Job”, “Manage Cluster”, “Support a Portal”, or “Collaborative Visualization” (WS-* from OASIS/W3C/Industry) • 2: System Services and Features: handlers like WS-RM, security, programming models like BPEL, or registries like UDDI • 1: Container and Run Time (Hosting) Environment (Apache Axis, .NET etc.) • We will discuss some items at layer 4 and some at layer 1 (and perhaps 2)

  4. Motivating Challenges • What is the nature of deformation at plate boundaries and what are the implications for earthquake hazards? • How do tectonics and climate interact to shape the Earth’s surface and create natural hazards? • What are the interactions among ice masses, oceans, and the solid Earth and their implications for sea level change? • How do magmatic systems evolve and under what conditions do volcanoes erupt? • What are the dynamics of the mantle and crust and how does the Earth’s surface respond? • What are the dynamics of the Earth’s magnetic field and its interactions with the Earth system? From NASA’s Solid Earth Science Working Group Report, Living on a Restless Planet, Nov. 2002

  5. US Earthquake Hazard Map US Annualized losses from earthquakes are $4.4 B/yr

  6. Characteristics of Solid Earth Science • Widely distributed heterogeneous datasets • Multiplicity of time and spatial scales • Decomposable problems requiring interoperability for full models • Distributed models and expertise Enabled by Grids and Networks

  7. Facilitating Future Missions SERVOGrid develops the necessary infrastructure for future spaceborne missions such as gravity or InSAR (Interferometric Synthetic Aperture Radar) satellites, which can measure land deformation by comparing radar samples taken at different times.

  8. Interferometry Basics [Diagram: single-pass interferometry with two antennas A1, A2 separated by baseline B measures topography (height z); repeat-pass interferometry compares ranges r at time t1 and r + δr at time t2 to measure topographic change Δr]

  9. The Northridge Earthquake was Observed with InSAR The Mountains grew 40 cm as a result of the Northridge earthquake. 1993–1995 Interferogram

  10. Objective Develop a real-time, large-scale, data assimilation grid implementation for the study of earthquakes that will: • Assimilate (i.e., integrate data with models) distributed data sources and complex models into a parallel high-performance earthquake simulation and forecasting system • Real-time sensors (support high performance streams) • Simplify data discovery, access, and usage from the scientific user point of view (using portals) • Support flexible efficient data mining (Web Services)

  11. Data Deluged Science Computing Paradigm [Diagram: Data, Information, and Ideas linked by Assimilation, Simulation/Model, Datamining, and Reasoning, spanning Informatics and Computational Science]

  12. Grid of Grids: Research Grid and Education Grid [Diagram: field trip data, repositories and federated databases, and streaming sensor data feed a Sensor Grid, Database Grid, GIS Grid, and Compute Grid (computer farm); discovery, data filter, customization, and analysis/visualization portal services connect the SERVOGrid Research Grid to the Education Grid, moving results from research to education]

  13. Solid Earth Research Virtual Observatory • Web-services and portal based Problem Solving Environment • Couples data with simulation, pattern recognition software, and visualization software • Enables investigators to seamlessly merge multiple data sets and models, and create new queries • Data • Space-based observational data • Ground-based sensor data (GPS, seismicity) • Simulation data • Published/historical fault measurements • Analysis Software • Earthquake fault • Lithospheric modeling • Pattern recognition software

  14. Component Grids • We build collections of Web Services which we package as component Grids • Visualization Grid • Sensor Grid • Management Grid • Utility Computing Grid • Collaboration Grid • Earthquake Simulation Grid • Control Room Grid • Crisis Management Grid • Intelligence Data-mining Grid • We build bigger Grids by composing component Grids using the Service Internet

  15. Critical Infrastructure (CI) Grids built as Grids of Grids [Diagram: Gas, Flood, and Electricity CI Grids, each with their own services and filters, sit above shared component Grids (Portals, Collaboration Grid, Visualization Grid, Sensor Grid, GIS Grid, Compute Grid), core Grid services (Data Access/Storage, Registry, Metadata, Security, Notification, Workflow, Messaging), and the physical network]

  16. QuakeSim Portal Shots

  17. 1000 Years of Simulated Earthquakes Simulations show clustering of earthquakes in space and time similar to what is observed.

  18. SERVOGrid Apps and Their Data • GeoFEST: Three-dimensional viscoelastic finite element model for calculating nodal displacements and tractions. Allows for realistic fault geometry and characteristics, material properties, and body forces. • Relies upon fault models with geometric and material properties. • Virtual California: Program to simulate interactions between vertical strike-slip faults using an elastic layer over a viscoelastic half-space. • Relies upon fault and fault friction models. • Pattern Informatics: Calculates regions of enhanced probability for future seismic activity based on the seismic record of the region • Uses seismic data archives • RDAHMM: Time series analysis program based on Hidden Markov Modeling. Produces feature vectors and probabilities for transitioning from one class to another. • Used to analyze GPS and seismic catalog archives. • Can be adapted to detect state change events in real time.

  19. Pattern Informatics (PI) • PI is a technique developed by John Rundle at the University of California, Davis for analyzing earthquake seismic records to forecast regions with high future seismic activity. • They have correctly forecast the locations of 15 of the last 16 earthquakes with magnitude > 5.0 in California. • See Tiampo, K. F., Rundle, J. B., McGinnis, S. A., & Klein, W. Pattern dynamics and forecast methods in seismically active regions. Pure Appl. Geophys. 159, 2429-2467 (2002). • http://citebase.eprints.org/cgi-bin/fulltext?format=application/pdf&identifier=oai%3AarXiv.org%3Acond-mat%2F0102032 • PI is being applied to other regions of the world, and has gotten a lot of press. • Google “John Rundle UC Davis Pattern Informatics”

  20. Real-time Earthquake Forecast Seven large events with M ≥ 5 have occurred on anomalies, or within the margin of error: • Big Bear I, M = 5.1, Feb 10, 2001 • Coso, M = 5.1, July 17, 2001 • Anza, M = 5.1, Oct 31, 2001 • Baja, M = 5.7, Feb 22, 2002 • Gilroy, M = 4.9-5.1, May 13, 2002 • Big Bear II, M = 5.4, Feb 22, 2003 • San Simeon, M = 6.5, Dec 22, 2003 • [Plot of Log10 P(x): potential for large earthquakes, M ≥ 5, ~2000 to 2010] • JB Rundle, KF Tiampo, W. Klein, JSS Martins, PNAS, v99, Suppl. 1, 2514-2521, Feb 19, 2002; KF Tiampo, JB Rundle, S. McGinnis, S. Gross and W. Klein, Europhys. Lett., 60, 481-487, 2002

  21. World-Wide Forecast Hotspot Map for Likely Locations of Great Earthquakes M ≥ 7.0 for the Decade 2000-2010 • Green circles: large earthquakes M ≥ 7 from Jan 1, 2000 to Dec 1, 2004 • Blue circles: large earthquakes from December 1, 2004 to present • [Map: world-wide earthquakes, M > 5, 1965-2000; Dec. 26 M ~ 9.0 Northern Sumatra; Dec. 23 M ~ 8.1 Macquarie Island]

  22. Pattern Informatics in a Grid Environment • PI in a Grid environment: • Hotspot forecasts are made using publicly available seismic records • Southern California Earthquake Data Center • Advanced National Seismic System (ANSS) catalogs • Code location is unimportant; it can be a service invoked through remote execution • Results need to be stored, shared, modified • Grid/Web Services can provide these capabilities • Problems: • How do we provide programming interfaces (not just user interfaces) to the above catalogs? • How do we connect remote data sources directly to the PI code? • How do we automate this for the entire planet? • Solutions: • Use GIS services to provide the input data and plot the output data • Web Feature Service for data archives • Web Map Service for generating maps • Use the HPSearch tool to tie together and manage the distributed data sources and code
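The catalog-to-code plumbing described above can be sketched in a few lines. Everything here (the stub catalog, the 1-degree binning, the event threshold) is an illustrative toy, not the actual Pattern Informatics algorithm or the real SCEDC/ANSS interfaces:

```python
from collections import Counter

# Toy sketch of the data flow on this slide, not the PI algorithm itself: a
# Web Feature Service would supply the seismic catalog; here it is stubbed
# with a hypothetical in-memory list of (lat, lon, magnitude) events.
def fetch_catalog():
    """Stand-in for a WFS query against a catalog such as ANSS (fake data)."""
    return [(34.20, -118.50, 4.1), (34.25, -118.55, 4.3),
            (34.21, -118.52, 5.0), (37.80, -122.30, 3.2)]

def hotspot_cells(events, min_events=2):
    """Bin events into 1-degree cells and flag cells whose activity reaches
    min_events -- a crude proxy for a hotspot map, fed by the same plumbing."""
    counts = Counter((int(lat // 1), int(lon // 1)) for lat, lon, _mag in events)
    return {cell for cell, n in counts.items() if n >= min_events}

hotspots = hotspot_cells(fetch_catalog())
```

In the real system the fetch step would be a remote Web Feature Service query and the output would feed the PI code and a Web Map Service plot.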

  23. Japan

  24. GIS and Sensor Grids • OGC has defined a suite of data structures and services to support Geographical Information Systems and sensors • GML (Geography Markup Language) defines a specification for geo-referenced data • SensorML and O&M (Observations and Measurements) define metadata and data structures for sensors • Services like the Web Map Service, Web Feature Service, and Sensor Collection Service define service interfaces to access GIS and sensor information • Grid workflow links services that are designed to support streaming input and output messages • We are building Grid (Web) service implementations of these specifications for NASA’s SERVOGrid

  25. A Screen Shot From the WMS Client

  26. WMS uses WFS that uses data sources
<gml:featureMember>
  <fault>
    <name>Northridge2</name>
    <segment>Northridge2</segment>
    <author>Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>-118.72,34.243 -118.591,34.176</gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
Can add Google or Yahoo Map WMS Web Services
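A standard XML parser can pull the fault geometry out of such a feature. The sketch below re-declares the gml namespace (omitted in the slide's fragment) so the snippet is well-formed:

```python
import xml.etree.ElementTree as ET

# The feature above, with the gml namespace declaration added (the slide's
# snippet omits it) so a standard XML parser accepts it.
GML = """<gml:featureMember xmlns:gml="http://www.opengis.net/gml">
  <fault>
    <name>Northridge2</name>
    <segment>Northridge2</segment>
    <author>Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>-118.72,34.243 -118.591,34.176</gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>"""

NS = {"gml": "http://www.opengis.net/gml"}
root = ET.fromstring(GML)
name = root.findtext("fault/name")
coords_text = root.find(".//gml:coordinates", NS).text
# GML 2 coordinates: tuples separated by spaces, lon/lat separated by commas.
coords = [tuple(map(float, pair.split(","))) for pair in coords_text.split()]
```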

  27. SOPAC GPS Sensor Services • The Scripps Orbit and Permanent Array Center (SOPAC) GPS station network data published in RYO format is converted to ASCII and GML

  28. Position Messages • SOPAC provides 1-2 Hz real-time position messages from various GPS networks in a binary format called RYO. • Position messages are broadcast through RTD server ports. • We have implemented a tool to convert RYO messages into ASCII text and another that converts the ASCII messages into GML.
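The RYO-to-ASCII step can be sketched with a fixed-size binary record. The field layout below (a GPS time plus an x, y, z position) is invented for illustration; the real RYO wire format is richer:

```python
import struct

# Hypothetical fixed-size position record standing in for an RYO message --
# the real RYO format is more elaborate; this only illustrates the
# binary -> ASCII conversion step. Fields: time (uint32 s), x, y, z (doubles).
RECORD = struct.Struct("!Iddd")

def ryo_to_ascii(payload: bytes) -> str:
    """Unpack one binary position record and render it as an ASCII line."""
    t, x, y, z = RECORD.unpack(payload)
    return f"{t} {x:.4f} {y:.4f} {z:.4f}"

msg = RECORD.pack(123456, -2430.1234, 4702.5, 3546.25)
line = ryo_to_ascii(msg)
```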

  29. SOPAC GPS Services [Diagram: GPS streams flow through publish-subscribe queues and stream control to data-mining and archiving Web Services] • We implemented services to provide real-time access to GPS position messages collected from several SOPAC networks. • Data philosophy: post all data before any transformations; post transformed data • Data are streams and not files; they can be archived to files • Then we couple data assimilation tools (such as RDAHMM) to real-time streaming GPS data. • Next steps include a Sensor Collection Service to provide metadata about GPS sensors in SensorML.

  30. Real-Time Access to Position Messages • We have a Forwarder tool that connects to the RTD server port and forwards RYO messages to a NaradaBrokering (NB) topic. • An RYO-to-ASCII converter service subscribes to this topic to collect binary messages and converts them to ASCII; it then publishes the ASCII messages to another NB topic. • An ASCII-to-GML converter service subscribes to that topic and publishes GML messages to a further topic.
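The chained-topics pattern on this slide, where each converter subscribes to one topic and republishes on the next, can be sketched with a minimal in-memory broker. The topic names and converters are illustrative stand-ins for the NaradaBrokering services:

```python
from collections import defaultdict

# Minimal in-memory stand-in for a NaradaBrokering-style topic broker, to show
# the chained-topics pattern (RYO -> ASCII -> GML). Topic names and the toy
# converters are illustrative, not the real SERVOGrid services.
class Broker:
    def __init__(self):
        self.subs = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subs[topic].append(handler)

    def publish(self, topic, msg):
        for handler in self.subs[topic]:
            handler(msg)

broker = Broker()
out = []

# Each converter consumes one topic and republishes on the next.
broker.subscribe("sopac/ryo", lambda m: broker.publish("sopac/ascii", m.decode()))
broker.subscribe("sopac/ascii",
                 lambda m: broker.publish("sopac/gml", f"<gml:pos>{m}</gml:pos>"))
broker.subscribe("sopac/gml", out.append)   # archiver at the end of the chain

broker.publish("sopac/ryo", b"34.2 -118.5")  # the Forwarder pushes a raw message
```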

  31. RDAHMM GPS Signal Analysis (courtesy of Robert Granat, JPL) [Plot: HMM state segmentation of a GPS time series, with state changes at a reservoir drain and an earthquake]

  32. Handling Streams in Web Services • Do not open a socket – hand the message to the messaging system • Use Publish-Subscribe, as the overhead is negligible • The model is totally asynchronous and event based • The messaging system is a distributed set of “SOAP Intermediaries” (message brokers) which manage distributed queues and subscriptions • Streams are ordered sets of messages whose common processing is both necessary and an opportunity for efficiency • Manage messages and streams to ensure reliable delivery, fast replay, transmission through firewalls, multicast, and custom transformations

  33. Different ways of Thinking • Services and Messages – NOT Jobs and Files • Service Internet: Packets replaced by Messages • The BitTorrent view of Files • Files are chunked into messages which are scattered around the Grid • Chunks are re-assembled into contiguous files • Streams replace files by message queues • Queues are labeled by topics • The system might choose to back up queues to disk but you just think of messages on distributed queues • Note the typical time scale to worry about is a millisecond • Schedule stream-based services NOT jobs
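The BitTorrent-style view of files can be sketched as sequence-numbered messages that are reassembled into a contiguous byte string even when they arrive out of order; the chunk size and framing here are illustrative:

```python
# Sketch of "files are chunked into messages": split a byte payload into
# sequence-numbered messages that may arrive out of order, then reassemble
# the contiguous file from the set of chunks.
def chunk(data: bytes, size: int):
    """Split data into (sequence_number, chunk_bytes) messages."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]

def reassemble(messages):
    """Order chunks by sequence number and join them back into one file."""
    return b"".join(part for _seq, part in sorted(messages))

msgs = chunk(b"streams replace files by message queues", 8)
msgs.reverse()                      # simulate out-of-order delivery
restored = reassemble(msgs)
```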

  34. DoD Data Strategy • Only Handle Information Once (OHIO) – Data is posted in a manner that facilitates re-use without the need for replicating source data. Focus on re-use of existing data repositories. • Smart Pull (vice Smart Push) – Applications encourage discovery; users can pull data directly from the net or use value-added discovery services (search agents and other “smart pull” techniques). Focus on data sharing, with data stored in accessible shared space and advertised (tagged) for discovery. • Post at once in Parallel – Process owners make their data available on the net as soon as it is created. Focus on data being tagged and posted before processing (and after processing).

  35. NaradaBrokering [Diagram: brokers manage queues and streams] • NB supports messages and streams • NB’s role for the Grid is similar to MPI’s role for MPPs

  36. Traditional NaradaBrokering Features

  37. Features for July 12 2005 Releases • Production implementations of WS-Eventing, WS-RM and WS-Reliability. • WS-Notification when the specification is agreed • SOAP message support, with NaradaBrokering brokers viewed as SOAP Intermediaries • Active replay support: pause and replay live streams. • Stream linkage: can permanently link multiple streams – used in annotating real-time video streams • Replicated storage support for fault tolerance and resiliency to storage failures. • Management: HPSearch scripting interface to streams and services • Broker discovery: locate appropriate brokers

  38. Mean transit delay for message samples in NaradaBrokering: different communication hops [Plot: transit delay (milliseconds) vs. message payload size (100-1000 bytes) for 2, 3, 5, and 7 hops; delays range roughly from under 1 ms to 9 ms] Testbed: Pentium-3, 1 GHz, 256 MB RAM, 100 Mbps LAN, JRE 1.3, Linux

  39. Consequences of the Rule of the Millisecond • Useful to remember critical time scales • 1) 0.000001 ms – CPU does a calculation • 2a) 0.001 to 0.01 ms – parallel computing MPI latency • 2b) 0.001 to 0.01 ms – overhead of a method call • 3) 1 ms – wake up a thread or process (do simple things on a PC) • 4) 10 to 1000 ms – Internet delay • 2a), 4) imply geographically distributed metacomputing can’t in general compete with parallel systems • 3) << 4) implies a software overlay network is possible without significant overhead • We need to explain why it adds value of course! • 2b) versus 3) and 4) describes the regions where method-based and message-based programming paradigms are important
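Two of these time scales can be measured directly. Absolute numbers depend on the machine, but the ordering, a method call being far cheaper than waking a thread, should hold anywhere:

```python
import queue
import threading
import time

# Rough sketch of two of the timescales above: the overhead of a Python-level
# method call (2b) vs. waking a thread through a queue (3). Absolute numbers
# vary by machine; only the ordering (call << wake-up) is expected to hold.
def noop():
    pass

N = 100_000
t0 = time.perf_counter()
for _ in range(N):
    noop()
call_us = (time.perf_counter() - t0) / N * 1e6   # microseconds per call

q, done = queue.Queue(), queue.Queue()

def worker():
    while q.get() is not None:                   # block until woken
        done.put(time.perf_counter())            # report wake-up time
threading.Thread(target=worker, daemon=True).start()

t0 = time.perf_counter()
q.put(object())
wake_us = (done.get() - t0) * 1e6                # microseconds to wake the thread
q.put(None)                                      # let the worker exit
```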

  40. Possible NaradaBrokering Futures • Support for replicated storage within the system. • In a system with N replicas the scheme can sustain the loss of N-1 replicas. • Clarification and expansion of the NB broker to act as a WS container and SOAP Intermediary • Integration with Axis 2.0 as Message Oriented Middleware infrastructure • Support for high-performance transport and representation for Web Services • Needs the Context catalog under development • Performance-based routing • The broker network will dynamically respond to changes in the network based on metrics gathered at individual broker nodes. • Replicated publishers for fault tolerance • Pure client P2P implementation (originally we linked to JXTA) • Security enhancements for fine-grain topic authorization, multicast keys, and broker attacks

  41. Controlling Streaming Data • NaradaBrokering capabilities can be accessed by messages (as in WS-*) and by a scripting interface that allows topics to be created and linked to external services • Firewall traversal algorithms and network link performance data can be accessed • HPSearch offers this via JavaScript • This scripting engine provides a simple workflow environment that is useful for setting up Sensor Grids • Should be made compatible with Web Service workflow (BPEL) and streaming workflow models such as Triana and Kepler • Also link to WS-Management

  42. NaradaBrokering topics

  43. Role of WS-Context • There are many WS-* specifications addressing meta-data and both many approaches and many trade-offs • There are Distributed Hash Tables (Chord) to achieve scalability in large scale networks • Managed dynamic workflows as in sensor integration and collaboration require • Fault-tolerance • Ability to support dynamic changes with few millisecond delay • But only a modest number of involved services (up to 1000’s) • We are building a WS-Context compliant metadata catalog supporting distributed or central paradigms • Use for OGC Web catalog service with UDDI for slowly varying meta-data
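A WS-Context-style catalog, small fast-changing metadata stored under a session identifier with a lifetime, can be sketched as below. The method names, session URI, and stored value are illustrative, not the actual WS-Context operation signatures:

```python
import time

# Minimal sketch of a WS-Context-style metadata catalog: small, dynamic
# metadata is stored under a session URI with an expiry, in contrast to the
# slowly varying service metadata kept in UDDI. Names here are illustrative,
# not the real WS-Context operations.
class ContextCatalog:
    def __init__(self):
        self._store = {}

    def set_context(self, session, key, value, lifetime_s=60.0):
        """Store a (session, key) entry that expires after lifetime_s seconds."""
        self._store[(session, key)] = (value, time.time() + lifetime_s)

    def get_context(self, session, key):
        """Return the value if present and unexpired, else None."""
        value, expires = self._store.get((session, key), (None, 0.0))
        return value if time.time() < expires else None

cat = ContextCatalog()
cat.set_context("urn:servo:session1", "wms-output", "run42-output.gml")
```

A fault-tolerant implementation would replicate this store across services (centrally or via a distributed scheme) while keeping update latency at a few milliseconds.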

  44. Publish-Subscribe Streaming Workflow: HPSearch • HPSearch is an engine for orchestrating distributed Web Service interactions • It uses an event system and supports both file transfers and data streams. • The name is a legacy • HPSearch flows can be scripted with JavaScript • The HPSearch engine binds the flow to a particular set of remote services and executes the script. • HPSearch engines are Web Services and can be distributed and interoperate for load balancing. • Boss/Worker model • ProxyWebService: a wrapper class that adds notification and streaming support to a Web Service.

  45. HPSearch Orchestration of the PI Code [Diagram: HPSearch engines on TRex and Danube control the Web Services and communicate using the NB messaging infrastructure; the NaradaBrokering network is used by the HPSearch engines as well as for data transfer; the WMS submits a script execution request (URI of script, parameters); the PI Code Runner on Danube accumulates data, runs the PI code, creates graphs, and converts RAW output to GML; data can be stored and retrieved from the third-party repository, the WS-Context service on Tambora; the WFS runs on Gridfarm001 and a Data Filter on Danube; the final output is pulled by the WMS; HPSearch hosts an Axis service for remote deployment of scripts; actual and virtual data flows are distinguished]

  46. SOAP Message Structure I [Diagram: headers H1-H4 processed by filters F1-F4 in the container workflow; the body goes to the service] • A SOAP Message consists of headers and a body • Headers could be for Addressing, WSRM, Security, Eventing etc. • Headers are processed by handlers or filters controlled by the container as a message enters or leaves a service • The body is processed by the service itself • The header processing defines the “Web Service Distributed Operating System” • Containers queue messages, control processing of headers, and offer convenient (for particular languages) service interfaces • Handlers are really the core operating system services as they receive and give back messages like services; they just process and perhaps modify different elements of the SOAP Message
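The container/handler structure can be sketched as a chain in which each handler consumes one header and the service only ever sees the body. The header names and context keys are illustrative:

```python
from dataclasses import dataclass, field

# Sketch of the container/handler structure described above: each handler owns
# one header element (addressing, reliability, ...), processes it as the
# message passes through the container, and the service only sees the body.
@dataclass
class SoapMessage:
    headers: dict = field(default_factory=dict)
    body: str = ""

def addressing_handler(msg, ctx):
    """Consume the (illustrative) WS-Addressing header."""
    ctx["to"] = msg.headers.pop("wsa:To", None)

def reliability_handler(msg, ctx):
    """Consume the (illustrative) reliable-messaging header."""
    ctx["acked"] = msg.headers.pop("wsrm:Sequence", None) is not None

def dispatch(msg, handlers, service):
    ctx = {}
    for h in handlers:          # the container drives the handler chain
        h(msg, ctx)
    return service(msg.body, ctx)

result = dispatch(
    SoapMessage({"wsa:To": "urn:servo:wfs", "wsrm:Sequence": "1"}, "getFault"),
    [addressing_handler, reliability_handler],
    lambda body, ctx: (ctx["to"], ctx["acked"], body),
)
```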

  47. Merging the OSI Levels • All messages pass through multiple operating systems, and each O/S thinks of a message as a header and a body • Important message processing is done at • Network (IP, TCP) • Client (UNIX, Windows, J2ME etc) • Web Service header (SOAP) • Application • EACH is < 1 ms (except for small sensor clients and except for complex security) • But network transmission time is often 100 ms or worse • Thus there is no performance reason not to mix up the places where processing is done

  48. Layered Architecture for Web Services and Grids [Diagram: Application-Specific Grids and Generally Useful Services and Grids sit above Higher-Level Services: Workflow (WSFL/BPEL), Service Management (“Context etc.”), Service Discovery (UDDI) / Information; then Service Interfaces (WSDL) and the Service Internet with its Base Hosting Environment and Service Context; below runs the bit-level Internet OSI stack: Protocol (HTTP, FTP, DNS, …), Presentation (XDR, …), Session (SSH, …), Transport (TCP, UDP, …), Network (IP, …), Data Link / Physical]

  49. WS-* implies the Service Internet • We have the classic (Cisco, Juniper, …) Internet routing the flood of ordinary packets in the OSI stack architecture • Web Services build the “Service Internet” or IOI (Internet on Internet) with • Routing via WS-Addressing, not the IP header • Fault tolerance (WS-RM, not TCP) • Security (WS-Security/SecureConversation, not IPSec/SSL) • Data transmission by WS-Transfer, not HTTP • Information Services (UDDI/WS-Context, not DNS/configuration files) • At the message/Web Service level and not the packet/IP address level • A software-based Service Internet is possible as computers are “fast” • Familiar from peer-to-peer networks, and built as a software overlay network defining the Grid (the analogy is a VPN) • The SOAP Header contains all information needed for the “Service Internet” (Grid Operating System), with the SOAP Body containing information for the Grid application service
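Routing by WS-Addressing rather than the IP header can be sketched as an overlay router that dispatches on the logical wsa:To endpoint, independent of which transport link carried the message. The endpoint names are made up for illustration:

```python
# Sketch of "routing via WS-Addressing, not the IP header": an overlay broker
# forwards a message by its logical wsa:To endpoint, resolved through a
# registry, regardless of the transport it arrived on. Endpoint names and the
# message layout are illustrative.
class OverlayRouter:
    def __init__(self):
        self.registry = {}      # logical endpoint -> delivery callable

    def register(self, endpoint, deliver):
        self.registry[endpoint] = deliver

    def route(self, message):
        # The routing decision uses the SOAP header, not a network address.
        to = message["headers"]["wsa:To"]
        if to not in self.registry:
            raise KeyError(f"no route to {to}")
        return self.registry[to](message["body"])

router = OverlayRouter()
router.register("urn:servo:pi-code", lambda body: f"PI ran on {body}")
reply = router.route({"headers": {"wsa:To": "urn:servo:pi-code"},
                      "body": "catalog-2005"})
```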
