QoS management prototype in WS-DIAMOND

LAAS-CNRS, Paris meeting, 11-12 October 2007

Outline
• The constituting modules:
  • Summary
  • Details:
    • Monitoring
    • Measurement
    • Diagnosis & Planning
    • Repair
• Relaxing assumptions:
  • Summary
  • Details:
    • Local vs global properties
    • QoS Manager deployment
• Per module: provided functions, internal architecture, associated approach

Summary (1/2): Characteristics
• QoS degradations are the handled faults.
• Their detection is based on statistical analysis of logged QoS parameters.
• Two models are used: chronicles and Markovian models.
• Monitoring & repair are based on:
  • Intercepting communication messages between providers and requesters,
  • Meta-level communication that extends message headers with metadata describing QoS information.
• Repair by reconfiguration:
  • Is based on class-level substitution and duplication,
  • Acts by re-routing requests to different services.

Summary (2/2): Two possible levels for implementation
• Level 1: Management at the SOAP level
  • High-level programming based on:
    • Container-provided handlers for interception and standard XML parsing libraries
    • Reflection libraries for Java
  • Reduced access to information: SOAP envelope
    • No information about IP addresses (requester, provider)
  • Needs manageable WS
  • Appropriate for stateless WS, but needs more assumptions and cooperation to manage stateful WS, asynchronous interaction and global repair
• Level 2: Management at the HTTP level
  • Augmented access to routing information:
    • May handle sessions and asynchronous interactions; useful for stateful services
  • Low-level, socket-based programming
    • HTTP proxies, handling of HTTP messages (including the SOAP part)

Monitoring (1/1): The implemented module
• Provided functions: computes and stores the observed QoS parameter values.
• Internal architecture and implementation: two interceptors and a log database:
  • Requester-side interceptor:
    • Implemented as a handler of the web service requester
    • Computes, from the requester side, the QoS values involved in the diagnosis process
    • Stores these values in the log database
  • Provider-side interceptor:
    • Implemented as a handler of the web service container
    • Computes, from the provider side, the QoS values involved in the diagnosis process
    • Stores these values in the log database
  • The log database
• Implementation technologies:
  • Axis handlers for the interceptors
  • MySQL 5 for the log database

Measurement (1/5): The approach
• Main goal of measurement: detection of QoS degradation
• Approach based on identifying temporal chronicles as QoS degradation symptoms
• Some temporal chronicles experimented with:
  • Example 1: N “consecutive” Texec values greater than the average time:
    • Texec_i > (AVGTexec + delay)
  • Example 2: N “consecutive” accelerations of Texec_i (without deceleration in between):
    • Texec_N > Texec_(N-1) > … > Texec_2 > Texec_1
  • Example 3: Texec increases abruptly:
    • Texec_2 >> Texec_1
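Examples 2 and 3 can be sketched as two small predicates over the most recent Texec samples. This is only an illustration, not the prototype's code: the function names are hypothetical, and since the slides write only ">>" for an abrupt increase, the `factor` parameter (default 2.0) is an assumed stand-in for it.

```python
from typing import Sequence


def consecutive_accelerations(texec: Sequence[float], n: int) -> bool:
    """Example 2: the last n samples are strictly increasing."""
    if n < 2 or len(texec) < n:
        return False
    window = texec[-n:]
    return all(a < b for a, b in zip(window, window[1:]))


def abrupt_increase(texec: Sequence[float], factor: float = 2.0) -> bool:
    """Example 3: the latest sample is more than `factor` times the previous one.

    `factor` is an assumption; the slides only state Texec_2 >> Texec_1.
    """
    if len(texec) < 2:
        return False
    previous, latest = texec[-2], texec[-1]
    return latest > factor * previous
```

A single deceleration inside the window is enough to reject example 2, which matches the "without deceleration" condition on the slide.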

Measurement (2/5): Chronicles examples (1/3)
Pre-computed average (AVG should be deduced from large-scale experiments; AVG = constant)
[Figure: monitored Texec over time against the constant levels AVG and AVG+delay. One case shows Chronicle1 triggered with N=3 (t1, t2, t3 > AVG+delay), the other shows Chronicle1 not triggered with N=3. Legend: acceptable values, monitored Texec.]

Measurement (3/5): Chronicles examples (2/3)
On-the-fly computed average (AVG should be deduced from current QoS parameter values)
[Figure: monitored Texec over time against the AVG and AVG+delay levels computed in real time. One case shows Chronicle1 triggered with N=3 (t1, t2, t3 > AVG+delay), the other shows Chronicle1 not triggered with N=3. Legend: acceptable values, monitored Texec, real-time computed AVG value, real-time computed AVG+delay value.]
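A minimal sketch of the on-the-fly variant follows. The slides do not say how the running AVG is computed, so the sketch assumes a cumulative mean over all earlier samples, and compares each new sample with the AVG+delay value known before that sample arrived. The class and method names are hypothetical.

```python
class RunningThreshold:
    """Tracks a cumulative mean of Texec and tests samples against AVG+delay."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.count = 0
        self.mean = 0.0

    def exceeds(self, texec: float) -> bool:
        """Test the sample against the current AVG+delay, then update AVG."""
        # With no history there is no average yet, so nothing is flagged.
        violated = self.count > 0 and texec > self.mean + self.delay
        self.count += 1
        # Incremental mean update: avoids storing the whole history.
        self.mean += (texec - self.mean) / self.count
        return violated
```

Updating the average after the comparison keeps a degraded sample from raising its own threshold; a sliding window or an exponentially weighted mean would be equally plausible choices.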

Measurement (4/5): Chronicles examples (3/3)
• Chronicle associated with example 1:
  • Messages:
    • message TexecViolation
    • message TexecOK
  • Triggering time condition:
    • TexecViolation: Texec > (AVGTexec + delay)
    • TexecOK: Texec <= (AVGTexec + delay)
  • Events:
    • Event(TexecViolation, t0)
    • Event(TexecViolation, t1)
    • Event(TexecViolation, t2)
  • No event of type TexecOK between t0 and t2:
    • NoEvent(TexecOK, (t0, t2))
  • Temporal constraints between instants:
    • t0 < t1 < t2
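The chronicle above can be illustrated with a short recognizer. It is a sketch under assumptions, not the measurement module itself: samples are taken to be (timestamp, Texec) pairs, AVGTexec and delay are passed in as constants, and the helper names `to_events` and `recognize_chronicle` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Event = Tuple[str, float]  # (message type, timestamp)


def to_events(samples: Iterable[Tuple[float, float]],
              avg_texec: float, delay: float) -> List[Event]:
    """Map (timestamp, Texec) samples to TexecViolation / TexecOK events."""
    events = []
    for t, texec in samples:
        kind = "TexecViolation" if texec > avg_texec + delay else "TexecOK"
        events.append((kind, t))
    return events


def recognize_chronicle(events: Iterable[Event],
                        n: int = 3) -> Optional[Tuple[float, ...]]:
    """Return the instants (t0, ..., t_{n-1}) of the first match, or None."""
    run: List[float] = []
    last_t = None
    for kind, t in events:
        # Enforce the temporal constraint t0 < t1 < t2 on the input stream.
        if last_t is not None and t <= last_t:
            raise ValueError("events must be strictly ordered in time")
        last_t = t
        if kind == "TexecViolation":
            run.append(t)
            if len(run) == n:
                return tuple(run)
        else:
            # A TexecOK event breaks the run: NoEvent(TexecOK, (t0, t2)).
            run = []
    return None
```

With n=3 this corresponds to the three TexecViolation events at t0, t1 and t2 on the slide; any TexecOK in between resets the search.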

Measurement (5/5): The implemented module
• Provided functions:
  • Extracts the data logged by the monitoring module in the log database
  • Detects misbehaviors (QoS degradation)
    • Approach based on statistical functions and detection of elementary temporal chronicles
  • Sends alarms to the diagnosis and repair planning module
• Internal architecture and implementation: a single web service

Diagnosis & planning (1/2): The approaches
• Focuses on a local diagnosis related to a single service
• Main goal: identify the source of the QoS degradation
  • Discriminating between faults at the communication and processing levels
• Approach based on reasoning about execution and response times:
  • Degradation(response time) & not Degradation(execution time) ==> communication-level problem
  • Degradation(response time) & Degradation(execution time) ==> processing-level problem
• Considered plans:
  • Communication problem: service substitution
  • Processing problem: service duplication
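The two rules and their associated plans can be written down as a small decision function. The names `Fault`, `PLAN` and `diagnose` are hypothetical; the case where the response time is not degraded is not covered by the slide, so the sketch returns no diagnosis for it.

```python
from enum import Enum
from typing import Optional


class Fault(Enum):
    COMMUNICATION = "communication"
    PROCESSING = "processing"


# Repair plan associated with each fault level, as listed on the slide.
PLAN = {
    Fault.COMMUNICATION: "substitution",
    Fault.PROCESSING: "duplication",
}


def diagnose(response_degraded: bool, execution_degraded: bool) -> Optional[Fault]:
    """Apply the two rules on response and execution time degradation."""
    if not response_degraded:
        return None  # not covered by the slide's rules (assumption: no alarm)
    return Fault.PROCESSING if execution_degraded else Fault.COMMUNICATION
```

For example, `PLAN[diagnose(True, False)]` yields `"substitution"`: the response time is degraded while the execution time is not, which points at the communication level.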

Diagnosis & planning (2/2): The implemented module
• Provided functions:
  • Performs a diagnosis by reasoning about the alarms received from the measurement module
  • Generates the repair plan to be performed by the reconfiguration execution module
• Internal architecture and implementation: a single web service

Repair (1/2): The approach
• Main goal: eliminate the source of the detected QoS degradation
• Approach based on architectural reconfiguration targeting class-level repair
• Considered elementary reconfiguration actions:
  • Service substitution
  • Service duplication

Repair (2/2): The implemented module
• Provided functions:
  • Performs architectural reconfiguration using dynamically bound connectors between requesters and providers
  • Service substitution based on a 1-to-1 equivalence specified in a predefined list
• Internal architecture and implementation:
  • Two web services:
    • One for the automated dynamic code generation of the new connector
    • The second redeploys the new connector
• Implementation technologies:
  • Java Reflection, Java runtime compilation, WSDL runtime compilation, XML parsing
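The idea of substitution through a rebindable connector and a predefined 1-to-1 equivalence list can be illustrated as follows. This does not reproduce the prototype's generated Java connectors; the `Connector` class and the example URLs are hypothetical and only show the re-routing principle.

```python
from typing import Dict


class Connector:
    """Holds the endpoint a requester is currently bound to."""

    def __init__(self, endpoint: str, equivalents: Dict[str, str]) -> None:
        self.endpoint = endpoint
        # Predefined 1-to-1 equivalence list: service -> equivalent service.
        self._equivalents = dict(equivalents)

    def substitute(self) -> str:
        """Rebind to the equivalent service; raises KeyError if none is listed."""
        self.endpoint = self._equivalents[self.endpoint]
        return self.endpoint


# Hypothetical endpoints, for illustration only.
connector = Connector(
    "http://example.org/ws1",
    {"http://example.org/ws1": "http://example.org/ws1-equivalent"},
)
connector.substitute()
```

After `substitute()`, later requests sent through the connector go to the equivalent service, while the requester code itself is left unchanged.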

Validation & experiments
• First validation:
  • Prototype acting on a simple web service implementing arithmetic functions.
  • Delays are added in the web service itself to trigger QoS degradation.
• Current validation: FoodShop
  • Windows version provided by POLIMI
  • The FoodShop simulator has been ported to Linux and deployed successfully on Grid5000 with provider-side interceptors and connectors
  • Invocations are currently possible through the SOAP User Interface
  • A non-interactive Java client is required to deal with QoS monitoring and measurement at the requester side
  • Next: the POLIMI Fault Injector may be used at the SOAP level to trigger QoS degradation
• Experiments:
  • Will be conducted on Grid5000 with hundreds of requests, to extract significant chronicles for the FoodShop example under critical utilization conditions.
  • We will rely on first developments and experiments with another toy example of collaborative activities.

Relaxing assumptions on local and global properties of monitoring, measurement, diagnosis & planning and repair
WS-Diamond review, LAAS-CNRS, September 2007

Summary for relaxing assumptions for QoS
• All kinds of services
  • Black-box assumptions:
    • WSDL is known
    • Global knowledge of the interdependencies between services
    • Messages are intercepted, extended, rerouted
  • Deployment assumptions
  • Complex services
  • Class-level, global diagnosis & repair with formal models (chronicles, Markov)
  • Experiments:
    • FoodShop application
    • HTTP & SOAP level handling
    • Implementing integration with process-level functional self-healing modules
• Only non-orchestrated services
  • Black-box assumptions:
    • Only WSDL is known
    • Messages are intercepted, extended, rerouted
  • Deployment assumptions
  • Simple services: N requesters, 1 provider
  • Class-level, local diagnosis & repair with a simple model
  • Experiments:
    • Scalability on Grid5000
    • Simple application
    • SOAP level handling
    • Study of integration with process-level functional self-healing modules

Architecture (1/4):
• Monitoring
  • Distributed & local
  • Storage: one DB per provider and all its requesters
  • Next:
    • Distributed & global
    • Asynchronous (one-way)
[Figure: local monitoring vs global monitoring, showing requesters Req1 and Req2, services WS1 and WS2, interceptors (I), QoS Manager1 and QoS Manager2, databases BD, BD1 and BD2, and timestamps T1, T2, T1’, T2’.]

Architecture (2/4):
• Measurement
  • Distributed & local
  • Next:
    • Distributed & global
    • Asynchronous (one-way)
[Figure: local measurement vs global measurement, showing requesters Req1 and Req2, services WS1 and WS2, interceptors (I), Measurement, Measurement1 and Measurement2, QoS Manager1 and QoS Manager2, databases BD, BD1 and BD2, and timestamps T1, T2, T1’, T2’.]
• Global measurement:
  • We consider QoS parameters involving several requests/responses in the same workflow:
    • Response time = T2’ - T1, execution time = T1’ - T2, etc.
• Local measurement:
  • We consider only QoS parameters related to a single request/response
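The two formulas can be made concrete with a small record type. The slides do not define the four timestamps in text, so the meanings given in the comments below are assumptions, as is the `Exchange` name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    t1: float        # T1: request leaves the requester side (assumed meaning)
    t2: float        # T2: request reaches the provider side (assumed meaning)
    t1_prime: float  # T1': response leaves the provider side (assumed meaning)
    t2_prime: float  # T2': response reaches the requester side (assumed meaning)

    @property
    def response_time(self) -> float:
        """Response time = T2' - T1, as on the slide."""
        return self.t2_prime - self.t1

    @property
    def execution_time(self) -> float:
        """Execution time = T1' - T2, as on the slide."""
        return self.t1_prime - self.t2


exchange = Exchange(t1=0, t2=2, t1_prime=7, t2_prime=10)
```

Here `exchange.response_time` is 10 and `exchange.execution_time` is 5. Under the assumed timestamp meanings, the difference between the two would be the time spent in communication.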

Architecture (3/4):
• Diagnosis & Repair Planning
  • Distributed & local
  • Next:
    • Distributed & global
• Why global diagnosis?
  • Example: interlocked web services
[Figure: Req1, WS1 and WS2 with response times Tresp1 and Tresp2, and the corresponding sequence diagram.]
  • (Tresp1 >> Tresp1Avg) && (Tresp2 >> Tresp2Avg) ==> error detection
  • If Diag_Local(WS1) ==> WS1 QoS degradation
    • Substitute(WS1, WS1’) [WS1’ equivalent to WS1]
  • If Diag_Global(WS1, WS2) ==> WS2 QoS degradation
    • Local repair: Substitute(WS1, WS1”) [where WS1” Not_Connected_to WS2]
    • Global repair: Substitute(WS2, WS2’) [where WS2’ equivalent to WS2]

Architecture (4/4):
• Repair Enforcement
  • Distributed & local (1 ReconfMgr per provider and all its requesters)
  • Next:
    • Global: enable coordination between reconfiguration actions in order to avoid looping and to let simultaneous reconfiguration processes cooperate.
    • Replace (ws1 U ws2) by (ws3 U ws4) where
      • (ws1 U ws2) == (ws3 U ws4)
      • ws1 != ws3 and ws2 != ws4
[Figure: Req1 with services WS1 and WS2, and Req1 with services WS3 and WS4.]

Possible applications of QoS management to the FoodShop
[Figure: FoodShop architecture with the Shop, Sup1, Sup2 and WH BPEL processes, their local web services (Local Shop WS, Local Sup1 WS, Local Sup2 WS, Local WH WS), SH-BPEL, the Client, and QoS Manager1 to QoS Manager4.]
• The report “Integration and consistency problems” is being completed.

Assumption on QoS Manager deployment

Assumption on QoS Manager deployment (1/4)
• First case: a pair of WS-Requester (WS-R) and WS-Provider (WS-P) without BPEL on either side (the requester may be a simple client)
• Deployment possibilities:
  • Modifying the requester code to send requests to the FS (requires access to the requester code), OR
  • Placing the FS at the URL of the initial WS-P (requires access privileges to the provider-side deployment directory)
[Figure: WS-R and WS-P.]

Assumption on QoS Manager deployment (2/4)
• Second case: with BPEL on the left side (requester side)
  • Communication-related errors are lost: no requester-side interception for standard BPEL.
  • May be possible for SH-BPEL, which may provide requester-side QoS values.
• Deployment possibilities:
  • Modifying the BPEL code to send requests to the FS (requires access to the BPEL code), OR
  • Placing the FS at the URL of the initial WS-P (requires access privileges to the provider-side deployment directory)
[Figure: BPEL-R and WS-P.]

Assumption on QoS Manager deployment (3/4)
• Third case: with BPEL on the right side (provider side)
  • Monitoring is OK.
  • Repair by class-level substitution is OK:
    • We can maintain the session information, which is saved in the request SOAP header (RelatesTo, MessageID, etc.)
• Deployment possibilities:
  • Modifying the WS-R code to send requests to the FS (requires access to the WS-R code)
[Figure: WS-R and BPEL-P.]

Assumption on QoS Manager deployment (4/4)
• Fourth case: with BPEL on both sides
  • Communication-related errors are lost: no requester-side interception for standard BPEL.
  • May be possible for SH-BPEL, which may provide requester-side QoS values.
  • Repair by class-level substitution is OK:
    • We can maintain the session information, which is saved in the request SOAP header (RelatesTo, MessageID, etc.)
• Deployment possibilities:
  • Modifying the BPEL code to send requests to the FS (requires access to the BPEL code)
[Figure: BPEL-R and BPEL-P.]
