
  1. Software Reliability for Web Services Utku ÖZBEK 2006703363

  2. Outline • Introduction • Group Testing For Reliability • Application of this Reliability Model • Weather services application • Results of the application • ASTRAR Group Testing • Application of this Group Testing Model • A real-time stock-buy-sell web service application • Results of the application • Conclusion • References • Question & Answer

  3. Introduction • Software development is shifting • from the product-oriented paradigm • to the service-oriented paradigm • Web Services (WS) • services that are offered through Web and Internet technology • Examples: a tax return service, a stock ranking service, and an equation-solving service • can be offered by many service providers, based on the same theories but different implementations • this raises a trustworthiness and dependability problem

  4. Introduction • Web Services (WS) • Under SOA and WS, a system consists of a collection of loosely coupled services • These services can make use of each other's services to achieve their own desired goals and end results • Simple services can cooperate in this way to form a complex or composite service dynamically and at runtime

  5. Introduction • History of WS testing: • In phase one, WS are essentially tested like ordinary software • In phase two (2003-2005), the following are included in testing: • the publishing, finding, and binding capabilities of an SOA (Service-Oriented Architecture) • the asynchronous capabilities of WS • the SOAP (Simple Object Access Protocol) intermediary capability • the quality of services • In phase three (2004 and beyond), the following are included in testing: • dynamic runtime capabilities, • WS versioning, and WS orchestration testing, which invokes remote WS in a specific order to test their interoperability.

  6. Introduction • History of WS testing: • Both clients and service providers must be involved in WS testing • Several issues must be addressed during WS development, including: • Security • Interoperability • UDDI (Universal Description, Discovery, and Integration) registration • Performance considerations

  7. Introduction • This presentation proposes • a Service-Oriented software Reliability Model (SORM) • This model evaluates the reliability of WS in two steps: • Use highly efficient group testing to evaluate the reliability of atomic services • Evaluate the reliability of a composite service based on the reliabilities of its component services • a technique to test a large number of WS simultaneously • to determine the oracle and correctness of the WS under test by majority voting • to provide quality rankings of WS and of the test cases

  8. Group Testing For Reliability • WebStrar • Web Services Testing • Reliability Assessment • Ranking services • Directory services

  9. Group Testing For Reliability • WebStrar accepts registrations from service providers and various kinds of service brokers • It treats the registered services as atomic services and uses them to compose composite services • An atomic service is a service agent submitted by a service provider that does not call other WS and thus should be treated as a unit that is not to be broken, like an atom • A composite service is a service agent submitted by a service provider that uses (calls) other WS • Both atomic and composite services can be provided by WebStrar directly to the clients

  10. Group Testing For Reliability • The group testing technique was originally developed for testing large samples of blood • It is used here to test complex composite WS at runtime • It tests an entire group of services for contamination by applying a single test.

  11. Group Testing For Reliability • Assume CSn is a composite service consisting of n services S1, S2, ..., Sn, where Si can be an atomic service • Assume services S11, S12, ..., S1m are functionally equivalent to the service S1 in CSn • We can forward (broadcast) the input of S1 to S11, S12, ..., S1m • The results from all services, including that from S1, are voted on by a voting service • The voting is weighted based on the current reliabilities of the services under test • The voting service can set the initial weight of each incoming service to zero, while the existing service S1's weight is set to its reliability R(S1) • The voting service detects faults by comparing the output of each service with the weighted-majority output. A disagreement indicates a fault.
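
A minimal sketch of this weighted-voting step, assuming service outputs are directly comparable for equality; the function name and data layout are illustrative, not taken from the paper:

```python
from collections import defaultdict

def weighted_majority(outputs, weights):
    """Pick the output value with the largest total reliability weight.

    outputs: {service_id: output_value}
    weights: {service_id: current reliability}; per the slides, newly
    arrived services may carry weight 0.0 until they earn reliability.
    Returns (majority_output, disagreeing_service_ids).
    """
    totals = defaultdict(float)
    for sid, out in outputs.items():
        totals[out] += weights.get(sid, 0.0)
    majority = max(totals, key=totals.get)
    # A disagreement with the weighted majority indicates a fault.
    disagreeing = [sid for sid, out in outputs.items() if out != majority]
    return majority, disagreeing
```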

  12. Group Testing For Reliability • The reliability of the services is calculated using the formula: R(S, t + ∆t) = (M × R(S, t) + (k − f)) / (M + k) • Where: • R(S, t) is the reliability of service S at time point t • in the next ∆t time, k runs are executed and f disagreements have been detected • M is the total number of tests that the service has ever undergone before this window
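
The slide's formula image is missing from this transcript, so the expression above is reconstructed from the variable definitions. A small sketch of the dynamic update under that reconstruction:

```python
def update_reliability(r_prev, M, k, f):
    """Blend the historical reliability with the latest test window.

    r_prev: reliability R(S, t) before the window
    M: total number of tests run before this window
    k: runs executed in the window; f: disagreements (failures) detected
    """
    return (M * r_prev + (k - f)) / (M + k)

# e.g. a service with R = 0.90 over 100 past tests that fails 2 of 10
# new runs: update_reliability(0.90, 100, 10, 2) -> 98 / 110 ≈ 0.891
```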

  13. Group Testing For Reliability • The advantages of the model include: • One of the toughest problems in software testing is to construct an oracle that can determine whether a fault has occurred. In this model, the voting service serves as the oracle according to the majority principle. • The model estimates the reliability of each incoming service while performing its normal operation. In other words, the incoming services are tested in the real operational environment at no extra time cost if sufficient computing power is available. • The model is dynamic, i.e., the data are collected and computed at runtime in real time. The reliability of each service involved in group testing is updated after each run or after a given period of time.

  14. Group Testing For Reliability • One situation in which SORM would not work well is: • when there are no alternative services available • In this case, the SOA is basically degraded to the traditional software architecture: • The service is only tested by the service provider in its development cycle. • However, this is an unlikely situation because SOA is an open platform that allows and encourages cooperation and competition among service providers to create increasingly improved services.

  15. Application of Reliability Models • Examples to illustrate the applications of the proposed service-oriented reliability model: • Assume a space agency plans to launch a satellite on a specific date and from a specific location • The launch depends heavily on the weather conditions at the launch location, including rain, wind, and temperature • They designed 10 independent weather services, each of which offers three component services: • RainForecast, TempForecast, and WindForecast • The forecasts are given as probabilities

  16. Evaluation of Component Services • To build trust in the reliability of the component services, the space agency puts them in a group testing framework and sets their initial reliability to zero • After a period of group testing, the space agency has a reliability estimate for each service. Table 2 shows a set of sample results obtained in their experiments.

  17. Evaluation of Component Services • The first column of the table lists the component services under test. • The second column shows the highest reliability of each service in the given test period. • Column 3 shows the forecast probabilities of heavy rain, extreme temperature, and strong wind, respectively, from the component services. • Column 4 shows the adjusted forecast probabilities, which take the reliability of the service into account; these are the final evaluation values for the component services
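
For instance, assuming the adjustment in Column 4 is multiplicative (the transcript does not reproduce Table 2 or its formula), a RainForecast service with reliability 0.95 that forecasts a 70% chance of heavy rain would yield an adjusted probability of roughly 0.95 × 0.70 ≈ 0.67.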

  18. Evaluation of Composite Services • To base the decision of whether to change the launch date on the most accurate weather-forecasting information, the space agency then constructed a composite service, as shown in Figure 3. • The decision is based on these two factors: • The numbers in the diamond boxes are the reliabilities of the best component services • The numbers on the branches are the probabilities forecast by the best service

  19. Evaluation of Composite Services

  20. Evaluation of Composite Services • Assume that the launch is planned a year before the launch date. The composite service is up and running from day one. • At the beginning, the space agency has little data about the reliability of each service, and a weather forecast made a year before the launch date won't be accurate either. • However, by the time of the launch, say a month or a week before it, sufficient data about the reliability of the services is already available. • These reliability data will be used in future applications as well: when the agency plans its next launch, or another event that needs weather forecasts, it already has the reliability data.

  21. Results of the application • Design Of Experiments (DOE) is an engineering technique that can be used to determine the extent of the impact of the parameters (factors) of a model on the final results. • DOE is applied here to analyze the impact of the reliability of the component services on the reliability of the composite service • There are three factors in this example, the reliabilities of: • RainForecast • TempForecast • WindForecast • A 2-level DOE technique is used, i.e., high and low values of each factor: RainForecast (70%, 90%), TempForecast (90%, 99%), and WindForecast (85%, 95%). • The 3-factor, 2-level design generated an ANOVA (ANalysis Of VAriance) table
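
As an illustration of how a 2-level, 3-factor design enumerates its 2³ = 8 runs, here is a minimal sketch using the factor levels quoted above; the layout is illustrative, and a real DOE tool would also fit the model and compute the ANOVA:

```python
from itertools import product

# Low/high levels for each factor, as quoted in the experiment.
factors = {
    "RainForecast": (0.70, 0.90),
    "TempForecast": (0.90, 0.99),
    "WindForecast": (0.85, 0.95),
}

# Full factorial: every combination of low/high levels, 2^3 = 8 runs.
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]
for i, run in enumerate(runs, 1):
    print(f"run {i}: {run}")
```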

  22. Results of the application • The F-Value represents the significance of the impact of a model and its components. • In general, if a component generates a significance value “Prob > F-Value” of less than 0.05, the impact of the component is significant

  23. Results of the application • The experiment results in Table 3 also show that the significance values (Prob > F) of RainForecast, TempForecast, and WindForecast are all less than 0.0001, and thus they are all significant model components

  24. Results of the application • The higher the component reliability, the higher the overall reliability • The impact of the RainForecast service is much more significant than that of the others • The space agency should therefore pay more attention to the quality of the rain-forecast service providers

  25. Results of the application • The evaluation process is dynamic and runs at runtime. • The vast number of WS available online makes it necessary to perform group testing, which, in turn, makes it possible to identify the correct service output • without having to design an oracle

  26. ASTRAR Group Testing • A technique to test a large number of WS simultaneously, to determine the oracle and the correctness of the WS under test by majority voting, and to provide quality rankings of WS and the test cases. • It can be used by WS providers, brokers, and clients. A WS provider or client can use the technique to find the best WS for composing new services or applications. • For example, a WS provider can compose a digital-imaging service using a Fast Fourier Transform service as a component service. • A WS broker can use the technique to evaluate the quality of WS applying for registration, to make sure only WS of reasonable quality will be offered to the public.

  27. ASTRAR Group Testing • These techniques are used to rank different WS implementations based on the same specification, the same business logic, and the same input and internal states. • In other words, the WS under group testing should produce the same or close results if the same inputs are applied, • e.g., various Fast Fourier Transformation WS should produce the same or close results based on the same input

  28. ASTRAR Group Testing • The technique proposed here has the following advantages: • It can test a large number of WS rapidly and rank them according to the test results • It can automatically create the oracle for test cases, • i.e., the expected outputs for the given inputs • It can rank the effectiveness of test cases and thus apply the most effective test cases first to eliminate unacceptable WS quickly • Most of the steps in the process can be completely automated, a feature that makes the process attractive for commercial applications.

  29. ASTRAR Group Testing • A group testing technique, originally developed for testing a large number of blood samples and later used for software regression testing, is an attractive solution to address the following situation: • The Service-Oriented Architecture (SOA) based WS broker allows WS developers and providers to freely register WS and to compose complex WS from other WS dynamically • As a result, for each WS specification, many alternative implementations may be available.

  30. ASTRAR Group Testing • ASTRAR can test a large number of WS at both the unit and integration levels. • At each level, the testing process has two phases: • Phase 1: Training Phase • Phase 2: Volume Testing Phase

  31. Phase 1: Training Phase • The process assumes that a reasonably large number of test inputs or test cases are available to test the concerned WS before the start of this phase.
  1) Select a subset of WS randomly from the set of all WS to be tested. The size of the subset will be experimentally decided.
  2) Group testing: Apply each test case in the given set of test cases to test all the WS in the selected subset.
  3) Voting: For each test input, the outputs from the WS under test are voted by a stochastic voting mechanism based on majority and deviation voting principles.
  4) Failure detection and reliability computation: Compare the majority output with the individual output. A disagreement indicates a component failure. A dynamic reliability model is used to compute the reliability of each WS based on the failure rate and other factors.

  32. Phase 1: Training Phase
  5) Oracle establishment: If a clear majority output is found, the output is used to form the oracle of the test case that generated it. A confidence level is defined based on the extent of the majority. The confidence level will also be dynamically adjusted in phase 2.
  6) Test case ranking: Test cases are ranked according to their fault-detection capacity, which is proportional to the number of failures the test cases detect. In phase 2, the higher-ranked test cases are applied first to eliminate the WS that fail the test.
  7) WS ranking: The stochastic voting mechanism will not only find a majority output, but also rank the WS under group testing according to their average deviation from the majority output. (A sketch of steps 5 and 6 follows below.)
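
A minimal sketch of steps 5 and 6, simplified to plain majority counting (ASTRAR's stochastic voting with deviation is richer); all names here are illustrative, not from the paper:

```python
def establish_oracle(outputs):
    """Step 5: derive an oracle and its confidence from voted outputs.

    outputs: list of output values from the WS under group testing.
    Returns (oracle_value, confidence), where confidence is the
    fraction of services that agreed with the majority output.
    """
    counts = {}
    for out in outputs:
        counts[out] = counts.get(out, 0) + 1
    oracle = max(counts, key=counts.get)
    return oracle, counts[oracle] / len(outputs)

def rank_test_cases(failures_detected):
    """Step 6: rank test cases by fault-detection capacity.

    failures_detected: {test_case_id: number of failures it exposed}
    Returns test case ids, most effective first.
    """
    return sorted(failures_detected, key=failures_detected.get, reverse=True)
```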

  33. Phase 1: Training Phase • By the end of the training phase, the selected sample WS have been tested, and: • the test cases are ranked by their capability so far in detecting failures • the oracles for the test cases are established, together with their confidence levels • the sample WS are ranked

  34. Phase 2: Volume Testing Phase • This phase continues to test the remaining WS and any newly arrived WS • based on the profiles and history (test case effectiveness, oracle, and WS ranking) obtained in the training phase. • Phase 2 continues to rank the WS, rank the test cases, and update the oracles.
  1) Test cases have been ranked by their capabilities in detecting failures/faults in Phase 1. Now they are divided into layers, with layer one having the highest capability.
  2) Select the layer-one test cases and apply them in the next step.
  3) For each layer of test cases, group-test all the WS.

  35. Phase 2: Volume Testing Phase
  4) If an oracle with an acceptable confidence level (e.g., greater than 50%) exists, no voting is necessary: use the oracle to detect failures. Determine whether each WS has produced a correct answer, and then compute the failure rate and possibly the reliability of each WS using the given reliability model.
  5) If no oracle with an acceptable confidence level exists, use the voting mechanism to detect failures, as described in phase 1.
  6) Update the confidence levels of the oracles: an agreement between the oracle and the current test output increases the confidence level, while a disagreement decreases it accordingly (see the sketch after this phase's steps).
  7) Update the ranking of the test cases by including the new number of failures detected.

  36. Phase 2: Volume Testing Phase
  8) Update the ranking of the WS and eliminate the WS that have an unacceptable failure rate or reliability. The elimination of unnecessary testing in this step saves testing time.
  9) Select the next layer of test cases, and return to step 3.
  • By the end of Phase 2 group testing: all the available WS have been tested and a short list of WS is ranked • the test cases are updated and ranked • the oracles and their confidence levels are updated • The same process can be applied at the integration testing level. If a composite WS consists of n different unit WS, the ASTRAR group testing technique can be applied to the composite WS by considering each composite as an individual WS in the group testing.
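
A small sketch of the oracle-driven failure detection and confidence update in steps 4-6; the 50% threshold is the example quoted in step 4, while the fixed-step adjustment rule is an assumption, since the slides do not give the exact formula:

```python
def check_against_oracle(output, oracle, confidence, threshold=0.5):
    """Use an established oracle instead of voting when it is trusted.

    Returns (failed, new_confidence). `failed` is None when the oracle's
    confidence is at or below the threshold, signalling that the voting
    mechanism of phase 1 must be used instead (step 5).
    """
    if confidence <= threshold:
        return None, confidence          # fall back to voting
    failed = output != oracle            # step 4: oracle detects failure
    # Step 6: agreement raises confidence, disagreement lowers it.
    step = 0.05                          # illustrative adjustment size
    if failed:
        new_confidence = max(0.0, confidence - step)
    else:
        new_confidence = min(1.0, confidence + step)
    return failed, new_confidence
```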

  37. Application of this Group Testing Model • A real-time stock-buy-sell WS is used as an example to illustrate the application of the ASTRAR technique. • The WS under development consist of a server WS and multiple client WS, residing in different locations. • A client can send requests to the server, and the server responds to the requests. • All WS under group testing implement the same specification. • The WS server offers two functions, and the client WS can access these two functions. • The database consists of objects of stock information, defined in the class Stock.
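
The transcript does not include the class Stock definition or the names of the server's two functions; a hypothetical sketch of what they might look like, with every field and method name assumed:

```python
from dataclasses import dataclass, field

@dataclass
class Stock:
    """Hypothetical stock record; fields follow the slides' description."""
    symbol: str
    price: float
    change_minute: float = 0.0   # % change in the last minute
    change_day: float = 0.0      # % change in the last day
    change_month: float = 0.0    # % change in the last month
    change_year: float = 0.0     # % change in the last year

class StockServer:
    """Hypothetical server exposing two functions for client WS to call."""
    def __init__(self):
        self.db: dict[str, Stock] = {}   # symbol -> Stock

    def buy(self, symbol: str, shares: int) -> float:
        """Buy `shares` at the current price; returns the total cost."""
        return self.db[symbol].price * shares

    def sell(self, symbol: str, shares: int) -> float:
        """Sell `shares` at the current price; returns the proceeds."""
        return self.db[symbol].price * shares
```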

  38. Application of this Group Testing Model

  39. Application of this Group Testing Model • Each stock object is set to an initial value at a certain time point. • The evaluation engine then uses randomly generated purchase and sale information, or replayed data from a past stock dump, to decide the price dynamically once every minute. • Once the price changes, the other members (the percentage changes within a minute, a day, a month, and a year) of each stock object are computed and updated.
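
A minimal sketch of the per-minute update described above, reusing the hypothetical Stock class sketched earlier; the random-walk pricing rule and the reference-price bookkeeping are assumptions, since the slides only describe the behavior:

```python
import random

def update_price(stock, refs):
    """Re-price once a minute, then refresh the percentage-change fields.

    refs: reference prices {"minute": p, "day": p, "month": p, "year": p}
    captured at the start of each time window.
    """
    # Assumed pricing rule: random purchase/sale pressure moves the price.
    stock.price *= 1 + random.uniform(-0.01, 0.01)
    stock.change_minute = 100 * (stock.price / refs["minute"] - 1)
    stock.change_day = 100 * (stock.price / refs["day"] - 1)
    stock.change_month = 100 * (stock.price / refs["month"] - 1)
    stock.change_year = 100 * (stock.price / refs["year"] - 1)
```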

  40. Results of the application • The size of the subset (the training size) is critical • The smaller the size, the cheaper (fewer test runs) the testing and ranking process will be. • However, the smaller the size, the higher the probability that the training phase fails to find the correct oracle • An incorrect oracle will lead to an incorrect ranking of the WS under test, while an incorrect ranking of the test cases may result in more test runs in phase 2 of the ASTRAR process. • Another factor that affects the testing cost is the target size, the number of WS to be ranked. • For a given large number of WS to be tested, only a short list of the best WS needs to be ranked.

  41. Results of the application • The authors proposed an efficient process to test a large number of web services designed from the same specification • The process is divided into two phases. In phase 1 (the training phase), a selected number of WS are tested and their results voted on. • The purpose of the first phase is to establish the oracle and identify the most powerful test cases. • In phase 2, no voting is applied, and the oracle created in phase 1 is used to judge the correctness of the WS under test. • Furthermore, the powerful test cases are applied first, so that incorrect WS can be eliminated in a few tests. • The experiment results reveal that: • the smaller the training size, the lower the cost • However, a small training size can lead to an incorrect oracle, leading to an incorrect WS ranking • A small training size can also lead to an incorrect test case ranking, resulting in a higher test cost in phase 2 • Therefore, it is critical to select a reasonable training size in WS group testing • As future work: • explore the impact of the age of the test cases.

  42. Conclusion • This presentation proposed: • a Service-Oriented software Reliability Model (SORM) • which generates voted information on the fly, without using a predefined oracle • a technique to test a large number of WS simultaneously • which uses an established oracle to test the correctness of new web services

  43. References • [1] W. T. Tsai, D. Zhang, Y. Chen, H. Huang, R. Paul, N. Liao, “A Software Reliability Model for Web Services,” 8th IASTED International Conference on Software Engineering and Applications, Cambridge, MA, November 2004, pp. 144-149. • [2] W. T. Tsai, X. Wei, Y. Chen, B. Xiao, R. Paul, and H. Huang, “Developing and Assuring Trustworthy Web Services,” 7th IEEE International Symposium on Autonomous Decentralized Systems (ISADS), April 2005, pp. 43-50. • [3] W. T. Tsai, X. Wei, Y. Chen, B. Xiao, R. Paul, and H. Huang, “Adaptive Testing, Oracle Generation, and Test Case Ranking for Web Services,” 29th Annual International Computer Software and Applications Conference (COMPSAC’05), 2005. • [4] W. T. Tsai, Y. Chen, R. Paul, N. Liao, and H. Huang, “Cooperative and Group Testing in Verification of Dynamic Composite Web Services,” Workshop on Quality Assurance and Testing of Web-Based Applications, September 2004, pp. 170-173. • [5] W. T. Tsai, Y. Chen, R. Paul, “Specification-Based Verification and Validation of Web Services and Service-Oriented Operating Systems,” Proc. of IEEE WORDS, Sedona, February 2005.

  44. Question & Answer
