Exploring Tools and Techniques for Distributed Continuous Quality Assurance Adam Porter aporter@cs.umd.edu University of Maryland http://www.cs.umd.edu/projects/skoll
Quality Assurance for Large-Scale Systems • Modern systems are increasingly complex • Run on numerous platform, compiler & library combinations • Have 10s, 100s, even 1000s of configuration options • Are evolved incrementally by geographically-distributed teams • Run atop other frequently changing systems • Have multi-faceted quality objectives • How do you QA systems like this? http://www.cs.umd.edu/projects/skoll
A Typical QA Process • Developers make changes & perform some QA locally • Continuous build, integration & test of CVS HEAD • Sometimes performed by multiple (uncoordinated) machines • Lots of QA being done, but systems still break, too many bugs escape to the field & there are too many late integration surprises • Some contributors to QA process failures • QA only for the most readily-available platforms & default configurations • Knowledge, artifacts, effort & results not shared • Often limited to compilation & simple functional testing • Static QA processes that focus solely on CVS HEAD http://www.cs.umd.edu/projects/skoll
General Assessment • Works reasonably well for a few QA processes on a very small part of the full system space • For everything else it has deficiencies & leads to serious blind spots as to the system’s actual behavior • Large portions of system space go unexplored • Can’t fully predict effect of changes • Hard to interpret from-the-field failure reports & feedback http://www.cs.umd.edu/projects/skoll
Distributed Continuous Quality Assurance • DCQA: QA processes conducted around-the-world, around-the-clock on powerful, virtual computing grids • Grids can be made up of end-user machines, company-wide resources, or dedicated computing clusters • General Approach • Divide QA processes into numerous tasks • Intelligently distribute tasks to clients who then execute them • Merge and analyze incremental results to efficiently complete the desired QA process http://www.cs.umd.edu/projects/skoll
DCQA (cont.) • Expected benefits • Massive Parallelization • Greater access to resources/environs. not readily found in-house • Coordinate QA efforts to enable more sophisticated analyses • Recurring themes: • Leverage developer & application community resources • Maximize task & resource diversity • Opportunistically divide & conquer large QA analyses • Steer process to improve efficiency & effectiveness • Gather information proactively http://www.cs.umd.edu/projects/skoll
Group Collaborators • Doug Schmidt & Andy Gokhale • Alex Orso • Myra Cohen • Murali Haran, Alan Karr, Mike Last & Ashish Sanil • Sandro Fouché, Atif Memon, Alan Sussman, Cemal Yilmaz (now at IBM TJ Watson) & Il-Chul Yoon http://www.cs.umd.edu/projects/skoll
Overview • The Skoll DCQA infrastructure & approach • Novel DCQA processes • Future work http://www.cs.umd.edu/projects/skoll
Skoll DCQA Infrastructure & Approach • Client registers & receives client kit • When a client becomes available it requests a QA task • Server selects the best task matching the client's characteristics • Client executes the task & returns results • Server processes results & updates internal databases See: A. Memon, A. Porter, C. Yilmaz, A. Nagarajan, D. C. Schmidt, and B. Natarajan, Skoll: Distributed Continuous Quality Assurance, ICSE’04 http://www.cs.umd.edu/projects/skoll
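A minimal sketch (Python) of the client-side loop implied by these steps; SkollServer/ClientKit-style objects and their methods are hypothetical names for illustration, not the real Skoll API:

```python
import time

def client_loop(server, client_kit, characteristics):
    """Register once, then repeatedly request, execute, and report QA tasks."""
    server.register(characteristics)                 # client registers & receives client kit
    while True:
        task = server.request_task(characteristics)  # server picks the best-matching task
        if task is None:
            time.sleep(600)                          # nothing suitable right now; back off
            continue
        results = client_kit.execute(task)           # run the QA task locally
        server.return_results(task, results)         # server merges results, updates databases
```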
Implementing DCQA Processes in Skoll • Define how QA analysis will be subdivided by defining generic QA tasks & the QA space in which they operate • QA tasks: Templates parameterized by “configuration” • Variable information needed to run concrete QA tasks • QA space: the set of all “configurations” in which QA tasks run • Define how QA tasks will be scheduled & how results will be processed by defining navigation strategies • Navigation strategies “visit” points in the QA space, applying QA tasks & processing incremental results http://www.cs.umd.edu/projects/skoll
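A minimal sketch (Python, with made-up option names and constraints) of what a QA space and a trivial navigation strategy might look like; real Skoll specifications are richer, and qa_task stands in for a concrete task such as run_testsuite:

```python
from itertools import product
import random

# Illustrative QA space: each option maps to its legal settings.
QA_SPACE = {
    "CORBA_MESSAGING": [0, 1],
    "AMI":             [0, 1],
    "COMPILER":        ["gcc", "msvc"],
}

def is_valid(cfg):
    # Illustrative inter-option constraint: AMI requires CORBA messaging.
    return not (cfg["AMI"] == 1 and cfg["CORBA_MESSAGING"] == 0)

def all_configurations():
    names = list(QA_SPACE)
    for values in product(*(QA_SPACE[n] for n in names)):
        cfg = dict(zip(names, values))
        if is_valid(cfg):
            yield cfg

def qa_task(cfg):
    """Generic QA task template, instantiated with one configuration."""
    return {"config": cfg, "outcome": "PASS"}   # placeholder result

# Trivial navigation strategy: visit the valid points in random order.
schedule = list(all_configurations())
random.shuffle(schedule)
results = [qa_task(cfg) for cfg in schedule]
```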
The ACE+TAO+CIAO (ATC) System • ATC characteristics • 2M+ line open-source CORBA implementation • maintained by 40+ geographically-distributed developers • 20,000+ users worldwide • Product Line Architecture with 500+ configuration options • runs on dozens of OS and compiler combinations • Continuously evolving – 200+ CVS commits per week • Quality concerns include correctness, QoS, footprint, compilation time & more • Beginning to use Skoll for continuous build, integration & test of ATC http://www.cs.umd.edu/projects/skoll
Define QA Space http://www.cs.umd.edu/projects/skoll
Define Generic QA Tasks • Implemented as a workflow of “services” provided by client kit • Some built-in: log, cvs_download, upload_results • Most interesting ones application-specific: e.g., build, run_tests • Ex: The run_testsuite QA task invokes services that: • Enable logging • Define component to be built • Set configuration options & environment variables • Download source from a CVS repo • Configure & build component • Compile tests • Execute tests • Upload results to server http://www.cs.umd.edu/projects/skoll
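A sketch of the run_testsuite workflow in Python; the stubbed services below only echo what they would do, since the real implementations are application-specific:

```python
def log(msg):                 print(f"[skoll] {msg}")
def set_options(opts, env):   log(f"options={opts} env={env}")
def cvs_download(repo):       log(f"checkout {repo}")
def configure_build(comp):    log(f"configure & build {comp}")
def compile_tests(comp):      log(f"compile tests for {comp}")
def execute_tests(comp):      log(f"run tests for {comp}"); return {"passed": 0, "failed": 0}
def upload_results(results):  log(f"upload {results}")

def run_testsuite(component, repo, options, env):
    """The generic QA task as an ordered workflow of client-kit services."""
    log("run_testsuite start")            # enable logging
    set_options(options, env)             # set configuration options & env. variables
    cvs_download(repo)                    # download source from the CVS repo
    configure_build(component)            # configure & build the component
    compile_tests(component)              # compile tests
    results = execute_tests(component)    # execute tests
    upload_results(results)               # upload results to the server
    return results
```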
Navigation Strategy: Nearest Neighbor Search http://www.cs.umd.edu/projects/skoll
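One plausible reading of the nearest neighbor strategy, sketched in Python: when a configuration fails, schedule every valid configuration that differs from it in exactly one option setting, so the failing subspace gets mapped out. The function and variable names are illustrative only:

```python
def neighbors(cfg, qa_space, is_valid):
    """Yield valid configurations that differ from cfg in exactly one option."""
    for option, settings in qa_space.items():
        for alt in settings:
            if alt == cfg[option]:
                continue
            candidate = dict(cfg, **{option: alt})
            if is_valid(candidate):
                yield candidate

def on_result(cfg, outcome, queue, qa_space, is_valid, scheduled):
    """Navigation callback: expand the search around failing configurations."""
    if outcome == "FAIL":
        for nbr in neighbors(cfg, qa_space, is_valid):
            key = tuple(sorted(nbr.items()))
            if key not in scheduled:
                scheduled.add(key)
                queue.append(nbr)
```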
Extension: Adding Temporary Constraints http://www.cs.umd.edu/projects/skoll
Overview • The Skoll DCQA infrastructure & framework • Novel DCQA processes & feasibility studies • Configuration-level fault characterization • Performance-oriented regression testing • Code-level fault modeling • Future work http://www.cs.umd.edu/projects/skoll
Configuration-Level Fault Characterization Goal • Help developers localize configuration-related faults Solution Approach • Strategically sample the QA space to test for subspaces in which (1) compilation fails or (2) regression tests fail • Build models that characterize the configuration options and specific settings that define the failing subspace See: C. Yilmaz, M. Cohen, A. Porter, Covering Arrays for Efficient Fault Characterization in Complex Configuration Spaces, ISSTA’04, TSE v32 (1) http://www.cs.umd.edu/projects/skoll
Sampling Strategy • Goals: Maximize “coverage” of option-setting combinations, while minimizing the number of configurations tested • Approach: compute the test schedule from t-way covering arrays • A set of configurations in which all ordered t-tuples of option settings appear at least once • A 2-way covering array example is sketched below http://www.cs.umd.edu/projects/skoll
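The sketch below builds a 2-way (pairwise) covering array greedily; real covering-array tools use stronger constructions and also honor inter-option constraints, but this shows the coverage idea:

```python
from itertools import combinations, product

def pairwise_covering_array(options):
    """options: dict mapping option name -> list of legal settings."""
    names = list(options)
    # All (option, setting) pairs that still need to be covered.
    uncovered = {((a, x), (b, y))
                 for a, b in combinations(names, 2)
                 for x in options[a] for y in options[b]}
    rows = []
    while uncovered:
        best_row, best_gain = None, -1
        # Greedily pick the candidate configuration covering the most pairs.
        for values in product(*(options[n] for n in names)):
            cfg = dict(zip(names, values))
            gain = sum(1 for (a, x), (b, y) in uncovered
                       if cfg[a] == x and cfg[b] == y)
            if gain > best_gain:
                best_row, best_gain = cfg, gain
        rows.append(best_row)
        uncovered = {((a, x), (b, y)) for (a, x), (b, y) in uncovered
                     if not (best_row[a] == x and best_row[b] == y)}
    return rows

# Example: 3 binary options are pairwise-covered by a handful of rows
# instead of the 8 exhaustive configurations.
print(pairwise_covering_array({"A": [0, 1], "B": [0, 1], "C": [0, 1]}))
```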
Fault Characterization • We used classification trees (Breiman et al.) to model option & setting patterns that predict test failures • Example tree: splits on the CORBA_MESSAGING, AMI, AMI_POLLER & AMI_CALLBACK settings, with leaves labeled OK, ERR-1, ERR-2 & ERR-3 http://www.cs.umd.edu/projects/skoll
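A sketch of the modeling step using scikit-learn's CART-style DecisionTreeClassifier (Breiman et al.); the configurations and outcomes below are made up purely to show the shape of the analysis:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# One row per tested configuration:
# [CORBA_MESSAGING, AMI, AMI_POLLER, AMI_CALLBACK]
X = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 0]]
y = ["OK", "ERR-1", "OK", "OK", "ERR-3"]   # observed test outcomes

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=[
    "CORBA_MESSAGING", "AMI", "AMI_POLLER", "AMI_CALLBACK"]))
```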
Feasibility Study • QA Scenario: Do regression tests pass across the entire QA space? • 37,584 valid configurations • 10 compile-time options with 12 constraints • 96 test options with 120 test constraints • 6 run-time options with no constraints & 2 operating systems • Navigation strategies: • Nearest neighbor search • t-way covering arrays for each 2 ≤ t ≤ 6 • Applied to one release of ATC using clients running on 120 CPUs • Exhaustive process takes 18,800 CPU hours (excl. compilation) http://www.cs.umd.edu/projects/skoll
Some Sample Results • Found 89 possibly option-related failures (40 Linux, 49 Windows) • Models based on covering arrays (even for t=2) were nearly as good as those based on exhaustive testing • Several tests failed that don’t fail with default option settings • Root cause: problems in feature-specific code • Some failed on every single configuration (even though they don’t fail when using default options!) • Root cause: problems in option-processing code • 3 tests failed when ORBCollocation = NO • Background: when ORBCollocation = GLOBAL or PER-ORB, local objects communicate directly; when NO, objects always communicate over the network • Root cause: data marshalling/unmarshalling broken http://www.cs.umd.edu/projects/skoll
Summary • Basic infrastructure in place and working • Initial results were encouraging • Defined large control spaces and performed complex QA processes • Found many test failures corresponding to real bugs, some of which had not been found before • Fault characterization sped up debugging time • Documenting & centralizing inter-option constraints beneficial • Ongoing extensions • Expanding QA space for ATC continuous build, test & integration • Extending model for platform virtualization http://www.cs.umd.edu/projects/skoll
Performance-Oriented Regression Testing Goal • Quickly determine whether a software update degrades performance Solution Approach • Proactively find a small set of configurations on which to estimate performance across entire configuration space • Later, whenever software changes, benchmark this set and compare post-update & pre-update performance See: C. Yilmaz, A. Krishna, A. Memon, A. Porter, D. C. Schmidt, A. Gokhale, and B. Natarajan, Main Effects Screening: A Distributed Continuous Quality Assurance Process for Monitoring Performance Degradation in Evolving Software Systems, ICSE’05 http://www.cs.umd.edu/projects/skoll
Reliable Effects Screening (RES) Process • Proactive • Compute a formal experimental plan (we use screening designs) for identifying “important” options • Execute the experimental plan across the QA grid • Analyze the data to identify important options • Recalibrate frequently as the system changes • Reactive • When the software changes, estimate the distribution of performance across the entire configuration space by evaluating all combinations of the important options (called the screening suite) while randomizing the rest • Distributional changes signal possible performance degradations http://www.cs.umd.edu/projects/skoll
Feasibility Study • Several recent changes to the message queuing strategy • Has this degraded performance? • Developers identified 14 potentially important binary options (2^14 = 16,384 configurations) http://www.cs.umd.edu/projects/skoll
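A minimal sketch of the "identify important options" step: estimate each binary option's main effect on latency from the benchmarked runs and keep the options whose absolute effect is large. A real screening design chooses which runs to execute so that these estimates are reliable; the data layout and threshold below are illustrative:

```python
from statistics import mean

def main_effects(runs):
    """runs: list of (config: dict of 0/1 settings, latency: float)."""
    options = runs[0][0].keys()
    effects = {}
    for opt in options:
        high = [lat for cfg, lat in runs if cfg[opt] == 1]
        low  = [lat for cfg, lat in runs if cfg[opt] == 0]
        effects[opt] = mean(high) - mean(low)   # simple main-effect estimate
    return effects

def important_options(runs, threshold):
    """Options whose absolute main effect exceeds the chosen threshold."""
    return [opt for opt, eff in main_effects(runs).items()
            if abs(eff) >= threshold]
```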
Identify Important Options • Half-normal probability plots of option effects on latency: 128-run screening design vs. full data set • Options B and J: clearly important; options I, C, and F: arguably important • Screening designs mimic exhaustive testing at a fraction of the cost http://www.cs.umd.edu/projects/skoll
Estimate Performance with Screening Suite • Q-Q plot of latency: screening suite vs. full data set • Estimates closely track actuals http://www.cs.umd.edu/projects/skoll
Applying RES to an Evolving System • Tracked the latency distribution over time by estimating performance once a day on CVS snapshots of ATC • Observations • Developers noticed the deviation on 12/14/03, but not before • Developers estimated a drop of 5% on 12/14/03 • Our process more accurately shows a drop of around 50% http://www.cs.umd.edu/projects/skoll
Summary • Reliable Effects Screening is an efficient, effective, and repeatable approach for estimating the performance impact of software updates • Was effective in detecting actual performance degradations • Cut benchmarking work 1000x (from 2 days to 5 mins.) • Fast enough to be part of check-in process • Ongoing extensions • New experimental designs: Pooled ANOVA • Applications to component-based systems (multilevel interactions) http://www.cs.umd.edu/projects/skoll
Code-Level Fault Modeling Goal • Automatically gain code-level insight into why/when specific faults occur in fielded program runs Solution Approach • Lightly instrument, deploy & execute program instances • Capture outcome-labeled exec. profiles (e.g., pass/fail) • Build models that explain outcome in terms of profile data See: Murali Haran, Alan Karr, Michael Last, Alessandro Orso, Adam Porter, Ashish Sanil & Sandro Fouche, Techniques for Classifying Executions of Deployed Software to Support Software Engineering Tasks. UMd Technical Report #cs-tr-4826 http://www.cs.umd.edu/projects/skoll
Lightweight Instr. via Adaptive Sampling • Want to minimize overhead (e.g., runtime slowdown, code bloat, data volume & analysis costs) • Use an incremental data-collection approach • Individual instances sparsely sample the measurement space • Sampling weights are initially uniform • Weights are incrementally adjusted to favor high-utility measurements • Over time, high-utility measurements are sampled frequently while low-utility ones aren’t • Typically requires an increase in the number of instances observed http://www.cs.umd.edu/projects/skoll
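A minimal sketch of the adaptive-sampling idea: each deployed instance draws a small, weighted sample of probes, and the weights are nudged toward measurements that prove useful. The update rule and constants are illustrative, not the published algorithm:

```python
import random

def choose_probes(weights, budget):
    """Sparsely sample `budget` probes (with replacement), biased by current weights."""
    probes = list(weights)
    return random.choices(probes, weights=[weights[p] for p in probes], k=budget)

def update_weights(weights, utilities, rate=0.1):
    """Shift weight toward high-utility probes; keep a floor so none die out."""
    for probe, utility in utilities.items():
        weights[probe] = max(0.01, (1 - rate) * weights[probe] + rate * utility)
```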
Association Trees • Resulting data set is very sparse (i.e., 90-99% missing) • Problematic for many learning algorithms • Developed a new learning algorithm called association trees • For each measurement m, split its data range into 2 parts • e.g., low (m < k) or high (m ≥ k), where k is the best split point • Convert the data to items that are either present or missing • Use the Apriori algorithm to find item combinations whose presence is highly correlated with each outcome • Uses of association trees • Analyze rules for clues to fault location • Apply models to new instances, e.g., to determine whether an in-house test case likely reproduces an in-the-field failure http://www.cs.umd.edu/projects/skoll
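A sketch of the preprocessing and mining idea: discretize each sampled measurement into a low/high item, then search for small item combinations whose presence is strongly associated with failing runs. A real implementation would use the Apriori algorithm; the brute-force pass over pairs below, and its thresholds, are only illustrative:

```python
from itertools import combinations

def discretize(profile, split_points):
    """Turn a sparse profile of counts into present items like 'methodA=high'."""
    items = set()
    for m, value in profile.items():          # unsampled measurements simply stay absent
        k = split_points[m]
        items.add(f"{m}={'high' if value >= k else 'low'}")
    return items

def failing_patterns(runs, min_support=5, min_confidence=0.9):
    """runs: list of (item_set, outcome) with outcome 'pass' or 'fail'."""
    all_items = set().union(*(items for items, _ in runs))
    patterns = []
    for combo in combinations(sorted(all_items), 2):
        matching = [out for items, out in runs if set(combo) <= items]
        fails = matching.count("fail")
        if fails >= min_support and fails / max(len(matching), 1) >= min_confidence:
            patterns.append(combo)            # item pair highly correlated with failure
    return patterns
```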
Feasibility Study • Subject program: Java Byte Code Analyzer (JABA) • 60k statements, 400 classes, 3000 methods • 707 test cases • 14 versions • Measured method entry and block execution counts • Ran clients on 120 CPUs • Standard classification techniques using exhaustive measurement yielded near-perfect models (with some outliers) http://www.cs.umd.edu/projects/skoll
Some Sample Results • Competitive with standard methods using exhaustive data • Measured 5-8% of data/instance; ~4x as many instances total http://www.cs.umd.edu/projects/skoll
Summary • Preliminary results: created highly effective models at greatly reduced cost • Scales well to finer-grained data & larger systems • Ongoing extensions • Tradeoffs in # of measurements vs. instances observed • New utility functions • Cost-benefit models • Fixed-cost sampling • New applications • Lightweight test oracles • Failure report clustering http://www.cs.umd.edu/projects/skoll
Overview • The Skoll DCQA infrastructure & framework • Novel DCQA processes • Future work http://www.cs.umd.edu/projects/skoll
Future Work • Looking for more example systems (help!) • New problem classes • Performance and robustness optimization • Configuration advice • Integrate with model-based testing systems • GUITAR -- http://www.cs.umd.edu/~atif/newsite/guitar.htm • Distributed QA tasks • Improved statistical treatment of noisy data • Modeling unobservables and/or detecting outliers http://www.cs.umd.edu/projects/skoll