1 / 21

Mitigating the Effects of Flaky Tests on Mutation Testing

This study explores strategies to mitigate the effects of flaky tests on mutation testing, including comparing test suites by mutation score and guiding testing based on the mutant-test matrix. The results show that flakiness can significantly impact mutation testing, and mitigation strategies based on reruns and isolation are effective.

cantrell
Télécharger la présentation

Mitigating the Effects of Flaky Tests on Mutation Testing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Mitigating the Effects of Flaky Testson Mutation Testing August Shi, Jonathan Bell, Darko Marinov ISSTA 2019 Beijing, China 7/18/2019 CCF-1421503 CNS-1646305 CNS-1740916 CCF-1763788 CCF-1763822 OAC-1839010

  2. Mutation Testing Code Under Test Code Under Test test1 test2 test3 Mutant 1 Code Under Test Code Under Test UNRELIABLE Mut 1 test1 test2 test3 Killed Code Under Test Code Under Test test1 test2 test3 Mutant 2 • Compare test suites by mutation score • Guide testing based on mutant-test matrix Mut 2 Survived

  3. Mutation Testing with Flaky Tests Code Under Test Code Under Test test1 test2 test3 Mutant 1 Code Under Test Code Under Test Code Under Test STILL FLAKY Mut 1 test1 test2 test3 Killed? Code Under Test Code Under Test test1 test2 test3 Mutant 2 Run 2 Run 1 • Get test suite with deterministic outcomes • Debug/fix flaky tests1 • Remove/ignore flaky tests Mut 2 Survived? 1August Shi et al. “iFixFlakies: A Framework for Automatically Fixing Order-Dependent Tests”. ESEC/FSE 2019

  4. Flaky Coverage Example • Other reasons for flakiness: • Concurrency • Randomness • I/O • Order dependency 1 public classWatchDog { 2 ... 3publicvoid run() { 4 ... 5 synchronized (this) { 6 longtimeLeft = timeout – (System.currentTimeMillis() - startTime); 7 isWaiting = timeLeft > 0; 8 while (isWaiting) { 9 ... 10 wait(timeLeft); 11 ... 12 }} 13 ... 14 }} public void test() { newWatchDog.run(); ... }

  5. Motivating Study • Measure flakiness of coverage • 30 open-source GitHub projects from prior work • No flaky test outcomes! (all 35,850 tests pass in 17 runs) • Rerun tests and measure differences in coverage • 113,356 (22%) statements with different tests covering across runs • 5,736 (16%) tests cover different statements across runs • Lots of flakiness in coverage, even without flaky outcomes!

  6. Mutation Testing with Flaky Coverage 1 public classWatchDog { 2 ... 3publicvoid run() { 4 ... 5 synchronized (this) { 6 longtimeLeft = timeout – (System.currentTimeMillis() - startTime); 7 isWaiting = timeLeft > 0; 8 while (isWaiting) { 9 ... 10 wait(timeLeft); 11 ... 12 }} 13 ... 14 }} Mutation delete call public void test() { newWatchDog.run(); ... } Mutation not covered!

  7. Mutation Testing Results are Unreliable • Flakiness can shift mutation testing results • Mutation scores may be inflated/deflated • Mutant-test matrix unreliable • Need to mitigate the effects of flakiness on mutation testing! • Mitigation strategies based on reruns and isolation2 • Implemented on PIT, a popular mutation testing tool for Java https://doi.org/10.6084/m9.figshare.8226332 https://github.com/hcoles/pitest/pull/534 https://github.com/hcoles/pitest/pull/545 2Jonathan Bell et al. “DeFlaker: Automatically Detecting Flaky Tests”. ICSE 2018

  8. Mitigating Flakiness in Mutation Testing Traditional mutation testing Mutants to test Sorted tests per mutant Full test-suite coverage collection Test-mutant prioritization Mutant execution Improvements to cope with flakiness Rerun and isolate tests Run tests with least flaky coverage first Track mutations covered Rerun/isolate tests See paper

  9. Coverage Collection • When running multiple times, union coverage • More lines covered means more mutants generated • Run tests in isolation to remove test-order dependencies

  10. Executing Tests on Mutants • Monitor if tests actually execute mutated bytecode • Traditionally, mutant-test pair has status Killed or Survived • Only applicable if test executes the mutated bytecode • Mutant-test pair with test that does not execute mutated bytecode hasnew status Unknown • Test can potentially cover mutation, based on prior coverage

  11. New Status for Mutants • Overall mutant status depends on status of all mutant-test pairs run for the mutant • Need to reduce number of Unknown mutants and pairs Killed Survived + Covered + Covered Unknown (not covered)

  12. Rerunning Mutant-test Pairs • While status of mutant-test pair is Unknown, rerun • Change isolation level during reruns Mutant-test pairs for mutants in same class in same JVM Default Mutant-test Pairs More Isolation Increasing Cost Mutant-test pairs for same mutant in same JVM Most Isolation Mutant-test pairs in own JVM

  13. Experimental Setup • Evaluate on same 30 projects in motivating study • All modifications on top of PIT mutation testing tool • RQ1: Flakiness in traditional mutation testing? • RQ2: Effect of coverage on mutants generated? • RQ3: Effect of re-executing tests on mutant status? • RQ4: Prioritize tests for mutant-test executions? See paper See paper

  14. RQ1: Flakiness in Traditional Mutation Testing Max difference up to 23pp! Must improve mutation scores more than this variance! 9% of mutants-test pairs are unknown (max up to 55%)! Matrix results can be unreliable

  15. RQ3: Mutant Re-execution Results • Increasing isolation greatly increases covered pairs • Unnecessary to rerun too often with the most isolation

  16. Discussion • Flakiness can have negative effects beyond mutation testing • Tools/studies that rely on coverage must consider flakiness • Fault localization, program repair, test prioritization, test-suite reduction, test selection, test generation, runtime verification, … • Mitigation strategies applicable beyond mutation testing • Different isolation strategies for different tasks

  17. Conclusions • Even seemingly non-flaky tests have flaky coverage • 22% of statements not covered consistently! • We present problems in mutation testing due to flakiness • We propose techniques to mitigate effects • Different combinations of reruns and isolation • We reduce Unknown mutants/pairs by 79.4%/88.1% • Flakiness can have negative effects beyond mutation testing awshi2@illinois.edu https://doi.org/10.6084/m9.figshare.8226332

  18. BACKUP

  19. Prioritizing Tests for Mutants • Run mutant-test pairs in the order that gets the overall mutant status faster, more reliably • Once mutant status known, no need to run more • Prioritize tests per mutant based on coverage • Tests with more “stable” coverage on mutant prioritized earlier • Later prioritize based on time • When to rerun? • Immediately rerun pair? • Run all pairs first before rerunning?

  20. RQ2: Coverage and Mutant Generation • Not much difference in numbers of mutants and pairs • Can potentially use Default for mutant generation

  21. RQ4: Prioritizing Tests

More Related