THESIS
2019
9 unnumbered pages, 98 pages : illustrations ; 30 cm
Abstract
Testing is an important and necessary activity in developing programs to gain confidence in their behaviour, but creating test cases is a laborious and time-consuming job for developers. To reduce the manual effort of unit testing, much research has been performed on automated unit test generation based on various underlying techniques such as random testing and search-based testing. These automated test generation tools provide developers with a set of failing tests that can potentially detect undiscovered bugs. However, automatically generated tests are not widely used in practice because their inputs often fail to reveal real bugs: generated test failures produce too many false alarms due to invalid test inputs, and generated tests focus only on maximizing coverage rather than exercising diverse program behaviours. These observations provide strong motivation to develop enhanced automated testing tools that achieve a higher bug detection rate.
To improve the practical usefulness of automated test generation tools, in this thesis we first propose a technique called PAF that tackles the false alarm problem. The main reason for a false-alarm failure is that the test fails not because its input reveals a real bug, but because it violates an implicit precondition that is specified neither in the code nor in the documentation. PAF analyzes the data flow of failing tests (i.e., test failures) with respect to their likelihood of violating the target method's implicit precondition. The set of failing tests is then prioritized so that failures that are less likely to violate implicit preconditions are ranked higher. As a result, PAF provides developers with a ranked list of failures so that they can diagnose the bug-revealing failures first.
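The false-alarm problem that PAF targets can be illustrated with a small, hypothetical Java example. Everything below (the Account class, its seeded bug, and the two "generated" tests) is invented for illustration and is not taken from the thesis: the first generated test fails only because it violates the implicit precondition that the withdrawal amount is non-negative, while the second fails because of a genuine logic bug. A PAF-style prioritization would rank the second failure above the first.

```java
// Hypothetical example (not from the thesis): a target class with an
// undocumented implicit precondition and a seeded logic bug.
class Account {
    private int balance;

    Account(int initialBalance) {
        this.balance = initialBalance;
    }

    // Implicit precondition (never documented): amount >= 0.
    // Seeded bug: ">" should be ">=", so withdrawing the exact balance
    // is wrongly rejected.
    void withdraw(int amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("negative amount");
        }
        if (balance > amount) {          // BUG: should be balance >= amount
            balance -= amount;
        } else {
            throw new IllegalStateException("insufficient funds");
        }
    }

    int getBalance() {
        return balance;
    }
}

public class FalseAlarmDemo {
    // "Generated" test 1: fails only because its input violates the
    // implicit precondition -> false alarm.
    static void generatedTest1() {
        Account a = new Account(100);
        a.withdraw(-5);                  // invalid input, not a real bug
    }

    // "Generated" test 2: fails because of the seeded bug -> bug-revealing.
    static void generatedTest2() {
        Account a = new Account(100);
        a.withdraw(100);                 // valid input, should succeed
        if (a.getBalance() != 0) {
            throw new AssertionError("balance should be 0");
        }
    }

    public static void main(String[] args) {
        run("generatedTest1 (false alarm)", FalseAlarmDemo::generatedTest1);
        run("generatedTest2 (bug-revealing)", FalseAlarmDemo::generatedTest2);
    }

    private static void run(String name, Runnable test) {
        try {
            test.run();
            System.out.println(name + ": passed");
        } catch (Throwable t) {
            System.out.println(name + ": FAILED with " + t);
        }
    }
}
```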
Second, to address the lack of test input diversity, we propose a technique called DGI that diversifies generated test inputs to achieve a higher bug detection rate. Even though existing search-based techniques guide generation toward maximizing code coverage, high code coverage alone is not enough to exercise faulty behaviours, because covering the faulty statements does not guarantee the propagation of the fault. DGI addresses this limitation by diversifying test inputs efficiently: it generates new test cases by mutating existing inputs in different combinations, not only at the test-object-state level but also at the test-statement level. The results show that DGI is effective in detecting bugs and can significantly increase the bug detection ability of generated tests.
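To make the two mutation levels concrete, the sketch below is a simplified, hypothetical illustration rather than DGI's actual implementation (the GeneratedTest representation, the mutation operators, and all identifiers are invented here): a generated test is modeled as an initial object state plus a sequence of method-call statements, and new tests are derived either by perturbing the object state (input values) or by inserting, deleting, or swapping statements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch (not DGI's real code): a generated unit test is modeled
// as an initial object state plus a sequence of method-call statements, and
// new tests are derived by mutating either level.
public class InputDiversificationDemo {

    static final List<String> CALL_POOL =
            List.of("push(1)", "push(2)", "pop()", "peek()", "clear()");

    static class GeneratedTest {
        int initialState;            // object-state level: constructor input
        List<String> statements;     // statement level: method-call sequence

        GeneratedTest(int initialState, List<String> statements) {
            this.initialState = initialState;
            this.statements = new ArrayList<>(statements);
        }

        GeneratedTest copy() {
            return new GeneratedTest(initialState, statements);
        }

        @Override
        public String toString() {
            return "new Stack(" + initialState + "); " + String.join("; ", statements);
        }
    }

    // Object-state-level mutation: perturb the value used to build the object.
    static GeneratedTest mutateObjectState(GeneratedTest t, Random rnd) {
        GeneratedTest m = t.copy();
        m.initialState += rnd.nextInt(21) - 10;   // small random shift
        return m;
    }

    // Statement-level mutation: insert, delete, or swap a method call.
    static GeneratedTest mutateStatements(GeneratedTest t, Random rnd) {
        GeneratedTest m = t.copy();
        int choice = rnd.nextInt(3);
        if (choice == 0 || m.statements.size() < 2) {
            // insert a random call at a random position
            int pos = rnd.nextInt(m.statements.size() + 1);
            m.statements.add(pos, CALL_POOL.get(rnd.nextInt(CALL_POOL.size())));
        } else if (choice == 1) {
            m.statements.remove(rnd.nextInt(m.statements.size()));
        } else {
            int i = rnd.nextInt(m.statements.size());
            int j = rnd.nextInt(m.statements.size());
            java.util.Collections.swap(m.statements, i, j);
        }
        return m;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        GeneratedTest seed =
                new GeneratedTest(10, List.of("push(1)", "pop()", "peek()"));
        System.out.println("seed:      " + seed);
        for (int i = 0; i < 3; i++) {
            System.out.println("state mut: " + mutateObjectState(seed, rnd));
            System.out.println("stmt  mut: " + mutateStatements(seed, rnd));
        }
    }
}
```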
Overall, our results show that PAF and DGI can produce test failures with few false alarms and achieve a high bug detection rate. These results support the usefulness of our tools in practical settings. Consequently, the techniques and toolsets presented in this thesis bring automated test generation tools closer to practical use.