THESIS
2010
x, 113 p. : ill. ; 30 cm
Abstract
Debugging is a tedious and expensive activity in software maintenance. To ease debugging, researchers have proposed various dynamic analysis techniques that help developers locate program defects based on passed and failed test runs. In this thesis, we study a prevalent type of defect called code omission faults, which involve missing code and cannot be ascribed to any entity that exists in the program. Code omission faults pose two major challenges to existing dynamic analysis techniques. First, missing code offers no execution behavior to observe directly. Second, even when the place of omission has been accurately pinpointed, it is still difficult for developers to conjecture the missing code.
To address these challenges of code omission, we conducted an empirical study to characterize real-world omission faults. Among non-trivial bug fixes extracted from the large open-source projects GNU/GCC and MySQL, we found that over half correspond to omission faults, which can be further categorized into 11 sub-types. We made a novel discovery that some of these sub-types, such as missing-assignment and missing-return, often induce certain dynamic control-flow and data-flow patterns when they trigger program failures. We express these patterns in an event flow language and show that, by matching them against program execution, it is possible to accurately locate these sub-types of omission faults and infer part of the missing code.
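As a rough illustration of matching a data-flow pattern against an execution, the sketch below scans a recorded trace for uses of a variable that no earlier event defined, one plausible symptom of a missing assignment. It is not the event flow language of the thesis; the event format and the names `Event`, `uses_without_def`, and `failing_trace` are assumptions made for this example only.

```python
from typing import List, Tuple

# Hypothetical trace event: (kind, variable, line), where kind is "def" or "use".
Event = Tuple[str, str, int]


def uses_without_def(trace: List[Event]) -> List[Tuple[str, int]]:
    """Return (variable, line) for each use not preceded by any def in the trace."""
    defined = set()
    suspicious = []
    for kind, var, line in trace:
        if kind == "def":
            defined.add(var)
        elif kind == "use" and var not in defined:
            # Candidate symptom of a missing assignment to `var`.
            suspicious.append((var, line))
    return suspicious


# Illustrative trace of a failing run: `total` is read at line 6 but never assigned.
failing_trace = [("def", "n", 3), ("use", "n", 5), ("use", "total", 6)]
print(uses_without_def(failing_trace))
```

A match reports both a location (the line of the use) and a fragment of the likely missing code (an assignment to the reported variable), which mirrors the locate-and-infer goal described above.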
The remaining sub-types of omission faults, such as missing-if-condition and missing-branch, are more complex and induce patterns that vary with the missing code, which is mostly unknown at the time of debugging. For these complex omission faults, we empirically observed that the missing code is likely to appear elsewhere in the program in a similar or even identical form. Inspired by this observation, we propose an approach called learning-to-debug. The basic idea is to systematically remove different parts of the program and learn the control-flow and data-flow patterns associated with these pieces of artificially removed code. The inferred patterns are then applied to the original program to locate the omission faults. We have designed a novel pattern representation and a learning algorithm to make learning-to-debug scalable.
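The removal step of this idea can be pictured with a minimal sketch that produces one program variant per removed statement, each labeled with the code that was taken out. This is only an assumed simplification: the program is modeled as a flat list of statement strings, and the names `single_omission_variants` and `program` are hypothetical. The pattern representation and learning algorithm of the thesis are not shown.

```python
from typing import Iterator, List, Tuple


def single_omission_variants(
    statements: List[str],
) -> Iterator[Tuple[int, str, List[str]]]:
    """Yield (index, removed_statement, variant) with exactly one statement removed."""
    for i, removed in enumerate(statements):
        yield i, removed, statements[:i] + statements[i + 1:]


# Illustrative three-statement program.
program = ["x = read()", "if x < 0: x = 0", "return x"]
variants = list(single_omission_variants(program))

for index, removed, variant in variants:
    print(index, repr(removed), variant)
```

Running each variant on the test suite would give executions paired with known removed code, which is the kind of labeled data from which patterns could then be learned.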
To evaluate our approaches, we used both real-world and seeded omission faults in four benchmark programs: space, grep, bc, and gcc. The experiments show promising results.