THESIS
2015
Abstract
This thesis deals with two important problems of symbolic analysis. The first one is path explosion problem. Path explosion is a major issue in applying path-sensitive symbolic analysis to large programs. Since real world programs are typically built on top of many library functions, it is generally impractical to model all APIs and symbolic analysis often approximates some APIs’ behavior. We observe that many symbolic states generated by such symbolic analysis of a procedure are indistinguishable to its callers. It is, therefore, possible to keep only one state from each set of equivalent symbolic states without affecting the analysis result. Based on this observation, we propose an equivalence relation called z-equivalence, which is weaker than logical equivalence to relate a large nu...[
Read more ]
This thesis deals with two important problems of symbolic analysis. The first one is path explosion problem. Path explosion is a major issue in applying path-sensitive symbolic analysis to large programs. Since real world programs are typically built on top of many library functions, it is generally impractical to model all APIs and symbolic analysis often approximates some APIs’ behavior. We observe that many symbolic states generated by such symbolic analysis of a procedure are indistinguishable to its callers. It is, therefore, possible to keep only one state from each set of equivalent symbolic states without affecting the analysis result. Based on this observation, we propose an equivalence relation called z-equivalence, which is weaker than logical equivalence to relate a large number of z-equivalent states. We prove that z-equivalence is strong enough to guarantee that paths to be traversed by the symbolic analysis of two z-equivalent states are identical, giving the same solutions to satisfiability and validity queries. We propose a sound linear algorithm to detect z-equivalence. Our experiments show that the symbolic analysis that leverages z-equivalence is able to achieve more than ten orders of magnitude reduction in terms of search space. The reduction significantly alleviates the path explosion problem, enabling us to apply symbolic analysis in large programs such as Hadoop and Linux Kernel.
The second problem is precision issue. API approximation can induce many unreachable symbolic states, which are expensive to validate manually. In the second part of this thesis, we propose a static approach to automatically validating the reported anomalous symbolic states. The validation makes use of the available runtime data of the un-modeled APIs collected from previous program executions. We show that the symbolic state validation problem can be cast as a MAX-SAT problem and solved by existing constraint solvers. Our technique found 80 unreported bugs when it was applied to 10 popular programs with a total of 1.5 million lines of code. All of them can be confirmed by test cases. Our technique presents a promising way to apply the big data paradigm to software engineering. It provides a mechanism to validate the symbolic states of a project by leveraging the many concrete input-output values of APIs collected from other projects.
Post a Comment