Precise and scalable static bug finding for industrial-sized code

HKUST Electronic Theses

Precise and scalable static bug finding for industrial-sized code

by Qingkai Shi

THESIS 2020

Ph.D. Computer Science and Engineering

xiv, 112 pages : illustrations ; 30 cm

Abstract

Software bugs cost developers and software companies a great deal of time and money. Although previous work has reported many success stories for using static bug-finding tools, it is still difficult to find bugs hidden behind sophisticated pointer operations, deep calling contexts, and complex path conditions with a low false-positive rate, while sieving through millions of lines of code in just a few hours. In this thesis, we present our novel designs of sparse value-flow analysis to tackle a wide range of software bugs caused by improper value flows. The proposed approach has been commercialized and deployed in many of the global 500 companies. It also has reported hundreds of real bugs for open-source software systems.

The first problem addressed in the thesis is referred to as the pointer trap: a precise points-to analysis limits the scalability of sparse value-flow analysis and an imprecise one seriously undermines its precision. To solve the problem, we present Pinpoint, a holistic approach that decomposes the cost of high-precision points-to analysis by precisely discovering local data dependence and delaying the expensive inter-procedural analysis through memorization. Such memorization enables the on-demand slicing and, thus, improves the scalability with high precision. Experiments show that Pinpoint can check millions of lines of code in less than five hours with a false positive rate lower than 30%.

The second problem addressed in the thesis is known as the extensional scalability problem, which happens when we simultaneously check many value-flow properties with high precision. A major factor to this deficiency is that the core static analysis engine is oblivious of the mutual synergy among the properties being checked, thus inevitably losing many optimization opportunities. Our work is to leverage the inter-property awareness and to capture redundancies and inconsistencies when many properties are considered together. The evaluation results demonstrate that the approach, Catapult, is more than 8× faster than Pinpoint but consumes only 1/7 of the memory when checking twenty value-flow properties together.

The third problem addressed in the thesis is how to improve the parallelism of static bug finding over the conventional parallel designs. Conventionally, bottom-up program analysis has been easy to parallelize because functions without caller-callee relations can be analyzed independently. However, functions with caller-callee relations have to be analyzed sequentially because the analysis of a function depends on the analysis results of its callees. We present Cheetah, a framework of bottom-up analysis, in which the analysis task of a function is partitioned into multiple sub-tasks. These sub-tasks are pipelined and run in parallel, even though the caller-callee relations exist. The evaluation results of Cheetah demonstrate significant speedup over a conventional parallel design.

[ Hide abstract ]

View Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection HKUST Electronic Theses Degree Ph.D. Department Computer Science and Engineering Supervisors Zhang, Charles Authors Shi, Qingkai Subjects Debugging in computer science Error-correcting codes (Information theory) Computer programs Testing Language English Call number Thesis CSED 2020 Shi DOI 10.14711/thesis-991012873366203412

Full record

Precise and scalable static bug finding for industrial-sized code

by Qingkai Shi

Post a Comment Cancel reply