The explosively increasing code size and high concurrency make modern software
increasingly complex and error-prone. Furthermore, the prevalence and extensive
deployment of modern software have exacerbated the hazards of any software vulnerabilities
and errors, potentially affecting millions of users and resulting in significant
financial costs. Unfortunately, the ever-growing complexity of modern codebases
has made many conventional program analysis tasks impractical for safeguarding
today’s software. Specifically, as one of the most fundamental tasks of program
analysis, call graph construction for large-scale code has become either imprecise or
unscalable, which impairs the effectiveness of many downstream program analysis
applications. In addition, as one of the most popular tasks o...[
Read more ]
The explosively increasing code size and high concurrency make modern software
increasingly complex and error-prone. Furthermore, the prevalence and extensive
deployment of modern software have exacerbated the hazards of any software vulnerabilities
and errors, potentially affecting millions of users and resulting in significant
financial costs. Unfortunately, the ever-growing complexity of modern codebases
has made many conventional program analysis tasks impractical for safeguarding
today’s software. Specifically, as one of the most fundamental tasks of program
analysis, call graph construction for large-scale code has become either imprecise or
unscalable, which impairs the effectiveness of many downstream program analysis
applications. In addition, as one of the most popular tasks of program analysis,
bug detection is faced with significant challenges in effectively detecting concurrency
bugs. In particular, unlike detecting sequential bugs, detecting concurrent bugs
requires additional characterization of exponentionally possible thread interactions.
To address these challenges, in this thesis, we contribute to making fundamental call
graph construction and popular concurrency bug detection more practical.
First, we advocate two new approaches for call graph construction with different
precision-scalability trade-offs, catering to different scenarios throughout software
development (e.g., the continuous integration and daily build process). Specifically,
Kelp significantly outperforms existing type-based call graph construction, which
enables the construction of precise call graphs for millions of lines of code in a few
minutes. On the other hand, Coral largely outperforms existing pointer-analysis-based
call graph construction, which facilitates the construction of precise call graphs
for millions of lines of code in a few hours. By evaluating the call graphs through
the lens of several various downstream program analysis clients (i.e., thread-sharing
analysis, program slicing, value-flow bug detection, and directed grey-box fuzzing),
our experimental results suggest that K
ELP and C
ORAL can dramatically promote
their effectiveness for better vulnerability understanding, hunting, and reproduction.
Second, we advocate a series of static approaches for concurrency bug detection,
including C
ANARY for detecting inter-thread value-flow bugs, P
EAHEN for detecting
deadlocks, and L
OCKPICK for detecting lock misuses. It is important to note that
the effectiveness of our concurrency bug detection can be also enhanced by our
work on practical call graph construction. Our experimental results show that our
concurrency bug detectors can beat many popular open-source and commercial static
concurrency bug detectors, such as Clang Static Analyzer (CSA) and Facebook/Meta
Infer, regarding efficiency and precision. Moreover, our concurrency bug detectors
are practical and deployable, checking millions of lines of code in less than five hours,
mostly with a false positive rate lower than 30%.
Excitingly, our research prototypes for call graph construction and concurrency
bug detection have been successfully deployed in two global 500 companies. Moreover,
our concurrency bug detectors have uncovered over two hundred bugs with sixteen
CVE IDs assigned in numerous award-winning and popular open-source software,
such as the Linux kernel, Firefox, PHP, PostgreSQL, and OpenSSL. We believe that
the adoption of our techniques at complicated industrial settings and the capability
to uncover genuine bugs in the real world are strong evidence of our approaches’
effectiveness and advancement.
Post a Comment