THESIS
2016
xiv, 117 pages : illustrations ; 30 cm
Abstract
Context sensitive points-to analysis, while significantly benefiting many static analysis
techniques, is known to be difficult to scale to large programs. To make context sensitive
points-to analysis practical to analyze contemporary software, which is commonly going up
to millions lines of code, this dissertation explores the use of encoding techniques to speedup
the computation and querying of points-to information. The first part of this dissertation
introduces a novel technique, geometric encoding, to effectively capture the redundancy
in representing a large number of calling contexts. Geometric encoding is capable of
evaluating contexts of points-to constraints in the compressed form, but incurring much
less space and time consumption compared to other compressing techniqu...[
Read more ]
Context sensitive points-to analysis, while significantly benefiting many static analysis
techniques, is known to be difficult to scale to large programs. To make context sensitive
points-to analysis practical to analyze contemporary software, which is commonly going up
to millions lines of code, this dissertation explores the use of encoding techniques to speedup
the computation and querying of points-to information. The first part of this dissertation
introduces a novel technique, geometric encoding, to effectively capture the redundancy
in representing a large number of calling contexts. Geometric encoding is capable of
evaluating contexts of points-to constraints in the compressed form, but incurring much
less space and time consumption compared to other compressing techniques such as BDD.
Based on geometric encoding, an efficient context sensitive points-to engine GeomPTA
is designed and implemented. GeomPTA can analyze large Java programs with JDK 1.6
library 7.1x – 81.9x faster than the worklist based 1-object-sensitive analysis in Paddle,
and meanwhile its precision is comparable or better than 1-object-sensitive analysis. Our
implementation of GeomPTA is now part of the official distribution of Soot, which is a
widely used framework for analyzing Java programs.
Since GeomPTA is fully context sensitive, GeomPTA can answer k-CFA points-to queries
for any constant k. This dissertation also describes a novel contexts-go-by query, where the
specified call edge is not restricted to the last k call edges ascending from the enclosing
function of the querying pointer. This new query can effectively identify 92.4% redundant
instrumentations for a call trace profiling tool and reduce its runtime overhead by 7.2%.
Both the k-CFA and contexts-go-by queries can be efficiently answered with the help of
geometric encoding.
Although GeomPTA is much faster than other points-to algorithms, analyzing large programs
still needs tens of minutes and sometimes the memory usage exceeds the capacity
of a commodity machine. Moreover, some queries, especially those require aliasing information,
cannot be answered efficiently merely with the points-to information. Therefore, the second part of this dissertation describes Pestrie, a persistent technique for compressing
and reusing points-to and aliasing information. Also, the persistent information is
structured for efficiently answering queries. Empirically, Pestrie produces 10.5x – 17.5x
smaller persistent files and is 2.9x – 123.6x faster than points-to results intersection based
approach to answer aliasing queries.
Post a Comment