THESIS
2016
xiv, 147 pages : illustrations ; 30 cm
Abstract
Concurrency bug is a major stumbling block to writing multi-threaded programs. The
capability of reproducing a concurrency bug is key to comprehending and finally fixing
the bug. Unlike sequential programs, concurrent programs may behave differently even
given the same input, making concurrency bugs much harder to be reproduced than bugs
caused within a single thread. This thesis targets on the reproduction of shared-memory
concurrency bugs, together with new techniques for the bug diagnostics given the capability
of reproducing the bug.
First, we contribute Stride, a deterministic replay technique for concurrent programs, that
improves the efficiency of the recording phase, which is key to the practicability of replay
techniques. Stride records the bounded read-write linka...[
Read more ]
Concurrency bug is a major stumbling block to writing multi-threaded programs. The
capability of reproducing a concurrency bug is key to comprehending and finally fixing
the bug. Unlike sequential programs, concurrent programs may behave differently even
given the same input, making concurrency bugs much harder to be reproduced than bugs
caused within a single thread. This thesis targets on the reproduction of shared-memory
concurrency bugs, together with new techniques for the bug diagnostics given the capability
of reproducing the bug.
First, we contribute Stride, a deterministic replay technique for concurrent programs, that
improves the efficiency of the recording phase, which is key to the practicability of replay
techniques. Stride records the bounded read-write linkage instead of the exact linkage,
which eliminates synchronization requirements on read operations during the recording
phase and is still able to synthesize a legal execution in polynomial time. Comparing to
the previous state-of-the-art approaches of deterministic replay, Stride is on average 2.5
times faster.
Second, we handle concurrency bug reproduction under two special scenarios with special
requirements. One is the bug reproduction for relaxed memory model(RMM) specific
behaviors, the challenge of which is the Heisenberg effects caused by the code instrumentation incurred by bug reproduction techniques (Some RMM specific behaviors cannot
occur in the instrumented program), which leads to a poor coverage. We contribute
SC-Transformation, the first technique that enables safe instrumentation under RMMs
with the provable guarantee of full coverage, i.e., instrumentations on the program do not
prevent RMM-specific behaviors. Therefore, we can apply all kinds of instrumentations to
deterministically reproduce RMM-specific behaviors. Experiments show that, supported by
SC-Transformation, we successfully recorded all of the seventy-one RMM-specific bug traces
we tested, none of which could be captured atop naive program instrumentation. The
second special scenario is the bug reproduction on the production runs on I/O intensive programs, the challenge of which is the long running time and the large input size. We
contribute POINT, which departs from the conventional philosophy of reproducing a similar
execution by going a new direction of reproducing only the interleaving windows occurring
among a few lines of code. Being more focused on the source of concurrency bugs, POINT
requires no duplication of the user inputs and supports programs running for months.
Finally, we have developed a method that deterministically pinpoints the root cause
of concurrency bugs given the capability of reproducing the bug, which is achieved by
deterministically isolating the effects of the context-switches within the buggy execution.
Post a Comment