THESIS
2023
1 online resource (xiii, 179 pages) : illustrations (some color)
Abstract
Software reverse engineering is the process of converting incomprehensible binary software
into a human-readable or analysis-friendly high-level representation. In many security-critical
tasks, the source code is usually unavailable, which hinders automatic analysis and
makes further fixing or hardening impossible. To enable software analysis in
such scenarios, software reverse engineering methodologies have been summarized and
developed to extract the essence of a program while discarding machine-specific details.
After decades of continuous development and the investment of enormous resources,
perfect reverse engineering is still beyond reach due to the complexity of binaries. This
thesis presents our work on systematically testing and evaluating modern reverse engineering
tools. Our work provides a deep and timely understanding of these tools, pointing
out existing problems and possible future directions. Based on these observations,
we also extend reverse engineering techniques to the emerging field
of DNN executables.
Our first work tests decompilation correctness to present an up-to-date understanding
of modern C decompilers. Our decompiler testing framework identifies 13 bugs in two
open-source decompilers. Moreover, our findings reveal that modern decompilers are
making promising progress in functional correctness and have been underestimated by
researchers. Nevertheless, we show that some tasks that have been studied for years in
academia, such as variable recovery and type inference, still impede C decompilers from
generating high-quality output.
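
The abstract does not describe how the testing framework works, so the following is only a minimal sketch of one common way to check functional correctness: run the original program and the decompiled-then-recompiled program on the same inputs and report any disagreement. The names `find_mismatches`, `original_abs`, and `decompiled_abs_buggy` are hypothetical, and plain Python functions stand in for the two executables.

```python
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    original: Callable[[int], int],
    recompiled: Callable[[int], int],
    inputs: Iterable[int],
) -> List[Tuple[int, int, int]]:
    """Return (input, expected, actual) for every input where the outputs differ."""
    mismatches = []
    for x in inputs:
        expected = original(x)
        actual = recompiled(x)
        if expected != actual:
            mismatches.append((x, expected, actual))
    return mismatches


# Hypothetical stand-in for the original program.
def original_abs(x: int) -> int:
    return x if x >= 0 else -x


# Hypothetical stand-in for decompiled output containing a simulated bug:
# the signed comparison was recovered as an unsigned 32-bit comparison,
# so the "negative" branch is never taken.
def decompiled_abs_buggy(x: int) -> int:
    return x if (x & 0xFFFFFFFF) >= 0 else -x


report = find_mismatches(original_abs, decompiled_abs_buggy, [-2, 0, 3])
```

An empty report on a given input set only shows agreement on those inputs; it does not prove that the decompiled code is equivalent to the original.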
Our second work conducts an in-depth study of binary lifters from the “expressiveness”
perspective. We demystify binary lifters and reveal how well the lifters’ output
can support security-critical downstream applications. We study four popular static and
dynamic LLVM IR lifters developed by industry or academia, using a total
of 252,063 executables generated across compilers, optimizations, and architectures.
Our findings show that such binary lifters are suitable for common similarity- or code
comprehension-based security analysis (e.g., binary diffing). However, the lifted IR code
appears unsuited to rigorous static analysis (e.g., pointer analysis). We summarize our
findings and suggest the correct use and further enhancement of binary lifters.
Our third work presents BTD (Bin to DNN), a decompiler for deep neural network
(DNN) executables. BTD takes DNN executables and recovers DNN model specifications,
including DNN operators, network topology, dimensions, and parameters. BTD delivers
a practical framework to process DNN executables compiled by different deep learning
compilers and with full optimizations enabled on x86 platforms.
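
To make the notion of a recovered model specification concrete, here is a minimal sketch of a data structure holding the four items listed above: operators, topology, dimensions, and parameters. It is an illustration under stated assumptions, not BTD's actual output format; the class names, field names, and example layers are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Operator:
    name: str                       # unique identifier of this operator
    kind: str                       # operator type, e.g. "conv2d" or "relu"
    output_shape: Tuple[int, ...]   # dimensions of the operator's output
    inputs: List[str] = field(default_factory=list)  # producers (topology)
    # Parameter name -> shape; actual weight values are omitted in this sketch.
    params: Dict[str, Tuple[int, ...]] = field(default_factory=dict)


@dataclass
class ModelSpec:
    operators: List[Operator]  # assumed to be listed in execution order

    def check_topology(self) -> bool:
        """True if every operator only consumes outputs of earlier operators."""
        seen = set()
        for op in self.operators:
            if any(src not in seen for src in op.inputs):
                return False
            seen.add(op.name)
        return True

    def parameter_count(self) -> int:
        """Total number of scalar parameters across all operators."""
        return sum(
            math.prod(shape)
            for op in self.operators
            for shape in op.params.values()
        )


# Hypothetical two-operator model: a 3x3 convolution followed by ReLU.
spec = ModelSpec(operators=[
    Operator("conv1", "conv2d", (1, 16, 32, 32),
             params={"weight": (16, 3, 3, 3), "bias": (16,)}),
    Operator("relu1", "relu", (1, 16, 32, 32), inputs=["conv1"]),
])
```

Here `spec.check_topology()` confirms that `relu1` consumes only an operator defined before it, and `spec.parameter_count()` sums 16·3·3·3 weights and 16 biases.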
Finally, in our fourth work, we investigate the possibility of reverse engineering compiled
DNN executables in a remote scenario without physical access to the victim device.
Existing methods are limited and can hardly be extended to recover model architectures
from DNN executables. Nevertheless, with in-depth analysis, we discover that cache-aware
optimizations employed by DL compilers result in distinguishable DNN operator
cache access traces and make model architecture recovery possible. In this work, we propose
DEEPCACHE, a novel end-to-end, machine learning-based attack to reverse engineer complex
DNN model architectures deployed in the cloud.