Authors:
Anjali Pare and Prasad Kulkarni
Affiliation:
Electrical Engineering and Computer Science, University of Kansas, Lawrence, Kansas, U.S.A.
Keyword(s):
Reverse Engineering, Control-Flow Graphs, Disassembly.
Abstract:
The control-flow graph (CFG) is a graphical representation of a program and holds information that is critical to the correct application of many other program analysis, performance optimization, and software security algorithms. While CFG generation is a routine task for source-level tools such as the compiler, the loss of high-level program information makes accurate CFG recovery a challenging problem for binary-level software reverse engineering (SRE) tools. Earlier research shows that while advanced SRE tools can precisely reconstruct most of the CFG for a program, important gaps and inaccuracies remain that may hamper critical tasks, from vulnerability and malicious code detection to adequately securing software binaries. In this work, we perform an in-depth analysis of the control-flow graphs generated by three popular reverse engineering tools: angr, radare2, and Ghidra. We develop a unique methodology that combines manual analysis and automated scripting to understand and categorize the CFG errors over a large benchmark set. Of the several interesting observations revealed by this work, one that is particularly unexpected is that most errors in the reconstructed CFGs appear not to be intrinsic limitations of the binary-level algorithms, as currently believed, and may be eliminated simply by more robust implementations. We expect our work to lead to more accurate CFG reconstruction in SRE tools and improved precision for other algorithms that employ CFGs.