N-Bref : A High-fidelity Decompiler Exploiting Programming Structures

1 Jan 2021  ·  Cheng Fu, Kunlin Yang, Xinyun Chen, Yuandong Tian, Jishen Zhao

In software development, decompilation aims to reverse-engineer binary executables into high-level source code. With the success of neural machine translation (NMT), recent neural-based decompilers have shown promising results compared to traditional decompilers. However, key challenges remain: (i) prior neural-based decompilers focus on simplified programs and do not handle sophisticated yet widely used data types such as pointers; furthermore, many high-level expressions map to the same low-level code (expression collision), which severely degrades decompilation performance; (ii) state-of-the-art NMT models (e.g., the transformer and its variants) mainly handle sequential data and are inefficient for decompilation, where both the input and output are highly structured. In this paper, we propose N-Bref, a new framework for neural decompilers that addresses the two aforementioned challenges with two key designs: (i) N-Bref introduces a structural transformer with three components for better comprehension of structured data (an assembly encoder, an abstract syntax tree encoder, and a tree decoder), extending transformer models to the context of decompilation; (ii) N-Bref introduces a program generation tool that controls the complexity of generated code and removes expression collisions. Extensive experiments demonstrate that N-Bref outperforms previous neural-based decompilers by a large margin. In particular, N-Bref successfully reverts human-written LeetCode programs with complex library calls and data types.
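To make the "expression collision" problem concrete: distinct high-level expressions are often lowered by a compiler to identical low-level code, so the decompiler's target is inherently ambiguous. The following is a minimal, purely illustrative sketch (the `lower` function and its hard-coded mapping are hypothetical stand-ins, not the paper's tooling or a real compiler):

```python
def lower(expr: str) -> str:
    """Toy 'compiler' that canonicalizes a few equivalent C expressions
    to the same x86-style instruction (hypothetical mapping for exposition)."""
    canonical = {
        "x * 2": "shl eax, 1",   # strength reduction: multiply by 2 -> left shift
        "x << 1": "shl eax, 1",  # already a shift
        "x + x": "shl eax, 1",   # adding a value to itself
    }
    return canonical.get(expr, expr)

# Three different high-level expressions collide on one low-level form,
# so a decompiler seeing "shl eax, 1" cannot tell which source was written.
targets = {lower(e) for e in ["x * 2", "x << 1", "x + x"]}
print(len(targets))  # a single low-level target despite three distinct inputs
```

This ambiguity is why N-Bref's program generation tool is designed to remove such collisions from the training data: without a unique high-level form per low-level sequence, the learning target itself is ill-defined.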
