A recompilation pipeline


Team

  Payer Mathias


This project introduces DICER, a recompilation pipeline, performing assured recompilation of a patched binary, leveraging information extracted from the original binary by Technical Area 1 (TA1) performers. DICER can optimally recompile a patched binary (minimizing the differences with the original binary), and, at the same time, assure its functionality using multi-layer verifications.

DICER is based on the following two crucial intuitions:

  • Formally comparing the functionality of two binaries (the original binary and the patched one) becomes a tractable problem when their differences are limited
  • Binary verification must happen at different levels of abstractions.

To minimize the difference between the original binary and the patched version, we design a new intermediate representation called teIR to include both the type information from the decompiled source file and the original binary, as well as their mapping.

To formally verify the non-interference property of a given patch, we formalize the conditions for a patch to be “Safe to Apply” (StA) and prove the equivalence between StA and non-interference. DICER then uses symbolic execution to formally prove the StA property both at the IR level and the binary level.

To tackle the missing toolchain issue and reuse the existing binary as much as possible, we propose a genetic-programming-based recompiler, which can recover the best likely compiler and compilation options using machine learning. DICER then outputs the recovered toolchain, as well as the recompiled patched binary.

To handle specific challenges from embedded systems such as timing constraints and limitations from symbolic execution engine and SMT solvers, we include the timing requirement within the formalization of StA and model timing within teIR. We further extend symbolic execution engines and SMT solvers to support interrupt scheduling and non-linear operations.

The unique design of DICER lies in the multi-level formal verification and the genetic-based recompiler. We can achieve this multi-level verification bridging the high-level code and the resulted binary. The genetic-based search for compiler options is inspired by the state-of-the-art fuzzing tool AFL.

Since non-interference is a strong property, DICER might need human expertise to provide confirmations for patches. We will propose a weaker formalization of non-interference and leverage dynamic verification to provide better assurance of these patches.