Our HexVASAN sanitizer


  Payer Mathias

As software grows more complex, the number and variety of functions increase sharply. Because functions often take several arguments of the same or similar (castable) types, certain errors recur: arguments are swapped, or the wrong arguments are passed. Such errors are hard to detect both in weakly typed low-level languages (like C and C++) and in dynamically typed high-level languages (like Python, Perl, or PHP). Existing approaches rely on symbolic execution and heuristics, which makes them imprecise and limits their scalability.

We developed HexVASAN, a sanitizer that tracks the arguments passed to variadic functions (such as printf) and executes a runtime type check whenever an argument is used. HexVASAN adds metadata that records the argument types at the call site and, in the callee, checks whether values are consumed with the correct types. HexVASAN thus enforces type safety for argument flow in variadic functions, but it does not yet consider API information.
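To make the call-site/callee split concrete, the following C sketch instruments one variadic function by hand. The type tags (`vtype`), the metadata struct (`va_meta`), and the helpers `va_check` and `sum_ints` are hypothetical names for illustration. HexVASAN generates and transports its metadata through compiler instrumentation rather than an explicit parameter, so this only approximates the idea of checking each `va_arg` against the types recorded at the call site.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical type tags standing in for compiler-derived type info. */
typedef enum { VT_INT, VT_DOUBLE, VT_PTR } vtype;

/* Hypothetical call-site metadata: the static types of the variadic
 * arguments, as an instrumentation pass could record them. */
typedef struct {
    size_t count;
    const vtype *types;
} va_meta;

/* Callee-side cursor over the call-site metadata. */
typedef struct {
    const va_meta *meta;
    size_t next;
    int violations;
} va_checker;

/* Runs before each va_arg: compares the type the callee requests with
 * the type recorded at the call site. Reading past the last recorded
 * argument also counts as a violation. */
static int va_check(va_checker *c, vtype requested) {
    size_t i = c->next++;
    if (i >= c->meta->count || c->meta->types[i] != requested) {
        c->violations++;
        return 0;
    }
    return 1;
}

/* Example variadic callee that sums n ints. The metadata is passed
 * explicitly here as a simplification. Returns the violation count. */
static int sum_ints(const va_meta *meta, long *out, int n, ...) {
    va_checker chk = { meta, 0, 0 };
    long total = 0;
    va_list ap;

    assert(meta != NULL && out != NULL);
    va_start(ap, n);
    for (int i = 0; i < n; i++) {
        if (!va_check(&chk, VT_INT))
            break; /* never read an argument with the wrong type */
        total += va_arg(ap, int);
    }
    va_end(ap);
    *out = total;
    return chk.violations;
}
```

For example, a call whose metadata records `{ VT_INT, VT_DOUBLE }` but whose callee consumes two ints yields one violation, and the sketch stops before performing the mismatched read.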

We propose to build an API flow graph (AFG) that encodes all valid API interactions and their parameters. Our proposed algorithm will build the global AFG by analyzing all uses of a function across the system’s source code. To reduce the imprecision of the inferred graphs, we will leverage large test projects that provide an extensive corpus of test cases and input files for a wide variety of programs. We plan to use this corpus to infer API usage by monitoring how program state is constructed when the provided seeds and examples are executed.
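A minimal sketch of AFG construction, assuming a small hypothetical file API (`API_OPEN`, `API_READ`, `API_WRITE`, `API_CLOSE`) and call sequences that have already been extracted per object: each observed sequence contributes its consecutive call pairs as edges, and the resulting graph accepts a new sequence only if every transition was seen before. The AFG we propose would also carry parameter information, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical API under analysis; API_START is a virtual node that
 * models "first call on a fresh object". */
enum { API_START, API_OPEN, API_READ, API_WRITE, API_CLOSE, API_COUNT };

/* Minimal AFG: edge[a][b] is true if some observed use calls b
 * directly after a on the same object. */
typedef struct {
    bool edge[API_COUNT][API_COUNT];
} afg;

/* Merges one observed call sequence into the graph. */
static void afg_add_trace(afg *g, const int *trace, size_t len) {
    int prev = API_START;
    for (size_t i = 0; i < len; i++) {
        assert(trace[i] > API_START && trace[i] < API_COUNT);
        g->edge[prev][trace[i]] = true;
        prev = trace[i];
    }
}

/* Returns true if every transition in the trace appears in the AFG. */
static bool afg_accepts(const afg *g, const int *trace, size_t len) {
    int prev = API_START;
    for (size_t i = 0; i < len; i++) {
        int cur = trace[i];
        if (cur <= API_START || cur >= API_COUNT || !g->edge[prev][cur])
            return false;
        prev = cur;
    }
    return true;
}
```

Merging the observed sequences (open, read, close) and (open, write, close) yields a graph that accepts (open, write, close) but rejects a read after close, because no observed use contains that transition.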

The analysis will:

  • Infer static parameters using a combination of value propagation and backward slicing.
  • Record the static type of each argument (e.g., a constant defined in a header file).

Based on the inferred AFG, we will design an LLVM-based sanitizer that enforces both the parameter types and the concrete parameter values of function calls. For each API call, the sanitizer will update the runtime state accordingly and flag any violation. Learning benign interactions from a large body of source code allows us to infer correct API usage and to detect API misuse.
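A minimal sketch of the per-call runtime hook, assuming the inference step has already produced the transition table `afg_edge` and the set of observed values for one parameter (`open_flags_seen`); both tables, the `obj_state` struct, and `enforce_call` are hypothetical. The hook checks the transition from the object's last call and the concrete argument value, then either updates the state or flags a violation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical API; S_START marks a fresh object. */
enum { S_START, S_OPEN, S_READ, S_CLOSE, S_COUNT };

/* Hypothetical inferred AFG: allowed transitions between API calls.
 * No call is allowed after S_CLOSE. */
static const bool afg_edge[S_COUNT][S_COUNT] = {
    [S_START] = { [S_OPEN] = true },
    [S_OPEN]  = { [S_READ] = true, [S_CLOSE] = true },
    [S_READ]  = { [S_READ] = true, [S_CLOSE] = true },
};

/* Hypothetical concrete values observed for the argument of S_OPEN. */
static const long open_flags_seen[] = { 0x0, 0x1 };

typedef struct {
    int last;       /* last accepted API call on this object */
    int violations; /* number of flagged calls */
} obj_state;

static bool param_ok(int api, long arg) {
    if (api != S_OPEN)
        return true; /* only the open argument is constrained here */
    for (size_t i = 0; i < sizeof open_flags_seen / sizeof open_flags_seen[0]; i++)
        if (open_flags_seen[i] == arg)
            return true;
    return false;
}

/* Hook a sanitizer could insert before each API call. */
static bool enforce_call(obj_state *s, int api, long arg) {
    assert(api > S_START && api < S_COUNT);
    if (!afg_edge[s->last][api] || !param_ok(api, arg)) {
        s->violations++;
        return false; /* flagged; state is left unchanged */
    }
    s->last = api;
    return true;
}
```

Leaving the state unchanged on a violation is one possible design choice; a deployed sanitizer might instead report the violation and abort.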

The main challenges of this task are the precision of the AFG construction and the depth of the enforced API interactions. During enforcement, we will restrict the checks to selected library APIs. Limiting the scope increases precision, as orthogonal calls are pruned automatically. We envision extending the AFG with stateful information and leveraging it for runtime enforcement to protect against mimicry attacks.
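The scope restriction can be sketched as a filter applied before the AFG check, assuming each call event is tagged with the library of its callee (the `lib_id` values, `call_event`, and `prune_trace` are hypothetical). Calls into unmonitored libraries are dropped, so an interleaved logging or math call does not appear as a spurious transition between two file-API calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical libraries a call can belong to. */
typedef enum { LIB_LIBC_FILE, LIB_LOGGING, LIB_MATH } lib_id;

typedef struct {
    lib_id lib; /* library the callee belongs to */
    int api;    /* function id within that library */
} call_event;

/* Copies only calls into monitored libraries to `out` and returns how
 * many were kept; `out` must have room for n events. */
static size_t prune_trace(const call_event *in, size_t n,
                          const bool monitored[], call_event *out) {
    size_t kept = 0;
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++)
        if (monitored[in[i].lib])
            out[kept++] = in[i];
    return kept;
}
```

With only `LIB_LIBC_FILE` monitored, a raw trace that interleaves file calls with logging and math calls is reduced to the file calls alone, which can then be checked against the AFG.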