An Analysis of the AddressSanitizer Algorithm and Source Code

Introduction to AddressSanitizer

AddressSanitizer is a tool developed at Google for detecting memory errors such as buffer overflows (heap buffer overflow, stack buffer overflow, global buffer overflow). It is implemented as an LLVM pass and is now integrated into LLVM; to use it, compile with the -fsanitize=address option. The source code of the instrumentation pass is located in lib/Transforms/Instrumentation/AddressSanitizer.cpp, and the source code of the run-time library is located in the lib/asan folder of compiler-rt, another LLVM project.

AddressSanitizer algorithm

For the details of the algorithm, see the wiki (https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm); what follows is a brief introduction. AddressSanitizer consists of two parts: instrumentation and a run-time library. The instrumentation handles memory access operations (store, load, alloca, etc.) at the LLVM compiler level. The run-time library provides the more complex functionality needed at run time (such as poisoning and unpoisoning shadow memory) and hooks malloc, free, and other library functions. The idea behind the algorithm is simple: to detect buffer overflows, add a zone (a RedZone) to the right end of each memory region (or to both ends, which catches both overflow and underflow), and mark the shadow memory of the RedZone as inaccessible. A schematic is shown in the figure below.

[Figure: RedZones placed around a memory region]

Memory map

AddressSanitizer works by maintaining coarse-grained shadow memory for the program's virtual memory: every 8 bytes of memory correspond to one byte of shadow memory. To reduce overhead, a direct mapping is used: Shadow = (Mem >> 3) + offset. Each shadow byte stores a number k. If k = 0, all 8 bytes of the corresponding memory can be accessed. If 0 < k < 8, only the first k bytes can be accessed. If k is negative, none of the 8 bytes can be accessed, and different negative values mark different kinds of inaccessible memory (heap redzone, stack redzone, global redzone, freed memory).
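
The mapping and the meaning of a shadow byte can be sketched in a few lines of C++. This is only an illustration: the offset 0x7fff8000 is the value commonly used on x86-64 Linux (other platforms differ), the function names are made up for this example, and the interpretation of k follows the encoding described on the wiki (0 means all 8 bytes are accessible, 1 to 7 means only the first k bytes are, a negative value means none are).

```cpp
#include <cstdint>

// Assumptions for this sketch: 8-byte granules (shift by 3) and the shadow
// offset commonly used on x86-64 Linux. Other platforms use other offsets.
constexpr unsigned kShadowScale = 3;
constexpr std::uint64_t kShadowOffset = 0x7fff8000ULL;

// Shadow = (Mem >> 3) + offset
constexpr std::uint64_t MemToShadow(std::uint64_t mem) {
  return (mem >> kShadowScale) + kShadowOffset;
}

// Decide whether the single byte at `addr` is accessible, given the shadow
// value k of the 8-byte granule that contains it.
constexpr bool ByteIsAddressable(std::uint64_t addr, std::int8_t k) {
  if (k == 0) return true;   // all 8 bytes accessible
  if (k < 0) return false;   // whole granule poisoned
  return (addr & 7) < static_cast<std::uint64_t>(k);  // only the first k bytes
}
```

Because the mapping is a shift and an add, the check costs only a few instructions per memory access, which is the point of using a direct mapping.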


Figure 1: Virtual address map


Instrumentation

To detect buffer overflows, extra redzone memory is allocated on both sides of each originally allocated region and poisoned, that is, marked inaccessible. Any access that overflows into a redzone is then reported (an access that jumps past the redzone into other valid memory is not detected). The following is an example of stack instrumentation.

Uninstrumented code:


Code after instrumentation:

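
As a rough illustration of the transformation, consider a function `void foo() { char a[8]; ... }`. The sketch below models what the instrumented frame does, using a small private shadow array instead of the real process-wide shadow memory. The layout (a 32-byte redzone, then `a`, then 24 + 32 bytes of redzone) follows the example on the AddressSanitizer wiki; the type `ToyFrame`, its method names, and the marker value 0xff are assumptions made for this example, not code from the pass.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy model of the instrumented frame of `void foo() { char a[8]; ... }`.
// Assumed layout (96 bytes):
//   [ 32-byte redzone | a[8] | 24-byte redzone | 32-byte redzone ]
constexpr std::size_t kGranule = 8;
constexpr std::size_t kFrameSize = 96;
constexpr std::size_t kOffsetOfA = 32;
constexpr std::uint8_t kPoisoned = 0xff;  // illustrative marker value

struct ToyFrame {
  // One shadow byte per 8 bytes of frame.
  std::array<std::uint8_t, kFrameSize / kGranule> shadow{};

  // Prologue: poison the whole frame, then unpoison the granule holding `a`.
  void Enter() {
    shadow.fill(kPoisoned);
    shadow[kOffsetOfA / kGranule] = 0;
  }

  // Epilogue: unpoison everything so the stack memory can be reused.
  void Leave() { shadow.fill(0); }

  // Would a one-byte access to a[index] pass the shadow check?
  bool CanAccess(std::ptrdiff_t index) const {
    const std::ptrdiff_t off = static_cast<std::ptrdiff_t>(kOffsetOfA) + index;
    if (off < 0 || off >= static_cast<std::ptrdiff_t>(kFrameSize)) return false;
    return shadow[static_cast<std::size_t>(off) / kGranule] == 0;
  }
};
```

After `Enter()`, indexes 0 to 7 pass the check, while `a[8]` and `a[-1]` land in a redzone and fail it. After `Leave()` the whole frame is accessible again.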

Dynamic runtime

The run-time library replaces malloc and free. The replacement malloc allocates extra memory for the redzones, poisons the shadow memory of the redzones, and leaves the shadow memory of the main region unpoisoned.

The replacement free poisons the entire allocated region and puts it into a quarantine queue, so that malloc will not hand it out again for some time.
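
The behaviour of the two replacements can be modelled with a toy allocator. This is a sketch under several assumptions, not the real allocator: it works on offsets into an imaginary arena, uses a fixed 16-byte redzone, never reuses memory, and the class `ToyHeap` and its methods are invented for this example. The shadow values 0xfa (heap redzone) and 0xfd (freed memory) are the ones shown in the legend of AddressSanitizer reports.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <new>
#include <unordered_map>
#include <vector>

constexpr std::size_t kGranule = 8;
constexpr std::size_t kRedzone = 16;  // assumed redzone size for this sketch
constexpr std::int8_t kHeapRedzone = static_cast<std::int8_t>(0xfa);
constexpr std::int8_t kHeapFreed = static_cast<std::int8_t>(0xfd);

class ToyHeap {
 public:
  // The whole arena starts out poisoned, so every gap is a redzone.
  ToyHeap(std::size_t arena_bytes, std::size_t max_quarantined)
      : shadow_(arena_bytes / kGranule, kHeapRedzone),
        max_quarantined_(max_quarantined) {}

  // Returns the arena offset of the user region: [redzone | user | redzone].
  std::size_t Malloc(std::size_t size) {
    const std::size_t rounded = RoundUp(size);
    const std::size_t user = next_ + kRedzone;
    if ((user + rounded + kRedzone) / kGranule > shadow_.size())
      throw std::bad_alloc();
    // Unpoison the full granules, then encode a partial last granule as k.
    for (std::size_t g = user / kGranule; g < (user + size) / kGranule; ++g)
      shadow_[g] = 0;
    if (size % kGranule != 0)
      shadow_[(user + size) / kGranule] =
          static_cast<std::int8_t>(size % kGranule);
    next_ = user + rounded + kRedzone;
    live_[user] = size;
    return user;
  }

  // Poisons the region and quarantines it. Returns false for an unknown or
  // already freed offset.
  bool Free(std::size_t user) {
    auto it = live_.find(user);
    if (it == live_.end()) return false;
    const std::size_t rounded = RoundUp(it->second);
    for (std::size_t g = user / kGranule; g < (user + rounded) / kGranule; ++g)
      shadow_[g] = kHeapFreed;
    live_.erase(it);
    quarantine_.push_back(user);
    // A real allocator would make the evicted chunk reusable at this point.
    if (quarantine_.size() > max_quarantined_) quarantine_.pop_front();
    return true;
  }

  bool IsAddressable(std::size_t off) const {
    const std::int8_t k = shadow_.at(off / kGranule);
    return k == 0 ||
           (k > 0 && (off % kGranule) < static_cast<std::size_t>(k));
  }

 private:
  static std::size_t RoundUp(std::size_t n) {
    return (n + kGranule - 1) / kGranule * kGranule;
  }

  std::vector<std::int8_t> shadow_;
  std::size_t max_quarantined_;
  std::size_t next_ = 0;
  std::unordered_map<std::size_t, std::size_t> live_;
  std::deque<std::size_t> quarantine_;
};
```

With this model, a 13-byte allocation is accessible at offsets 0 to 12 only, the byte just before it lies in the left redzone, and after `Free` the whole region is poisoned, which is how a use-after-free is caught while the chunk sits in quarantine.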

AddressSanitizer source code analysis

AddressSanitizer handles three kinds of variables: stack variables (local variables), global variables, and heap variables. Because their lifetimes differ, each kind is handled differently. The following sections walk through the logical structure of the AddressSanitizer source code for global, stack, and heap variables in turn.

Global Variable

Global variables are stored in the data section of the program. In the implementation, global variables are handled by the AddressSanitizerModule class, which inherits from LLVM's ModulePass, so let's look at the runOnModule(Module &M) method of AddressSanitizerModule. The method first performs some initialization and then calls InstrumentGlobals(), which instruments the global variables.


Figure 2: RunOnModule

InstrumentGlobals() works in two steps. First, it declares a new GlobalVariable that contains the original GlobalVariable followed by a RedZone. Then, it calls into the run-time library to poison the RedZone of the newly declared GlobalVariable. The first step is shown in Figure 3.


Figure 3: Generate a new GlobalVariable containing RedZone
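
At the source level, the effect of the first step is roughly the same as wrapping each global in a struct with a trailing padding array. The sketch below shows that idea in C++. The names `PaddedGlobal` and `RedzoneFor`, the 32-byte minimum, and the sizing rule are assumptions chosen for illustration; the pass itself works on LLVM IR and has its own sizing policy.

```cpp
#include <cstddef>

constexpr std::size_t kMinRedzone = 32;  // assumed minimum redzone size

// Hypothetical sizing rule: at least kMinRedzone bytes of redzone, plus the
// padding that makes the padded global a multiple of kMinRedzone.
constexpr std::size_t RedzoneFor(std::size_t size) {
  return kMinRedzone + (kMinRedzone - size % kMinRedzone) % kMinRedzone;
}

// The new global: the original value followed by its redzone.
template <typename T>
struct PaddedGlobal {
  T value;                              // the original global
  char redzone[RedzoneFor(sizeof(T))];  // poisoned by the run-time library
};

// Example: an int global becomes a 64-byte object (4 bytes + 60-byte redzone).
PaddedGlobal<int> g_counter = {42, {}};
```

All uses of the original global are then redirected to the `value` field, so ordinary accesses are unaffected while any access past the end of the value lands in the redzone.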

Next, consider a struct that records the start address of the GlobalVariable, the size of its data, the size of the RedZone, the name of the module, and other information for use by the run-time library. The struct has matching definitions in AddressSanitizerModule and in the run-time library:


Next, instrumentation is inserted to poison the RedZone and to poison the entire GlobalVariable.


Poisoning the RedZone and poisoning the GlobalVariable are implemented in the run-time library:

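
A minimal sketch of the redzone poisoning step, using a toy shadow array indexed from address 0. The struct `GlobalDesc` is a hypothetical stand-in for the per-global record described above (its field names are assumptions), and the function operates on a `std::vector` rather than on real shadow memory. The value 0xf9 is the one AddressSanitizer reports display for a global redzone.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kGranule = 8;
constexpr std::int8_t kGlobalRedzone = static_cast<std::int8_t>(0xf9);

// Hypothetical mirror of the record shared by the pass and the run-time library.
struct GlobalDesc {
  std::size_t beg;                // start address (toy address space, 8-byte aligned)
  std::size_t size;               // size of the original global
  std::size_t size_with_redzone;  // size including the trailing redzone
  const char* name;
  const char* module_name;
};

// Poison the redzone that follows the global. If the global does not end on a
// granule boundary, its last granule gets k = size % 8, so only the first k
// bytes of that granule stay accessible.
void PoisonRedZones(const GlobalDesc& g, std::vector<std::int8_t>& shadow) {
  const std::size_t aligned = (g.size + kGranule - 1) / kGranule * kGranule;
  for (std::size_t a = g.beg + aligned; a < g.beg + g.size_with_redzone;
       a += kGranule) {
    shadow[a / kGranule] = kGlobalRedzone;
  }
  if (g.size != aligned) {
    shadow[(g.beg + g.size) / kGranule] =
        static_cast<std::int8_t>(g.size % kGranule);
  }
}
```

For a 10-byte global at address 32 with 64 bytes in total, the first granule stays 0, the second becomes 2, and the remaining six granules are marked 0xf9.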

Stack Variable

Stack variables are stored in the stack area, so their lifetime must be tracked. When a function is called, a stack frame is created. The data in the frame gets corresponding redzones and shadow memory, and the shadow memory of the redzones is poisoned. When the function ends (by normal return or by exception), the frame is destroyed: the data and redzones are released, and the corresponding shadow memory must be unpoisoned.

For stack variables, the algorithm is implemented in the AddressSanitizer class, which inherits from LLVM's FunctionPass. The pass processes every function, and within each function every load, store, or other instruction that can access memory. Before each such instruction executes, instrumentation checks whether the memory being accessed is poisoned.

Below we look at the main instrumentation flow in the AddressSanitizer::runOnFunction(Function &F) function.


On every memory access, the shadow value is checked to see whether it is 0; if it is, the memory can be accessed. The instrumentation itself is done in the instrumentMop function.


The details are in the instrumentAddress function:

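
The check inserted before each access can be written out as ordinary C++. The sketch below follows the check described on the AddressSanitizer wiki for accesses of 1, 2, 4, or 8 bytes; the function name `AccessIsBad` is an assumption, and the shadow value is passed in as a parameter rather than loaded from shadow memory, so that the logic can be exercised on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if an aligned access of `access_size` bytes at `addr` should be
// reported, given the shadow value of the 8-byte granule that contains it.
bool AccessIsBad(std::uintptr_t addr, std::size_t access_size,
                 std::int8_t shadow_value) {
  assert(access_size == 1 || access_size == 2 || access_size == 4 ||
         access_size == 8);
  // Fast path: the whole granule is accessible.
  if (shadow_value == 0) return false;
  // Slow path: compare the last byte touched within the granule against k.
  // A negative shadow value makes this true for every access.
  const auto last_byte =
      static_cast<std::int8_t>((addr & 7) + access_size - 1);
  return last_byte >= shadow_value;
}
```

In instrumented code, a `true` result leads to a call into the run-time library that prints the error report; otherwise the original load or store runs unchanged.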

Heap Variable

Heap variables are stored in the heap area and are allocated by malloc. Most of the code for this part is in the run-time library, which first hooks the malloc library function and then supplies its own malloc with its own allocation strategy.


The allocation strategy is defined in compiler-rt/lib/asan/asan_allocator.cc, which is worth a look if you are interested.
