Function Summary Analysis-Based Approach

Traditional static analysis does not meet the precision requirements of software security analysis. To achieve high-precision semantic analysis, we follow abstract interpretation and divide program execution into two layers: 1) the Symbol layer, which contains identifiers such as variables; 2) the runtime state layer (State), which abstractly models the contents a Symbol may hold during real execution as "States" and associates each Symbol with its corresponding States. During semantic analysis, we simulate the program's dynamic execution, tracking and computing changes in States to obtain a high-precision state analysis.
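
The two layers can be pictured with a small sketch. The names State, SymbolStates, assign, and join below are hypothetical and chosen only for illustration; the real data model is likely richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """One abstract runtime content a symbol may hold (hypothetical model)."""
    kind: str             # e.g. "const" or "anything"
    value: object = None  # payload for constants; None otherwise


class SymbolStates:
    """Symbol layer on top of the State layer: symbol name -> set of States."""

    def __init__(self):
        self._states = {}

    def assign(self, symbol, states):
        # An assignment replaces whatever the symbol held before.
        self._states[symbol] = set(states)

    def join(self, symbol, states):
        # At a control-flow merge the symbol may hold states from either path.
        self._states.setdefault(symbol, set()).update(states)

    def lookup(self, symbol):
        # Return a copy so callers cannot mutate the table by accident.
        return set(self._states.get(symbol, set()))


# Simulating `x = 1` followed by a branch that may set `x = 2`.
table = SymbolStates()
table.assign("x", {State("const", 1)})
table.join("x", {State("const", 2)})
print(sorted(s.value for s in table.lookup("x")))
```

Keeping a set of States per Symbol lets the simulation record every content a variable may hold, instead of collapsing to a single value.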

The bottom-up analysis can be roughly divided into three parts:

1. GIR Instruction Analysis

In GIR instruction analysis, we process the GIR instructions of the current method one by one, in control-flow order, performing state-level semantic analysis on each. The operation field of a GIR instruction identifies its operation type, and the instruction is dispatched on this field to the corresponding type-specific processing function. For example, an instruction whose operation is "assign_stmt" is dispatched to the assign_stmt_state function. Each processing function computes and propagates States according to its instruction type.
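
The dispatch step might look roughly like the sketch below. Only the operation field, "assign_stmt", and assign_stmt_state come from the description above; the dictionary layout of an instruction, the target, operand, and callee fields, and call_stmt_state are assumptions made for the example.

```python
ANYTHING = "anything"  # placeholder state for unknown values


def assign_stmt_state(stmt, states):
    """Handle `target = operand` by copying the operand's states."""
    operand = stmt["operand"]
    # Toy rule: a name with no recorded states is treated as a literal.
    states[stmt["target"]] = set(states.get(operand, {operand}))


def call_stmt_state(stmt, states):
    """Hypothetical placeholder: without a summary, the result is unknown."""
    states[stmt["target"]] = {ANYTHING}


# operation name -> type-specific processing function
HANDLERS = {
    "assign_stmt": assign_stmt_state,
    "call_stmt": call_stmt_state,
}


def analyze_instructions(stmts, states):
    """Process instructions in order, dispatching on the operation field."""
    for stmt in stmts:
        handler = HANDLERS.get(stmt["operation"])
        if handler is None:
            raise ValueError(f"unsupported operation: {stmt['operation']}")
        handler(stmt, states)
    return states


stmts = [
    {"operation": "assign_stmt", "target": "a", "operand": "1"},
    {"operation": "assign_stmt", "target": "b", "operand": "a"},
    {"operation": "call_stmt", "target": "c", "callee": "f"},
]
print(analyze_instructions(stmts, {}))
```

A lookup table keeps the dispatcher independent of the individual handlers, so supporting another operation type only requires registering one more function.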

2. Intraprocedural Analysis

In intraprocedural analysis, we perform GIR instruction analysis on each statement of a function. To avoid state explosion when analyzing large programs, and to avoid analyzing the same function repeatedly, we take a single function as the unit of analysis. After analyzing a function, we generate a state-level function summary for it; once generated, the summary does not change. The function summary stores the final states of the function's key variables as a {Symbol->States} mapping. Key variables are the function parameters, external variables, the this variable, and the return variable. For each key variable, the summary records the net change from its initial state to its final state after the function's state calculations (e.g., a field added to a parameter).
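
As a rough illustration, a summary can be built by filtering the final {Symbol->States} mapping down to the key variables and freezing it. The function build_summary, the "%return" name for the return variable, and the string encoding of states are assumptions for this sketch.

```python
from types import MappingProxyType

RETURN = "%return"  # hypothetical name for the return variable


def build_summary(final_states, params, externals):
    """Keep only key variables and freeze the result."""
    key_symbols = [*params, *externals, "this", RETURN]
    kept = {
        symbol: frozenset(final_states[symbol])
        for symbol in key_symbols
        if symbol in final_states
    }
    # MappingProxyType gives a read-only view, so the summary cannot be
    # modified through it after generation.
    return MappingProxyType(kept)


final_states = {
    "p": {"object{name}"},   # parameter that gained a field
    "tmp": {"1"},            # local variable, not part of the summary
    RETURN: {"1"},
}
summary = build_summary(final_states, params=["p"], externals=[])
print(dict(summary))
```

Local variables such as tmp are dropped because they are invisible to callers; only effects on key variables need to survive the function boundary.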

3. Interprocedural Analysis

Interprocedural analysis builds on intraprocedural analysis. When we encounter a call_stmt in which a caller invokes a callee, we first check whether the callee has already been analyzed. If it has, we only need to apply the callee's function summary to the corresponding states in the caller to fully retain the state semantics produced by the call. If it has not, we immediately suspend the analysis of the current function and analyze the callee first. Once every callee the caller depends on has been analyzed and its function summary generated, we resume the suspended analysis of the caller. Because each function is analyzed once and then reused through its summary, this approach avoids state explosion and reduces repeated analysis.
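
The callee-first order can be sketched with a summary cache and plain recursion. This is a simplified model: analyze_function, the program dictionary, and the tuple used as a stand-in summary are hypothetical, and the sketch assumes a call graph without recursion, a case the text above does not cover.

```python
def analyze_function(name, program, summaries, order):
    """Analyze `name`, analyzing not-yet-summarized callees first.

    Assumes the call graph has no cycles (no recursive functions).
    """
    if name in summaries:
        return summaries[name]
    applied = []
    for stmt in program[name]:
        if stmt["operation"] != "call_stmt":
            continue
        callee = stmt["callee"]
        if callee not in summaries:
            # Suspend the caller and analyze the callee first.
            analyze_function(callee, program, summaries, order)
        # The callee's summary now exists; a real analysis would apply it
        # to the caller's states at this point.
        applied.append(callee)
    summaries[name] = tuple(applied)  # stand-in for a real summary
    order.append(name)                # order in which summaries are created
    return summaries[name]


program = {
    "main": [
        {"operation": "call_stmt", "callee": "helper"},
        {"operation": "call_stmt", "callee": "util"},
        {"operation": "call_stmt", "callee": "helper"},
    ],
    "helper": [{"operation": "call_stmt", "callee": "util"}],
    "util": [],
}
order = []
analyze_function("main", program, {}, order)
print(order)
```

Each function appears once in order even though helper and util are called several times, which is the reuse that summaries are meant to provide.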

When applying a function summary, we first associate the caller and the callee at the Symbol level (e.g., formal parameters with actual arguments, and external variables with each other). Then, for each pair of associated Symbols, we retrieve the States held by the caller and the States' recorded in the callee's function summary, and apply States' to States so that the callee's state semantics are carried over into the caller as completely as possible.
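
A minimal sketch of this two-step application follows. It models the States of a Symbol as a mapping from field names to value sets, and it merges summarized fields into the caller by set union; both choices, like the name apply_summary, are assumptions, since the text does not specify how States' is combined with States.

```python
def apply_summary(caller_states, summary, formals, actuals):
    """Map callee symbols to caller symbols, then merge summarized states."""
    # Step 1: Symbol-level association of formal and actual parameters.
    binding = dict(zip(formals, actuals))
    for callee_symbol, summarized_fields in summary.items():
        # External variables keep their own name in this toy model.
        caller_symbol = binding.get(callee_symbol, callee_symbol)
        # Step 2: apply the callee's States' to the caller's States.
        fields = caller_states.setdefault(caller_symbol, {})
        for field_name, values in summarized_fields.items():
            fields.setdefault(field_name, set()).update(values)
    return caller_states


# Callee summary: the function adds a `name` field to its parameter `p`.
summary = {"p": {"name": {"alice"}}}
caller_states = {"obj": {"id": {"1"}}}
apply_summary(caller_states, summary, formals=["p"], actuals=["obj"])
print(caller_states)
```

After the call, obj in the caller carries both its original id field and the name field contributed by the callee.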

Note that in the bottom-up analysis a single function is the analysis boundary, so we do not consider the concrete states of the external variables and function parameters passed into the function. At this stage, we abstract these external states as "anything", meaning they may take any value. If during analysis an "anything" state flows into a sensitive operation, such as a call whose target is "anything" (call anything()), we mark that location as a key point and record it in the function summary. The subsequent global analysis then resolves the concrete states of these key "anything" values from a whole-program perspective.
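
The handling of "anything" can be sketched as follows. The function analyze_body, the instruction fields, and the shape of the returned summary are hypothetical; the sketch treats only a call through an unknown target as a sensitive operation.

```python
ANYTHING = "anything"


def analyze_body(params, stmts):
    """Analyze one function in isolation and collect key points."""
    # Parameters are unknown at the function boundary.
    states = {p: {ANYTHING} for p in params}
    key_points = []
    for index, stmt in enumerate(stmts):
        if stmt["operation"] == "assign_stmt":
            operand = stmt["operand"]
            # Toy rule: a name with no recorded states is a literal.
            states[stmt["target"]] = set(states.get(operand, {operand}))
        elif stmt["operation"] == "call_stmt":
            callee = stmt["callee"]
            if ANYTHING in states.get(callee, set()):
                # The call target is unknown: record it for later resolution.
                key_points.append((index, callee))
    return {"states": states, "key_points": key_points}


stmts = [
    {"operation": "assign_stmt", "target": "f", "operand": "cb"},
    {"operation": "call_stmt", "callee": "f"},
]
summary = analyze_body(["cb"], stmts)
print(summary["key_points"])
```

Here the parameter cb starts as "anything", flows into f through the assignment, and the call through f is recorded as a key point that a later global pass could resolve.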