
### Workshop: Analysis of Virtualization-based Obfuscation

Tim Blazytko (@mr\_phrazer, tim@blazytko.to, https://synthesis.to)

binary security researcher, co-founder of emproof GmbH and former PhD student

- research: code deobfuscation, fuzzing and root cause analysis
- full-time: design and evaluation of obfuscation techniques
- freelancing: reverse engineering and trainings



- basics of VM-based obfuscation
- manual analysis
- symbolic execution to guide manual analysis
- writing an SE-based disassembler

# https://github.com/mrphrazer/r2con2021\_deobfuscation

# Virtual Machine Basics

```
mov ecx, [esp+4]
xor eax, eax
mov ebx, 1
__secret_ip:
  mov edx, eax
  add edx, ebx
  mov eax, ebx
  mov ebx, edx
  loop __secret_ip
mov eax, ebx
ret
```
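To make the running example easier to follow, here is a minimal Python sketch that mirrors the assembly loop register by register. The function name `fib_native` is made up for illustration, and the sketch assumes a non-zero argument, since `loop` decrements `ecx` before testing it.

```python
def fib_native(n):
    """Mirror of the native routine; variable names follow the x86 registers."""
    if n <= 0:
        # assumption: with ecx == 0 the `loop` instruction would wrap around
        raise ValueError("sketch only models n >= 1")
    ecx = n
    eax, ebx = 0, 1                      # xor eax, eax ; mov ebx, 1
    while True:
        edx = (eax + ebx) & 0xFFFFFFFF   # mov edx, eax ; add edx, ebx
        eax, ebx = ebx, edx              # mov eax, ebx ; mov ebx, edx
        ecx -= 1                         # loop: decrement ecx ...
        if ecx == 0:                     # ... and leave once it reaches zero
            break
    return ebx                           # mov eax, ebx ; ret


print([fib_native(n) for n in range(1, 6)])
```

This is the code that the virtual machine examples in the rest of the section protect.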




#### made-up instruction set

Figure: the protected code translated into bytecode for a made-up instruction set (virtual instructions `vld`, `vldi`, `vpop`, `vadd`, `vsub`, `veq`, `vbr0`; operands `r0` to `r3` and immediates such as `#1`, `#0`, `#-0E`).








#### Core Components

- VM Entry/Exit: context switch (native context ⇔ virtual context)
- VM Dispatcher: fetch-decode-execute loop
- Handler Table: individual VM ISA instruction semantics

VM Entry/Exit:

- Entry: copy the native context (registers, flags) to the VM context.
- Exit: copy the VM context back to the native context.
- The mapping from native to virtual registers is often 1:1.

VM Dispatcher (fetch-decode-execute loop):

1. Fetch and decode the instruction.
2. Advance the virtual instruction pointer.
3. Look up the handler for the opcode in the handler table.
4. Invoke the handler.



Handler Table:

- Table of function pointers indexed by opcode
- One handler per virtual instruction
- Each handler decodes operands and updates VM context

The fetch-decode-execute (FDE) loop looks up the handler for the current opcode in the handler table: `handle_vpush`, `handle_vadd`, `handle_vxor`, `handle_vexit`, `handle_vpop`.
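The interplay of the three components can be sketched as a toy stack VM in Python. Opcode numbers, the operand encoding and the `VMContext` layout are assumptions made for illustration, not taken from any real protector; only the handler names follow the handler table above.

```python
from dataclasses import dataclass, field


@dataclass
class VMContext:
    bytecode: bytes
    vip: int = 0                               # virtual instruction pointer
    stack: list = field(default_factory=list)  # VM-internal stack
    running: bool = True


def handle_vpush(ctx):
    # hypothetical encoding: one immediate byte follows the opcode
    ctx.stack.append(ctx.bytecode[ctx.vip])
    ctx.vip += 1


def handle_vadd(ctx):
    b, a = ctx.stack.pop(), ctx.stack.pop()
    ctx.stack.append((a + b) & 0xFFFFFFFF)


def handle_vxor(ctx):
    b, a = ctx.stack.pop(), ctx.stack.pop()
    ctx.stack.append(a ^ b)


def handle_vexit(ctx):
    ctx.running = False


# handler table: the opcode is the index into this list
HANDLER_TABLE = [handle_vpush, handle_vadd, handle_vxor, handle_vexit]


def run(bytecode):
    ctx = VMContext(bytecode)            # VM entry: set up the virtual context
    while ctx.running:
        opcode = ctx.bytecode[ctx.vip]   # 1. fetch and decode
        ctx.vip += 1                     # 2. advance the VIP
        handler = HANDLER_TABLE[opcode]  # 3. look up the handler
        handler(ctx)                     # 4. invoke the handler
    return ctx.stack.pop()               # VM exit: hand the result back


# vpush 2 ; vpush 3 ; vadd ; vpush 6 ; vxor ; vexit
result = run(bytes([0, 2, 0, 3, 1, 0, 6, 2, 3]))
print(result)  # (2 + 3) ^ 6 == 3
```

A real VM entry would additionally save native registers and flags into the context; the sketch omits this because Python has no native context to preserve.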






#### **Data Structures**

- bytecode
  - array of bytes that encodes the protected code
  - will be **interpreted** by the virtual machine
- virtual instruction pointer
  - points to the **current** instruction in the bytecode
  - incremented after each instruction by its size
- virtual stack pointer
  - points to the VM-internal top of stack (TOS)
  - modified by **vpush** and **vpop** instructions
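How the virtual instruction pointer walks the bytecode can be illustrated with a short linear disassembler. This is a sketch under assumptions: the opcode values, the mnemonics and the 32-bit little-endian immediate of `vpush` are invented for the example.

```python
import struct

# hypothetical encoding: opcode -> (mnemonic, operand size in bytes)
ISA = {
    0x00: ("vpush", 4),  # opcode byte + 32-bit little-endian immediate
    0x01: ("vadd", 0),
    0x02: ("vpop", 0),
    0x03: ("vexit", 0),
}


def disassemble(bytecode):
    """Walk the bytecode and return (vip, text) pairs."""
    vip = 0
    listing = []
    while vip < len(bytecode):
        mnemonic, operand_size = ISA[bytecode[vip]]
        if operand_size:
            (imm,) = struct.unpack_from("<I", bytecode, vip + 1)
            listing.append((vip, f"{mnemonic} {imm:#x}"))
        else:
            listing.append((vip, mnemonic))
        vip += 1 + operand_size  # advance by the size of this instruction
    return listing


bytecode = bytes.fromhex("00" "01000000" "00" "02000000" "01" "03")
listing = disassemble(bytecode)
for vip, text in listing:
    print(f"{vip:04x}: {text}")
```

Because instructions differ in size, the increment applied to the virtual instruction pointer is itself a useful fingerprint when classifying handlers.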

```
__vm_dispatcher:
mov bl, [rsi]
inc rsi
movzx rax, bl
jmp __handler_table[rax * 8]
```

VM Dispatcher

- `rsi`: virtual instruction pointer
- `rbp`: VM context

```
handle_vnor:
  mov rcx, [rbp]
  mov rbx, [rbp + 4]
  not rcx
  not rbx
  and rcx, rbx
  mov [rbp + 4], rcx
  pushf
  pop [rbp]
  jmp __vm_dispatcher
```

Handler performing **nor** (with flag side-effects)

### Instruction Handler Arguments

Instruction handlers can receive their arguments via a stack or in registers.

- stack-based architecture
  - pop arguments from stack
  - push results onto stack
  - examples: JVM, CPython, WebAssembly, ...
- register-based architecture
  - pass arguments in virtual registers
  - store results in virtual registers
  - examples: Dalvik, Lua, LLVM, ...
- hybrid architectures possible
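The difference between the two styles becomes visible when the same expression, `(2 + 3) * 4`, is encoded for each. The following Python sketch uses two made-up mini instruction sets; the mnemonics `push`, `ldi`, `add` and `mul` are assumptions for illustration only.

```python
def run_stack(program):
    """Stack-based: operands are popped from, results pushed onto, a stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()


def run_register(program, num_regs=4):
    """Register-based: every instruction names its destination and sources."""
    regs = [0] * num_regs
    for op, dst, *src in program:
        if op == "ldi":
            regs[dst] = src[0]
        elif op == "add":
            regs[dst] = regs[src[0]] + regs[src[1]]
        elif op == "mul":
            regs[dst] = regs[src[0]] * regs[src[1]]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs[0]


stack_program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
register_program = [
    ("ldi", 0, 2), ("ldi", 1, 3), ("add", 0, 0, 1),
    ("ldi", 1, 4), ("mul", 0, 0, 1),
]

print(run_stack(stack_program), run_register(register_program))
```

Stack-based handlers carry no register operands in the bytecode, so their instructions tend to be shorter, while register-based handlers must decode operand indices first.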

# Breaking Virtual Machine Obfuscation

- locate the bytecode that is interpreted by the VM
- understand the VM architecture/context
- reverse engineer the handler semantics
- reconstruct the VM control flow
- reconstruct the high-level control flow

# Manual Analysis

Get a better understanding of the VM:

- identify basic VM components and structures
- detect patterns in handlers
- recover handler semantics

### Sample

- function that implements iterative Fibonacci
- basic virtual machine protection generated with *Tigress*<sup>1</sup>
- virtual machine layout
  - stack-based virtual machine
  - virtual instruction and stack pointer
  - nested tree-based dispatching
  - 11 VM handlers

<sup>1</sup> https://tigress.wtf/

Open the sample vm\_basic.bin and start your analysis at 0x115a.

- Locate the VM dispatcher.
- Locate the bytecode.
- Identify some basic blocks that implement handlers.
- What are the functions of rdx and rcx?

Open the sample vm\_basic.bin and analyze the handler at 0x11e1.

- How does the handler fetch its argument?
- What does it do with the argument?
- What else does the handler do?

Open the sample vm\_basic.bin and analyze the handler at 0x11a9.

- How does the handler fetch its arguments?
- What does it compute?
- What else does the handler do?

Open the sample vm\_basic.bin and analyze the handler at 0x1281.

- What does the handler check?
- Why does it branch?
- What does the handler do with rdx and rax?

- `rdx` is the virtual instruction pointer, `rcx` is the virtual stack pointer
- handlers pop their arguments from the stack and push their results onto it
- handlers update the virtual instruction and stack pointers
- handler **0x11e1** loads a constant from the bytecode and pushes it onto the stack
- handler **0x11a9** implements a stack-based addition
- handler **0x1281** implements a conditional branch

# Symbolic Execution

- computer algebra system for assembly code
- symbolic summaries of instructions, basic blocks and paths
- summaries provide detailed insights and reveal patterns
  - $\Rightarrow$  supports manual VM analysis
- can be mixed with concrete values (dynamic/concolic execution)
- can automatically follow the execution flow (interactive emulator/debugger)
  - $\Rightarrow$  dynamic VM disassembler

```
handle_vnor:
 mov rcx, [rbp]
 mov rbx, [rbp + 4]
 not rcx
 not rbx
 and rcx, rbx
 mov [rbp + 4], rcx
 pushf
 pop [rbp]
 jmp __vm_dispatcher
```



Handler performing **nor** (with flag side-effects)

Executing the handler symbolically, one instruction at a time, yields:

$$\begin{array}{rcl}
rcx & \leftarrow & [rbp] \\
rbx & \leftarrow & [rbp + 4] \\
rcx & \leftarrow & \neg rcx = \neg [rbp] \\
rbx & \leftarrow & \neg rbx = \neg [rbp + 4] \\
rcx & \leftarrow & rcx \land rbx \\
    & = & (\neg [rbp]) \land (\neg [rbp + 4]) \\
    & = & [rbp] \downarrow [rbp + 4] \\
[rbp + 4] & \leftarrow & rcx = [rbp] \downarrow [rbp + 4] \\
rsp & \leftarrow & rsp - 4 \\
[rsp] & \leftarrow & flags \\
[rbp] & \leftarrow & [rsp] = flags \\
rsp & \leftarrow & rsp + 4
\end{array}$$
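The derivation for the nor handler can be reproduced with a toy symbolic executor. The sketch below is an illustration only and not how Miasm works internally: expressions are plain strings, unknown locations stand for their initial values, and only `mov`, `not` and `and` are modeled, so the flag handling (`pushf`, `pop`) is left out.

```python
def symbolic_execute(instructions):
    """Return a mapping from each written location to a symbolic expression."""
    state = {}

    def read(operand):
        # a location that was never written evaluates to its initial value
        return state.get(operand, operand)

    for mnemonic, dst, *src in instructions:
        if mnemonic == "mov":
            state[dst] = read(src[0])
        elif mnemonic == "not":
            state[dst] = f"~({read(dst)})"
        elif mnemonic == "and":
            state[dst] = f"({read(dst)} & {read(src[0])})"
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")
    return state


handle_vnor = [
    ("mov", "rcx", "[rbp]"),
    ("mov", "rbx", "[rbp + 4]"),
    ("not", "rcx"),
    ("not", "rbx"),
    ("and", "rcx", "rbx"),
    ("mov", "[rbp + 4]", "rcx"),
]

summary = symbolic_execute(handle_vnor)
print(summary["[rbp + 4]"])


def nor(a, b, mask=0xFFFFFFFF):
    """Reference definition: bitwise not of (a or b), truncated to 32 bits."""
    return ~(a | b) & mask
```

The summary for `[rbp + 4]` has the shape `~a & ~b`, which by De Morgan's law equals `nor(a, b)`; this is the pattern an analyst looks for in the much larger output of a real engine.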

# Symbolic Execution on the Binary Level

- disassemble a given code location
- lift the disassembled code into an intermediate representation
  - free of side effects (explicit formulas for implicit flag and stack pointer updates)
  - common language for various architectures (x86, arm, mips, ...)
- pre-configure the symbolic state with concrete values (for concolic execution)
- symbolically execute the code starting at a given address

### Today: Based on the *Miasm* reverse engineering framework<sup>2</sup>

<sup>2</sup> https://github.com/cea-sec/miasm

Use symbolic\_execution.py and analyze the handler at 0x11e1.

- Can you spot the virtual instruction pointer update?
- Try to locate the handler's core semantics.
- What else do you see?

Reminder: The handler loads a constant from the bytecode and pushes it onto the stack.

Use symbolic\_execution.py and analyze the handler at 0x11a9.

- Can you spot the virtual instruction pointer update?
- Try to locate the handler's core semantics.
- Try to understand how the parameters are derived.

Reminder: The handler performs a stack-based addition.

### Lessons Learned

- `RDX = RDX + 0x1`
  - increment the virtual instruction pointer by 1
- `RDX = RDX + 0x5`
  - increment the virtual instruction pointer by 5
- `@32[RCX + 0x8] = @32[RDX + 0x1]`
  - load a constant from the bytecode and store it onto the stack
- `@32[RCX + 0xFFFFFFFFFFFFFFF8] = @32[RCX] + @32[RCX + 0xFFFFFFFFFFFFFFF8]`
  - pop two values from the stack, add them and push the result onto the stack
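Large constants in such expressions are usually negative offsets in two's complement. The following Python sketch decodes them and models the addition handler on a dictionary-based memory; the helper names `to_signed`, `pretty_offset` and `vadd` are made up for this example.

```python
def to_signed(value, bits=64):
    """Interpret an unsigned `bits`-wide constant as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pretty_offset(value, bits=64):
    """Render an RCX-relative offset with an explicit sign, e.g. 'RCX - 0x8'."""
    signed = to_signed(value, bits)
    sign = "-" if signed < 0 else "+"
    return f"RCX {sign} {abs(signed):#x}"


def vadd(memory, rcx):
    """Model of @32[RCX - 8] = @32[RCX] + @32[RCX - 8], truncated to 32 bits."""
    dst = rcx + to_signed(0xFFFFFFFFFFFFFFF8)
    memory[dst] = (memory[rcx] + memory[dst]) & 0xFFFFFFFF


print(pretty_offset(0xFFFFFFFFFFFFFFF8))

memory = {0x1000: 2, 0x1008: 3}  # hypothetical stack slots, top of stack at 0x1008
vadd(memory, 0x1008)
print(memory[0x1000])
```

Read this way, the last expression simply says that the slot below the top of stack receives the sum of the two topmost values.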

# Writing an SE-based Disassembler

## Overview

- up until now: manual analysis to get a basic VM understanding
  - VM components and structures
  - basic VM layout
  - handlers and (some of) their semantics
- next step: automated VM analysis
  - goal: SE-based disassembler
  - interactive approach between manual analysis and automation

# VM Deobfuscation Automation Primer

1. build a symbolic execution engine that automatically follows the execution flow
2. start SE at the VM entry
3. each time SE stops, check why and hardcode register/memory values (bytecode, ...)
4. if SE reaches the VM exit, extend the VM executor
   - add knowledge about handlers
   - dump values
   - reconstruct the control-flow graph
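The iterative workflow in the steps above can be imitated on a toy VM. The sketch below uses plain concrete emulation instead of Miasm's symbolic execution, and all opcodes, handler functions and names such as `follow_execution` are hypothetical; it only shows the loop of running, inspecting why execution stopped, adding knowledge and re-running.

```python
def h_push(bytecode, vip, stack):
    stack.append(bytecode[vip + 1])  # one immediate byte after the opcode
    return vip + 2


def h_add(bytecode, vip, stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)
    return vip + 1


def h_exit(bytecode, vip, stack):
    return None                      # signals that the VM exit was reached


def follow_execution(bytecode, handlers, max_steps=1000):
    """Follow the execution flow and return (trace, reason for stopping)."""
    vip, stack, trace = 0, [], []
    for _ in range(max_steps):
        opcode = bytecode[vip]
        if opcode not in handlers:
            # execution stops: report where and why, so knowledge can be added
            return trace, f"stopped at vip={vip:#x}: unknown opcode {opcode:#x}"
        mnemonic, handler = handlers[opcode]
        trace.append((vip, mnemonic))
        vip = handler(bytecode, vip, stack)
        if vip is None:
            return trace, "reached VM exit"
    return trace, "step limit reached"


bytecode = bytes([0x10, 2, 0x10, 3, 0x20, 0x30])
known = {0x10: ("PUSH", h_push), 0x30: ("EXIT", h_exit)}

first_trace, first_reason = follow_execution(bytecode, known)
print(first_reason)

known[0x20] = ("ADD", h_add)         # add knowledge about the missing handler
second_trace, second_reason = follow_execution(bytecode, known)
print(second_trace, second_reason)
```

Once the run reaches the VM exit, the recorded trace is already a rudimentary dynamic disassembly that can be enriched with operands and intermediate values.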

### Modify follow\_execution\_flow.py until the symbolic execution leaves the VM.

- Execute the script and check where it stops.
- Add more and more knowledge about the VM and re-run the script.
- Use multiple concrete inputs for the VM and derive their corresponding outputs.

Modify vm\_disassembler.py and enrich the disassembler output as much as possible.

- Start with the handlers you already know.
- Reverse engineer additional handlers and improve the disassembler output.
- If possible, dump intermediate values and add them to the output.

Hint: The handlers executed before conditional jumps are comparisons.

Run vm\_disassembler\_final.py. Try to reconstruct the underlying algorithm.

- Have a look at the disassembly. Can you identify patterns?
- Try to simplify the disassembly. Can you omit certain instructions?
- Can you rewrite multiple instructions in shorter sequences?
- Try to map the VM disassembly to the original code.

Hint: The underlying algorithm implements an iterative Fibonacci calculation.

- `goto` can be omitted
- `PUSH 0x0 ; PUSHPTR var_0x4 ; POPTOVAR`
  - `var_0x4 := 0`
- `PUSHPTR var_0x8 ; PUSHFROMVAR ; PUSHPTR var_0x4 ; PUSHFROMVAR ; ADD ; PUSHPTR var_0xc ; POPTOVAR`
  - `var_0xc := var_0x8 + var_0x4`
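Such rewrites can be automated with simple pattern matching over the disassembly. The following Python sketch encodes the two patterns above as regular expressions; it assumes that the final `PUSHPTR` of the addition pattern names the destination variable (`var_0xc`), and the function name `simplify` is made up for illustration.

```python
import re

RULES = [
    # PUSHPTR a ; PUSHFROMVAR ; PUSHPTR b ; PUSHFROMVAR ; ADD ; PUSHPTR c ; POPTOVAR
    (
        re.compile(
            r"\bPUSHPTR (\w+) ; PUSHFROMVAR ; PUSHPTR (\w+) ; PUSHFROMVAR ; "
            r"ADD ; PUSHPTR (\w+) ; POPTOVAR"
        ),
        r"\3 := \1 + \2",
    ),
    # PUSH c ; PUSHPTR v ; POPTOVAR
    (re.compile(r"\bPUSH (\w+) ; PUSHPTR (\w+) ; POPTOVAR"), r"\2 := \1"),
]


def simplify(disassembly):
    """Collapse known instruction sequences into assignments."""
    text = " ; ".join(disassembly)
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.split(" ; ")


disassembly = [
    "PUSH 0x0", "PUSHPTR var_0x4", "POPTOVAR",
    "PUSHPTR var_0x8", "PUSHFROMVAR", "PUSHPTR var_0x4", "PUSHFROMVAR",
    "ADD", "PUSHPTR var_0xc", "POPTOVAR",
]

simplified = simplify(disassembly)
for line in simplified:
    print(line)
```

Longer patterns are applied first so that a shorter rule cannot consume part of a longer sequence.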

# Conclusion

- VM analysis can be time-consuming
  - mixture of manual analysis and automation
  - automation can be cumbersome to implement (API calls, external data, ...)
- far more advanced VMs exist, but the approach stays the same

# Conclusion

Today:

- manual analysis of a VM
- writing an SE-based disassembler
- reconstruction of VM disassembly
- slides, code and samples:
  https://github.com/mrphrazer/r2con2021\_deobfuscation

Reach out for questions or discussions:

- @mr\_phrazer
- https://synthesis.to

Thank you very much for your active participation!