Targeting File Parsers with S2E and Kaitai Struct

Posted by Editor on 2017-11-13

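Introduction

Recently I have been working with file parsers in S2E. This usually involves calling s2ecmd symbfile on a file to make the parser's input symbolic, and then running S2E to explore the different paths through the parser. However, this is a heavy-handed approach: it turns the entire input file into one very large symbolic blob, which quickly leads to path explosion. In addition, we may only be interested in exploring paths that exercise specific functionality.

So how can we achieve more targeted symbolic execution of file-based programs such as parsers? One approach is to write a custom S2E plugin that handles the onSymbolicVariableCreation event and intercepts s2ecmd symbfile. You can then write C++ code that iterates over the symbolic data and selectively concretizes it. The drawbacks of this approach are obvious: writing C++ code is time-consuming and error-prone; it requires knowledge of the input file format; and it has to be rewritten for every different file type. Can we do better?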

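Kaitai Struct

Let's put S2E aside for a moment and look at Kaitai Struct.

Kaitai Struct is a tool for developing parsers for binary structures. It provides a YAML-like language for concisely describing a binary structure. The Kaitai Struct compiler (ksc) then generates a parser from this definition. The parser can be generated in a number of languages, including C++, Python and Java.

Below is a partial definition of the ELF file format in Kaitai Struct (taken from the format gallery). It consists of a number of "attributes" (for example, the magic and abi_version fields) that describe an ELF file: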

meta:
  id: elf
  title: Executable and Linkable Format
  application: SVR4 ABI and up, many *nix systems
  license: CC0-1.0
  ks-version: 0.8
seq:
  # e_ident[EI_MAG0]..e[EI_MAG3]
  - id: magic
    size: 4
    contents: [0x7f, "ELF"]
  # e_ident[EI_CLASS]
  - id: bits
    type: u1
    enum: bits
  # e_ident[EI_DATA]
  - id: endian
    type: u1
    enum: endian
  # e_ident[EI_VERSION]
  - id: ei_version
    type: u1
  # e_ident[EI_OSABI]
  - id: abi
    type: u1
    enum: os_abi
  - id: abi_version
    type: u1
  - id: pad
    size: 7
  - id: header
    type: endian_elf

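I strongly recommend reading the Kaitai Struct documentation to get the most out of this post, because I skip most of the details (mainly because I am no expert myself). However, one feature worth mentioning is the "processing spec".

A processing spec lets you apply a custom function that "processes" an attribute in some way. For example, an attribute may be encrypted or encoded, and a processing spec can decrypt or decode it at run time.

What does this have to do with symbolic execution? Suppose we had an s2e_make_symbolic processing spec. By applying this spec to specific attributes, we would make only those parts of the input file symbolic. This gives us finer control over S2E's state space and may reduce the path explosion problem. All we need to do is combine S2E and Kaitai Struct!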
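To make the idea concrete, here is a minimal Python sketch of a parser that walks a list of fields and calls a per-field processing hook. It is neither S2E nor Kaitai Struct code: the walk_fields helper, the field table and the recording hook are all hypothetical names chosen for illustration. In the real setup the hook would ask S2E to make the bytes symbolic; here it only records which byte range would be affected.

```python
# Toy model of per-attribute processing hooks. Nothing here is S2E or
# Kaitai Struct API; every name is hypothetical.

def walk_fields(data, fields, process_hooks):
    """Read fields sequentially and call a hook for each field that has one.

    fields is a list of (name, size) pairs. process_hooks maps a field name
    to a callable taking (name, offset, raw_bytes).
    """
    offset = 0
    parsed = {}
    for name, size in fields:
        raw = data[offset:offset + size]
        if len(raw) != size:
            raise ValueError(f"truncated input while reading {name!r}")
        hook = process_hooks.get(name)
        if hook is not None:
            hook(name, offset, raw)
        parsed[name] = raw
        offset += size
    return parsed


# The ELF identification bytes, following the definition above
E_IDENT_FIELDS = [
    ("magic", 4), ("bits", 1), ("endian", 1), ("ei_version", 1),
    ("abi", 1), ("abi_version", 1), ("pad", 7),
]

symbolic_ranges = []


def record_symbolic(name, offset, raw):
    # Stand-in for "make these bytes symbolic"
    symbolic_ranges.append((name, offset, len(raw)))


e_ident = b"\x7fELF" + bytes([1, 1, 1, 0, 0]) + bytes(7)
walk_fields(e_ident, E_IDENT_FIELDS, {"abi": record_symbolic})

symbolic_bytes = sum(size for _, _, size in symbolic_ranges)
print(symbolic_ranges)
print(f"{symbolic_bytes} of {len(e_ident)} bytes would be symbolic")
```

Only the field that carries a hook is touched; every other byte stays concrete, which is the property we want from the s2e_make_symbolic processing spec.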

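Combining S2E and Kaitai Struct

We will use the Lua programming language to combine S2E and Kaitai Struct. Lua lets us reuse existing components: S2E contains an embedded Lua interpreter (used to parse the S2E configuration file and to write function/instruction annotations), and ksc can generate Lua parsers. We can therefore use ksc to generate a Lua parser for our input file and embed that parser in the S2E configuration file, making it accessible to S2E. (We could have used ksc to generate a C++ parser, but then we would have to recompile S2E every time we wanted to use a different file format.) By selectively applying the s2e_make_symbolic processing spec in the format definition, we can achieve more targeted symbolic execution.

The rest of this post walks through how to combine S2E and Kaitai Struct. I will use the ELF definition discussed earlier, together with readelf, as a running example.

To make the code easier for others to use, I have tried to keep it as self-contained as possible: no changes are made to S2E's core engine or to ksc. However, this means the code is largely unoptimized! The code consists of the following components:

A command-line tool (s2e_kaitai_cmd) that runs in the guest operating system. This tool reads the input file and invokes an S2E plugin to selectively make the file symbolic;

An S2E plugin (KaitaiStruct) that calls Lua code to run the parser generated by ksc;

A small piece of Lua code that glues the S2E configuration file to the parser generated by ksc.

Each of these components is described below. The complete code is available here.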

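The s2e_kaitai_cmd tool

At the start of this post I mentioned that we would normally use s2ecmd symbfile to make an input file symbolic. The symbfile command makes the input file symbolic by:

1. Opening the input file in read/write mode

2. Reading the input file into a buffer

3. Calling s2e_make_concolic on the buffer

4. Writing the (now symbolic) buffer back to the original input file

We will take a similar approach, except that we replace step 3 with:

3. Calling the KaitaiStruct plugin to selectively make the buffer symbolic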
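The four steps can be modeled outside of S2E with a short Python sketch in which step 3 is a pluggable callable. This is only an illustration of the control flow: symbolize_file and its make_symbolic argument are hypothetical names, and nothing actually becomes symbolic, because that requires running inside an S2E guest.

```python
import os
import tempfile


def symbolize_file(path, make_symbolic):
    """Mimic the four symbfile steps with a pluggable step 3.

    make_symbolic receives a bytearray and may modify it in place. Inside an
    S2E guest this is where s2e_make_concolic, or the KaitaiStruct plugin,
    would be invoked; here it is just a Python callable (an assumption).
    """
    with open(path, "r+b") as f:        # 1. open in read/write mode
        buffer = bytearray(f.read())    # 2. read the file into a buffer
        make_symbolic(buffer)           # 3. make (part of) it symbolic
        f.seek(0)
        f.write(buffer)                 # 4. write the buffer back
    return len(buffer)


# Demo on a throwaway file; the hook only records the buffer size
calls = []
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x7fELF")
size = symbolize_file(tmp.name, lambda buf: calls.append(len(buf)))
with open(tmp.name, "rb") as f:
    contents = f.read()
os.remove(tmp.name)
print(size, calls, contents)
```

Keeping step 3 behind a callable mirrors the design in this post: the surrounding file handling stays the same, and only the "make symbolic" step changes.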

To do this, we add the following directory and files to the S2E environment:

source/s2e/guest/common/s2e_kaitai_cmd/s2e_kaitai_cmd.c

source/s2e/guest/common/include/s2e/kaitai/commands.h

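I will skip steps 1, 2 and 4 because they are already implemented in s2ecmd. For step 3, we write a custom S2E command that invokes a plugin (described later) to selectively make the input file symbolic. The command structure goes in source/s2e/guest/common/include/s2e/kaitai/commands.h. It follows the standard approach for invoking S2E plugins from the guest: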

#include <stdint.h>

enum S2E_KAITAI_COMMANDS {
    KAITAI_MAKE_SYMBOLIC,
};

struct S2E_KAITAI_COMMAND_MAKE_SYMBOLIC {
    // Pointer to guest memory where the symbolic file has been loaded
    uint64_t InputFile;
    // Size of the input file (in bytes)
    uint64_t FileSize;
    // 1 on success, 0 on failure
    uint64_t Result;
} __attribute__((packed));

struct S2E_KAITAI_COMMAND {
    enum S2E_KAITAI_COMMANDS Command;
    union {
        struct S2E_KAITAI_COMMAND_MAKE_SYMBOLIC MakeSymbolic;
    };
} __attribute__((packed));
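Because the command structure crosses the guest/host boundary, both sides must agree on its exact byte layout, which is what the packed attribute is for. The following Python sketch uses the standard struct module to show the effect. It assumes the compiler stores the enum as a 4-byte integer and that the guest is little-endian; neither assumption comes from the original post.

```python
import struct

# Packed layout: a 4-byte enum followed by three 64-bit integers.
# Assumptions: the enum is stored as a 4-byte int, and the guest is
# little-endian ("<" also disables alignment padding).
PACKED = "<IQQQ"
# The same fields with the platform's native alignment, for comparison
NATIVE = "@IQQQ"

packed_size = struct.calcsize(PACKED)
native_size = struct.calcsize(NATIVE)
print("packed size:", packed_size)
print("native size:", native_size)  # platform dependent, may include padding

KAITAI_MAKE_SYMBOLIC = 0  # first (and only) enumerator
raw = struct.pack(PACKED, KAITAI_MAKE_SYMBOLIC, 0x7FFF0000, 4096, 0)
command, input_file, file_size, result = struct.unpack(PACKED, raw)
print(command, hex(input_file), file_size, result)
```

Under these assumptions the packed structure is 28 bytes. Without packing, many 64-bit platforms insert padding after the enum, so a size check on the receiving side would reject the command.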

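We can then add the following function to s2e_kaitai_cmd.c. It takes a pointer to the file contents (already read into a buffer) and the size of the buffer (determined with lseek), constructs the corresponding command and sends it to S2E.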

static inline int s2e_kaitai_make_symbolic(const uint8_t *buffer, unsigned size) {
    struct S2E_KAITAI_COMMAND cmd = {0};

    // Must match the enumerator declared in commands.h
    cmd.Command = KAITAI_MAKE_SYMBOLIC;
    cmd.MakeSymbolic.InputFile = (uintptr_t) buffer;
    cmd.MakeSymbolic.FileSize = size;
    cmd.MakeSymbolic.Result = 0;

    s2e_invoke_plugin("KaitaiStruct", &cmd, sizeof(cmd));

    return (int) cmd.MakeSymbolic.Result;
}

Now we need an S2E plugin to handle this command.

The KaitaiStruct plugin

Let's start with a skeleton plugin (don't forget to add s2e/Plugins/KaitaiStruct.cpp to the add_library command in source/s2e/libs2eplugins/src/CMakeLists.txt).

The header file:

#ifndef S2E_PLUGINS_KAITAI_STRUCT_H
#define S2E_PLUGINS_KAITAI_STRUCT_H

#include <string>

#include /* the S2E header declaring BaseInstructionsPluginInvokerInterface (header name missing in the source) */

// Forward declare the S2E command from s2e_kaitai_cmd
struct S2E_KAITAI_COMMAND;

namespace s2e {
namespace plugins {

// In addition to extending the basic Plugin class, we must also implement the
// BaseInstructionsPluginInvokerInterface to handle custom S2E commands
class KaitaiStruct : public Plugin,
                     public BaseInstructionsPluginInvokerInterface {
    S2E_PLUGIN

public:
    KaitaiStruct(S2E *s2e) : Plugin(s2e) { }

    void initialize();

    // The method from BaseInstructionsPluginInvokerInterface that we must
    // implement to respond to a custom command. This method takes the current
    // S2E state, a pointer to the custom command object and the size of the
    // custom command object
    virtual void handleOpcodeInvocation(S2EExecutionState *state,
                                        uint64_t guestDataPtr,
                                        uint64_t guestDataSize);

private:
    // The name of the Lua function that will run the Kaitai Struct parser
    std::string m_kaitaiParserFunc;

    // handleOpcodeInvocation will call this method to actually invoke the Lua
    // function
    bool handleMakeSymbolic(S2EExecutionState *state,
                            const S2E_KAITAI_COMMAND &command);
};

} // namespace plugins
} // namespace s2e

#endif

The cpp file:

// From source/s2e/guest/common/include
#include <s2e/kaitai/commands.h>

#include "KaitaiStruct.h"

namespace s2e {
namespace plugins {

S2E_DEFINE_PLUGIN(KaitaiStruct,
                  "Combine S2E and Kaitai Struct",
                  "",
                  // Dependencies
                  "LuaBindings"); // Reuse the existing Lua binding code from
                                  // the function/instruction annotation
                                  // plugins

void KaitaiStruct::initialize() {
    m_kaitaiParserFunc = s2e()->getConfig()->getString(getConfigKey() +
                                                       ".parser");
}

bool KaitaiStruct::handleMakeSymbolic(S2EExecutionState *state,
                                      const S2E_KAITAI_COMMAND &command) {
    // We'll finish this later
    return true;
}

void KaitaiStruct::handleOpcodeInvocation(S2EExecutionState *state,
                                          uint64_t guestDataPtr,
                                          uint64_t guestDataSize) {
    S2E_KAITAI_COMMAND cmd;

    // 1. Validate the received command
    if (guestDataSize != sizeof(cmd)) {
        getWarningsStream(state) << "S2E_KAITAI_COMMAND: Mismatched command "
                                 << "structure size " << guestDataSize << "\n";
        exit(1);
    }

    // 2. Read the command
    if (!state->mem()->readMemoryConcrete(guestDataPtr, &cmd, guestDataSize)) {
        getWarningsStream(state) << "S2E_KAITAI_COMMAND: Failed to read "
                                 << "command\n";
        exit(1);
    }

    // 3. Handle the command
    switch (cmd.Command) {
    case KAITAI_MAKE_SYMBOLIC: {
        bool success = handleMakeSymbolic(state, cmd);
        // 1 on success, 0 on failure (as documented in commands.h)
        cmd.MakeSymbolic.Result = success ? 1 : 0;

        // Write the result back to the guest
        if (!state->mem()->writeMemory(guestDataPtr, cmd)) {
            getWarningsStream(state) << "S2E_KAITAI_COMMAND: Failed to "
                                     << " write result to guest\n";
            exit(1);
        }
    } break;

    default: {
        getWarningsStream(state) << "S2E_KAITAI_COMMAND: Invalid command "
                                 << hexval(cmd.Command) << "\n";
        exit(1);
    }
    }
}

} // namespace plugins
} // namespace s2e

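Our plugin has only one dependency: the LuaBindings plugin. This plugin configures S2E's Lua interpreter and allows us to call Lua code defined in the S2E configuration file.

The handleOpcodeInvocation method follows the same pattern as other plugins that implement the BaseInstructionsPluginInvokerInterface (for example, FunctionModels and LinuxMonitor):

1. Validate the received command by checking its size.

2. Read the command. Because the command is issued by the guest, it resides in guest memory. Nothing in our command is symbolic (remember that it only contains the input file's start address and size), so we can read this memory concretely.

3. Handle the command. In this case we call another function (discussed shortly) that invokes the Lua interpreter to parse the input file.

4. Report success or failure to the guest. We do this by setting the "return value" in the command structure and writing the command back to guest memory.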
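The same validate, read, handle and write-back sequence can be modeled in a few lines of Python, with a bytearray standing in for guest memory. This is a sketch under stated assumptions rather than plugin code: the "<IQQQ" layout (4-byte enum, little-endian, packed) and the handle_invocation helper are hypothetical.

```python
import struct

# Assumed layout of S2E_KAITAI_COMMAND: 4-byte enum, then InputFile,
# FileSize and Result as 64-bit integers, little-endian, no padding.
COMMAND_FORMAT = "<IQQQ"
COMMAND_SIZE = struct.calcsize(COMMAND_FORMAT)
KAITAI_MAKE_SYMBOLIC = 0


def handle_invocation(guest_memory, ptr, size, make_symbolic):
    """Model of the four steps, with a bytearray as guest memory."""
    # 1. Validate the received command by checking its size
    if size != COMMAND_SIZE:
        raise ValueError(f"mismatched command structure size {size}")

    # 2. Read the (concrete) command from guest memory
    raw = bytes(guest_memory[ptr:ptr + size])
    if len(raw) != size:
        raise ValueError("failed to read command")
    command, input_file, file_size, _ = struct.unpack(COMMAND_FORMAT, raw)

    # 3. Handle the command
    if command != KAITAI_MAKE_SYMBOLIC:
        raise ValueError(f"invalid command {command:#x}")
    success = make_symbolic(input_file, file_size)

    # 4. Report the result by writing the command back (1 = success)
    struct.pack_into(COMMAND_FORMAT, guest_memory, ptr, command, input_file,
                     file_size, 1 if success else 0)


# Demo: place a command at offset 16 of a fake guest memory region
memory = bytearray(64)
struct.pack_into(COMMAND_FORMAT, memory, 16, KAITAI_MAKE_SYMBOLIC, 0x1000, 52, 0)

seen = []


def fake_make_symbolic(addr, size):
    seen.append((addr, size))
    return True


handle_invocation(memory, 16, COMMAND_SIZE, fake_make_symbolic)
result = struct.unpack_from(COMMAND_FORMAT, memory, 16)[3]
print(seen, result)
```

The guest-side helper then only has to read the Result field back out of the structure it passed in.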

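Finally, we implement handleMakeSymbolic. Because it calls into Lua, we need to include some additional headers:

#include /* headers for the Lua code (names missing in the source) */

The final implementation of the function: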

bool KaitaiStruct::handleMakeSymbolic(S2EExecutionState *state,
                                      const S2E_KAITAI_COMMAND &command) {
    uint64_t addr = command.MakeSymbolic.InputFile;
    uint64_t size = command.MakeSymbolic.FileSize;
    std::vector<uint8_t> data(size);

    // Read the input file's contents from guest memory
    if (!state->mem()->readMemoryConcrete(addr, data.data(),
                                          sizeof(uint8_t) * size)) {
        return false;
    }

    // Get the Lua interpreter's state
    lua_State *L = s2e()->getConfig()->getState();

    // Wrap the current S2E execution state
    LuaS2EExecutionState luaS2EState(state);

    // Turn the input file into a Lua string
    luaL_Buffer luaBuff;
    luaL_buffinit(L, &luaBuff);
    luaL_addlstring(&luaBuff, (char*) data.data(), sizeof(uint8_t) * size);

    // Set up our function call on Lua's virtual stack
    lua_getglobal(L, m_kaitaiParserFunc.c_str());
    Lunar<LuaS2EExecutionState>::push(L, &luaS2EState);
    lua_pushinteger(L, addr);
    luaL_pushresult(&luaBuff);

    // Call our Kaitai Struct parser function
    lua_call(L, 3, 0);

    return true;
}

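Hopefully this is fairly easy to follow (see here for more information on Lua's C API). First, we read the input file into a Lua string for the Kaitai Struct parser. Then we call the Kaitai Struct parser function (which we define in the next section).

We must set up the parser function's arguments before we can call it. Values are passed to Lua functions on a stack. The function itself is pushed first. The parser function is defined in Lua's global namespace (for simplicity), so we can use lua_getglobal to retrieve it from the S2E configuration file and push it onto the stack. We then push, in order:

The current S2E execution state;

The input file's start address (in guest memory);

The input file's contents (as a string).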
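The push order matters because lua_call finds the function directly below its arguments. The toy Python model below imitates that convention with a plain list as the stack. It is not the Lua C API: this lua_call is a hypothetical stand-in that discards results, matching a call with zero expected results.

```python
def lua_call(stack, nargs):
    """Pop nargs arguments and the function below them, then call it.

    Mirrors the convention described above: the function is pushed first,
    followed by its arguments in order. Results are discarded.
    """
    if len(stack) < nargs + 1:
        raise IndexError("not enough values on the stack")
    func_index = len(stack) - nargs - 1
    func = stack[func_index]
    args = stack[func_index + 1:]
    del stack[func_index:]
    func(*args)


calls = []


def make_symbolic_elf(state, start_addr, buffer):
    calls.append((state, start_addr, len(buffer)))


stack = []
stack.append(make_symbolic_elf)  # like lua_getglobal: the function goes first
stack.append("state-0")          # the wrapped S2E execution state
stack.append(0x1000)             # the input file's start address
stack.append(b"\x7fELF")         # the input file's contents
lua_call(stack, 3)
print(calls, stack)
```

After the call the function and its three arguments have been consumed, so the stack is back to its previous height.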

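All that remains is to implement this parser in the S2E configuration file.

The Lua script

First, we need to compile the Kaitai Struct format definition into a Lua parser. Since we are experimenting with readelf, let's create a readelf project and fetch the ELF definition from the Kaitai Struct gallery: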

# Create the S2E project
s2e new_project -n readelf_kaitai readelf -h @@
cd projects/readelf_kaitai

# Get the ELF Kaitai Struct definition and compile it
wget https://raw.githubusercontent.com/kaitai-io/kaitai_struct_formats/master/executable/elf.ksy
ksc -t lua elf.ksy

This produces elf.lua. Let's test it on an ELF testcase from AFL. If you don't already have it installed, you will also need Kaitai Struct's Lua runtime:

# Get Kaitai Struct's Lua runtime
git clone https://github.com/kaitai-io/kaitai_struct_lua_runtime lua_runtime

# Get the ELF testcase
wget https://raw.githubusercontent.com/mirrorer/afl/master/testcases/others/elf/small_exec.elf

# Parse the testcase
lua5.3 - << EOF
package.path = package.path .. ";./lua_runtime/?.lua"
require("elf")

inp = assert(io.open("small_exec.elf", "rb"))
testcase = Elf(KaitaiStream(inp))

print("testcase e_ehsize: " .. testcase.header.e_ehsize)
EOF

You should see a header size of 52 bytes (you can confirm this by running readelf -h small_exec.elf).
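As an independent cross-check that does not rely on the generated parser, e_ehsize can also be read by hand. The Python sketch below uses the field offsets from the ELF specification (40 for 32-bit files, 52 for 64-bit files). The read_e_ehsize helper is a name introduced here, and the header it is tested on is synthetic rather than small_exec.elf.

```python
import struct


def read_e_ehsize(data):
    """Return e_ehsize from the beginning of an ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")

    ei_class, ei_data = data[4], data[5]
    if ei_class == 1:        # ELFCLASS32
        offset = 40
    elif ei_class == 2:      # ELFCLASS64
        offset = 52
    else:
        raise ValueError(f"unknown EI_CLASS {ei_class}")

    if ei_data == 1:         # ELFDATA2LSB
        byte_order = "<"
    elif ei_data == 2:       # ELFDATA2MSB
        byte_order = ">"
    else:
        raise ValueError(f"unknown EI_DATA {ei_data}")

    if len(data) < offset + 2:
        raise ValueError("truncated ELF header")
    (ehsize,) = struct.unpack_from(byte_order + "H", data, offset)
    return ehsize


# A synthetic 32-bit little-endian ELF header (not a real executable)
header32 = (
    b"\x7fELF" + bytes([1, 1, 1, 0, 0]) + bytes(7)
    + struct.pack("<HHIIIIIHHHHHH",
                  2, 3, 1,            # e_type, e_machine, e_version
                  0x08048000, 52, 0,  # e_entry, e_phoff, e_shoff
                  0,                  # e_flags
                  52, 32, 1,          # e_ehsize, e_phentsize, e_phnum
                  0, 0, 0))           # e_shentsize, e_shnum, e_shstrndx
print(len(header32), read_e_ehsize(header32))
```

To check a real file, pass its contents instead, for example read_e_ehsize(open("small_exec.elf", "rb").read()).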

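I said earlier that we would use a Kaitai Struct processing spec to target specific file attributes and make them symbolic. We define this processing spec in lua_runtime/s2e_make_symbolic.lua: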

local class = require("class")

S2eMakeSymbolic = class.class()

function S2eMakeSymbolic:_init(s2e_state, start_addr, curr_pos, name)
    self._state = s2e_state
    self._addr = start_addr + curr_pos
    self._name = name
end

function S2eMakeSymbolic:decode(data)
    local mem = self._state:mem()
    local size = data:len()

    -- The decode routine is called after the data has already been read, so we
    -- must return to the start of the data in order to make it symbolic
    local addr = self._addr - size

    mem:makeConcolic(addr, size, self._name)

    -- Return the data unchanged
    return data
end

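We have defined a new class, S2eMakeSymbolic, with a constructor (_init) and a decode method.

The constructor takes the following arguments:

The current S2E execution state;

The input file's start address (in guest memory);

The parser's current position. Adding this to the start address gives the guest memory address to make symbolic;

The name of the symbolic variable.

The decode method is called automatically when the ELF parser encounters an attribute to which the s2e_make_symbolic processing spec has been applied. However, decode is only called after the data has been read from the input file, so we must compensate for this when making the data symbolic (by subtracting the size of the memory region that was just read).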
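The address arithmetic is easy to get wrong, so here is a Python analogue of the class with a recording stub in place of S2E's memory interface. The RecordingState and RecordingMemory classes and the example numbers are assumptions made for illustration: a start address of 0x1000, and a parser position of 20 once the two bytes at file offsets 18 and 19 (where e_machine lives) have been read.

```python
class RecordingMemory:
    """Stub that records requests instead of talking to S2E."""

    def __init__(self):
        self.calls = []

    def make_concolic(self, addr, size, name):
        self.calls.append((addr, size, name))


class RecordingState:
    def __init__(self):
        self._mem = RecordingMemory()

    def mem(self):
        return self._mem


class MakeSymbolicProcess:
    """Python analogue of the Lua S2eMakeSymbolic class."""

    def __init__(self, state, start_addr, curr_pos, name):
        self._state = state
        self._addr = start_addr + curr_pos
        self._name = name

    def decode(self, data):
        size = len(data)
        # The position already points past the data, so step back by its size
        addr = self._addr - size
        self._state.mem().make_concolic(addr, size, self._name)
        # Return the data unchanged
        return data


state = RecordingState()
process = MakeSymbolicProcess(state, 0x1000, 20, "machine")
decoded = process.decode(b"\x03\x00")
print([(hex(a), s, n) for a, s, n in state.mem().calls])
```

Without the subtraction, the request would cover the two bytes after the field (0x1014 and 0x1015) instead of the field itself (0x1012 and 0x1013).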

Let's make something symbolic. We will start with something simple: the e_machine field of the ELF header. In elf.ksy, the e_machine field is defined under the endian_elf type:

# The original definition of the e_machine field
- id: machine
  type: u2
  enum: machine

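Processing specs can only be applied to byte arrays, so we must replace the type field with a size field, which describes a byte array. Because the original data type is an unsigned 2-byte integer, we can simply treat machine as a byte array of size 2. We must also remove the enum mapping; otherwise ksc raises a compilation error when it tries to apply an enum to a byte array.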

# Redefinition of the e_machine field to make it symbolic
- id: machine
  size: 2
  process: s2e_make_symbolic(s2e_state, start_addr, _io.pos, "machine")
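One consequence of this redefinition is that machine is now two raw bytes rather than an enum value, so any code that still needs the numeric value has to decode it itself. A minimal Python sketch follows. The machine_from_bytes helper is hypothetical; EM_386 and EM_X86_64 are the standard ELF machine numbers.

```python
EM_386 = 3
EM_X86_64 = 62


def machine_from_bytes(raw, little_endian=True):
    """Decode the 2-byte e_machine field into an unsigned integer."""
    if len(raw) != 2:
        raise ValueError("e_machine must be exactly 2 bytes")
    return int.from_bytes(raw, "little" if little_endian else "big")


print(machine_from_bytes(b"\x03\x00") == EM_386)
print(machine_from_bytes(b"\x00\x3e", little_endian=False) == EM_X86_64)
```

The byte order has to come from the endian field in e_ident, which the original u2 type handled implicitly.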

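Finally, we must pass two additional arguments, the S2E execution state and the input file's start address, from the parser's constructor down to s2e_make_symbolic. We do this with a "params spec". The machine attribute is nested under the endian_elf type, which is itself nested under the top-level elf type, so the following params spec must be defined for both types.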

params:
  - id: s2e_state
  - id: start_addr

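We must also change the header's type from endian_elf to endian_elf(s2e_state, start_addr). This ensures that both arguments are passed to endian_elf's constructor. (If this is still a little confusing, take a look at the source code here.)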

# The original header's type
- id: header
  type: endian_elf

# Redefined to propagate the S2E execution state and input file's start address
# to the endian_elf type
- id: header
  type: endian_elf(s2e_state, start_addr)

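Now recompile elf.ksy. If you open elf.lua, you should see that the first two parameters of the constructor (Elf:_init) are s2e_state and start_addr. These parameters are saved and propagated through the Elf.EndianElf constructor to the S2eMakeSymbolic constructor.

All that remains is to write a small function in our S2E configuration file that instantiates and runs our parser. This function is called by the handleMakeSymbolic method of the KaitaiStruct plugin.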

package.path = package.path .. ";./lua_runtime/?.lua"

local stringstream = require("string_stream")
require("elf")

function make_symbolic_elf(state, start_addr, buffer)
    local ss = stringstream(buffer)

    -- This will kick-start the parser. We don't care about the final result
    Elf(state, start_addr, KaitaiStream(ss))
end

-- Enable and configure the necessary plugins
add_plugin("LuaBindings")
add_plugin("KaitaiStruct")

pluginsConfig.KaitaiStruct = {
    parser = "make_symbolic_elf",
}

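That's it!

This article was translated by fyb波 of the Kanxue translation team. Source: Adrian's Ramblings. Please credit the Kanxue community when reposting.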
