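Create a working compiler with the LLVM framework, Part 2

Published 2012-08-01

Summary: Whatever programming language you use, the LLVM compiler infrastructure provides a powerful way to optimize your applications. In this second article of a two-part series, learn about instrumenting code in LLVM and using the clang API to preprocess C/C++ code.

Create a working compiler with the LLVM framework, Part 1 explored the LLVM intermediate representation (IR). You hand-built a "Hello World" test program; learned some of LLVM's nuances, such as type casting; and built the same program using the LLVM application programming interfaces (APIs). Along the way, you also met some LLVM tools, such as llc and lli, and saw how to use llvm-gcc to emit LLVM IR for you. This second and final article in the series explores some of the other cool things you can do with LLVM. Specifically, it covers code instrumentation, that is, adding information to the final executable that is generated. It also briefly introduces clang, the LLVM front end that supports C, C++, and Objective-C. You use the clang API to preprocess C/C++ code and generate an abstract syntax tree (AST).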

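LLVM passes

LLVM is known for the optimization features it provides. Optimizations are implemented as passes. The thing to note here is that LLVM lets you create utility passes with a minimal amount of code. For example, if you don't want function names to begin with "hello", a utility pass can enforce that.

Understanding the LLVM opt tool

According to its man page, "the opt command is the modular LLVM optimizer and analyzer". Once the code for your custom pass is ready, you compile it into a shared library and load it with opt. If your LLVM installation went well, opt should already be on your system. The opt command accepts both LLVM IR (extension .ll) and LLVM bitcode (extension .bc) and can produce output as either LLVM IR or bitcode. Here is how you use opt to load your custom shared library:

tintin# opt -load=mycustom_pass.so -help -S

Note also that running opt -help from the command line produces a detailed list of the passes that LLVM can perform. Combining the load option with help produces a help message that includes information about your custom pass.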

Creating a custom LLVM pass

LLVM passes are declared in the Pass.h file, which on my system is installed under /usr/include/llvm. This file defines the interface of an individual pass as part of the Pass class. The individual pass types, each derived from Pass, are also declared in this file. Pass types include:

●The BasicBlockPass class. Used to implement local optimizations, which typically operate on one basic block or instruction at a time

●The FunctionPass class. Used for global optimizations, one function at a time

●The ModulePass class. Used to perform just about any unstructured interprocedural optimization

Because you intend to create a pass that objects to any function name beginning with "hello", you need to create your own pass by deriving from FunctionPass. Listing 1 shows the relevant code, copied from Pass.h.

Listing 1. Overriding the runOnFunction method of FunctionPass

class FunctionPass : public Pass {
public:
    explicit FunctionPass(char &pid) : Pass(PT_Function, pid) {}
    /// runOnFunction - Virtual method overridden by subclasses to do the
    /// per-function processing of the pass.
    ///
    virtual bool runOnFunction(Function &F) = 0;
    /// …
};

Likewise, the BasicBlockPass class declares runOnBasicBlock, and the ModulePass class declares the runOnModule pure virtual method. A subclass must provide a definition for the virtual method.
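If you don't have LLVM at hand, the contract between the pass manager and a pass can still be tried out in plain C++. The following is a minimal sketch, not LLVM code: MiniFunction, MiniFunctionPass, HelloNamePass, and runPassOverFunctions are hypothetical names invented for this illustration, and the real pass manager does considerably more than loop over functions.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Function: only the name matters here.
struct MiniFunction {
    std::string name;
};

// Mirrors the shape of FunctionPass: one pure virtual hook per function.
class MiniFunctionPass {
public:
    virtual ~MiniFunctionPass() = default;
    // Returns true if the pass modified the function, false otherwise.
    virtual bool runOnFunction(MiniFunction &f) = 0;
};

// A concrete pass that flags function names starting with "hello".
class HelloNamePass : public MiniFunctionPass {
public:
    std::vector<std::string> flagged;

    bool runOnFunction(MiniFunction &f) override {
        if (f.name.compare(0, 5, "hello") == 0) {
            flagged.push_back(f.name);
            std::cout << "Function name starts with hello: " << f.name << "\n";
        }
        return false;  // analysis only, nothing was changed
    }
};

// Plays the role of the pass manager: calls the hook once per function.
// Returns true if any invocation reported a modification.
bool runPassOverFunctions(MiniFunctionPass &pass,
                          std::vector<MiniFunction> &functions) {
    bool changed = false;
    for (MiniFunction &f : functions) {
        changed |= pass.runOnFunction(f);
    }
    return changed;
}
```

The driver only ever sees the base class, which is why a subclass that fails to define the pure virtual method cannot be instantiated at all.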

Returning to the runOnFunction method in Listing 1, you see that the input is an object of type Function. Dig into /usr/include/llvm/Function.h and you will readily see that LLVM uses the Function class to encapsulate the functionality of a C/C++ function. Function in turn derives from the Value class defined in Value.h and supports a getName method. Listing 2 shows the code.

Listing 2. Creating a custom LLVM pass

#include <iostream>
#include "llvm/Pass.h"
#include "llvm/Function.h"

class TestClass : public llvm::FunctionPass {
public:
    virtual bool runOnFunction(llvm::Function &F)
    {
        // Report every function whose name begins with "hello".
        if (F.getName().startswith("hello"))
        {
            std::cout << "Function name starts with hello\n";
        }
        return false; // the function was not modified
    }
};

The code in Listing 2 misses two important details:

●The FunctionPass constructor needs a char, which LLVM uses internally. LLVM uses the address of the char, so you can initialize it with anything.

●You need some way to tell the LLVM system that the class you created is a new pass. This is where the RegisterPass LLVM template comes in. The RegisterPass template is declared in the PassSupport.h header file; this file is included by Pass.h, so you need no additional header.

Listing 3 shows the complete code.

Listing 3. Registering the LLVM Function pass

#include <iostream>
#include "llvm/Pass.h"
#include "llvm/Function.h"

class TestClass : public llvm::FunctionPass
{
public:
    TestClass() : llvm::FunctionPass(TestClass::ID) { }
    virtual bool runOnFunction(llvm::Function &F) {
        if (F.getName().startswith("hello")) {
            std::cout << "Function name starts with hello\n";
        }
        return false;
    }
    static char ID; // could be a global too
};
char TestClass::ID = 'a';
// Registers the pass with opt under the command-line name "test_llvm".
static llvm::RegisterPass<TestClass> global_("test_llvm", "test llvm", false, false);
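RegisterPass relies on a common C++ idiom: a static object whose constructor runs before main and records the class in a global table. The sketch below imitates that idiom in plain C++ under simplifying assumptions; MiniPass, passRegistry, RegisterMiniPass, and createPassByName are hypothetical names made up for the illustration, and LLVM's actual registry is more elaborate.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical pass interface; stands in for llvm::Pass.
struct MiniPass {
    virtual ~MiniPass() = default;
    virtual std::string description() const = 0;
};

using PassFactory = std::function<std::unique_ptr<MiniPass>()>;

// A function-local static sidesteps static-initialization-order problems.
std::map<std::string, PassFactory> &passRegistry() {
    static std::map<std::string, PassFactory> registry;
    return registry;
}

// Analogue of RegisterPass<T>: constructing one adds T to the registry
// under the given command-line name.
template <typename PassT>
struct RegisterMiniPass {
    explicit RegisterMiniPass(const std::string &argName) {
        passRegistry()[argName] = [] {
            return std::unique_ptr<MiniPass>(new PassT());
        };
    }
};

struct TestPass : MiniPass {
    static char ID;  // as in Listing 3, only its address would matter
    std::string description() const override { return "test llvm"; }
};
char TestPass::ID = 'a';

// Runs before main(), like the static RegisterPass object in Listing 3.
static RegisterMiniPass<TestPass> registerTestPass("test_llvm");

// Looks a pass up by name, much as opt must do when given -test_llvm.
// Returns nullptr when no pass was registered under that name.
std::unique_ptr<MiniPass> createPassByName(const std::string &argName) {
    auto it = passRegistry().find(argName);
    if (it == passRegistry().end()) return nullptr;
    return it->second();
}
```

Because registration happens as a side effect of loading the object code, merely loading a shared library that contains such a static object is enough to make the new name available.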

The first constructor argument of the RegisterPass template is the name of the pass to use on the command line with opt. That's it: all you need to do now is build a shared library out of the code in Listing 3 and then run opt to load the library, followed by the name of the command that you registered using RegisterPass (in this case, test_llvm) and finally a bitcode file on which your custom pass runs along with the other passes. Listing 4 outlines these steps.

Listing 4. Running the custom pass

bash$ g++ -c pass.cpp -I/usr/local/include `llvm-config --cxxflags`
bash$ g++ -shared -o pass.so pass.o -L/usr/local/lib `llvm-config --ldflags --libs`
bash$ opt -load=./pass.so -test_llvm < test.bc

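Now let's turn to another tool (the front end to the LLVM back end): clang.

Introducing clang

LLVM has its own front end: a tool called (appropriately enough) clang. Clang is a powerful C/C++/Objective-C compiler whose compile times match or beat those of the GNU Compiler Collection (GCC) tools (see the links in Resources for more information). More importantly, clang has a hackable code base that makes custom extensions easy. Much as you used the LLVM back-end API for your custom plug-in in Create a working compiler with the LLVM framework, Part 1, this article uses the API for the LLVM front end to develop some small applications for preprocessing and parsing.

Common clang classes

You need to be familiar with some of the most common clang classes: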

●CompilerInstance

●Preprocessor

●FileManager

●SourceManager

●DiagnosticsEngine

●LangOptions

●TargetInfo

●ASTConsumer

●Sema

●ParseAST, perhaps the most important clang method

More on the ParseAST method later.

For all practical purposes, consider CompilerInstance to be the compiler proper. It provides interfaces, manages access to the AST, preprocesses the input sources, and maintains the target information. A typical application needs to create a CompilerInstance object to do anything useful. Listing 5 gives a rough outline of the CompilerInstance.h header file.

Listing 5. The CompilerInstance class

class CompilerInstance : public ModuleLoader {
    /// The options used in this compiler instance.
    llvm::IntrusiveRefCntPtr<CompilerInvocation> Invocation;
    /// The diagnostics engine instance.
    llvm::IntrusiveRefCntPtr<DiagnosticsEngine> Diagnostics;
    /// The target being compiled for.
    llvm::IntrusiveRefCntPtr<TargetInfo> Target;
    /// The file manager.
    llvm::IntrusiveRefCntPtr<FileManager> FileMgr;
    /// The source manager.
    llvm::IntrusiveRefCntPtr<SourceManager> SourceMgr;
    /// The preprocessor.
    llvm::IntrusiveRefCntPtr<Preprocessor> PP;
    /// The AST context.
    llvm::IntrusiveRefCntPtr<ASTContext> Context;
    /// The AST consumer.
    OwningPtr<ASTConsumer> Consumer;
    /// \brief The semantic analysis object.
    OwningPtr<Sema> TheSema;
    //… the list continues
};

Preprocessing a C file

In clang, there are at least two ways to create a preprocessor object:

●Directly instantiate a Preprocessor object

●Use the CompilerInstance class to create a Preprocessor object for you

Let's begin with the latter approach.

Using helper and utility classes for preprocessing

On its own, Preprocessor is not of much help: you need the FileManager and SourceManager classes to read files and to track source locations for diagnostics. The FileManager class implements support for file system lookup, file system caching, and directory search. Look at the FileEntry class, which defines the clang abstraction for a source file. Listing 6 provides an excerpt from the FileManager.h header file.

Listing 6. The clang FileManager class

class FileManager : public llvm::RefCountedBase<FileManager> {
    FileSystemOptions FileSystemOpts;
    /// \brief The virtual directories that we have allocated. For each
    /// virtual file (e.g. foo/bar/baz.cpp), we add all of its parent
    /// directories (foo/ and foo/bar/) here.
    SmallVector<DirectoryEntry*, 4> VirtualDirectoryEntries;
    /// \brief The virtual files that we have allocated.
    SmallVector<FileEntry*, 4> VirtualFileEntries;
    /// NextFileUID - Each FileEntry we create is assigned a unique ID #.
    unsigned NextFileUID;
    // Statistics.
    unsigned NumDirLookups, NumFileLookups;
    unsigned NumDirCacheMisses, NumFileCacheMisses;
    // …
    // Caching.
    OwningPtr<FileSystemStatCache> StatCache;
    // …
};

The SourceManager class is typically queried for SourceLocation objects. Listing 7, taken from the SourceManager.h header file, describes the kinds of source locations.

Listing 7. Understanding SourceLocation

/// There are three different types of locations in a file: a spelling
/// location, an expansion location, and a presumed location.
///
/// Given an example of:
/// #define min(x, y) x < y ? x : y
///
/// and then later on a use of min:
/// #line 17
/// return min(a, b);
///
/// The expansion location is the line in the source code where the macro
/// was expanded (the return statement), the spelling location is the
/// location in the source where the macro was originally defined,
/// and the presumed location is where the line directive states that
/// the line is 17, or any other line.
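You can observe two of these location kinds with nothing but the standard preprocessor. In the sketch below (the macro and function names are made up for the illustration), #line changes the presumed line that __LINE__ reports, and a __LINE__ inside a macro body takes its value from the line where the macro is expanded, not from the line where the macro was defined.

```cpp
// The body mentions __LINE__, but the value is decided where the macro
// is expanded, not here where it is spelled.
#define CURRENT_LINE() __LINE__

// The directive renumbers the line that follows it to 17.
#line 17
int presumedLine() { return __LINE__; }

// Renumber again; the macro picks up the line of its expansion.
#line 100
int expansionLine() { return CURRENT_LINE(); }
```

Here presumedLine() returns 17 and expansionLine() returns 100, whatever the physical position of those lines in the file.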

Clearly, SourceManager depends on FileManager underneath; in fact, the SourceManager class constructor accepts a FileManager as an input argument. Finally, you need to keep track of errors that may arise while processing the source and report them. The DiagnosticsEngine class does this. As with Preprocessor, you have two options:

●Create all the necessary objects on your own

●Let CompilerInstance do everything for you

Let's go with the latter option. Listing 8 shows the code for the preprocessor; everything else has already been explained.

Listing 8. Creating a preprocessor with the clang API

#include <iostream>
#include "clang/Basic/FileManager.h"
#include "clang/Basic/SourceManager.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Lex/Preprocessor.h"

using namespace clang;
int main()
{
    CompilerInstance ci;
    ci.createDiagnostics(0,NULL); // create DiagnosticsEngine
    ci.createFileManager(); // create FileManager
    ci.createSourceManager(ci.getFileManager()); // create SourceManager
    ci.createPreprocessor(); // create Preprocessor
    const FileEntry *pFile = ci.getFileManager().getFile("hello.c");
    ci.getSourceManager().createMainFileID(pFile);
    ci.getPreprocessor().EnterMainSourceFile();
    ci.getDiagnosticClient().BeginSourceFile(ci.getLangOpts(), &ci.getPreprocessor());
    Token tok;
    do {
        ci.getPreprocessor().Lex(tok); // fetch the next token
        if( ci.getDiagnostics().hasErrorOccurred())
            break;
        ci.getPreprocessor().DumpToken(tok);
        std::cerr << std::endl;
    } while ( tok.isNot(clang::tok::eof));
    ci.getDiagnosticClient().EndSourceFile();
}

Listing 8 uses the CompilerInstance class to create, in turn, the DiagnosticsEngine (the ci.createDiagnostics call), the FileManager (ci.createFileManager), and the SourceManager (ci.createSourceManager). After the file is associated using FileEntry, the loop processes each token in the source file until it reaches the end of file (EOF). The preprocessor's DumpToken method dumps the token to the screen.
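The lex-until-EOF loop can be explored without clang. The following is a toy sketch under heavy simplifying assumptions: Kind, Token, MiniLexer, and dumpTokens are hypothetical names, there are no keywords, macros, or includes, and every non-alphanumeric character is treated as a one-character punctuator. Only the shape of the do/while loop corresponds to the clang-based code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds; clang's real set is far larger.
enum class Kind { Identifier, NumericConstant, Punctuator, Eof };

struct Token {
    Kind kind = Kind::Eof;
    std::string spelling;
};

std::string kindName(Kind kind) {
    switch (kind) {
        case Kind::Identifier:      return "identifier";
        case Kind::NumericConstant: return "numeric_constant";
        case Kind::Punctuator:      return "punctuator";
        case Kind::Eof:             return "eof";
    }
    return "unknown";
}

class MiniLexer {
public:
    explicit MiniLexer(std::string text) : text_(std::move(text)) {}

    // Fills tok with the next token; yields Eof once input is exhausted.
    void lex(Token &tok) {
        skipWhile([](unsigned char c) { return std::isspace(c) != 0; });
        const std::size_t start = pos_;
        if (pos_ >= text_.size()) {
            tok.kind = Kind::Eof;
        } else if (std::isalpha(static_cast<unsigned char>(text_[pos_])) ||
                   text_[pos_] == '_') {
            tok.kind = Kind::Identifier;
            skipWhile([](unsigned char c) {
                return std::isalnum(c) != 0 || c == '_';
            });
        } else if (std::isdigit(static_cast<unsigned char>(text_[pos_]))) {
            tok.kind = Kind::NumericConstant;
            skipWhile([](unsigned char c) { return std::isdigit(c) != 0; });
        } else {
            tok.kind = Kind::Punctuator;
            ++pos_;
        }
        tok.spelling = text_.substr(start, pos_ - start);
    }

private:
    // Advances past every character for which pred holds.
    template <typename Pred>
    void skipWhile(Pred pred) {
        while (pos_ < text_.size() &&
               pred(static_cast<unsigned char>(text_[pos_]))) {
            ++pos_;
        }
    }

    std::string text_;
    std::size_t pos_ = 0;
};

// Same do/while shape as the clang loop: lex, record, stop after Eof.
std::vector<std::string> dumpTokens(const std::string &source) {
    MiniLexer lexer(source);
    std::vector<std::string> lines;
    Token tok;
    do {
        lexer.lex(tok);
        lines.push_back(kindName(tok.kind) + " '" + tok.spelling + "'");
    } while (tok.kind != Kind::Eof);
    return lines;
}
```

Note that the Eof token itself is recorded before the loop exits, because the condition is tested only after the body has run.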

To compile and run the code in Listing 8, use the makefile in Listing 9 (adjusted for your clang and LLVM installation folders). The main idea is to use the llvm-config tool to supply the necessary LLVM include paths and libraries: you should never try to spell these out by hand on the g++ command line.

Listing 9. Makefile for building the preprocessor code

CXX := g++
RTTIFLAG := -fno-rtti
CXXFLAGS := $(shell llvm-config --cxxflags) $(RTTIFLAG)
LLVMLDFLAGS := $(shell llvm-config --ldflags --libs)
DDD := $(shell echo $(LLVMLDFLAGS))
SOURCES = main.cpp
OBJECTS = $(SOURCES:.cpp=.o)
EXES = $(OBJECTS:.o=)
CLANGLIBS = \
    -L /usr/local/lib \
    -lclangFrontend \
    -lclangParse \
    -lclangSema \
    -lclangAnalysis \
    -lclangAST \
    -lclangLex \
    -lclangBasic \
    -lclangDriver \
    -lclangSerialization \
    -lLLVMMC \
    -lLLVMSupport

all: $(OBJECTS) $(EXES)

# Link rule; the recipe line must begin with a tab.
%: %.o
	$(CXX) -o $@ $< $(CLANGLIBS) $(LLVMLDFLAGS)

After you compile and run the code, you should get the output in Listing 10.

Listing 10. Crash when running the code from Listing 8

Assertion failed: (Target && "Compiler instance has no target!"),
function getTarget, file
/Users/Arpan/llvm/tools/clang/lib/Frontend/../../include/clang/Frontend/CompilerInstance.h,
line 294.
Abort trap: 6

Here, you missed the last part of the CompilerInstance setup: the target platform for which the code is compiled. This is where the TargetInfo and TargetOptions classes come in. According to the clang header TargetInfo.h, the TargetInfo class stores the required information about the target system for code generation and must be created before compilation or preprocessing. As expected, TargetInfo holds information such as integer and floating-point widths and alignments. Listing 11 provides an excerpt from the TargetInfo.h header file.

Listing 11. The clang TargetInfo class

class TargetInfo : public llvm::RefCountedBase<TargetInfo> {
    llvm::Triple Triple;
protected:
    bool BigEndian;
    unsigned char PointerWidth, PointerAlign;
    unsigned char IntWidth, IntAlign;
    unsigned char HalfWidth, HalfAlign;
    unsigned char FloatWidth, FloatAlign;
    unsigned char DoubleWidth, DoubleAlign;
    unsigned char LongDoubleWidth, LongDoubleAlign;
    // …
};

The TargetInfo class takes two arguments for its initialization: DiagnosticsEngine and TargetOptions. Of these, the latter must have its Triple string set to the appropriate value for the current platform. This is where the LLVM back end comes in handy. Listing 12 shows the additions to Listing 8 that make the preprocessor work.

Listing 12. Setting the target options for the compiler

#include "clang/Basic/TargetInfo.h"
#include "clang/Basic/TargetOptions.h"
#include "llvm/Support/Host.h"

int main()
{
    CompilerInstance ci;
    ci.createDiagnostics(0,NULL);
    // create TargetOptions
    TargetOptions to;
    to.Triple = llvm::sys::getDefaultTargetTriple();
    // create TargetInfo
    TargetInfo *pti = TargetInfo::CreateTargetInfo(ci.getDiagnostics(), to);
    ci.setTarget(pti);
    // rest of the code same as in Listing 8…
    ci.createFileManager();
    // …
}
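A target triple is a dash-separated string of the general form architecture-vendor-operating system, optionally followed by an environment. To make that concrete, here is a small sketch that splits such a string into its components. The splitTriple helper is hypothetical and written only for this illustration; it is not part of LLVM, which has its own llvm::Triple class for this job.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not an LLVM API: splits a triple on '-'.
std::vector<std::string> splitTriple(const std::string &triple) {
    std::vector<std::string> parts;
    std::stringstream stream(triple);
    std::string part;
    while (std::getline(stream, part, '-')) {
        parts.push_back(part);
    }
    return parts;
}
```

For the input x86_64-unknown-linux-gnu, the helper yields four components: x86_64, unknown, linux, and gnu.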

With the target set, run the code and observe the output for the simple hello.c test:

#include <stdio.h>
int main() { printf("hello world!\n"); }

Listing 13 shows part of the preprocessor output.

Listing 13. Preprocessor output (partial)

typedef 'typedef'
struct 'struct'
identifier '__va_list_tag'
l_brace '{'
unsigned 'unsigned'
identifier 'gp_offset'
semi ';'
unsigned 'unsigned'
identifier 'fp_offset'
semi ';'
void 'void'
star '*'
identifier 'overflow_arg_area'
semi ';'
void 'void'
star '*'
identifier 'reg_save_area'
semi ';'
r_brace '}'
identifier '__va_list_tag'
semi ';'
identifier '__va_list_tag'
identifier '__builtin_va_list'
l_square '['
numeric_constant '1'
r_square ']'
semi ';'

Creating a Preprocessor object by hand

One of the good things about the clang libraries is that you can achieve the same result in several ways. In this section, you create a Preprocessor object, but without asking CompilerInstance for it directly. Listing 14 shows the Preprocessor constructor, taken from the Preprocessor.h header file.

Listing 14. Constructing a Preprocessor object

Preprocessor(DiagnosticsEngine &diags, LangOptions &opts,
             const TargetInfo *target,
             SourceManager &SM, HeaderSearch &Headers,
             ModuleLoader &TheModuleLoader,
             IdentifierInfoLookup *IILookup = 0,
             bool OwnsHeaderSearch = false,
             bool DelayInitialization = false);

Looking at the constructor, it is clear that you need to create six different objects for this preprocessor to work. You already know about DiagnosticsEngine, TargetInfo, and SourceManager. CompilerInstance derives from ModuleLoader. So you must create two new objects, one for LangOptions and one for HeaderSearch. The LangOptions class lets you compile for a range of C/C++ dialects, including C99, C11, and C++0x. Refer to the LangOptions.h and LangOptions.def headers for more information. Finally, the HeaderSearch class stores a std::vector of directories to search, among other things. Listing 15 shows the code for the preprocessor.

Listing 15. A hand-built preprocessor

#include "clang/Basic/TargetInfo.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/TextDiagnosticPrinter.h"
#include "clang/Lex/HeaderSearch.h"
#include "clang/Lex/Preprocessor.h"
#include "llvm/Support/Host.h"
#include "llvm/Support/raw_ostream.h"

using namespace clang;
int main() {
    DiagnosticOptions diagnosticOptions;
    // The printer must live on the heap; see the notes that follow.
    TextDiagnosticPrinter *printer =
        new TextDiagnosticPrinter(llvm::outs(), diagnosticOptions);
    // Give the engine a real DiagnosticIDs object instead of a null pointer.
    llvm::IntrusiveRefCntPtr<DiagnosticIDs> diagIDs(new DiagnosticIDs());
    DiagnosticsEngine diagnostics(diagIDs, printer);
    LangOptions langOpts;
    clang::TargetOptions to;
    to.Triple = llvm::sys::getDefaultTargetTriple();
    TargetInfo *pti = TargetInfo::CreateTargetInfo(diagnostics, to);
    FileSystemOptions fsopts;
    FileManager fileManager(fsopts);
    SourceManager sourceManager(diagnostics, fileManager);
    HeaderSearch headerSearch(fileManager, diagnostics, langOpts, pti);
    CompilerInstance ci; // serves only as the ModuleLoader
    Preprocessor preprocessor(diagnostics, langOpts, pti,
                              sourceManager, headerSearch, ci);
    const FileEntry *pFile = fileManager.getFile("test.c");
    sourceManager.createMainFileID(pFile);
    preprocessor.EnterMainSourceFile();
    printer->BeginSourceFile(langOpts, &preprocessor);
    // … similar to Listing 8 here on
}

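A few points about the code in Listing 15:

●You did not initialize HeaderSearch and point it at any specific directories. You should.

●The clang API requires that TextDiagnosticPrinter be allocated on the heap. Allocating it on the stack causes a crash.

●You could not get rid of CompilerInstance. Since you are using CompilerInstance anyway, why bother creating things by hand instead of using the clang API more comfortably?

Language option: C++

So far you have worked with C test code: how about some C++? Add langOpts.CPlusPlus = 1; to the code in Listing 15, and then try the test code in Listing 16.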

Listing 16. C++ test code for the preprocessor

template <typename T, int n>
struct s {
    T array[n];
};
int main() {
    s<int, 20> var;
}

Listing 17 shows part of the program's output.

Listing 17. Partial preprocessor output for the code in Listing 16

identifier 'template'
less '<'
…
struct 'struct'
identifier 's'
l_brace '{'
identifier 'T'
identifier 'array'
l_square '['
identifier 'n'
r_square ']'
semi ';'
r_brace '}'
semi ';'
int 'int'
identifier 'main'
l_paren '('
r_paren ')'

Creating a parse tree

The ParseAST method, defined in clang/Parse/ParseAST.h, is one of the more important methods that clang provides. Here is the declaration of the routine, copied from ParseAST.h:

void ParseAST(Preprocessor &pp, ASTConsumer *C,
              ASTContext &Ctx, bool PrintStats = false,
              TranslationUnitKind TUKind = TU_Complete,
              CodeCompleteConsumer *CompletionConsumer = 0);

ASTConsumer gives you an abstract interface from which to derive. This is the right design, because different clients are likely to dump or process the AST in different ways. Your client code derives from ASTConsumer. The ASTContext class stores, among other things, information about type declarations. The simplest thing to try is to print a list of the global variables in your code using the clang ASTConsumer API. Many technology firms have very strict rules about the use of global variables in C++ code, and this should serve as a starting point for creating your own custom lint tool. Listing 18 provides the code for the custom consumer.

Listing 18. A custom AST consumer class

#include <iostream>
#include "clang/AST/ASTConsumer.h"
#include "clang/AST/Decl.h"
#include "clang/AST/DeclGroup.h"

using namespace clang;

class CustomASTConsumer : public ASTConsumer {
public:
    CustomASTConsumer () : ASTConsumer() { }
    virtual ~CustomASTConsumer () { }
    virtual bool HandleTopLevelDecl(DeclGroupRef decls)
    {
        clang::DeclGroupRef::iterator it;
        for( it = decls.begin(); it != decls.end(); it++)
        {
            // dyn_cast yields a null pointer for anything that is not a variable.
            clang::VarDecl *vd = llvm::dyn_cast<clang::VarDecl>(*it);
            if(vd)
                std::cout << vd->getDeclName().getAsString() << std::endl;
        }
        return true;
    }
};

You override the HandleTopLevelDecl method (originally provided in ASTConsumer) with your own version. Clang passes each group of top-level declarations to you; you iterate over the group and print the names of the variables in it. Listing 19, excerpted from ASTConsumer.h, shows some other methods that client consumer code can override.

Listing 19. Other methods you can override in client code

/// HandleInterestingDecl - Handle the specified interesting declaration. This
/// is called by the AST reader when deserializing things that might interest
/// the consumer. The default implementation forwards to HandleTopLevelDecl.
virtual void HandleInterestingDecl(DeclGroupRef D);

/// HandleTranslationUnit - This method is called when the ASTs for entire
/// translation unit have been parsed.
virtual void HandleTranslationUnit(ASTContext &Ctx) {}

/// HandleTagDeclDefinition - This callback is invoked each time a TagDecl
/// (e.g. struct, union, enum, class) is completed. This allows the client to
/// hack on the type, which can occur at any point in the file (because these
/// can be defined in declspecs).
virtual void HandleTagDeclDefinition(TagDecl *D) {}

/// Note that at this point it does not have a body, its body is
/// instantiated at the end of the translation unit and passed to
/// HandleTopLevelDecl.
virtual void HandleCXXImplicitFunctionInstantiation(FunctionDecl *D) {}
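The callback arrangement between the parser and a consumer can be modeled in a few lines of standard C++. This is a sketch under simplifying assumptions, not clang code: Decl, VarDecl, FunctionDecl, MiniConsumer, GlobalVarCollector, and parseInto are hypothetical stand-ins, and dynamic_cast takes the place of llvm::dyn_cast.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical declaration hierarchy standing in for clang's Decl classes.
struct Decl {
    explicit Decl(std::string n) : name(std::move(n)) {}
    virtual ~Decl() = default;
    std::string name;
};
struct VarDecl : Decl { using Decl::Decl; };
struct FunctionDecl : Decl { using Decl::Decl; };

// Mirrors the shape of ASTConsumer: one callback per top-level group.
class MiniConsumer {
public:
    virtual ~MiniConsumer() = default;
    // Returns true to continue parsing, false to stop.
    virtual bool handleTopLevelDecl(const std::vector<Decl *> &group) = 0;
};

// Collects the names of top-level variables and ignores everything else.
class GlobalVarCollector : public MiniConsumer {
public:
    std::vector<std::string> globals;

    bool handleTopLevelDecl(const std::vector<Decl *> &group) override {
        for (Decl *d : group) {
            // Null unless d really points at a VarDecl.
            if (auto *vd = dynamic_cast<VarDecl *>(d)) {
                globals.push_back(vd->name);
            }
        }
        return true;
    }
};

// Stand-in for the parser driver: feeds each group to the consumer.
void parseInto(MiniConsumer &consumer,
               const std::vector<std::vector<Decl *>> &groups) {
    for (const auto &group : groups) {
        if (!consumer.handleTopLevelDecl(group)) break;
    }
}
```

Keeping the driver ignorant of the concrete consumer type is what lets different clients dump, filter, or lint the same stream of declarations.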

Finally, Listing 20 shows the actual client code that uses the custom AST consumer class you developed.

Listing 20. Client code using the custom AST consumer

#include "clang/Basic/TargetInfo.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Parse/ParseAST.h"
#include "llvm/Support/Host.h"

int main() {
    CompilerInstance ci;
    ci.createDiagnostics(0,NULL);
    TargetOptions to;
    to.Triple = llvm::sys::getDefaultTargetTriple();
    TargetInfo *tin = TargetInfo::CreateTargetInfo(ci.getDiagnostics(), to);
    ci.setTarget(tin);
    ci.createFileManager();
    ci.createSourceManager(ci.getFileManager());
    ci.createPreprocessor();
    ci.createASTContext();
    CustomASTConsumer *astConsumer = new CustomASTConsumer ();
    ci.setASTConsumer(astConsumer);
    const FileEntry *file = ci.getFileManager().getFile("hello.c");
    ci.getSourceManager().createMainFileID(file);
    ci.getDiagnosticClient().BeginSourceFile(
        ci.getLangOpts(), &ci.getPreprocessor());
    // Parse the file; the consumer is called back for each top-level group.
    clang::ParseAST(ci.getPreprocessor(), astConsumer, ci.getASTContext());
    ci.getDiagnosticClient().EndSourceFile();
    return 0;
}

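Conclusion

This two-part series covered a lot of ground: it explored the LLVM IR, offered ways to generate IR both by hand and through the LLVM APIs, showed how to create a custom plug-in for the LLVM back end, and explained the LLVM front end and its rich set of headers. You also learned how to use this front end for preprocessing and for working with the AST. Creating a compiler and extending it, particularly for a language as complex as C++, has historically looked like a very complicated undertaking, but LLVM makes it far simpler. Documentation is an area where LLVM and clang need to keep improving, but until then, I recommend VIM and doxygen for browsing the headers. Have fun!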
