Write a C Compiler Step by Step (6): Function Definitions

Posted by LotAbout on 2016-01-21

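Because syntax analysis is fairly complex, we split it into three parts: variable definitions, function definitions, and expressions. This chapter covers function definitions.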

EBNF Representation

These are the rules from the previous chapter's EBNF grammar that relate to function definitions.

variable_decl ::= type {'*'} id { ',' {'*'} id } ';'

function_decl ::= type {'*'} id '(' parameter_decl ')' '{' body_decl '}'

parameter_decl ::= type {'*'} id {',' type {'*'} id}

body_decl ::= {variable_decl}, {statement}

statement ::= non_empty_statement | empty_statement

non_empty_statement ::= if_statement | while_statement | '{' statement '}'
                     | 'return' expression | expression ';'

if_statement ::= 'if' '(' expression ')' statement ['else' non_empty_statement]

while_statement ::= 'while' '(' expression ')' non_empty_statement

Parsing Function Definitions

From the previous chapter's code we already know when to start parsing a function definition. The relevant code is:

...
if (token == '(') {
    current_id[Class] = Fun;
    current_id[Value] = (int)(text + 1); // the memory address of function
    function_declaration();
} else {
...

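That is, before this snippet runs we have already set the correct type for the current identifier. The snippet then sets the identifier's class (Fun) and records the function's position in the text segment. Next we parse the rest of the function definition: parameter_decl and body_decl.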
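Why text + 1 rather than text? The compiler emits code with the pre-increment form *++text (as function_body does later in this chapter), so text always points at the last word already written, and the next word lands one slot further on. The following is a minimal sketch of that convention; the fixed-size text_segment array and the helper names emit and next_code_address are assumptions made for illustration, not part of the tutorial's code.

```c
#include <assert.h>

#define TEXT_SIZE 32

/* Hypothetical text segment; `text` points at the last emitted word. */
static int text_segment[TEXT_SIZE];
static int *text = text_segment;

/* Emit one word the way the compiler does: pre-increment, then store. */
void emit(int word) {
    assert(text + 1 < text_segment + TEXT_SIZE);
    *++text = word;
}

/* Where the next emitted word will land, i.e. the entry point of a
 * function whose body is about to be compiled. */
int *next_code_address(void) {
    return text + 1;
}
```

If next_code_address() is recorded just before a function's first instruction is emitted, the recorded address points exactly at that instruction, which is what storing text + 1 in current_id[Value] achieves.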

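Function Parameters and Assembly Code

We now need to recall how a "function" is translated into assembly code, because that determines what information we must collect while parsing. Consider the following function: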

int demo(int param_a, int *param_b) {
    int local_1;
    char local_2;

    ...
}

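What assembly code should it be translated into? Before answering, we need to understand the state of the machine's stack when demo is called, shown below (see the virtual machine described in Chapter 3):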

|    ....       | high address
+---------------+
| arg: param_a  |    new_bp + 3
+---------------+
| arg: param_b  |    new_bp + 2
+---------------+
|return address |    new_bp + 1
+---------------+
| old BP        | <- new BP
+---------------+
| local_1       |    new_bp - 1
+---------------+
| local_2       |    new_bp - 2
+---------------+
|    ....       |  low address

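The most important point here is that both the function's parameters (such as param_a) and its local variables (such as local_1) live on the stack. Unlike global variables, which are stored in the data segment, they are accessed inside the function through the new_bp pointer plus an offset. During parsing we therefore need to know the number of parameters and the offset of each one.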
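To make the diagram concrete, here is a minimal sketch that lays out the same frame in a plain int array and reads the slots back through a base-pointer index. The array-based stack, the helper names build_demo_frame and frame_slot, and the placeholder values 1234 and 0 are assumptions for illustration; the virtual machine in this series works with raw pointers instead.

```c
#include <assert.h>

#define STACK_SIZE 16

/* Hypothetical toy stack; it grows toward lower indices like the VM stack. */
static int stack[STACK_SIZE];

/* Build the frame for demo(param_a, param_b).
 * Returns the index that plays the role of new_bp. */
int build_demo_frame(int param_a, int param_b) {
    int sp = STACK_SIZE;      /* empty stack: sp is one past the top slot */
    int bp;

    stack[--sp] = param_a;    /* arguments are pushed in source order */
    stack[--sp] = param_b;
    stack[--sp] = 1234;       /* placeholder return address */
    stack[--sp] = 0;          /* placeholder for the caller's (old) BP */
    bp = sp;                  /* new BP points at the saved old BP */
    sp = sp - 2;              /* reserve slots for local_1 and local_2 */
    assert(sp >= 0);
    return bp;
}

/* Address of the slot at new_bp + offset. */
int *frame_slot(int bp, int offset) {
    assert(bp + offset >= 0 && bp + offset < STACK_SIZE);
    return &stack[bp + offset];
}
```

With bp = build_demo_frame(7, 9), the expression *frame_slot(bp, 3) reads param_a and *frame_slot(bp, 2) reads param_b, while frame_slot(bp, -1) and frame_slot(bp, -2) address local_1 and local_2, matching the offsets in the diagram.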

The Function Definition Parser

This is the skeleton for parsing a whole function definition. The code is as follows:

void function_declaration() {
    // type func_name (...) {...}
    //               | this part

    match('(');
    function_parameter();
    match(')');
    match('{');
    function_body();
    //match('}');                 //  ①

    // ②
    // unwind local variable declarations for all local variables.
    current_id = symbols;
    while (current_id[Token]) {
        if (current_id[Class] == Loc) {
            current_id[Class] = current_id[BClass];
            current_id[Type]  = current_id[BType];
            current_id[Value] = current_id[BValue];
        }
        current_id = current_id + IdSize;
    }
}

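At ① we do not consume the closing } character. The reason is that variable_decl and function_decl are parsed together: variable_decl ends with the character ; while function_decl ends with the character }. If we consumed the '}' here with match, the outer while loop would have no reliable way to tell that the function definition has ended. So the terminator is handled in the outer while loop instead.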

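The code at ② restores the symbol table to its global state. A local variable may share a name with a global variable; when it does, the local shadows the global inside the function body, and once we leave the function body the global takes effect again. This code walks linearly through all identifiers and restores the information saved in the BXXX fields.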
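The save-and-restore cycle can be tried in isolation. The sketch below uses a reduced symbol-table entry with only the fields this chapter mentions; the field order, the numeric values of Glo and Loc, the table size, and the helpers shadow_with_local, restore_globals, and define_global are all assumptions made for this example rather than the tutorial's actual definitions.

```c
#include <assert.h>

/* Field indices of one symbol-table entry; a reduced, hypothetical layout. */
enum { Token, Class, Type, Value, BClass, BType, BValue, IdSize };
/* Identifier classes; the numeric values are arbitrary for this sketch. */
enum { Glo = 1, Loc };

#define MAX_SYMBOLS 4
/* One extra zeroed entry terminates the table (Token == 0). */
static int symbols[(MAX_SYMBOLS + 1) * IdSize];

/* Sketch helper: fill entry `index` as a global and return it. */
int *define_global(int index, int type, int value) {
    int *id = symbols + index * IdSize;
    assert(index >= 0 && index < MAX_SYMBOLS);
    id[Token] = 1;            /* any non-zero token marks the entry as used */
    id[Class] = Glo;
    id[Type]  = type;
    id[Value] = value;
    return id;
}

/* Shadow an entry with local information, backing up the old fields. */
void shadow_with_local(int *id, int type, int value) {
    id[BClass] = id[Class]; id[Class] = Loc;
    id[BType]  = id[Type];  id[Type]  = type;
    id[BValue] = id[Value]; id[Value] = value;
}

/* Walk the whole table and restore every entry that is still local. */
void restore_globals(void) {
    int *id = symbols;
    while (id[Token]) {
        if (id[Class] == Loc) {
            id[Class] = id[BClass];
            id[Type]  = id[BType];
            id[Value] = id[BValue];
        }
        id = id + IdSize;
    }
}
```

An identifier that only ever existed as a local has zeros in its BXXX fields, so after restore_globals its Class is 0 again, which is how an unused name looks to the rest of the parser.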

Parsing Parameters

parameter_decl ::= type {'*'} id {',' type {'*'} id}

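Parsing a function's parameters means parsing a comma-separated sequence of identifiers while recording the position and type of each one.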

int index_of_bp; // index of bp pointer on stack

void function_parameter() {
    int type;
    int params;
    params = 0;
    while (token != ')') {
        // ①

        // int name, ...
        type = INT;
        if (token == Int) {
            match(Int);
        } else if (token == Char) {
            type = CHAR;
            match(Char);
        }

        // pointer type
        while (token == Mul) {
            match(Mul);
            type = type + PTR;
        }

        // parameter name
        if (token != Id) {
            printf("%d: bad parameter declaration\n", line);
            exit(-1);
        }
        if (current_id[Class] == Loc) {
            printf("%d: duplicate parameter declaration\n", line);
            exit(-1);
        }

        match(Id);

        //②
        // store the local variable
        current_id[BClass] = current_id[Class]; current_id[Class]  = Loc;
        current_id[BType]  = current_id[Type];  current_id[Type]   = type;
        current_id[BValue] = current_id[Value]; current_id[Value]  = params++;   // index of current parameter

        if (token == ',') {
            match(',');
        }
    }

    // ③
    index_of_bp = params+1;
}

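Part ① is almost identical to the parsing of global variable definitions; it parses the parameter's type.

Part ② relates to the "locals shadow globals" behaviour described in the previous section. It first saves the global information in the BXXX fields (whether or not a global with that name actually exists), then fills in the local information; for example, Value holds the parameter's position (which parameter it is, counting from 0).

Part ③ relates to code generation: index_of_bp is the position of new_bp mentioned earlier.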
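The two things recorded per parameter, its type code and its position, can be checked with a little arithmetic. The sketch below is written under two assumptions that this chapter does not spell out: that the type constants are declared in the order CHAR, INT, PTR, and that the offset from new_bp is obtained as index_of_bp minus the symbol's Value. The second assumption is inferred from the stack diagram above; the helper names make_type, bp_index_for, and bp_offset are hypothetical.

```c
#include <assert.h>

/* Type codes; the order CHAR, INT, PTR is an assumption of this sketch. */
enum { CHAR, INT, PTR };

/* Type code for a base type followed by `stars` pointer levels. */
int make_type(int base, int stars) {
    int type = base;
    assert(base == CHAR || base == INT);
    while (stars-- > 0) {
        type = type + PTR;    /* the same step the parser takes per '*' */
    }
    return type;
}

/* index_of_bp as set at ③ for a function with `params` parameters. */
int bp_index_for(int params) {
    assert(params >= 0);
    return params + 1;
}

/* Offset from new_bp, in stack slots, for a symbol whose Value is `value`. */
int bp_offset(int index_of_bp, int value) {
    return index_of_bp - value;
}
```

For demo, which has two parameters, bp_index_for(2) is 3. param_a has Value 0 and lands at new_bp + 3, and param_b has Value 1 and lands at new_bp + 2. Local variables continue numbering from index_of_bp + 1 (see function_body below), so local_1 with Value 4 lands at new_bp - 1, in agreement with the diagram.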

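Parsing the Function Body

The C we implement differs from modern C: all variable definitions must appear before any statement. The function body code is as follows: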

void function_body() {
    // type func_name (...) {...}
    //                   -->|   |<--

    // ... {
    // 1. local declarations
    // 2. statements
    // }

    int pos_local; // position of local variables on the stack.
    int type;
    pos_local = index_of_bp;

    // ①
    while (token == Int || token == Char) {
        // local variable declaration, just like global ones.
        basetype = (token == Int) ? INT : CHAR;
        match(token);

        while (token != ';') {
            type = basetype;
            while (token == Mul) {
                match(Mul);
                type = type + PTR;
            }

            if (token != Id) {
                // invalid declaration
                printf("%d: bad local declaration\n", line);
                exit(-1);
            }
            if (current_id[Class] == Loc) {
                // identifier exists
                printf("%d: duplicate local declaration\n", line);
                exit(-1);
            }
            match(Id);

            // store the local variable
            current_id[BClass] = current_id[Class]; current_id[Class]  = Loc;
            current_id[BType]  = current_id[Type];  current_id[Type]   = type;
            current_id[BValue] = current_id[Value]; current_id[Value]  = ++pos_local;   // index of current local variable

            if (token == ',') {
                match(',');
            }
        }
        match(';');
    }

    // ②
    // save the stack size for local variables
    *++text = ENT;
    *++text = pos_local - index_of_bp;

    // statements
    while (token != '}') {
        statement();
    }

    // emit code for leaving the sub function
    *++text = LEV;
}

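Part ① parses the local variable definitions inside the function body; the code is nearly the same as for global variable definitions.

Part ② generates assembly code. As mentioned for the virtual machine in Chapter 3, we need to reserve space on the stack for local variables, and these two lines do exactly that.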
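As a reminder of what the two emitted instructions do at run time, here is a small simulation of ENT and LEV. It is a sketch, not the tutorial's virtual machine: it uses array indices in a hypothetical struct Machine instead of raw pointers, and the names op_ent and op_lev are made up for this example. The steps themselves follow the usual meaning of these instructions: ENT saves the old BP, starts a new frame, and reserves space for locals; LEV undoes that and returns.

```c
#include <assert.h>

#define STACK_SIZE 16

/* Hypothetical machine state using array indices instead of raw pointers. */
struct Machine {
    int stack[STACK_SIZE];
    int sp;   /* index of the top element; the stack grows downward */
    int bp;   /* index of the saved old BP in the current frame */
    int pc;   /* program counter, just a number in this sketch */
};

/* ENT n: save the old BP, start a new frame, reserve n local slots. */
void op_ent(struct Machine *m, int locals) {
    assert(locals >= 0 && m->sp - 1 - locals >= 0);
    m->stack[--m->sp] = m->bp;
    m->bp = m->sp;
    m->sp = m->sp - locals;
}

/* LEV: drop the locals, restore the old BP, jump to the return address. */
void op_lev(struct Machine *m) {
    m->sp = m->bp;
    m->bp = m->stack[m->sp++];
    m->pc = m->stack[m->sp++];
}
```

The operand passed to ENT corresponds to pos_local - index_of_bp in function_body, that is, the number of local variables declared in the function.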

Code

The code for this chapter can be downloaded from GitHub, or cloned directly:

git clone -b step-4 https://github.com/lotabout/write-a-C-interpreter

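The code for this chapter still cannot run, because two important functions are not finished: statement and expression. If you are interested, try implementing them yourself.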

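Summary

In this chapter we parsed function definitions with a fairly small amount of code. Most of that code still deals with variables, namely parameters and local variables, and its logic is almost the same as for global variables. The main difference is the information that gets stored.

To understand how function definitions are parsed, the key is to understand what assembly code we generate for a function, because that determines what information parsing must collect (for example, the number and positions of the parameters). For this you may need to revisit the "virtual machine" chapter or brush up on assembly.

In the next chapter we will cover the parsing of expressions, the most complex part and also the last part of the whole compiler. Stay tuned.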
