C Compiler Development Journey (1): The Lexical Scanner

Posted by 毅澤 on 2021-06-04

Original URL: https://www.cnblogs.com/adamson-shaw/p/14847714.html

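In this part we start with a simple lexical scanner that recognizes the four arithmetic operators and integer values. Its job is equally simple: read the file we give it, identify the tokens in that file, and print them.

This simple scanner supports only five lexical elements:

the four basic arithmetic operators: +, -, *, /
decimal integers

We need to define each token in advance, using an enumeration: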

//defs.h

// Tokens
enum {
  T_PLUS, T_MINUS, T_STAR, T_SLASH, T_INTLIT
};

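Each token the scanner finds is stored in the structure below. When the token is T_INTLIT (that is, an integer literal), the intvalue field holds the integer value we scanned: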

//defs.h

// Token structure
struct token {
  int token;
  int intvalue;
};

Now suppose we have a file whose contents are a single arithmetic expression:

2 + 34 * 5 - 8 / 3

Our goal is to read the file and print every token it contains, like this:

Token intlit, value 2
Token +
Token intlit, value 34
Token *
Token intlit, value 5
Token -
Token intlit, value 8
Token /
Token intlit, value 3

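Now that we have seen the end goal, let us work through the required pieces one step at a time.

First we need a function that reads the file one character at a time and returns each character. When we read too far ahead in the input stream, we need to put the character back. In the example above, the scanner cannot know in advance where a number ends, so it keeps reading in a loop; the first non-digit character marks the end of the decimal number, at which point the number's value is returned and the non-digit character is put back. Line counting is also handled here.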

// Get the next character from the input file.
static int next(void) {
  int c;

  if (Putback) {                // Use the character put
    c = Putback;                // back if there is one
    Putback = 0;
    return c;
  }

  c = fgetc(Infile);            // Read from input file
  if ('\n' == c)
    Line++;                     // Increment line count
  return c;
}
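
next() depends on a one-character put-back slot that the listing above does not show. A minimal sketch of how that slot could work, assuming Putback, Line and Infile are file-level globals and that putback() simply stores the character (the declarations here are assumptions for illustration, not taken from the listing):

```c
#include <stdio.h>

int Line = 1;        // Current line number (assumed to start at 1)
int Putback = 0;     // Character put back by the scanner, 0 if none
FILE *Infile;        // Input file being scanned

// Put an unwanted character back so the next call to next() returns it.
void putback(int c) {
  Putback = c;
}

// Same logic as next() above, repeated so this sketch compiles on its own.
int next(void) {
  int c;

  if (Putback) {     // Hand back the saved character first
    c = Putback;
    Putback = 0;
    return c;
  }

  c = fgetc(Infile);
  if ('\n' == c)
    Line++;
  return c;
}
```

Because 0 marks an empty slot, this scheme holds at most one character and cannot put back a NUL byte, which is acceptable for text input.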

We only want meaningful characters, so we also need a function that skips whitespace:

// Skip past input that we don't need to deal with, 
// i.e. whitespace, newlines. Return the first
// character we do need to deal with.
static int skip(void) {
  int c;

  c = next();
  while (' ' == c || '\t' == c || '\n' == c || '\r' == c || '\f' == c) {
    c = next();
  }
  return (c);
}

When the character read is a digit, how do we know how many digits the number has? We need a dedicated function to handle numbers.

// Return the position of character c
// in string s, or -1 if c not found
static int chrpos(char *s, int c) {
  char *p;

  p = strchr(s, c);
  return (p ? p - s : -1);
}


// Scan and return an integer literal
// value from the input file. c is the
// first digit, which was already read.
static int scanint(int c) {
  int k, val = 0;

  // Convert each character into an int value
  while ((k = chrpos("0123456789", c)) >= 0) { 
    val = val * 10 + k;
    c = next();
  }

  // We hit a non-integer character, put it back.
  putback(c);
  return val;
}
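
To make the accumulation step concrete, here is a small sketch that applies the same val = val * 10 + k rule to a string instead of a file. The names chrpos_str and scanint_str and the end-pointer parameter are hypothetical, chosen only for this illustration:

```c
#include <string.h>

// Return the position of character c in string s, or -1 if not found.
// A '\0' is rejected explicitly because strchr would match the terminator.
int chrpos_str(const char *s, int c) {
  const char *p;

  if (c == '\0')
    return -1;
  p = strchr(s, c);
  return (p ? (int) (p - s) : -1);
}

// Convert the leading decimal digits of text into an int.
// *end is set to the first character that is not part of the number.
int scanint_str(const char *text, const char **end) {
  int k, val = 0;

  while ((k = chrpos_str("0123456789", *text)) >= 0) {
    val = val * 10 + k;   // "34": 0*10+3 = 3, then 3*10+4 = 34
    text++;
  }
  *end = text;
  return val;
}
```

For the input "34 * 5", scanint_str returns 34 and leaves *end pointing at the space, which is the character the file-based version would put back.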

So now we can read characters while skipping whitespace, and we can put a character back if we read one too far. We can now write our first lexical scanner:

int scan(struct token *t) {
  int c;

  // Skip whitespace
  c = skip();

  // Determine the token based on
  // the input character
  switch (c) {
  case EOF:
    return (0);
  case '+':
    t->token = T_PLUS;
    break;
  case '-':
    t->token = T_MINUS;
    break;
  case '*':
    t->token = T_STAR;
    break;
  case '/':
    t->token = T_SLASH;
    break;
  default:

    // If it's a digit, scan the
    // literal integer value in
    if (isdigit(c)) {
      t->intvalue = scanint(c);
      t->token = T_INTLIT;
      break;
    }

    printf("Unrecognised character %c on line %d\n", c, Line);
    exit(1);
  }
  // We found a token
  return (1);
}

Now we can read tokens and return them to the caller.

The main() function opens a file and then scans it for tokens:

int main(int argc, char *argv[]) {
  ...
  init();
  ...
  Infile = fopen(argv[1], "r");
  ...
  scanfile();
  exit(0);
}
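
The listing elides whatever surrounds the fopen call, and fopen returns NULL when the file cannot be opened. One possible way to guard the open, as a sketch only: the helper name open_input and its messages are hypothetical and not part of the original program.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

// Open path for reading. On failure, print the reason to stderr and
// return NULL so the caller can decide how to exit.
FILE *open_input(const char *path) {
  FILE *f;

  if (path == NULL) {
    fprintf(stderr, "Usage: scanner infile\n");
    return NULL;
  }
  f = fopen(path, "r");
  if (f == NULL)
    fprintf(stderr, "Unable to open %s: %s\n", path, strerror(errno));
  return f;
}
```

main() could then call open_input(argc > 1 ? argv[1] : NULL), assign the result to Infile, and exit with a non-zero status when the result is NULL.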

scanfile() loops as long as there is a new token and prints the details of each one:

// List of printable tokens
char *tokstr[] = { "+", "-", "*", "/", "intlit" };

// Loop scanning in all the tokens in the input file.
// Print out details of each token found.
static void scanfile() {
  struct token T;

  while (scan(&T)) {
    printf("Token %s", tokstr[T.token]);
    if (T.token == T_INTLIT)
      printf(", value %d", T.intvalue);
    printf("\n");
  }
}
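
To experiment with the scanner logic without an input file, the pieces above can be condensed into a string-driven sketch. It uses isdigit and isspace in place of chrpos and skip, returns -1 instead of calling exit, and the name scan_string is hypothetical; the enum and struct from defs.h are repeated so that it compiles on its own.

```c
#include <ctype.h>

enum { T_PLUS, T_MINUS, T_STAR, T_SLASH, T_INTLIT };

struct token {
  int token;
  int intvalue;
};

// Tokenize src into out[], writing at most max tokens.
// Returns the number of tokens found, or -1 on an unrecognised
// character or when out[] is too small.
int scan_string(const char *src, struct token *out, int max) {
  int n = 0;

  while (*src != '\0') {
    unsigned char c = (unsigned char) *src;

    if (isspace(c)) {           // Skip whitespace between tokens
      src++;
      continue;
    }
    if (n == max)
      return -1;

    if (isdigit(c)) {           // Accumulate a decimal integer literal
      int val = 0;
      while (isdigit((unsigned char) *src)) {
        val = val * 10 + (*src - '0');
        src++;
      }
      out[n].token = T_INTLIT;
      out[n].intvalue = val;
      n++;
      continue;
    }

    switch (c) {
    case '+': out[n].token = T_PLUS;  break;
    case '-': out[n].token = T_MINUS; break;
    case '*': out[n].token = T_STAR;  break;
    case '/': out[n].token = T_SLASH; break;
    default:  return -1;
    }
    out[n].intvalue = 0;
    n++;
    src++;
  }
  return n;
}
```

Calling scan_string on "2 + 34 * 5 - 8 / 3" with room for at least nine tokens fills the array with the same nine tokens shown in the expected output earlier.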

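That is all for this part. In the next part, we will build a parser that interprets the grammar of our input file, then calculates and prints the final value of each file.

GitHub repository for this article: https://github.com/Shaw9379/acwj/tree/master/01_Scanner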
