How to Hand-Write a JSON Parser

Published by 袋鼠云数栈前端 on 2024-11-15

Original article: https://www.cnblogs.com/dtux/p/18547513

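JSON.parse is an API we use all the time in front-end development. If we had to implement JSON.parse ourselves, how would we go about it? In this article we hand-write a JSON parser to understand how it works internally.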

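JSON Syntax

JSON is a syntax for serializing objects, arrays, numbers, strings, booleans, and null. Its rules are as follows:

Data is expressed as name/value pairs.
Objects are enclosed in curly braces ({}); each name is followed by a colon (:), and name/value pairs are separated by commas (,).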

Arrays are enclosed in square brackets ([]), and array values are separated by commas (,).

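A JSON value can be a number (integer or floating point), a string (in double quotes), a boolean (true or false), an array (in square brackets), an object (in curly braces), or null.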
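
As a quick illustration of these value types, the snippet below feeds a small hand-written document to the built-in JSON.parse and checks what each value becomes in JavaScript. The sample text and the variable names are made up for this example.

```javascript
// A hand-written sample that uses every JSON value type at least once.
const sampleText =
  '{"name": "demo", "count": 3, "ratio": 0.5, "ok": true, "tags": ["a", "b"], "extra": null}'

const sample = JSON.parse(sampleText)

console.log(typeof sample.name)         // string
console.log(typeof sample.count)        // number
console.log(typeof sample.ratio)        // number
console.log(typeof sample.ok)           // boolean
console.log(Array.isArray(sample.tags)) // true
console.log(sample.extra === null)      // true
```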

Implementing the Parser

A parser generally works in four stages: lexical analysis, syntactic analysis, transformation, and code generation.
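
To make the stages concrete, here is a hand-written illustration of the data at each stage for one tiny input. The token and node shapes are assumptions chosen to match the style used later in this article; the values are written out by hand, not produced by running a parser.

```javascript
// Stage 0: the source text.
const source = '{"a": [1, true]}'

// Stage 1 (lexical analysis): a flat list of tokens.
const tokens = [
  { type: '{', value: '{' },
  { type: 'string', value: 'a' },
  { type: ':', value: ':' },
  { type: '[', value: '[' },
  { type: 'number', value: 1 },
  { type: ',', value: ',' },
  { type: 'true', value: 'true' },
  { type: ']', value: ']' },
  { type: '}', value: '}' },
]

// Stage 2 (syntactic analysis): a tree that records the nesting.
const ast = {
  type: 'ObjectExpression',
  properties: [
    {
      type: 'ObjectProperty',
      key: { type: 'StringLiteral', value: 'a' },
      value: {
        type: 'ArrayExpression',
        elements: [
          { type: 'NumericLiteral', value: 1 },
          { type: 'BooleanLiteral', value: true },
        ],
      },
    },
  ],
}

// Stage 3 (transformation): in the simplest case the tree is left unchanged.
// Stage 4 (code generation): the final JavaScript value.
const output = { a: [1, true] }
```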

Lexical Analysis

From the JSON syntax above, we can see that JSON has the following types, each with its own characteristic tokens:

Type	Basic features
Object	"{" ":" "," "}"
Array	"[" "," "]"
String	'"'
Number	"0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
Boolean	"true" "false"
Null	"null"

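Based on these features, we can walk through the JSON string character by character, compare each character against them, and produce the corresponding tokens. The lexer is implemented as follows: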

// Lexical analysis
const TokenTypes = {
  OPEN_OBJECT: '{',
  CLOSE_OBJECT: '}',
  OPEN_ARRAY: '[',
  CLOSE_ARRAY: ']',
  STRING: 'string',
  NUMBER: 'number',
  TRUE: 'true',
  FALSE: 'false',
  NULL: 'null',
  COLON: ':',
  COMMA: ',',
}

class Lexer {
  constructor(json) {
    this._json = json
    this._index = 0
    this._tokenList = []
  }

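  createToken(type, value) {
    // Only fall back to the type when no value was passed,
    // so falsy values such as 0 and "" are kept
    return { type, value: value === undefined ? type : value }
  }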

  getToken() {
    while (this._index < this._json.length) {
      const token = this.bigbang()
      // bigbang() returns undefined when only whitespace was left
      if (token) this._tokenList.push(token)
    }
    return this._tokenList
  }

  bigbang() {
    // Skip insignificant whitespace: space, tab, line feed, carriage return
    while (this._index < this._json.length && ' \t\n\r'.includes(this._json[this._index])) {
      this._index++
    }
    if (this._index >= this._json.length) return undefined
    const start = this._index
    const key = this._json[start]
    switch (key) {
      case '{':
        this._index++
        return this.createToken(TokenTypes.OPEN_OBJECT)
      case '}':
        this._index++
        return this.createToken(TokenTypes.CLOSE_OBJECT)
      case '[':
        this._index++
        return this.createToken(TokenTypes.OPEN_ARRAY)
      case ']':
        this._index++
        return this.createToken(TokenTypes.CLOSE_ARRAY)
      case ':':
        this._index++
        return this.createToken(TokenTypes.COLON)
      case ',':
        this._index++
        return this.createToken(TokenTypes.COMMA)
      case '"':
        return this.parseString()
    }
    // number
    if (this.isNumber(key)) {
      return this.parseNumber()
    }
    // true false null
    const result = this.parseKeyword(key)
    if (result.isKeyword) {
      return this.createToken(TokenTypes[result.keyword])
    }
    // Anything else is not valid JSON; throwing also avoids looping forever
    throw new SyntaxError(`Unexpected character "${key}" at position ${start}`)
  }

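  isNumber(key) {
    // A JSON number starts with a digit or a minus sign
    return key === '-' || (key >= '0' && key <= '9')
  }

  parseString() {
    this._index++ // skip the opening quote
    let key = ''
    // Note: escape sequences such as \" are not handled in this simple version
    while (this._index < this._json.length && this._json[this._index] !== '"') {
      key += this._json[this._index]
      this._index++
    }
    if (this._index >= this._json.length) {
      throw new SyntaxError('Unterminated string in JSON')
    }
    this._index++ // skip the closing quote
    return this.createToken(TokenTypes.STRING, key)
  }

  parseNumber() {
    // Sticky regex: optional minus, integer part, optional fraction, optional exponent
    const pattern = /-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/y
    pattern.lastIndex = this._index
    const match = pattern.exec(this._json)
    if (!match) {
      throw new SyntaxError(`Invalid number at position ${this._index}`)
    }
    this._index += match[0].length
    return this.createToken(TokenTypes.NUMBER, Number(match[0]))
  }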

  parseKeyword(key) {
    let isKeyword = false
    let keyword = ''
    switch (key) {
      case 't':
        isKeyword = this._json.slice(this._index, this._index + 4) === 'true'
        keyword = 'TRUE'
        break
      case 'f':
        isKeyword = this._json.slice(this._index, this._index + 5) === 'false'
        keyword = 'FALSE'
        break
      case 'n':
        isKeyword = this._json.slice(this._index, this._index + 4) === 'null'
        keyword = 'NULL'
        break
    }
    this._index += keyword.length
    return {
      isKeyword,
      keyword,
    }
  }
}
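
The parseString method above copies characters verbatim, so an escaped quote (\") ends the string early and sequences such as \n are never decoded. A minimal sketch of one way to handle escapes follows. readJsonString is a hypothetical standalone helper, not part of the Lexer class, and it assumes that text[start] is the opening quote.

```javascript
// Single-character escapes defined by the JSON grammar.
const SIMPLE_ESCAPES = { '"': '"', '\\': '\\', '/': '/', b: '\b', f: '\f', n: '\n', r: '\r', t: '\t' }

// Hypothetical helper: reads a JSON string literal that starts at text[start].
// Returns the decoded value and the index just past the closing quote.
function readJsonString(text, start) {
  let i = start + 1 // skip the opening quote
  let value = ''
  while (i < text.length) {
    const ch = text[i]
    if (ch === '"') {
      return { value, end: i + 1 }
    }
    if (ch === '\\') {
      const next = text[i + 1]
      if (next === 'u') {
        // \uXXXX: exactly four hexadecimal digits
        const hex = text.slice(i + 2, i + 6)
        if (!/^[0-9a-fA-F]{4}$/.test(hex)) {
          throw new SyntaxError(`Bad unicode escape at position ${i}`)
        }
        value += String.fromCharCode(parseInt(hex, 16))
        i += 6
      } else if (Object.prototype.hasOwnProperty.call(SIMPLE_ESCAPES, next)) {
        value += SIMPLE_ESCAPES[next]
        i += 2
      } else {
        throw new SyntaxError(`Bad escape at position ${i}`)
      }
    } else {
      value += ch
      i++
    }
  }
  throw new SyntaxError('Unterminated string')
}

// The source text here is: "a\"b\n"
const decoded = readJsonString('"a\\"b\\n"', 0)
console.log(decoded.end) // 8
```

Inside the Lexer, the same loop could replace the body of parseString, using this._json and this._index instead of the text and start parameters. The sketch does not reject raw control characters inside strings, which a strict parser would.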

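Syntactic Analysis

Syntactic analysis walks through the tokens, works out how they relate to each other, and builds an object called an AST (abstract syntax tree). Before starting, we define a class for each JSON construct to record the information of each node in the AST.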

class NumericLiteral {
  constructor(type, value) {
    this.type = type
    this.value = value
  }
}

class StringLiteral {
  constructor(type, value) {
    this.type = type
    this.value = value
  }
}

class BooleanLiteral {
  constructor(type, value) {
    this.type = type
    this.value = value
  }
}

class NullLiteral {
  constructor(type, value) {
    this.type = type
    this.value = value
  }
}

class ArrayExpression {
  constructor(type, elements) {
    this.type = type
    this.elements = elements || []
  }
}

class ObjectExpression {
  constructor(type, properties) {
    this.type = type
    this.properties = properties || []
  }
}

class ObjectProperty {
  constructor(type, key, value) {
    this.type = type
    this.key = key
    this.value = value
  }
}

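Now for the syntactic analysis itself: we walk through the tokens, check the type of each one, create the corresponding node, and build up the AST. The code is as follows:

// Syntactic analysis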
class Parser {
  constructor(tokens) {
    this._tokens = tokens
    this._index = 0
    this.node = null
  }

  jump() {
    this._index++
  }

  getValue() {
    const value = this._tokens[this._index].value
    this._index++
    return value
  }

  parse() {
    const token = this._tokens[this._index]
    if (!token) {
      throw new SyntaxError('Unexpected end of JSON input')
    }
    const type = token.type
    const value = this.getValue()
    switch (type) {
      case TokenTypes.OPEN_ARRAY: {
        const array = this.parseArray()
        this.jump() // skip ]
        return array
      }
      case TokenTypes.OPEN_OBJECT: {
        const object = this.parseObject()
        this.jump() // skip }
        return object
      }
      case TokenTypes.STRING:
        return new StringLiteral('StringLiteral', value)
      case TokenTypes.NUMBER:
        return new NumericLiteral('NumericLiteral', Number(value))
      case TokenTypes.TRUE:
        return new BooleanLiteral('BooleanLiteral', true)
      case TokenTypes.FALSE:
        return new BooleanLiteral('BooleanLiteral', false)
      case TokenTypes.NULL:
        return new NullLiteral('NullLiteral', null)
      default:
        // A token such as "]" or "," cannot start a value
        throw new SyntaxError(`Unexpected token "${value}"`)
    }
  }

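  // Type of the current token, or undefined once the tokens run out
  peekType() {
    const token = this._tokens[this._index]
    return token ? token.type : undefined
  }

  parseArray() {
    const _array = new ArrayExpression('ArrayExpression')
    // An empty array has "]" right after "["
    if (this.peekType() === TokenTypes.CLOSE_ARRAY) return _array
    while (true) {
      const value = this.parse()
      _array.elements.push(value)
      if (this.peekType() !== TokenTypes.COMMA) break
      this.jump() // skip ,
    }
    if (this.peekType() !== TokenTypes.CLOSE_ARRAY) {
      throw new SyntaxError('Expected "]" at the end of an array')
    }
    return _array
  }

  parseObject() {
    const _object = new ObjectExpression('ObjectExpression')
    // An empty object has "}" right after "{"
    if (this.peekType() === TokenTypes.CLOSE_OBJECT) return _object
    while (true) {
      const key = this.parse()
      if (this.peekType() !== TokenTypes.COLON) {
        throw new SyntaxError('Expected ":" after an object key')
      }
      this.jump() // skip :
      const value = this.parse()
      const property = new ObjectProperty('ObjectProperty', key, value)
      _object.properties.push(property)
      if (this.peekType() !== TokenTypes.COMMA) break
      this.jump() // skip ,
    }
    if (this.peekType() !== TokenTypes.CLOSE_OBJECT) {
      throw new SyntaxError('Expected "}" at the end of an object')
    }
    return _object
  }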
}

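Transformation

Syntactic analysis gives us an AST. The transformation stage can add, remove, or modify nodes in that tree, producing a new AST. This JSON parser uses the tree as-is, so the stage has no code of its own here.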
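
As a sketch of what such a transformation could look like, the function below returns a new tree in which object properties whose value is a NullLiteral are dropped. removeNullProperties is a hypothetical name, and the plain-object nodes only mirror the node shapes defined earlier so that the example runs on its own.

```javascript
// Hypothetical transform: returns a NEW tree without null-valued object properties.
// Node shapes mirror the AST classes above (type / value / elements / properties).
function removeNullProperties(node) {
  switch (node.type) {
    case 'ArrayExpression':
      return {
        type: node.type,
        elements: node.elements.map(element => removeNullProperties(element)),
      }
    case 'ObjectExpression':
      return {
        type: node.type,
        properties: node.properties
          .filter(property => property.value.type !== 'NullLiteral')
          .map(property => ({
            type: property.type,
            key: property.key,
            value: removeNullProperties(property.value),
          })),
      }
    default:
      // Literals have no children, so they can be reused as-is.
      return node
  }
}

// Hand-written AST for {"a": null, "b": [1]}
const before = {
  type: 'ObjectExpression',
  properties: [
    {
      type: 'ObjectProperty',
      key: { type: 'StringLiteral', value: 'a' },
      value: { type: 'NullLiteral', value: null },
    },
    {
      type: 'ObjectProperty',
      key: { type: 'StringLiteral', value: 'b' },
      value: { type: 'ArrayExpression', elements: [{ type: 'NumericLiteral', value: 1 }] },
    },
  ],
}

const after = removeNullProperties(before)
console.log(after.properties.length)  // 1
console.log(before.properties.length) // 2 (the original tree is untouched)
```

Returning a new tree rather than mutating the input keeps the original AST available for later passes or for debugging.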

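Code Generation

The code generation stage walks the transformed AST and, based on the information in each node, produces the final output. For a JSON parser, that output is the JavaScript value the JSON text describes.

// Code generation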
class Generate {
  constructor(tree) {
    this.tree = tree
  }

  getResult() {
    let result = this.getData(this.tree)
    return result
  }

  getData(data) {
    if (data.type === 'ArrayExpression') {
      // map already builds the result array
      return data.elements.map(item => this.getData(item))
    }
    if (data.type === 'ObjectExpression') {
      const result = {}
      // ObjectProperty nodes are unpacked here, so they need no branch of their own
      data.properties.forEach(item => {
        const key = this.getData(item.key)
        const value = this.getData(item.value)
        result[key] = value
      })
      return result
    }
    if (data.type === 'NumericLiteral') {
      return data.value
    }
    if (data.type === 'StringLiteral') {
      return data.value
    }
    if (data.type === 'BooleanLiteral') {
      return data.value
    }
    if (data.type === 'NullLiteral') {
      return data.value
    }
  }
}

Usage

function JsonParse(b) {
  const lexer = new Lexer(b)
  const tokens = lexer.getToken() // get the tokens
  const parser = new Parser(tokens)
  const tree = parser.parse() // build the syntax tree
  const generate = new Generate(tree)
  const result = generate.getResult() // generate the result
  return result
}
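
One way to gain confidence in a hand-written parser is to compare it with the built-in JSON.parse on a handful of inputs. The helper below is a hypothetical sketch: compareWithNative and the sample list are made up for this example, and the parser under test is passed in as a function so the snippet does not depend on the classes above.

```javascript
// Hypothetical check: runs a parse function against JSON.parse on sample inputs
// and returns a description of every input whose result differs.
function compareWithNative(parseFn, samples) {
  const mismatches = []
  for (const text of samples) {
    const expected = JSON.stringify(JSON.parse(text))
    let actual
    try {
      actual = JSON.stringify(parseFn(text))
    } catch (error) {
      actual = `threw ${error.name}`
    }
    if (actual !== expected) mismatches.push({ text, expected, actual })
  }
  return mismatches
}

const samples = [
  '{"a": 1}',
  '[]',
  '{}',
  '[0, ""]',
  '{"nested": {"list": [true, false, null]}}',
]

// With the function from this article the call would be:
//   compareWithNative(JsonParse, samples)
// Passing JSON.parse itself shows the shape of a clean result.
console.log(compareWithNative(JSON.parse, samples)) // []
```

Edge cases such as empty containers, 0, and the empty string are worth keeping in the sample list, since they are where simple implementations tend to slip.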

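Summary

At this point we have a simple JSON parser. From this exploration we can summarize the steps for building this kind of parser: first, understand the syntax of the target and extract its characteristic features; next, run lexical analysis, matching the input against those features to get tokens; then run syntactic analysis on the tokens to build an AST (abstract syntax tree); then add, remove, or modify nodes to produce a new AST; and finally walk the AST to generate the target value.

References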

https://www.json.org/json-en.html
https://lihautan.com/json-parser-with-javascript/
https://developer.mozilla.org/zh-CN/docs/Web/JavaScript/Reference/Global_Objects/JSON
