Using sub-generators for lexical scanning in Python

Posted by jieforest on 2012-08-14
A few days ago I watched a very interesting talk by Rob Pike about writing a non-trivial lexer in Go. Rob discussed how the traditional switch-based state machine approach is cumbersome to write, because it’s not really compatible with the algorithm we want to express. The main problem is that when we return a new token, a traditional state-machine structure forces us to explicitly pack up the state of where we are and return to the caller. Especially in cases where we just want to stay in the same state, this makes code unnecessarily convoluted.
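
To make the pain concrete, here is a minimal sketch of my own (not code from the talk) of such a switch-based lexer in Python. It splits input into runs of digits and non-digits; notice how every completed token forces a return, so the state, the token start and the position all have to be stashed on the object before bailing out:

CODE:

class SwitchLexer:
    IN_TEXT, IN_NUMBER = 'IN_TEXT', 'IN_NUMBER'

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.start = 0
        self.state = self.IN_TEXT

    def next_token(self):
        # Each call re-enters the saved state; when a token is complete
        # we must pack up self.state/self.start/self.pos and return.
        while True:
            ch = self.text[self.pos] if self.pos < len(self.text) else ''
            if self.state == self.IN_TEXT:
                if ch and not ch.isdigit():
                    self.pos += 1          # stay in the same state...
                else:
                    tok = ('TEXT', self.text[self.start:self.pos])
                    self.start = self.pos
                    self.state = self.IN_NUMBER
                    if tok[1]:
                        return tok         # ...but emitting forces a return
                    if not ch:
                        return None
            else:  # self.state == self.IN_NUMBER
                if ch.isdigit():
                    self.pos += 1
                else:
                    tok = ('NUMBER', self.text[self.start:self.pos])
                    self.start = self.pos
                    self.state = self.IN_TEXT
                    if tok[1]:
                        return tok
                    if not ch:
                        return None

lexer = SwitchLexer('ab12cd')
tok = lexer.next_token()
while tok is not None:
    print(tok)                 # ('TEXT', 'ab'), ('NUMBER', '12'), ('TEXT', 'cd')
    tok = lexer.next_token()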

This struck a chord with me, because I’ve already written about simplifying state machine code in Python with coroutines. I couldn’t help but wonder what would be an elegant Pythonic way to implement Rob’s template lexer (watch the talk or take a look at his slides for the syntax).

What follows is my attempt, which uses the new yield from syntax from PEP 380, and hence requires Python 3.3 (which is currently in beta, but should be released soon). I’ll present the code in small chunks with explanations; the full source is available for download here. It’s heavily commented, so it should be easy to grok.
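
As a quick refresher, here is a toy example of mine (not the article's code) showing the PEP 380 mechanics we'll rely on: an outer generator delegates to a sub-generator with yield from, and the sub-generator's return value, which is exactly what makes chained state functions work, becomes the value of the yield-from expression:

CODE:

def lex_a():
    # A state function: yield tokens, then return the next state.
    yield 'token from state a'
    return lex_b

def lex_b():
    yield 'token from state b'
    return None   # no next state: lexing is done

def run(start_state):
    state = start_state
    while state is not None:
        # Delegate to the sub-generator; its return value (the next
        # state function) becomes the value of the yield-from expression.
        state = yield from state()

for tok in run(lex_a):
    print(tok)   # prints the two tokens in order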

First, some helper types and constants:

CODE:

from collections import namedtuple

TOK_TEXT        = 'TOK_TEXT'
TOK_LEFT_META   = 'TOK_LEFT_META'
TOK_RIGHT_META  = 'TOK_RIGHT_META'
TOK_PIPE        = 'TOK_PIPE'
TOK_NUMBER      = 'TOK_NUMBER'
TOK_ID          = 'TOK_ID'

# A token has
#   type: one of the TOK_* constants
#   value: string value, as taken from input
#
Token = namedtuple('Token', 'type value')
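
To preview where this is going, here is a rough sketch of how such state functions chain together with yield from, reusing the Token type and TOK_* constants just defined. This is my approximation; the article's full source handles the complete template syntax and differs in detail:

CODE:

LEFT_META = '{{'

def lex(text):
    # Top-level driver: run the current state function, which yields
    # tokens and returns (next_state, new_pos); stop when next_state
    # is None.
    state, pos = lex_text, 0
    while state is not None:
        state, pos = yield from state(text, pos)

def lex_text(text, pos):
    # Emit plain text up to the next '{{' (or end of input).
    meta = text.find(LEFT_META, pos)
    if meta == -1:
        if pos < len(text):
            yield Token(TOK_TEXT, text[pos:])
        return None, len(text)
    if meta > pos:
        yield Token(TOK_TEXT, text[pos:meta])
    return lex_left_meta, meta

def lex_left_meta(text, pos):
    yield Token(TOK_LEFT_META, LEFT_META)
    # The real lexer would hand off to a state that lexes pipes,
    # numbers and identifiers inside the action; this sketch just
    # drops back into plain text.
    return lex_text, pos + len(LEFT_META)

for tok in lex('hello {{name'):
    print(tok)
# Token(type='TOK_TEXT', value='hello ')
# Token(type='TOK_LEFT_META', value='{{')
# Token(type='TOK_TEXT', value='name')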
