COMPILER PROJECT 2024
The goal of the term-project is to implement a bottom-up syntax analyzer (a.k.a., parser) as we’velearned. More specifically, you will implement the syntax analyzer for a simplified C programminglanguage with the following context free grammar G;
CFG G:
01: CODE → VDECL CODE | FDECL CODE | ϵ
02: VDECL → vtype id semi | vtype ASSIGN semi
03: ASSIGN → id assign RHS
04: RHS → EXPR | literal | character | boolstr
05: EXPR → EXPR addsub EXPR | EXPR multdiv EXPR
06: EXPR → lparen EXPR rparen | id | num
07: FDECL → vtype id lparen ARG rparen lbrace BLOCK RETURN rbrace
08: ARG → vtype id MOREARGS | ϵ
09: MOREARGS → comma vtype id MOREARGS | ϵ
10: BLOCK → STMT BLOCK | ϵ
11: STMT → VDECL | ASSIGN semi
12: STMT → if lparen COND rparen lbrace BLOCK rbrace ELSE
13: STMT → while lparen COND rparen lbrace BLOCK rbrace
14: COND → COND comp COND | boolstr
15: ELSE → else lbrace BLOCK rbrace | ϵ
16: RETURN → return RHS semi
✓ Terminals (21)
- vtype for the types of variables and functions
- num for signed integers
- character for a single character
- boolstr for Boolean strings
- literal for literal strings
- id for the identifiers of variables and functions
- if, else, while, and return for if, else, while, and return statements respectively8. class for class declarations
- addsub for + and - arithmetic operators
- multdiv for * and / arithmetic operators
- assign for assignment operators
- comp for comparison operators
- semi and comma for semicolons and commas respectively
- lparen, rparen, lbrace, and rbrace for (, ), {, and } respectively
✓ Non-terminals (13) CODE, VDECL, ASSIGN, RHS, EXPR, FDECL, ARG, MOREARGS, BLOCK, STMT, COND, ELSE,RETURN
✓ Start symbol: CODE
Descriptions
✓ The given CFG G is non-left recursive, but ambiguous.
✓ Codes include zero or more declarations of functions and variables (CFG line 1)
✓ Variables are declared with or without initialization (CFG line 2 ~ 3)
✓ The right hand side of assignment operations can be classified into four types; 1) arithmeticoperations (expressions), 2) literal strings, 3) a single character, and 4) Boolean strings (CFG4)
✓ Arithmetic operations are the combinations of +, -, *, / operators (CFG line 5 ~ 6)
✓ Functions can have zero or more input arguments (CFG line 7 ~ 9)
✓ Function blocks include zero or more statements (CFG line 10)
✓ There are four types of statements: 1) variable declarations, 2) assignment operations, 3) iflse statements, and 4) while statements (CFG line 11 ~ 13)
✓ if and while statements include a conditional operation which consists of Boolean stringsand condition operators (CFG line 12 ~ 14)✓ if statements can be used with or without an else statement (CFG line 12 & 15)
✓ return statements return 1) the computation result of arithmetic operations, 2) literal strings,
- 3) a single character, or 4) Boolean strings (CFG line 16)
✓ This is not a CFG for C. This is for simplified C. So, you don’t need to consider grammarsand structures not mentioned in this specification.Based on thiCFG, you should implement a bottom-up parser as follows:
✓ Discard an ambiguity in the CFG
✓ Construct a SLR parsing table for the non-ambiguous CFG through the following website:http://jsmachines.sourceforge.net/machines/slr.html
✓ Implement a SLR parsing program for the simplified Java programming language by using theconstructed table.For the implementation, please use C, C++, or Python (If you want to use . Your syntax analyzermust run on Linux or Unix-like OS without any error.
Your syntax analyzer should work as follows:
✓ The execution flow of your syntax analyzer: yntax_analyzer <input file>
✓ Input: A sequence of tokens (terminals) written in the input filee.g., vtype id semi vtype id lparen rparen lbrace if lparen boolstr comp boolstr rparen lbracerbrace
✓ Output
◼ (If a parsing decision output is “accept”) please construct a parse tree (not abstract syntax tree) for the input sequence
◆ You can design the data structure to represent the tree as you want.
◼ (If an output is “reject”) please make an error report which explains why and where the error occurred (e.g., line number)
Term-project schedule and submission
✓ Deadline: 6/9, 23:59 (through an e-class system)
◼ For a delayed submission, you will lose 0.1 * your original project score per eachdelayed day
✓ Submission file: team_<your_team_number>.zip or .tar.gz
◼ The compressed file should contain
◆ The source code of your syntax analyzer with detailed comments
◆ The executable binary file of your syntax analyzer (if you implemented using
a complied language)
◆ Documentation (the most important thing!)
⚫ It must include 1) your non-ambiguous CFG G and 2) your SLR parsing table
⚫ It must also include any change in the CFG G and all about how your syntaxnalyzer works for validating token sequences (for example, overallprocedures, implementation details like algorithms and data structures,working examples, and so on)◆ Test input files and outputs which you used inthis project
⚫ The test input files are not given. You should make the test files, by yourself,which can examine all the syntax grammars.
✓ If there exist any error in the given CFG, please send an e-mail to hskimhello@cau.ac.kr