Calcite is a dynamic data management framework. It contains many of the pieces that make up a typical database management system, but omits the storage primitives. It provides an industry-standard SQL parser and validator, a customizable optimizer with pluggable rules and cost functions, logical and physical algebraic operators, and various transformations from SQL to algebra (and back).
That is the official description. In plain words: Calcite implements a standard SQL parsing layer (for example, it can parse standard Hive SQL), which spares you tedious and error-prone grammar work, and it exposes extension interfaces for user customization. It also provides logical-plan rewriting, so users can implement their own optimizations. (Still a bit of a mouthful, admittedly.)
1. The two main parts of Calcite
In terms of core functionality, Calcite can roughly be split into two parts: one parses the SQL grammar, and the other transforms and implements the semantics.
Why split it up at all? Almost all layering exists to simplify the logic within each layer. If we put all the logic into a single layer, we inevitably get heavy coupling and mutual nesting, and it becomes hard for specialists to focus on their specialty. Grammar parsing is itself a hard problem, but it is backed by a large body of mature compiler theory, so there are many ready-made implementations to reuse; even implementing this part yourself is not too painful. That is why this layer must be separated out.
Semantic transformation and implementation, on the other hand, is the layer users care about more. If the grammar is the standard specification, then the semantics is what the implementer really cares about: the specification exists to lower the barrier for users, while the logic behind it can differ enormously. Once we have the parse tree from the grammar stage, the subsequent semantic processing becomes much more convenient, but it is still complex work, because context-dependent semantics are not easy to handle.
This article focuses only on the grammar parsing part. Calcite uses JavaCC as its parser generator, so we will naturally focus on JavaCC. ANTLR is similar in spirit; we will leave it for a later article.
In Calcite, JavaCC can be seen as the first compilation stage; compiling the boilerplate code that JavaCC emits with javac and then running your own logic on top of it can be seen as the second stage.
2. The JavaCC grammar skeleton
This article looks at JavaCC purely from a user's perspective: JavaCC itself is far too complex to cover exhaustively here, and for the details you will certainly need the official documentation.
First, the overall structure of a JavaCC grammar file:
```
javacc_options
/* JavaCC configuration options, set as key/value pairs
   once you know what each option means */

"PARSER_BEGIN" "(" <IDENTIFIER> ")"
/* marks the start of the parser definition;
   the code below it is written in plain Java */

java_compilation_unit
/* the parser's entry code, pure Java;
   this decides how callers will invoke the parser */

"PARSER_END" "(" <IDENTIFIER> ")"
/* marks the end of the parser definition; JavaCC copies
   the code above verbatim into the generated parser */

( production )*
/* grammar productions, written much like the productions of
   compiler theory; JavaCC analyzes them and compiles them into
   the parser above, so the resulting methods can be called
   freely from the parser code */

<EOF>
/* end-of-file marker */
```
That is the skeleton of a JavaCC grammar definition: it is the layout of an entire parser .jj file. Write a file following this skeleton, run javacc on it, and you get a full set of generated parser boilerplate.
But how do you actually write these grammars? Knowing nothing yet is a bit awkward. No rush; see the next section.
3. JavaCC keywords and their usage
The reason we don't know where to start writing a .jj file is that we don't know which keywords exist, and we haven't seen any examples yet. Practice makes perfect.
JavaCC has very few keywords. One reason is that the methodology behind this kind of parser generator is very mature: it can produce a parser for an arbitrary grammar. Another is that it does not take on much business logic; it only has to understand the grammar and translate it. Of its few keywords, some are merely auxiliary, so the truly essential ones are fewer still. They are:
```
TOKEN           /* defines ordinary words or keywords, mainly to be referenced elsewhere */
SPECIAL_TOKEN   /* defines special-purpose tokens, to be referenced or discarded */
SKIP            /* defines words or phrases to skip/ignore, mainly whitespace or comments */
MORE            /* helper for building a token out of several consecutive pieces */
EOF             /* end-of-file (or end-of-statement) marker */
IGNORE_CASE     /* helper option: match without regard to case */
JAVACODE        /* helper marker: the following block is plain Java code */
LOOKAHEAD       /* ambiguity-resolution tool: read several tokens ahead to disambiguate */
PARSER_BEGIN    /* boilerplate: fixed opening marker */
PARSER_END      /* boilerplate: fixed closing marker */
TOKEN_MGR_DECLS /* helper: declarations for the token manager */
```
With these keywords defined, we can write a hello world. Its only job is to verify that the input matches the grammar "hello world".
```
options {
    STATIC = false;
    ERROR_REPORTING = true;
    JAVA_UNICODE_ESCAPE = true;
    UNICODE_INPUT = false;
    IGNORE_CASE = true;
    DEBUG_PARSER = false;
    DEBUG_LOOKAHEAD = false;
    DEBUG_TOKEN_MANAGER = false;
}

PARSER_BEGIN(HelloWorldParser)

package my;

import java.io.FileInputStream;

/**
 * hello world parser
 */
@SuppressWarnings({"nls", "unused"})
public class HelloWorldParser {

    /**
     * test entry point
     */
    public static void main(String args[]) throws Throwable {
        // the generated parser provides constructors by default
        String sqlFilePath = args[0];
        final HelloWorldParser parser =
                new HelloWorldParser(new FileInputStream(sqlFilePath));
        try {
            parser.hello();
        } catch (Throwable t) {
            System.err.println(":1: not parsed");
            t.printStackTrace();
            return;
        }
        System.out.println("ok");
    }

    public void hello() throws ParseException {
        helloEof();
    }
}
// end class

PARSER_END(HelloWorldParser)

void helloEof() :
{}
{
    // on matching "hello world", print a message; otherwise an exception is thrown
    ( <HELLO> | "HELLO2" ) <WORLD>
    {
        System.out.println("ok to match hello world.");
    }
}

TOKEN :
{
    <HELLO: "hello">
|   <WORLD: "world">
}

SKIP :
{
    " "
|   "\t"
|   "\r"
|   "\n"
}
```
Save it as hello.jj, then run javacc to compile the file.
```
> javacc hello.jj
> javac my/*.java
> java my.HelloWorldParser input.txt
```

(The main method above reads the input file name from args[0], so pass the file to parse as the first argument.)
4. Compiler theory in JavaCC
JavaCC is a parser generator: its core job is lexical and syntactic analysis. Of course, understanding the grammar by itself is not enough; its other crucial feature is that it translates the grammar into a parser in Java (and not only Java), so that users can call the generated parser and build their business logic on top. In that sense it is a scaffold that generates template code for us.
Parsing is a very general topic, and the great scientists long ago distilled it into a body of methodology: compiler theory. Understanding that theory deeply is still very hard, so each of us gets as far as we get. A few terms, for reference:
Productions
Terminals and non-terminals; operands
Predictive parsing, left recursion, backtracking, context-free grammars
DFA, NFA, regular-expression matching, patterns, the KMP algorithm, tries
Embedded actions, declarations
LL, LR, ambiguity
Lexical analysis
Syntax analysis
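To make a couple of the terms above concrete (DFA, accepting states, terminal symbols), here is a toy sketch, not JavaCC code, of a hand-rolled DFA that recognizes the token pattern `[0-9]+`:

```java
// Toy DFA illustrating lexing terminology: states, transitions, accepting states.
// It decides whether a string is an INTEGER token ([0-9]+) or is rejected.
public class ToyDfa {
    // States: 0 = start, 1 = in-digits (the only accepting state), -1 = dead
    public static boolean isInteger(String s) {
        int state = 0;
        for (char c : s.toCharArray()) {
            boolean digit = c >= '0' && c <= '9';
            state = digit ? 1 : -1;      // both live states transition the same way here
            if (state == -1) return false; // dead state: no way back
        }
        return state == 1; // accept only if we ended in the accepting state
    }

    public static void main(String[] args) {
        System.out.println(isInteger("12345")); // true
        System.out.println(isInteger("12a45")); // false
    }
}
```

A real generated token manager is exactly this idea scaled up: one big DFA merging all TOKEN/SKIP patterns, with the accepting state recording which token kind was matched.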
One could say that JavaCC as a whole implements one small slice of compiler theory. And of course we run into compilation everywhere: the languages we use must be compiled down to assembly or machine code before they can run (javac, gcc, ...). Compiler theory is everywhere.
Here we only ask one question: how is a .jj file compiled into Java files? At a high level it simply follows compiler theory; we will just list some of the mappings.
"a" "b" -> 代表多個連續token | -> 對應if或者switch語義 (..)* -> 對應while語義 ["a"] -> 對應if語句,可匹配0-1次 (): {} -> 對應語法的產生式 {} -> 附加操作,在匹配後嵌入執行 <id> 對應常量詞或容易描述的token描述
By default, javacc generates several helper classes:
XXConstants: defines constant values, e.g. each TOKEN definition is mapped to a number;
HelloWorldParserTokenManager: the token manager, which reads tokens and can be customized;
JavaCharStream: the CharStream implementation; different classes are generated depending on the options;
ParseException: the exception thrown on a parse error;
Token: describes a single word that was read;
TokenMgrError: the error thrown when reading a token fails;
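To give a feel for the generated Token class, here is a simplified sketch of its shape (the field names follow JavaCC's standard template; exact details vary by version and options, so treat this as an approximation, not the real generated file):

```java
// Simplified sketch of JavaCC's generated Token class (abridged; the real
// template also carries version-specific helpers and factory methods).
public class Token {
    public int kind;                 // index into the XXConstants token table
    public int beginLine, beginColumn, endLine, endColumn; // source position
    public String image;             // the matched text
    public Token next;               // next regular token in the stream
    public Token specialToken;       // preceding SPECIAL_TOKEN (e.g. a comment), if any

    @Override
    public String toString() { return image; } // the matched image

    public static void main(String[] args) {
        Token t = new Token();
        t.kind = 1;
        t.image = "hello";
        System.out.println(t); // prints: hello
    }
}
```

Your embedded `{...}` actions typically read `token.image` and the position fields to build up whatever structures you need.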
For the full set of JavaCC option settings, see the official documentation: https://javacc.github.io/javacc/documentation/grammar.html#javacc-options
From a writing standpoint, mastering the basic template format plus regular expressions is essentially enough to write JavaCC grammars. If you want to use the result in actual Java code, you will need to organize your own syntax-tree structures (or whatever else you need) on top.
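If you do build your own tree on top of the matched productions, the usual approach is to have each production's embedded `{...}` action construct a node. A minimal sketch of such a node type (the names here are illustrative only, not anything JavaCC provides):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the kind of syntax-tree node you might build from
// embedded {...} actions in a .jj file. Names are illustrative only.
public class AstNode {
    public final String type;             // e.g. "SELECT", "IDENTIFIER"
    public final String text;             // matched token image, if a leaf
    public final List<AstNode> children = new ArrayList<>();

    public AstNode(String type, String text) {
        this.type = type;
        this.text = text;
    }

    public AstNode add(AstNode child) {   // returns this, so actions can chain
        children.add(child);
        return this;
    }

    /** Renders the tree as an s-expression, handy for debugging. */
    public String render() {
        if (children.isEmpty()) return text != null ? text : type;
        StringBuilder sb = new StringBuilder("(").append(type);
        for (AstNode c : children) sb.append(' ').append(c.render());
        return sb.append(')').toString();
    }
}
```

A production action would then do something like `{ return new AstNode("AND", null).add(left).add(right); }`, letting each grammar method return the subtree it recognized.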
5. A walk through the JavaCC compiler source
JavaCC itself is written in Java and is fairly readable, so let's skim through it. Its repository: https://github.com/javacc/javacc
The entry point is src/main/java/org/javacc/parser/Main.java:
```java
/**
 * A main program that exercises the parser.
 */
public static void main(String args[]) throws Exception {
    int errorcode = mainProgram(args);
    System.exit(errorcode);
}

/**
 * The method to call to exercise the parser from other Java programs.
 * It returns an error code. See how the main program above uses
 * this method.
 */
public static int mainProgram(String args[]) throws Exception {

    if (args.length == 1 && args[args.length - 1].equalsIgnoreCase("-version")) {
        System.out.println(Version.versionNumber);
        return 0;
    }

    // Initialize all static state
    reInitAll();

    JavaCCGlobals.bannerLine("Parser Generator", "");

    JavaCCParser parser = null;
    if (args.length == 0) {
        System.out.println("");
        help_message();
        return 1;
    } else {
        System.out.println("(type \"javacc\" with no arguments for help)");
    }

    if (Options.isOption(args[args.length - 1])) {
        System.out.println("Last argument \"" + args[args.length - 1] + "\" is not a filename.");
        return 1;
    }
    for (int arg = 0; arg < args.length - 1; arg++) {
        if (!Options.isOption(args[arg])) {
            System.out.println("Argument \"" + args[arg] + "\" must be an option setting.");
            return 1;
        }
        Options.setCmdLineOption(args[arg]);
    }

    try {
        java.io.File fp = new java.io.File(args[args.length - 1]);
        if (!fp.exists()) {
            System.out.println("File " + args[args.length - 1] + " not found.");
            return 1;
        }
        if (fp.isDirectory()) {
            System.out.println(args[args.length - 1] + " is a directory. Please use a valid file name.");
            return 1;
        }
        // javacc itself uses a generated parser (JavaCCParser), i.e. it is self-hosting
        parser = new JavaCCParser(
                new java.io.BufferedReader(
                        new java.io.InputStreamReader(
                                new java.io.FileInputStream(args[args.length - 1]),
                                Options.getGrammarEncoding())));
    } catch (SecurityException se) {
        System.out.println("Security violation while trying to open " + args[args.length - 1]);
        return 1;
    } catch (java.io.FileNotFoundException e) {
        System.out.println("File " + args[args.length - 1] + " not found.");
        return 1;
    }

    try {
        System.out.println("Reading from file " + args[args.length - 1] + " . . .");
        // global data is shared through static variables
        JavaCCGlobals.fileName = JavaCCGlobals.origFileName = args[args.length - 1];
        JavaCCGlobals.jjtreeGenerated = JavaCCGlobals.isGeneratedBy("JJTree", args[args.length - 1]);
        JavaCCGlobals.toolNames = JavaCCGlobals.getToolNames(args[args.length - 1]);

        // entry point of the javacc grammar parse;
        // after parsing, the various results are stored in global variables
        parser.javacc_input();

        // 2012/05/02 - Moved this here as cannot evaluate output language
        // until the cc file has been processed. Was previously setting the 'lg' variable
        // to a lexer before the configuration override in the cc file had been read.
        String outputLanguage = Options.getOutputLanguage();
        // TODO :: CBA -- Require Unification of output language specific processing into a single Enum class
        boolean isJavaOutput = Options.isOutputLanguageJava();
        boolean isCPPOutput = outputLanguage.equals(Options.OUTPUT_LANGUAGE__CPP);

        // 2013/07/22 Java Modern is a
        boolean isJavaModern = isJavaOutput
                && Options.getJavaTemplateType().equals(Options.JAVA_TEMPLATE_TYPE_MODERN);

        if (isJavaOutput) {
            lg = new LexGen();
        } else if (isCPPOutput) {
            lg = new LexGenCPP();
        } else {
            return unhandledLanguageExit(outputLanguage);
        }

        JavaCCGlobals.createOutputDir(Options.getOutputDirectory());

        if (Options.getUnicodeInput()) {
            NfaState.unicodeWarningGiven = true;
            System.out.println("Note: UNICODE_INPUT option is specified. "
                    + "Please make sure you create the parser/lexer using a Reader with the correct character encoding.");
        }

        // semantic pass: enrich the parsed data with more coherent context for later phases
        Semanticize.start();

        boolean isBuildParser = Options.getBuildParser();

        // 2012/05/02 -- This is not the best way to add-in GWT support, really the code needs to turn supported languages into enumerations
        // and have the enumerations describe the deltas between the outputs. The current approach means that per-langauge configuration is distributed
        // and small changes between targets does not benefit from inheritance.
        if (isJavaOutput) {
            if (isBuildParser) {
                // 1. generate the parser skeleton
                new ParseGen().start(isJavaModern);
            }

            // Must always create the lexer object even if not building a parser.
            // 2. generate the lexer
            new LexGen().start();

            // 3. generate the other helper classes
            Options.setStringOption(Options.NONUSER_OPTION__PARSER_NAME, JavaCCGlobals.cu_name);
            OtherFilesGen.start(isJavaModern);
        } else if (isCPPOutput) { // C++ for now
            if (isBuildParser) {
                new ParseGenCPP().start();
            }
            if (isBuildParser) {
                new LexGenCPP().start();
            }
            Options.setStringOption(Options.NONUSER_OPTION__PARSER_NAME, JavaCCGlobals.cu_name);
            OtherFilesGenCPP.start();
        } else {
            unhandledLanguageExit(outputLanguage);
        }

        // report the final compile status
        if ((JavaCCErrors.get_error_count() == 0) && (isBuildParser || Options.getBuildTokenManager())) {
            if (JavaCCErrors.get_warning_count() == 0) {
                if (isBuildParser) {
                    System.out.println("Parser generated successfully.");
                }
            } else {
                System.out.println("Parser generated with 0 errors and "
                        + JavaCCErrors.get_warning_count() + " warnings.");
            }
            return 0;
        } else {
            System.out.println("Detected " + JavaCCErrors.get_error_count() + " errors and "
                    + JavaCCErrors.get_warning_count() + " warnings.");
            return (JavaCCErrors.get_error_count() == 0) ? 0 : 1;
        }
    } catch (MetaParseException e) {
        System.out.println("Detected " + JavaCCErrors.get_error_count() + " errors and "
                + JavaCCErrors.get_warning_count() + " warnings.");
        return 1;
    } catch (ParseException e) {
        System.out.println(e.toString());
        System.out.println("Detected " + (JavaCCErrors.get_error_count() + 1) + " errors and "
                + JavaCCErrors.get_warning_count() + " warnings.");
        return 1;
    }
}
```
That is JavaCC's compile-and-run skeleton. Its own parsing still relies on its own .jj file, from which the generated JavaCCParser does the work:
1. The generated JavaCCParser calls javacc_input() to parse the grammar file;
2. The parsed grammar information is stored in global variables;
3. Semanticize strengthens the parsed data semantically, converting it into structures JavaCC can process;
4. ParseGen generates the parser skeleton;
5. LexGen generates the lexer (token manager);
6. OtherFilesGen generates the sibling helper classes.
Below we take a closer look at the implementations of these key classes; that should be enough.
5.1. The JavaCC grammar definition
As mentioned, to compile other languages JavaCC defines a grammar file of its own for the first parsing pass, which shows how universal this machinery is. A glance at the entry production is enough; the full source is at src/main/javacc/JavaCC.jj:
```java
void javacc_input() :
{
    String id1, id2;
    initialize();
}
{
    javacc_options()
    { }
    "PARSER_BEGIN" "(" id1=identifier() { addcuname(id1); } ")"
    {
        processing_cu = true;
        parser_class_name = id1;
        if (!isJavaLanguage()) {
            JavaCCGlobals.otherLanguageDeclTokenBeg = getToken(1);
            while (getToken(1).kind != _PARSER_END) {
                getNextToken();
            }
            JavaCCGlobals.otherLanguageDeclTokenEnd = getToken(1);
        }
    }
    CompilationUnit()
    {
        processing_cu = false;
    }
    "PARSER_END" "(" id2=identifier() { compare(getToken(0), id1, id2); } ")"
    ( production() )+
    <EOF>
}
...
```
As you can see, this grammar definition reads much like its documentation; it is a fairly natural-language-like way of writing a parser.
5.2. Semanticize: semantic processing
Semanticize takes the data produced by the earlier parse and converts it further into a syntax tree and other structures that are easier to work with.
```java
// org.javacc.parser.Semanticize#start
static public void start() throws MetaParseException {

  if (JavaCCErrors.get_error_count() != 0) throw new MetaParseException();

  if (Options.getLookahead() > 1 && !Options.getForceLaCheck() && Options.getSanityCheck()) {
    JavaCCErrors.warning("Lookahead adequacy checking not being performed since option LOOKAHEAD " +
            "is more than 1. Set option FORCE_LA_CHECK to true to force checking.");
  }

  /*
   * The following walks the entire parse tree to convert all LOOKAHEAD's
   * that are not at choice points (but at beginning of sequences) and converts
   * them to trivial choices. This way, their semantic lookahead specification
   * can be evaluated during other lookahead evaluations.
   */
  for (Iterator<NormalProduction> it = bnfproductions.iterator(); it.hasNext();) {
    ExpansionTreeWalker.postOrderWalk(((NormalProduction)it.next()).getExpansion(),
            new LookaheadFixer());
  }

  /*
   * The following loop populates "production_table"
   */
  for (Iterator<NormalProduction> it = bnfproductions.iterator(); it.hasNext();) {
    NormalProduction p = it.next();
    if (production_table.put(p.getLhs(), p) != null) {
      JavaCCErrors.semantic_error(p, p.getLhs() + " occurs on the left hand side of more than one production.");
    }
  }

  /*
   * The following walks the entire parse tree to make sure that all
   * non-terminals on RHS's are defined on the LHS.
   */
  for (Iterator<NormalProduction> it = bnfproductions.iterator(); it.hasNext();) {
    ExpansionTreeWalker.preOrderWalk((it.next()).getExpansion(), new ProductionDefinedChecker());
  }

  /*
   * The following loop ensures that all target lexical states are
   * defined. Also piggybacking on this loop is the detection of
   * <EOF> and <name> in token productions. After reporting an
   * error, these entries are removed. Also checked are definitions
   * on inline private regular expressions.
   * This loop works slightly differently when USER_TOKEN_MANAGER
   * is set to true. In this case, <name> occurrences are OK, while
   * regular expression specs generate a warning.
   */
  for (Iterator<TokenProduction> it = rexprlist.iterator(); it.hasNext();) {
    TokenProduction tp = (TokenProduction)(it.next());
    List<RegExprSpec> respecs = tp.respecs;
    for (Iterator<RegExprSpec> it1 = respecs.iterator(); it1.hasNext();) {
      RegExprSpec res = (RegExprSpec)(it1.next());
      if (res.nextState != null) {
        if (lexstate_S2I.get(res.nextState) == null) {
          JavaCCErrors.semantic_error(res.nsTok, "Lexical state \"" + res.nextState +
                  "\" has not been defined.");
        }
      }
      if (res.rexp instanceof REndOfFile) {
        //JavaCCErrors.semantic_error(res.rexp, "Badly placed <EOF>.");
        if (tp.lexStates != null)
          JavaCCErrors.semantic_error(res.rexp, "EOF action/state change must be specified for all states, " +
                  "i.e., <*>TOKEN:.");
        if (tp.kind != TokenProduction.TOKEN)
          JavaCCErrors.semantic_error(res.rexp, "EOF action/state change can be specified only in a " +
                  "TOKEN specification.");
        if (nextStateForEof != null || actForEof != null)
          JavaCCErrors.semantic_error(res.rexp, "Duplicate action/state change specification for <EOF>.");
        actForEof = res.act;
        nextStateForEof = res.nextState;
        prepareToRemove(respecs, res);
      } else if (tp.isExplicit && Options.getUserTokenManager()) {
        JavaCCErrors.warning(res.rexp, "Ignoring regular expression specification since " +
                "option USER_TOKEN_MANAGER has been set to true.");
      } else if (tp.isExplicit && !Options.getUserTokenManager() && res.rexp instanceof RJustName) {
        JavaCCErrors.warning(res.rexp, "Ignoring free-standing regular expression reference. " +
                "If you really want this, you must give it a different label as <NEWLABEL:<" +
                res.rexp.label + ">>.");
        prepareToRemove(respecs, res);
      } else if (!tp.isExplicit && res.rexp.private_rexp) {
        JavaCCErrors.semantic_error(res.rexp, "Private (#) regular expression cannot be defined within " +
                "grammar productions.");
      }
    }
  }

  removePreparedItems();

  /*
   * The following loop inserts all names of regular expressions into
   * "named_tokens_table" and "ordered_named_tokens".
   * Duplications are flagged as errors.
   */
  for (Iterator<TokenProduction> it = rexprlist.iterator(); it.hasNext();) {
    TokenProduction tp = (TokenProduction)(it.next());
    List<RegExprSpec> respecs = tp.respecs;
    for (Iterator<RegExprSpec> it1 = respecs.iterator(); it1.hasNext();) {
      RegExprSpec res = (RegExprSpec)(it1.next());
      if (!(res.rexp instanceof RJustName) && !res.rexp.label.equals("")) {
        String s = res.rexp.label;
        Object obj = named_tokens_table.put(s, res.rexp);
        if (obj != null) {
          JavaCCErrors.semantic_error(res.rexp, "Multiply defined lexical token name \"" + s + "\".");
        } else {
          ordered_named_tokens.add(res.rexp);
        }
        if (lexstate_S2I.get(s) != null) {
          JavaCCErrors.semantic_error(res.rexp, "Lexical token name \"" + s + "\" is the same as " +
                  "that of a lexical state.");
        }
      }
    }
  }

  /*
   * The following code merges multiple uses of the same string in the same
   * lexical state and produces error messages when there are multiple
   * explicit occurrences (outside the BNF) of the string in the same
   * lexical state, or when within BNF occurrences of a string are duplicates
   * of those that occur as non-TOKEN's (SKIP, MORE, SPECIAL_TOKEN) or private
   * regular expressions. While doing this, this code also numbers all
   * regular expressions (by setting their ordinal values), and populates the
   * table "names_of_tokens".
   */
  tokenCount = 1;
  for (Iterator<TokenProduction> it = rexprlist.iterator(); it.hasNext();) {
    TokenProduction tp = (TokenProduction)(it.next());
    List<RegExprSpec> respecs = tp.respecs;
    if (tp.lexStates == null) {
      tp.lexStates = new String[lexstate_I2S.size()];
      int i = 0;
      for (Enumeration<String> enum1 = lexstate_I2S.elements(); enum1.hasMoreElements();) {
        tp.lexStates[i++] = (String)(enum1.nextElement());
      }
    }
    Hashtable table[] = new Hashtable[tp.lexStates.length];
    for (int i = 0; i < tp.lexStates.length; i++) {
      table[i] = (Hashtable)simple_tokens_table.get(tp.lexStates[i]);
    }
    for (Iterator<RegExprSpec> it1 = respecs.iterator(); it1.hasNext();) {
      RegExprSpec res = (RegExprSpec)(it1.next());
      if (res.rexp instanceof RStringLiteral) {
        RStringLiteral sl = (RStringLiteral)res.rexp;
        // This loop performs the checks and actions with respect to each lexical state.
        for (int i = 0; i < table.length; i++) {
          // Get table of all case variants of "sl.image" into table2.
          Hashtable table2 = (Hashtable)(table[i].get(sl.image.toUpperCase()));
          if (table2 == null) {
            // There are no case variants of "sl.image" earlier than the current one.
            // So go ahead and insert this item.
            if (sl.ordinal == 0) {
              sl.ordinal = tokenCount++;
            }
            table2 = new Hashtable();
            table2.put(sl.image, sl);
            table[i].put(sl.image.toUpperCase(), table2);
          } else if (hasIgnoreCase(table2, sl.image)) {
            // hasIgnoreCase sets "other" if it is found.
            // Since IGNORE_CASE version exists, current one is useless and bad.
            if (!sl.tpContext.isExplicit) {
              // inline BNF string is used earlier with an IGNORE_CASE.
              JavaCCErrors.semantic_error(sl, "String \"" + sl.image + "\" can never be matched " +
                      "due to presence of more general (IGNORE_CASE) regular expression " +
                      "at line " + other.getLine() + ", column " + other.getColumn() + ".");
            } else {
              // give the standard error message.
              JavaCCErrors.semantic_error(sl, "Duplicate definition of string token \"" + sl.image + "\" " +
                      "can never be matched.");
            }
          } else if (sl.tpContext.ignoreCase) {
            // This has to be explicit. A warning needs to be given with respect
            // to all previous strings.
            String pos = "";
            int count = 0;
            for (Enumeration<RegularExpression> enum2 = table2.elements(); enum2.hasMoreElements();) {
              RegularExpression rexp = (RegularExpression)(enum2.nextElement());
              if (count != 0) pos += ",";
              pos += " line " + rexp.getLine();
              count++;
            }
            if (count == 1) {
              JavaCCErrors.warning(sl, "String with IGNORE_CASE is partially superseded by string at" + pos + ".");
            } else {
              JavaCCErrors.warning(sl, "String with IGNORE_CASE is partially superseded by strings at" + pos + ".");
            }
            // This entry is legitimate. So insert it.
            if (sl.ordinal == 0) {
              sl.ordinal = tokenCount++;
            }
            table2.put(sl.image, sl);
            // The above "put" may override an existing entry (that is not IGNORE_CASE) and that's
            // the desired behavior.
          } else {
            // The rest of the cases do not involve IGNORE_CASE.
            RegularExpression re = (RegularExpression)table2.get(sl.image);
            if (re == null) {
              if (sl.ordinal == 0) {
                sl.ordinal = tokenCount++;
              }
              table2.put(sl.image, sl);
            } else if (tp.isExplicit) {
              // This is an error even if the first occurrence was implicit.
              if (tp.lexStates[i].equals("DEFAULT")) {
                JavaCCErrors.semantic_error(sl, "Duplicate definition of string token \"" + sl.image + "\".");
              } else {
                JavaCCErrors.semantic_error(sl, "Duplicate definition of string token \"" + sl.image +
                        "\" in lexical state \"" + tp.lexStates[i] + "\".");
              }
            } else if (re.tpContext.kind != TokenProduction.TOKEN) {
              JavaCCErrors.semantic_error(sl, "String token \"" + sl.image + "\" has been defined as a \"" +
                      TokenProduction.kindImage[re.tpContext.kind] + "\" token.");
            } else if (re.private_rexp) {
              JavaCCErrors.semantic_error(sl, "String token \"" + sl.image +
                      "\" has been defined as a private regular expression.");
            } else {
              // This is now a legitimate reference to an existing RStringLiteral.
              // So we assign it a number and take it out of "rexprlist".
              // Therefore, if all is OK (no errors), then there will be only unequal
              // string literals in each lexical state. Note that the only way
              // this can be legal is if this is a string declared inline within the
              // BNF. Hence, it belongs to only one lexical state - namely "DEFAULT".
              sl.ordinal = re.ordinal;
              prepareToRemove(respecs, res);
            }
          }
        }
      } else if (!(res.rexp instanceof RJustName)) {
        res.rexp.ordinal = tokenCount++;
      }
      if (!(res.rexp instanceof RJustName) && !res.rexp.label.equals("")) {
        names_of_tokens.put(new Integer(res.rexp.ordinal), res.rexp.label);
      }
      if (!(res.rexp instanceof RJustName)) {
        rexps_of_tokens.put(new Integer(res.rexp.ordinal), res.rexp);
      }
    }
  }

  removePreparedItems();

  /*
   * The following code performs a tree walk on all regular expressions
   * attaching links to "RJustName"s. Error messages are given if
   * undeclared names are used, or if "RJustNames" refer to private
   * regular expressions or to regular expressions of any kind other
   * than TOKEN. In addition, this loop also removes top level
   * "RJustName"s from "rexprlist".
   * This code is not executed if Options.getUserTokenManager() is set to
   * true. Instead the following block of code is executed.
   */
  if (!Options.getUserTokenManager()) {
    FixRJustNames frjn = new FixRJustNames();
    for (Iterator<TokenProduction> it = rexprlist.iterator(); it.hasNext();) {
      TokenProduction tp = (TokenProduction)(it.next());
      List<RegExprSpec> respecs = tp.respecs;
      for (Iterator<RegExprSpec> it1 = respecs.iterator(); it1.hasNext();) {
        RegExprSpec res = (RegExprSpec)(it1.next());
        frjn.root = res.rexp;
        ExpansionTreeWalker.preOrderWalk(res.rexp, frjn);
        if (res.rexp instanceof RJustName) {
          prepareToRemove(respecs, res);
        }
      }
    }
  }

  removePreparedItems();

  /*
   * The following code is executed only if Options.getUserTokenManager() is
   * set to true. This code visits all top-level "RJustName"s (ignores
   * "RJustName"s nested within regular expressions). Since regular expressions
   * are optional in this case, "RJustName"s without corresponding regular
   * expressions are given ordinal values here. If "RJustName"s refer to
   * a named regular expression, their ordinal values are set to reflect this.
   * All but one "RJustName" node is removed from the lists by the end of
   * execution of this code.
   */
  if (Options.getUserTokenManager()) {
    for (Iterator<TokenProduction> it = rexprlist.iterator(); it.hasNext();) {
      TokenProduction tp = (TokenProduction)(it.next());
      List<RegExprSpec> respecs = tp.respecs;
      for (Iterator<RegExprSpec> it1 = respecs.iterator(); it1.hasNext();) {
        RegExprSpec res = (RegExprSpec)(it1.next());
        if (res.rexp instanceof RJustName) {
          RJustName jn = (RJustName)res.rexp;
          RegularExpression rexp = (RegularExpression)named_tokens_table.get(jn.label);
          if (rexp == null) {
            jn.ordinal = tokenCount++;
            named_tokens_table.put(jn.label, jn);
            ordered_named_tokens.add(jn);
            names_of_tokens.put(new Integer(jn.ordinal), jn.label);
          } else {
            jn.ordinal = rexp.ordinal;
            prepareToRemove(respecs, res);
          }
        }
      }
    }
  }

  removePreparedItems();

  /*
   * The following code is executed only if Options.getUserTokenManager() is
   * set to true. This loop labels any unlabeled regular expression and
   * prints a warning that it is doing so. These labels are added to
   * "ordered_named_tokens" so that they may be generated into the ...Constants
   * file.
   */
  if (Options.getUserTokenManager()) {
    for (Iterator<TokenProduction> it = rexprlist.iterator(); it.hasNext();) {
      TokenProduction tp = (TokenProduction)(it.next());
      List<RegExprSpec> respecs = tp.respecs;
      for (Iterator<RegExprSpec> it1 = respecs.iterator(); it1.hasNext();) {
        RegExprSpec res = (RegExprSpec)(it1.next());
        Integer ii = new Integer(res.rexp.ordinal);
        if (names_of_tokens.get(ii) == null) {
          JavaCCErrors.warning(res.rexp, "Unlabeled regular expression cannot be referred to by " +
                  "user generated token manager.");
        }
      }
    }
  }

  if (JavaCCErrors.get_error_count() != 0) throw new MetaParseException();

  // The following code sets the value of the "emptyPossible" field of NormalProduction
  // nodes. This field is initialized to false, and then the entire list of
  // productions is processed. This is repeated as long as at least one item
  // got updated from false to true in the pass.
  boolean emptyUpdate = true;
  while (emptyUpdate) {
    emptyUpdate = false;
    for (Iterator<NormalProduction> it = bnfproductions.iterator(); it.hasNext();) {
      NormalProduction prod = (NormalProduction)it.next();
      if (emptyExpansionExists(prod.getExpansion())) {
        if (!prod.isEmptyPossible()) {
          emptyUpdate = prod.setEmptyPossible(true);
        }
      }
    }
  }

  if (Options.getSanityCheck() && JavaCCErrors.get_error_count() == 0) {

    // The following code checks that all ZeroOrMore, ZeroOrOne, and OneOrMore nodes
    // do not contain expansions that can expand to the empty token list.
    for (Iterator<NormalProduction> it = bnfproductions.iterator(); it.hasNext();) {
      ExpansionTreeWalker.preOrderWalk(((NormalProduction)it.next()).getExpansion(), new EmptyChecker());
    }

    // The following code goes through the productions and adds pointers to other
    // productions that it can expand to without consuming any tokens. Once this is
    // done, a left-recursion check can be performed.
    for (Iterator<NormalProduction> it = bnfproductions.iterator(); it.hasNext();) {
      NormalProduction prod = it.next();
      addLeftMost(prod, prod.getExpansion());
    }

    // Now the following loop calls a recursive walk routine that searches for
    // actual left recursions. The way the algorithm is coded, once a node has
    // been determined to participate in a left recursive loop, it is not tried
    // in any other loop.
    for (Iterator<NormalProduction> it = bnfproductions.iterator(); it.hasNext();) {
      NormalProduction prod = it.next();
      if (prod.getWalkStatus() == 0) {
        prodWalk(prod);
      }
    }

    // Now we do a similar, but much simpler walk for the regular expression part of
    // the grammar. Here we are looking for any kind of loop, not just left recursions,
    // so we only need to do the equivalent of the above walk.
    // This is not done if option USER_TOKEN_MANAGER is set to true.
    if (!Options.getUserTokenManager()) {
      for (Iterator<TokenProduction> it = rexprlist.iterator(); it.hasNext();) {
        TokenProduction tp = (TokenProduction)(it.next());
        List<RegExprSpec> respecs = tp.respecs;
        for (Iterator<RegExprSpec> it1 = respecs.iterator(); it1.hasNext();) {
          RegExprSpec res = (RegExprSpec)(it1.next());
          RegularExpression rexp = res.rexp;
          if (rexp.walkStatus == 0) {
            rexp.walkStatus = -1;
            if (rexpWalk(rexp)) {
              loopString = "..." + rexp.label + "... --> " + loopString;
              JavaCCErrors.semantic_error(rexp, "Loop in regular expression detected: \"" + loopString + "\"");
            }
            rexp.walkStatus = 1;
          }
        }
      }
    }

    /*
     * The following code performs the lookahead ambiguity checking.
     */
    if (JavaCCErrors.get_error_count() == 0) {
      for (Iterator<NormalProduction> it = bnfproductions.iterator(); it.hasNext();) {
        ExpansionTreeWalker.preOrderWalk((it.next()).getExpansion(), new LookaheadChecker());
      }
    }

  } // matches "if (Options.getSanityCheck()) {"

  if (JavaCCErrors.get_error_count() != 0) throw new MetaParseException();
}

// org.javacc.parser.ExpansionTreeWalker#postOrderWalk
// post-order traversal of the nodes, analogous to the pre-order version
/**
 * Visits the nodes of the tree rooted at "node" in post-order.
 * i.e., it visits the children first and then executes
 * opObj.action.
 */
static void postOrderWalk(Expansion node, TreeWalkerOp opObj) {
  if (opObj.goDeeper(node)) {
    if (node instanceof Choice) {
      for (Iterator it = ((Choice)node).getChoices().iterator(); it.hasNext();) {
        postOrderWalk((Expansion)it.next(), opObj);
      }
    } else if (node instanceof Sequence) {
      for (Iterator it = ((Sequence)node).units.iterator(); it.hasNext();) {
        postOrderWalk((Expansion)it.next(), opObj);
      }
    } else if (node instanceof OneOrMore) {
      postOrderWalk(((OneOrMore)node).expansion, opObj);
    } else if (node instanceof ZeroOrMore) {
      postOrderWalk(((ZeroOrMore)node).expansion, opObj);
    } else if (node instanceof ZeroOrOne) {
      postOrderWalk(((ZeroOrOne)node).expansion, opObj);
    } else if (node instanceof Lookahead) {
      Expansion nested_e = ((Lookahead)node).getLaExpansion();
      if (!(nested_e instanceof Sequence && (Expansion)(((Sequence)nested_e).units.get(0)) == node)) {
        postOrderWalk(nested_e, opObj);
      }
    } else if (node instanceof TryBlock) {
      postOrderWalk(((TryBlock)node).exp, opObj);
    } else if (node instanceof RChoice) {
      for (Iterator it = ((RChoice)node).getChoices().iterator(); it.hasNext();) {
        postOrderWalk((Expansion)it.next(), opObj);
      }
    } else if (node instanceof RSequence) {
      for (Iterator it = ((RSequence)node).units.iterator(); it.hasNext();) {
        postOrderWalk((Expansion)it.next(), opObj);
      }
    } else if (node instanceof ROneOrMore) {
      postOrderWalk(((ROneOrMore)node).regexpr, opObj);
    } else if (node instanceof RZeroOrMore) {
      postOrderWalk(((RZeroOrMore)node).regexpr, opObj);
    } else if (node instanceof RZeroOrOne) {
      postOrderWalk(((RZeroOrOne)node).regexpr, opObj);
    } else if (node instanceof RRepetitionRange) {
      postOrderWalk(((RRepetitionRange)node).regexpr, opObj);
    }
  }
  opObj.action(node);
}
```
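The postOrderWalk above is a standard children-first traversal, just spelled out over JavaCC's many node types. A stripped-down illustration of the same pattern, with toy node types rather than JavaCC's:

```java
import java.util.ArrayList;
import java.util.List;

// Stripped-down illustration of the children-first (post-order) walk used by
// ExpansionTreeWalker: recurse into all children, then run the action on the node.
public class PostOrderDemo {
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name, Node... kids) {
            this.name = name;
            for (Node k : kids) children.add(k);
        }
    }

    interface Action { void apply(Node n); }

    static void postOrderWalk(Node node, Action action) {
        for (Node child : node.children) postOrderWalk(child, action);
        action.apply(node); // the parent is visited only after all of its children
    }

    /** Collects the names in visit order, for demonstration. */
    public static String order(Node root) {
        StringBuilder sb = new StringBuilder();
        postOrderWalk(root, n -> sb.append(n.name));
        return sb.toString();
    }

    public static void main(String[] args) {
        Node root = new Node("c", new Node("a"), new Node("b"));
        System.out.println(order(root)); // prints: abc
    }
}
```

Post-order is the natural choice for passes like LookaheadFixer, since a node's rewrite may depend on its children having already been processed.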
5.3. ParseGen: generating the parser skeleton
ParseGen emits the headers, writes in the java compilation unit, and so on.
```java
// org.javacc.parser.ParseGen#start
public void start(boolean isJavaModernMode) throws MetaParseException {

  Token t = null;

  if (JavaCCErrors.get_error_count() != 0) {
    throw new MetaParseException();
  }

  if (Options.getBuildParser()) {
    final List<String> tn = new ArrayList<String>(toolNames);
    tn.add(toolName);

    // This is the first line generated -- the comment line at the top of the generated parser
    genCodeLine("/* " + getIdString(tn, cu_name + ".java") + " */");

    boolean implementsExists = false;
    final boolean extendsExists = false;

    if (cu_to_insertion_point_1.size() != 0) {
      Object firstToken = cu_to_insertion_point_1.get(0);
      printTokenSetup((Token) firstToken);
      ccol = 1;
      for (final Iterator<Token> it = cu_to_insertion_point_1.iterator(); it.hasNext();) {
        t = it.next();
        if (t.kind == IMPLEMENTS) {
          implementsExists = true;
        } else if (t.kind == CLASS) {
          implementsExists = false;
        }
        printToken(t);
      }
    }

    if (implementsExists) {
      genCode(", ");
    } else {
      genCode(" implements ");
    }
    genCode(cu_name + "Constants ");
    if (cu_to_insertion_point_2.size() != 0) {
      printTokenSetup((Token) (cu_to_insertion_point_2.get(0)));
      for (final Iterator<Token> it = cu_to_insertion_point_2.iterator(); it.hasNext();) {
        printToken(it.next());
      }
    }

    genCodeLine("");
    genCodeLine("");

    new ParseEngine().build(this);

    if (Options.getStatic()) {
      genCodeLine("  static private " + Options.getBooleanType() + " jj_initialized_once = false;");
    }
    if (Options.getUserTokenManager()) {
      genCodeLine("  /** User defined Token Manager. */");
      genCodeLine("  " + staticOpt() + "public TokenManager token_source;");
    } else {
      genCodeLine("  /** Generated Token Manager. */");
      genCodeLine("  " + staticOpt() + "public " + cu_name + "TokenManager token_source;");
      if (!Options.getUserCharStream()) {
        if (Options.getJavaUnicodeEscape()) {
          genCodeLine("  " + staticOpt() + "JavaCharStream jj_input_stream;");
        } else {
          genCodeLine("  " + staticOpt() + "SimpleCharStream jj_input_stream;");
        }
      }
    }
    genCodeLine("  /** Current token. */");
    genCodeLine("  " + staticOpt() + "public Token token;");
    genCodeLine("  /** Next token. */");
    genCodeLine("  " + staticOpt() + "public Token jj_nt;");
    if (!Options.getCacheTokens()) {
      genCodeLine("  " + staticOpt() + "private int jj_ntk;");
    }
    if (Options.getDepthLimit() > 0) {
      genCodeLine("  " + staticOpt() + "private int jj_depth;");
    }
    if (jj2index != 0) {
      genCodeLine("  " + staticOpt() + "private Token jj_scanpos, jj_lastpos;");
      genCodeLine("  " + staticOpt() + "private int jj_la;");
      if (lookaheadNeeded) {
        genCodeLine("  /** Whether we are looking ahead. */");
        genCodeLine("  " + staticOpt() + "private " + Options.getBooleanType() + " jj_lookingAhead = false;");
        genCodeLine("  " + staticOpt() + "private " + Options.getBooleanType() + " jj_semLA;");
      }
    }
    if (Options.getErrorReporting()) {
      genCodeLine("  " + staticOpt() + "private int jj_gen;");
      genCodeLine("  " + staticOpt() + "final private int[] jj_la1 = new int[" + maskindex + "];");
      final int tokenMaskSize = (tokenCount - 1) / 32 + 1;
      for (int i = 0; i < tokenMaskSize; i++) {
        genCodeLine("  static private int[] jj_la1_" + i + ";");
      }
      genCodeLine("  static {");
      for (int i = 0; i < tokenMaskSize; i++) {
        genCodeLine("    jj_la1_init_" + i + "();");
      }
      genCodeLine("  }");
      for (int i = 0; i < tokenMaskSize; i++) {
        genCodeLine("  private static void jj_la1_init_" + i + "() {");
        genCode("    jj_la1_" + i + " = new int[] {");
        for (final Iterator it = maskVals.iterator(); it.hasNext();) {
          final int[] tokenMask = (int[]) (it.next());
          genCode("0x" + Integer.toHexString(tokenMask[i]) + ",");
        }
        genCodeLine("};");
        genCodeLine("  }");
      }
    }
    if (jj2index != 0 && Options.getErrorReporting()) {
      genCodeLine("  " + staticOpt() + "final private JJCalls[] jj_2_rtns = new JJCalls[" + jj2index + "];");
      genCodeLine("  " + staticOpt() + "private " + Options.getBooleanType() + " jj_rescan = false;");
      genCodeLine("  " + staticOpt() + "private int jj_gc = 0;");
    }
    genCodeLine("");

    if (Options.getDebugParser()) {
      genCodeLine("  {");
      genCodeLine("    enable_tracing();");
      genCodeLine("  }");
    }

    if (!Options.getUserTokenManager()) {
      if (Options.getUserCharStream()) {
        genCodeLine("  /** Constructor with user supplied CharStream. */");
        genCodeLine("  public " + cu_name + "(CharStream stream) {");
        if (Options.getStatic()) {
          genCodeLine("    if (jj_initialized_once) {");
          genCodeLine("      System.out.println(\"ERROR: Second call to constructor of static parser. \");");
          genCodeLine("      System.out.println(\"       You must either use ReInit() " +
                  "or set the JavaCC option STATIC to false\");");
          genCodeLine("      System.out.println(\"       during parser generation.\");");
          genCodeLine("      throw new " + (Options.isLegacyExceptionHandling() ? "Error" : "RuntimeException") + "();");
          genCodeLine("    }");
          genCodeLine("    jj_initialized_once = true;");
        }
        if (Options.getTokenManagerUsesParser()) {
          genCodeLine("    token_source = new " + cu_name + "TokenManager(this, stream);");
        } else {
          genCodeLine("    token_source = new " + cu_name + "TokenManager(stream);");
        }
        genCodeLine("    token = new Token();");
        if (Options.getCacheTokens()) {
          genCodeLine("    token.next = jj_nt = token_source.getNextToken();");
        } else {
          genCodeLine("    jj_ntk = -1;");
        }
        if (Options.getDepthLimit() > 0) {
          genCodeLine("    jj_depth = -1;");
        }
        if (Options.getErrorReporting()) {
          genCodeLine("    jj_gen = 0;");
          if (maskindex > 0) {
            genCodeLine("    for (int i = 0; i < " + maskindex + "; i++) jj_la1[i] = -1;");
          }
          if (jj2index != 0) {
            genCodeLine("    for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();");
          }
        }
        genCodeLine("  }");
        genCodeLine("");
        genCodeLine("  /** Reinitialise. */");
        genCodeLine("  " + staticOpt() + "public void ReInit(CharStream stream) {");
        if (Options.isTokenManagerRequiresParserAccess()) {
          genCodeLine("    token_source.ReInit(this,stream);");
        } else {
          genCodeLine("    token_source.ReInit(stream);");
        }
        genCodeLine("    token = new Token();");
        if (Options.getCacheTokens()) {
          genCodeLine("    token.next = jj_nt = token_source.getNextToken();");
        } else {
          genCodeLine("    jj_ntk = -1;");
        }
        if (Options.getDepthLimit() > 0) {
          genCodeLine("    jj_depth = -1;");
        }
        if (lookaheadNeeded) {
          genCodeLine("    jj_lookingAhead = false;");
        }
        if (jjtreeGenerated) {
          genCodeLine("    jjtree.reset();");
        }
        if (Options.getErrorReporting()) {
          genCodeLine("    jj_gen = 0;");
          if (maskindex > 0) {
            genCodeLine("    for (int i = 0; i < " + maskindex + "; i++) jj_la1[i] = -1;");
          }
          if (jj2index != 0) {
            genCodeLine("    for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();");
          }
        }
        genCodeLine("  }");
      } else {
        if (!isJavaModernMode) {
          genCodeLine("  /** Constructor with InputStream. */");
          genCodeLine("  public " + cu_name + "(java.io.InputStream stream) {");
          genCodeLine("     this(stream, null);");
          genCodeLine("  }");
          genCodeLine("  /** Constructor with InputStream and supplied encoding */");
          genCodeLine("  public " + cu_name + "(java.io.InputStream stream, String encoding) {");
          if (Options.getStatic()) {
            genCodeLine("    if (jj_initialized_once) {");
            genCodeLine("      System.out.println(\"ERROR: Second call to constructor of static parser. \");");
            genCodeLine("      System.out.println(\"       You must either use ReInit() or " +
                    "set the JavaCC option STATIC to false\");");
            genCodeLine("      System.out.println(\"       during parser generation.\");");
            genCodeLine("      throw new " + (Options.isLegacyExceptionHandling() ?
```
"Error" : "RuntimeException")+"();"); genCodeLine(" }"); genCodeLine(" jj_initialized_once = true;"); } if (Options.getJavaUnicodeEscape()) { if (!Options.getGenerateChainedException()) { genCodeLine(" try { jj_input_stream = new JavaCharStream(stream, encoding, 1, 1); } " + "catch(java.io.UnsupportedEncodingException e) {" + " throw new RuntimeException(e.getMessage()); }"); } else { genCodeLine(" try { jj_input_stream = new JavaCharStream(stream, encoding, 1, 1); } " + "catch(java.io.UnsupportedEncodingException e) { throw new RuntimeException(e); }"); } } else { if (!Options.getGenerateChainedException()) { genCodeLine(" try { jj_input_stream = new SimpleCharStream(stream, encoding, 1, 1); } " + "catch(java.io.UnsupportedEncodingException e) { " + "throw new RuntimeException(e.getMessage()); }"); } else { genCodeLine(" try { jj_input_stream = new SimpleCharStream(stream, encoding, 1, 1); } " + "catch(java.io.UnsupportedEncodingException e) { throw new RuntimeException(e); }"); } } if (Options.getTokenManagerUsesParser() && !Options.getStatic()) { genCodeLine(" token_source = new " + cu_name + "TokenManager(this, jj_input_stream);"); } else { genCodeLine(" token_source = new " + cu_name + "TokenManager(jj_input_stream);"); } genCodeLine(" token = new Token();"); if (Options.getCacheTokens()) { genCodeLine(" token.next = jj_nt = token_source.getNextToken();"); } else { genCodeLine(" jj_ntk = -1;"); } if (Options.getDepthLimit() > 0) { genCodeLine(" jj_depth = -1;"); } if (Options.getErrorReporting()) { genCodeLine(" jj_gen = 0;"); if (maskindex > 0) { genCodeLine(" for (int i = 0; i < " + maskindex + "; i++) jj_la1[i] = -1;"); } if (jj2index != 0) { genCodeLine(" for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();"); } } genCodeLine(" }"); genCodeLine(""); genCodeLine(" /** Reinitialise. 
*/"); genCodeLine(" " + staticOpt() + "public void ReInit(java.io.InputStream stream) {"); genCodeLine(" ReInit(stream, null);"); genCodeLine(" }"); genCodeLine(" /** Reinitialise. */"); genCodeLine(" " + staticOpt() + "public void ReInit(java.io.InputStream stream, String encoding) {"); if (!Options.getGenerateChainedException()) { genCodeLine(" try { jj_input_stream.ReInit(stream, encoding, 1, 1); } " + "catch(java.io.UnsupportedEncodingException e) { " + "throw new RuntimeException(e.getMessage()); }"); } else { genCodeLine(" try { jj_input_stream.ReInit(stream, encoding, 1, 1); } " + "catch(java.io.UnsupportedEncodingException e) { throw new RuntimeException(e); }"); } if (Options.isTokenManagerRequiresParserAccess()) { genCodeLine(" token_source.ReInit(this,jj_input_stream);"); } else { genCodeLine(" token_source.ReInit(jj_input_stream);"); } genCodeLine(" token = new Token();"); if (Options.getCacheTokens()) { genCodeLine(" token.next = jj_nt = token_source.getNextToken();"); } else { genCodeLine(" jj_ntk = -1;"); } if (Options.getDepthLimit() > 0) { genCodeLine(" jj_depth = -1;"); } if (jjtreeGenerated) { genCodeLine(" jjtree.reset();"); } if (Options.getErrorReporting()) { genCodeLine(" jj_gen = 0;"); genCodeLine(" for (int i = 0; i < " + maskindex + "; i++) jj_la1[i] = -1;"); if (jj2index != 0) { genCodeLine(" for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();"); } } genCodeLine(" }"); genCodeLine(""); } final String readerInterfaceName = isJavaModernMode ? "Provider" : "java.io.Reader"; final String stringReaderClass = isJavaModernMode ? "StringProvider" : "java.io.StringReader"; genCodeLine(" /** Constructor. */"); genCodeLine(" public " + cu_name + "(" + readerInterfaceName + " stream) {"); if (Options.getStatic()) { genCodeLine(" if (jj_initialized_once) {"); genCodeLine(" System.out.println(\"ERROR: Second call to constructor of static parser. 
\");"); genCodeLine(" System.out.println(\" You must either use ReInit() or " + "set the JavaCC option STATIC to false\");"); genCodeLine(" System.out.println(\" during parser generation.\");"); genCodeLine(" throw new "+(Options.isLegacyExceptionHandling() ? "Error" : "RuntimeException")+"();"); genCodeLine(" }"); genCodeLine(" jj_initialized_once = true;"); } if (Options.getJavaUnicodeEscape()) { genCodeLine(" jj_input_stream = new JavaCharStream(stream, 1, 1);"); } else { genCodeLine(" jj_input_stream = new SimpleCharStream(stream, 1, 1);"); } if (Options.getTokenManagerUsesParser() && !Options.getStatic()) { genCodeLine(" token_source = new " + cu_name + "TokenManager(this, jj_input_stream);"); } else { genCodeLine(" token_source = new " + cu_name + "TokenManager(jj_input_stream);"); } genCodeLine(" token = new Token();"); if (Options.getCacheTokens()) { genCodeLine(" token.next = jj_nt = token_source.getNextToken();"); } else { genCodeLine(" jj_ntk = -1;"); } if (Options.getDepthLimit() > 0) { genCodeLine(" jj_depth = -1;"); } if (Options.getErrorReporting()) { genCodeLine(" jj_gen = 0;"); if (maskindex > 0) { genCodeLine(" for (int i = 0; i < " + maskindex + "; i++) jj_la1[i] = -1;"); } if (jj2index != 0) { genCodeLine(" for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();"); } } genCodeLine(" }"); genCodeLine(""); // Add-in a string based constructor because its convenient (modern only to prevent regressions) if (isJavaModernMode) { genCodeLine(" /** Constructor. */"); genCodeLine(" public " + cu_name + "(String dsl) throws ParseException, "+Options.getTokenMgrErrorClass() +" {"); genCodeLine(" this(new " + stringReaderClass + "(dsl));"); genCodeLine(" }"); genCodeLine(""); genCodeLine(" public void ReInit(String s) {"); genCodeLine(" ReInit(new " + stringReaderClass + "(s));"); genCodeLine(" }"); } genCodeLine(" /** Reinitialise. 
*/"); genCodeLine(" " + staticOpt() + "public void ReInit(" + readerInterfaceName + " stream) {"); if (Options.getJavaUnicodeEscape()) { genCodeLine(" if (jj_input_stream == null) {"); genCodeLine(" jj_input_stream = new JavaCharStream(stream, 1, 1);"); genCodeLine(" } else {"); genCodeLine(" jj_input_stream.ReInit(stream, 1, 1);"); genCodeLine(" }"); } else { genCodeLine(" if (jj_input_stream == null) {"); genCodeLine(" jj_input_stream = new SimpleCharStream(stream, 1, 1);"); genCodeLine(" } else {"); genCodeLine(" jj_input_stream.ReInit(stream, 1, 1);"); genCodeLine(" }"); } genCodeLine(" if (token_source == null) {"); if (Options.getTokenManagerUsesParser() && !Options.getStatic()) { genCodeLine(" token_source = new " + cu_name + "TokenManager(this, jj_input_stream);"); } else { genCodeLine(" token_source = new " + cu_name + "TokenManager(jj_input_stream);"); } genCodeLine(" }"); genCodeLine(""); if (Options.isTokenManagerRequiresParserAccess()) { genCodeLine(" token_source.ReInit(this,jj_input_stream);"); } else { genCodeLine(" token_source.ReInit(jj_input_stream);"); } genCodeLine(" token = new Token();"); if (Options.getCacheTokens()) { genCodeLine(" token.next = jj_nt = token_source.getNextToken();"); } else { genCodeLine(" jj_ntk = -1;"); } if (Options.getDepthLimit() > 0) { genCodeLine(" jj_depth = -1;"); } if (jjtreeGenerated) { genCodeLine(" jjtree.reset();"); } if (Options.getErrorReporting()) { genCodeLine(" jj_gen = 0;"); if (maskindex > 0) { genCodeLine(" for (int i = 0; i < " + maskindex + "; i++) jj_la1[i] = -1;"); } if (jj2index != 0) { genCodeLine(" for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();"); } } genCodeLine(" }"); } } genCodeLine(""); if (Options.getUserTokenManager()) { genCodeLine(" /** Constructor with user supplied Token Manager. */"); genCodeLine(" public " + cu_name + "(TokenManager tm) {"); } else { genCodeLine(" /** Constructor with generated Token Manager. 
*/"); genCodeLine(" public " + cu_name + "(" + cu_name + "TokenManager tm) {"); } if (Options.getStatic()) { genCodeLine(" if (jj_initialized_once) {"); genCodeLine(" System.out.println(\"ERROR: Second call to constructor of static parser. \");"); genCodeLine(" System.out.println(\" You must either use ReInit() or " + "set the JavaCC option STATIC to false\");"); genCodeLine(" System.out.println(\" during parser generation.\");"); genCodeLine(" throw new "+(Options.isLegacyExceptionHandling() ? "Error" : "RuntimeException")+"();"); genCodeLine(" }"); genCodeLine(" jj_initialized_once = true;"); } genCodeLine(" token_source = tm;"); genCodeLine(" token = new Token();"); if (Options.getCacheTokens()) { genCodeLine(" token.next = jj_nt = token_source.getNextToken();"); } else { genCodeLine(" jj_ntk = -1;"); } if (Options.getDepthLimit() > 0) { genCodeLine(" jj_depth = -1;"); } if (Options.getErrorReporting()) { genCodeLine(" jj_gen = 0;"); if (maskindex > 0) { genCodeLine(" for (int i = 0; i < " + maskindex + "; i++) jj_la1[i] = -1;"); } if (jj2index != 0) { genCodeLine(" for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();"); } } genCodeLine(" }"); genCodeLine(""); if (Options.getUserTokenManager()) { genCodeLine(" /** Reinitialise. */"); genCodeLine(" public void ReInit(TokenManager tm) {"); } else { genCodeLine(" /** Reinitialise. 
*/"); genCodeLine(" public void ReInit(" + cu_name + "TokenManager tm) {"); } genCodeLine(" token_source = tm;"); genCodeLine(" token = new Token();"); if (Options.getCacheTokens()) { genCodeLine(" token.next = jj_nt = token_source.getNextToken();"); } else { genCodeLine(" jj_ntk = -1;"); } if (Options.getDepthLimit() > 0) { genCodeLine(" jj_depth = -1;"); } if (jjtreeGenerated) { genCodeLine(" jjtree.reset();"); } if (Options.getErrorReporting()) { genCodeLine(" jj_gen = 0;"); if (maskindex > 0) { genCodeLine(" for (int i = 0; i < " + maskindex + "; i++) jj_la1[i] = -1;"); } if (jj2index != 0) { genCodeLine(" for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();"); } } genCodeLine(" }"); genCodeLine(""); genCodeLine(" " + staticOpt() + "private Token jj_consume_token(int kind) throws ParseException {"); if (Options.getCacheTokens()) { genCodeLine(" Token oldToken = token;"); genCodeLine(" if ((token = jj_nt).next != null) jj_nt = jj_nt.next;"); genCodeLine(" else jj_nt = jj_nt.next = token_source.getNextToken();"); } else { genCodeLine(" Token oldToken;"); genCodeLine(" if ((oldToken = token).next != null) token = token.next;"); genCodeLine(" else token = token.next = token_source.getNextToken();"); genCodeLine(" jj_ntk = -1;"); } genCodeLine(" if (token.kind == kind) {"); if (Options.getErrorReporting()) { genCodeLine(" jj_gen++;"); if (jj2index != 0) { genCodeLine(" if (++jj_gc > 100) {"); genCodeLine(" jj_gc = 0;"); genCodeLine(" for (int i = 0; i < jj_2_rtns.length; i++) {"); genCodeLine(" JJCalls c = jj_2_rtns[i];"); genCodeLine(" while (c != null) {"); genCodeLine(" if (c.gen < jj_gen) c.first = null;"); genCodeLine(" c = c.next;"); genCodeLine(" }"); genCodeLine(" }"); genCodeLine(" }"); } } if (Options.getDebugParser()) { genCodeLine(" trace_token(token, \"\");"); } genCodeLine(" return token;"); genCodeLine(" }"); if (Options.getCacheTokens()) { genCodeLine(" jj_nt = token;"); } genCodeLine(" token = oldToken;"); if 
(Options.getErrorReporting()) { genCodeLine(" jj_kind = kind;"); } genCodeLine(" throw generateParseException();"); genCodeLine(" }"); genCodeLine(""); if (jj2index != 0) { genCodeLine(" @SuppressWarnings(\"serial\")"); genCodeLine(" static private final class LookaheadSuccess extends "+(Options.isLegacyExceptionHandling() ? "java.lang.Error" : "java.lang.RuntimeException")+" {"); genCodeLine(" @Override"); genCodeLine(" public Throwable fillInStackTrace() {"); genCodeLine(" return this;"); genCodeLine(" }"); genCodeLine(" }"); genCodeLine(" static private final LookaheadSuccess jj_ls = new LookaheadSuccess();"); genCodeLine(" " + staticOpt() + "private " + Options.getBooleanType() + " jj_scan_token(int kind) {"); genCodeLine(" if (jj_scanpos == jj_lastpos) {"); genCodeLine(" jj_la--;"); genCodeLine(" if (jj_scanpos.next == null) {"); genCodeLine(" jj_lastpos = jj_scanpos = jj_scanpos.next = token_source.getNextToken();"); genCodeLine(" } else {"); genCodeLine(" jj_lastpos = jj_scanpos = jj_scanpos.next;"); genCodeLine(" }"); genCodeLine(" } else {"); genCodeLine(" jj_scanpos = jj_scanpos.next;"); genCodeLine(" }"); if (Options.getErrorReporting()) { genCodeLine(" if (jj_rescan) {"); genCodeLine(" int i = 0; Token tok = token;"); genCodeLine(" while (tok != null && tok != jj_scanpos) { i++; tok = tok.next; }"); genCodeLine(" if (tok != null) jj_add_error_token(kind, i);"); if (Options.getDebugLookahead()) { genCodeLine(" } else {"); genCodeLine(" trace_scan(jj_scanpos, kind);"); } genCodeLine(" }"); } else if (Options.getDebugLookahead()) { genCodeLine(" trace_scan(jj_scanpos, kind);"); } genCodeLine(" if (jj_scanpos.kind != kind) return true;"); genCodeLine(" if (jj_la == 0 && jj_scanpos == jj_lastpos) throw jj_ls;"); genCodeLine(" return false;"); genCodeLine(" }"); genCodeLine(""); } genCodeLine(""); genCodeLine("/** Get the next Token. 
*/"); genCodeLine(" " + staticOpt() + "final public Token getNextToken() {"); if (Options.getCacheTokens()) { genCodeLine(" if ((token = jj_nt).next != null) jj_nt = jj_nt.next;"); genCodeLine(" else jj_nt = jj_nt.next = token_source.getNextToken();"); } else { genCodeLine(" if (token.next != null) token = token.next;"); genCodeLine(" else token = token.next = token_source.getNextToken();"); genCodeLine(" jj_ntk = -1;"); } if (Options.getErrorReporting()) { genCodeLine(" jj_gen++;"); } if (Options.getDebugParser()) { genCodeLine(" trace_token(token, \" (in getNextToken)\");"); } genCodeLine(" return token;"); genCodeLine(" }"); genCodeLine(""); genCodeLine("/** Get the specific Token. */"); genCodeLine(" " + staticOpt() + "final public Token getToken(int index) {"); if (lookaheadNeeded) { genCodeLine(" Token t = jj_lookingAhead ? jj_scanpos : token;"); } else { genCodeLine(" Token t = token;"); } genCodeLine(" for (int i = 0; i < index; i++) {"); genCodeLine(" if (t.next != null) t = t.next;"); genCodeLine(" else t = t.next = token_source.getNextToken();"); genCodeLine(" }"); genCodeLine(" return t;"); genCodeLine(" }"); genCodeLine(""); if (!Options.getCacheTokens()) { genCodeLine(" " + staticOpt() + "private int jj_ntk_f() {"); genCodeLine(" if ((jj_nt=token.next) == null)"); genCodeLine(" return (jj_ntk = (token.next=token_source.getNextToken()).kind);"); genCodeLine(" else"); genCodeLine(" return (jj_ntk = jj_nt.kind);"); genCodeLine(" }"); genCodeLine(""); } if (Options.getErrorReporting()) { if (!Options.getGenerateGenerics()) { genCodeLine(" " + staticOpt() + "private java.util.List jj_expentries = new java.util.ArrayList();"); } else { genCodeLine(" " + staticOpt() + "private java.util.List<int[]> jj_expentries = new java.util.ArrayList<int[]>();"); } genCodeLine(" " + staticOpt() + "private int[] jj_expentry;"); genCodeLine(" " + staticOpt() + "private int jj_kind = -1;"); if (jj2index != 0) { genCodeLine(" " + staticOpt() + "private int[] jj_lasttokens = 
new int[100];"); genCodeLine(" " + staticOpt() + "private int jj_endpos;"); genCodeLine(""); genCodeLine(" " + staticOpt() + "private void jj_add_error_token(int kind, int pos) {"); genCodeLine(" if (pos >= 100) {"); genCodeLine(" return;"); genCodeLine(" }"); genCodeLine(""); genCodeLine(" if (pos == jj_endpos + 1) {"); genCodeLine(" jj_lasttokens[jj_endpos++] = kind;"); genCodeLine(" } else if (jj_endpos != 0) {"); genCodeLine(" jj_expentry = new int[jj_endpos];"); genCodeLine(""); genCodeLine(" for (int i = 0; i < jj_endpos; i++) {"); genCodeLine(" jj_expentry[i] = jj_lasttokens[i];"); genCodeLine(" }"); genCodeLine(""); if (!Options.getGenerateGenerics()) { genCodeLine(" for (java.util.Iterator it = jj_expentries.iterator(); it.hasNext();) {"); genCodeLine(" int[] oldentry = (int[])(it.next());"); } else { genCodeLine(" for (int[] oldentry : jj_expentries) {"); } genCodeLine(" if (oldentry.length == jj_expentry.length) {"); genCodeLine(" boolean isMatched = true;"); genCodeLine(""); genCodeLine(" for (int i = 0; i < jj_expentry.length; i++) {"); genCodeLine(" if (oldentry[i] != jj_expentry[i]) {"); genCodeLine(" isMatched = false;"); genCodeLine(" break;"); genCodeLine(" }"); genCodeLine(""); genCodeLine(" }"); genCodeLine(" if (isMatched) {"); genCodeLine(" jj_expentries.add(jj_expentry);"); genCodeLine(" break;"); genCodeLine(" }"); genCodeLine(" }"); genCodeLine(" }"); genCodeLine(""); genCodeLine(" if (pos != 0) {"); genCodeLine(" jj_lasttokens[(jj_endpos = pos) - 1] = kind;"); genCodeLine(" }"); genCodeLine(" }"); genCodeLine(" }"); } genCodeLine(""); genCodeLine(" /** Generate ParseException. 
*/"); genCodeLine(" " + staticOpt() + "public ParseException generateParseException() {"); genCodeLine(" jj_expentries.clear();"); genCodeLine(" " + Options.getBooleanType() + "[] la1tokens = new " + Options.getBooleanType() + "[" + tokenCount + "];"); genCodeLine(" if (jj_kind >= 0) {"); genCodeLine(" la1tokens[jj_kind] = true;"); genCodeLine(" jj_kind = -1;"); genCodeLine(" }"); genCodeLine(" for (int i = 0; i < " + maskindex + "; i++) {"); genCodeLine(" if (jj_la1[i] == jj_gen) {"); genCodeLine(" for (int j = 0; j < 32; j++) {"); for (int i = 0; i < (tokenCount - 1) / 32 + 1; i++) { genCodeLine(" if ((jj_la1_" + i + "[i] & (1<<j)) != 0) {"); genCode(" la1tokens["); if (i != 0) { genCode((32 * i) + "+"); } genCodeLine("j] = true;"); genCodeLine(" }"); } genCodeLine(" }"); genCodeLine(" }"); genCodeLine(" }"); genCodeLine(" for (int i = 0; i < " + tokenCount + "; i++) {"); genCodeLine(" if (la1tokens[i]) {"); genCodeLine(" jj_expentry = new int[1];"); genCodeLine(" jj_expentry[0] = i;"); genCodeLine(" jj_expentries.add(jj_expentry);"); genCodeLine(" }"); genCodeLine(" }"); if (jj2index != 0) { genCodeLine(" jj_endpos = 0;"); genCodeLine(" jj_rescan_token();"); genCodeLine(" jj_add_error_token(0, 0);"); } genCodeLine(" int[][] exptokseq = new int[jj_expentries.size()][];"); genCodeLine(" for (int i = 0; i < jj_expentries.size(); i++) {"); if (!Options.getGenerateGenerics()) { genCodeLine(" exptokseq[i] = (int[])jj_expentries.get(i);"); } else { genCodeLine(" exptokseq[i] = jj_expentries.get(i);"); } genCodeLine(" }"); if (isJavaModernMode) { // Add the lexical state onto the exception message genCodeLine(" return new ParseException(token, exptokseq, tokenImage, token_source == null ? null : " +cu_name+ "TokenManager.lexStateNames[token_source.curLexState]);"); } else { genCodeLine(" return new ParseException(token, exptokseq, tokenImage);"); } genCodeLine(" }"); } else { genCodeLine(" /** Generate ParseException. 
*/"); genCodeLine(" " + staticOpt() + "public ParseException generateParseException() {"); genCodeLine(" Token errortok = token.next;"); if (Options.getKeepLineColumn()) { genCodeLine(" int line = errortok.beginLine, column = errortok.beginColumn;"); } genCodeLine(" String mess = (errortok.kind == 0) ? tokenImage[0] : errortok.image;"); if (Options.getKeepLineColumn()) { genCodeLine(" return new ParseException(" + "\"Parse error at line \" + line + \", column \" + column + \". " + "Encountered: \" + mess);"); } else { genCodeLine(" return new ParseException(\"Parse error at <unknown location>. " + "Encountered: \" + mess);"); } genCodeLine(" }"); } genCodeLine(""); genCodeLine(" " + staticOpt() + "private " + Options.getBooleanType() + " trace_enabled;"); genCodeLine(""); genCodeLine("/** Trace enabled. */"); genCodeLine(" " + staticOpt() + "final public boolean trace_enabled() {"); genCodeLine(" return trace_enabled;"); genCodeLine(" }"); genCodeLine(""); if (Options.getDebugParser()) { genCodeLine(" " + staticOpt() + "private int trace_indent = 0;"); genCodeLine("/** Enable tracing. */"); genCodeLine(" " + staticOpt() + "final public void enable_tracing() {"); genCodeLine(" trace_enabled = true;"); genCodeLine(" }"); genCodeLine(""); genCodeLine("/** Disable tracing. 
*/"); genCodeLine(" " + staticOpt() + "final public void disable_tracing() {"); genCodeLine(" trace_enabled = false;"); genCodeLine(" }"); genCodeLine(""); genCodeLine(" " + staticOpt() + "protected void trace_call(String s) {"); genCodeLine(" if (trace_enabled) {"); genCodeLine(" for (int i = 0; i < trace_indent; i++) { System.out.print(\" \"); }"); genCodeLine(" System.out.println(\"Call: \" + s);"); genCodeLine(" }"); genCodeLine(" trace_indent = trace_indent + 2;"); genCodeLine(" }"); genCodeLine(""); genCodeLine(" " + staticOpt() + "protected void trace_return(String s) {"); genCodeLine(" trace_indent = trace_indent - 2;"); genCodeLine(" if (trace_enabled) {"); genCodeLine(" for (int i = 0; i < trace_indent; i++) { System.out.print(\" \"); }"); genCodeLine(" System.out.println(\"Return: \" + s);"); genCodeLine(" }"); genCodeLine(" }"); genCodeLine(""); genCodeLine(" " + staticOpt() + "protected void trace_token(Token t, String where) {"); genCodeLine(" if (trace_enabled) {"); genCodeLine(" for (int i = 0; i < trace_indent; i++) { System.out.print(\" \"); }"); genCodeLine(" System.out.print(\"Consumed token: <\" + tokenImage[t.kind]);"); genCodeLine(" if (t.kind != 0 && !tokenImage[t.kind].equals(\"\\\"\" + t.image + \"\\\"\")) {"); genCodeLine(" System.out.print(\": \\\"\" + "+Options.getTokenMgrErrorClass() + ".addEscapes("+"t.image) + \"\\\"\");"); genCodeLine(" }"); genCodeLine(" System.out.println(\" at line \" + t.beginLine + " + "\" column \" + t.beginColumn + \">\" + where);"); genCodeLine(" }"); genCodeLine(" }"); genCodeLine(""); genCodeLine(" " + staticOpt() + "protected void trace_scan(Token t1, int t2) {"); genCodeLine(" if (trace_enabled) {"); genCodeLine(" for (int i = 0; i < trace_indent; i++) { System.out.print(\" \"); }"); genCodeLine(" System.out.print(\"Visited token: <\" + tokenImage[t1.kind]);"); genCodeLine(" if (t1.kind != 0 && !tokenImage[t1.kind].equals(\"\\\"\" + t1.image + \"\\\"\")) {"); genCodeLine(" System.out.print(\": \\\"\" + 
"+Options.getTokenMgrErrorClass() + ".addEscapes("+"t1.image) + \"\\\"\");"); genCodeLine(" }"); genCodeLine(" System.out.println(\" at line \" + t1.beginLine + \"" + " column \" + t1.beginColumn + \">; Expected token: <\" + tokenImage[t2] + \">\");"); genCodeLine(" }"); genCodeLine(" }"); genCodeLine(""); } else { genCodeLine(" /** Enable tracing. */"); genCodeLine(" " + staticOpt() + "final public void enable_tracing() {"); genCodeLine(" }"); genCodeLine(""); genCodeLine(" /** Disable tracing. */"); genCodeLine(" " + staticOpt() + "final public void disable_tracing() {"); genCodeLine(" }"); genCodeLine(""); } if (jj2index != 0 && Options.getErrorReporting()) { genCodeLine(" " + staticOpt() + "private void jj_rescan_token() {"); genCodeLine(" jj_rescan = true;"); genCodeLine(" for (int i = 0; i < " + jj2index + "; i++) {"); genCodeLine(" try {"); genCodeLine(" JJCalls p = jj_2_rtns[i];"); genCodeLine(""); genCodeLine(" do {"); genCodeLine(" if (p.gen > jj_gen) {"); genCodeLine(" jj_la = p.arg; jj_lastpos = jj_scanpos = p.first;"); genCodeLine(" switch (i) {"); for (int i = 0; i < jj2index; i++) { genCodeLine(" case " + i + ": jj_3_" + (i + 1) + "(); break;"); } genCodeLine(" }"); genCodeLine(" }"); genCodeLine(" p = p.next;"); genCodeLine(" } while (p != null);"); genCodeLine(""); genCodeLine(" } catch(LookaheadSuccess ls) { }"); genCodeLine(" }"); genCodeLine(" jj_rescan = false;"); genCodeLine(" }"); genCodeLine(""); genCodeLine(" " + staticOpt() + "private void jj_save(int index, int xla) {"); genCodeLine(" JJCalls p = jj_2_rtns[index];"); genCodeLine(" while (p.gen > jj_gen) {"); genCodeLine(" if (p.next == null) { p = p.next = new JJCalls(); break; }"); genCodeLine(" p = p.next;"); genCodeLine(" }"); genCodeLine(""); genCodeLine(" p.gen = jj_gen + xla - jj_la; "); genCodeLine(" p.first = token;"); genCodeLine(" p.arg = xla;"); genCodeLine(" }"); genCodeLine(""); } if (jj2index != 0 && Options.getErrorReporting()) { genCodeLine(" static final class JJCalls 
{"); genCodeLine(" int gen;"); genCodeLine(" Token first;"); genCodeLine(" int arg;"); genCodeLine(" JJCalls next;"); genCodeLine(" }"); genCodeLine(""); } if (cu_from_insertion_point_2.size() != 0) { printTokenSetup((Token) (cu_from_insertion_point_2.get(0))); ccol = 1; for (final Iterator it = cu_from_insertion_point_2.iterator(); it.hasNext();) { t = (Token) it.next(); printToken(t); } printTrailingComments(t); } genCodeLine(""); saveOutput(Options.getOutputDirectory() + File.separator + cu_name + getFileExtension(Options.getOutputLanguage())); } // matches "if (Options.getBuildParser())" }
5.4. LexGen: token manager generation
LexGen generates the token manager rather than a syntax tree: it compiles every lexical production (TOKEN, SKIP, MORE, SPECIAL_TOKEN) into NFA/DFA state machines and emits them as the generated <parser-name>TokenManager class.
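One bookkeeping detail worth spotting in the code below: for each regular-expression ordinal, LexGen records its category (skip, special, more, token) in long-array bitsets, packing 64 ordinals per long, e.g. toSkip[ordinal / 64] |= 1L << (ordinal % 64). A tiny self-contained sketch of that scheme (the class and method names here are invented for illustration):

```java
// Bitset bookkeeping in the style of LexGen's toSkip/toSpecial/toMore/toToken:
// one bit per regular-expression ordinal, 64 ordinals packed into each long.
public class OrdinalBitsetDemo {
    static void set(long[] bits, int ordinal) {
        bits[ordinal / 64] |= 1L << (ordinal % 64);
    }
    static boolean isSet(long[] bits, int ordinal) {
        return (bits[ordinal / 64] & (1L << (ordinal % 64))) != 0L;
    }
}
```

The generated token manager later tests these masks to decide, after a match, whether to emit the token, silently skip it, or keep scanning (MORE).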
// org.javacc.parser.LexGen#start public void start() throws IOException { if (!Options.getBuildTokenManager() || Options.getUserTokenManager() || JavaCCErrors.get_error_count() > 0) return; final String codeGeneratorClass = Options.getTokenManagerCodeGenerator(); keepLineCol = Options.getKeepLineColumn(); errorHandlingClass = Options.getTokenMgrErrorClass(); List choices = new ArrayList(); Enumeration e; TokenProduction tp; int i, j; staticString = (Options.getStatic() ? "static " : ""); tokMgrClassName = cu_name + "TokenManager"; if (!generateDataOnly && codeGeneratorClass == null) PrintClassHead(); BuildLexStatesTable(); e = allTpsForState.keys(); boolean ignoring = false; while (e.hasMoreElements()) { int startState = -1; NfaState.ReInit(); RStringLiteral.ReInit(); String key = (String)e.nextElement(); lexStateIndex = GetIndex(key); lexStateSuffix = "_" + lexStateIndex; List<TokenProduction> allTps = (List<TokenProduction>)allTpsForState.get(key); initStates.put(key, initialState = new NfaState()); ignoring = false; singlesToSkip[lexStateIndex] = new NfaState(); singlesToSkip[lexStateIndex].dummy = true; if (key.equals("DEFAULT")) defaultLexState = lexStateIndex; for (i = 0; i < allTps.size(); i++) { tp = (TokenProduction)allTps.get(i); int kind = tp.kind; boolean ignore = tp.ignoreCase; List<RegExprSpec> rexps = tp.respecs; if (i == 0) ignoring = ignore; for (j = 0; j < rexps.size(); j++) { RegExprSpec respec = (RegExprSpec)rexps.get(j); curRE = respec.rexp; rexprs[curKind = curRE.ordinal] = curRE; lexStates[curRE.ordinal] = lexStateIndex; ignoreCase[curRE.ordinal] = ignore; if (curRE.private_rexp) { kinds[curRE.ordinal] = -1; continue; } if (!Options.getNoDfa() && curRE instanceof RStringLiteral && !((RStringLiteral)curRE).image.equals("")) { ((RStringLiteral)curRE).GenerateDfa(this, curRE.ordinal); if (i != 0 && !mixed[lexStateIndex] && ignoring != ignore) { mixed[lexStateIndex] = true; } } else if (curRE.CanMatchAnyChar()) { if 
      (canMatchAnyChar[lexStateIndex] == -1 ||
          canMatchAnyChar[lexStateIndex] > curRE.ordinal)
        canMatchAnyChar[lexStateIndex] = curRE.ordinal;
    } else {
      Nfa temp;
      if (curRE instanceof RChoice)
        choices.add(curRE);
      temp = curRE.GenerateNfa(ignore);
      temp.end.isFinal = true;
      temp.end.kind = curRE.ordinal;
      initialState.AddMove(temp.start);
    }

    if (kinds.length < curRE.ordinal) {
      int[] tmp = new int[curRE.ordinal + 1];
      System.arraycopy(kinds, 0, tmp, 0, kinds.length);
      kinds = tmp;
    }
    //System.out.println("   ordina : " + curRE.ordinal);
    kinds[curRE.ordinal] = kind;

    if (respec.nextState != null &&
        !respec.nextState.equals(lexStateName[lexStateIndex]))
      newLexState[curRE.ordinal] = respec.nextState;

    if (respec.act != null && respec.act.getActionTokens() != null &&
        respec.act.getActionTokens().size() > 0)
      actions[curRE.ordinal] = respec.act;

    switch(kind) {
      case TokenProduction.SPECIAL :
        hasSkipActions |= (actions[curRE.ordinal] != null) ||
                          (newLexState[curRE.ordinal] != null);
        hasSpecial = true;
        toSpecial[curRE.ordinal / 64] |= 1L << (curRE.ordinal % 64);
        toSkip[curRE.ordinal / 64] |= 1L << (curRE.ordinal % 64);
        break;
      case TokenProduction.SKIP :
        hasSkipActions |= (actions[curRE.ordinal] != null);
        hasSkip = true;
        toSkip[curRE.ordinal / 64] |= 1L << (curRE.ordinal % 64);
        break;
      case TokenProduction.MORE :
        hasMoreActions |= (actions[curRE.ordinal] != null);
        hasMore = true;
        toMore[curRE.ordinal / 64] |= 1L << (curRE.ordinal % 64);
        if (newLexState[curRE.ordinal] != null)
          canReachOnMore[GetIndex(newLexState[curRE.ordinal])] = true;
        else
          canReachOnMore[lexStateIndex] = true;
        break;
      case TokenProduction.TOKEN :
        hasTokenActions |= (actions[curRE.ordinal] != null);
        toToken[curRE.ordinal / 64] |= 1L << (curRE.ordinal % 64);
        break;
    }
  }
}

// Generate a static block for initializing the nfa transitions
NfaState.ComputeClosures();

for (i = 0; i < initialState.epsilonMoves.size(); i++)
  ((NfaState)initialState.epsilonMoves.elementAt(i)).GenerateCode();

if (hasNfa[lexStateIndex] = (NfaState.generatedStates != 0)) {
  initialState.GenerateCode();
  startState = initialState.GenerateInitMoves(this);
}

if (initialState.kind != Integer.MAX_VALUE && initialState.kind != 0) {
  if ((toSkip[initialState.kind / 64] & (1L << initialState.kind)) != 0L ||
      (toSpecial[initialState.kind / 64] & (1L << initialState.kind)) != 0L)
    hasSkipActions = true;
  else if ((toMore[initialState.kind / 64] & (1L << initialState.kind)) != 0L)
    hasMoreActions = true;
  else
    hasTokenActions = true;

  if (initMatch[lexStateIndex] == 0 ||
      initMatch[lexStateIndex] > initialState.kind) {
    initMatch[lexStateIndex] = initialState.kind;
    hasEmptyMatch = true;
  }
} else if (initMatch[lexStateIndex] == 0)
  initMatch[lexStateIndex] = Integer.MAX_VALUE;

RStringLiteral.FillSubString();

if (hasNfa[lexStateIndex] && !mixed[lexStateIndex])
  RStringLiteral.GenerateNfaStartStates(this, initialState);

if (generateDataOnly || codeGeneratorClass != null) {
  RStringLiteral.UpdateStringLiteralData(totalNumStates, lexStateIndex);
  NfaState.UpdateNfaData(totalNumStates, startState, lexStateIndex,
                         canMatchAnyChar[lexStateIndex]);
} else {
  RStringLiteral.DumpDfaCode(this);
  if (hasNfa[lexStateIndex]) {
    NfaState.DumpMoveNfa(this);
  }
}
totalNumStates += NfaState.generatedStates;
if (stateSetSize < NfaState.generatedStates)
  stateSetSize = NfaState.generatedStates;
}

for (i = 0; i < choices.size(); i++)
  ((RChoice)choices.get(i)).CheckUnmatchability();

CheckEmptyStringMatch();

if (generateDataOnly || codeGeneratorClass != null) {
  tokenizerData.setParserName(cu_name);
  NfaState.BuildTokenizerData(tokenizerData);
  RStringLiteral.BuildTokenizerData(tokenizerData);

  int[] newLexStateIndices = new int[maxOrdinal];
  StringBuilder tokenMgrDecls = new StringBuilder();
  if (token_mgr_decls != null && token_mgr_decls.size() > 0) {
    Token t = (Token)token_mgr_decls.get(0);
    for (j = 0; j < token_mgr_decls.size(); j++) {
      tokenMgrDecls.append(((Token)token_mgr_decls.get(j)).image + " ");
    }
  }
  tokenizerData.setDecls(tokenMgrDecls.toString());

  Map<Integer, String> actionStrings = new HashMap<Integer, String>();
  for (i = 0; i < maxOrdinal; i++) {
    if (newLexState[i] == null) {
      newLexStateIndices[i] = -1;
    } else {
      newLexStateIndices[i] = GetIndex(newLexState[i]);
    }
    // For java, we have this but for other languages, eventually we will
    // simply have a string.
    Action act = actions[i];
    if (act == null) continue;
    StringBuilder sb = new StringBuilder();
    for (int k = 0; k < act.getActionTokens().size(); k++) {
      sb.append(((Token)act.getActionTokens().get(k)).image);
      sb.append(" ");
    }
    actionStrings.put(i, sb.toString());
  }
  tokenizerData.setDefaultLexState(defaultLexState);
  tokenizerData.setLexStateNames(lexStateName);
  tokenizerData.updateMatchInfo(
      actionStrings, newLexStateIndices,
      toSkip, toSpecial, toMore, toToken);
  if (generateDataOnly) return;

  Class<TokenManagerCodeGenerator> codeGenClazz;
  TokenManagerCodeGenerator gen;
  try {
    codeGenClazz = (Class<TokenManagerCodeGenerator>)Class.forName(codeGeneratorClass);
    gen = codeGenClazz.newInstance();
  } catch(Exception ee) {
    JavaCCErrors.semantic_error(
        "Could not load the token manager code generator class: " +
        codeGeneratorClass + "\nError: " + ee.getMessage());
    return;
  }
  gen.generateCode(tokenizerData);
  gen.finish(tokenizerData);
  return;
}

RStringLiteral.DumpStrLiteralImages(this);
DumpFillToken();
NfaState.DumpStateSets(this);
NfaState.DumpNonAsciiMoveMethods(this);
DumpGetNextToken();

if (Options.getDebugTokenManager()) {
  NfaState.DumpStatesForKind(this);
  DumpDebugMethods();
}

if (hasLoop) {
  genCodeLine(staticString + "int[] jjemptyLineNo = new int[" + maxLexStates + "];");
  genCodeLine(staticString + "int[] jjemptyColNo = new int[" + maxLexStates + "];");
  genCodeLine(staticString + "" + Options.getBooleanType() + "[] jjbeenHere = new " +
              Options.getBooleanType() + "[" + maxLexStates + "];");
}

DumpSkipActions();
DumpMoreActions();
DumpTokenActions();

NfaState.PrintBoilerPlate(this);

String charStreamName;
if (Options.getUserCharStream())
  charStreamName = "CharStream";
else {
  if (Options.getJavaUnicodeEscape())
    charStreamName = "JavaCharStream";
  else
    charStreamName = "SimpleCharStream";
}

writeTemplate(BOILERPLATER_METHOD_RESOURCE_URL,
    "charStreamName", charStreamName,
    "lexStateNameLength", lexStateName.length,
    "defaultLexState", defaultLexState,
    "noDfa", Options.getNoDfa(),
    "generatedStates", totalNumStates);

DumpStaticVarDeclarations(charStreamName);
genCodeLine(/*{*/ "}");

// TODO :: CBA -- Require Unification of output language specific processing into a single Enum class
String fileName = Options.getOutputDirectory() + File.separator + tokMgrClassName +
                  getFileExtension(Options.getOutputLanguage());
if (Options.getBuildParser()) {
  saveOutput(fileName);
}
}
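A pattern worth calling out in the code above is how the `toSkip`, `toSpecial`, `toMore`, and `toToken` tables classify token kinds: each is a `long[]` used as a packed bitset, where `ordinal / 64` selects the word and `1L << (ordinal % 64)` selects the bit within it. The following is a minimal standalone sketch of that technique (the class name `KindBitSet` and its methods are illustrative, not part of JavaCC's API):

```java
// Hypothetical sketch of the long[]-as-bitset pattern used by LexGen's
// toSkip / toSpecial / toMore / toToken tables. Not a JavaCC class.
public class KindBitSet {
    private final long[] bits;

    public KindBitSet(int maxOrdinal) {
        // Each long word stores 64 flags, so allocate maxOrdinal / 64 + 1 words.
        bits = new long[maxOrdinal / 64 + 1];
    }

    // Mirrors: toSkip[curRE.ordinal / 64] |= 1L << (curRE.ordinal % 64);
    public void mark(int ordinal) {
        bits[ordinal / 64] |= 1L << (ordinal % 64);
    }

    // Mirrors the membership test used when classifying a matched kind,
    // e.g. (toSkip[kind / 64] & (1L << kind)) != 0L
    public boolean contains(int ordinal) {
        return (bits[ordinal / 64] & (1L << (ordinal % 64))) != 0L;
    }

    public static void main(String[] args) {
        KindBitSet toSkip = new KindBitSet(130);
        toSkip.mark(3);
        toSkip.mark(100); // 100 / 64 = 1, so this lands in the second word
        System.out.println(toSkip.contains(3));   // true
        System.out.println(toSkip.contains(100)); // true
        System.out.println(toSkip.contains(5));   // false
    }
}
```

Compared with `java.util.BitSet`, the raw `long[]` lets the generator embed the words directly as constants in the emitted token manager, which is why the generated code is full of literal bitmask tables.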
5.5. Generating the other helper classes
OtherFilesGen generates the remaining helper classes, such as Token, ParseException...
// org.javacc.parser.OtherFilesGen#start
static public void start(boolean isJavaModern) throws MetaParseException {

  JavaResourceTemplateLocations templateLoc =
      isJavaModern ? JavaFiles.RESOURCES_JAVA_MODERN : JavaFiles.RESOURCES_JAVA_CLASSIC;

  Token t = null;

  if (JavaCCErrors.get_error_count() != 0) throw new MetaParseException();

  // Added this if condition -- 2012/10/17 -- cba
  if (Options.isGenerateBoilerplateCode()) {
    if (isJavaModern) {
      JavaFiles.gen_JavaModernFiles();
    }
    JavaFiles.gen_TokenMgrError(templateLoc);
    JavaFiles.gen_ParseException(templateLoc);
    JavaFiles.gen_Token(templateLoc);
  }

  if (Options.getUserTokenManager()) {
    // CBA -- I think that Token managers are unique so will always be generated
    JavaFiles.gen_TokenManager(templateLoc);
  } else if (Options.getUserCharStream()) {
    // Added this if condition -- 2012/10/17 -- cba
    if (Options.isGenerateBoilerplateCode()) {
      JavaFiles.gen_CharStream(templateLoc);
    }
  } else {
    // Added this if condition -- 2012/10/17 -- cba
    if (Options.isGenerateBoilerplateCode()) {
      if (Options.getJavaUnicodeEscape()) {
        JavaFiles.gen_JavaCharStream(templateLoc);
      } else {
        JavaFiles.gen_SimpleCharStream(templateLoc);
      }
    }
  }

  try {
    ostr = new java.io.PrintWriter(
        new java.io.BufferedWriter(
            new java.io.FileWriter(
                new java.io.File(Options.getOutputDirectory(),
                                 cu_name + CONSTANTS_FILENAME_SUFFIX)),
            8192));
  } catch (java.io.IOException e) {
    JavaCCErrors.semantic_error("Could not open file " + cu_name + "Constants.java for writing.");
    throw new Error();
  }

  List<String> tn = new ArrayList<String>(toolNames);
  tn.add(toolName);
  ostr.println("/* " + getIdString(tn, cu_name + CONSTANTS_FILENAME_SUFFIX) + " */");

  if (cu_to_insertion_point_1.size() != 0 &&
      ((Token)cu_to_insertion_point_1.get(0)).kind == PACKAGE) {
    for (int i = 1; i < cu_to_insertion_point_1.size(); i++) {
      if (((Token)cu_to_insertion_point_1.get(i)).kind == SEMICOLON) {
        printTokenSetup((Token)(cu_to_insertion_point_1.get(0)));
        for (int j = 0; j <= i; j++) {
          t = (Token)(cu_to_insertion_point_1.get(j));
          printToken(t, ostr);
        }
        printTrailingComments(t, ostr);
        ostr.println("");
        ostr.println("");
        break;
      }
    }
  }

  ostr.println("");
  ostr.println("/**");
  ostr.println(" * Token literal values and constants.");
  ostr.println(" * Generated by org.javacc.parser.OtherFilesGen#start()");
  ostr.println(" */");

  if (Options.getSupportClassVisibilityPublic()) {
    ostr.print("public ");
  }
  ostr.println("interface " + cu_name + "Constants {");
  ostr.println("");

  RegularExpression re;
  ostr.println("  /** End of File. */");
  ostr.println("  int EOF = 0;");
  for (java.util.Iterator<RegularExpression> it = ordered_named_tokens.iterator(); it.hasNext();) {
    re = it.next();
    ostr.println("  /** RegularExpression Id. */");
    ostr.println("  int " + re.label + " = " + re.ordinal + ";");
  }
  ostr.println("");

  if (!Options.getUserTokenManager() && Options.getBuildTokenManager()) {
    for (int i = 0; i < Main.lg.lexStateName.length; i++) {
      ostr.println("  /** Lexical state. */");
      ostr.println("  int " + LexGen.lexStateName[i] + " = " + i + ";");
    }
    ostr.println("");
  }

  ostr.println("  /** Literal token values. */");
  ostr.println("  String[] tokenImage = {");
  ostr.println("    \"<EOF>\",");

  for (java.util.Iterator<TokenProduction> it = rexprlist.iterator(); it.hasNext();) {
    TokenProduction tp = (TokenProduction)(it.next());
    List<RegExprSpec> respecs = tp.respecs;
    for (java.util.Iterator<RegExprSpec> it2 = respecs.iterator(); it2.hasNext();) {
      RegExprSpec res = (RegExprSpec)(it2.next());
      re = res.rexp;
      ostr.print("    ");
      if (re instanceof RStringLiteral) {
        ostr.println("\"\\\"" + add_escapes(add_escapes(((RStringLiteral)re).image)) + "\\\"\",");
      } else if (!re.label.equals("")) {
        ostr.println("\"<" + re.label + ">\",");
      } else {
        if (re.tpContext.kind == TokenProduction.TOKEN) {
          JavaCCErrors.warning(re, "Consider giving this non-string token a label for better error reporting.");
        }
        ostr.println("\"<token of kind " + re.ordinal + ">\",");
      }
    }
  }

  ostr.println("  };");
  ostr.println("");
  ostr.println("}");

  ostr.close();
}
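To make the println sequence above more concrete: for a hypothetical grammar named `Simple` that defines one named token `<NUMBER>` in the single DEFAULT lexical state, the method would emit a constants interface roughly like the following (a hand-written illustration of the generated shape, not output copied from a real run):

```java
// Illustrative sketch of the XXXConstants interface OtherFilesGen emits
// for a hypothetical grammar "Simple" with one named token <NUMBER>.
public interface SimpleConstants {

  /** End of File. */
  int EOF = 0;
  /** RegularExpression Id. */
  int NUMBER = 1;

  /** Lexical state. */
  int DEFAULT = 0;

  /** Literal token values. */
  String[] tokenImage = {
    "<EOF>",
    "<NUMBER>",
  };
}
```

The generated parser and token manager both implement this interface, so `Token.kind` values can be compared against the named ordinals, and `tokenImage[kind]` gives a human-readable name for error messages.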
The analysis and notes above are a work in progress, to be refined later.