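First, make sure that:
the newline character of data.txt is CR+LF (Windows);
the encoding is UTF-8 without signature (no BOM).
If the file does not follow this standard, the number of entries can easily fail to match.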
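Both requirements can be checked with a script before starting the build. The following is only a minimal Python sketch and not part of the original workflow. The function name `check_source` is hypothetical, and it assumes the file is small enough to read into memory in one go.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # the UTF-8 "signature"


def check_source(raw: bytes) -> list[str]:
    """Return a list of problems found in the raw bytes of a source file."""
    problems = []
    if raw.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 signature (BOM)")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte {exc.start}")
    # Every CR+LF pair contains one CR and one LF, so whatever is left
    # over after subtracting the pairs is a bare LF or a bare CR.
    crlf = raw.count(b"\r\n")
    bare_lf = raw.count(b"\n") - crlf
    bare_cr = raw.count(b"\r") - crlf
    if bare_lf:
        problems.append(f"{bare_lf} bare LF line ending(s)")
    if bare_cr:
        problems.append(f"{bare_cr} bare CR line ending(s)")
    return problems
```

For a real file, something like `check_source(open("data.txt", "rb").read())` should return an empty list when the file meets both requirements.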
Now for the actual troubleshooting:
1. Invalid keyword at position
This is one of the most common problems. You can search for it on Google with:
site:pdawiki.com Invalid keyword at position:
Begining loading source file...
Content is longer then 8388608 at position: 0 of the source file
Failed to load source file, process cancelled
Begining loading source file...
Invalid keyword at position: 155387606 of the source file
Failed to load source file, process cancelled
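A likely cause is a missing headword, which leaves an empty line directly above a </> line.
Open the file in Sublime Text and run a regular-expression search for: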
^\s\n
Then add the missing headword.
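The same check can be scripted for files that are awkward to search in an editor. This is a minimal Python sketch under stated assumptions: the function name `find_blank_before_separator` is hypothetical, and it reports only blank lines that sit directly above a </> line, which is narrower than the regular expression above.

```python
def find_blank_before_separator(text: str) -> list[int]:
    """Return 1-based line numbers of blank lines directly above a </> line."""
    # Split on LF only; strip() below removes the CR of CR+LF endings.
    lines = text.split("\n")
    hits = []
    for i in range(1, len(lines)):
        if lines[i].strip() == "</>" and lines[i - 1].strip() == "":
            # lines[i - 1] has 0-based index i - 1, so its 1-based number is i.
            hits.append(i)
    return hits
```

Each returned number is a line where a headword is probably missing and can be added by hand.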
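If you want to jump precisely to position: 155387606,
open the file in Notepad++, choose Search > Go To (or press the shortcut Ctrl + G), select Offset in the dialog that appears, enter 155387606, and click OK to jump to that position.
Reference: https://www.pdawiki.com/forum/forum.php?mod=redirect&goto=findpost&ptid=29944&pid=829673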
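The offset lookup can also be done outside an editor. The sketch below is illustrative only: the function name `context_at_offset` is hypothetical, and it assumes the reported position is a byte offset into the file, which the log message does not confirm.

```python
def context_at_offset(raw: bytes, offset: int, radius: int = 40) -> tuple[int, str]:
    """Return (1-based line number, surrounding text) for a byte offset."""
    if not 0 <= offset <= len(raw):
        raise ValueError(f"offset {offset} is outside the file (size {len(raw)})")
    # The line number is one more than the number of LF bytes before the offset.
    line_no = raw.count(b"\n", 0, offset) + 1
    start = max(0, offset - radius)
    end = min(len(raw), offset + radius)
    # errors="replace" keeps the call from failing if the window cuts
    # through the middle of a multi-byte UTF-8 character.
    snippet = raw[start:end].decode("utf-8", errors="replace")
    return line_no, snippet
```

With the file contents loaded as bytes, `context_at_offset(raw, 155387606)` would give the line number to inspect plus a short window of text around the position.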
Begining loading source file...
Done
Time used for this section: 2 seconds
Sorting dictionary...
Done!
Begin processing index...
Done!
Original index size = 1607KB, compressed size = 630KB, compression ratio = 39%
Time used for this section: 0 seconds
Begin processing data contents...
Done!
Original text size = 385953KB, compressed size = 46816KB, compression ratio = 12%
Time used for this section: 10 seconds
Number of entries: 91531
Conversion succeed!