Hunspell: Introduction and Trial Run

Posted by 振宇要低調 on 2018-07-10

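1. Introduction

  Hunspell is a spell checker designed for languages with rich morphology and complex compound words. It was originally written for Hungarian.

  Hunspell is free software, released under a GPL, LGPL, and MPL tri-license.

  Hunspell has interfaces and wrappers for the major platforms and programming languages. It is based on MySpell and is backward-compatible with MySpell dictionaries. MySpell uses single-byte character encodings, whereas Hunspell can also use dictionaries encoded in Unicode UTF-8.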
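
The difference between a single-byte encoding and UTF-8 can be illustrated with a small Python sketch. It does not call Hunspell at all; the sample words and the helper name `can_encode` are assumptions made for illustration. It only shows that a single-byte code page covers one limited character set, while UTF-8 can represent words from any script in the same file.

```python
# Sketch: why a single-byte dictionary encoding is limiting.
# The sample words are illustrative and not taken from any Hunspell dictionary.

def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

hungarian = "tűrő"            # contains ű (U+0171) and ő (U+0151)
mixed = "tűrő Москва"         # Hungarian plus Cyrillic in one string

for enc in ("iso-8859-1", "iso-8859-2", "utf-8"):
    # ISO-8859-1 lacks ű/ő; ISO-8859-2 has them but no Cyrillic; UTF-8 has both.
    print(enc, can_encode(hungarian, enc), can_encode(mixed, enc))
```

Only UTF-8 reports success for both strings, which is the practical reason a UTF-8 dictionary is more flexible than one tied to a single code page.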

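2. The following applications use Hunspell as their spell checker:

  Mac OS X 10.6 and later

  Eclipse, via Hunspell4Eclipse

  Google Chrome, a web browser developed by Google

  Evernote, a note-taking application

  LibreOffice and OpenOffice.org, open-source office suites

  Mozilla Firefox, Thunderbird, and SeaMonkey

  Opera, a cross-platform web browser

  Scribus, a desktop publishing application

  Vim, a text editor

  WPS Office, an office suite developed in China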

3. Testing Hunspell with a Docker image:

  3.1 List the available dictionaries

[root@host-10-0-251-159 hunspell]# docker run --rm tmaier/hunspell -D
SEARCH PATH:
.::/usr/share/hunspell:/usr/share/myspell:/usr/share/myspell/dicts:/Library/Spelling:/root/.openoffice.org/3/user/wordbook:/root/.openoffice.org2/user/wordbook:/root/.openoffice.org2.0/user/w/lib/openoffice.org/basis3.0/share/dict/ooo:/opt/openoffice.org2.4/share/dict/ooo:/usr/lib/openoffice.org2.4/share/dict/ooo:/opt/openoffice.org2.3/share/dict/ooo:/usr/lib/openoffice.org2.3/shhare/dict/ooo:/opt/openoffice.org2.1/share/dict/ooo:/usr/lib/openoffice.org2.1/share/dict/ooo:/opt/openoffice.org2.0/share/dict/ooo:/usr/lib/openoffice.org2.0/share/dict/ooo
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_CA
/usr/share/hunspell/de_DE_comb
/usr/share/hunspell/en_ZA
/usr/share/hunspell/en_US
/usr/share/hunspell/en_GB
/usr/share/hunspell/en_AU
/usr/share/hunspell/de_CH
/usr/share/hunspell/de_DE_neu
/usr/share/hunspell/en_NZ
/usr/share/hunspell/de_AT
/usr/share/hunspell/default
LOADED DICTIONARY:
/usr/share/hunspell/default.aff
/usr/share/hunspell/default.dic
Hunspell 1.6.2
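
If the dictionary listing needs to be consumed by a script, it can be parsed by section header. The following is a minimal sketch that works on pasted text shaped like the output above (shortened here). The function name `available_dictionaries` is hypothetical, and the parsing relies only on the `AVAILABLE DICTIONARIES` and `LOADED DICTIONARY` headers seen in this run, so other Hunspell versions may need adjustments.

```python
from pathlib import PurePosixPath
from typing import List

# Shortened copy of the listing shown above; the search path is abbreviated.
SAMPLE = """\
SEARCH PATH:
.::/usr/share/hunspell:/usr/share/myspell
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_CA
/usr/share/hunspell/de_DE_comb
/usr/share/hunspell/en_US
/usr/share/hunspell/default
LOADED DICTIONARY:
/usr/share/hunspell/default.aff
/usr/share/hunspell/default.dic
Hunspell 1.6.2
"""

def available_dictionaries(listing: str) -> List[str]:
    """Return the dictionary names (usable with -d) found in a -D listing."""
    names = []
    in_section = False
    for line in listing.splitlines():
        if line.startswith("AVAILABLE DICTIONARIES"):
            in_section = True          # names start on the next line
        elif line.startswith("LOADED DICTIONARY"):
            break                      # end of the section we care about
        elif in_section and line.strip():
            # Keep only the last path component, e.g. "en_US".
            names.append(PurePosixPath(line.strip()).name)
    return names

print(available_dictionaries(SAMPLE))
```

When capturing the listing from a real run, it may be safer to capture both standard output and standard error, since the listing may be written to standard error.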

  3.2 Show the help text

[root@host-10-0-251-159 hunspell]# docker run --rm -v $(pwd):/workdir tmaier/hunspell -u3 -i utf-8 -d de_DE_neu,en_US,de_CH -p words  -h
Usage: hunspell [OPTION]... [FILE]...
Check spelling of each FILE. Without FILE, check standard input.
 
  -1        check only first field in lines (delimiter = tabulator)
  -a        Ispell's pipe interface
  --check-url   check URLs, e-mail addresses and directory paths
  --check-apostrophe    check Unicode typographic apostrophe
  -d d[,d2,...] use d (d2 etc.) dictionaries
  -D        show available dictionaries
  -G        print only correct words or lines
  -h, --help    display this help and exit
  -H        HTML input file format
  -i enc    input encoding
  -l        print misspelled words (only the misspelled words)
  -L        print lines with misspelled words (the lines that contain them)
  -m        analyze the words of the input text
  -n        nroff/troff input file format
  -O        OpenDocument (ODF or Flat ODF) input file format
  -p dict   set dict custom dictionary
  -r        warn of the potential mistakes (rare words)
  -P password   set password for encrypted dictionaries
  -s        stem the words of the input text
  -S        suffix words of the input text
  -t        TeX/LaTeX input file format
  -v, --version print version number
  -vv       print Ispell compatible version number
  -w        print misspelled words (= lines) from one word/line input.
  -X        XML input file format
 
Example: hunspell -d en_US file.txt    # interactive spelling
         hunspell -i utf-8 file.txt    # check UTF-8 encoded file
         hunspell -l *.odt             # print misspelled words of ODF files
 
         # Quick fix of ODF documents by personal dictionary creation
 
         # 1 Make a reduced list from misspelled and unknown words:
 
         hunspell -l *.odt | sort | uniq >words
 
         # 2 Delete misspelled words of the file by a text editor.
         # 3 Use this personal dictionary to fix the deleted words:
 
         hunspell -p words *.odt
 
Bug reports: http://hunspell.github.io/
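
The personal-dictionary workflow in the help text (collect misspelled words with `-l`, then reduce them with `sort | uniq`) can also be scripted. The sketch below is one possible Python equivalent, not part of Hunspell itself. The function names are hypothetical, `list_misspelled` assumes a `hunspell` binary is on the PATH, and it uses only the `-l` and `-d` options listed above.

```python
import subprocess
from typing import Iterable, List

def reduced_word_list(misspelled: Iterable[str]) -> List[str]:
    """Rough equivalent of `sort | uniq`: unique, non-empty words, sorted."""
    return sorted({w.strip() for w in misspelled if w.strip()})

def list_misspelled(path: str, dictionary: str = "en_US") -> List[str]:
    """Run `hunspell -l -d <dictionary> <path>` and return its output lines.

    Assumes hunspell is installed locally; not called in the demo below.
    """
    result = subprocess.run(
        ["hunspell", "-l", "-d", dictionary, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

# Demo on a hand-written sample instead of a real hunspell run.
sample = ["Hmm", "ve", "Lele", "ve", "Hmm", "whould"]
print(reduced_word_list(sample))
```

Python sorts by code point, so the order can differ from the shell's `sort` under some locales. After deleting the genuinely misspelled entries from the reduced list, the remaining words can be saved to a file and passed with `-p`, as the help text describes.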

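  3.3 Check the spelling of a file (shows the line number of each misspelled word and a suggested correction)

  Source text: test1.TXT (link: https://pan.baidu.com/s/17JRmtnebLblVsMG05CIm-w password: l3q9)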

[root@host-10-0-251-159 hunspell]# docker run --rm -v $(pwd):/workdir tmaier/hunspell -u3 -i utf-8 -d de_DE_neu,en_US,de_CH -p words  test1.TXT
test1.TXT:7: Locate: rans | Try: rand
test1.TXT:15: Locate: wew | Try: woo
test1.TXT:23: Locate: Sevenn | Try: Severn
test1.TXT:27: Locate: cannt | Try: canny
test1.TXT:203: Locate: Hmm | Try: Mm
test1.TXT:211: Locate: Lele | Try: Lee
test1.TXT:215: Locate: Lele | Try: Lee
test1.TXT:243: Locate: Lele | Try: Lee
test1.TXT:247: Locate: Lele | Try: Lee
test1.TXT:284: Locate: Hmm | Try: Mm
test1.TXT:292: Locate: Hmm | Try: Mm
test1.TXT:468: Locate: ve | Try: be
test1.TXT:500: Locate: ve | Try: be
test1.TXT:516: Locate: ve | Try: be
test1.TXT:564: Locate: Hmm | Try: Mm
test1.TXT:644: Locate: ve | Try: be
test1.TXT:776: Locate: hasn | Try: has
test1.TXT:921: Locate: isn | Try: sin
test1.TXT:945: Locate: ve | Try: be
test1.TXT:953: Locate: ve | Try: be
test1.TXT:989: Locate: Hmm | Try: Mm
test1.TXT:1005: Locate: Hmm | Try: Mm
test1.TXT:1085: Locate: wasn | Try: wans
test1.TXT:1129: Locate: isn | Try: sin
test1.TXT:1145: Locate: isn | Try: sin
test1.TXT:1173: Locate: vomeronasal | Try: astronomer
test1.TXT:1213: Locate: didn | Try: did
test1.TXT:1289: Locate: ve | Try: be
test1.TXT:1329: Locate: weren | Try: were
test1.TXT:1349: Locate: wasn | Try: wans
test1.TXT:1425: Locate: wouldn | Try: would
test1.TXT:1425: Locate: weren | Try: were
test1.TXT:1470: Locate: ve | Try: be
test1.TXT:1495: Locate: ve | Try: be
test1.TXT:1803: Locate: cefepime | Try: timepiece
test1.TXT:1807: Locate: amikacin | Try: Kamikaze
test1.TXT:1819: Locate: Mmm | Try: Mm
test1.TXT:1839: Locate: kuai | Try: Kauai
test1.TXT:1895: Locate: ve | Try: be
test1.TXT:1903: Locate: isn | Try: sin
test1.TXT:2012: Locate: ve | Try: be
test1.TXT:2096: Locate: aren | Try: earn
test1.TXT:2116: Locate: shouldn | Try: should
test1.TXT:2168: Locate: whould | Try: would
test1.TXT:2232: Locate: Hmm | Try: Mm
test1.TXT:2800: Locate: Hmm | Try: Mm
test1.TXT:2820: Locate: Hmm | Try: Mm
test1.TXT:2930: Locate: ve | Try: be
test1.TXT:2993: Locate: Hmm | Try: Mm
test1.TXT:2997: Locate: Hmm | Try: Mm
test1.TXT:3076: Locate: Uhh | Try: Shh
test1.TXT:3331: Locate: Chh | Try: Ch
test1.TXT:3376: Locate: Hmm | Try: Mm
test1.TXT:3412: Locate: isn | Try: sin
test1.TXT:3436: Locate: ve | Try: be
test1.TXT:3448: Locate: exfoliator | Try: defoliator
test1.TXT:3518: Locate: didn | Try: did
test1.TXT:3531: Locate: didn | Try: did
test1.TXT:3652: Locate: Hmm | Try: Mm
test1.TXT:3696: Locate: ve | Try: be
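
A report this long is easier to review once it is summarized. The sketch below parses lines of the form `file:line: Locate: word | Try: suggestion`, as they appear in the output above, and counts how often each word was flagged. The regular expression and the names `Finding` and `parse_report` are assumptions based only on this run, and the sample is a handful of lines copied from the report.

```python
import re
from collections import Counter
from typing import List, NamedTuple

class Finding(NamedTuple):
    file: str
    line: int
    word: str
    suggestion: str

# Matches e.g. "test1.TXT:7: Locate: rans | Try: rand"
PATTERN = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+): Locate: (?P<word>\S+) \| Try: (?P<suggestion>.+)$"
)

SAMPLE = """\
test1.TXT:7: Locate: rans | Try: rand
test1.TXT:15: Locate: wew | Try: woo
test1.TXT:203: Locate: Hmm | Try: Mm
test1.TXT:284: Locate: Hmm | Try: Mm
test1.TXT:468: Locate: ve | Try: be
test1.TXT:1425: Locate: wouldn | Try: would
test1.TXT:1425: Locate: weren | Try: were
"""

def parse_report(text: str) -> List[Finding]:
    """Turn report lines into Finding tuples, skipping lines that do not match."""
    findings = []
    for raw in text.splitlines():
        m = PATTERN.match(raw.strip())
        if m:
            findings.append(
                Finding(m["file"], int(m["line"]), m["word"], m["suggestion"])
            )
    return findings

findings = parse_report(SAMPLE)
counts = Counter(f.word for f in findings)
print(len(findings), counts.most_common(1))  # 7 [('Hmm', 2)]
```

Grouping by word makes it clear that a few items such as `Hmm` and `ve` account for much of the report. Fragments like `ve`, `isn`, and `didn` look like halves of contractions, which may mean the source text uses a typographic apostrophe; the `--check-apostrophe` option from the help text may be relevant, though it was not tried here.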

 
