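I recently started looking into natural language processing, so I decided to study it properly by following the book Natural Language Processing with Python and writing up my notes as I go.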
Installation
macOS ships with Python 2.7, so all that is needed is to install nltk.
Running the default command, sudo pip install -U nltk,
fails with an error:
Collecting nltk
Downloading nltk-3.2.4.tar.gz (1.2MB)
100% |████████████████████████████████| 1.2MB 555kB/s
Collecting six (from nltk)
Downloading six-1.11.0-py2.py3-none-any.whl
Installing collected packages: six, nltk
Found existing installation: six 1.4.1
DEPRECATION: Uninstalling a distutils installed project (six) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
Uninstalling six-1.4.1:
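This happens because the system already ships its own six package, which pip is not able to modify. The workaround is to tell pip to ignore the installed six and install nltk directly: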
sudo pip install -U nltk --ignore-installed six
This time the output looks like this:
Collecting nltk
Downloading nltk-3.2.4.tar.gz (1.2MB)
100% |████████████████████████████████| 1.2MB 552kB/s
Collecting six
Downloading six-1.11.0-py2.py3-none-any.whl
Installing collected packages: six, nltk
Running setup.py install for nltk ... done
Test it:
xingoodeMacBook-Pro:~ xingoo$ python
Python 2.7.10 (default, Feb 7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
There is no error, so the installation succeeded.
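The same check can be done from a script instead of the interactive prompt. The following is only a minimal sketch: check_module is a hypothetical helper name made up for this note, not part of nltk, and it assumes the package exposes a __version__ attribute (it falls back to "unknown" when it does not).

```python
import importlib


def check_module(name):
    """Try to import `name`.

    Returns the module's version string, "unknown" if the module has no
    __version__ attribute, or None if the module cannot be imported.
    (Hypothetical helper, written for illustration only.)
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# Report on the two packages involved in the install above.
for package in ("nltk", "six"):
    print("{}: {}".format(package, check_module(package)))
```

A result of None for nltk would mean the install did not land in the interpreter you are running, which is worth ruling out when more than one Python is present on the machine.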
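Downloading the datasets
Now you can download the datasets by running nltk.download() in the Python interpreter.
A download dialog pops up. Click Download, and the corpora that nltk provides are ready to use.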
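If the dialog is inconvenient (for example on a machine without a display), nltk.download() also accepts a package identifier and then downloads without opening the window. Below is a rough sketch under that assumption; download_corpora is a hypothetical wrapper name of my own, and "book" is the identifier nltk uses for the collection that accompanies the book.

```python
def download_corpora(package_ids, download_dir=None):
    """Download each nltk data package by identifier, without the GUI.

    Returns a dict mapping each identifier to the value returned by
    nltk.download() (True on success, False on failure).
    (Hypothetical wrapper, written for illustration only.)
    """
    # Imported here so that defining the function does not require nltk.
    import nltk

    results = {}
    for package_id in package_ids:
        # download_dir=None lets nltk pick its default data directory.
        results[package_id] = nltk.download(package_id, download_dir=download_dir)
    return results
```

Calling download_corpora(["book"]) would then fetch the book collection in one go. It needs a network connection and can take a while, since the collection contains many corpora.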
References
Natural Language Processing with Python (《Python自然語言處理》)