set(stopwords.words('english'))
Reposted from: https://blog.csdn.net/miaoxiaowuseng/article/details/107343427
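What are stop words
The process of converting data into something a computer can understand is called preprocessing. One of the main forms of preprocessing is filtering out useless data. In natural language processing, useless words (data) are called stop words.
Stop words are common words (such as "the", "a", "an", "in") that a search engine has been programmed to ignore.
We don't want these words taking up space in our database or using valuable processing time. We can remove them easily by keeping a list of the words we want to drop. NLTK (Natural Language Toolkit) for Python ships stop word lists for 16 different languages. They live in the nltk_data directory; /home/pratima/nltk_data/corpora/stopwords is the directory path (remember to change the home directory name to your own).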
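To make the "keep a list of words to drop" idea concrete, here is a minimal sketch of loading such a list from disk. It assumes each stop word file is plain text with one word per line, which is how the files under corpora/stopwords appear to be laid out. The function name `load_stop_words` and the demo file are hypothetical and are not part of NLTK.

```python
from pathlib import Path
import tempfile


def load_stop_words(path):
    """Read a one-word-per-line file into a set, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


# Demo on a throwaway file so the sketch runs without nltk_data installed.
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a file such as corpora/stopwords/english.
    demo_file = Path(tmp) / "english"
    demo_file.write_text("the\na\nan\nin\n", encoding="utf-8")
    demo_stop_words = load_stop_words(demo_file)

print(sorted(demo_stop_words))
```

In practice you would normally let NLTK read these files for you through stopwords.words(), as the next section does; the sketch only shows that nothing more than a plain word list is involved.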
Removing stop words from a piece of text
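from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time setup: nltk.download('stopwords') plus the tokenizer models
# ('punkt', or 'punkt_tab' on newer NLTK releases).
example_sent = "This is a sample sentence, showing off the stop words filtration."

# A set gives fast membership tests; the entries are all lowercase.
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)

# The comparison is case-sensitive, so the capitalized 'This' is kept.
filtered_sentence = [w for w in word_tokens if w not in stop_words]

print(word_tokens)
print(filtered_sentence)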
Output:
['This', 'is', 'a', 'sample', 'sentence', ',', 'showing',
'off', 'the', 'stop', 'words', 'filtration', '.']
['This', 'sample', 'sentence', ',', 'showing', 'stop',
'words', 'filtration', '.']
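Note that 'This' survives in the output above because the stop word list is lowercase and the membership test is case-sensitive. A minimal sketch of a case-insensitive variant follows. To stay runnable without any NLTK downloads it uses a tiny hand-written stop word set and a simple regex tokenizer; both are illustrative assumptions, not the NLTK list or word_tokenize.

```python
import re

# Hypothetical mini stop word list; the real NLTK English list is much longer.
STOP_WORDS = {"this", "is", "a", "off", "the"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Split text into words and punctuation, then drop stop words regardless of case."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [t for t in tokens if t.lower() not in stop_words]


example_sent = "This is a sample sentence, showing off the stop words filtration."
filtered = remove_stop_words(example_sent)
print(filtered)
# ['sample', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']
```

With the NLTK version, the same effect should be achievable by changing the test in the list comprehension to compare w.lower() against stop_words.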