Elasticsearch Analysis: Analyzers
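- Analysis: text analysis is the process of converting full text into a sequence of terms (tokens); it is also called tokenization
- Analysis is performed by an analyzer
- You can use Elasticsearch's built-in analyzers or build a custom analyzer as needed
- Text is analyzed not only when data is written (indexed): when a query is matched, the query string must also be analyzed with the same analyzer, so that its terms line up with the indexed terms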
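To see why the query must go through the same analysis, here is a toy Python sketch. It is not how Elasticsearch works internally: `analyze` (a rough ASCII-only stand-in for the built-in `simple` analyzer), the in-memory `index` dict and `search` are all hypothetical. Documents are analyzed at index time, and a query only finds them if it is analyzed the same way.

```python
import re

def analyze(text: str) -> list[str]:
    # Hypothetical toy analyzer: split on runs of non-letters, then lowercase
    # (roughly the built-in "simple" analyzer, ASCII only).
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]

# Hypothetical in-memory inverted index: term -> ids of documents containing it.
docs = {1: "Quick brown-foxes leap", 2: "Lazy dogs sleep"}
index: dict[str, set[int]] = {}
for doc_id, body in docs.items():
    for term in analyze(body):  # index-time analysis
        index.setdefault(term, set()).add(doc_id)

def search(query: str, analyze_query: bool = True) -> set[int]:
    # Query-time analysis: reuse the same analyzer, or just split on spaces.
    terms = analyze(query) if analyze_query else query.split()
    hits: set[int] = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits

print(search("QUICK Foxes"))                       # {1}
print(search("QUICK Foxes", analyze_query=False))  # set()
```

Skipping query analysis leaves `QUICK` and `Foxes` unmatched, because only the lowercased terms were written to the index.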
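Analyzer components
An analyzer is the component dedicated to tokenization. It consists of three parts:
- Character Filters (process the raw text, e.g. stripping HTML)
- Tokenizer (splits the text into tokens according to rules)
- Token Filters (post-process the tokens: lowercasing, removing stopwords, adding synonyms)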
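The three stages run in a fixed order: character filters, then the tokenizer, then token filters. Below is a minimal Python sketch of that pipeline. The helper names (`strip_html`, `whitespace_tokenizer`, `lowercase`, `stop`, `make_analyzer`) and the tiny stop-word list are hypothetical; Elasticsearch's real `html_strip` character filter, tokenizers and token filters are far more thorough.

```python
import html
import re
from typing import Callable

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], list[str]]
TokenFilter = Callable[[list[str]], list[str]]

def strip_html(text: str) -> str:
    # Character filter: replace tags with spaces, then decode entities like &amp;.
    return html.unescape(re.sub(r"<[^>]+>", " ", text))

def whitespace_tokenizer(text: str) -> list[str]:
    # Tokenizer: split on whitespace.
    return text.split()

def lowercase(tokens: list[str]) -> list[str]:
    # Token filter: lowercase every token.
    return [t.lower() for t in tokens]

STOPWORDS = {"a", "is", "the"}  # tiny illustrative stop list

def stop(tokens: list[str]) -> list[str]:
    # Token filter: drop stop words.
    return [t for t in tokens if t not in STOPWORDS]

def make_analyzer(char_filters: list[CharFilter], tokenizer: Tokenizer,
                  token_filters: list[TokenFilter]) -> Callable[[str], list[str]]:
    def analyze(text: str) -> list[str]:
        for char_filter in char_filters:    # 1. character filters
            text = char_filter(text)
        tokens = tokenizer(text)            # 2. exactly one tokenizer
        for token_filter in token_filters:  # 3. token filters, in order
            tokens = token_filter(tokens)
        return tokens
    return analyze

analyzer = make_analyzer([strip_html], whitespace_tokenizer, [lowercase, stop])
print(analyzer("<p>The <b>Quick</b> fox is fast</p>"))  # ['quick', 'fox', 'fast']
```

Lowercasing runs before stop-word removal so that a capitalized `The` is still recognized as a stop word.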
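Tokenizing with analyzers
Built-in analyzers:
- Standard Analyzer – the default; splits on word boundaries, lowercases
- Simple Analyzer – splits on non-letter characters (non-letters are discarded), lowercases
- Stop Analyzer – lowercases and removes stop words (the, a, is)
- Whitespace Analyzer – splits on whitespace, does not lowercase
- Keyword Analyzer – does not tokenize; outputs the entire input as a single token
- Pattern Analyzer – splits with a regular expression, default \W+ (splits on non-word characters)
- Language – analyzers for more than 30 common languages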
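A rough Python approximation of several of the analyzers above, applied to the sample sentence used in the requests that follow. Assumptions: ASCII-only regular expressions stand in for Lucene's Unicode-aware tokenizers, `STOP` is only a small subset of the default English stop list, and the `pattern` version lowercases because the built-in pattern analyzer does so by default. `GET _analyze` remains the authoritative way to see real output.

```python
import re

# Assumption: a small subset of Lucene's default English stop words.
STOP = {"a", "an", "and", "in", "is", "of", "the", "to"}

def simple(text: str) -> list[str]:
    # Split on non-letters (they are discarded), lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

def stop(text: str) -> list[str]:
    # Same as simple, then remove stop words.
    return [t for t in simple(text) if t not in STOP]

def whitespace(text: str) -> list[str]:
    # Split on whitespace only; case and punctuation are kept.
    return text.split()

def keyword(text: str) -> list[str]:
    # No tokenization: the whole input is one token.
    return [text]

def pattern(text: str, regex: str = r"\W+") -> list[str]:
    # Split on a regex (default: runs of non-word characters), lowercase.
    return [t.lower() for t in re.split(regex, text) if t]

sample = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
for name, fn in [("simple", simple), ("stop", stop), ("whitespace", whitespace),
                 ("keyword", keyword), ("pattern", pattern)]:
    print(f"{name:<10} {fn(sample)}")
```

For example, whitespace keeps `brown-foxes` and `evening.` intact, while simple and stop drop the leading `2`.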
Viewing the output of different analyzers
standard analyzer (the default)
GET _analyze
{
"analyzer": "standard",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
=================== Result ===================
{
"tokens" : [
{
"token" : "2",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<NUM>",
"position" : 0
},
{
"token" : "running",
"start_offset" : 2,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
},
......
{
"token" : "evening",
"start_offset" : 62,
"end_offset" : 69,
"type" : "<ALPHANUM>",
"position" : 12
}
]
}
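Each token carries `start_offset` and `end_offset` (character offsets into the original text, end exclusive) and `position` (the token's index in the token stream). A small Python sketch that reproduces these fields for this sentence, assuming that runs of word characters (`\w+`) approximate the standard tokenizer for plain ASCII text like this; it is not a general reimplementation of the standard analyzer.

```python
import re

def toy_standard(text: str) -> list[dict]:
    # Assumption: for plain ASCII text like the sample, runs of word
    # characters approximate the standard tokenizer's output.
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        word = match.group()
        tokens.append({
            "token": word.lower(),          # standard lowercases tokens
            "start_offset": match.start(),  # first character of the token
            "end_offset": match.end(),      # one past the last character
            "type": "<NUM>" if word.isdigit() else "<ALPHANUM>",
            "position": position,           # index in the token stream
        })
    return tokens

sample = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
for tok in toy_standard(sample):
    print(tok)
```

The first token is `2` at offsets 0 to 1 and the last is `evening` at offsets 62 to 69, position 12, with offsets and positions matching the response above.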
stop analyzer
GET _analyze
{
"analyzer": "stop",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
=================== Result ===================
{
"tokens" : [
{
"token" : "running",
"start_offset" : 2,
"end_offset" : 9,
"type" : "word",
"position" : 0
},
{
"token" : "quick",
"start_offset" : 10,
"end_offset" : 15,
"type" : "word",
"position" : 1
},
......
{
"token" : "evening",
"start_offset" : 62,
"end_offset" : 69,
"type" : "word",
"position" : 11
}
]
}
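Note that `evening` is at position 11 here but 12 under the standard analyzer. The `2` is dropped by the letter-based tokenizer, so `running` starts at position 0, yet the removed stop words `in` and `the` still occupy positions 8 and 9, leaving a gap. A Python sketch of that behaviour, assuming an ASCII letter tokenizer and only `in`/`the` as stop words:

```python
import re

STOP = {"in", "the"}  # assumption: only the stop words that occur in the sample

def toy_stop(text: str) -> list[tuple[str, int, int, int]]:
    # Positions are counted for every token the letter tokenizer emits,
    # *before* stop words are removed, so a dropped word leaves a gap.
    result = []
    for position, match in enumerate(re.finditer(r"[A-Za-z]+", text)):
        term = match.group().lower()
        if term in STOP:
            continue
        result.append((term, match.start(), match.end(), position))
    return result

sample = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
for term, start, end, position in toy_stop(sample):
    print(term, start, end, position)
```

Positions 8 and 9 are skipped: `dogs` is at position 7 and `summer` at 10.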
More analyzer examples
#simple
GET _analyze
{
"analyzer": "simple",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#stop
GET _analyze
{
"analyzer": "stop",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#whitespace
GET _analyze
{
"analyzer": "whitespace",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#keyword
GET _analyze
{
"analyzer": "keyword",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#pattern
GET _analyze
{
"analyzer": "pattern",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#english
GET _analyze
{
"analyzer": "english",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
POST _analyze
{
"analyzer": "icu_analyzer",
"text": "他說的確實在理"
}
POST _analyze
{
"analyzer": "standard",
"text": "他說的確實在理"
}
POST _analyze
{
"analyzer": "icu_analyzer",
"text": "這個蘋果不大好吃"
}
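Note that icu_analyzer, like the ik analyzer, is not bundled with Elasticsearch 7.8.0.
To use icu_analyzer, install the plugin yourself by running ./bin/elasticsearch-plugin install analysis-icu and then restart Elasticsearch; ik is a separate third-party plugin and must be installed on its own.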
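The standard analyzer splits Chinese text into one token per character, which loses word boundaries. Dictionary-based segmenters such as those behind icu_analyzer and ik instead look words up in a lexicon. The Python sketch below shows forward maximum matching, a classic dictionary-based technique, and is not necessarily the algorithm ICU or ik use; `DICT` is a hypothetical word list. It also shows why the dictionary matters for an ambiguous sentence such as 他說的確實在理.

```python
def per_character(text: str) -> list[str]:
    # Roughly what the standard analyzer does with Chinese: one token per character.
    return [ch for ch in text if not ch.isspace()]

def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    # At each position take the longest dictionary word (up to max_len characters);
    # fall back to a single character when no dictionary word starts there.
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in dictionary:
                tokens.append(word)
                i += size
                break
    return tokens

# Hypothetical word list, for illustration only.
DICT = {"這個", "蘋果", "不大", "好吃", "說的", "確實", "在理", "的確", "實在"}

print(per_character("這個蘋果不大好吃"))
print(forward_max_match("這個蘋果不大好吃", DICT))          # ['這個', '蘋果', '不大', '好吃']
print(forward_max_match("他說的確實在理", DICT))            # ['他', '說的', '確實', '在理']
print(forward_max_match("他說的確實在理", DICT - {"說的"}))  # ['他', '說', '的確', '實在', '理']
```

Removing 說的 from the dictionary flips the result to the other reading (他/說/的確/實在/理), so segmentation quality depends heavily on the dictionary.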
More Chinese tokenizers
ik
Supports custom dictionaries and hot updates of the dictionary
gitee.com/mirrors/elasticsearch-analysis-ik?_from=gitee_search
THULAC
A Chinese word segmenter from the Natural Language Processing and Computational Social Science Lab at Tsinghua University
gitee.com/puremilk/THULAC-Python?_from=gitee_search
Further reading
- www.elastic.co/guide/en/elasticsearch/reference/current/analyzer-anatomy.html
From the ITPUB blog, link: http://blog.itpub.net/430/viewspace-2807063/. Please credit the source when reposting; otherwise legal liability will be pursued.