ElasticSearch (4): A Simple Test of the IK Chinese Word Segmentation Plugin for ElasticSearch

Posted by OldBoy~ on 2018-01-11

Let's start with a simple test.

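# _analyze calls the analysis (tokenization) API; analyzer=standard selects the standard analyzer;
# -d supplies the text to analyze (the sample sentence means "PHP is the best language in the world").
curl -XPOST "http://192.168.9.155:9200/_analyze?analyzer=standard&pretty" -d 'PHP是世界上最好的語言'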
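
The query-string form above matches the Elasticsearch release this post was tested against. As far as I know, releases from 6.x onward expect the analyzer and text in a JSON body sent with a Content-Type header instead. The following is a minimal Python sketch of that form using only the standard library. The helper names `build_analyze_request` and `analyze` are made up for illustration, and the host and port are simply the ones used in this post.

```python
import json
import urllib.request

# Host and port are copied from the article; change them for your own cluster.
ES_URL = "http://192.168.9.155:9200"


def build_analyze_request(analyzer: str, text: str,
                          base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /_analyze request with a JSON body."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze?pretty",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(analyzer: str, text: str, base_url: str = ES_URL) -> dict:
    """Send the request to a running cluster and return the parsed JSON response."""
    request = build_analyze_request(analyzer, text, base_url)
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `analyze("standard", "PHP是世界上最好的語言")` against a reachable cluster should return a dictionary with a `tokens` list shaped like the output shown in this post.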

Test result:

{
  "tokens" : [
    {
      "token" : "php",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "世",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "界",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "上",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "最",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    },
    {
      "token" : "好",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    },
    {
      "token" : "的",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "<IDEOGRAPHIC>",
      "position" : 7
    },
    {
      "token" : "語",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "<IDEOGRAPHIC>",
      "position" : 8
    },
    {
      "token" : "言",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "<IDEOGRAPHIC>",
      "position" : 9
    }
  ]
}
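
The result shows that the standard analyzer turns every Chinese character into its own token. A small sketch that tallies tokens by type makes this easy to check. The function name `count_token_types` is hypothetical, and the sample data is an abbreviated copy of the response above that keeps only the `token` and `type` fields.

```python
from collections import Counter


def count_token_types(response: dict) -> Counter:
    """Count how many tokens of each type an _analyze response contains."""
    return Counter(tok["type"] for tok in response["tokens"])


# Abbreviated copy of the standard-analyzer response shown above.
standard_response = {
    "tokens": [{"token": "php", "type": "<ALPHANUM>"}]
    + [{"token": ch, "type": "<IDEOGRAPHIC>"} for ch in "是世界上最好的語言"]
}

counts = count_token_types(standard_response)
print(counts["<ALPHANUM>"], counts["<IDEOGRAPHIC>"])  # prints: 1 9
```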

Next, let's use IK.

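IK ships with two analyzers:
ik_max_word: splits the text at the finest granularity, extracting as many words as possible and emitting the various possible combinations.
ik_smart: splits the text at the coarsest granularity; text that has already been assigned to one word is not reused by another word.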
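
A common way to combine the two analyzers is to index with ik_max_word and search with ik_smart. The sketch below builds such an index mapping as a plain Python dictionary. It assumes the typeless mapping format used by Elasticsearch 7 and later (older releases nest `properties` under a mapping type name), and the field name `content` and the helper `build_ik_mapping` are made up for illustration.

```python
import json


def build_ik_mapping(field: str = "content") -> dict:
    """Return an index body whose text field is indexed with ik_max_word
    and queried with ik_smart."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": "ik_max_word",      # fine-grained at index time
                    "search_analyzer": "ik_smart",  # coarse-grained at query time
                }
            }
        }
    }


# Print the JSON body that would be sent when creating the index.
print(json.dumps(build_ik_mapping(), indent=2))
```

Indexing fine-grained tokens while searching with coarse ones is a trade-off: more terms are stored in the index, but a query is not broken into every possible sub-word.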

# Using the ik_smart analyzer
curl -XPOST "http://192.168.9.155:9200/_analyze?analyzer=ik_smart&pretty" -d 'PHP是世界上最好的語言'

{
  "tokens" : [
    {
      "token" : "php",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "ENGLISH",
      "position" : 0
    },
    {
      "token" : "世界上",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "最好",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "語言",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}

# Using the ik_max_word analyzer
curl -XPOST "http://192.168.9.155:9200/_analyze?analyzer=ik_max_word&pretty" -d 'PHP是世界上最好的語言'

{
  "tokens" : [
    {
      "token" : "php",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "ENGLISH",
      "position" : 0
    },
    {
      "token" : "世界上",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "世界",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "上",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "最好",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "語言",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 5
    }
  ]
}

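The difference is obvious: ik_smart keeps 世界上 as a single word, while ik_max_word also emits its parts 世界 and 上. Both IK analyzers also leave out the single characters 是 and 的, which the standard analyzer emitted as separate tokens.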
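
To check the difference programmatically rather than by eye, the token lists can be compared directly. This is a minimal sketch: the two lists are copied from the ik_smart and ik_max_word responses above, and `extra_tokens` is a hypothetical helper name.

```python
def extra_tokens(fine: list, coarse: list) -> list:
    """Return the tokens in `fine` that do not appear in `coarse`, in order."""
    coarse_set = set(coarse)
    return [tok for tok in fine if tok not in coarse_set]


# Token texts copied from the two IK responses in this post.
ik_smart_tokens = ["php", "世界上", "最好", "語言"]
ik_max_word_tokens = ["php", "世界上", "世界", "上", "最好", "語言"]

print(extra_tokens(ik_max_word_tokens, ik_smart_tokens))  # ['世界', '上']
```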
