Configuring and Using Elasticsearch for Full-Text Search

Posted by Jourdon on 2018-04-19

Original URL: https://learnku.com/articles/10126/the-configuration-and-use-of-elasticsearch-for-full-text-search?order_by=created_at&

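Our company project recently needed a full-text search engine. Sphinx, which we had used before, no longer served us well, and it had no suitable Chinese tokenizer, so I decided to try something else. The legacy project runs on the ThinkPHP 3.1 framework. The framework is a bit dated, but new ideas can still be applied to it. This article is only a brief demonstration of getting started with Elasticsearch; a real project would need more work.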

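Installing Elasticsearch

This article's environment is Laradock, so the Elasticsearch image can be used directly. Installing Java and Elasticsearch itself is omitted here; there are plenty of tutorials online, and I will add one later. The rest of the article assumes both are already installed.
Open http://localhost:9200/ in a browser, or run the following in a terminal: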

curl 'http://localhost:9200/?pretty'

You should see a response like this:

{
  "name": "g2ODObY",
  "cluster_name": "laradock-cluster",
  "cluster_uuid": "w8Hhov2bQDi_Wo2DEx044Q",
  "version": {
    "number": "6.2.3",
    "build_hash": "c59ff00",
    "build_date": "2018-03-13T10:06:29.741383Z",
    "build_snapshot": false,
    "lucene_version": "7.2.1",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}

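If the response displays correctly, the installation succeeded.
One thing to note: check the Elasticsearch configuration with vi config/elasticsearch.yml

network.host: 0.0.0.0  # 0.0.0.0 accepts connections on every interface; in a real project, bind to the machine's own IP for security.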

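Elasticsearch Chinese plugin

This article uses the analysis-ik Chinese analysis plugin, hosted at https://github.com/medcl/elasticsearch-analysis-ik. Pick the plugin version that matches your Elasticsearch version; this project uses 6.2.3, the latest at the time of writing.
Change into the elasticsearch directory: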

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.2.3/elasticsearch-analysis-ik-6.2.3.zip

Check that the plugin installed (mind your Elasticsearch version; the command differs between versions):

./bin/elasticsearch-plugin list

Output:

analysis-ik
ingest-geoip
ingest-user-agent
...

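If analysis-ik appears in the list, the plugin was installed successfully. You can also look in the plugins directory to confirm that the plugin is there:

$ cd plugins && ls -l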

drwxr-xr-x  2 root          root 4096 Apr 17 03:08 analysis-ik
drwxrwxr-x  2 elasticsearch root 4096 Mar 13 11:35 ingest-geoip
drwxrwxr-x  2 elasticsearch root 4096 Mar 13 11:35 ingest-user-agent
drwxrwxr-x 11 elasticsearch root 4096 Mar 13 11:35 x-pack

Working with Elasticsearch indices

ThinkPHP is not as convenient as Laravel, but a trait still does the job.
Create a trait:

<?php
use Elasticsearch\ClientBuilder;
trait Elastic
{
    private $client;
    public function __construct()
    {
        $hosts=[
            env('ELASTICSEARCH_URL','localhost:9200')
        ];
        $this->client = ClientBuilder::create()
            ->setHosts($hosts)  // Docker needs an explicit host address here; configure it in the env file
            ->build();
    }
}

Laravel's env helper is actually implemented with vlucas/phpdotenv, so I brought that package into ThinkPHP as well.
The first step is to instantiate a client.
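
For readers whose framework has no env() helper at all, the lookup-with-default behaviour can be approximated with plain getenv(). The following is only a minimal sketch under that assumption; envOrDefault is a hypothetical name, not part of ThinkPHP, Laravel, or vlucas/phpdotenv.

```php
<?php
// Hypothetical helper (not from the article): read a setting from the
// process environment and fall back to a default when it is unset or empty.
function envOrDefault(string $key, string $default): string
{
    $value = getenv($key);
    return ($value === false || $value === '') ? $default : $value;
}

// Build the host list the same way the trait's constructor does.
$hosts = [envOrDefault('ELASTICSEARCH_URL', 'localhost:9200')];
```

The resulting $hosts array can be passed to setHosts() exactly as in the trait.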

Creating an index

        $params = [
            'index' => $index, // index name (roughly a MySQL database)
            'body' => [
                'settings' => [
                    'number_of_shards' => 1, // number of primary shards in the index
                    'number_of_replicas' => 0 // number of replica shards per primary shard
                ],
                'mappings' => [
                    $type => [  // type name (roughly a MySQL table)
                        '_all' => [   // whether to enable searching across all fields
                            'enabled' => false // boolean false, not the string 'false'
                        ],
                        '_source' => [ // store the original document
                            'enabled' => true
                        ],
                        'properties' => [   // field definitions (roughly MySQL column types)
                            'id' => [
                                'type' => 'integer', // types: integer, float, double, boolean, date, text, keyword (string was replaced by text/keyword)
                                //'index'=> 'not_analyzed', // pre-5.x syntax for exact-value indexing: analyzed / not_analyzed
                            ],
                            'title' => [
                                'type' => 'text', // full-text field; use keyword for exact matching, but a keyword field is matched as a whole and cannot be searched fuzzily
                                'analyzer' => 'ik_max_word',
                                'search_analyzer' => 'ik_max_word',
                            ],
                            'body' => [
                                'type' => 'text',
                                'analyzer' => 'ik_max_word',
                                'search_analyzer' => 'ik_max_word',
                            ]
                        ]
                    ]
                ]
            ]
        ];
        return $this->client->indices()->create($params);

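Pay attention to the analyzer here. The IK plugin currently provides two: ik_max_word and ik_smart.

ik_max_word: splits the text at the finest granularity and exhausts every possible combination. For example, “中華人民共和國國歌” is split into “中華人民共和國,中華人民,中華,華人,人民共和國,人民,人,民,共和國,共和,和,國國,國歌”.
ik_smart: splits the text at the coarsest granularity. For example, “中華人民共和國國歌” is split into “中華人民共和國,國歌”.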
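
To see the difference between the two analyzers on your own text, the same string can be sent to the _analyze API once per analyzer. This is a rough sketch: buildAnalyzeParams is a hypothetical helper, and the commented call assumes a client built as in the trait above, with the IK plugin installed.

```php
<?php
// Hypothetical helper: build the parameters for the _analyze API so the
// tokens produced by each IK analyzer can be compared for the same text.
function buildAnalyzeParams(string $analyzer, string $text): array
{
    return [
        'body' => [
            'analyzer' => $analyzer,
            'text'     => $text,
        ],
    ];
}

$text   = '中華人民共和國國歌';
$fine   = buildAnalyzeParams('ik_max_word', $text);
$coarse = buildAnalyzeParams('ik_smart', $text);

// With a live cluster, each request would be sent like this:
// $tokens = $this->client->indices()->analyze($fine)['tokens'];
```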

A response like the following means the index was created:

array:3 [▼
  "acknowledged" => true
  "shards_acknowledged" => true
  "index" => "my_index"
]

These results are printed with dd(); the results in the rest of the article are shown the same way.
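
When several indices share this shape, the index-creation parameters can be generated from a list of text field names instead of being written out by hand. The sketch below assumes every listed field is a text field using one analyzer; buildIndexParams is a hypothetical name, and the settings simply mirror the values used in this article.

```php
<?php
// Hypothetical helper: build create-index parameters for an integer id
// plus any number of analyzed text fields.
function buildIndexParams(
    string $index,
    string $type,
    array $textFields,
    string $analyzer = 'ik_max_word'
): array {
    $properties = ['id' => ['type' => 'integer']];
    foreach ($textFields as $field) {
        $properties[$field] = [
            'type'            => 'text',
            'analyzer'        => $analyzer,
            'search_analyzer' => $analyzer,
        ];
    }

    return [
        'index' => $index,
        'body'  => [
            'settings' => ['number_of_shards' => 1, 'number_of_replicas' => 0],
            'mappings' => [
                $type => [
                    '_all'       => ['enabled' => false],
                    '_source'    => ['enabled' => true],
                    'properties' => $properties,
                ],
            ],
        ],
    ];
}

$indexParams = buildIndexParams('my_index', 'my_type', ['title', 'body']);
// $this->client->indices()->create($indexParams);
```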

Deleting an index

$params = [
    'index' => 'my_index',
];
return $this->client->indices()->delete($params); // delete the index

Response:

array:1 [▼
  "acknowledged" => true
]

Viewing index settings

// View the settings of one index
$params = ['index' => 'my_index'];
$response = $client->indices()->getSettings($params);

// View the settings of several indices
$params = [
    'index' => [ 'my_index', 'my_index2' ]
];
$response = $client->indices()->getSettings($params);

Response:

array:1 [▼
  "my_index" => array:1 [▼
    "settings" => array:1 [▼
      "index" => array:6 [▼
        "creation_date" => "1524037463950"
        "number_of_shards" => "1"
        "number_of_replicas" => "0"
        "uuid" => "okYiWK0WRiqebMAHUCsvzA"
        "version" => array:1 [▼
          "created" => "6020399"
        ]
        "provided_name" => "my_index"
      ]
    ]
  ]
]

Viewing `mapping` information

// Get the mapping of every index and type
$response = $client->indices()->getMapping();

// Get the mapping of every type in the my_index index
$params = ['index' => 'my_index'];
$response = $client->indices()->getMapping($params);

// Get the mapping of every type named my_type, regardless of index
$params = ['type' => 'my_type' ];
$response = $client->indices()->getMapping($params);

// Get the mapping of the my_type type in the my_index index
$params = [
    'index' => 'my_index',
    'type' => 'my_type'
];
$response = $client->indices()->getMapping($params);

// Get the mapping of several indices
$params = [
    'index' => [ 'my_index', 'my_index2' ]
];
$response = $client->indices()->getMapping($params);

Response:

array:1 [▼
  "my_index" => array:1 [▼
    "mappings" => array:1 [▼
      "my_type" => array:2 [▼
        "_all" => array:1 [▼
          "enabled" => false
        ]
        "properties" => array:3 [▼
          "body" => array:2 [▼
            "type" => "text"
            "analyzer" => "ik_max_word"
          ]
          "id" => array:1 [▼
            "type" => "integer"
          ]
          "title" => array:2 [▼
            "type" => "text"
            "analyzer" => "ik_max_word"
          ]
        ]
      ]
    ]
  ]
]
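
The nested mapping response is awkward to read at a glance, so it can help to flatten it into a field-to-type map. The following is a sketch only; listFieldTypes is a hypothetical helper, and the sample array is the response shown above, retyped by hand.

```php
<?php
// Hypothetical helper: reduce a getMapping() response to field => type.
function listFieldTypes(array $mappingResponse, string $index, string $type): array
{
    $properties = $mappingResponse[$index]['mappings'][$type]['properties'] ?? [];
    $types = [];
    foreach ($properties as $field => $definition) {
        // Fields without an explicit type are nested objects.
        $types[$field] = $definition['type'] ?? 'object';
    }
    return $types;
}

// The response from this article, retyped as a PHP array.
$mappingResponse = [
    'my_index' => ['mappings' => ['my_type' => [
        '_all'       => ['enabled' => false],
        'properties' => [
            'body'  => ['type' => 'text', 'analyzer' => 'ik_max_word'],
            'id'    => ['type' => 'integer'],
            'title' => ['type' => 'text', 'analyzer' => 'ik_max_word'],
        ],
    ]]],
];

$fieldTypes = listFieldTypes($mappingResponse, 'my_index', 'my_type');
```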

Elasticsearch CRUD

Adding data

1. Adding a single document

$data = [
    'title' => '我愛北京天安門',
    'body'  => '天安門上太陽升'
];
$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    // 'id' => 'my_id', // if omitted, a unique id is generated automatically
    'body' => $data
];
return $this->client->index($params);

Response:

array:8 [▼
  "_index" => "my_index"
  "_type" => "my_type"
  "_id" => "WKXd12IBwuLBOSSKe5k_" // unique id of this document
  "_version" => 1
  "result" => "created"
  "_shards" => array:3 [▼
    "total" => 2
    "successful" => 1
    "failed" => 0
  ]
  "_seq_no" => 0
  "_primary_term" => 1
]

2. Adding multiple documents in bulk

$dataList =[
            [
                'id'    =>  '10001',
                'title' => '北京',
                'body' => '我們是首都',

            ],[
                'id'    =>  '10002',
                'title' => '上海',
                'body' => '啊啦是上海人',
            ],[
                'id'    =>  '10003',
                'title' => '廣州',
                'body' => '我們有小蠻腰',

            ],[
                'id'    =>  '10004',
                'title' => '深圳',
                'body' => '我們啥也沒有，來了就是深圳人。',
            ],
        ];

$params = ['body' => []]; // initialize before appending action/source pairs
foreach($dataList as $value){
    $params['body'][] = [
        'index' => [
            '_index' => 'my_index',
            '_type' => 'my_type',
            '_id'  =>$value['id']
        ]
    ];
    $params['body'][] = [
        'id' => $value['id'],
        'title' => $value['title'],
        'body' => $value['body'],
    ];
}
return $this->client->bulk($params);

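Note that when adding multiple documents in bulk, you cannot simply pass the data array in as-is. It has to be transformed first: each document becomes an action entry followed by its source entry, and the resulting array is then sent with the bulk method.
Response: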

array:3 [▼
  "took" => 27
  "errors" => false
  "items" => array:4 [▼
    0 => array:1 [▼
      "index" => array:9 [▼
        "_index" => "my_index"
        "_type" => "my_type"
        "_id" => "10001"
        "_version" => 1
        "result" => "created"
        "_shards" => array:3 [▼
          "total" => 2
          "successful" => 1
          "failed" => 0
        ]
        "_seq_no" => 1
        "_primary_term" => 1
        "status" => 201
      ]
    ]
    1 => array:1 [▼
      "index" => array:9 [▼
        "_index" => "my_index"
        "_type" => "my_type"
        "_id" => "10002"
        "_version" => 1
        "result" => "created"
        "_shards" => array:3 [▼
          "total" => 2
          "successful" => 1
          "failed" => 0
        ]
        "_seq_no" => 2
        "_primary_term" => 1
        "status" => 201
      ]
    ]
        ...

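This example uses the id that comes with each $dataList entry. In a real project, use each record's own id as the document's unique id, so that documents can be looked up by id.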
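
The transformation described above can be wrapped in a small function, with larger data sets sent in batches. This is a sketch under assumptions: buildBulkIndexBody and findBulkFailures are hypothetical helpers, the batch size of 500 is an arbitrary illustration, and every document is assumed to carry an id key.

```php
<?php
// Hypothetical helper: turn documents into the alternating
// action / source entries that the bulk API expects.
function buildBulkIndexBody(array $docs, string $index, string $type): array
{
    $body = [];
    foreach ($docs as $doc) {
        $body[] = ['index' => ['_index' => $index, '_type' => $type, '_id' => $doc['id']]];
        $body[] = $doc;
    }
    return $body;
}

// Hypothetical helper: collect the items of a bulk response that report an error.
function findBulkFailures(array $response): array
{
    $failures = [];
    foreach ($response['items'] ?? [] as $item) {
        $result = reset($item); // the single action entry, e.g. $item['index']
        if (isset($result['error'])) {
            $failures[] = $result;
        }
    }
    return $failures;
}

$docs = [
    ['id' => '10001', 'title' => '北京', 'body' => '我們是首都'],
    ['id' => '10002', 'title' => '上海', 'body' => '啊啦是上海人'],
];

// Send the documents in batches so a single request does not grow too large.
$requests = [];
foreach (array_chunk($docs, 500) as $chunk) {
    $requests[] = ['body' => buildBulkIndexBody($chunk, 'my_index', 'my_type')];
    // $response = $this->client->bulk(end($requests));
    // $failed   = findBulkFailures($response);
}
```

Checking the per-item results matters because the top-level "errors" flag only says whether any item failed, not which one.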

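Deleting data

With this delete call, documents are removed one at a time, and the document ID must be specified.

$param = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id' // the document ID
];
return $this->client->delete($param);

Response: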

array:1 [▼
  "acknowledged" => true
]
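
When many IDs need to be removed, the bulk method shown earlier also accepts delete actions, which saves one request per document. The sketch below only builds the request body; buildBulkDeleteBody is a hypothetical helper and the IDs are the ones indexed earlier in this article.

```php
<?php
// Hypothetical helper: build bulk "delete" actions for a list of IDs.
function buildBulkDeleteBody(array $ids, string $index, string $type): array
{
    $body = [];
    foreach ($ids as $id) {
        // Unlike "index", a delete action is not followed by a source entry.
        $body[] = ['delete' => ['_index' => $index, '_type' => $type, '_id' => $id]];
    }
    return $body;
}

$deleteParams = ['body' => buildBulkDeleteBody(['10001', '10002'], 'my_index', 'my_type')];
// $this->client->bulk($deleteParams);
```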

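Retrieving data

Retrieving a document requires its ID.

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'WKXd12IBwuLBOSSKe5k_' // an auto-generated ID; in a real project, prefer looking up by an ID you assigned yourself
];
return $this->client->get($params);

Response: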

array:6 [▼
  "_index" => "my_index"
  "_type" => "my_type"
  "_id" => "WKXd12IBwuLBOSSKe5k_"
  "_version" => 1
  "found" => true
  "_source" => array:2 [▼
    "title" => "我愛北京天安門"
    "body" => "天安門上太陽升"
  ]
]

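Updating data

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'WKXd12IBwuLBOSSKe5k_', // the id to update; this is the one generated automatically when the document was added
    'body' => [
        'doc' => [  // the doc key is required; it marks a partial-document update
            'age' => 150
        ]
    ]
];
return $this->client->update($params);

Response: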

array:8 [▼
  "_index" => "my_index"
  "_type" => "my_type"
  "_id" => "WKXd12IBwuLBOSSKe5k_"
  "_version" => 2
  "result" => "updated"
  "_shards" => array:3 [▼
    "total" => 2
    "successful" => 1
    "failed" => 0
  ]
  "_seq_no" => 4
  "_primary_term" => 1
]

Retrieve the document again:

array:6 [▼
  "_index" => "my_index"
  "_type" => "my_type"
  "_id" => "WKXd12IBwuLBOSSKe5k_"
  "_version" => 2
  "found" => true
  "_source" => array:3 [▼
    "title" => "我愛北京天安門"
    "body" => "天安門上太陽升"
    "age" => 150
  ]
]

The returned document now contains an age field, so the update succeeded.
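
A partial update like this fails when the document does not exist yet. If "update or create" behaviour is wanted, the update body accepts a doc_as_upsert flag. The following is a sketch only; buildUpdateParams is a hypothetical helper built around that flag.

```php
<?php
// Hypothetical helper: build parameters for a partial update, optionally
// creating the document from the same fields when it is missing.
function buildUpdateParams(
    string $index,
    string $type,
    string $id,
    array $fields,
    bool $upsert = false
): array {
    $body = ['doc' => $fields];
    if ($upsert) {
        $body['doc_as_upsert'] = true; // insert $fields as a new document if the id is not found
    }

    return ['index' => $index, 'type' => $type, 'id' => $id, 'body' => $body];
}

$updateParams = buildUpdateParams('my_index', 'my_type', '10005', ['age' => 150], true);
// $this->client->update($updateParams);
```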

Searching data

Let's start with a simple one.

$params = [
    'index' => 'my_index', // an array such as ['my_index1', 'my_index2'] searches across several indices
    'type' => 'my_type',    // likewise ['my_type1', 'my_type2']
    'body' => [
        'query'=>[
            'match'=>[
                "title"    =>  '北京',
            ],
        ],
    ]
];
return  $this->client->search($params);

Response:

array:4 [▼
  "took" => 188
  "timed_out" => false
  "_shards" => array:4 [▼
    "total" => 5
    "successful" => 5
    "skipped" => 0
    "failed" => 0
  ]
  "hits" => array:3 [▼
    "total" => 2
    "max_score" => 1.6451461
    "hits" => array:2 [▼
      0 => array:5 [▼
        "_index" => "my_index"
        "_type" => "my_type"
        "_id" => "10001"
        "_score" => 1.6451461
        "_source" => array:3 [▼
          "id" => "10001"
          "title" => "北京"
          "body" => "我們是首都"
        ]
      ]
      1 => array:5 [▼
        "_index" => "my_index"
        "_type" => "my_type"
        "_id" => "WKXd12IBwuLBOSSKe5k_"
        "_score" => 0.94175816
        "_source" => array:3 [▼
          "title" => "我愛北京天安門"
          "body" => "天安門上太陽升"
          "age" => 150
        ]
      ]
    ]
  ]
]

As you can see, every document whose title contains 北京 was returned.
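
In application code, usually only the ID, score, and source of each hit are needed. A small function can strip away the rest of the response; extractHits is a hypothetical name, and the sample array is a shortened copy of the response above.

```php
<?php
// Hypothetical helper: keep only id, score and source for each search hit.
function extractHits(array $response): array
{
    $rows = [];
    foreach ($response['hits']['hits'] ?? [] as $hit) {
        $rows[] = [
            'id'     => $hit['_id'],
            'score'  => $hit['_score'],
            'source' => $hit['_source'],
        ];
    }
    return $rows;
}

// Shortened copy of the search response shown in this article.
$searchResponse = [
    'hits' => [
        'total' => 2,
        'hits'  => [
            ['_id' => '10001', '_score' => 1.6451461,
             '_source' => ['id' => '10001', 'title' => '北京', 'body' => '我們是首都']],
            ['_id' => 'WKXd12IBwuLBOSSKe5k_', '_score' => 0.94175816,
             '_source' => ['title' => '我愛北京天安門', 'body' => '天安門上太陽升', 'age' => 150]],
        ],
    ],
];

$rows = extractHits($searchResponse);
```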

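Search is the most important and most flexible part of Elasticsearch. Almost any kind of query you can think of is already built in, for example:
term, match, multi_match, range, prefix, wildcard, regexp, fuzzy, match_phrase, match_phrase_prefix, exists, and so on. For details on how each one is used, refer to the Elasticsearch query DSL documentation.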
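
As one illustration of the query types listed above, multi_match runs the same keywords against several fields at once. The sketch below builds the search parameters for the title and body fields used in this article; buildMultiMatchParams is a hypothetical helper.

```php
<?php
// Hypothetical helper: search the same keywords across several fields.
function buildMultiMatchParams(string $index, string $type, string $keywords, array $fields): array
{
    return [
        'index' => $index,
        'type'  => $type,
        'body'  => [
            'query' => [
                'multi_match' => [
                    'query'  => $keywords,
                    'fields' => $fields,
                ],
            ],
        ],
    ];
}

$multiMatchParams = buildMultiMatchParams('my_index', 'my_type', '北京', ['title', 'body']);
// $this->client->search($multiMatchParams);
```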

Original article: http://www.qiehe.net/posts/4/the-use-and-c...

This work is licensed under a CC license. Reposts must credit the author and link to this article.

Good Good Study , Day Day Up!!
