Dynamically increasing the number of shards of an index in Elasticsearch 7.x

Posted by PassZhang on 2020-09-10

Original article: https://www.cnblogs.com/passzhang/p/13645000.html

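Background

In older versions of ES (2.3, for example), the number of shards of an index could not be changed once it was set; the only way to change it was to rebuild the index and reload the data.

Starting with ES 6.1, the number of shards can be increased online with the split API (note: writes to the index still have to be blocked while the operation runs).

Starting with ES 7.0, a split no longer requires index.number_of_routing_shards to be set explicitly.

See the official documentation for details: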

https://www.elastic.co/guide/en/elasticsearch/reference/7.5/indices-split-index.html

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/indices-split-index.html

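How a split works:

1. A new target index is created with the same definition as the source index, but with more primary shards.

2. Segments are hard-linked from the source index into the target index. (If the file system does not support hard links, all segments are copied into the new index, which takes much longer.)

3. Once the low-level files are in place, all documents are hashed again so that documents belonging to a different shard can be deleted.

4. The target index is recovered as if it were a closed index that had just been reopened.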
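Step 2 is only cheap when hard links are available. The following is a minimal sketch of how one might probe a directory for hard-link support before splitting a large index. The helper name supports_hardlinks is hypothetical (it is not an Elasticsearch tool), and the scratch directory stands in for a directory on the same file system as the node's data path.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if a hard link can be created inside the
# directory given as $1, 1 if linking fails, 2 if the probe file cannot be created.
supports_hardlinks() {
  dir=$1
  src="$dir/.hardlink_probe_src.$$"
  dst="$dir/.hardlink_probe_dst.$$"
  touch "$src" 2>/dev/null || return 2
  if ln "$src" "$dst" 2>/dev/null; then
    result=0
  else
    result=1
  fi
  rm -f "$src" "$dst"
  return "$result"
}

# Probe a scratch directory (assumption: replace it with a directory that
# lives on the same file system as the Elasticsearch data path).
probe_dir=$(mktemp -d)
if supports_hardlinks "$probe_dir"; then
  echo "hard links supported"
else
  echo "hard links not supported: a split would have to copy segments"
fi
rmdir "$probe_dir"
```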

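Why doesn't ES support incremental resharding?

Going from N shards to N + 1 shards, known as incremental resharding, is indeed a feature that many key-value stores support. Simply adding a new shard and pushing only new data to it is not an option: it would likely become an indexing bottleneck, and working out which shard a document belongs to from its _id, which is required for get, delete, and update requests, would become quite complex. This means the existing data has to be rebalanced with a different hashing scheme.

The most common way for key-value stores to do this efficiently is consistent hashing, which only needs to relocate 1/N of the keys when the number of shards grows from N to N + 1. However, Elasticsearch's unit of storage, the shard, is a Lucene index. Because of its search-oriented data structures, taking a significant portion of a Lucene index, even just 5% of the documents, deleting them, and indexing them on another shard usually costs far more than it would in a key-value store. As described in the section above, this cost stays reasonable when the shard count grows by a multiplicative factor: Elasticsearch can then perform the split locally, which in turn lets it split at the index level instead of reindexing the documents that need to move, and lets it use hard links for efficient file copying.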
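To see why a multiplicative factor keeps the split local, the routing formula from the Elasticsearch _routing documentation can be tried with small numbers: shard_num = (hash % num_routing_shards) / routing_factor, where routing_factor = num_routing_shards / num_primary_shards. The sketch below is only an illustration: the integers 0 to 11 stand in for real murmur3 hashes, num_routing_shards = 8 is an assumed toy value, and shard_for is a hypothetical helper name.

```shell
#!/bin/sh
# shard_for HASH NUM_ROUTING_SHARDS NUM_PRIMARY_SHARDS
# Prints the shard number chosen by the documented routing formula.
shard_for() {
  routing_factor=$(( $2 / $3 ))
  echo $(( ($1 % $2) / routing_factor ))
}

routing_shards=8   # assumption: toy value chosen for readability
for h in 0 1 2 3 4 5 6 7 8 9 10 11; do
  src=$(shard_for "$h" "$routing_shards" 2)   # index with 2 primaries
  dst=$(shard_for "$h" "$routing_shards" 8)   # same index split to 8 primaries
  echo "hash=$h source_shard=$src target_shard=$dst"
done
```

Every hash that lands on source shard 0 ends up on target shards 0 to 3, and source shard 1 feeds only target shards 4 to 7, so each target shard can be built from the files of a single source shard.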

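For append-only data, more flexibility can be had by creating a new index and pushing new data to it, while adding an alias that covers both the old and the new index for read operations. Assuming the old and new indices have M and N shards respectively, this adds no overhead compared with searching a single index that has M + N shards.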
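The alias approach for append-only data could look roughly like the request body below. This is a sketch under assumptions: the index names logs-old and logs-new, the alias logs, and the helper alias_cover_body are all hypothetical, and marking the new index with is_write_index is one possible way to let writes go through the same alias. The helper only prints the body; the curl call that would send it is left as a comment.

```shell
#!/bin/sh
# alias_cover_body OLD_INDEX NEW_INDEX ALIAS
# Prints an _aliases request body that puts both indices behind one alias
# and routes writes through the alias to the new index.
alias_cover_body() {
  printf '{"actions":[{"add":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s","is_write_index":true}}]}\n' \
    "$1" "$3" "$2" "$3"
}

alias_cover_body logs-old logs-new logs

# To send it to a running cluster (not executed here):
# curl -s -X POST "http://localhost:9200/_aliases?pretty" \
#   -H 'Content-Type: application/json' \
#   -d "$(alias_cover_body logs-old logs-new logs)"
```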

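Prerequisites for splitting an index:

1. The target index must not exist.

2. The source index must have fewer primary shards than the target index.

3. The number of primary shards in the target index must be a multiple of the number of primary shards in the source index.

4. The node handling the split must have enough free disk space to hold a second copy of the existing index.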
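Rules 2 and 3 are simple arithmetic and can be checked before sending a request. A minimal sketch, assuming a hypothetical helper named can_split; it covers only these two rules, and Elasticsearch itself may reject a target count for other reasons (for example the index's routing shards).

```shell
#!/bin/sh
# can_split SOURCE_SHARDS TARGET_SHARDS
# Succeeds only if the target has more primaries than the source (rule 2)
# and is an exact multiple of the source count (rule 3).
can_split() {
  src=$1
  dst=$2
  [ "$src" -ge 1 ] && [ "$dst" -gt "$src" ] && [ $(( dst % src )) -eq 0 ]
}

for target in 3 4 8; do
  if can_split 2 "$target"; then
    echo "2 -> $target: passes rules 2 and 3"
  else
    echo "2 -> $target: rejected"
  fi
done
```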

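Procedure

The hands-on experiment follows.

Tip: test machines were limited, so every index here uses 0 replicas; in production, use at least one replica (replica >= 1).

Create an index with 2 primary shards and no replicas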

curl -s -X PUT "http://localhost:9200/twitter?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 0
  },
  "aliases": {
    "my_search_indices": {}
  }
}'

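# index.number_of_shards: number of primary shards
# index.number_of_replicas: number of replica shards; one replica amounts to one extra full copy of the index
# aliases: sets the index alias "my_search_indices"

# Write a few test documents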

curl -s -X PUT "http://localhost:9200/my_search_indices/_doc/11?pretty" -H 'Content-Type: application/json' -d '{
  "id": 11,
  "name":"lee",
  "age":"23"
}'
curl -s -X PUT "http://localhost:9200/my_search_indices/_doc/22?pretty" -H 'Content-Type: application/json' -d '{
  "id": 22,
  "name":"amd",
  "age":"22"
}'

# Query the data

curl -s -XGET "http://localhost:9200/my_search_indices/_search" | jq .

Block writes on the index so that the split below can run

curl -s -X PUT "http://localhost:9200/twitter/_settings?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.blocks.write": true
  }
}'

# index.blocks.write: write block; the index can be read but not written

# Test a write to confirm that the write block is in effect

curl -s -X PUT "http://localhost:9200/twitter/_doc/33?pretty" -H 'Content-Type: application/json' -d '{
  "id": 33,
  "name":"amd",
  "age":"33"
}'

# The test write fails

# Remove the alias from the twitter index

curl -s -X POST "http://localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{
    "actions" : [
        { "remove" : { "index" : "twitter", "alias" : "my_search_indices" } }
    ]
}'

curl -s -X GET "http://localhost:9200/_cat/aliases"

Alternative method:

# Remove the index alias
curl -s -X DELETE "http://localhost:9200/twitter/_alias/my_search_indices"

curl -s -X GET "http://localhost:9200/_cat/aliases"

Run the split; the resulting index is named new_twitter and has 8 primary shards

curl -s -X POST "http://localhost:9200/twitter/_split/new_twitter?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.number_of_shards": 8,
    "index.number_of_replicas": 0
  }
}'

# Add the alias to the new index

curl -s -X POST "http://localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{
    "actions" : [
        { "add" : { "index" : "new_twitter", "alias" : "my_search_indices" } }
    ]
}'

Alternative method:

# Create the index alias
curl -s -X PUT "http://localhost:9200/new_twitter/_alias/my_search_indices"

Result (response returned by the _split request):

{
 "acknowledged" : true,
 "shards_acknowledged" : true,
 "index" : "new_twitter"
}
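The alias was removed from twitter before the split and added to new_twitter afterwards, so for a while my_search_indices points at nothing. The _aliases API applies all actions of a single request atomically, so the remove and the add could also be sent together once the split has completed. A minimal sketch; alias_swap_body is a hypothetical helper that only prints the request body, and the curl call is left as a comment.

```shell
#!/bin/sh
# alias_swap_body OLD_INDEX NEW_INDEX ALIAS
# Prints an _aliases request body that moves the alias from the old index
# to the new one in a single request.
alias_swap_body() {
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
    "$1" "$3" "$2" "$3"
}

alias_swap_body twitter new_twitter my_search_indices

# To send it to a running cluster (not executed here):
# curl -s -X POST "http://localhost:9200/_aliases?pretty" \
#   -H 'Content-Type: application/json' \
#   -d "$(alias_swap_body twitter new_twitter my_search_indices)"
```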

Query the new index's data; it can be read normally

curl -s -XGET "http://localhost:9200/my_search_indices/_search" | jq .

Split progress can be checked with the _cat/recovery API or in the cerebro UI.

curl -s -X GET "http://localhost:9200/_cat/recovery"
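For scripting, the _cat/recovery output can be reduced to the index, shard, and stage columns and checked automatically. The sketch below assumes input lines of the form "index shard stage", as produced by requesting h=index,shard,stage; split_done is a hypothetical helper, and the sample lines are made up for illustration rather than captured from a real cluster.

```shell
#!/bin/sh
# split_done INDEX
# Reads "index shard stage" lines on stdin. Succeeds only if at least one
# shard of INDEX is listed and every listed shard of INDEX is in stage "done".
split_done() {
  awk -v idx="$1" '
    $1 == idx { seen++; if ($3 != "done") pending++ }
    END { if (seen > 0 && pending == 0) exit 0; exit 1 }
  '
}

# Made-up sample input: shard 1 of new_twitter is still in the "index" stage.
sample='new_twitter 0 done
new_twitter 1 index
twitter 0 done'

if printf '%s\n' "$sample" | split_done new_twitter; then
  echo "split finished"
else
  echo "split still recovering"
fi

# Against a running cluster (not executed here):
# curl -s "http://localhost:9200/_cat/recovery/new_twitter?h=index,shard,stage" | split_done new_twitter
```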

# Try writing to the new index; the write is rejected because the new index is still write-blocked

curl -s -X PUT "localhost:9200/my_search_indices/_doc/33?pretty" -H 'Content-Type: application/json' -d '{
  "id": 33,
  "name":"amd",
  "age":"33"
}'
# The write fails

# Re-enable writes on the index

curl -s -X PUT "localhost:9200/my_search_indices/_settings?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.blocks.write": false 
  }
}'

# Write to the new index again; this time the write succeeds

curl -s -X PUT "localhost:9200/my_search_indices/_doc/33?pretty" -H 'Content-Type: application/json' -d '{
  "id": 33,
  "name":"amd",
  "age":"33"
}'

curl -s -X PUT "localhost:9200/my_search_indices/_doc/44?pretty" -H 'Content-Type: application/json' -d '{
  "id": 44,
  "name":"intel",
  "age":"4"
}'

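# At this point the old index is still read-only. Once the new index has been confirmed to be OK, the old twitter index can be closed or deleted.

Test writing new data through the alias

curl -s -X PUT "localhost:9200/my_search_indices/_doc/44?pretty" -H 'Content-Type: application/json' -d '{
  "id": 44,
  "name":"amd",
  "age":"44"
}'

This write succeeds as well.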

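Delete an index once it is no longer needed (the command below removes the experimental new_twitter; in production it is the old twitter index that would be retired)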

curl -s -X DELETE "http://localhost:9200/new_twitter"

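Summary

Below is a screenshot of an index after the split was run in production. Each shard of the new index is only half the size of a shard of the old index, which spreads the load on the index:

[Screenshot: dynamically increasing the shard count of an index in ES 7.5]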
