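Elasticsearch Series: Index Management on a Production Cluster

Posted by firefule on 2021-09-09

Original URL: http://blog.itpub.net/1834/viewspace-2825703/

Overview

Indices are the part of Elasticsearch we touch most often; nearly every day-to-day operation involves one. This post looks at index operations from an operations engineer's point of view.

Basic operations

From an ops perspective, day-to-day index work goes beyond CRUD: it also includes opening and closing indices, shrinking them, and repointing aliases. Let's go through each.

Create an index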

[esuser@elasticsearch02 ~]$ curl -XPUT 'http://elasticsearch02:9200/music?pretty' -H 'Content-Type: application/json' -d '
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3, 
            "number_of_replicas" : 2 
        }
    },
    "mappings" : {
        "type1" : {
            "properties" : {
                "name" : { "type" : "text" }
            }
        }
    }
}'

{
    "acknowledged": true,
    "shards_acknowledged": true
}

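By default, the create-index request returns once the primary copy of each shard has started, or once the request times out. The response looks like the one above.

acknowledged indicates whether the index was created in the cluster; shards_acknowledged indicates whether the required number of shard copies were started for each shard before the request timed out.

Either value can be false even though the index is created successfully. They only report whether each step finished before the timeout: the request may time out first, while the Elasticsearch server has already received it and goes on to complete the work, so the response carries false.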
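Because a false value does not prove the index is missing, one way to settle the question is to ask the cluster directly. A minimal sketch, assuming a HEAD request on the index answers 200 when it exists and 404 when it does not; index_state is a hypothetical helper name, and the curl line is left commented out because it needs a live cluster:

```shell
# Map the HTTP status of a HEAD request on an index to a readable state.
# index_state is a hypothetical helper, not part of Elasticsearch or curl.
index_state() {
  case "$1" in
    200) echo "exists" ;;
    404) echo "missing" ;;
    *)   echo "unknown" ;;
  esac
}

# Fetching the status requires a reachable cluster:
# status=$(curl -s -o /dev/null -w '%{http_code}' -I 'http://elasticsearch02:9200/music')
index_state 200
```

Feeding the real status into index_state after a timed-out create request shows whether the index was in fact created.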

Delete an index

curl -XDELETE 'http://elasticsearch02:9200/music?pretty'

View an index's settings and metadata

curl -XGET 'http://elasticsearch02:9200/music?pretty'

Open/close an index

curl -XPOST 'http://elasticsearch02:9200/music/_close?pretty'
curl -XPOST 'http://elasticsearch02:9200/music/_open?pretty'

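A closed index has almost no overhead on the cluster: only its metadata is kept, and all read and write operations against it fail. A closed index can be reopened, at which point it goes through shard recovery.

If the cluster is backed up regularly, note that an existing index must be closed before it can be restored; otherwise the restore fails.

Shrink an index

The number of primary shards is fixed when an index is created and cannot be changed afterwards. But suppose the estimate turns out to be too high in production: number_of_shards was set to 8, and the data volume is much smaller than expected. How can the primary shard count be reduced?

The shrink command does exactly this, with one restriction: the target shard count must be a factor of the source shard count. An index with 8 primary shards can only be shrunk to 4, 2, or 1 primary shards.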
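The factor rule is easy to check before running a shrink. A small sketch in plain POSIX shell; shrink_targets is a hypothetical helper name, and it only does the arithmetic locally without contacting Elasticsearch:

```shell
# Print every shard count an index may be shrunk to: each factor of the
# source shard count that is smaller than the source count itself.
# shrink_targets is a hypothetical helper, not an Elasticsearch command.
shrink_targets() {
  src=$1
  n=$((src - 1))
  out=""
  while [ "$n" -ge 1 ]; do
    if [ $((src % n)) -eq 0 ]; then
      out="${out:+$out }$n"
    fi
    n=$((n - 1))
  done
  printf '%s\n' "$out"
}

shrink_targets 8   # prints: 4 2 1
```

A prime source count such as 7 leaves 1 as the only valid target, which is worth knowing when choosing number_of_shards up front.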

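How the shrink command works:

A target index is created with the same definition as the source index; the only difference is that the primary shard count is set to the requested number.
The segment files of the source index are hard-linked into the target index. If the operating system does not support hard links, all segment files are copied into the target index's data directory instead, which is much slower; hard-linking is fast.
The target index goes through shard recovery.

Walkthrough

Create an index named music8 with number_of_shards set to 8: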

curl -XPUT 'http://elasticsearch02:9200/music8?pretty' -H 'Content-Type: application/json' -d '
{
    "settings" : {
        "index" : {
            "number_of_shards" : 8, 
            "number_of_replicas" : 2 
        }
    },
    "mappings" : {
        "children" : {
            "properties" : {
                "name" : { "type" : "text" }
            }
        }
    }
}'

Load some data into the index.
Move all of the index's shards onto a single node, here node-1, and block writes:

curl -XPUT 'http://elasticsearch02:9200/music8/_settings?pretty' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.routing.allocation.require._name": "node-1", 
    "index.blocks.write": true 
  }
}'

This step is a shard relocation; its progress can be checked with:

curl -XGET 'http://elasticsearch02:9200/_cat/recovery?v'

Run the shrink command; the new index is named music9:

curl -XPOST 'http://elasticsearch02:9200/music8/_shrink/music9?pretty' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 1,
    "index.codec": "best_compression" 
  }
}'

Once it completes, music9 has the new shard count and contains all of music8's data.

Point the alias at the new music9 index, and clients will not notice the change.
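The alias switch can be done in a single _aliases request. A sketch, assuming clients reach the data through an alias called music_prd (the post does not name the alias used for music8, so that name is an assumption); alias_swap_body is a hypothetical helper that only prints the request body:

```shell
# Print the _aliases request body that moves an alias from one index to another.
# Both actions travel in one request, so the switch is applied atomically.
# alias_swap_body is a hypothetical helper; music_prd is an assumed alias name.
alias_swap_body() {
  name=$1
  old=$2
  new=$3
  cat <<EOF
{
  "actions": [
    { "remove": { "index": "$old", "alias": "$name" } },
    { "add":    { "index": "$new", "alias": "$name" } }
  ]
}
EOF
}

alias_swap_body music_prd music8 music9

# To send it to the cluster (not run here):
# alias_swap_body music_prd music8 music9 | \
#   curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d @-
```

Printing the body first makes it easy to review the change before it reaches a production cluster.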

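Roll over an index

Log indices are the most common case: a new, date-stamped index is needed every day, yet clients keep writing through the same alias. The rollover command repoints the alias to the new index.

Assuming the log_write alias already exists, an example command: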

curl -XPOST 'http://elasticsearch02:9200/log_write/_rollover/log-20120122' \
-H 'Content-Type: application/json' -d '
{
  "conditions": {
    "max_age":   "1d"
  }
}'

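Schedule this with crontab to run once a day, and parameterize the date part with a shell script. A new date-stamped index is then created every day while clients keep writing through the log_write alias, which is very practical for a logging system.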
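The date parameterization might look like the following sketch. rollover_cmd is a hypothetical helper that only prints the command (a dry run); the log- prefix, the log_write alias, and the max_age condition are taken from the example above:

```shell
# Print the rollover command for a dated index name.
# With no argument the date defaults to today in YYYYMMDD form.
# rollover_cmd is a hypothetical helper; pipe its output to sh to execute it.
rollover_cmd() {
  day=${1:-$(date +%Y%m%d)}
  printf "curl -XPOST 'http://elasticsearch02:9200/log_write/_rollover/log-%s' -H 'Content-Type: application/json' -d '%s'\n" \
    "$day" '{"conditions":{"max_age":"1d"}}'
}

rollover_cmd 20120122
```

Saved as a script and called from a daily crontab entry, this produces a fresh index name each day without editing the command by hand.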

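Managing index mappings

Mapping management is a basic operation: mappings can be defined when the index is created, and fields can be added after the index exists.

Some common examples:

View an index's mapping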

curl -XGET 'http://elasticsearch02:9200/music/_mapping/children?pretty'

View the mapping of a specific field

curl -XGET 'http://elasticsearch02:9200/music/_mapping/children/field/content?pretty'

Create an index with a mapping

# Most fields omitted for brevity
curl -XPUT 'http://elasticsearch02:9200/music?pretty' -H 'Content-Type: application/json' -d ' 
{
  "mappings": {
    "children": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}'

Add a field called name, of type text, to the index

curl -XPUT 'http://elasticsearch02:9200/music/_mapping/children?pretty' -H 'Content-Type: application/json' -d ' 
{
  "properties": {
    "name": {
      "type": "text"
    }
  }
}'

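Index aliases

By convention, clients do not access Elasticsearch indices by name; they use an alias instead. An alias hides the real index behind it, and it plays a key role in operations such as the rollover shown above and index rebuilds.

A quick look at the basic alias operations:

# Create an alias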
curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "add" : { "index" : "music", "alias" : "music_prd" } }
    ]
}'

# Remove an alias
curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "remove" : { "index" : "music", "alias" : "music_prd" } }
    ]
}'

# Repoint an alias: remove it from the old index and add it to the new one in the same request
curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "remove" : { "index" : "music", "alias" : "music_prd" } },
        { "add" : { "index" : "music2", "alias" : "music_prd" } }
    ]
}'

# Bind several indices to one alias
curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "add" : { "indices" : ["music1", "music2"], "alias" : "music_prd" } }
    ]
}'

Modifying index settings

View an index's settings:

curl -XGET 'http://elasticsearch02:9200/music/_settings?pretty'

Update a setting:

curl -XPUT 'http://elasticsearch02:9200/music/_settings?pretty' -H 'Content-Type: application/json' -d '
{
    "index" : {
        "number_of_replicas" : 1
    }
}'

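The setting changed most often is the number of replicas; other parameters rarely need changing.

Index templates

Suppose we are designing the index structure for a logging system. The log volume is large, so a new index may be created every day, named by date but sharing one alias. This is a good fit for an index template.

As an example, first create an index template: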

curl -XPUT 'http://elasticsearch02:9200/_template/template_access_log?pretty' -H 'Content-Type: application/json' -d '
{
  "template": "access-log-*",
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "log": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "thread_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "YYYY-MM-dd HH:mm:ss"
        }
      }
    }
  },
  "aliases" : {
      "access-log" : {}
  }
}'

Any index whose name matches "access-log-*" uses this template. Create an index:

curl -XPUT 'http://elasticsearch02:9200/access-log-01?pretty'

View the index:

curl -XGET 'http://elasticsearch02:9200/access-log-01?pretty'

The structure looks like this:

[esuser@elasticsearch02 bin]$ curl -XGET 'http://elasticsearch02:9200/access-log-01?pretty'
{
  "access-log-01" : {
    "aliases" : {
      "access-log" : { }
    },
    "mappings" : {
      "log" : {
        "_source" : {
          "enabled" : false
        },
        "properties" : {
          "created_at" : {
            "type" : "date",
            "format" : "YYYY-MM-dd HH:mm:ss"
          },
          "host_name" : {
            "type" : "keyword"
          },
          "thread_name" : {
            "type" : "keyword"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1581373546223",
        "number_of_shards" : "2",
        "number_of_replicas" : "1",
        "uuid" : "N8AHh3wITg-Zh4T6umCS2Q",
        "version" : {
          "created" : "6030199"
        },
        "provided_name" : "access-log-01"
      }
    }
  }
}

This shows that the template was applied.
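To get a feel for which index names the access-log-* pattern covers, shell glob matching is a reasonable local stand-in for the wildcard match. This is an approximation run in the shell, not a call to Elasticsearch, and matches_template is a hypothetical helper name:

```shell
# Report whether an index name falls under the pattern access-log-*.
# Uses the shell's own glob matching as an approximation of the template match.
# matches_template is a hypothetical helper, not an Elasticsearch command.
matches_template() {
  case "$1" in
    access-log-*) echo "yes" ;;
    *)            echo "no" ;;
  esac
}

matches_template access-log-01   # prints: yes
matches_template access-log      # prints: no (the trailing hyphen is missing)
matches_template error-log-01    # prints: no
```

Note that the alias name access-log itself does not match the pattern, so the alias and the index names never collide.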

Templates can also be viewed and deleted:

curl -XGET 'http://elasticsearch02:9200/_template/template_access_log?pretty'

curl -XDELETE 'http://elasticsearch02:9200/_template/template_access_log?pretty'

Common index queries

Index operation statistics

Elasticsearch keeps statistics on every CRUD operation performed on an index, and they are very detailed. Use this command:

curl -XGET 'http://elasticsearch02:9200/music/_stats?pretty'

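The output runs to several hundred lines, covering everything from document counts and bytes on disk to low-level figures for get, search, merge, translog, and more.

Segment information

Segment information for an index can be queried with: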

curl -XGET 'http://elasticsearch02:9200/music/_segments?pretty'

The output is again long; here is the key part as an example:

"segments" : {
  "_1" : {
    "generation" : 1,
    "num_docs" : 1,
    "deleted_docs" : 0,
    "size_in_bytes" : 7013,
    "memory_in_bytes" : 3823,
    "committed" : true,
    "search" : true,
    "version" : "7.3.1",
    "compound" : true,
    "attributes" : {
      "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
    }
  }
}

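This fragment describes the segment named _1. In detail:

_1: the segment name
generation: an incrementing ID for the segment
num_docs: the number of non-deleted documents in the segment
deleted_docs: the number of deleted documents in the segment
size_in_bytes: disk space used by the segment
memory_in_bytes: segments keep some data in memory; this is how much memory the segment uses
committed: whether the segment has been synced to disk
search: whether the segment is searchable; the value is false if the segment has been synced to disk but no refresh has happened yet
version: the Lucene version used to write the segment
compound: true means Lucene has merged all of the segment's files into a single file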

Shard store information

To see how an index's shards are stored and which nodes they sit on, this command is quite handy:

curl -XGET 'http://elasticsearch02:9200/music/_shard_stores?status=green&pretty'

An excerpt; 3 is the shard ID:

"3" : {
  "stores" : [
    {
      "A1s1uus7TpuDSiT4xFLOoQ" : {
        "name" : "node-2",
        "ephemeral_id" : "Q3uoxLeJRnWQrw3E2nOq-Q",
        "transport_address" : "192.168.17.137:9300",
        "attributes" : {
          "ml.machine_memory" : "3954196480",
          "rack" : "r1",
          "xpack.installed" : "true",
          "ml.max_open_jobs" : "20",
          "ml.enabled" : "true"
        }
      },
      "allocation_id" : "o-t-AwGZRrWTflYLP030jA",
      "allocation" : "primary"
    },
    {
      "RGw1IXzZR4CeZh9FUrGHDw" : {
        "name" : "node-1",
        "ephemeral_id" : "B1pv6c4TRuu1vQNvL40iPg",
        "transport_address" : "192.168.17.138:9300",
        "attributes" : {
          "ml.machine_memory" : "3954184192",
          "rack" : "r1",
          "ml.max_open_jobs" : "20",
          "xpack.installed" : "true",
          "ml.enabled" : "true"
        }
      },
      "allocation_id" : "SaXqL8igRUmLAoBBQyQNqw",
      "allocation" : "replica"
    }
  ]
},

A few more operations

Clear the index cache

curl -XPOST 'http://elasticsearch02:9200/music/_cache/clear?pretty'

Force a flush

Forces the data held in the OS cache to be fsynced to disk and clears the entries in the translog.

curl -XPOST 'http://elasticsearch02:9200/music/_flush?pretty'

Refresh

Explicitly refreshes the index so that all operations performed since the last automatic refresh become visible to search.
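The refresh call follows the same pattern as the flush call, using the _refresh endpoint. A minimal sketch, assuming the same host and the music index from the earlier examples; refresh_cmd is a hypothetical helper that only prints the command:

```shell
# Print the refresh request for an index (dry run).
# refresh_cmd is a hypothetical helper; pipe its output to sh to execute it.
refresh_cmd() {
  printf "curl -XPOST 'http://elasticsearch02:9200/%s/_refresh?pretty'\n" "$1"
}

refresh_cmd music
```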