Elasticsearch 7.3 Study Notes: Rebuilding an Index in Real Time in Production

| 舊市拾荒 | Published 2022-03-26

1. Rebuilding an index in real time

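In a real production environment, an existing field's mapping cannot be changed in place (for example, its type cannot be changed). To change a field, you have to create a new index with the new mapping, query the data out of the old index in batches, and write it into the new index with the bulk API.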

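For the batch queries, the scroll API is recommended, and the reindexing can be parallelized across multiple threads. For example, each scroll query can fetch the documents for one specific date range, and each such range is handed to a single thread.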
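A minimal sketch of the date-partitioning idea, assuming a hypothetical date field named `created_at` (the example documents in this post have no such field) and a hypothetical placeholder worker `reindex_slice` that only builds the range query; a real worker would scroll through that range in the old index and bulk-write each batch into the new one.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def split_date_range(start: date, end: date, days_per_slice: int) -> list:
    """Split [start, end) into consecutive half-open (lo, hi) date slices."""
    if days_per_slice <= 0:
        raise ValueError("days_per_slice must be positive")
    slices, lo = [], start
    while lo < end:
        hi = min(lo + timedelta(days=days_per_slice), end)
        slices.append((lo, hi))
        lo = hi
    return slices


def range_query(lo: date, hi: date, field: str = "created_at") -> dict:
    # "created_at" is a hypothetical date field; use the real one from your mapping.
    return {"query": {"range": {field: {"gte": lo.isoformat(), "lt": hi.isoformat()}}}}


def reindex_slice(lo: date, hi: date) -> dict:
    # Hypothetical placeholder worker: only returns the query it would scroll with.
    return range_query(lo, hi)


def reindex_in_parallel(start: date, end: date, days_per_slice: int = 7, workers: int = 4) -> list:
    slices = split_date_range(start, end, days_per_slice)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the slices.
        return list(pool.map(lambda s: reindex_slice(*s), slices))


queries = reindex_in_parallel(date(2019, 9, 1), date(2019, 9, 15))
```

The slices are half-open (`gte` / `lt`), so a document whose date falls exactly on a boundary is copied by one thread only.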

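(1) Initially, data is inserted relying on dynamic mapping. Some values happen to be in a date format such as 2019-09-10, so the title field is automatically mapped as the date type, when it should really be a string (text) type.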

First, insert the following documents:

PUT /my_index/_doc/1
{
  "title": "2019-09-10"
}

PUT /my_index/_doc/2
{
  "title": "2019-09-11"
}
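The trap in step (1) is that a dynamically mapped field's type is fixed by the first value Elasticsearch sees. Below is a deliberately simplified Python approximation of why the two kinds of title values get different guesses; it is not Elasticsearch's real detection logic, which is driven by the `dynamic_date_formats` setting (and which also adds a `keyword` sub-field for dynamically mapped text). Elasticsearch also lets you switch date detection off with `"date_detection": false` in the index mapping.

```python
from datetime import date


def looks_like_date(value: str) -> bool:
    """Rough stand-in for date detection: accepts ISO calendar dates like 2019-09-10."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def guessed_type(value: str) -> str:
    # Simplified: a string that parses as a date becomes "date", anything else "text".
    return "date" if looks_like_date(value) else "text"


first_seen = guessed_type("2019-09-10")
later_value = guessed_type("my first article")
```

Because the first document fixed title as `date`, the later value no longer fits the existing mapping.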

(2) Later, when a document whose title is an ordinary string is added to the index, an error is returned:

PUT /my_index/_doc/3
{
  "title": "my first article"
}

Error:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [title] of type [date] in document with id '3'. Preview of field's value: 'my first article'"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [title] of type [date] in document with id '3'. Preview of field's value: 'my first article'",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "failed to parse date field [my first article] with format [strict_date_optional_time||epoch_millis]",
      "caused_by": {
        "type": "date_time_parse_exception",
        "reason": "Failed to parse with all enclosed parsers"
      }
    }
  },
  "status": 400
}
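A client writing to the index receives this as an HTTP 400 with the body shown above. A minimal sketch, assuming the structured error body shape shown here, of how an application could pull out the root cause and tell a mapping conflict (retrying will not help) apart from other failures:

```python
import json


def root_causes(error_body: str) -> list:
    """Return (type, reason) pairs from an Elasticsearch error response body."""
    error = json.loads(error_body).get("error", {})
    causes = error.get("root_cause") or [error]
    return [(c.get("type"), c.get("reason")) for c in causes]


def is_mapping_conflict(error_body: str) -> bool:
    # A mapper_parsing_exception fails again on retry; it is not a transient error.
    return any(t == "mapper_parsing_exception" for t, _ in root_causes(error_body))


sample = json.dumps({
    "error": {
        "root_cause": [{
            "type": "mapper_parsing_exception",
            "reason": "failed to parse field [title] of type [date] in document with id '3'.",
        }],
        "type": "mapper_parsing_exception",
    },
    "status": 400,
})
```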

(3) At this point, changing the type of title is not possible:

PUT /my_index/_mapping
{
  "properties": {
    "title": {
      "type": "text"
    }
  }
}

Error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "mapper [title] of different type, current_type [date], merged_type [text]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "mapper [title] of different type, current_type [date], merged_type [text]"
  },
  "status": 400
}

(4) The only option at this point is to reindex: create a new index, query the data out of the old index, and import it into the new one.

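(5) Suppose the old index is named old_index, the new one is named new_index, and a client Java application is already operating on old_index. Do we really have to stop the Java application, change the index it uses to new_index, and then restart it? That process takes the application down and reduces availability.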

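(6) Instead, give the Java application an alias that points to the old index. The application does all of its work through prod_index, which for now actually points to the old my_index: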

PUT /my_index/_alias/prod_index

(7) Checking the aliases shows that my_index now has the alias prod_index:

GET my_index/_alias

(8) Create a new index in which title is mapped as a string (text) type:

PUT /my_index_new
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      }
    }
  }
}
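The same request can be built from application code. A minimal sketch using only the Python standard library, assuming a cluster at the hypothetical address `http://localhost:9200`; the request is constructed but not sent, so the sketch runs offline.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical cluster address


def create_index_request(index: str, properties: dict) -> urllib.request.Request:
    """Build a PUT /<index> request carrying an explicit mapping."""
    body = json.dumps({"mappings": {"properties": properties}}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = create_index_request("my_index_new", {"title": {"type": "text"}})
# urllib.request.urlopen(req) would send it; left out so the sketch runs without a cluster.
```

Declaring the mapping explicitly, instead of relying on dynamic mapping, is what prevents a repeat of step (1).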

(9) Use the scroll API to fetch the data in batches (size is set to 1 here only for demonstration):

GET /my_index/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "size": 1
}

Response:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAARUMWQWx5bzRmTW9TeUNpNmVvN0E2dF9YQQ==",
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "2019-09-10"
        }
      }
    ]
  }
}
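A minimal sketch of the scroll loop, assuming a hypothetical transport function `send(method, path, body)` that returns parsed JSON (in practice it would wrap your HTTP client); a fake transport is used so the example runs offline. The initial search is sent as POST, which `_search` accepts just like GET with a body. Only the most recently returned `_scroll_id` is used, since it may change between calls.

```python
from typing import Callable, Iterator, List

# Hypothetical transport: send(method, path, body) returns the parsed JSON response.
Send = Callable[[str, str, dict], dict]


def scroll_all(send: Send, index: str, batch_size: int = 1000,
               keep_alive: str = "1m") -> Iterator[List[dict]]:
    """Yield batches of hits from `index` until scroll returns an empty page."""
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"query": {"match_all": {}}, "size": batch_size})
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            yield resp["hits"]["hits"]
            # The scroll parameter keeps the search context alive for another keep_alive.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Release the search context instead of waiting for keep_alive to expire.
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


# Offline demo: a fake transport serving two one-hit pages, then an empty page.
pages = [[{"_id": "1"}], [{"_id": "2"}], []]
calls = []


def fake_send(method: str, path: str, body: dict) -> dict:
    calls.append((method, path))
    if method == "DELETE":
        return {}
    return {"_scroll_id": "demo-scroll-id", "hits": {"hits": pages.pop(0)}}


batches = list(scroll_all(fake_send, "my_index", batch_size=1))
```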

(10) Use the bulk API to write each batch returned by scroll into the new index:

POST /_bulk
{"index":{"_index":"my_index_new","_id":"1"}}
{"title":"2019-09-10"}
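A minimal sketch of turning one batch of scroll hits into the bulk body above: one action line plus one source line per document, serialized as NDJSON (sent with `Content-Type: application/x-ndjson`) and terminated by a newline, which the bulk API requires. The `_bulk` endpoint answers 200 even when individual items fail, so the sketch also checks the response's `errors` flag and per-item `error` entries.

```python
import json


def bulk_body(hits: list, target_index: str) -> str:
    """Turn scroll hits into an NDJSON body for POST /_bulk."""
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index, "_id": hit["_id"]}}
        lines.append(json.dumps(action, separators=(",", ":")))
        lines.append(json.dumps(hit["_source"], separators=(",", ":")))
    # The bulk API requires the body to end with a newline; an empty batch sends nothing.
    return ("\n".join(lines) + "\n") if lines else ""


def failed_ids(bulk_response: dict) -> list:
    """Ids of documents whose "index" action failed inside a 200 bulk response."""
    if not bulk_response.get("errors"):
        return []
    return [item["index"]["_id"] for item in bulk_response["items"]
            if "error" in item["index"]]


body = bulk_body([{"_id": "1", "_source": {"title": "2019-09-10"}}], "my_index_new")
```

Reusing the original `_id` makes the copy idempotent: rerunning a batch overwrites documents instead of duplicating them.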

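(11) Repeat the scroll and bulk steps above, fetching batch after batch and writing each one into the new index with the bulk API, until scroll returns no more hits.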
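Before moving the alias, it can help to confirm the loop really copied everything. A minimal sketch comparing document counts with the `_count` API, after a `_refresh` so the last bulk writes are visible; `send` is again a hypothetical transport, faked here. If the application keeps writing to the old index during the copy, the counts can legitimately differ, so treat this as a sanity check rather than proof.

```python
from typing import Callable, Optional

# Hypothetical transport: send(method, path, body) returns the parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def copy_complete(send: Send, old_index: str, new_index: str) -> bool:
    """Return True when the new index holds as many documents as the old one."""
    # Refresh so documents from the last bulk request are visible to _count.
    send("POST", f"/{new_index}/_refresh", None)
    old_count = send("GET", f"/{old_index}/_count", None)["count"]
    new_count = send("GET", f"/{new_index}/_count", None)["count"]
    return old_count == new_count


# Offline demo with canned _count responses.
counts = {"/my_index/_count": 2, "/my_index_new/_count": 2}


def fake_send(method, path, body):
    return {"count": counts[path]} if path in counts else {}


ok = copy_complete(fake_send, "my_index", "my_index_new")
```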

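(12) Move the alias prod_index from my_index to my_index_new. The Java application then uses the data in the new index directly through the alias, with no downtime and no code changes or redeployment, so availability is preserved. Both actions are sent in a single _aliases request, which Elasticsearch applies atomically, so there is no moment when prod_index points at neither index.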

POST /_aliases
{
  "actions": [
    {
      "remove": {
        "index": "my_index",
        "alias": "prod_index"
      }
    },
    {
      "add": {
        "index": "my_index_new",
        "alias": "prod_index"
      }
    }
  ]
}
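A minimal sketch that builds the `_aliases` body above from parameters, so the same helper also produces a rollback: it is the identical request with the two indices swapped, which is one reason to keep the old index until the new one has been verified.

```python
import json


def swap_alias_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases moving `alias` from old_index to new_index in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


cutover = swap_alias_body("prod_index", "my_index", "my_index_new")
rollback = swap_alias_body("prod_index", "my_index_new", "my_index")
print(json.dumps(cutover, indent=2))
```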

(13) Query directly through the prod_index alias to check that everything works:

GET prod_index/_search

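The results now come from the new index my_index_new. (Only document 1 was written by the single bulk request above, so the total is 1; after the full scroll/bulk loop, both documents would be present.)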

{
  "took" : 1117,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index_new",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "2019-09-10"
        }
      }
    ]
  }
}
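The same check can be automated. A minimal sketch that reads which concrete index served the hits, plus which indices carry the alias according to a `GET /_alias/prod_index` response (in Elasticsearch 7.x that response is keyed by index name, e.g. `{"my_index_new": {"aliases": {"prod_index": {}}}}`). The responses below are abbreviated copies, not live calls.

```python
def served_by(search_response: dict) -> set:
    """Concrete index names behind the hits of a search made through an alias."""
    return {hit["_index"] for hit in search_response["hits"]["hits"]}


def indices_for_alias(get_alias_response: dict, alias: str) -> set:
    """Indices that currently carry `alias`, from a GET /_alias/<alias> response."""
    return {index for index, info in get_alias_response.items()
            if alias in info.get("aliases", {})}


response = {"hits": {"hits": [{"_index": "my_index_new", "_id": "1",
                               "_source": {"title": "2019-09-10"}}]}}
alias_info = {"my_index_new": {"aliases": {"prod_index": {}}}}
```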

2. Summary:

Use an alias so that switching indices is transparent to the client:

PUT /my_index_v1/_alias/my_index

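The client operates only on my_index, which is now an alias. (An alias cannot share its name with an existing index, which is why the concrete index is called my_index_v1.)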

Run the reindex; once it finishes, switch the alias from v1 to v2:

POST /_aliases
{
    "actions": [
        { "remove": { "index": "my_index_v1", "alias": "my_index" }},
        { "add":    { "index": "my_index_v2", "alias": "my_index" }}
    ]
}
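A minimal sketch of the versioned-index convention from this summary, assuming index names follow the hypothetical `<base>_v<N>` pattern: it derives the next version name and the `_aliases` body that moves the alias to it.

```python
import re


def next_version(index: str) -> str:
    """my_index_v1 -> my_index_v2, following the <base>_v<N> naming used above."""
    match = re.fullmatch(r"(.+)_v(\d+)", index)
    if match is None:
        raise ValueError(f"not a versioned index name: {index!r}")
    base, number = match.group(1), int(match.group(2))
    return f"{base}_v{number + 1}"


def cutover_actions(alias: str, current_index: str) -> tuple:
    """Return (new_index, body for POST /_aliases) moving alias to the next version."""
    new_index = next_version(current_index)
    body = {"actions": [
        {"remove": {"index": current_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}
    return new_index, body


new_index, body = cutover_actions("my_index", "my_index_v1")
```

The new index still has to be created with the corrected mapping and filled by the reindex before this body is sent.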

 
