An analysis of a problem with transforms in Elasticsearch

Posted by 下午喝什麼茶 on 2021-12-07

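Background: we have a parcel-delivery business. When a courier delivers a parcel, scanning it produces one delivery record. A parcel may be delivered more than once: if it is delivered again the next day, the existing record is updated, and the latest delivery operation is the one that counts. We now need to count the number of deliveries per courier per day.
Elasticsearch version: 7.15.1
1. Create the index: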

PUT t_test_001
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "order_no": {
        "type": "long"
      },
      "employee": {
        "type": "keyword"
      },
      "create_time": {
        "type": "date"
      },
      "push_date": {
        "type": "date"
      },
      "update_time": {
        "type": "date"
      }
    }
  }
}

2. Insert test data:

POST /t_test_001/_bulk
{ "index": {}}
{ "order_no" : 1,"employee":"張三",  "create_time" : "2021-12-06T08:00:00.000Z", "push_date" : "2021-12-06T08:00:00.000Z", "update_time" : "2021-12-06T08:00:00.000Z"}
{ "index": {}}
{ "order_no" : 2,"employee":"張三",  "create_time" : "2021-12-06T08:00:00.000Z", "push_date" : "2021-12-06T08:00:00.000Z", "update_time" : "2021-12-06T08:00:00.000Z"}
{ "index": {}}
{ "order_no" : 3,"employee":"張三",  "create_time" : "2021-12-07T00:00:00.000Z", "push_date" : "2021-12-07T00:00:00.000Z", "update_time" : "2021-12-07T00:00:00.000Z"}
{ "index": {}}
{ "order_no" : 4,"employee":"張三",  "create_time" : "2021-12-07T00:00:00.000Z", "push_date" : "2021-12-07T00:00:00.000Z", "update_time" : "2021-12-07T00:00:00.000Z"}
{ "index": {}}
{ "order_no" : 5,"employee":"王五",  "create_time" : "2021-12-06T08:00:00.000Z", "push_date" : "2021-12-06T08:00:00.000Z", "update_time" : "2021-12-06T08:00:00.000Z"}
{ "index": {}}
{ "order_no" : 6,"employee":"王五",  "create_time" : "2021-12-06T08:00:00.000Z", "push_date" : "2021-12-06T08:00:00.000Z", "update_time" : "2021-12-06T08:00:00.000Z"}
{ "index": {}}
{ "order_no" : 7,"employee":"王五",  "create_time" : "2021-12-07T00:00:00.000Z", "push_date" : "2021-12-07T00:00:00.000Z", "update_time" : "2021-12-07T00:00:00.000Z"}
{ "index": {}}
{ "order_no" : 8,"employee":"王五",  "create_time" : "2021-12-07T00:00:00.000Z", "push_date" : "2021-12-07T00:00:00.000Z", "update_time" : "2021-12-07T00:00:00.000Z"}

3. Run a query to check the data:

GET /t_test_001/_search
{
  "size": 10
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "t_test_001",
        "_type" : "_doc",
        "_id" : "GLztkn0BDKE3xmcewwIG",
        "_score" : 1.0,
        "_source" : {
          "order_no" : 1,
          "employee" : "張三",
          "create_time" : "2021-12-06T08:00:00.000Z",
          "push_date" : "2021-12-06T08:00:00.000Z",
          "update_time" : "2021-12-06T08:00:00.000Z"
        }
      },
      {
        "_index" : "t_test_001",
        "_type" : "_doc",
        "_id" : "Gbztkn0BDKE3xmcewwIG",
        "_score" : 1.0,
        "_source" : {
          "order_no" : 2,
          "employee" : "張三",
          "create_time" : "2021-12-06T08:00:00.000Z",
          "push_date" : "2021-12-06T08:00:00.000Z",
          "update_time" : "2021-12-06T08:00:00.000Z"
        }
      },
      {
        "_index" : "t_test_001",
        "_type" : "_doc",
        "_id" : "Grztkn0BDKE3xmcewwIG",
        "_score" : 1.0,
        "_source" : {
          "order_no" : 3,
          "employee" : "張三",
          "create_time" : "2021-12-07T00:00:00.000Z",
          "push_date" : "2021-12-07T00:00:00.000Z",
          "update_time" : "2021-12-07T00:00:00.000Z"
        }
      },
      {
        "_index" : "t_test_001",
        "_type" : "_doc",
        "_id" : "G7ztkn0BDKE3xmcewwIG",
        "_score" : 1.0,
        "_source" : {
          "order_no" : 4,
          "employee" : "張三",
          "create_time" : "2021-12-07T00:00:00.000Z",
          "push_date" : "2021-12-07T00:00:00.000Z",
          "update_time" : "2021-12-07T00:00:00.000Z"
        }
      },
      {
        "_index" : "t_test_001",
        "_type" : "_doc",
        "_id" : "HLztkn0BDKE3xmcewwIG",
        "_score" : 1.0,
        "_source" : {
          "order_no" : 5,
          "employee" : "王五",
          "create_time" : "2021-12-06T08:00:00.000Z",
          "push_date" : "2021-12-06T08:00:00.000Z",
          "update_time" : "2021-12-06T08:00:00.000Z"
        }
      },
      {
        "_index" : "t_test_001",
        "_type" : "_doc",
        "_id" : "Hbztkn0BDKE3xmcewwIG",
        "_score" : 1.0,
        "_source" : {
          "order_no" : 6,
          "employee" : "王五",
          "create_time" : "2021-12-06T08:00:00.000Z",
          "push_date" : "2021-12-06T08:00:00.000Z",
          "update_time" : "2021-12-06T08:00:00.000Z"
        }
      },
      {
        "_index" : "t_test_001",
        "_type" : "_doc",
        "_id" : "Hrztkn0BDKE3xmcewwIG",
        "_score" : 1.0,
        "_source" : {
          "order_no" : 7,
          "employee" : "王五",
          "create_time" : "2021-12-07T00:00:00.000Z",
          "push_date" : "2021-12-07T00:00:00.000Z",
          "update_time" : "2021-12-07T00:00:00.000Z"
        }
      },
      {
        "_index" : "t_test_001",
        "_type" : "_doc",
        "_id" : "H7ztkn0BDKE3xmcewwIG",
        "_score" : 1.0,
        "_source" : {
          "order_no" : 8,
          "employee" : "王五",
          "create_time" : "2021-12-07T00:00:00.000Z",
          "push_date" : "2021-12-07T00:00:00.000Z",
          "update_time" : "2021-12-07T00:00:00.000Z"
        }
      }
    ]
  }
}

4. Create a transform that aggregates the data by day and by courier:

PUT _transform/t_test_transform
{
  "id": "t_test_transform",
  "source": {
    "index": [
      "t_test_001"
    ]
  },
  "dest": {
    "index": "t_test_x"
  },
  "frequency": "60s",
  "sync": {
    "time": {
      "field": "update_time",
      "delay": "60s"
    }
  },
  "pivot": {
    "group_by": {
      "employee": {
        "terms": {
          "field": "employee"
        }
      },
      "push_date": {
        "date_histogram": {
          "field": "push_date",
          "calendar_interval": "1d"
        }
      }
    },
    "aggregations": {
      "sum_all": {
        "value_count": {
          "field": "_id"
        }
      }
    }
  }
}

5. Start the transform:

POST _transform/t_test_transform/_start

6. Check the results in the transform's destination index:

GET /t_test_x/_search
{}

Result: 張三 has two deliveries on each of 2021-12-06 and 2021-12-07.

7. On December 7, the order with order_no = 1 is delivered again by 張三, and the record is updated:

POST /t_test_001/_update/GLztkn0BDKE3xmcewwIG
{
  "doc": {
    "push_date": "2021-12-07T03:27:12.000Z",
    "update_time": "2021-12-07T03:27:12.000Z"
  }
}

Note: to simulate a realistic data operation, the update time must be later than the transform's previous checkpoint!
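The timing requirement above can be sketched as a small check. This is a simplified model, not the actual Elasticsearch implementation: it assumes a checkpoint only considers documents whose update_time lies between the previous checkpoint and the current time minus the configured delay (60s in the transform above). The function name in_checkpoint_window, the exact boundary handling, and the clock values are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

def in_checkpoint_window(update_time, last_checkpoint, now,
                         delay=timedelta(seconds=60)):
    """Assumed rule: last_checkpoint <= update_time < now - delay."""
    return last_checkpoint <= update_time < now - delay

utc = timezone.utc
# Hypothetical clock values, chosen only for illustration.
last_checkpoint = datetime(2021, 12, 7, 3, 0, 0, tzinfo=utc)
now = datetime(2021, 12, 7, 3, 30, 0, tzinfo=utc)

fresh_update = datetime(2021, 12, 7, 3, 27, 12, tzinfo=utc)  # the update in step 7
stale_update = datetime(2021, 12, 6, 8, 0, 0, tzinfo=utc)    # before the last checkpoint
too_recent = datetime(2021, 12, 7, 3, 29, 30, tzinfo=utc)    # still inside the delay

print(in_checkpoint_window(fresh_update, last_checkpoint, now))  # True
print(in_checkpoint_window(stale_update, last_checkpoint, now))  # False
print(in_checkpoint_window(too_recent, last_checkpoint, now))    # False
```

Under this model, an update stamped with an old update_time would never be noticed, and one stamped within the last 60 seconds would only be picked up by a later checkpoint.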

8. The expected result of the transform is that 張三's delivery count for 12-06 drops from 2 to 1, and the count for 12-07 rises from 2 to 3.
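The expected numbers can be worked out by hand from the test data. The following sketch does the same arithmetic in plain Python, outside Elasticsearch: it keys the eight test records by order_no, counts them per (employee, UTC day), applies the update from step 7, and counts again. The helper name daily_counts is hypothetical, and taking the first ten characters of push_date as the day assumes UTC daily buckets.

```python
from collections import Counter

def daily_counts(docs):
    """Count records per (employee, UTC day of push_date)."""
    return Counter((d["employee"], d["push_date"][:10]) for d in docs.values())

day6 = "2021-12-06T08:00:00.000Z"
day7 = "2021-12-07T00:00:00.000Z"

# The eight test records, keyed by order_no.
docs = {
    1: {"employee": "張三", "push_date": day6},
    2: {"employee": "張三", "push_date": day6},
    3: {"employee": "張三", "push_date": day7},
    4: {"employee": "張三", "push_date": day7},
    5: {"employee": "王五", "push_date": day6},
    6: {"employee": "王五", "push_date": day6},
    7: {"employee": "王五", "push_date": day7},
    8: {"employee": "王五", "push_date": day7},
}

before = daily_counts(docs)

# Step 7: order 1 is delivered again on December 7.
docs[1]["push_date"] = "2021-12-07T03:27:12.000Z"
after = daily_counts(docs)

print(before[("張三", "2021-12-06")], before[("張三", "2021-12-07")])  # 2 2
print(after[("張三", "2021-12-06")], after[("張三", "2021-12-07")])    # 1 3
```

The counts for 王五 are unaffected by the update and stay at 2 for both days.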


9. Query the transform's destination index again:

GET /t_test_x/_search
{}

Result: 張三's delivery count for 12-06 is still 2 and has not decreased, which does not match the expectation; the count for 12-07 is 3, which does.

10. Query the source data again:

GET /t_test_001/_search
{}

11. Run the same aggregation directly against the source data:

GET /t_test_001/_search
{
  "size": 0,
  "aggs": {
    "employee": {
      "terms": {
        "field": "employee"
      },
      "aggs": {
        "push_date": {
          "date_histogram": {
            "field": "push_date",
            "calendar_interval": "1d"
          }
        }
      }
    }
  }
}

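The result is clear: in the source data 張三 has 1 delivery on 12-06 and 3 on 12-07, yet the transform still reports 2 for 12-06, so at this point the transform's output is wrong. How should this be understood? Does the Elasticsearch transform simply not support aggregating data that changes in this way, or is this a bug? My understanding is that, probably for performance reasons, transforms have this problem in this scenario.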
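One possible reading of this behaviour, offered as an assumption and not as a confirmed description of the Elasticsearch internals: a continuous transform does not recompute every bucket at each checkpoint. It only recomputes the buckets that currently contain documents changed since the previous checkpoint. After the update in step 7, order 1 sits in the 12-07 bucket, so that bucket is refreshed, while nothing points the transform back at the 12-06 bucket the document left. The sketch below is a plain-Python model of that assumed rule; run_checkpoint, bucket, and the checkpoint timestamps are hypothetical names and values, not Elasticsearch APIs.

```python
from collections import Counter

def bucket(doc):
    """Group key: (employee, UTC day of push_date)."""
    return (doc["employee"], doc["push_date"][:10])

def full_counts(source):
    """What a fresh aggregation over the whole source index returns."""
    return Counter(bucket(d) for d in source.values())

def run_checkpoint(source, dest, lower, upper):
    """Assumed rule: refresh only buckets holding recently changed docs."""
    changed = {bucket(d) for d in source.values()
               if lower <= d["update_time"] < upper}
    fresh = full_counts(source)
    for b in changed:
        dest[b] = fresh[b]
    return dest

day6 = "2021-12-06T08:00:00.000Z"
day7 = "2021-12-07T00:00:00.000Z"
source = {}
for order_no, (name, ts) in enumerate(
        [("張三", day6), ("張三", day6), ("張三", day7), ("張三", day7),
         ("王五", day6), ("王五", day6), ("王五", day7), ("王五", day7)], start=1):
    source[order_no] = {"employee": name, "push_date": ts, "update_time": ts}

dest = {}
# First checkpoint (hypothetical time): every document is new.
run_checkpoint(source, dest, "", "2021-12-07T03:00:00.000Z")

# Step 7: order 1 is delivered again on December 7.
source[1]["push_date"] = "2021-12-07T03:27:12.000Z"
source[1]["update_time"] = "2021-12-07T03:27:12.000Z"

# Second checkpoint: only order 1 falls inside the window.
run_checkpoint(source, dest, "2021-12-07T03:00:00.000Z", "2021-12-07T03:30:00.000Z")

print(dest[("張三", "2021-12-06")], dest[("張三", "2021-12-07")])  # 2 3
print(full_counts(source)[("張三", "2021-12-06")])                 # 1
```

The model reproduces what was observed in steps 9 and 11: the destination keeps 2 for 12-06 while a direct aggregation gives 1. As far as I can tell, this is consistent with the transform limitations page in the Elasticsearch documentation, which describes continuous transforms as designed for data timestamped at ingest and states that their consistency does not account for deleted or updated documents. That would make this a documented limitation more than a bug.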

 

If anything here is wrong, corrections are welcome. Thank you.
