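Some complex queries on Elasticsearch nested fields and scripts

quickly3, posted 2020-06-09

Original URL: https://learnku.com/articles/45662

1. Simulating similarity-percentage scoring

Simulate similarity on a nested field and assign a different score depending on the similarity percentage.
Key: the repeated scoring mechanism (a document that satisfies several should clauses collects a score from each of them).

minimum_should_match is the minimum percentage of the input text's tokens that must match. For example, the query text "you are here for whole day" has 6 tokens; with minimum_should_match set to 50%, 6 * 50% = 3, so the query only returns documents in which at least 3 tokens match.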
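The arithmetic above can be sketched in Python. This is only a simplified model of the positive-percentage case; the helper name `required_matches` is hypothetical and not an Elasticsearch API, and the sketch assumes the computed value is rounded down.

```python
def required_matches(term_count: int, percent: int) -> int:
    """Minimum number of terms that must match for a positive percentage.

    Simplified model: the percentage of the term count, rounded down.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return (term_count * percent) // 100


tokens = "you are here for whole day".split()
print(required_matches(len(tokens), 50))  # 6 tokens at 50%
```

Under this model a two-term query such as "Saas Information" needs 1 matching term at both 50% and 70%, so the two thresholds only start to differ on longer query texts.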

Example: when the experiences.workSoldDesc field reaches 50% similarity, the score is the original score; when it reaches 70%, the score is doubled.

GET 1_Talents/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": ["experiences"],
            "query": {
              "query_string": {
                "default_field": "experiences.workSoldDesc", 
                "query": "Saas Information",
                "minimum_should_match": "50%"
              }
            }
          }
        },    
        {
          "nested": {
            "path": ["experiences"],
            "query": {
              "query_string": {
                "default_field": "experiences.workSoldDesc", 
                "query": "Saas Information",
                "minimum_should_match": "70%"
              }
            }
          }
        }
      ]
    }
  },
  "_source": [
    "experiences.roles.id",
    "experiences.workSoldDesc",
    "experiences.industries.id"
    ]
}
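A rough Python model of the stacking effect in the request above. It assumes, as a simplification, that every should clause contributes the same base score when it matches and that a clause needs at least one matching term; the function name `stacked_score` and its parameters are hypothetical.

```python
def stacked_score(matched_terms, total_terms, base_score, thresholds=(50, 70)):
    """Add base_score once for every threshold clause the document satisfies."""
    score = 0.0
    for percent in thresholds:
        # Terms required by this clause (rounded down, but never below one).
        needed = max(1, (total_terms * percent) // 100)
        if matched_terms >= needed:
            score += base_score
    return score


# A hypothetical 10-term query with a base score of 1.5 per clause.
for matched in (4, 5, 7):
    print(matched, stacked_score(matched, 10, 1.5))
```

A document matching 5 of 10 terms only satisfies the 50% clause, while one matching 7 of 10 satisfies both clauses and ends up with twice the score.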

2. Histogram query

histogram:

POST /1_Talents/_search
{
  "query": {
    "nested": {
      "path": "experiences",
      "query": {
        "query_string": {
          "query": "experiences.talentId:*"
        }        
      }
    }
  },
  "size": 0, 
  "_source":["experiences.talentId"],
  "aggs" : {
    "nesting" : {
        "nested": {
          "path": "experiences"
        },
        "aggs": {
          "cate":{
            "histogram": {
              "field" : "experiences.talentId",
              "interval" : 10
            }
          }
        }
    }
  }  
}
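To make the bucketing concrete, here is a small Python sketch of how a histogram with a fixed interval assigns values to buckets, using the rounding-down rule the histogram aggregation documents (floor of the value divided by the interval, times the interval). The sample talentId values are made up, and empty buckets are not filled in.

```python
import math
from collections import Counter


def bucket_key(value, interval, offset=0):
    """Lower bound of the bucket that a value falls into."""
    return math.floor((value - offset) / interval) * interval + offset


def histogram(values, interval):
    """Count values per bucket key, sorted by key."""
    counts = Counter(bucket_key(v, interval) for v in values)
    return dict(sorted(counts.items()))


talent_ids = [3, 7, 12, 18, 25]  # made-up nested talentId values
print(histogram(talent_ids, 10))
```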

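3. Terms aggregation query:

Note:
The doc_count in the response is not the usual total number of docs. It is the sum of the matches in each doc's nested field, so a doc is counted repeatedly because of its nested field.

Suggestion: use the copy_to property to copy the properties inside the nested field into a non-nested field automatically, or handle the counting in application code.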

POST /1_Talents/_search?size=0
{
  "aggs" : {
    "nesting" : {
        "nested": {
          "path": "experiences"
        },
        "aggs": {
          "cate":{
            "terms": {
              "field" : "experiences.talentId"
            }
          }
        }
    }
  }
}
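The difference between the two ways of counting can be illustrated with a short Python sketch over made-up documents. The function names are hypothetical; the first mimics counting once per nested object, the second counts each parent document at most once per talentId.

```python
from collections import Counter

# Made-up parent documents, each holding several nested experiences.
docs = [
    {"id": 1, "experiences": [{"talentId": 1}, {"talentId": 1}]},
    {"id": 2, "experiences": [{"talentId": 2}]},
]


def nested_doc_counts(docs):
    """Count once per nested object, as a terms agg under a nested agg does."""
    return Counter(exp["talentId"] for d in docs for exp in d["experiences"])


def parent_doc_counts(docs):
    """Count each parent document at most once per talentId."""
    counts = Counter()
    for d in docs:
        counts.update({exp["talentId"] for exp in d["experiences"]})
    return counts


print(nested_doc_counts(docs))
print(parent_doc_counts(docs))
```

Besides copy_to or application code, Elasticsearch also provides a reverse_nested aggregation that can be placed under the terms bucket to count parent documents; it may be worth evaluating for this case.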

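4. script fields (using a nested-type field, formatting and summing, to compute work duration)

Painless exposes a subset of Java objects and functions as the Painless API.

Example:
Use the startAt/endAt of each experience to compute the work duration of each exp; when endAt does not exist, endAt defaults to now.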

GET /1_Talents/_search
{
  "query" : {
    "nested": {
      "path": "experiences",
      "query": {
        "query_string": {
          "fields": ["experiences"], 
          "query": "experiences:* && -experiences.endAt:*"
        }
      }
    }
  },
  "script_fields": {
    "exp_work_length": {
      "script": {
        "lang": "painless",
        "source": """
        def resp = [];
        for(exp in params._source.experiences){
          def item = ['exp_id':exp.id];
          if(exp.startAt != null){
            item['startAt'] = exp.startAt;
            item['title'] = exp.title;
            ZonedDateTime zdt1 = ZonedDateTime.parse(exp.startAt);
            ZonedDateTime zdt2;
            if(exp.endAt != null){
              zdt2 = ZonedDateTime.parse(exp.endAt);
              item['current'] = false;
            }else{
              def now_ts = new Date().getTime();
              def now_inst = Instant.ofEpochMilli(now_ts);
              zdt2 = ZonedDateTime.ofInstant(now_inst,ZoneId.of('Z'));
              item['current'] = true;
            }
            def diff = ChronoUnit.MONTHS.between(zdt1, zdt2);
            item['endAt'] = exp.endAt;
            item['work_len_of_months'] = diff;
            resp.add(item);      
          }
        }
        return resp
        """
      }
    }
  }
}
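For readers who want to check the numbers outside Elasticsearch, the same logic can be approximated in plain Python. This is a sketch, not a port of the Painless script: it works on calendar dates only (time of day is ignored), so edge cases may differ from ChronoUnit.MONTHS.between, and the `today` parameter and sample data are assumptions.

```python
from datetime import date


def complete_months(start: date, end: date) -> int:
    """Whole months from start to end, ignoring time of day."""
    if end < start:
        raise ValueError("end must not be before start")
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not complete yet
    return months


def work_lengths(experiences, today):
    """Mirror the script: skip entries without startAt, default endAt to today."""
    resp = []
    for exp in experiences:
        start = exp.get("startAt")
        if start is None:
            continue
        end = exp.get("endAt")
        # Keep only the YYYY-MM-DD part of an ISO-8601 timestamp.
        end_day = date.fromisoformat(end[:10]) if end else today
        resp.append({
            "exp_id": exp["id"],
            "current": not end,
            "months": complete_months(date.fromisoformat(start[:10]), end_day),
        })
    return resp


sample = [
    {"id": 1, "startAt": "2019-01-15T00:00:00Z", "endAt": None},
    {"id": 2, "startAt": None, "endAt": None},
]
print(work_lengths(sample, today=date(2020, 6, 9)))
```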

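5. Sub-aggregation queries with script aggregations (including nested fields)


Note:
An agg script (bucket_script in the example here) mainly works by taking the results of sibling aggs and computing on them.
An agg script cannot access doc fields, so it cannot compute anything from the doc.
A child agg script cannot access the results of its parent agg's siblings.
An agg cannot use script_fields for its computation. (https://discuss.elastic.co/t/can-elasticsearch-do-group-by-and-order-by-count/65365/2)

So a field that has to be computed before it is aggregated cannot be produced in the script; it is better to compute it in application code first and store it directly in the index.

Example:
The example below uses talentId in experiences to compute the distribution of documents, and also sums experiences.id within each talentId bucket. It can be used to try out the notes above.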
POST /1_Talents/_search
{
  "query": {
    "nested": {
      "path": "experiences",
      "query": {
        "query_string": {
          "query": "experiences.talentId:*"
        }        
      }
    }
  },
  "size": 0, 
  "_source":["experiences.talentId"],
  "aggs" : {
    "nesting" : {
      "nested": {
        "path": "experiences"
      },
      "aggs": {
        "cate":{
          "histogram": {
            "field" : "experiences.talentId",
            "interval" : 10
          },
          "aggs": {
            "total_id":{
                "sum": {
                    "field": "experiences.id"
                }
            },
            "script_aggs": {
              "bucket_script": {
                "buckets_path": {
                  "total_id":"total_id"
                }, 
                "script": "params.total_id"
              }
            }
          }
        }
      }
    },
    "p_id":{
      "sum": {
          "field": "id"
      }      
    }
  }  
}
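The scoping rule can be pictured with a small Python model of bucket_script. The helper `apply_bucket_script` and the sample buckets are hypothetical; the point is only that each variable in buckets_path is resolved inside the current bucket, so a metric that lives outside it (such as the top-level p_id) is not visible.

```python
def apply_bucket_script(buckets, buckets_path, script, name="script_aggs"):
    """Evaluate script once per bucket using metrics found in that bucket."""
    out = []
    for bucket in buckets:
        # Resolve every variable against sibling metrics of this bucket only.
        params = {var: bucket[path]["value"] for var, path in buckets_path.items()}
        out.append({**bucket, name: {"value": script(params)}})
    return out


# Made-up histogram buckets, each already carrying its total_id sum.
buckets = [
    {"key": 0, "doc_count": 2, "total_id": {"value": 30.0}},
    {"key": 10, "doc_count": 1, "total_id": {"value": 12.0}},
]

result = apply_bucket_script(buckets, {"total_id": "total_id"}, lambda p: p["total_id"])
print([b["script_aggs"]["value"] for b in result])
```

Asking for a path such as "p_id" fails with a KeyError in this model, which corresponds to the note that a child script cannot reach the siblings of its parent agg.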

6. script query

The script query below imitates the ordinary query experiences.startAt:*

Note that a script query cannot use params['_source'].

POST /1_Talents/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "experiences",
            "query": {
              "script": {
                "script": """
                  return doc['experiences.startAt'].size()>0
                """
              }
            }
          }
        }
      ]
    }
  }
}

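7. function score => script score

Return a different score weight multiplier depending on conditions and doc values.
This cannot reproduce the effect of item 1 of this article, because function score has no access to a field's match percentage or match count.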

GET 1_Talents/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "nested": {
                "path": ["experiences"],
                "query": {
                  "query_string": {
                    "default_field": "experiences.workSoldDesc", 
                    "query": "Saas Information",
                    "minimum_should_match": "50%"
                  }
                }
              }
            },    
            {
              "nested": {
                "path": ["experiences"],
                "query": {
                  "query_string": {
                    "default_field": "experiences.workSoldDesc", 
                    "query": "Saas Information",
                    "minimum_should_match": "70%"
                  }
                }
              }
            }
          ]
        }
      },
      "functions":[
        {
            "filter": { "match": { "id": 76 } },
            "random_score": {}, 
            "weight": 10
        },
        {
            "filter": { "match": { "id": "91" } },
            "weight": 100
        },
        {
          "script_score": {
            "script": "return doc['id'].value == 109 ? 1000 : 1;"
          }
        }
      ]
    }
  },
  "_source": ["_score"],
  "explain": false
}
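How the three functions combine can be approximated in Python. The sketch relies on the function_score defaults (score_mode multiply and boost_mode multiply) and assumes the script_score function yields 1 for every document other than id 109; the random value is passed in as a parameter so that the result is reproducible. The function name and arguments are hypothetical.

```python
def function_score(query_score, doc_id, random_value=0.5):
    """Multiply the query score by every function that applies to the doc."""
    factors = []
    if doc_id == 76:
        factors.append(random_value * 10)  # random_score scaled by weight 10
    if doc_id == 91:
        factors.append(100)  # plain weight function
    # The script_score function has no filter, so it applies to every doc.
    factors.append(1000 if doc_id == 109 else 1)

    combined = 1.0
    for factor in factors:
        combined *= factor  # score_mode: multiply
    return query_score * combined  # boost_mode: multiply


for doc_id in (5, 76, 91, 109):
    print(doc_id, function_score(2.0, doc_id))
```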

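8. Painless contexts

The Painless programming environment offers many in-memory variables and data-access APIs, but the APIs that are available differ from one feature to another.

For example:
params._source, doc and ctx conceptually belong to the Painless context. All three are used in scripts to read doc fields,
but not every context provides all three objects.

For details, see the list of Painless contexts:
script query uses the filter context
www.elastic.co/guide/en/elasticsea...

This work is licensed under a CC license; reposts must credit the author and link to this article.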
