ES Notes 21: Single-String Multi-Field Queries: Multi Match

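Posted by CrazyZard on 2019-11-14
  • Best Fields
    • Use when fields compete with each other yet are related, for example title and body. The score comes from the best-matching field.
  • Most Fields
    • A common technique for English content: index the main field with the English analyzer, which stems words and can add synonyms, so that it matches more documents. Index the same text in a sub-field with the Standard analyzer for more exact matching. The additional fields act as signals that raise the relevance of matching documents; the more fields that match, the better.
  • Cross Fields
    • Some entities, such as a person's name, an address, or book information, are identified by information spread across several fields, and each single field is only one part of the whole. The goal is to find as many of the query terms as possible across any of the listed fields.
  • Best Fields is the default type and can be omitted.
  • Parameters such as minimum_should_match are passed through to the generated query.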
POST blogs/_search
{
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Quick pets",
      "fields": ["title","body"],
      "tie_breaker": 0.2,
      "minimum_should_match": "20%"
    }
  }
}
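
To make the role of tie_breaker concrete, here is a minimal Python sketch of how a best_fields query combines per-field scores: the best field counts in full, and every other matching field contributes its score multiplied by tie_breaker. The function name and the sample scores are hypothetical, chosen only for illustration; they are not Elasticsearch output.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores: best field in full, the rest scaled by tie_breaker."""
    ranked = sorted(field_scores, reverse=True)
    if not ranked:
        return 0.0
    return ranked[0] + tie_breaker * sum(ranked[1:])


# Hypothetical scores for one document: title scored 2.0, body scored 1.0.
scores = [2.0, 1.0]

print(best_fields_score(scores))                   # only the best field counts
print(best_fields_score(scores, tie_breaker=0.2))  # body adds 20% of its score
```

With tie_breaker at 0 only the best field matters; at 1.0 the result is the plain sum of all matching fields.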

Query example

PUT /titles
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}

POST titles/_bulk
{"index":{"_id":1}}
{"title":"My dog barks"}
{"index":{"_id":2}}
{"title":"I see a lot of barking dogs on the road "}

GET titles/_search
{
  "query": {
    "match": {
      "title": "barking dogs"
    }
  }
}
// Result: with the english analyzer both documents match, and because the title of id 1 is shorter, it ranks first
"hits" : [
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.24399278,
        "_source" : {
          "title" : "My dog barks"
        }
      },
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.1854345,
        "_source" : {
          "title" : "I see a lot of barking dogs on the road "
        }
      }
    ]

Redefine the mapping

DELETE titles

PUT /titles
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "english",
        "fields": {
          "std":{
            "type":"text",
            "analyzer":"standard"
          }
        }
      }
    }
  }
}
POST titles/_bulk
{"index":{"_id":1}}
{"title":"My dog barks"}
{"index":{"_id":2}}
{"title":"I see a lot of barking dogs on the road "}
// multi_match query
GET titles/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "type": "most_fields", // the default is best_fields
      "fields": ["title","title.std"] // scores from the matching fields are added together
    }
  }
}
// Response
"hits" : [
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.4569323,
        "_source" : {
          "title" : "I see a lot of barking dogs on the road "
        }
      },
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.42221838,
        "_source" : {
          "title" : "My dog barks"
        }
      }
    ]
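
The change in ranking comes from which sub-fields each document matches. The Python sketch below imitates the two analyzers very roughly: the stop-word list and the suffix stripping are toy stand-ins (assumptions for illustration), not the real english analyzer, which uses a proper stemmer and a longer stop-word list.

```python
# A small subset of English stop words, for illustration only.
STOP_WORDS = {"a", "of", "on", "the"}


def standard_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def toy_stem(token):
    """Crude suffix stripping; the real english analyzer uses a proper stemmer."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def english_tokens(text):
    """Rough stand-in for the english analyzer: drop stop words, then stem."""
    return [toy_stem(t) for t in standard_tokens(text) if t not in STOP_WORDS]


def matching_fields(query, doc_text):
    """Return which of the two fields share at least one token with the query."""
    fields = []
    if set(english_tokens(query)) & set(english_tokens(doc_text)):
        fields.append("title")
    if set(standard_tokens(query)) & set(standard_tokens(doc_text)):
        fields.append("title.std")
    return fields


docs = {1: "My dog barks", 2: "I see a lot of barking dogs on the road "}
for doc_id, text in docs.items():
    print(doc_id, english_tokens(text), matching_fields("barking dogs", text))
```

In this sketch document 1 matches only through the stemmed title field, while document 2 also matches title.std on the exact words, so most_fields adds a second score for it.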

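Solving it with multi-field matching

  • Use the broad-matching title field to include as many documents as possible and improve recall, and use the title.std field as a signal to push the more relevant documents to the top of the results.
  • Each field's contribution to the final score can be controlled with a custom boost value. For example, boosting title makes it more important, which also reduces the influence of the other signal fields.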
GET titles/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "type": "most_fields", 
      "fields": ["title^10","title.std"]
    }
  }
}
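
As a rough model of what title^10 does, the Python sketch below treats a most_fields score as the sum of the per-field scores, each multiplied by its boost. This is a simplification of the real scoring, and the function name and sample scores are hypothetical.

```python
def most_fields_score(field_scores, boosts=None):
    """Sum the per-field scores, multiplying each by its boost (default 1.0)."""
    boosts = boosts or {}
    return sum(score * boosts.get(field, 1.0) for field, score in field_scores.items())


# Hypothetical per-field scores for one document.
doc_scores = {"title": 1.0, "title.std": 2.0}

plain = most_fields_score(doc_scores)
boosted = most_fields_score(doc_scores, boosts={"title": 10})

# Share of the total that comes from the title.std signal field.
print(doc_scores["title.std"] / plain)    # 2.0 out of 3.0
print(doc_scores["title.std"] / boosted)  # 2.0 out of 12.0
```

Boosting title leaves the title.std score unchanged in absolute terms but shrinks its share of the total, which is the sense in which the other signal fields lose influence.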

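Cross-field search

  • most_fields cannot use operator to require that all terms appear somewhere across the fields: operator is applied to each field separately, so with and every term would have to appear in the same field
  • copy_to can solve this, but it needs extra storage space
  • cross_fields supports operator across fields
  • Compared with copy_to, one advantage is that individual fields can be boosted at search time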
PUT address/_doc/1
{
  "street":"5 Poland Street",
  "city" : "London",
  "country":"United Kingdom",
  "postcode" : "W1V 3DG"
}

POST address/_search
{
  "query":{
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "cross_fields",  // with most_fields this query returns no hits
      "operator": "and", 
      "fields": ["street","city","country","postcode"]
    }
  }
}
// Response
"hits" : [
      {
        "_index" : "address",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.8630463,
        "_source" : {
          "street" : "5 Poland Street",
          "city" : "London",
          "country" : "United Kingdom",
          "postcode" : "W1V 3DG"
        }
      }
    ]
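
The difference between the two behaviours of operator and can be sketched in Python. The helper names are hypothetical and the tokenizer is a simple lowercase-and-split stand-in for the standard analyzer; the sketch models only whether a document matches, not how it is scored.

```python
def tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return set(text.lower().split())


def field_centric_and(query, doc, fields):
    """All query terms must appear within one single field (most_fields style)."""
    terms = tokens(query)
    return any(terms <= tokens(doc[field]) for field in fields)


def term_centric_and(query, doc, fields):
    """Each query term must appear in at least one of the fields (cross_fields style)."""
    combined = set()
    for field in fields:
        combined |= tokens(doc[field])
    return tokens(query) <= combined


address = {
    "street": "5 Poland Street",
    "city": "London",
    "country": "United Kingdom",
    "postcode": "W1V 3DG",
}
fields = ["street", "city", "country", "postcode"]

print(field_centric_and("Poland Street W1V", address, fields))  # False
print(term_centric_and("Poland Street W1V", address, fields))   # True
```

No single field contains all three terms, so the field-centric check fails, while the term-centric check finds "poland" and "street" in street and "w1v" in postcode.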
This work is licensed under a CC license; reposts must credit the author and link to this article.
