Encountering ElasticSearch (4)

Posted by daxuesheng on 2021-09-09
I hope this helps you if you have also run into Elasticsearch, and that it leads to more discussion between us.
Version used: 6.4.3
1 - Using the Java Client (Part 2)
  Bulk insert
  Aggregation queries
  scroll-scan

Bulk insert

For fast data import, e.g. when reindexing, bulk insert is the usual choice:
    @Override
    public void bulk(List<CometIndex> list) {
        // bulk-insert the documents
        BulkRequest request = new BulkRequest();
        try {
            for (CometIndex cometIndex : list) {
                request.add(new IndexRequest(CometIndexKey.INDEX_NAME, CometIndexKey.INDEX_NAME)
                        .source(objectMapper.writeValueAsString(cometIndex), XContentType.JSON));
            }

            BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT);
            // The bulk response provides a quick way to check whether one or more operations failed:
            if (bulkResponse.hasFailures()) {
                log.warn("some bulk operations failed: {}", bulkResponse.buildFailureMessage());
            }
            TimeValue took = bulkResponse.getTook();
            log.info("[bulk insert took]: {} ({} ms, {} s)", took, took.getMillis(), took.getSeconds());
            // to inspect every per-item result, iterate over the response:
            /*for (BulkItemResponse bulkItemResponse : bulkResponse) {
                if (bulkItemResponse.isFailed()) {
                    BulkItemResponse.Failure failure = bulkItemResponse.getFailure();
                }
            }*/
        } catch (Exception e) {
            log.error("bulk insert failed", e);
        }
    }
    
    @Test
    public void bulkAdd() {
        List<CometIndex> list = new ArrayList<>();
        int count = 0;
        for (int i = 0; i < 1000; i++) {
            CometIndex cometIndex = new CometIndex();
            cometIndex.setCometId((long) i);
            cometIndex.setAuthor("心機boy");
            cometIndex.setCategory("movie");
            cometIndex.setContent("肖申克的救贖" + i);
            cometIndex.setDescription("肖申克的救贖滿分");
            cometIndex.setEditor("cctv");
            cometIndex.setTitle("肖申克的救贖" + i);
            cometIndex.setCreateTime(new Date());
            list.add(cometIndex);
            count++;
            if (count % 100 == 0) { // flush every 100 documents; 1000 is a multiple of 100, so no tail batch is left behind
                searchService.bulk(list);
                list.clear();
            }
        }
    }
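One thing to watch in a loop like the one above: if the total count is not a multiple of the batch size, the last partial batch is never flushed. A small helper that always flushes the tail is sketched below (the class and method names are mine, not part of the original code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: hand documents to a sink in fixed-size batches,
// and always flush the trailing partial batch so nothing is lost.
public final class BatchFlusher {
    public static <T> int flushInBatches(List<T> docs, int batchSize, Consumer<List<T>> sink) {
        List<T> buffer = new ArrayList<>(batchSize);
        int flushes = 0;
        for (T doc : docs) {
            buffer.add(doc);
            if (buffer.size() == batchSize) {
                sink.accept(new ArrayList<>(buffer)); // hand off a copy, then reuse the buffer
                buffer.clear();
                flushes++;
            }
        }
        if (!buffer.isEmpty()) { // the tail batch a count % batchSize == 0 check would miss
            sink.accept(new ArrayList<>(buffer));
            flushes++;
        }
        return flushes;
    }
}
```

With 1050 documents and a batch size of 100, this performs 11 flushes: ten full batches plus a tail of 50.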

Aggregation queries

1 - Metric aggregations
  Compute a value over a set of documents, analogous to MySQL's MIN(), MAX(), STDDEV(), SUM(), etc.
Get the maximum value:
GET _search
{
    "aggs":{
        "max_id":{
            "max":{
                "field":"cometId" 
            }
        }
    }
}
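Conceptually, the `max` metric aggregation is just a maximum taken over one field. A plain-Java sketch of the same computation, over hypothetical in-memory cometId values (the class and method names are mine):

```java
import java.util.*;

// Plain-Java analogue of the `max` metric aggregation:
// the maximum of the cometId field across a set of documents.
public final class MaxAggDemo {
    public static OptionalLong maxCometId(List<Long> cometIds) {
        // empty input yields OptionalLong.empty(), mirroring a "null" max over zero docs
        return cometIds.stream().mapToLong(Long::longValue).max();
    }
}
```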
2 - Bucketing aggregations
  Documents that satisfy a given rule are grouped into buckets, each associated with a key, similar to MySQL's GROUP BY.
Aggregate by category:
GET _search
{
    "aggs" : {
        "category_agg" : {
            "terms" : {
                "field" : "category",
                "order" : { "_count" : "desc" }
            }
        }
    }
}
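What the `terms` aggregation computes is essentially a group-by with counts, ordered by count descending. A plain-Java equivalent over hypothetical in-memory category values makes the semantics concrete (class and method names are mine):

```java
import java.util.*;
import java.util.stream.Collectors;

// Plain-Java analogue of a `terms` aggregation on `category`,
// ordered by document count descending (the "_count": "desc" order above).
public final class TermsAggDemo {
    public static LinkedHashMap<String, Long> termsByCount(List<String> categories) {
        return categories.stream()
                .collect(Collectors.groupingBy(c -> c, Collectors.counting()))
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new)); // preserve descending order
    }
}
```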
Group by category, then group by editor within each category:
GET _search
{
    "aggs" : {
        "category_agg" : {
            "terms" : {
                "field" : "category",
                "order" : { "_count" : "desc" }
            },
            "aggs" : {
                "author_agg" : {
                    "terms" : { "field" : "editor" }
                }
            }
        }
    }
}
    @Override
    public Map<Object, Long> aggregateCategory() {
        // terms aggregation on category: the document count for each category

        Map<Object, Long> result = new HashMap<>();

        try {
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

            TermsAggregationBuilder aggregation = AggregationBuilders.terms(CometIndexKey.CATEGORY_AGG)
                    .field(CometIndexKey.CATEGORY).order(BucketOrder.aggregation("_count", false));

            // attach the aggregation; size(0) skips the hits, we only want the buckets
            searchSourceBuilder.aggregation(aggregation).size(0);
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices(CometIndexKey.INDEX_NAME);
            searchRequest.source(searchSourceBuilder);
            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

            Aggregations aggregations = searchResponse.getAggregations();
            Terms byCategoryAggregation = aggregations.get(CometIndexKey.CATEGORY_AGG);

            if (byCategoryAggregation.getBuckets() != null && !byCategoryAggregation.getBuckets().isEmpty()) {

                List<? extends Terms.Bucket> list = byCategoryAggregation.getBuckets();

                for (Terms.Bucket bucket : list) {
                    log.info("key:{}, value:{}", bucket.getKey(), bucket.getDocCount());
                    result.put(bucket.getKey(), bucket.getDocCount());
                }
            }

        } catch (Exception e) {
            log.error("aggregation failed", e);
            return result;
        }
        return result;
    }

    @Test
    public void testAgg(){

        Map<Object,Long>result=searchService.aggregateCategory();

        for (Map.Entry<Object,Long> entry : result.entrySet()) {
            System.out.println("Key = " + entry.getKey() + ", Value = " + entry.getValue());
        }
    }
There are many more aggregation types; this is just one simple example. Try the others out in Dev Tools.

scroll-scan

1 - The limitation of from-size: the more data there is (and the deeper the page), the worse it performs.
2 - scroll:
Scrolled search takes a point-in-time snapshot: an initial search caches all results matching the query, and subsequent requests keep pulling data from that snapshot in batches until nothing is left. Changes made to the index afterwards do not affect the results.
3 - scan:
The most expensive part of deep pagination is globally sorting the results; if we turn sorting off, all documents can be returned with very little overhead.
The scan search type told ES not to sort, and to simply return the next batch from every shard that still had results. Note that the scan search type was removed in ES 5.x; on 6.4.3 the equivalent is a scroll request sorted by `_doc`.
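A back-of-the-envelope calculation shows why from-size degrades with depth: each shard must fetch `from + size` documents, and the coordinating node merge-sorts all of them before discarding everything below `from`. A tiny sketch with hypothetical numbers (the class and method names are mine):

```java
// Rough cost model for from/size deep paging: every shard returns
// (from + size) candidate documents, and the coordinating node must
// sort shards * (from + size) of them to produce one page.
public final class DeepPagingCost {
    public static long docsSortedOnCoordinator(int shards, int from, int size) {
        return (long) shards * (from + size);
    }
}
```

For page 1000 (size 10, so from = 9990) on a 5-shard index, the coordinating node sorts 50,000 documents just to return 10, which is why scroll exists for bulk reads.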
    @Override
    public void scrollScan() {
        // scroll query: read all documents in batches
        try {

            final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices(CometIndexKey.INDEX_NAME); // restrict to our index
            searchRequest.scroll(scroll);
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            searchSourceBuilder.query(QueryBuilders.matchAllQuery()); // match every document
            searchSourceBuilder.size(1000);
            searchRequest.source(searchSourceBuilder);

            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            String scrollId = searchResponse.getScrollId(); // scrollId of the first batch

            SearchHits searchHits = searchResponse.getHits();
            log.info("scrollId:{}, total:{}", scrollId, searchHits.getTotalHits());
            SearchHit[] hits = searchHits.getHits();

            while (hits != null && hits.length > 0) {

                for (SearchHit hit : hits) {
                    // do something with the SearchHit
                    Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                    log.info("title:{}", sourceAsMap.get(CometIndexKey.TITLE));
                }

                log.info("scrollId:{}", scrollId);
                SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId); // fetch the next batch by scrollId
                scrollRequest.scroll(scroll);
                searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
                scrollId = searchResponse.getScrollId(); // scrollId for the following batch
                log.info("scrollId:{}", scrollId);
                hits = searchResponse.getHits().getHits();

            }

            // release the search context
            ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
            clearScrollRequest.addScrollId(scrollId);
            ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
            boolean succeeded = clearScrollResponse.isSucceeded();

            log.info("ClearScrollRequest result:{}", succeeded);
        } catch (Exception e) {
            log.error("scroll scan failed", e);
        }

    }

    @Test
    public void scrollScan(){
        searchService.scrollScan();
    }

Now that we have these APIs, we can use the bulk-insert API to generate data and then run our tests.
Later posts will show how we put this to use.
  • The full code will be published on GitHub once the series is complete.

From "ITPUB Blog". Link: http://blog.itpub.net/4692/viewspace-2817572/. Please credit the source when reposting; otherwise legal action may be taken.
