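With the earlier work done, the data-map records have been formatted and ingested, and our data-map project is now in its final stage. What it still needs is an interface that retrieves the data and outputs it in a fixed format: an Elasticsearch search retrieves the data, and a PHP interface formats it for output.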
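1. Elasticsearch search
- Search conditions: time and space
- Output: number of users
For example: within one hour and within China, the number of users with activity at each latitude/longitude coordinate.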
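The "within one hour" condition has to be expressed as a pair of epoch-millisecond bounds for a `range` filter on `@timestamp`. Below is a minimal PHP sketch. The function name `lastHourWindow` is hypothetical. The end time is the one used in the sample query that follows.

```php
<?php
// Hypothetical helper: build the @timestamp range bounds for the last N hours,
// in the epoch_millis format the range filter below expects.
function lastHourWindow(int $endMillis, int $hours = 1): array
{
    $startMillis = $endMillis - $hours * 60 * 60 * 1000; // hours -> milliseconds
    return ['gte' => $startMillis, 'lte' => $endMillis, 'format' => 'epoch_millis'];
}

$window = lastHourWindow(1542695793461);
echo json_encode($window), PHP_EOL;
// prints {"gte":1542692193461,"lte":1542695793461,"format":"epoch_millis"}
```

For "now", `(int) (microtime(true) * 1000)` gives epoch milliseconds. The interface later in this post uses `time() * 1000`, which is only accurate to the second.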
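From this requirement we can derive the corresponding Elasticsearch search request. The bounding box in this sample is illustrative only: longitudes -34.45 to 34.45 do not cover China.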
```json
{
  "size": 0,
  "aggs": {
    "filter_agg": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": { "lat": 90, "lon": -34.453125 },
            "bottom_right": { "lat": -90, "lon": 34.453125 }
          }
        }
      },
      "aggs": {
        "2": {
          "geohash_grid": { "field": "location", "precision": 2 },
          "aggs": {
            "3": { "geo_centroid": { "field": "location" } }
          }
        }
      }
    }
  },
  "stored_fields": ["*"],
  "docvalue_fields": ["@timestamp"],
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": 1542692193461,
              "lte": 1542695793461,
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}
```
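- `size: 0` means no individual hits are returned, only the aggregation results. It does not mean "no pagination".
- `query` is the main search condition. Its required condition is the time range, so it matches all documents within that period.
- `aggs` holds the aggregations. The `filter` aggregation inside it works like a WHERE clause in SQL. Its `geo_bounding_box` condition keeps only points inside the rectangle defined by the top-left and bottom-right latitude/longitude corners.
- Aggregations can be nested: each inner aggregation works on the buckets produced by the one above it.
- `geohash_grid` computes each point's geohash at the chosen precision and groups nearby points into the same cell. `field` is the geo_point field to aggregate on. `precision` is the geohash length (1 to 12), not a distance in km; at precision 2 a cell is roughly 1,252 km × 624 km at the equator.
- Finally, `geo_centroid` computes the centroid of the points in each cell. It returns that centroid under the key `location`, together with a `count`.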
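A minimal geohash encoder in PHP makes the `geohash_grid` bucketing concrete. It is for illustration only, since Elasticsearch computes the cell keys itself, and `geohashEncode` is a hypothetical name. The standard algorithm works like this:
- It halves the longitude and latitude ranges alternately, starting with longitude.
- It records one bit per halving.
- It emits one base-32 character per 5 bits.

For the sample response below, encoding each bucket's centroid at precision 2 reproduces that bucket's `key`.

```php
<?php
// Illustrative geohash encoder (Elasticsearch does this internally).
function geohashEncode(float $lat, float $lon, int $precision): string
{
    $base32 = '0123456789bcdefghjkmnpqrstuvwxyz';
    $latRange = [-90.0, 90.0];
    $lonRange = [-180.0, 180.0];
    $hash = '';
    $bits = 0;       // bits collected for the current character
    $value = 0;      // 5-bit value of the current character
    $evenBit = true; // bits alternate, starting with longitude

    while (strlen($hash) < $precision) {
        if ($evenBit) {
            $mid = ($lonRange[0] + $lonRange[1]) / 2;
            if ($lon >= $mid) { $value = ($value << 1) | 1; $lonRange[0] = $mid; }
            else { $value <<= 1; $lonRange[1] = $mid; }
        } else {
            $mid = ($latRange[0] + $latRange[1]) / 2;
            if ($lat >= $mid) { $value = ($value << 1) | 1; $latRange[0] = $mid; }
            else { $value <<= 1; $latRange[1] = $mid; }
        }
        $evenBit = !$evenBit;
        if (++$bits === 5) { // 5 bits make one base-32 character
            $hash .= $base32[$value];
            $bits = 0;
            $value = 0;
        }
    }
    return $hash;
}

// Bucket centroids from the sample response
echo geohashEncode(48.58949514372008, 7.584022147181843, 2), PHP_EOL;   // u0
echo geohashEncode(54.420127907268785, -3.120888938036495, 2), PHP_EOL; // gc
echo geohashEncode(42.32862451614172, 3.7518564593602917, 2), PHP_EOL;  // sp
```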
The response looks like this:
```json
{
  "took": 428,
  "timed_out": false,
  "_shards": {
    "total": 131,
    "successful": 126,
    "skipped": 121,
    "failed": 5,
    "failures": [
      {
        "shard": 0,
        "index": "elastalert_status_status",
        "node": "w10b9zEBRpuUEQsWvNqEig",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to find geo_point field [location]",
          "index_uuid": "Dm4dpUtTTHitYN-TZFC-1g",
          "index": "elastalert_status_status"
        }
      }
    ]
  },
  "hits": { "total": 360942, "max_score": 0, "hits": [] },
  "aggregations": {
    "filter_agg": {
      "2": {
        "buckets": [
          { "3": { "location": { "lat": 48.58949514372008, "lon": 7.584022147181843 }, "count": 252 }, "key": "u0", "doc_count": 252 },
          { "3": { "location": { "lat": 54.420127907268785, "lon": -3.120888938036495 }, "count": 181 }, "key": "gc", "doc_count": 181 },
          { "3": { "location": { "lat": 42.32862451614172, "lon": 3.7518564593602917 }, "count": 67 }, "key": "sp", "doc_count": 67 },
          { "3": { "location": { "lat": 45.40799999143928, "lon": 11.88589995726943 }, "count": 21 }, "key": "u2", "doc_count": 21 },
          { "3": { "location": { "lat": 46.65579996071756, "lon": 32.61779992841184 }, "count": 1 }, "key": "u8", "doc_count": 1 }
        ]
      },
      "doc_count": 522
    }
  }
}
```
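- `aggregations` holds the data we actually need. The `_shards.failures` entry comes from an index without a `location` geo_point field. This query ran against every index; the PHP interface below restricts it to `logstash-*`.
- In each bucket:
  - `key` is the geohash cell.
  - `location` is the centroid of the points in that cell.
  - The `count` next to it is the number of documents in the cell, the same as `doc_count`.
- The cell size is set by `precision` and is not a fixed 2 km × 2 km. These are document counts (logged operations). They equal user counts only if each user contributes one document; counting distinct users would need a `cardinality` sub-aggregation.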
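The sketch below checks the response structure. It decodes a trimmed copy of the sample response, keeping only the fields used here, and walks the buckets. The bucket counts add up to `filter_agg.doc_count` (522). That total is far below `hits.total` (360942), because documents outside the bounding box or without a `location` value are not counted by the filter aggregation.

```php
<?php
// Trimmed copy of the sample response (only the fields used below)
$json = <<<'JSON'
{
  "hits": { "total": 360942 },
  "aggregations": { "filter_agg": { "doc_count": 522, "2": { "buckets": [
    { "key": "u0", "doc_count": 252, "3": { "location": { "lat": 48.58949514372008, "lon": 7.584022147181843 }, "count": 252 } },
    { "key": "gc", "doc_count": 181, "3": { "location": { "lat": 54.420127907268785, "lon": -3.120888938036495 }, "count": 181 } },
    { "key": "sp", "doc_count": 67, "3": { "location": { "lat": 42.32862451614172, "lon": 3.7518564593602917 }, "count": 67 } },
    { "key": "u2", "doc_count": 21, "3": { "location": { "lat": 45.40799999143928, "lon": 11.88589995726943 }, "count": 21 } },
    { "key": "u8", "doc_count": 1, "3": { "location": { "lat": 46.65579996071756, "lon": 32.61779992841184 }, "count": 1 } }
  ] } } }
}
JSON;

$response = json_decode($json, true);
$filterAgg = $response['aggregations']['filter_agg'];

$total = 0;
foreach ($filterAgg['2']['buckets'] as $bucket) {
    $centroid = $bucket['3']['location']; // geo_centroid result
    printf("%s  lat=%.4f lon=%.4f  docs=%d\n",
        $bucket['key'], $centroid['lat'], $centroid['lon'], $bucket['doc_count']);
    $total += $bucket['doc_count'];
}
echo "sum of buckets: $total, filter_agg.doc_count: {$filterAgg['doc_count']}", PHP_EOL;
```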
We now know how to write an Elasticsearch search request that returns what we need. Next, we write the corresponding PHP interface to format and output the data.
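2. Writing the interface
- Use the Elasticsearch PHP API.
- Define the input parameters: time range and spatial range.
- Define the output data structure, then format the data for output.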
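Query-string values arrive as strings, so it helps to cast and range-check them before they go into the query. The sketch below assumes the same parameter names and defaults as the script that follows. The helper `readGeoParams` is hypothetical, and so are its clamping and bound-swapping behavior; neither is part of the original interface.

```php
<?php
// Hypothetical helper: read GET-style parameters with the script's defaults,
// cast them to numbers and clamp coordinates to valid ranges.
function readGeoParams(array $get, int $nowMillis): array
{
    $clamp = static fn(float $v, float $min, float $max): float => max($min, min($max, $v));

    $end = isset($get['end_time']) ? (int) $get['end_time'] : $nowMillis;
    $start = isset($get['start_time']) ? (int) $get['start_time'] : $end - 15 * 60 * 1000;
    if ($start > $end) {
        [$start, $end] = [$end, $start]; // illustrative choice: tolerate swapped bounds
    }

    return [
        'field'            => $get['field'] ?? 'location',
        'top_left_lat'     => $clamp((float) ($get['top_left_lat'] ?? 90), -90, 90),
        'top_left_lon'     => $clamp((float) ($get['top_left_lon'] ?? -180), -180, 180),
        'bottom_right_lat' => $clamp((float) ($get['bottom_right_lat'] ?? -90), -90, 90),
        'bottom_right_lon' => $clamp((float) ($get['bottom_right_lon'] ?? 180), -180, 180),
        'start_time'       => $start,
        'end_time'         => $end,
    ];
}

// In the interface this would be called as readGeoParams($_GET, time() * 1000)
print_r(readGeoParams(['top_left_lat' => '120', 'start_time' => '2000', 'end_time' => '1000'], 0));
```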
The code, with comments:
```php
<?php
/**
 * Created by PhpStorm.
 * User: ekisong
 * Date: 2018/11/13
 * Time: 15:55
 */
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

// Show all errors while developing
ini_set('display_errors', 'on');
error_reporting(E_ALL);

header('Content-Type: application/json; charset=utf-8');

// Create the Elasticsearch client
$client = ClientBuilder::create()->setHosts(['localhost:9200'])->build();

// Name of the geo_point field to filter and aggregate on, default "location"
$fieldName = $_GET['field'] ?? 'location';
// Bounding box: top-left latitude, default 90
$topLeftLat = (float) ($_GET['top_left_lat'] ?? 90);
// Bounding box: top-left longitude, default -180
$topLeftLon = (float) ($_GET['top_left_lon'] ?? -180);
// Bounding box: bottom-right latitude, default -90
$bottomRightLat = (float) ($_GET['bottom_right_lat'] ?? -90);
// Bounding box: bottom-right longitude, default 180
$bottomRightLon = (float) ($_GET['bottom_right_lon'] ?? 180);
// End of the time range (epoch milliseconds), default now
$endTime = isset($_GET['end_time']) ? (int) $_GET['end_time'] : time() * 1000;
// Start of the time range (epoch milliseconds), default 15 minutes before the end
$startTime = isset($_GET['start_time']) ? (int) $_GET['start_time'] : $endTime - 15 * 60 * 1000;

// Build the search body
$body = [
    'size' => 0, // no hits, aggregations only
    'query' => [
        'bool' => [
            'must' => [
                [
                    'range' => [
                        '@timestamp' => [
                            'gte' => $startTime,
                            'lte' => $endTime,
                            'format' => 'epoch_millis',
                        ],
                    ],
                ],
            ],
        ],
    ],
    'aggs' => [
        'filter_agg' => [
            // Keep only points inside the bounding box (uses the requested field)
            'filter' => [
                'geo_bounding_box' => [
                    $fieldName => [
                        'top_left' => ['lat' => $topLeftLat, 'lon' => $topLeftLon],
                        'bottom_right' => ['lat' => $bottomRightLat, 'lon' => $bottomRightLon],
                    ],
                ],
            ],
            'aggs' => [
                // Group points into geohash cells; precision is the geohash length (1-12)
                '2' => [
                    'geohash_grid' => ['field' => $fieldName, 'precision' => 1],
                    'aggs' => [
                        // Centroid of the points in each cell
                        '3' => ['geo_centroid' => ['field' => $fieldName]],
                    ],
                ],
            ],
        ],
    ],
];

// Search parameters
$params = [
    'index' => 'logstash-*',
    'body' => $body,
];

// Raw Elasticsearch response; the client throws on connection or request errors
try {
    $response = $client->search($params);
} catch (Exception $e) {
    echo json_encode(['data' => [], 'error_msg' => $e->getMessage(), 'flag' => 0]);
    exit();
}
$buckets = $response['aggregations']['filter_agg']['2']['buckets'] ?? [];

// Format the data
$data = [];
foreach ($buckets as $bucket) {
    // geo_centroid returns its point under the key "location",
    // whatever the aggregated field is called
    $centroid = $bucket['3']['location'];
    $data[] = [
        'count' => $bucket['doc_count'],
        'geometry' => [
            'type' => 'Point',
            // GeoJSON order: [longitude, latitude]
            'coordinates' => [$centroid['lon'], $centroid['lat']],
        ],
    ];
}

$result = ['data' => $data, 'error_msg' => '', 'flag' => 1];
if (empty($data)) {
    $result['error_msg'] = 'no data';
    $result['flag'] = 0;
}
// Final output
echo json_encode($result);
exit();
```
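A caller passes the time range and spatial range as query-string parameters. The sketch below builds such a request URL. The endpoint `http://localhost/geo_map.php` is a hypothetical placeholder. The bounding box is a rough, approximate box around China, paired with the one-hour window from the sample query.

```php
<?php
// Hypothetical endpoint; replace with wherever the script above is deployed.
const ENDPOINT = 'http://localhost/geo_map.php';

function buildRequestUrl(array $params): string
{
    return ENDPOINT . '?' . http_build_query($params);
}

// Approximate bounding box around China, plus a one-hour window in epoch ms
$url = buildRequestUrl([
    'top_left_lat'     => '53.6',
    'top_left_lon'     => '73.5',
    'bottom_right_lat' => '18.1',
    'bottom_right_lon' => '134.8',
    'start_time'       => 1542692193461,
    'end_time'         => 1542695793461,
]);
echo $url, PHP_EOL;

// A client would then fetch and decode the result, for example:
// $result = json_decode(file_get_contents($url), true);
// if ($result['flag'] === 1) { /* use $result['data'] */ }
```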
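The H5 page's map plugin requires a specific data format. The `data` array in the final output looks like this. In this sample the coordinates are strings, while the code above emits them as JSON numbers.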
```json
[{
    "count": 6,
    "geometry": {
        "type": "Point",
        "coordinates": ["116.395645", "39.929986"]
    }
}, {
    "count": 6,
    "geometry": {
        "type": "Point",
        "coordinates": ["121.487899", "31.249162"]
    }
}, {
    "count": 5,
    "geometry": {
        "type": "Point",
        "coordinates": ["117.210813", "39.14393"]
    }
}, {
    "count": 4,
    "geometry": {
        "type": "Point",
        "coordinates": ["106.530635", "29.544606"]
    }
}]
```
With that, the data side of our data-map project is wrapped up for now.
Reference:
www.elastic.co/guide/en/elasticsearch/reference/current/search.html