HybridDBforPostgreSQL軌跡相似(伴隨分析)

德哥發表於2018-09-15

原文網址 : https://flycode.co/archives/185367

背景

《阿里雲 PostgreSQL 產品生態；案例、開發實踐、管理實踐、學習資料、學習視訊 – 珍藏級》

以上有個CASE是講：如何找出在同一輛車內的人，實際上就是通過車輛、人物的軌跡點資料，進行多輪的求交集，將結果收斂到一個較小的範圍。

Greenplum伴隨分析測試

1、建立測試表

create table tbl_pos (uid int8, phonenum text, ts timestamp, pos geometry, ghash text);

2、寫入測試資料3億條左右

insert into tbl_pos  
select uid, md5(uid::text), ts, pos, st_geohash(pos,8) from  
(  
  select   
    (random()*10000000)::int8 as uid,   
    ts,   
    st_setsrid(st_makepoint(120.2+0.5-random(), 30.3+0.5-random()), 4326) as pos   
  from   
  generate_series(`2018-01-01`::timestamp, `2019-01-01`::timestamp, interval `0.1 sec`)   
t (ts)  
) t;

3、重複生成資料到100億

postgres=> insert into tbl_pos select * from tbl_pos;  
INSERT 0 315360001  
postgres=> 	iming  
Timing is on.  
postgres=> insert into tbl_pos select * from tbl_pos;  
INSERT 0 630720002  
Time: 8992.130 ms (00:08.992)  
postgres=> insert into tbl_pos select * from tbl_pos;  
INSERT 0 1261440004  
Time: 19517.465 ms (00:19.517)  
postgres=> insert into tbl_pos select * from tbl_pos;  
INSERT 0 2522880008  
Time: 37244.890 ms (00:37.245)  
postgres=> insert into tbl_pos select * from tbl_pos;  
INSERT 0 5045760016  
Time: 73391.309 ms (01:13.391)

4、新增索引

create index idx_tbl_pos_1 on tbl_pos(ghash);  
create index idx_tbl_pos_2 on tbl_pos using gist(pos);  
create index idx_tbl_pos_3 on tbl_pos (ts);

5、收集統計資訊

postgres=> vacuum analyze tbl_pos;

6、伴隨分析

postgres=> select count(*) from tbl_pos where ghash >= `wtmm1k` and ghash < `wtmm1`||chr(ascii(`k`)+1);  
 count    
--------  
 609632  
(1 row)  
  
Time: 60.275 ms  
  
postgres=> select count(*) from tbl_pos where ts between `2018-01-01 00:06:25.2` and `2018-01-01 01:06:25.2` and ghash >= `wtmmmm` and ghash < `wtmmm`||chr(ascii(`m`)+1);  
 count   
-------  
 69984  
(1 row)  
  
Time: 70.161 ms  
  
  
postgres=> explain select count(*) from tbl_pos where ts between `2018-01-01 00:06:25.2` and `2018-01-01 01:06:25.2` and ghash >= `wtmmmm` and ghash < `wtmmm`||chr(ascii(`m`)+1);  
                                                                                                     QUERY PLAN                                                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=10681940.85..10681940.86 rows=1 width=8)  
   ->  Gather Motion 1024:1  (slice1; segments: 1024)  (cost=10681930.57..10681940.83 rows=1 width=8)  
         ->  Aggregate  (cost=10681930.57..10681930.58 rows=1 width=8)  
               ->  Bitmap Heap Scan on tbl_pos  (cost=10520118.62..10681889.69 rows=16 width=0)  
                     Recheck Cond: ts >= `2018-01-01 00:06:25.2`::timestamp without time zone AND ts <= `2018-01-01 01:06:25.2`::timestamp without time zone AND ghash >= `wtmmmm`::text AND ghash < `wtmmmn`::text  
                     ->  BitmapAnd  (cost=10520118.62..10520118.62 rows=2 width=0)  
                           ->  Bitmap Index Scan on idx_tbl_pos_3  (cost=0.00..1751.33 rows=161 width=0)  
                                 Index Cond: ts >= `2018-01-01 00:06:25.2`::timestamp without time zone AND ts <= `2018-01-01 01:06:25.2`::timestamp without time zone  
                           ->  Bitmap Index Scan on idx_tbl_pos_1  (cost=0.00..10518358.87 rows=97466 width=0)  
                                 Index Cond: ghash >= `wtmmmm`::text AND ghash < `wtmmmn`::text  
 Settings:  effective_cache_size=8GB; enable_seqscan=off; gp_statistics_use_fkeys=on; optimizer=off; work_mem=64MB  
 Optimizer status: legacy query optimizer  
(12 rows)

在某5天出現在某個範圍的人的交集：

select uid from   
(select distinct uid from tbl_pos where ts between `2018-01-01 00:06:25.2` and `2018-01-01 01:06:25.2` and ghash >= `wtmmmm` and ghash < `wtmmm`||chr(ascii(`m`)+1)) t1  
join  
(select distinct uid from tbl_pos where ts between `2018-02-01 00:06:25.2` and `2018-02-01 01:06:25.2` and ghash >= `wtmmmm` and ghash < `wtmmm`||chr(ascii(`m`)+1)) t2  
using (uid)  
join  
(select distinct uid from tbl_pos where ts between `2018-03-01 00:06:25.2` and `2018-03-01 01:06:25.2` and ghash >= `wtmmmm` and ghash < `wtmmm`||chr(ascii(`m`)+1)) t3  
using (uid)  
join  
(select distinct uid from tbl_pos where ts between `2018-04-01 00:06:25.2` and `2018-04-01 01:06:25.2` and ghash >= `wtmmmm` and ghash < `wtmmm`||chr(ascii(`m`)+1)) t4  
using (uid)  
join  
(select distinct uid from tbl_pos where ts between `2018-05-01 00:06:25.2` and `2018-05-01 01:06:25.2` and ghash >= `wtmmmm` and ghash < `wtmmm`||chr(ascii(`m`)+1)) t5  
using (uid);  
    
  
 uid   
-----  
(0 rows)  
  
Time: 207.750 ms

小結

1、PostgreSQL bitmap scan支援將多個索引合併，在本例中體現在時間、空間、（其他）條件。

《PostgreSQL 資料庫多列複合索引的欄位順序選擇原理》

《PostgreSQL bitmapAnd, bitmapOr, bitmap index scan, bitmap heap scan》

2、除了使用這種暴力求交集的方法，還有別的方法來計算伴隨麼？相似軌跡分析可能也是一個不錯的選擇

參考

《PostgreSQL + PostGIS 時態分析》

《阿里雲 PostgreSQL 產品生態；案例、開發實踐、管理實踐、學習資料、學習視訊 – 珍藏級》

如何判斷兩條軌跡（或曲線）的相似度？
2020-12-01
伴隨症狀伴隨因素迴歸分析相關係數多元分析
2024-11-16
KITTI_00_SPTAM軌跡和KITTI_00_ORB軌跡
2024-11-07
ORB
軌跡資料（三）
2024-11-03
軌跡規劃理解
2020-12-12
根據軌跡分析出使用者家在哪
2021-03-25
Python量化交易進階講堂-視覺化隨機漫步軌跡
2019-06-05
Python視覺化隨機
軌跡聯邦用到的具體公式 + 軌跡模組的設計
2024-11-10
公式
Leaflet 帶箭頭軌跡以及沿軌跡帶方向的動態marker
2021-03-02
聊聊 RocketMQ 訊息軌跡
2023-11-03
MQ
智慧手環軌跡API
2021-11-18
API
JUST技術：如何通過軌跡相似性度量方法，發現新冠易感人群
2020-10-09
《黎之軌跡》發售紀念訪談：《軌跡》系列的巨大分水嶺
2021-09-30
智慧控制基礎實驗2：根軌跡分析法
2020-12-30
網頁中複雜伴隨式動畫的分析
2018-11-28
網頁動畫
not noly go —— 執行軌跡[一]
2021-11-13
Go
GPS軌跡記錄器：myTracks for mac
2024-01-09
Mac
如何繪製完美的滑鼠軌跡
2021-11-03
大事記管理，OA系統讓員工隨時知曉企業發展軌跡
2020-02-09
從行為軌跡進行大資料分析有什麼好處？
2022-12-29
大資料
百度地圖軌跡(Andriod SDK)
2018-10-11
地圖
自動生成相機標定軌跡
2020-11-04
Mapbox 輕鬆規劃起始點軌跡
2024-11-26
cesium製作自己的騎行軌跡
2021-11-03
風場視覺化：繪製軌跡
2022-03-12
視覺化
mapboxgl實現帶箭頭軌跡線
2021-01-04
5分鐘學會物流軌跡地圖API嵌入到頁面中，實現物流軌跡視覺化
2024-03-01
地圖API視覺化
智慧手環運動軌跡API獲取
2021-10-28
API
伴隨疫情的網路安全風險
2020-08-17
求任意矩陣的伴隨矩陣
2024-06-18
矩陣
css3實現橢圓軌跡旋轉
2019-04-21
CSSS3
1個小時快速搭建一個物流系統（包括軌跡查詢，物流動態提醒，軌跡視覺化，寄件）
2020-12-12
視覺化
GPS軌跡記錄程式 myTracks 最新免啟用版
2023-10-26
百度路書實現軌跡回放（標準）
2020-10-20
資料變更白屏化利器-推送軌跡上線
2022-09-05
Falcom 之軌跡：40 年不變的 JRPG 魂（上）
2021-03-25
Falcom 之軌跡：40 年不變的 JRPG 魂（下）
2021-03-29
【論文研讀】通過deep representation learning軌跡聚類
2018-12-19
聚類

HybridDBforPostgreSQL軌跡相似(伴隨分析)

標籤

背景

Greenplum伴隨分析測試

小結

參考

相關文章