海量使用者實時定位和圈人-團圓社會公益系統(位置尋人圈人)

德哥發表於2017-09-17

背景

老人、兒童是最容易走丟的人群，一定要看好老人和小孩，但是萬一走丟了怎麼辦呢？

阿里有一個公益系統，團圓，這個系統是用來幫助釋出走丟人群資訊的，公安通過釋出的走丟人的照片，最後一次的位置資訊，向社會發布。

通過公益平臺的合作伙伴（例如運營商、購物軟體等）可以向最後一次走丟人士出現的位置附近的人推送尋人啟事，調動社會力量幫助尋找丟失人。

為了實現這個目的，需要收集社會人士的實時位置，現在有很多技術可以實現，例如手機基站定位、GPS定位等。

假設有10億手機使用者，使用者的位置實時變動，實時的位置資訊需要更新到資料庫中。每天可能有千億次位置更新。

同時釋出走失資訊後，需要到資料庫中，根據走失位置圈出附近的人。

簡單粗暴設計1 – geometry + GiST空間索引

1、表結構設計：

create table tbl_pos(      
  id int primary key,  -- 使用者ID      
  pos point  -- 使用者實時位置      
);

2、空間索引

create index idx_tbl_pos on tbl_pos using gist(pos);

效能評測

實時更新10億使用者位置，使用insert on conflict語法。

vi test.sql      
      
set id random(1,1000000000)      
insert into tbl_pos values (:id, point(random()*180,random()*90)) on conflict (id) do update set pos=excluded.pos;

使用32個併發，實時生成使用者隨機位置.

nohup pgbench -M prepared -n -r -P 5 -f ./test.sql -c 32 -j 32 -T 120000 > ./pos.log 2>&1 &

1、實時位置更新TPS，約18萬/s。

伺服器負載，伺服器還是非常空閒的，有足夠的資源提供給查詢

top - 01:52:34 up 76 days, 15:32,  2 users,  load average: 33.74, 33.56, 31.47      
Tasks: 1064 total,  34 running, 1030 sleeping,   0 stopped,   0 zombie      
%Cpu(s): 47.6 us,  5.4 sy,  0.0 ni, 46.9 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st      
KiB Mem : 52807456+total, 32911484+free, 10949652 used, 18801006+buff/cache      
KiB Swap:        0 total,        0 free,        0 used. 42997945+avail Mem

2、查詢效能。

在位置更新的同時，測試查詢效能。

假設走失人口最後位置出現在杭州，那麼我們需要查詢在某個平面（例如杭州市）內的點。返回500萬個點（社會使用者），僅需28秒。

使用空間索引，返回速度槓槓的。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl_pos where box(point(1,1), point(25.5,25.5)) @> pos limit 5000000;      
                                                                      QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------      
 Limit  (cost=0.55..412954.11 rows=407872 width=20) (actual time=1.433..27536.623 rows=5000000 loops=1)      
   Output: id, pos      
   Buffers: shared hit=6183117 dirtied=31842      
   ->  Index Scan using idx_tbl_pos on public.tbl_pos  (cost=0.55..412954.11 rows=407872 width=20) (actual time=1.431..26861.352 rows=5000000 loops=1)      
         Output: id, pos      
         Index Cond: (`(25.5,25.5),(1,1)`::box @> tbl_pos.pos)      
         Buffers: shared hit=6183117 dirtied=31842      
 Planning time: 0.353 ms      
 Execution time: 27950.171 ms      
(9 rows)

實際查詢用，可以使用遊標，流式返回。例子

postgres=# begin;      
BEGIN      
postgres=# declare cur cursor for select * from tbl_pos where box(point(1,1), point(25.5,25.5)) @> pos;      
DECLARE CURSOR      
postgres=# fetch 10 from cur;      
    id     |                 pos                       
-----------+-------------------------------------      
 680844515 | (2.08381220698357,1.25674836337566)      
 498274514 | (2.23715107887983,1.27883949782699)      
  72310072 | (2.1013452205807,1.32945269811898)      
 301147261 | (2.12246049195528,1.33455505594611)      
 186462127 | (2.13169047608972,1.24054086394608)      
 726143191 | (2.27320306934416,1.31862969137728)      
 902518425 | (2.27059512399137,1.32658164482564)      
 534516939 | (2.18118946999311,1.29441328346729)      
 329417311 | (2.27630747482181,1.2547113513574)      
 853173913 | (2.28139906190336,1.33868838194758)      
(10 rows)      
      
postgres=# 	iming      
Timing is on.      
      
postgres=# fetch 10 from cur;      
    id     |                 pos                       
-----------+-------------------------------------      
 223759458 | (2.24917919375002,1.31508464924991)      
 215111891 | (2.10541740059853,1.26674327999353)      
 925178989 | (2.08201663568616,1.2974686967209)      
 954808979 | (2.10515496321023,1.32548315450549)      
 514021414 | (2.17867707833648,1.27732987515628)      
 872436892 | (2.22504794597626,1.31386948283762)      
 507169369 | (2.05484946258366,1.30171341821551)      
 317349985 | (2.25962312892079,1.30945896729827)      
 200956423 | (2.10705514065921,1.30409182514995)      
 598969258 | (1.98812280781567,1.30866004619747)      
(10 rows)      
      
Time: 0.306 ms

通過遊標，客戶端可以邊接收，邊發簡訊或者向軟體推送尋人啟事。

實現流式推送，節省寶貴的尋人時間。

簡單粗暴設計2 – geohash + btree索引

團圓系統對空間資料精度要求並不像一些LBS社交軟體那麼高，可以使用GEOHASH+btree索引代替 geometry+GIST空間索引。

轉換方法使用 PostGIS的 st_geohash(geometry, int) 函式。不再贅述。

《PostGIS空間索引(GiST、BRIN、R-Tree)選擇、優化 – 阿里雲RDS PostgreSQL最佳實踐》

優化設計 – schemaless分割槽法

單表十億空間資料，對於查詢來說，前面已經看到了，毫無壓力。但是隨著頻繁的更新，可能到GiST索引的膨脹，膨脹後，PostgreSQL提供了並行建立索引的方法（不影響堵塞，可以在一個列建立同樣的索引），來維護索引。但是10億資料建立索引會變得很久。

為了解決這個問題，建議使用分割槽表。例如將ID雜湊，分成64個分割槽，每個分割槽1500萬左右資料。

在PostgreSQL中，目前效能最好的分割槽是pg_pathman外掛。或者使用schemaless的方式。下面以schemaless為例子。其實在我曾經寫過的另外的案例中也非常常見

《行為、審計日誌 (實時索引/實時搜尋)建模 – 最佳實踐 2》

《PostgreSQL 時序最佳實踐 – 證券交易系統資料庫設計 – 阿里雲RDS PostgreSQL最佳實踐》

下面以geometry + gist空間索引為例講解schemaless分割槽法。

定義基表

postgres=# create table tbl_pos(id int primary key, pos point);    
CREATE TABLE    
postgres=# create index idx_tbl_pos_1 on tbl_pos using gist(pos);    
CREATE INDEX

定義自動建表函式

create or replace function create_schemaless(    
  target name,   -- 目標表名    
  src name       -- 源表名    
) returns void as $$        
declare        
begin        
  execute format(`create table if not exists %I (like %I including all)`, target, src);        
  execute format(`alter table %I inherit %I`, target, src);        
exception when others then        
  return;        
end;        
$$ language plpgsql strict;

定義以schemaless的方式寫資料的函式

建立一個插入資料的函式，使用動態SQL，如果遇到表不存在的錯誤，則呼叫建表函式進行建表。

create or replace function ins_schemaless(    
  id int,   -- id    
  md int,   -- 取模數    
  pos point -- 位置    
) returns void as $$        
declare       
  target name := `tbl_pos_`||mod(id,md) ;      
begin        
  execute format(`insert into %I values (%L, %L) on conflict (id) do update set pos=point_add(%I.pos, point(random()*10-5, random()*10-5))`, target, id, pos, target);       
  -- 為了模擬真實情況，因為人的移動速度有限，即使駕車，飛機（少數情況），所以用了pos=point_add(%I.pos, point(random()*10-5, random()*10-5))這種方法模擬更真實的情況    
  -- 實際場景，請改成pos=excluded.pos    
  exception         
    WHEN SQLSTATE `42P01` THEN         
    perform create_schemaless(target, `tbl_pos`);        
    execute format(`insert into %I values (%L, %L) on conflict (id) do update set pos=point_add(%I.pos, point(random()*10-5, random()*10-5))`, target, id, pos, target);         
    -- 為了模擬真實情況，因為人的移動速度有限，即使駕車，飛機（少數情況），所以用了pos=point_add(%I.pos, point(random()*10-5, random()*10-5))這種方法模擬更真實的情況    
    -- 實際場景，請改成pos=excluded.pos    
end;        
$$ language plpgsql strict;

資料庫端的schemaless會犧牲一部分效能，因為無法使用繫結變數。

如果可能的話，建議業務層實現schemaless(自動拼接表名，自動建表，自動寫入)，以提高效能。

測試功能

postgres=# select ins_schemaless(2,32,point(1,2));    
 ins_schemaless     
----------------    
     
(1 row)    
    
postgres=# select ins_schemaless(1,32,point(1,2));    
 ins_schemaless     
----------------    
     
(1 row)    
    
postgres=# select tableoid::regclass,* from tbl_pos;    
 tableoid  | id |  pos      
-----------+----+-------    
 tbl_pos_2 |  2 | (1,2)    
 tbl_pos_1 |  1 | (1,2)    
(2 rows)

schemaless設計壓測

vi ~/test.sql    
set id random(1,1000000000)    
select ins_schemaless(:id, 32, point(random()*360-180, random()*180-90));    
    
    
nohup pgbench -M prepared -n -r -P 5 -f ./test.sql -c 32 -j 32 -T 120000 > ./pos.log 2>&1 &

效能依舊槓槓的。

125977 tps

小結

1、通過PostgreSQL的空間資料型別、空間索引。加上insert on conflict的特性。實現了單機約18萬行/s的10億使用者的實時位置更新，同時輸出500萬個點的量級，僅需20幾秒。真正實現了團圓公益系統的時效性。

2、採用遊標，流式返回，實現了邊獲取資料，邊向社會各界傳送尋人啟事的目的。

3、另一方面，使用者位置的變更，實際上是有一定過濾性的，比如使用者從辦公室去上個洗手間，雖然位置可能發生了變化，但是非常細微，這種變化在這套系統中可以過濾（不更新），從而減少資料的更新量。

按照現有的測試資料，可以做到每天155億次的更新。假設每10條更新僅有1條是有效更新，那麼實際上可以支援1550億次的MOVE採集。

4、PostgreSQL是一個很有愛心的資料庫系統哦。

5、將來流計算引擎pipelinedb外掛化後，PostgreSQL內部將整合這個流計算引擎，通過流計算引擎，理論上可以輕鬆實現40萬行/s級別的更新速度，每天支撐300多億次的實時位置更新。

6、採用流計算的方法除了提高效能，同時也降低了XID的消耗，在目前32BIT XID的情形下，可以有效的環節FREEZE帶來的負擔。如果不使用流計算，也建議合併更新，例如一個事務中更新若干條，比如100條，那麼一天的事務數就將到了1.5億。

7、參考

https://www.postgresql.org/docs/9.6/static/gist-implementation.html#GIST-BUFFERING-BUILD