恭迎萬億級營銷(圈人)瀟灑的邁入毫秒時代-萬億user_tags級實時推薦系統資料庫設計

德哥發表於2016-12-27

背景

我們僅用了PostgreSQL的兩個小特性，卻解決了業務困擾已久的大問題。

推薦系統是廣告營銷平臺的奶牛，其核心是精準、實時、高效。

這麼多廣告平臺，到底誰家強？誰的核心牛逼？

1. 精準，指對使用者的描述精準，通常需要基於大量的使用者行為資料，經歷深度學習後形成的使用者畫像，或稱之為標籤系統。標籤的準確性關係到推薦的精準度，比如你可能不會對一個正常的年輕人推薦老花眼鏡（當然如果有其他購買意向的標籤來指出他有購買老花眼鏡的慾望除外）。

2. 實時，指標籤的更新實時性，很多標籤是具有非常強的時效性的，比如一次營銷的目標人群，又或者使用者最近瀏覽的一些商品可能是有潛在購買慾望的商品，都具備時效性。如果你的標籤生成是隔天，或者個很多天的，那麼可能已經錯過了推薦時機。因此實時性在推薦系統中是非常重要的。

3. 高效，指基於標籤圈人的動作的效率與併發能力，作為購買廣告的金主，當然是期望他們拿到資料的速度越快越好。並且會有很多人向你的平臺購買廣告，這考驗的是併發能力。

做到以上三點，這樣的廣告平臺才具備一定的競爭力。

除此之外還需要關注的是平臺的成本，包括硬體的成本，開發成本，維護成本等。

下面將以電商的推薦系統為例，介紹推薦系統的資料庫設計與優化技巧。

以及如何讓營銷瀟灑 (低成本，高併發，高效率) 的邁入毫秒時代。

電商推薦系統部分需求介紹

比如一家店鋪，如何找到它的目標消費群體？

要回答這個問題，首先我們需要收集一些資料，比如：

1. 這家店鋪以及其他的同類店鋪的瀏覽、購買群體。

我們在逛電商時，會產生一些行為的記錄，比如在什麼時間，逛了哪些店鋪，看了哪些商品，最後在哪家店鋪購買了什麼商品。

然後，對於單個商店來說，有哪些使用者逛過他們的商店，購買過哪些商品，可以抽取出一部分人群。

2. 得到這些使用者群體後，篩選出有同類消費慾望、或者具備相同屬性的群體。

對這部分人群的屬性進行分析，可以獲得一個更大範圍的群體，從而可以對這部分群體進行營銷。

以上是對電商推薦系統的兩個簡單的推理。

量級

電商的使用者量級，放眼全球，可能會達到幾十億的級別。

店鋪數量，放眼全球，可能會達到千萬級別。

商品數量（細分種類，比如條形碼），放眼全球，可能會達到億級。

店鋪標籤數量，針對單個使用者而言，逛了哪些店，多少次，看了哪些商品，多少次，買了哪些商品等。通常一個人，在一定的時間範圍內，會產生上千的這樣的標籤。

使用者標籤，畫像，10萬級別，這個量級可以描述清楚人的屬性。

根據以上的估算，user_tags可能達到萬億（user幾十億, 店鋪商品瀏覽相關的標籤數千級別）的級別。

高效設計

資料庫設計

我們首先整理一下關鍵因素

使用者ID、瀏覽過的店鋪以及瀏覽次數、瀏覽過的商品以及瀏覽次數、購買的商品以及購買數量。(次數數量可以根據區間，設定為列舉型別，比如0表示100次以下，1表示100到500次，。。。)

這幾個要素很容易從使用者的行為資料中生成，從而當某家店鋪需要做推廣，或者針對某個產品做推廣時，可以結合這些因素，產生一批目標人群。

比如1周內瀏覽護手霜相關商品超過10次的人群。

表結構設計

1. 店鋪、商品編成ID

2. 瀏覽過多少次、購買了多少某個商品

由於每個使用者在某個時間段內，都可能瀏覽或者購買多個店鋪或商品。如果每個商店，每個商品都使用一條記錄來儲存，會產生很多很多的記錄。浪費空間，並且影響查詢效率。

PostgreSQL 支援陣列型別，可以很好的完成這樣的任務，減少儲存，同時支援陣列索引，提高查詢效率。

3. 表結構如下

3.1 範圍表，約幾十或上百條記錄

欄位和描述

class int,    -- 維度,對應使用者標籤表的s1,s2,s3,s4
id int,       -- 偏移量(或者叫列舉值)    
description   -- 描述(例如 1-10000,10001-100000，。。。。。)

3.2 使用者標籤表

uid int primary key,  -- 使用者ID  
s1 int[],  -- 瀏覽過的店鋪以及次數範圍(店鋪ID雜湊 + 範圍表id)     
s2 int[],  -- 瀏覽過的商品以及次數範圍(商品ID雜湊 + 範圍表id)  
s3 int[],  -- 購買的商品以及數量範圍(商品ID雜湊 + 範圍表id)
s4 int[],   -- ....其他維度以此類推 
時間區間1,   -- 比如按天, 每天統計一次，寫入該表

3.3 次數階梯化

瀏覽次數，購買個數都是連續值，為了更好的進行挖掘，建議將其階梯化。

對應3.1的設計，例如1-10000一個階級，10001-100000又一個階級。

例子

軌跡 s1 對應的階梯
1 -> 0
2 -> 1-10
3 -> 11-100
4 -> 101-500
5 -> 501-
...
9 -> 10000+

3.4 將（商品、店鋪ID）與階梯組合成一個新的值 – 方法1

使用text[]例如店鋪ID:OFFSET 表示。text[]陣列效率可能不如整型陣列INT[]，空間也比INT[]要大一點。

如果業務方可以容忍，使用text[]開發工作量小點。

例子

userid|s1|s2|s3
1|{`1:2`, `109:9`}|{`2:2`, `88:19`}|{`2:1`, `88:2`}

含義解釋:

使用者ID：1，
瀏覽了店鋪1（次數階梯=2）、店鋪109（次數階梯9），
瀏覽了商品2（次數階梯=2）、商品88（次數階梯19），
購買了商品2（次數階梯=1）、商品88（次數階梯2）。

3.5 將（商品、店鋪ID）與階梯組合成一個新的值 – 方法2

方法1用了text陣列，方法2將使用int/int8陣列，效率更高。

要使用一個int/int8新值表達兩層含義(原始店鋪、商品ID，以及階梯)，需要公式支援。

公式設計如下，(公式、常量)。

以流量店鋪次數(s1)欄位為例：

新值起始ID = new_start_val = 1                  -- 常量，使用者可以自由指定，但是固定下來了就不要變它
對應維度步長(比如流量店鋪的階梯數) = step = 9   -- 常量，使用者可以自由指定(每個維度的階數可以不一樣)，但是固定下來了就不要變它
店鋪ID = dp_id                                  -- 指原來的店鋪ID
int/int8新值 = new_val                          -- 生成的，帶有兩層含義（店鋪ID，階數）的新值

已知店鋪ID求new_val(寫入、查詢過程):

$new_val = new_start_val + (step+1)*(dp_id-1)

已知new_val求店鋪ID(翻譯過程):

$dp_id = 1 + (new_val-new_start_val)/(step+1)

例子(step=19階, new_start_val=1)

瀏覽店鋪ID=1，1階
瀏覽店鋪ID=192，15階

根據以上資訊、常量、公式 生成new_val陣列:  

{1, 3821}

根據以上陣列、常量、公式 翻譯出店鋪ID:  

{1, 192}

4. 分割槽

例如，建議每500萬或1000萬一個分割槽，查詢時，可以並行查詢，提高效率。

如果要快速圈得所有的使用者，建議使用並行查詢（plproxy，每個分割槽一個連線，並行查詢）。

如果要快速的得到使用者，流式返回，建議使用繼承（如果是多節點，可以使用postgres_fdw+pg_pathman，或者postgres_fdw+繼承），使用遊標返回。

效能指標

幾十億使用者，每個使用者將時間區間的瀏覽過的店鋪、商品、購買過的商品以及數量級聚合成標籤化的陣列，產生萬億級別的user_tags組合。

根據tags從幾十億的使用者群體中圈選人群，能達到什麼樣的效能呢？

由於使用了索引，如果使用流式返回的話可以控制在10毫秒左右。

是不是頓時覺得分析型的業務進入了毫秒時代？

如果你對PostgreSQL接觸不多，可能會感到很驚奇，接觸多了就習慣了，PostgreSQL有很多功能會幫你解決很多問題，有時候甚至給你大開腦洞的。

實時設計

前面講了如何高效的獲得使用者，接下來我們要看看如何實時的更新TAG了。

流處理

目的是實時的更新使用者的TAG，比如一個使用者，一天可能產生幾萬比瀏覽的跟蹤記錄，這些記錄要合併到他的標籤中。

如果活躍使用者達到億級別，那麼一天產生的更新流水就達到了萬億級別，這個怎麼能實時的在資料庫中處理呢？估計很多使用者會使用T+1的方式，放棄實時性。

但是實際上，並不是做不到的，比如我們可以使用PostgreSQL資料庫的流處理功能來實現這種超高流水的更新。

你可能要疑問了，資料庫能處理流嗎？資料如何在資料庫中完成實時的更新呢?

PostgreSQL社群的一個開源產品pipelinedb，（基於postgresql，與postgresql全相容），就是用來幹這個的，它會幫你實時的進行合併，（使用者可以設定合併的時間間隔，或者累計的ROWS變更數）達到閾值後，進行持久化的動作，否則會先持續的在記憶體中進行更新。

有兩篇文章可以參考

《”物聯網”流式處理應用 – 用PostgreSQL實時處理(萬億每天)》

《流計算風雲再起 – PostgreSQL攜PipelineDB力挺IoT》

當然如果使用者沒有實時的要求，T+1 就能滿足需求的話，你大可不必使用pipelinedb.

為什麼要在資料庫中完成流式處理

我們知道，標籤資料最後都要進到資料庫後，才能施展資料庫的圈人功能，完成圈人的查詢，如果不在資料庫中實現流計算，而是使用類似JSTROM的框架的話，實際上是使用JSTROM擋了一層，比如將1000億次的更新轉化成了1億的更新。

但是使用外部的流處理會引入一些問題

1. 額外增加了JSTROM所需的計算資源，並行效率實際上還不如pipelinedb

2. 使用者查資料的時效性不如直接放在資料庫中的流計算

3. 增加了開發成本

壓測

進入壓測環節，我選擇了一臺32CORE，2塊SSD卡，512GB的記憶體的機器進行壓測。

存放3.2億使用者，每個使用者4個陣列欄位，每個欄位包括1000個元素，即4000*3.2億 = 1.28萬億 user_tags。

用例1

10張表，每張表儲存1000萬使用者，4個標籤欄位，使用tsvector儲存標籤。

使用rum索引。

postgres=# create tablespace tbs1 location `/u01/digoal/tbs1`;  
CREATE TABLESPACE  
  
postgres=# create tablespace tbs2 location `/u02/digoal/tbs2`;  
CREATE TABLESPACE  
  
do language plpgsql $$  
declare  
  i int;  
  suffix text;  
  tbs text;  
begin  
  for i in 0..10 loop  
    if i=0 then  
      suffix := ``;  
      tbs := `tbs1`;  
    elsif i >=1 and i<=5 then  
      suffix := i::text;  
      tbs := `tbs1`;  
    else  
      suffix := i::text;  
      tbs := `tbs2`;  
    end if;  
    if i=0 then  
      execute `create unlogged table test`||suffix||`(uid int primary key USING INDEX TABLESPACE `||tbs||`, s1 tsvector, s2 tsvector, s3 tsvector, s4 tsvector) with (autovacuum_enabled=off, toast.autovacuum_enabled=off) tablespace `||tbs;  
    else  
      execute `create unlogged table test`||suffix||`(uid int primary key USING INDEX TABLESPACE `||tbs||`, s1 tsvector, s2 tsvector, s3 tsvector, s4 tsvector) inherits(test) with (autovacuum_enabled=off, toast.autovacuum_enabled=off) tablespace `||tbs;  
    end if;  
    execute `create index idx_test`||suffix||`_s1 on test`||suffix||` using rum(s1 rum_tsvector_ops) tablespace `||tbs;  
    execute `create index idx_test`||suffix||`_s2 on test`||suffix||` using rum(s2 rum_tsvector_ops) tablespace `||tbs;  
    execute `create index idx_test`||suffix||`_s3 on test`||suffix||` using rum(s3 rum_tsvector_ops) tablespace `||tbs;   
    execute `create index idx_test`||suffix||`_s4 on test`||suffix||` using rum(s4 rum_tsvector_ops) tablespace `||tbs;  
  end loop;  
end;  
$$;  
  
select relname,reltablespace from pg_class  where relname ~ `test` order by 2,1;

產生測試資料的指令碼

vi test.sql  
  
set uid1 random(1,10000000)  
set uid2 random(10000001,20000000)  
set uid3 random(20000001,30000000)  
set uid4 random(30000001,40000000)  
set uid5 random(40000001,50000000)  
set uid6 random(50000001,60000000)  
set uid7 random(60000001,70000000)  
set uid8 random(70000001,80000000)  
set uid9 random(80000001,90000000)  
set uid10 random(90000001,100000000)  
insert into test1 (uid,s1,s2,s3,s4) select :uid1+id, (select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)) from generate_series(1,1000) t(id) on conflict do nothing;  
insert into test2 (uid,s1,s2,s3,s4) select :uid2+id, (select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)) from generate_series(1,1000) t(id) on conflict do nothing;  
insert into test3 (uid,s1,s2,s3,s4) select :uid3+id, (select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)) from generate_series(1,1000) t(id) on conflict do nothing;  
insert into test4 (uid,s1,s2,s3,s4) select :uid4+id, (select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)) from generate_series(1,1000) t(id) on conflict do nothing;  
insert into test5 (uid,s1,s2,s3,s4) select :uid5+id, (select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)) from generate_series(1,1000) t(id) on conflict do nothing;  
insert into test6 (uid,s1,s2,s3,s4) select :uid6+id, (select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)) from generate_series(1,1000) t(id) on conflict do nothing;  
insert into test7 (uid,s1,s2,s3,s4) select :uid7+id, (select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)) from generate_series(1,1000) t(id) on conflict do nothing;  
insert into test8 (uid,s1,s2,s3,s4) select :uid8+id, (select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)) from generate_series(1,1000) t(id) on conflict do nothing;  
insert into test9 (uid,s1,s2,s3,s4) select :uid9+id, (select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)) from generate_series(1,1000) t(id) on conflict do nothing;  
insert into test10 (uid,s1,s2,s3,s4) select :uid10+id, (select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)),(select array_to_tsvector(array_agg(trunc(5000000*random())||`_`||trunc(20*random()))) from generate_series(1,1000)) from generate_series(1,1000) t(id) on conflict do nothing;  
  
  
nohup pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 1000000 >/dev/null 2>&1 &

標籤由500萬個唯一ID+20個唯一ID的組合過程，每個tsvector中存放1000個這樣的組合。

用例2

10張表，每張表儲存1000萬使用者，4個標籤欄位，使用text[]儲存標籤。

索引使用的是GIN索引，其他與用例1一致。

do language plpgsql $$  
declare  
  i int;  
  suffix text;  
  tbs text;  
begin  
  for i in 0..10 loop  
    if i=0 then  
      suffix := ``;  
      tbs := `tbs1`;  
    elsif i >=1 and i<=5 then  
      suffix := i::text;  
      tbs := `tbs1`;  
    else  
      suffix := i::text;  
      tbs := `tbs2`;  
    end if;  
    if i=0 then  
      execute `create unlogged table test`||suffix||`(uid int primary key USING INDEX TABLESPACE `||tbs||`, s1 text[], s2 text[], s3 text[], s4 text[]) with (autovacuum_enabled=off, toast.autovacuum_enabled=off) tablespace `||tbs;  
    else  
      execute `create unlogged table test`||suffix||`(uid int primary key USING INDEX TABLESPACE `||tbs||`, s1 text[], s2 text[], s3 text[], s4 text[]) inherits(test) with (autovacuum_enabled=off, toast.autovacuum_enabled=off) tablespace `||tbs;  
    end if;  
    execute `create index idx_test`||suffix||`_s1 on test`||suffix||` using gin(s1 ) tablespace `||tbs;  
    execute `create index idx_test`||suffix||`_s2 on test`||suffix||` using gin(s2 ) tablespace `||tbs;  
    execute `create index idx_test`||suffix||`_s3 on test`||suffix||` using gin(s3 ) tablespace `||tbs;   
    execute `create index idx_test`||suffix||`_s4 on test`||suffix||` using gin(s4 ) tablespace `||tbs;  
  end loop;  
end;  
$$;  
  
select relname,reltablespace from pg_class  where relname ~ `test` order by 2,1;

用例3以及壓測

64張分割槽表，每個分割槽500萬記錄，使用int陣列儲存標籤，標籤總量400萬，每個使用者4000個隨機標籤，確保圈人時可以圈到足夠多的人群。

同樣使用GIN索引圈人。

alter role postgres set gin_pending_list_limit=`128MB`;  
  
do language plpgsql $$  
declare  
  i int;  
  suffix text;  
  tbs text;  
begin  
  for i in 0..64 loop  
    if i=0 then  
      suffix := ``;  
      tbs := `tbs1`;  
    elsif i >=1 and i<=32 then  
      suffix := i::text;  
      tbs := `tbs1`;  
    else  
      suffix := i::text;  
      tbs := `tbs2`;  
    end if;  
    if i=0 then  
      execute `create unlogged table test`||suffix||`(uid int primary key USING INDEX TABLESPACE `||tbs||`, s1 int[], s2 int[], s3 int[], s4 int[]) with (autovacuum_enabled=off, toast.autovacuum_enabled=off) tablespace `||tbs;  
    else  
      execute `create unlogged table test`||suffix||`(uid int primary key USING INDEX TABLESPACE `||tbs||`, s1 int[], s2 int[], s3 int[], s4 int[]) inherits(test) with (autovacuum_enabled=off, toast.autovacuum_enabled=off) tablespace `||tbs;  
    end if;  
    execute `create index idx_test`||suffix||`_s1 on test`||suffix||` using gin(s1 ) tablespace `||tbs;  
    execute `create index idx_test`||suffix||`_s2 on test`||suffix||` using gin(s2 ) tablespace `||tbs;  
    execute `create index idx_test`||suffix||`_s3 on test`||suffix||` using gin(s3 ) tablespace `||tbs;   
    execute `create index idx_test`||suffix||`_s4 on test`||suffix||` using gin(s4 ) tablespace `||tbs;  
  end loop;  
end;  
$$;  
  
select relname,reltablespace from pg_class  where relname ~ `test` order by 2,1;

生成測試資料的指令碼

vi test1.sh  
  
for ((i=1;i<=64;i++))  
do  
echo "set uid random($((($i-1)*5000000+1)),$(($i*5000000)))" > test$i.sql  
  
echo "insert into test$i (uid,s1,s2,s3,s4) select :uid, (select array_agg(trunc(random()*4000000)) from generate_series(1,1000)) s1,(select array_agg(trunc(random()*4000000)) from generate_series(1,1000)) s2,(select array_agg(trunc(random()*4000000)) from generate_series(1,1000)) s3, (select array_agg(trunc(random()*4000000)) from generate_series(1,1000)) s4 on conflict do nothing;" >> test$i.sql  
  
done  
  
. ./test1.sh

開始生成測試資料

vi test2.sh  
  
for ((i=1;i<=64;i++))  
do  
nohup pgbench -M prepared -n -r -P 1 -f ./test$i.sql -c 1 -j 1 -T 1000000 >/dev/null 2>&1 &  
done  
  
. ./test2.sh

輸出插完後將pengding list 合併

執行vacuum analyze或gin_clean_pending_list即可，參考

https://www.postgresql.org/docs/9.6/static/functions-admin.html#FUNCTIONS-ADMIN-INDEX

https://www.postgresql.org/docs/9.6/static/sql-vacuum.html

https://www.postgresql.org/docs/9.6/static/gin-implementation.html#GIN-FAST-UPDATE

圈人需求 – 效能測試

對用例3進行壓測

1. 圈人，10毫秒以內完成。

比如查詢s1包含3, s2包含4的人群

postgres=# begin;
BEGIN
Time: 0.030 ms
postgres=# declare a cursor for select uid from test where s1 @> array[1] and s2 @> array[4];
DECLARE CURSOR
Time: 6.679 ms
postgres=# fetch 100 in a;
    uid    
-----------
  19246842
 118611240
 148504032
 185844649
(4 rows)
Time: 101.041 ms

這個人群太少，沒有代表性，我們找一個人群多一點的

postgres=# begin;
BEGIN
postgres=# declare a cursor for select uid from test where s1 @> array[1] or s2 @> array[4];
DECLARE CURSOR
Time: 3.484 ms
postgres=# fetch 100 in a;
   uid   
---------
 2911941
 2373506
 .....
   29713
 3353782
 2836804
 1602067
(100 rows)
Time: 3.892 ms

postgres=# fetch 100 in a;
   uid   
---------
  384170
 1332271
 4282941
 ......
 1190946
 4524861
 1110635
(100 rows)
Time: 4.005 ms

2. 分頁，前面已經提到了，使用遊標。

3. 流式返回，前面的例子已經提到了。

4. 並行批量返回

並行批量返回，可以使用plproxy外掛，為每個分割槽指定一個並行，從而實現並行的批量返回。效果能好到什麼程度呢？

比如序列查詢，所有的分片表是依次查詢的，所以累加的時間比較長，例如，圈出15221個人，耗時113毫秒。

postgres=# explain (analyze,verbose,timing,costs,buffers) select uid from test where s1 @> array[1];
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..233541.24 rows=206876 width=4) (actual time=0.081..108.037 rows=15221 loops=1)
   Buffers: shared hit=60641
   ->  Seq Scan on public.test  (cost=0.00..0.00 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=1)
         Output: test.uid
         Filter: (test.s1 @> `{1}`::integer[])
   ->  Bitmap Heap Scan on public.test1  (cost=33.71..2901.56 rows=3188 width=4) (actual time=0.078..0.381 rows=242 loops=1)
         Output: test1.uid
         Recheck Cond: (test1.s1 @> `{1}`::integer[])
         Heap Blocks: exact=238
         Buffers: shared hit=243
         ->  Bitmap Index Scan on idx_test1_s1  (cost=0.00..32.91 rows=3188 width=0) (actual time=0.049..0.049 rows=242 loops=1)
               Index Cond: (test1.s1 @> `{1}`::integer[])
               Buffers: shared hit=5

...中間省略62個表

   ->  Bitmap Heap Scan on public.test64  (cost=34.00..2935.31 rows=3225 width=4) (actual time=0.068..0.327 rows=214 loops=1)
         Output: test64.uid
         Recheck Cond: (test64.s1 @> `{1}`::integer[])
         Heap Blocks: exact=211
         Buffers: shared hit=216
         ->  Bitmap Index Scan on idx_test64_s1  (cost=0.00..33.19 rows=3225 width=0) (actual time=0.041..0.041 rows=214 loops=1)
               Index Cond: (test64.s1 @> `{1}`::integer[])
               Buffers: shared hit=5
 Planning time: 2.016 ms
 Execution time: 109.400 ms
(519 rows)
Time: 113.216 ms

而並行查詢的效能則相當於單個分割槽的耗時, 約0.幾毫秒。

postgres=# explain (analyze,verbose,timing,costs,buffers) select uid from test1 where s1 @> array[1];
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test1  (cost=33.71..2901.56 rows=3188 width=4) (actual time=0.085..0.383 rows=242 loops=1)
   Output: uid
   Recheck Cond: (test1.s1 @> `{1}`::integer[])
   Heap Blocks: exact=238
   Buffers: shared hit=243
   ->  Bitmap Index Scan on idx_test1_s1  (cost=0.00..32.91 rows=3188 width=0) (actual time=0.051..0.051 rows=242 loops=1)
         Index Cond: (test1.s1 @> `{1}`::integer[])
         Buffers: shared hit=5
 Planning time: 0.097 ms
 Execution time: 0.423 ms
(10 rows)
Time: 1.011 ms

使用並行，可以大幅提升整體的效能。

參考文件

《使用Plproxy設計PostgreSQL分散式資料庫》

《A Smart PostgreSQL extension plproxy 2.2 practices》

《PostgreSQL 最佳實踐 – 水平分庫(基於plproxy)》

而如果你需要的是流式返回，則沒有必要使用並行。

sharding

當使用者數達到幾十億時，我們可以按使用者ID進行分片，使用多臺主機。

當然了，如果你的主機空間足夠大，CPU核心足夠多，可以滿足業務的需求的話，完全沒有必要使用多臺主機。

如果要使用多臺主機，有哪些方法呢? 可以參考如下文章，也很簡單，幾步完成

你就把postgres_fdw節點當成MySQL的TDDL或者DRDS就好了，支援跨節點JOIN，條件，排序，聚合的下推等，用起來和TDDL DRDS一樣的爽。

postgres_fdw是無狀態的，僅僅儲存結構（分發規則），所以postgres_fdw節點本身也可以非常方便的橫向擴充套件。

《PostgreSQL 9.6 單元化,sharding (based on postgres_fdw) – 核心層支援前傳》

《PostgreSQL 9.6 sharding + 單元化 (based on postgres_fdw) 最佳實踐 – 通用水平分庫場景設計與實踐》

《PostgreSQL 9.6 sharding based on FDW & pg_pathman》

《PostgreSQL 9.5+ 高效分割槽表實現 – pg_pathman》

基於位置圈人的需求與效能

這個需求直接落入PostgreSQL的懷抱，其實就是基於位置的KNN查詢，PostgreSQL可以通過GiST索引來支撐這個需求。

在資料分片後，PostgreSQL通過歸併排序，依舊可以快速的得到結果。

例如，

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t order by id limit 10;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.72..1.13 rows=10 width=4) (actual time=0.158..0.165 rows=10 loops=1)
   Output: t.id
   Buffers: shared hit=3 read=4
   ->  Merge Append  (cost=0.72..819.74 rows=20001 width=4) (actual time=0.157..0.162 rows=10 loops=1)
         Sort Key: t.id
         Buffers: shared hit=3 read=4
         ->  Index Only Scan using idx on public.t  (cost=0.12..2.14 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1)
               Output: t.id
               Heap Fetches: 0
               Buffers: shared hit=1
         ->  Index Only Scan using idx1 on public.t1  (cost=0.29..225.28 rows=10000 width=4) (actual time=0.107..0.107 rows=6 loops=1)
               Output: t1.id
               Heap Fetches: 6
               Buffers: shared hit=1 read=2
         ->  Index Only Scan using idx2 on public.t2  (cost=0.29..225.28 rows=10000 width=4) (actual time=0.043..0.044 rows=5 loops=1)
               Output: t2.id
               Heap Fetches: 5
               Buffers: shared hit=1 read=2
 Planning time: 0.181 ms
 Execution time: 0.219 ms
(20 rows)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t order by id ;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Merge Append  (cost=0.72..819.74 rows=20001 width=4) (actual time=0.043..10.324 rows=20000 loops=1)
   Sort Key: t.id
   Buffers: shared hit=149
   ->  Index Only Scan using idx on public.t  (cost=0.12..2.14 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1)
         Output: t.id
         Heap Fetches: 0
         Buffers: shared hit=1
   ->  Index Only Scan using idx1 on public.t1  (cost=0.29..225.28 rows=10000 width=4) (actual time=0.021..3.266 rows=10000 loops=1)
         Output: t1.id
         Heap Fetches: 10000
         Buffers: shared hit=74
   ->  Index Only Scan using idx2 on public.t2  (cost=0.29..225.28 rows=10000 width=4) (actual time=0.017..3.309 rows=10000 loops=1)
         Output: t2.id
         Heap Fetches: 10000
         Buffers: shared hit=74
 Planning time: 0.175 ms
 Execution time: 11.791 ms
(17 rows)

《PostgreSQL 百億地理位置資料近鄰查詢效能》

12Core的機器，每次請求約0.8毫秒返回，TPS約8萬。