PostgreSQL秒殺場景優化

德哥發表於2015-12-11

秒殺場景的典型瓶頸在於對同一條記錄的多次更新請求，然後只有一個或者少量請求是成功的，其他請求是以失敗或更新不到告終。
例如，Iphone的1元秒殺，如果我只放出1臺Iphone，我們把它看成一條記錄，秒殺開始後，誰先搶到（更新這條記錄的鎖），誰就算秒殺成功。
例如：
使用一個標記位來表示這條記錄是否已經被更新，或者記錄更新的次數（幾臺Iphone）。
update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5;   -- 假設可以秒殺5臺
這種方法的弊端：
獲得鎖的使用者在處理這條記錄時，可能成功，也可能失敗，或者可能需要很長時間，（例如資料庫響應慢）在它結束事務前，其他會話只能等著。
等待是非常不科學的，因為對於沒有獲得鎖的使用者，等待是在浪費時間。
所以一般的優化處理方法是先使用for update nowait的方式來避免等待，即如果無法即可獲得鎖，那麼就不等待。
例如：
begin;
select 1 from tbl where id=pk for update nowait;  --  如果使用者無法即刻獲得鎖，則返回錯誤。從而這個事務回滾。
update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5;
end;
這種方法可以減少使用者的等待時間，因為無法即刻獲得鎖後就直接返回了。
但是這種方法也存在一定的弊端，對於一個商品，如果可以秒殺多臺的話，我們用1條記錄來儲存多臺，降低了秒殺的併發性。
因為我們用的是行鎖。
解決這個問題辦法很多，最終就是要提高併發性，例如：
1. 分段秒殺，把商品數量打散，拆成多個段，從而提高併發處理能力。
總體來說，優化的思路是減少鎖等待時間，避免序列，儘量並行。

優化到這裡就結束了嗎？顯然沒有，以上方法任意資料庫都可以做到，如果就這樣結束怎麼體現PostgreSQL的特性呢？
PostgreSQL還提供了一個鎖型別，advisory鎖，這種鎖比行鎖更加輕量，支援會話級別和事務級別。（但是需要注意ID是全域性的，否則會相互干擾，也就是說，所有參與秒殺或者需要用到advisory lock的ID需要在單個庫內保持全域性唯一）

例子：
update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5 and pg_try_advisory_xact_lock(:id);

最後必須要對比一下for update nowait和advisory lock的效能。
下面是在一臺本地虛擬機器上的測試。

新建一張秒殺表
postgres=# d t1
      Table "public.t1"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | not null
 info   | text    | 
Indexes:
    "t1_pkey" PRIMARY KEY, btree (id)
只有一條記錄，不斷的被更新
postgres=# select * from t1;
 id |             info              
----+-------------------------------
  1 | 2015-09-14 09:47:04.703904+08
(1 row)

壓測for update nowait的方式：
CREATE OR REPLACE FUNCTION public.f1(i_id integer)  
 RETURNS void  
 LANGUAGE plpgsql  
AS $function$ 
declare 
begin 
  perform 1 from t1 where id=i_id for update nowait; 
  update t1 set info=now()::text where id=i_id; 
  exception when others then  
  return; 
end; 
$function$;  

postgres@digoal-> cat test1.sql
setrandom id 1 1
select f1(:id);

壓測advisory lock的方式：
postgres@digoal-> cat test.sql
setrandom id 1 1
update t1 set info=now()::text where id=:id and pg_try_advisory_xact_lock(:id);

清除壓測統計資料：
postgres=# select pg_stat_reset();
 pg_stat_reset 
---------------
 
(1 row)
postgres=# select * from pg_stat_all_tables where relname=`t1`;
-[ RECORD 1 ]-------+-------
relid               | 184731
schemaname          | public
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 0
idx_tup_fetch       | 0
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         | 
last_autovacuum     | 
last_analyze        | 
last_autoanalyze    | 
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

壓測結果：
postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 20 -j 20 -T 60
......
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 20
number of threads: 20
duration: 60 s
number of transactions actually processed: 792029
latency average: 1.505 ms
latency stddev: 4.275 ms
tps = 13196.542846 (including connections establishing)
tps = 13257.270709 (excluding connections establishing)
statement latencies in milliseconds:
        0.002625        setrandom id 1 1
        1.502420        select f1(:id);

postgres=# select * from pg_stat_all_tables where relname=`t1`;
-[ RECORD 1 ]-------+-------
relid               | 184731
schemaname          | public
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 896963   //  大多數是無用功
idx_tup_fetch       | 896963   //  大多數是無用功
n_tup_ins           | 0
n_tup_upd           | 41775
n_tup_del           | 0
n_tup_hot_upd       | 41400
n_live_tup          | 0
n_dead_tup          | 928
n_mod_since_analyze | 41774
last_vacuum         | 
last_autovacuum     | 
last_analyze        | 
last_autoanalyze    | 
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0


postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test.sql -c 20 -j 20 -T 60
......
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 20
number of threads: 20
duration: 60 s
number of transactions actually processed: 1392372
latency average: 0.851 ms
latency stddev: 2.475 ms
tps = 23194.831054 (including connections establishing)
tps = 23400.411501 (excluding connections establishing)
statement latencies in milliseconds:
        0.002594        setrandom id 1 1
        0.848536        update t1 set info=now()::text where id=:id and pg_try_advisory_xact_lock(:id);

postgres=# select * from pg_stat_all_tables where relname=`t1`;
-[ RECORD 1 ]-------+--------
relid               | 184731
schemaname          | public
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 1368933  //  大多數是無用功
idx_tup_fetch       | 1368933   //  大多數是無用功
n_tup_ins           | 0
n_tup_upd           | 54957
n_tup_del           | 0
n_tup_hot_upd       | 54489
n_live_tup          | 0
n_dead_tup          | 1048
n_mod_since_analyze | 54957
last_vacuum         | 
last_autovacuum     | 
last_analyze        | 
last_autoanalyze    | 
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
我們注意到，不管用哪種方法，都會浪費掉很多次的無用功掃描。

為了解決無用掃描的問題，可以使用以下函式。（當然，還有更好的方法是對使用者透明。）
CREATE OR REPLACE FUNCTION public.f(i_id integer)  
 RETURNS void  
 LANGUAGE plpgsql  
AS $function$ 
declare 
  a_lock boolean := false;
begin 
  select pg_try_advisory_xact_lock(i_id) into a_lock;
  if a_lock then
    update t1 set info=now()::text where id=i_id; 
  end if;
  exception when others then  
  return; 
end; 
$function$;  

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 20
number of threads: 20
duration: 60 s
number of transactions actually processed: 1217195
latency average: 0.973 ms
latency stddev: 3.563 ms
tps = 20283.314001 (including connections establishing)
tps = 20490.143363 (excluding connections establishing)
statement latencies in milliseconds:
        0.002703        setrandom id 1 1
        0.970209        select f(:id);

postgres=# select * from pg_stat_all_tables where relname=`t1`;
-[ RECORD 1 ]-------+-------
relid               | 184731
schemaname          | public
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 75927
idx_tup_fetch       | 75927
n_tup_ins           | 0
n_tup_upd           | 75927
n_tup_del           | 0
n_tup_hot_upd       | 75902
n_live_tup          | 0
n_dead_tup          | 962
n_mod_since_analyze | 75927
last_vacuum         | 
last_autovacuum     | 
last_analyze        | 
last_autoanalyze    | 
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
除了吞吐率的提升，我們其實還看到真實的處理數（更新次數）也有提升，所以不僅僅是降低了等待延遲，實際上也提升了處理能力。

最後提供一個物理機上的資料參考，使用128個併發連線，同時對一條記錄進行更新：
不做任何優化的併發處理能力：
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 128
number of threads: 128
duration: 100 s
number of transactions actually processed: 285673
latency average: 44.806 ms
latency stddev: 45.751 ms
tps = 2855.547375 (including connections establishing)
tps = 2855.856976 (excluding connections establishing)
statement latencies in milliseconds:
        0.002509        setrandom id 1 1
        44.803299       update t1 set info=now()::text where id=:id;

使用for update nowait的併發處理能力：
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 128
number of threads: 128
duration: 100 s
number of transactions actually processed: 6663253
latency average: 1.919 ms
latency stddev: 2.804 ms
tps = 66623.169445 (including connections establishing)
tps = 66630.307999 (excluding connections establishing)
statement latencies in milliseconds:
        0.001934        setrandom id 1 1
        1.917297        select f1(:id);

使用advisory lock後的併發處理能力：
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 80
number of threads: 80
duration: 60 s
number of transactions actually processed: 13883387
latency average: 0.344 ms
latency stddev: 0.535 ms
tps = 231197.323122 (including connections establishing)
tps = 231376.427515 (excluding connections establishing)
statement latencies in milliseconds:
        0.344042        select f(1);
此時的perf top
   PerfTop:   23883 irqs/sec  kernel:32.2%  exact:  0.0% [1000Hz cycles],  (all, 32 CPUs)
--------------------------------------------------------------------------------     

 samples  pcnt function         DSO
 _______ _____ ____________________________ ______________________________________________

10645.00  3.5% GetSnapshotData  /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 7963.00  2.6% AllocSetAlloc    /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 4720.00  1.6% _int_malloc      /lib64/libc-2.12.so   
 4270.00  1.4% __schedule       [kernel.kallsyms]     
 4234.00  1.4% fmgr_info_cxt_security       /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 4217.00  1.4% LWLockAcquire    /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 3958.00  1.3% hash_search_with_hash_value  /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 3656.00  1.2% __GI_vfprintf    /lib64/libc-2.12.so   
 3572.00  1.2% update_blocked_averages      [kernel.kallsyms]     
 3338.00  1.1% PostgresMain     /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 3267.00  1.1% __switch_to      [kernel.kallsyms]     
 3095.00  1.0% __strlen_sse42   /lib64/libc-2.12.so   
 2996.00  1.0% memcpy           /lib64/libc-2.12.so   
 2930.00  1.0% _int_free        /lib64/libc-2.12.so   
 2568.00  0.8% LWLockRelease    /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 2446.00  0.8% SearchCatCache   /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 2178.00  0.7% ExecInitExpr     /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 2053.00  0.7% hash_any         /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 2035.00  0.7% __GI___libc_malloc           /lib64/libc-2.12.so   
 2009.00  0.7% _raw_spin_lock_irqsave       [kernel.kallsyms]     
 1804.00  0.6% exec_stmt        /u02/digoal/soft_bak/pgsql9.5/lib/plpgsql.so  
 1764.00  0.6% __memset_sse2    /lib64/libc-2.12.so   
 1717.00  0.6% pqParseInput3    /u02/digoal/soft_bak/pgsql9.5/lib/libpq.so.5.8
 1696.00  0.6% do_select        [kernel.kallsyms]     
 1686.00  0.6% __strcpy_ssse3   /lib64/libc-2.12.so   
 1685.00  0.6% update_curr      [kernel.kallsyms]     
 1619.00  0.5% enqueue_entity   [kernel.kallsyms]     
 1607.00  0.5% pfree/u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 1598.00  0.5% doCustom         /u02/digoal/soft_bak/pgsql9.5/bin/pgbench     
 1594.00  0.5% idle_cpu         [kernel.kallsyms]     
 1589.00  0.5% update_cfs_rq_blocked_load   [kernel.kallsyms]     
 1554.00  0.5% lapic_next_deadline          [kernel.kallsyms]     
 1512.00  0.5% update_cfs_shares[kernel.kallsyms]     
 1491.00  0.5% MemoryContextCreate          /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 1482.00  0.5% _raw_spin_lock   [kernel.kallsyms]     
 1423.00  0.5% palloc           /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 1419.00  0.5% __GI___sigsetjmp /lib64/libc-2.12.so   
 1412.00  0.5% __cfree          /lib64/libc-2.12.so   
 1399.00  0.5% unix_stream_recvmsg          [kernel.kallsyms]     
 1393.00  0.5% __fget_light     [kernel.kallsyms]     
 1359.00  0.4% ResourceOwnerReleaseInternal /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 1351.00  0.4% AllocSetFree     /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 1277.00  0.4% unix_stream_sendmsg          [kernel.kallsyms]     
 1246.00  0.4% __memcmp_sse4_1  /lib64/libc-2.12.so   
 1240.00  0.4% plpgsql_exec_function        /u02/digoal/soft_bak/pgsql9.5/lib/plpgsql.so  
 1225.00  0.4% expression_tree_walker       /u02/digoal/soft_bak/pgsql9.5/bin/postgres    
 1160.00  0.4% exec_stmt_block  /u02/digoal/soft_bak/pgsql9.5/lib/plpgsql.so
使用advisory lock，效能相比不做任何優化效能提升了約66倍，相比for update nowait效能提升了約1.8倍。
這種優化可以快速告訴使用者是否能秒殺到此類商品，而不需要等待其他使用者更新結束後才知道。所以大大降低了RT，提高了吞吐率。

最後提一下9.5的新特性, select ,,, for update ,,, skip locked.
http://blog.163.com/digoal@126/blog/static/163877040201551552017215/
如果能做到UPDATE語法裡面，就完美了，直接跳過無法獲得鎖的行。併發能力瞬間提升，也不用advisory了。

[參考]
1. http://www.postgresql.org/docs/9.5/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS

秒殺系統的場景特點
2020-08-24
【Redis場景5】叢集秒殺最佳化-分散式鎖
2023-03-04
Redis分散式
秒殺系統前端優化
2018-01-20
前端優化
RocketMQ實戰--高併發秒殺場景
2020-01-01
MQ
高頻面試題：秒殺場景設計
2020-11-04
面試題
秒殺業務架構優化之路
2016-03-28
架構優化
架構秒殺系統優化思路
2018-05-22
架構優化
秒殺系統架構優化思路
2018-06-13
架構優化
【效能優化】秒殺系統效能優化初體驗
2021-04-27
優化
【Redis場景4】單機環境下秒殺問題
2023-02-07
Redis
高併發業務場景下的秒殺解決方案 (初探)
2019-10-15
秒殺系統中流量削峰場景應該怎麼解決？
2019-04-28
Redis+Lua解決高併發場景搶購秒殺問題
2021-07-16
Redis
電商搶購秒殺系統的設計及應用場景分析
2016-10-08
秒殺最佳化-基於阻塞佇列實現秒殺最佳化
2024-07-27
佇列
（四）Java高併發秒殺API之高併發優化
2018-12-12
JavaAPI優化
雲端計算秒殺電商秒殺
2014-07-25
儀表盤場景的前端優化
2019-01-29
前端優化
高頻寫入redis場景優化
2018-12-07
Redis優化
淘特 Flutter 流式場景的深度優化
2021-12-03
Flutter優化
用PostgreSQL找回618秒逝去的青春-遞迴收斂優化
2016-12-21
SQL遞迴優化
go-zero微服務實戰系列（九、極致優化秒殺效能）
2022-07-04
Go微服務優化
高併發秒殺系統架構詳解，不是所有的秒殺都是秒殺！
2020-10-14
架構
postgresql的failover場景處理
2015-11-24
SQLAI
PostgreSQL秒殺4種方法-增加批量流式加減庫存方法
2018-01-27
SQL
秒殺流程圖
2020-12-26
流程圖
【高併發】秒殺系統架構解密，不是所有的秒殺都是秒殺（升級版）！！
2020-10-14
架構解密
【PostgreSQL 】PostgreSQL 15對distinct的優化
2022-07-17
SQL優化
JavaScript 效能優化殺手
2015-09-21
JavaScript優化
同步秒殺實現：Redis在秒殺功能的實踐
2019-01-18
Redis
如何設計一個優秀的秒殺系統？
2021-08-19
不走索引場景的一次分析優化
2015-10-21
索引優化
【Python秒殺指令碼】淘寶或京東等秒殺搶購
2022-10-25
Python指令碼
秒殺系統分析
2020-11-27
前端效能優化--從 10 多秒到 1.05 秒
2018-05-28
前端優化
前端效能優化–從 10 多秒到 1.05 秒
2018-05-28
前端優化
postgresql 優化與維護
2012-01-13
SQL優化
從京東618秒殺聊聊秒殺限流的多種實現
2018-06-27

PostgreSQL秒殺場景優化

相關文章