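Understanding the Behavior of PostgreSQL's count Function

Posted by simpleapples on 2019-04-16

Original URL: https://juejin.im/post/5cb54bd7518825324d1df26b

How to use the count function has long been debated, especially in MySQL. Does PostgreSQL, which keeps growing in popularity, have similar issues? Let's run some experiments to understand how count behaves in PostgreSQL.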

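Building the test database

Create a test database and a test table. The table has three columns: an auto-increment ID, a creation time, and a content column. The auto-increment ID is the primary key.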

create database performance_test;

-- psql meta-command: connect to the new database before creating the table
\c performance_test

create table test_tbl (id serial primary key, created_at timestamp, content varchar(512));

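Generating test data

generate_series produces the ID values and now() fills the created_at column. For the content column, repeat(md5(random()::text), 10) repeats a 32-character MD5 hex string 10 times, giving 320 characters per row. The following statement inserts 10 million rows for testing.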

performance_test=# insert into test_tbl select generate_series(1,10000000),now(),repeat(md5(random()::text),10);
INSERT 0 10000000
Time: 212184.223 ms (03:32.184)
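Before timing anything, it can help to know how large the table and its primary-key index are on disk, since later sections reason about how many pages a scan has to read. The following is a minimal sketch using PostgreSQL's built-in size functions; the column aliases are illustrative only.

```sql
-- Report the on-disk size of the heap, its indexes, and both together.
SELECT pg_size_pretty(pg_relation_size('test_tbl'))       AS heap_size,
       pg_size_pretty(pg_indexes_size('test_tbl'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('test_tbl')) AS total_size;
```

Assuming the default 8 kB block size, the heap size divided by 8 kB gives the number of pages a full sequential scan must visit.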

Questions raised by the count statement

psql does not display statement execution time by default, so turn it on manually to make the following comparisons easier.

\timing on

The performance difference between count(*) and count(1) is a frequently discussed question. Run the query once with count(*) and once with count(1).

performance_test=# select count(*) from test_tbl;
  count
----------
 10000000
(1 row)

Time: 115090.380 ms (01:55.090)

performance_test=# select count(1) from test_tbl;
  count
----------
 10000000
(1 row)

Time: 738.502 ms

The two queries differ enormously in speed. Does count(1) really improve performance that much? Run both queries again.

performance_test=# select count(*) from test_tbl;
  count
----------
 10000000
(1 row)

Time: 657.831 ms

performance_test=# select count(1) from test_tbl;
  count
----------
 10000000
(1 row)

Time: 682.157 ms

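The first query was very slow, while the following three were fast and took similar amounts of time. This raises two questions:

Why was the first query so slow?
Is there actually a performance difference between count(*) and count(1)?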

Query caching

Run the query again, this time with explain:

explain (analyze,buffers,verbose) select count(*) from test_tbl;

The output is as follows:

 Finalize Aggregate  (cost=529273.69..529273.70 rows=1 width=8) (actual time=882.569..882.570 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=96 read=476095
   ->  Gather  (cost=529273.48..529273.69 rows=2 width=8) (actual time=882.492..884.170 rows=3 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=96 read=476095
         ->  Partial Aggregate  (cost=528273.48..528273.49 rows=1 width=8) (actual time=881.014..881.014 rows=1 loops=3)
               Output: PARTIAL count(*)
               Buffers: shared hit=96 read=476095
               Worker 0: actual time=880.319..880.319 rows=1 loops=1
                 Buffers: shared hit=34 read=158206
               Worker 1: actual time=880.369..880.369 rows=1 loops=1
                 Buffers: shared hit=29 read=156424
               ->  Parallel Seq Scan on public.test_tbl  (cost=0.00..517856.98 rows=4166598 width=0) (actual time=0.029..662.165 rows=3333333 loops=3)
                     Buffers: shared hit=96 read=476095
                     Worker 0: actual time=0.026..661.807 rows=3323029 loops=1
                       Buffers: shared hit=34 read=158206
                     Worker 1: actual time=0.030..660.197 rows=3285513 loops=1
                       Buffers: shared hit=29 read=156424
 Planning time: 0.043 ms
 Execution time: 884.207 ms

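Note the Buffers lines. shared hit counts pages found in PostgreSQL's shared buffers, while shared read counts pages that had to be requested from outside shared buffers. Here almost every page is a read, yet the query finishes in under a second, which indicates that the reads were served from the operating system's page cache rather than from disk. This caching explains why the later queries were much faster than the first. Next, stop PostgreSQL, drop the operating system cache, and start PostgreSQL again.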

service postgresql stop
echo 1 > /proc/sys/vm/drop_caches
service postgresql start

Run the SQL statement again. It is now much slower.

 Finalize Aggregate  (cost=529273.69..529273.70 rows=1 width=8) (actual time=50604.564..50604.564 rows=1 loops=1)
   Output: count(*)
   Buffers: shared read=476191
   ->  Gather  (cost=529273.48..529273.69 rows=2 width=8) (actual time=50604.508..50606.141 rows=3 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared read=476191
         ->  Partial Aggregate  (cost=528273.48..528273.49 rows=1 width=8) (actual time=50591.550..50591.551 rows=1 loops=3)
               Output: PARTIAL count(*)
               Buffers: shared read=476191
               Worker 0: actual time=50585.182..50585.182 rows=1 loops=1
                 Buffers: shared read=158122
               Worker 1: actual time=50585.181..50585.181 rows=1 loops=1
                 Buffers: shared read=161123
               ->  Parallel Seq Scan on public.test_tbl  (cost=0.00..517856.98 rows=4166598 width=0) (actual time=92.491..50369.691 rows=3333333 loops=3)
                     Buffers: shared read=476191
                     Worker 0: actual time=122.170..50362.271 rows=3320562 loops=1
                       Buffers: shared read=158122
                     Worker 1: actual time=14.020..50359.733 rows=3383583 loops=1
                       Buffers: shared read=161123
 Planning time: 11.537 ms
 Execution time: 50606.215 ms

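With the caches cleared, every page is a shared read and no hits are reported: nothing is in shared buffers, and the reads now have to go to disk. From this we can infer that, of the four queries in the previous section, the first had no cached data to use, while the other three were served from cache.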
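To see how much of the table currently sits in PostgreSQL's own shared buffers (as opposed to the operating system's page cache), one option is the pg_buffercache contrib extension. This is a sketch under assumptions: the extension must be installed on the server, and the role must be allowed to read the view. The aliases cached_pages and cached_size are illustrative names.

```sql
-- Assumes the pg_buffercache contrib module is available on this server.
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Size of PostgreSQL's own buffer pool.
SHOW shared_buffers;

-- Count the shared-buffer pages that currently hold blocks of test_tbl.
SELECT count(*) AS cached_pages,
       pg_size_pretty(count(*) * current_setting('block_size')::bigint) AS cached_size
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('test_tbl')
  AND reldatabase = (SELECT oid FROM pg_database
                     WHERE datname = current_database());
```

If shared_buffers is much smaller than the table, most pages of a full scan will show up as shared read even when the operating system still has them cached.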

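The difference between count(1) and count(*)

Next, what is the difference between count(1) and count(*)? Think back to the first four queries: the first used count(*) and the second used count(1), yet the second still benefited from the cache. Doesn't that suggest that count(1) and count(*) are equivalent?

In fact, the official PostgreSQL reply to the question "is there a difference performance-wise between select count(1) and select count(*)?" confirms this:

Nope. In fact, the latter is converted to the former during parsing.[2]

Since count(1) performs no better than count(*), count(*) is the better choice.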
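One way to check this on your own data is to compare the plans and buffer usage of the two forms directly. The sketch below only reuses the explain options already shown in this article. Run each statement more than once so that both are measured with a warm cache.

```sql
-- Compare plan shape, buffer counts, and timing for the two forms.
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM test_tbl;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(1) FROM test_tbl;
```

If the two are handled the same way, the plans should have the same node types and read the same number of buffers, with only run-to-run noise in the timings.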

Sequential scan and index scan

Next, test the speed of count(*) at different data sizes. Put the query in a file named count.sql and run it with pgbench.

pgbench -c 5 -t 20 performance_test -r -f count.sql
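The contents of count.sql are not shown in the original. Judging from the Index Cond lines in the plans later in this article (for example test_tbl.id < 7000000), the file presumably holds a count restricted by a range on id. The following is an assumed reconstruction, not the author's actual file:

```sql
-- count.sql (assumed contents; adjust the bound for each data size tested)
SELECT count(*) FROM test_tbl WHERE id < 7000000;
```

With -c 5 -t 20, pgbench runs this script 20 times on each of 5 client connections, and -r reports the average latency per statement.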

Timings of the count statement for 2 million to 10 million rows:

Rows	count time (ms)
2 million	738.758
3 million	1035.846
4 million	1426.183
5 million	1799.866
6 million	2117.247
7 million	2514.691
8 million	2526.441
9 million	2568.240
10 million	2650.434

[Figure: count time versus number of rows]

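The trend changes at around 6 to 7 million rows: the count time grows roughly linearly from 2 million rows up to that point, and beyond it the time stays nearly constant. Use explain to look at how the count statement executes for 6 million and 7 million rows.

7 million rows: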

 Finalize Aggregate  (cost=502185.93..502185.94 rows=1 width=8) (actual time=894.361..894.361 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=16344 read=352463
   ->  Gather  (cost=502185.72..502185.93 rows=2 width=8) (actual time=894.232..899.763 rows=3 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=16344 read=352463
         ->  Partial Aggregate  (cost=501185.72..501185.73 rows=1 width=8) (actual time=889.371..889.371 rows=1 loops=3)
               Output: PARTIAL count(*)
               Buffers: shared hit=16344 read=352463
               Worker 0: actual time=887.112..887.112 rows=1 loops=1
                 Buffers: shared hit=5459 read=118070
               Worker 1: actual time=887.120..887.120 rows=1 loops=1
                 Buffers: shared hit=5601 read=117051
               ->  Parallel Index Only Scan using test_tbl_pkey on public.test_tbl  (cost=0.43..493863.32 rows=2928960 width=0) (actual time=0.112..736.376 rows=2333333 loops=3)
                     Index Cond: (test_tbl.id < 7000000)
                     Heap Fetches: 2328492
                     Buffers: shared hit=16344 read=352463
                     Worker 0: actual time=0.107..737.180 rows=2344479 loops=1
                       Buffers: shared hit=5459 read=118070
                     Worker 1: actual time=0.133..737.960 rows=2327028 loops=1
                       Buffers: shared hit=5601 read=117051
 Planning time: 0.165 ms
 Execution time: 899.857 ms

6 million rows:

 Finalize Aggregate  (cost=429990.94..429990.95 rows=1 width=8) (actual time=765.575..765.575 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=13999 read=302112
   ->  Gather  (cost=429990.72..429990.93 rows=2 width=8) (actual time=765.557..770.889 rows=3 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=13999 read=302112
         ->  Partial Aggregate  (cost=428990.72..428990.73 rows=1 width=8) (actual time=763.821..763.821 rows=1 loops=3)
               Output: PARTIAL count(*)
               Buffers: shared hit=13999 read=302112
               Worker 0: actual time=762.742..762.742 rows=1 loops=1
                 Buffers: shared hit=4638 read=98875
               Worker 1: actual time=763.308..763.308 rows=1 loops=1
                 Buffers: shared hit=4696 read=101570
               ->  Parallel Index Only Scan using test_tbl_pkey on public.test_tbl  (cost=0.43..422723.16 rows=2507026 width=0) (actual time=0.053..632.199 rows=2000000 loops=3)
                     Index Cond: (test_tbl.id < 6000000)
                     Heap Fetches: 2018490
                     Buffers: shared hit=13999 read=302112
                     Worker 0: actual time=0.059..633.156 rows=1964483 loops=1
                       Buffers: shared hit=4638 read=98875
                     Worker 1: actual time=0.038..634.271 rows=2017026 loops=1
                       Buffers: shared hit=4696 read=101570
 Planning time: 0.055 ms
 Execution time: 770.921 ms

From these observations, PostgreSQL appears to use an index scan only when the number of rows being counted is below a certain fraction of the table. The official wiki has a related description:

It is important to realise that the planner is concerned with minimising the total cost of the query. With databases, the cost of I/O typically dominates. For that reason, "count(*) without any predicate" queries will only use an index-only scan if the index is significantly smaller than its table. This typically only happens when the table's row width is much wider than some indexes'.[3]

According to an answer on Stack Overflow, when a count query covers more than 3/4 of the table, a full table scan is used instead of an index scan [4].
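The switch between the two plans can be observed directly by running explain with different bounds on id. The sketch below assumes the test table from this article. The exact point at which the planner changes plans depends on table statistics, the visibility map, and cost settings, so your results may differ from the 3/4 figure quoted above.

```sql
-- Refresh planner statistics and the visibility map; the large
-- "Heap Fetches" values in the plans above suggest the table had
-- not been vacuumed after the bulk insert.
VACUUM (ANALYZE) test_tbl;

-- Compare the chosen plan for a smaller and a larger fraction of the table.
EXPLAIN SELECT count(*) FROM test_tbl WHERE id < 6000000;
EXPLAIN SELECT count(*) FROM test_tbl WHERE id < 9000000;

-- Session-local experiment: discourage sequential scans to see the
-- estimated cost of the index-based alternative for the larger range.
SET enable_seqscan = off;
EXPLAIN SELECT count(*) FROM test_tbl WHERE id < 9000000;
RESET enable_seqscan;
```

Comparing the top-level cost estimates of the last two plans shows why the planner prefers one over the other: it picks whichever plan has the lower estimated total cost.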