深入淺出VACUUM核心原理(中): index by pass
來源:PostgreSQL學徒
前言
之前寫了一篇深入淺出VACUUM核心原理(上),趕巧最近有位讀者和我探討了一個問題,恰巧也和vacuum的有趣特性有關,藉此再增加一篇深入淺出VACUUM核心原理(中)。那麼是什麼樣的特性呢?沒錯,又是我們的老朋友——頁剪枝。
現象
簡單重溫下頁剪枝,有兩種情況會進行剪枝,以刪除在任何快照中不再可見的元組
之前的 UPDATE 操作沒有找到足夠的空間將新元組放入同一頁面。 堆頁中包含的資料多於 fillfactor 儲存引數所允許的資料,比如 fillfactor 是 80,那麼預留 20% 的空間用於更新操作,假如剩餘的空間不足 20%,就會執行頁剪枝,另外最低不能低於單個資料塊大小的 10%
之前的文章已經反覆提及,此處不再贅述細節。讓我們先看個場景,樣例來自《PostgreSQL 14 internal》。
postgres=# CREATE TABLE hot(id integer, s char(2000)) WITH (fillfactor = 75);
CREATE TABLE
postgres=# CREATE INDEX hot_id ON hot(id);
CREATE INDEX
postgres=# CREATE INDEX hot_s ON hot(s);
CREATE INDEX
postgres=# INSERT INTO hot VALUES (1, 'A');
INSERT 0 1
postgres=# UPDATE hot SET s = 'B';
UPDATE 1
postgres=# UPDATE hot SET s = 'C';
UPDATE 1
postgres=# UPDATE hot SET s = 'D';
UPDATE 1
postgres=# SELECT * FROM heap_page('hot',0);
ctid | state | xmin | xmax
-------+--------+--------+--------
(0,1) | normal | 1158 c | 1159 c
(0,2) | normal | 1159 c | 1160 c
(0,3) | normal | 1160 c | 1161
(0,4) | normal | 1161 | 0 a
(4 rows)
由於 fillfactor 設定的為 75%,並且 s 欄位是定長的 2000 位元組,所以每個頁面只能容納 4 條元組,插入 3 條,還有 1 條留作更新。然後下次頁面訪問將觸發頁剪枝,刪除所有不可見的元組
postgres=# UPDATE hot SET s = 'E';
UPDATE 1
postgres=# SELECT * FROM heap_page('hot',0);
ctid | state | xmin | xmax
-------+--------+--------+------
(0,1) | dead | |
(0,2) | dead | |
(0,3) | dead | |
(0,4) | normal | 1161 c | 1164
(0,5) | normal | 1164 | 0 a
(5 rows)
可以看到,前面 3 條元組已經被移除。透過這個例子,想必各位已經大致知曉了頁剪枝的原理。
現在讓我們變換一下
postgres=# truncate table hot;
TRUNCATE TABLE
postgres=# insert into hot values(1,'A');
INSERT 0 1
postgres=# insert into hot values(1,'B');
INSERT 0 1
postgres=# insert into hot values(1,'C');
INSERT 0 1
postgres=# update hot set s = 'D';
UPDATE 3
postgres=# SELECT * FROM heap_page('hot',0);
ctid | state | xmin | xmax
-------+--------+--------+------
(0,1) | normal | 1187 c | 1190
(0,2) | normal | 1188 c | 1190
(0,3) | normal | 1189 c | 1190
(0,4) | normal | 1190 | 0 a
(4 rows)
postgres=# SELECT * FROM heap_page('hot',1);
ctid | state | xmin | xmax
-------+--------+------+------
(1,1) | normal | 1190 | 0 a
(1,2) | normal | 1190 | 0 a
(2 rows)
postgres=# select * from pg_visibility_map('hot');
blkno | all_visible | all_frozen
-------+-------------+------------
0 | f | f
1 | f | f
(2 rows)
postgres=# explain analyze select * from hot; ---觸發了頁剪枝
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on hot (cost=0.00..10.10 rows=10 width=8008) (actual time=0.026..0.031 rows=3 loops=1)
Planning Time: 0.079 ms
Execution Time: 0.048 ms
(3 rows)
postgres=# SELECT * FROM heap_page('hot',0); ---觸發了頁剪枝
ctid | state | xmin | xmax
-------+--------+--------+------
(0,1) | dead | |
(0,2) | dead | |
(0,3) | dead | |
(0,4) | normal | 1175 c | 0 a
(4 rows)
當執行了查詢之後,觸發了頁剪枝,前面 3 條死元組都被清理了。但是!你會發現此時 vm 檔案中依舊不是 all_visible 的,因為頁剪枝期間不會去更新 vm 和 fsm,指標還在,索引對其可能還有引用。
postgres=# select * from pg_visibility_map('hot');
blkno | all_visible | all_frozen
-------+-------------+------------
0 | f | f
1 | f | f
(2 rows)
所以這個時候,假如執行計劃選擇了 index only scan ,注意並不是真正的 index only scan,還是需要去回表判斷可見性的,即我們看到的 Heap Fetchs,參照下例。至於為什麼是 6,各位讀者可以思考一下?,並且為什麼再次執行之後變成了 3,各位也可以仔細思考一下。
postgres=# set enable_seqscan to off;
SET
postgres=# explain (analyze,buffers) select id from hot where id = 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using hot_id on hot (cost=0.14..8.15 rows=1 width=4) (actual time=0.010..0.014 rows=3 loops=1)
Index Cond: (id = 1)
Heap Fetches: 6
Buffers: shared hit=3
Planning:
Buffers: shared hit=5
Planning Time: 0.115 ms
Execution Time: 0.032 ms
(8 rows)
postgres=# explain (analyze,buffers) select id from hot where id = 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using hot_id on hot (cost=0.14..8.15 rows=1 width=4) (actual time=0.019..0.022 rows=3 loops=1)
Index Cond: (id = 1)
Heap Fetches: 3
Buffers: shared hit=3
Planning Time: 0.077 ms
Execution Time: 0.040 ms
(6 rows)
各位思考 5 分鐘......
叮,答案在這 ?? 掃描前的索引狀態
postgres=# SELECT * FROM index_page('hot_id',1);
itemoffset | htid | dead
------------+-------+------
1 | (0,1) | f
2 | (0,2) | f
3 | (0,3) | f
4 | (0,4) | f
5 | (1,1) | f
6 | (1,2) | f
(6 rows)
掃描後的索引狀態,點到為止,全部說明就失去了意思。
postgres=# SELECT * FROM index_page('hot_id',1);
itemoffset | htid | dead
------------+-------+------
1 | (0,1) | t
2 | (0,2) | t
3 | (0,3) | t
4 | (0,4) | f
5 | (1,1) | f
6 | (1,2) | f
(6 rows)
當然,隨著你執行了 vacuum,all_visible 就變成了 true,此時 Heap Fetchs 變成了 0,也就是真正意義上的"僅索引掃描"了,透過 vm 檔案我已經知曉了資料塊中所有元組均是可見的了,不需要再回表判斷可見性了。這裡其實也可以引申出一個 PostgreSQL 中常見的面試題——索引中是否包含可見性資訊?
postgres=# vacuum hot;
VACUUM
postgres=# select * from pg_visibility_map('hot');
blkno | all_visible | all_frozen
-------+-------------+------------
0 | t | f
1 | t | f
(2 rows)
postgres=# explain (analyze,buffers) select id from hot where id = 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using hot_id on hot (cost=0.13..4.15 rows=1 width=4) (actual time=0.012..0.013 rows=3 loops=1)
Index Cond: (id = 1)
Heap Fetches: 0
Buffers: shared hit=2
Planning:
Buffers: shared hit=18
Planning Time: 0.253 ms
Execution Time: 0.037 ms
(8 rows)
index bypass
那麼到此就完了嗎?非也,讓我們看個更加有趣的例子——index bypass。
postgres=# drop table test;
DROP TABLE
postgres=# create table test(id int,info text);
CREATE TABLE
postgres=# alter table test set (autovacuum_enabled = off);
ALTER TABLE
postgres=# create index on test(id);
CREATE INDEX
postgres=# insert into test select n,'test' from generate_series(1,100000) as n;
INSERT 0 100000
postgres=# analyze test;
ANALYZE
postgres=# bupdate test set info = 'hello' where id = 99;
UPDATE 1
postgres=# SELECT lp,lp_flags,t_xmin,t_xmax,t_ctid,t_data FROM heap_page_items(get_raw_page('test', 0)) limit 2 offset 98;
lp | lp_flags | t_xmin | t_xmax | t_ctid | t_data
-----+----------+--------+--------+-----------+----------------------
99 | 1 | 1199 | 1201 | (540,101) | \x630000000b74657374
100 | 1 | 1199 | 0 | (0,100) | \x640000000b74657374
(2 rows)
postgres=# explain (analyze,buffers) select id from test where id = 99;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Only Scan using test_id_idx on test (cost=0.29..8.31 rows=1 width=4) (actual time=0.051..0.053 rows=1 loops=1)
Index Cond: (id = 99)
Heap Fetches: 2
Buffers: shared hit=4
Planning Time: 0.075 ms
Execution Time: 0.070 ms
(6 rows)
postgres=# SELECT lp,lp_flags,t_xmin,t_xmax,t_ctid,t_data FROM heap_page_items(get_raw_page('test', 0)) limit 2 offset 98;
lp | lp_flags | t_xmin | t_xmax | t_ctid | t_data
-----+----------+--------+--------+---------+----------------------
99 | 3 | | | |
100 | 1 | 1199 | 0 | (0,100) | \x640000000b74657374
(2 rows)
同樣,經過了查詢之後,也觸發了頁剪枝,注意觀察 Heap Fetchs 值的變化。
postgres=# explain (analyze,buffers) select id from test where id = 99;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Only Scan using test_id_idx on test (cost=0.29..8.31 rows=1 width=4) (actual time=0.048..0.050 rows=1 loops=1)
Index Cond: (id = 99)
Heap Fetches: 2
Buffers: shared hit=4
Planning Time: 0.072 ms
Execution Time: 0.066 ms
(6 rows)
postgres=# explain (analyze,buffers) select id from test where id = 99;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Only Scan using test_id_idx on test (cost=0.29..8.31 rows=1 width=4) (actual time=0.021..0.023 rows=1 loops=1)
Index Cond: (id = 99)
Heap Fetches: 1
Buffers: shared hit=3
Planning Time: 0.075 ms
Execution Time: 0.039 ms
(6 rows)
那麼讓我們依葫蘆畫瓢,也執行一下 vacuum
postgres=# select * from pg_visibility_map('test',0);
all_visible | all_frozen
-------------+------------
f | f
(1 row)
postgres=# vacuum test;
VACUUM
postgres=# select * from pg_visibility_map('test',0);
all_visible | all_frozen
-------------+------------
f | f
(1 row)
納尼!怎麼 all_visible 還是 false?讓我們列印 verbose 出來看一下
postgres=# vacuum verbose test;
INFO: vacuuming "postgres.public.test"
INFO: finished vacuuming "postgres.public.test": index scans: 0
pages: 0 removed, 541 remain, 2 scanned (0.37% of total)
tuples: 0 removed, 100000 remain, 0 are dead but not yet removable
removable cutoff: 1214, which was 0 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan bypassed: 1 pages from table (0.18% of total) have 1 dead item identifiers
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 11 hits, 0 misses, 0 dirtied
WAL usage: 0 records, 0 full page images, 0 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: vacuuming "postgres.pg_toast.pg_toast_16707"
INFO: finished vacuuming "postgres.pg_toast.pg_toast_16707": index scans: 0
pages: 0 removed, 0 remain, 0 scanned (100.00% of total)
tuples: 0 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 1214, which was 0 XIDs old when operation ended
frozen: 0 pages from table (100.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (100.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 3 hits, 0 misses, 0 dirtied
WAL usage: 0 records, 0 full page images, 0 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
各位以往可能沒有注意這個顯眼包
“index scan bypassed: 1 pages from table (0.18% of total) have 1 dead item identifiers
index scan bypassed,顧名思義,跳過了索引掃描,有一個頁面中含有一條死元組,但是被跳過了!為什麼會這樣呢?其實官網已經有了簡單的介紹,關於 index_cleanup
“Normally,
VACUUM
will skip index vacuuming when there are very few dead tuples in the table. The cost of processing all of the table's indexes is expected to greatly exceed the benefit of removing dead index tuples when this happens. This option can be used to forceVACUUM
to process indexes when there are more than zero dead tuples. The default isAUTO
, which allowsVACUUM
to skip index vacuuming when appropriate. IfINDEX_CLEANUP
is set toON
,VACUUM
will conservatively remove all dead tuples from indexes. This may be useful for backwards compatibility with earlier releases of PostgreSQL where this was the standard behavior.
INDEX_CLEANUP
can also be set toOFF
to forceVACUUM
to always skip index vacuuming, even when there are many dead tuples in the table. This may be useful when it is necessary to makeVACUUM
run as quickly as possible to avoid imminent transaction ID wraparound (see Section 25.1.5). However, the wraparound failsafe mechanism controlled by vacuum_failsafe_age will generally trigger automatically to avoid transaction ID wraparound failure, and should be preferred. If index cleanup is not performed regularly, performance may suffer, because as the table is modified indexes will accumulate dead tuples and the table itself will accumulate dead line pointers that cannot be removed until index cleanup is completed.This option has no effect for tables that have no index and is ignored if the
FULL
option is used. It also has no effect on the transaction ID wraparound failsafe mechanism. When triggered it will skip index vacuuming, even whenINDEX_CLEANUP
is set toON
.
注意這一段,Normally, VACUUM
will skip index vacuuming when there are very few dead tuples in the table. ,通常情況下,當只有很少的死元組的時候,vacuum 會跳過索引的清理,發生這種情況時,處理所有表索引的成本預計將大大超過刪除死索引元組的好處,當遇到即將事務回捲時,將該引數設為 off,將跳過索引的清理,加速事務的回收。預設是 auto,意味著在合適的時候,自動跳過。那麼問題來了,什麼時候是"合適"的時候?讓我們分析下原始碼,程式碼也好找
/*
* This crossover point at which we'll start to do index vacuuming is
* expressed as a percentage of the total number of heap pages in the
* table that are known to have at least one LP_DEAD item. This is
* much more important than the total number of LP_DEAD items, since
* it's a proxy for the number of heap pages whose visibility map bits
* cannot be set on account of bypassing index and heap vacuuming.
*
* We apply one further precautionary test: the space currently used
* to store the TIDs (TIDs that now all point to LP_DEAD items) must
* not exceed 32MB. This limits the risk that we will bypass index
* vacuuming again and again until eventually there is a VACUUM whose
* dead_items space is not CPU cache resident.
*
* We don't take any special steps to remember the LP_DEAD items (such
* as counting them in our final update to the stats system) when the
* optimization is applied. Though the accounting used in analyze.c's
* acquire_sample_rows() will recognize the same LP_DEAD items as dead
* rows in its own stats report, that's okay. The discrepancy should
* be negligible. If this optimization is ever expanded to cover more
* cases then this may need to be reconsidered.
*/
threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
bypass = (vacrel->lpdead_item_pages < threshold &&
vacrel->lpdead_items < MAXDEADITEMS(32L * 1024L * 1024L));
}
if (bypass)
{
/*
* There are almost zero TIDs. Behave as if there were precisely
* zero: bypass index vacuuming, but do index cleanup.
*
* We expect that the ongoing VACUUM operation will finish very
* quickly, so there is no point in considering speeding up as a
* failsafe against wraparound failure. (Index cleanup is expected to
* finish very quickly in cases where there were no ambulkdelete()
* calls.)
*/
vacrel->do_index_vacuuming = false;
}
/*
* Threshold that controls whether we bypass index vacuuming and heap
* vacuuming as an optimization
*/
#define BYPASS_THRESHOLD_PAGES 0.02 /* i.e. 2% of rel_pages */
可以看到,預設的閾值是 2% 的總頁面數量,當包含有 LP_DEAD 標記的頁面數量小於 2% 的頁面數量時,就會設定 bypass,設定之後,do_index_vacuuming 狀態位就會設定為 false(一個 LP 才佔 4 個位元組,不清理影響也不大),最終進入到 failsafe 的邏輯中,failsafe 是 14 引入的特性,緊急情況下切換到全速 VACUUM 模式,同時忽略 vacuum_cost_delay,防止 xid wraparound。
if (vacrel->do_index_vacuuming)
{
if (vacrel->nindexes == 0 || vacrel->num_index_scans == 0)
appendStringInfoString(&buf, _("index scan not needed: "));
else
appendStringInfoString(&buf, _("index scan needed: "));
msgfmt = _("%u pages from table (%.2f%% of total) had %lld dead item identifiers removed\n");
}
else
{
if (!vacrel->failsafe_active)
appendStringInfoString(&buf, _("index scan bypassed: "));
else
appendStringInfoString(&buf, _("index scan bypassed by failsafe: "));
msgfmt = _("%u pages from table (%.2f%% of total) have %lld dead item identifiers\n");
}
最終,就是我們日誌中看到的
“index scan bypassed: 1 pages from table (0.18% of total) have 1 dead item identifiers
所以,假如我們強制讓其清理索引就不一樣了,index only scan 變成了真正的 index only scan。
“index scan needed: 1 pages from table (0.18% of total) had 1 dead item identifiers removed
postgres=# vacuum (verbose,index_cleanup true) test;
INFO: vacuuming "postgres.public.test"
INFO: finished vacuuming "postgres.public.test": index scans: 1
pages: 0 removed, 541 remain, 2 scanned (0.37% of total)
tuples: 0 removed, 100000 remain, 0 are dead but not yet removable
removable cutoff: 1214, which was 0 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan needed: 1 pages from table (0.18% of total) had 1 dead item identifiers removed
index "test_id_idx": pages: 276 in total, 0 newly deleted, 0 currently deleted, 0 reusable
avg read rate: 0.000 MB/s, avg write rate: 31.376 MB/s
buffer usage: 288 hits, 0 misses, 5 dirtied
WAL usage: 4 records, 4 full page images, 31197 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: vacuuming "postgres.pg_toast.pg_toast_16707"
INFO: finished vacuuming "postgres.pg_toast.pg_toast_16707": index scans: 0
pages: 0 removed, 0 remain, 0 scanned (100.00% of total)
tuples: 0 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 1214, which was 0 XIDs old when operation ended
frozen: 0 pages from table (100.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (100.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 1 hits, 0 misses, 0 dirtied
WAL usage: 0 records, 0 full page images, 0 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
postgres=# select * from pg_visibility_map('test',0);
all_visible | all_frozen
-------------+------------
t | f
(1 row)
postgres=# explain (analyze,buffers) select id from test where id = 99;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Only Scan using test_id_idx on test (cost=0.29..4.31 rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=1)
Index Cond: (id = 99)
Heap Fetches: 0
Buffers: shared hit=3
Planning:
Buffers: shared hit=4
Planning Time: 0.101 ms
Execution Time: 0.026 ms
(8 rows)
all_visible 也變成了 true,一切都通透了。
其實仔細思考一下,也不難理解其中原理:假如跳過了索引清理,但是依舊設定了 all_visible,那就亂套了,透過索引訪問,結果索引引用的死元組明明已經被清理了,並且還不回表直接就返回索引中的死元組了,那簡直是災難。
小結
這個特性是 14 版本引入的,主要是為了避免每次 vacuum 時都去清理索引,提升效率,因為很多時候索引的清理往往是大頭,過往文章已經寫過很多次。
透過這麼一個有趣的案例,相信各位對
頁剪枝的原理和實現 index only scan 的誤區(具體是否真的不回表取決於 Heap Fetchs) index scan bypass(何時跳過索引的清理) index_cleanup
幾個特性理解更加深入了。That's all,感謝這位讀者提供的素材!各位讀者粉絲私下問我的問題,我空了都會回覆的。
下期接著我們的話題——深入淺出 VACUUM 核心原理(下)。
參考
https://www.postgresql.org/docs/current/sql-vacuum.html
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/70027826/viewspace-2989260/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- PostgreSQL VACUUM 之深入淺出 (一)SQL
- PostgreSQL VACUUM 之深入淺出 (二)SQL
- PostgreSQL VACUUM 之深入淺出 (三)SQL
- 深入淺出 Java 中列舉的實現原理Java
- 深入淺出,ARCore開發原理
- 深入淺出HTTPS工作原理HTTP
- 深入淺出 Viewport 設計原理View
- 深入淺出 Node ( 四 ) HTTP核心模組HTTP
- PostgreSQL vacuum原理—vacuum揭秘SQL
- 深入淺出Service外掛化原理
- 深入淺出瀏覽器渲染原理瀏覽器
- Hive的原理—— 深入淺出學HiveHive
- [深入淺出Windows 10]佈局原理Windows
- 深入淺出 JavaScript 中的 thisJavaScript
- 深入淺出 - vue變化偵測原理Vue
- 深入淺出FE(十四)深入淺出websocketWeb
- 深入淺出理解 Spark:環境部署與工作原理Spark
- Redis Sentinel-深入淺出原理和實戰Redis
- 深入淺出 PLT/GOT Hook與原理實踐GoHook
- 深入淺出一致性Hash原理
- 深入淺出MyBatis:MyBatis解析和執行原理MyBatis
- 熱修復——深入淺出原理與實現
- 深入淺出 Vue 系列 -- 資料劫持實現原理Vue
- 深入淺出Vue響應式原理(完整版)Vue
- 深入淺出——MVCMVC
- 深入淺出mongooseGo
- HTTP深入淺出HTTP
- 深入淺出IO
- 深入淺出 RabbitMQMQ
- 深入淺出PromisePromise
- ArrayList 深入淺出
- mysqldump 深入淺出MySql
- 深入淺出decorator
- 深入淺出 ZooKeeper
- 機器學習深入淺出機器學習
- 深入淺出HTTPHTTP
- http 深入淺出HTTP
- 深入淺出 ARCore