PostgreSQL DBA(142) - PG 12(Monitoring PostgreSQL VACUUM processes)

husthxd發表於2019-12-10

本節介紹了PostgreSQL中如何監控VACUUM的處理過程。

概覽
PG的MVCC要求“過期”的資料不能馬上被物理清除,而是標記為dead rows,這些dead rows後續會通過vacuuming過程清理。
vacuuming通過以下方式讓資料庫保持健康:
1.標記dead rows可用於儲存新資料,這樣可以避免不必要的磁碟浪費以及可以跳過dead rows以提升順序掃描的效能;
2.更新vm(用於跟蹤過期或已廢棄的資料,反應在pages上)。這可以提升index-only scans的效能;
3.避免出現事務ID回捲失敗。
PG提供了autovacuum機制,通過週期性的執行ANALYZE來收集最近頻繁更新的資料表統計資訊。

監控指標
為了讓VACUUMs平滑執行,應該監控以下幾個指標:
1.dead rows
2.table disk usage
3.VACUUM/AUTOVACUUM最近執行的時間
4.監控vacuum full

dead rows
PG提供了pg_stat_user_tables檢視用於監控dead rows

[local:/data/run/pg12]:5120 pg12@testdb=# \d pg_stat_user_tables
                      View "pg_catalog.pg_stat_user_tables"
       Column        |           Type           | Collation | Nullable | Default 
---------------------+--------------------------+-----------+----------+---------
 relid               | oid                      |           |          | 
 schemaname          | name                     |           |          | 
 relname             | name                     |           |          | 
 seq_scan            | bigint                   |           |          | 
 seq_tup_read        | bigint                   |           |          | 
 idx_scan            | bigint                   |           |          | 
 idx_tup_fetch       | bigint                   |           |          | 
 n_tup_ins           | bigint                   |           |          | 
 n_tup_upd           | bigint                   |           |          | 
 n_tup_del           | bigint                   |           |          | 
 n_tup_hot_upd       | bigint                   |           |          | 
 n_live_tup          | bigint                   |           |          | 
 n_dead_tup          | bigint                   |           |          | 
 n_mod_since_analyze | bigint                   |           |          | 
 last_vacuum         | timestamp with time zone |           |          | 
 last_autovacuum     | timestamp with time zone |           |          | 
 last_analyze        | timestamp with time zone |           |          | 
 last_autoanalyze    | timestamp with time zone |           |          | 
 vacuum_count        | bigint                   |           |          | 
 autovacuum_count    | bigint                   |           |          | 
 analyze_count       | bigint                   |           |          | 
 autoanalyze_count   | bigint                   |           |          | 
[local:/data/run/pg12]:5120 pg12@testdb=# update t1 set id = 2;
UPDATE 20000
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname,relname, n_dead_tup FROM pg_stat_user_tables;
 schemaname |      relname       | n_dead_tup 
------------+--------------------+------------
 public     | tbl                |          0
 public     | t2                 |          0
 public     | b                  |          0
 public     | a                  |          0
 public     | rel                |          0
 public     | t_count            |          0
 public     | t_big_autovacuum_1 |          0
 public     | t_autovacuum_1     |          0
 public     | t1                 |      20000
(9 rows)

監控每張表的dead rows,特別是監控頻繁更新的表上,這樣有助於DBA確定VACUUM程式是否已有效的週期性的清除這些dead rows。

Table disk usage
在出現dead rows時,磁碟空間會逐步增大,vacuuming執行後可標記dead rows為空閒空間,通過監控空間的變化

[local:/data/run/pg12]:5120 pg12@testdb=# SELECT 
       relname AS "table_name", 
       pg_size_pretty(pg_table_size(C.oid)) AS "table_size" 
FROM 
       pg_class C 
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) 
WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND nspname !~ '^pg_toast' AND relkind IN ('r') 
ORDER BY pg_table_size(C.oid) 
DESC;
     table_name     | table_size 
--------------------+------------
 rel                | 845 MB
 t_big_autovacuum_1 | 498 MB
 tbl                | 100 MB
 a                  | 65 MB
 b                  | 65 MB
 t1                 | 1456 kB
 t_autovacuum_1     | 504 kB
 t2                 | 360 kB
 t_count            | 64 kB
(9 rows)
[local:/data/run/pg12]:5120 pg12@testdb=# update t1 set id = 4;
UPDATE 20000
[local:/data/run/pg12]:5120 pg12@testdb=# update t1 set id = 5;
UPDATE 20000
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT 
       relname AS "table_name", 
       pg_size_pretty(pg_table_size(C.oid)) AS "table_size" 
FROM 
       pg_class C 
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) 
WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND nspname !~ '^pg_toast' AND relkind IN ('r') 
ORDER BY pg_table_size(C.oid) 
DESC;
     table_name     | table_size 
--------------------+------------
 rel                | 845 MB
 t_big_autovacuum_1 | 498 MB
 tbl                | 100 MB
 a                  | 65 MB
 b                  | 65 MB
 t1                 | 2864 kB
 t_autovacuum_1     | 504 kB
 t2                 | 360 kB
 t_count            | 64 kB
(9 rows)

對t1執行全量更新,然後執行vacuum t1後再次插入等量的資料

[local:/data/run/pg12]:5120 pg12@testdb=# vacuum t1;
VACUUM
[local:/data/run/pg12]:5120 pg12@testdb=# select count(*) from t1;
 count 
-------
 20000
(1 row)
[local:/data/run/pg12]:5120 pg12@testdb=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 id     | integer |           |          | 
[local:/data/run/pg12]:5120 pg12@testdb=# insert into t1 select generate_series(1,20000);
INSERT 0 20000
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT 
       relname AS "table_name", 
       pg_size_pretty(pg_table_size(C.oid)) AS "table_size" 
FROM 
       pg_class C 
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) 
WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND nspname !~ '^pg_toast' AND relkind IN ('r') 
ORDER BY pg_table_size(C.oid) 
DESC;
     table_name     | table_size 
--------------------+------------
 rel                | 845 MB
 t_big_autovacuum_1 | 498 MB
 tbl                | 100 MB
 a                  | 65 MB
 b                  | 65 MB
 t1                 | 2864 kB
 t_autovacuum_1     | 504 kB
 t2                 | 360 kB
 t_count            | 64 kB
(9 rows)

可以看到table佔用的空間並沒有出現變化,原因是新的rows使用了dead rows的空間。
如不執行vacuum直接插入,則明顯可以看到table size的變化。

[local:/data/run/pg12]:5120 pg12@testdb=# update t1 set id = 10;
UPDATE 60000
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT                
       relname AS "table_name", 
       pg_size_pretty(pg_table_size(C.oid)) AS "table_size" 
FROM 
       pg_class C 
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) 
WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND nspname !~ '^pg_toast' AND relkind IN ('r') 
ORDER BY pg_table_size(C.oid) 
DESC;
     table_name     | table_size 
--------------------+------------
 rel                | 845 MB
 t_big_autovacuum_1 | 498 MB
 tbl                | 100 MB
 a                  | 65 MB
 b                  | 65 MB
 t1                 | 4288 kB -->這是原佔用空間
 t_autovacuum_1     | 504 kB
 t2                 | 360 kB
 t_count            | 64 kB
(9 rows)
[local:/data/run/pg12]:5120 pg12@testdb=# insert into t1 select generate_series(1,20000);
INSERT 0 20000
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT                                         
       relname AS "table_name", 
       pg_size_pretty(pg_table_size(C.oid)) AS "table_size" 
FROM 
       pg_class C 
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) 
WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND nspname !~ '^pg_toast' AND relkind IN ('r') 
ORDER BY pg_table_size(C.oid) 
DESC;
     table_name     | table_size 
--------------------+------------
 rel                | 845 MB
 t_big_autovacuum_1 | 498 MB
 tbl                | 100 MB
 a                  | 65 MB
 b                  | 65 MB
 t1                 | 4992 kB  --> 新增佔用空間
 t_autovacuum_1     | 504 kB
 t2                 | 360 kB
 t_count            | 64 kB
(9 rows)

Last time (auto)vacuum ran
PG提供了pg_stat_user_tables 檢視用於監控最近一次vacuum執行的時間。

[local:/data/run/pg12]:5120 pg12@testdb=# \d pg_stat_user_tables
                      View "pg_catalog.pg_stat_user_tables"
       Column        |           Type           | Collation | Nullable | Default 
---------------------+--------------------------+-----------+----------+---------
 relid               | oid                      |           |          | 
 schemaname          | name                     |           |          | 
 relname             | name                     |           |          | 
 seq_scan            | bigint                   |           |          | 
 seq_tup_read        | bigint                   |           |          | 
 idx_scan            | bigint                   |           |          | 
 idx_tup_fetch       | bigint                   |           |          | 
 n_tup_ins           | bigint                   |           |          | 
 n_tup_upd           | bigint                   |           |          | 
 n_tup_del           | bigint                   |           |          | 
 n_tup_hot_upd       | bigint                   |           |          | 
 n_live_tup          | bigint                   |           |          | 
 n_dead_tup          | bigint                   |           |          | 
 n_mod_since_analyze | bigint                   |           |          | 
 last_vacuum         | timestamp with time zone |           |          | 
 last_autovacuum     | timestamp with time zone |           |          | 
 last_analyze        | timestamp with time zone |           |          | 
 last_autoanalyze    | timestamp with time zone |           |          | 
 vacuum_count        | bigint                   |           |          | 
 autovacuum_count    | bigint                   |           |          | 
 analyze_count       | bigint                   |           |          | 
 autoanalyze_count   | bigint                   |           |          | 
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname, relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
 schemaname |      relname       |          last_vacuum          |        last_autovacuum        
------------+--------------------+-------------------------------+-------------------------------
 public     | tbl                |                               | 
 public     | t2                 |                               | 
 public     | b                  |                               | 
 public     | a                  |                               | 
 public     | rel                |                               | 
 public     | t_count            |                               | 
 public     | t_big_autovacuum_1 |                               | 
 public     | t_autovacuum_1     |                               | 
 public     | t1                 | 2019-12-10 14:29:31.420908+08 | 2019-12-10 14:32:02.639873+08
(9 rows)

可以看到最近一次的vacumm是14:29:31,而最近一次的autovacuum是14:32:02,autovacuum預設60s執行一次,其他沒有變化的表PG不會執行autovacuum。

[local:/data/run/pg12]:5120 pg12@testdb=# select name,setting from pg_settings where name like '%autovacuum%';
                name                 |  setting  
-------------------------------------+-----------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 200000000
 autovacuum_max_workers              | 3
 autovacuum_multixact_freeze_max_age | 400000000
 autovacuum_naptime                  | 60 --> 60s
 autovacuum_vacuum_cost_delay        | 2
 autovacuum_vacuum_cost_limit        | -1
 autovacuum_vacuum_scale_factor      | 0.2
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | -1
(13 rows)

執行update操作,60s後再次查詢,發現last_autovacuum已更新。

[local:/data/run/pg12]:5120 pg12@testdb=# update t1 set id = 1;
UPDATE 80000
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname, relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
 schemaname |      relname       |          last_vacuum          |        last_autovacuum        
------------+--------------------+-------------------------------+-------------------------------
 public     | tbl                |                               | 
 public     | t2                 |                               | 
 public     | b                  |                               | 
 public     | a                  |                               | 
 public     | rel                |                               | 
 public     | t_count            |                               | 
 public     | t_big_autovacuum_1 |                               | 
 public     | t_autovacuum_1     |                               | 
 public     | t1                 | 2019-12-10 14:29:31.420908+08 | 2019-12-10 14:38:02.771566+08
(9 rows)

監控vacuum full
通過檢視pg_stat_progress_vacuum可監控vacuum full的進度

[local:/data/run/pg12]:5120 pg12@testdb=# \d pg_stat_progress_vacuum
           View "pg_catalog.pg_stat_progress_vacuum"
       Column       |  Type   | Collation | Nullable | Default 
--------------------+---------+-----------+----------+---------
 pid                | integer |           |          | 
 datid              | oid     |           |          | 
 datname            | name    |           |          | 
 relid              | oid     |           |          | 
 phase              | text    |           |          | 
 heap_blks_total    | bigint  |           |          | 
 heap_blks_scanned  | bigint  |           |          | 
 heap_blks_vacuumed | bigint  |           |          | 
 index_vacuum_count | bigint  |           |          | 
 max_dead_tuples    | bigint  |           |          | 
 num_dead_tuples    | bigint  |           |          | 
[local:/data/run/pg12]:5120 pg12@testdb=#

VACUUM的相關主題
如果上述指標提示VACUUMs沒有正常執行,可以通過查詢設定可發現問題所在,包括:
1.The autovacuum process is disabled on your database
2.The autovacuum process is disabled on one or more tables
3.Autovacuuming settings aren’t keeping pace with updates
4.Lock conflicts
5.Long-running open transactions

1.The autovacuum process is disabled on your database
通過執行ps -axww | grep autovacuum命令可監控autovacuum是否正在執行

[root@localhost ~]# ps -axww | grep autovacuum
55958 ?        Ss     0:00 postgres: autovacuum launcher  
56057 pts/4    S+     0:00 grep --color=auto autovacuum
[root@localhost ~]#

同時亦可通過查詢pg_settings獲得

[local:/data/run/pg12]:5120 pg12@testdb=# SELECT name, setting FROM pg_settings WHERE name='autovacuum';
    name    | setting 
------------+---------
 autovacuum | on
(1 row)

如autovacuum已開啟,但結果沒有如我們預期,那麼問題可能出現在statistics collector上面,autovacuum依賴statistics collector用於確定何時以及間隔多少時間應該執行。通常來說,statistics collector應啟用,但如果禁用此項,對autovacuum的正常執行會有較大影響。通過檢查track_counts配置項來檢查statistics collector是否啟用。

[local:/data/run/pg12]:5120 pg12@testdb=# SELECT name, setting FROM pg_settings WHERE name='track_counts';
     name     | setting 
--------------+---------
 track_counts | on
(1 row)
[local:/data/run/pg12]:5120 pg12@testdb=#

如track_counts為OFF,則statistics collector不會更新dead rows資訊,而該項是autovacuum所依賴的資訊。

[local:/data/run/pg12]:5120 pg12@testdb=# set track_counts=off;
SET
[local:/data/run/pg12]:5120 pg12@testdb=# update t1 set id = 2;
UPDATE 80000
[local:/data/run/pg12]:5120 pg12@testdb=# update t1 set id = 3;
UPDATE 80000
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
 schemaname |      relname       | n_dead_tup |          last_vacuum          |        last_autovacuum        
------------+--------------------+------------+-------------------------------+-------------------------------
 public     | tbl                |          0 |                               | 
 public     | t2                 |          0 |                               | 
 public     | b                  |          0 |                               | 
 public     | a                  |          0 |                               | 
 public     | rel                |          0 |                               | 
 public     | t_count            |          0 |                               | 
 public     | t_big_autovacuum_1 |          0 |                               | 
 public     | t_autovacuum_1     |          0 |                               | 
 public     | t1                 |          0 | 2019-12-10 14:29:31.420908+08 | 2019-12-10 14:38:02.771566+08
(9 rows)
[local:/data/run/pg12]:5120 pg12@testdb=#

手工把track_counts設定為off,更新t1,查詢pg_stat_user_tables發現n_dead_tup沒有統計dead rows,導致autovacuum並沒有對t1表進行“vacuum”。
手工設定track_counts為on,但沒有觸發統計資訊的更新,退出psql重新登入,更新資料表後才會出現新的統計資訊

[local:/data/run/pg12]:5120 pg12@testdb=# set track_counts=on;
SET
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
 schemaname |      relname       | n_dead_tup |          last_vacuum          |        last_autovacuum        
------------+--------------------+------------+-------------------------------+-------------------------------
 public     | tbl                |          0 |                               | 
 public     | t2                 |          0 |                               | 
 public     | b                  |          0 |                               | 
 public     | a                  |          0 |                               | 
 public     | rel                |          0 |                               | 
 public     | t_count            |          0 |                               | 
 public     | t_big_autovacuum_1 |          0 |                               | 
 public     | t_autovacuum_1     |          0 |                               | 
 public     | t1                 |          0 | 2019-12-10 14:29:31.420908+08 | 2019-12-10 14:38:02.771566+08
(9 rows)
[local:/data/run/pg12]:5120 pg12@testdb=# \q
[pg12@localhost ~]$ psql
Expanded display is used automatically.
psql (12.1)
Type "help" for help.
[local:/data/run/pg12]:5120 pg12@testdb=# update t1 set id = 100;
UPDATE 80000
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
 schemaname |      relname       | n_dead_tup |          last_vacuum          |        last_autovacuum        
------------+--------------------+------------+-------------------------------+-------------------------------
 public     | tbl                |          0 |                               | 
 public     | t2                 |          0 |                               | 
 public     | b                  |          0 |                               | 
 public     | a                  |          0 |                               | 
 public     | rel                |          0 |                               | 
 public     | t_count            |          0 |                               | 
 public     | t_big_autovacuum_1 |          0 |                               | 
 public     | t_autovacuum_1     |          0 |                               | 
 public     | t1                 |      79868 | 2019-12-10 14:29:31.420908+08 | 2019-12-10 14:38:02.771566+08
(9 rows)

2.The autovacuum process is disabled on one or more tables
PG可在表級別上設定autovacuum是否生效

[local:/data/run/pg12]:5120 pg12@testdb=# create table t2(id int);
CREATE TABLE
[local:/data/run/pg12]:5120 pg12@testdb=# alter table t2 SET (autovacuum_enabled = false);
ALTER TABLE
[local:/data/run/pg12]:5120 pg12@testdb=# \d t2
                 Table "public.t2"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 id     | integer |           |          | 
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT reloptions FROM pg_class WHERE relname='t2';
         reloptions         
----------------------------
 {autovacuum_enabled=false}
(1 row)
[local:/data/run/pg12]:5120 pg12@testdb=#

在t2上插入資料並更新

[local:/data/run/pg12]:5120 pg12@testdb=# insert into t2 select generate_series(1,100000);
INSERT 0 100000
[local:/data/run/pg12]:5120 pg12@testdb=# update t2 set id = 1;
UPDATE 100000
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
 schemaname |      relname       | n_dead_tup |          last_vacuum          |        last_autovacuum        
------------+--------------------+------------+-------------------------------+-------------------------------
 public     | tbl                |          0 |                               | 
 public     | b                  |          0 |                               | 
 public     | a                  |          0 |                               | 
 public     | rel                |          0 |                               | 
 public     | t2                 |     100000 |                               | 
 public     | t_count            |          0 |                               | 
 public     | t_big_autovacuum_1 |          0 |                               | 
 public     | t_autovacuum_1     |          0 |                               | 
 public     | t1                 |          0 | 2019-12-10 14:29:31.420908+08 | 2019-12-10 14:59:57.269249+08
(9 rows)

t2的dead rows為100000,但60s超時後,autovacuum並沒有對該表進行vacuum處理。

[local:/data/run/pg12]:5120 pg12@testdb=# \! date
Tue Dec 10 15:06:54 CST 2019
[local:/data/run/pg12]:5120 pg12@testdb=# \! date
Tue Dec 10 15:08:28 CST 2019
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
 schemaname |      relname       | n_dead_tup |          last_vacuum          |        last_autovacuum        
------------+--------------------+------------+-------------------------------+-------------------------------
 public     | tbl                |          0 |                               | 
 public     | b                  |          0 |                               | 
 public     | a                  |          0 |                               | 
 public     | rel                |          0 |                               | 
 public     | t2                 |     100000 |                               | 
 public     | t_count            |          0 |                               | 
 public     | t_big_autovacuum_1 |          0 |                               | 
 public     | t_autovacuum_1     |          0 |                               | 
 public     | t1                 |          0 | 2019-12-10 14:29:31.420908+08 | 2019-12-10 14:59:57.269249+08
(9 rows)
[local:/data/run/pg12]:5120 pg12@testdb=#

設定資料表autovacuum_enabled為true,等待60s,這時候發現t2已被vacuum

[local:/data/run/pg12]:5120 pg12@testdb=# alter table t2 SET (autovacuum_enabled = true);
ALTER TABLE
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
 schemaname |      relname       | n_dead_tup |          last_vacuum          |        last_autovacuum        
------------+--------------------+------------+-------------------------------+-------------------------------
 public     | tbl                |          0 |                               | 
 public     | b                  |          0 |                               | 
 public     | a                  |          0 |                               | 
 public     | rel                |          0 |                               | 
 public     | t2                 |     100000 |                               | 
 public     | t_count            |          0 |                               | 
 public     | t_big_autovacuum_1 |          0 |                               | 
 public     | t_autovacuum_1     |          0 |                               | 
 public     | t1                 |          0 | 2019-12-10 14:29:31.420908+08 | 2019-12-10 14:59:57.269249+08
(9 rows)
[local:/data/run/pg12]:5120 pg12@testdb=# \! date
Tue Dec 10 15:09:05 CST 2019
[local:/data/run/pg12]:5120 pg12@testdb=# 
[local:/data/run/pg12]:5120 pg12@testdb=# \! date
Tue Dec 10 15:10:26 CST 2019
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
 schemaname |      relname       | n_dead_tup |          last_vacuum          |        last_autovacuum        
------------+--------------------+------------+-------------------------------+-------------------------------
 public     | tbl                |          0 |                               | 
 public     | b                  |          0 |                               | 
 public     | a                  |          0 |                               | 
 public     | rel                |          0 |                               | 
 public     | t2                 |          0 |                               | 2019-12-10 15:09:57.621123+08
 public     | t_count            |          0 |                               | 
 public     | t_big_autovacuum_1 |          0 |                               | 
 public     | t_autovacuum_1     |          0 |                               | 
 public     | t1                 |          0 | 2019-12-10 14:29:31.420908+08 | 2019-12-10 14:59:57.269249+08
(9 rows)
[local:/data/run/pg12]:5120 pg12@testdb=#

3.Autovacuuming settings aren’t keeping pace with updates
如果autovacuum已啟用,但沒有我們想象中那麼頻繁的執行,這時候需要調整預設的配置選項。

[local:/data/run/pg12]:5120 pg12@testdb=# SELECT name,setting,boot_val,pending_restart from pg_settings where category like 'Autovacuum';
                name                 |  setting  | boot_val  | pending_restart 
-------------------------------------+-----------+-----------+-----------------
 autovacuum                          | on        | on        | f
 autovacuum_analyze_scale_factor     | 0.1       | 0.1       | f
 autovacuum_analyze_threshold        | 50        | 50        | f
 autovacuum_freeze_max_age           | 200000000 | 200000000 | f
 autovacuum_max_workers              | 3         | 3         | f
 autovacuum_multixact_freeze_max_age | 400000000 | 400000000 | f
 autovacuum_naptime                  | 60        | 60        | f
 autovacuum_vacuum_cost_delay        | 2         | 2         | f
 autovacuum_vacuum_cost_limit        | -1        | -1        | f
 autovacuum_vacuum_scale_factor      | 0.2       | 0.2       | f
 autovacuum_vacuum_threshold         | 50        | 50        | f
(11 rows)

查詢pg_settings,其中setting為當前配置的值,boot_val是預設值,可以看到當前庫的配置與預設值一樣。
確定autovacuum執行頻度的引數有:
1.autovacuum_vacuum_threshold,觸發閾值,預設為50
2.autovacuum_vacuum_scale_factor,觸發dead rows率,預設為0.2,即20%
3.表的估算行數,儲存在pg_class.reltuples中
PG結合上述3個引數來確定autovacuum是否需要執行,計算公式如下:

autovacuuming threshold = autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor * estimated number of rows in the table)

通過調整引數,如減少autovacuum_vacuum_scale_factor可觸發VACUUMs執行得更頻繁。
PG還提供了log_autovacuum_min_duration引數來診斷autovacuum的執行間隔時間,如超過該時間設定則會記錄在日誌中,這樣有助於診斷autovacuum的設定是否合理。

4.Lock conflicts
vacuum的執行需要持有SHARE UPDATE EXCLUSIVE lock,如有session持有的鎖(SHARE UPDATE EXCLUSIVE,SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE)與其衝突,則無法執行vacuum。
更新t2

[local:/data/run/pg12]:5120 pg12@testdb=#  begin;
BEGIN
[local:/data/run/pg12]:5120 pg12@testdb=#*  update t2 set id = 10;
UPDATE 100000
[local:/data/run/pg12]:5120 pg12@testdb=#* commit;
COMMIT
[local:/data/run/pg12]:5120 pg12@testdb=# 
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
 schemaname |      relname       | n_dead_tup |          last_vacuum          |        last_autovacuum        
------------+--------------------+------------+-------------------------------+-------------------------------
 public     | tbl                |          0 |                               | 
 public     | b                  |          0 |                               | 
 public     | a                  |          0 |                               | 
 public     | rel                |          0 |                               | 
 public     | t2                 |     100000 |                               | 2019-12-10 15:09:57.621123+08
 public     | t_count            |          0 |                               | 
 public     | t_big_autovacuum_1 |          0 |                               | 
 public     | t_autovacuum_1     |          0 |                               | 
 public     | t1                 |          0 | 2019-12-10 14:29:31.420908+08 | 2019-12-10 14:59:57.269249+08
(9 rows)

開另外一個視窗,lock表

[local:/data/run/pg12]:5120 pg12@testdb=# begin;
BEGIN
[local:/data/run/pg12]:5120 pg12@testdb=#* lock t2 in SHARE UPDATE EXCLUSIVE mode;
LOCK TABLE
[local:/data/run/pg12]:5120 pg12@testdb=#*

autovacuum由於無法獲取鎖,因此無法對錶進行vacuum

[local:/data/run/pg12]:5120 pg12@testdb=# select pid,locktype,relation::regclass,mode,granted from pg_locks where pid <> pg_backend_pid();
  pid  |  locktype  | relation |           mode           | granted 
-------+------------+----------+--------------------------+---------
 58050 | virtualxid |          | ExclusiveLock            | t
 58050 | relation   | t2       | ShareUpdateExclusiveLock | t
(2 rows)
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
 schemaname |      relname       | n_dead_tup |          last_vacuum          |        last_autovacuum        
------------+--------------------+------------+-------------------------------+-------------------------------
 public     | tbl                |          0 |                               | 
 public     | b                  |          0 |                               | 
 public     | a                  |          0 |                               | 
 public     | rel                |          0 |                               | 
 public     | t2                 |     100000 |                               | 2019-12-10 15:09:57.621123+08
 public     | t_count            |          0 |                               | 
 public     | t_big_autovacuum_1 |          0 |                               | 
 public     | t_autovacuum_1     |          0 |                               | 
 public     | t1                 |          0 | 2019-12-10 14:29:31.420908+08 | 2019-12-10 14:59:57.269249+08
(9 rows)

釋放鎖

[local:/data/run/pg12]:5120 pg12@testdb=#* commit;
COMMIT
[local:/data/run/pg12]:5120 pg12@testdb=#

autovacuum可正常執行,last_autovacuum已更新

[local:/data/run/pg12]:5120 pg12@testdb=# \! date 
Tue Dec 10 15:33:01 CST 2019
[local:/data/run/pg12]:5120 pg12@testdb=# \! date 
Tue Dec 10 15:33:40 CST 2019
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
 schemaname |      relname       | n_dead_tup |          last_vacuum          |        last_autovacuum        
------------+--------------------+------------+-------------------------------+-------------------------------
 public     | tbl                |          0 |                               | 
 public     | b                  |          0 |                               | 
 public     | a                  |          0 |                               | 
 public     | rel                |          0 |                               | 
 public     | t2                 |          0 |                               | 2019-12-10 15:32:58.743764+08
 public     | t_count            |          0 |                               | 
 public     | t_big_autovacuum_1 |          0 |                               | 
 public     | t_autovacuum_1     |          0 |                               | 
 public     | t1                 |          0 | 2019-12-10 14:29:31.420908+08 | 2019-12-10 14:59:57.269249+08
(9 rows)
[local:/data/run/pg12]:5120 pg12@testdb=#

通過程式狀態亦可診斷

-- session 1
[local:/data/run/pg12]:5120 pg12@testdb=# begin;
BEGIN
[local:/data/run/pg12]:5120 pg12@testdb=#* lock t2 in SHARE UPDATE EXCLUSIVE mode;
LOCK TABLE
[local:/data/run/pg12]:5120 pg12@testdb=#* 
-- session 2
[local:/data/run/pg12]:5120 pg12@testdb=# vacuum t2;
-- console
[pg12@localhost ~]$ ps -ef|grep 'waiting'
pg12     56540 55944  0 14:59 ?        00:00:01 postgres: pg12 testdb [local] VACUUM waiting
pg12     58502 53760  0 15:36 pts/2    00:00:00 grep --color=auto waiting
[pg12@localhost ~]$

程式顯示為VACUUM waiting

5.Long-running open transactions
MVCC的一個副作用是vacuum不能清理那些其他事務還需要訪問的過期dead rows。因此,如無必要確保事務正常完結。
通過檢視pg_stat_activity可監控事務的狀態

[local:/data/run/pg12]:5120 pg12@testdb=# SELECT pid,xact_start, state, usename FROM pg_stat_activity;
  pid  |          xact_start           | state  | usename 
-------+-------------------------------+--------+---------
 55958 |                               |        | 
 55960 |                               |        | pg12
 56540 | 2019-12-10 15:42:47.210597+08 | active | pg12
 58050 |                               | idle   | pg12
 55956 |                               |        | 
 55955 |                               |        | 
 55957 |                               |        | 
(7 rows)

如state列顯示為disabled,則需檢查系統引數track_activities

[local:/data/run/pg12]:5120 pg12@testdb=# show track_activities;
 track_activities 
------------------
 on
(1 row)
[local:/data/run/pg12]:5120 pg12@testdb=# set track_activities=off;
SET
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT pid,xact_start, state, usename FROM pg_stat_activity;
  pid  | xact_start |  state   | usename 
-------+------------+----------+---------
 55958 |            |          | 
 55960 |            |          | pg12
 56540 |            | disabled | pg12
 58050 |            | idle     | pg12
 55956 |            |          | 
 55955 |            |          | 
 55957 |            |          | 
(7 rows)
[local:/data/run/pg12]:5120 pg12@testdb=# 
[local:/data/run/pg12]:5120 pg12@testdb=# set track_activities=on;
SET
[local:/data/run/pg12]:5120 pg12@testdb=# SELECT pid,xact_start, state, usename FROM pg_stat_activity;
  pid  |          xact_start           | state  | usename 
-------+-------------------------------+--------+---------
 55958 |                               |        | 
 55960 |                               |        | pg12
 56540 | 2019-12-10 15:52:19.500017+08 | active | pg12
 58050 |                               | idle   | pg12
 55956 |                               |        | 
 55955 |                               |        | 
 55957 |                               |        | 
(7 rows)

對於長時間閒置的session,PG提供了引數idle_in_transaction_session_timeout 用於控制這些session,超過該引數配置的時間(以ms為單位),PG會自動終止這些session。

參考資料
Monitoring PostgreSQL VACUUM processes

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/6906/viewspace-2667766/,如需轉載,請註明出處,否則將追究法律責任。

相關文章