空間都去哪裡了?(中)

資料庫工作筆記發表於2023-11-16

來源:PostgreSQL學徒

前言

話接上回,空間都去哪裡了?(上),當異常退出的時候,可能會留下許多孤兒檔案,就像蛀蟲一樣,悄無聲息蠶食你的空間。上回提到可以使用SQL+人肉巡檢來排查:

SELECT
    *
FROM
    pg_ls_dir('/home/postgres/16data/base/5'AS file
WHERE
    file ~ '^[0-9]*'
    AND file::text NOT IN (
        SELECT
            oid::text
        FROM
            pg_class);

但是效率太低,並且還會誤報,當一個進行中的事務正在建表、重寫表的時候,都會被SQL查詢出來,所以你還要人為仔細檢查才行。那麼有沒有更加優雅的辦法?

pg_catacheck

首先我想到的是pg_catacheck,顧名思義,檢查系統表

!! PostgreSQL stores the metadata for SQL objects such as tables and functions using special tables called system catalog tables. Users do not normally modify these tables directly, but instead modify them using SQL commands such as CREATE, ALTER, and DROP.

DDL本質上就是系統表的DML,因此一旦系統表損壞了,就很尷尬。那麼既然是孤兒檔案,pg_catacheck能否檢測出來呢?Try try see...

postgres=# begin;
BEGIN
postgres=*# create table tt1(id int,info text);
CREATE TABLE
postgres=*# insert into tt1 select n,'test' from generate_series(1,100000) as n;
INSERT 0 100000
postgres=*# select pg_relation_filepath('tt1');
 pg_relation_filepath 
----------------------
 base/5/49765
(1 row)

postgres=*# select pg_backend_pid(),txid_current();
 pg_backend_pid | txid_current 
----------------+--------------
          20680 |         1676
(1 row)

然後kill -9殺掉該會話

[postgres@xiongcc 5]$ kill -9 20680
[postgres@xiongcc 5]$ ls -lrth 49765
-rw------- 1 postgres postgres 4.3M Nov 14 21:59 49765

這樣操作過後,系統表中便會留下死元祖,同時留下孤兒檔案

postgres=# select lp, t_xmin, t_xmax, t_ctid,
       infomask(t_infomask, 1) as infomask,
       infomask(t_infomask2, 2) as infomask2
from heap_page_items(get_raw_page('pg_class', 12)) where t_xmin = 1676;
 lp | t_xmin | t_xmax | t_ctid  |                 infomask                  |    infomask2    
----+--------+--------+---------+-------------------------------------------+-----------------
 25 |   1676 |   1676 | (12,28) | XMIN_INVALID|HASNULL                      | HOT_UPDATED
 26 |   1676 |      0 | (12,26) | XMAX_INVALID|XMIN_INVALID|HASNULL         | 
 27 |   1676 |      0 | (12,27) | XMAX_INVALID|XMIN_INVALID|HASNULL         | 
 28 |   1676 |      0 | (12,28) | UPDATED|XMAX_INVALID|XMIN_INVALID|HASNULL | HEAP_ONLY_TUPLE
(4 rows)

但是很遺憾,pg_catcheck並不能檢測出來,其實並不意外,因為本質上系統表並未損壞,充其量只是留下了一條死元祖,大不了vacuum full重建一下。

[postgres@xiongcc 5]$ pg_catcheck
progress: done (0 inconsistencies, 0 warnings, 0 errors)

當然假如你採用非常規操作惡意損壞,是可以檢測出來的

postgres=# create table t1(id int);
CREATE TABLE
postgres=# insert into t1 values(generate_series(1,100));
INSERT 0 100
postgres=# delete from pg_class where relname = 't1';
DELETE 2
[postgres@xiongcc 5]$ pg_catcheck
notice: pg_type row has invalid typrelid "33371"no matching entry in pg_class
row identityoid="33373"
notice: pg_type row has invalid typrelid "57946"no matching entry in pg_class
row identityoid="57948"
notice: pg_attribute row has invalid attrelid "33371"no matching entry in pg_class
row identity: attrelid="33371" attname="id" attnum="1"
notice: pg_attribute row has invalid attrelid "33371"no matching entry in pg_class
row identity: attrelid="33371" attname="ctid" attnum="-1"
notice: pg_attribute row has invalid attrelid "33371"no matching entry in pg_class
row identity: attrelid="33371" attname="xmin" attnum="-2"
notice: pg_attribute row has invalid attrelid "33371"no matching entry in pg_class
...
...

根據需要,可以搭配checksum定期巡檢,防患於未然。

pg_orphaned

這兩天我再搗鼓的時候,偶然發現了這個外掛,頓時喜出望外!根據其名字便可知曉作用,這個外掛提供了五個函式

  • pg_list_orphaned(interval): to list orphaned files. Orphaned files older than the interval parameter (default 1 Day) are listed with the "older" field set to true. 列出孤兒檔案,預設情況下超過1天的孤兒檔案,older欄位會顯示會true
  • pg_move_orphaned(interval): to move orphaned files to a "orphaned_backup" directory. Only orphaned files older than the interval parameter (default 1 Day) are moved. 將孤兒檔案移除到"orphaned_backup"目錄中
  • pg_list_orphaned_moved(): to list the orphaned files that have been moved to the "orphaned_backup" directory. 列出"orphaned_backup"中的孤兒檔案
  • pg_move_back_orphaned(): to move back the orphaned files from the "orphaned_backup" directory to their orginal location (if still orphaned). 將"orphaned_backup"中的孤兒檔案挪回來
  • pg_remove_moved_orphaned(): to remove the orphaned files located in the "orphaned_backup" directory.徹底刪除

感覺有點像回收站的味道。讓我們看下效果:

postgres=# select * from pg_list_orphaned();
  dbname  |  path  |   name    |  size   |        mod_time        | relfilenode | reloid | older 
----------+--------+-----------+---------+------------------------+-------------+--------+-------
 postgres | base/5 | 16998     |       0 | 2023-11-04 13:06:25+08 |       16998 |      0 | t
 postgres | base/5 | 49769     |    8192 | 2023-11-14 21:59:02+08 |       49769 |      0 | f
 postgres | base/5 | 41566     |    8192 | 2023-11-14 21:46:34+08 |       41566 |      0 | f
 postgres | base/5 | 16999     |    8192 | 2023-11-04 13:10:30+08 |       16999 |      0 | t
 postgres | base/5 | 33398     |       0 | 2023-11-14 21:45:06+08 |       33398 |      0 | f
 postgres | base/5 | 25292     |    8192 | 2023-11-10 11:28:11+08 |       25292 |      0 | t
 postgres | base/5 | 49768     |       0 | 2023-11-14 21:58:21+08 |       49768 |      0 | f
 postgres | base/5 | 33395     | 4431872 | 2023-11-14 21:45:48+08 |       33395 |      0 | f
 postgres | base/5 | 33395_fsm |   24576 | 2023-11-14 21:45:48+08 |       33395 |      0 | f
 postgres | base/5 | 41565     |       0 | 2023-11-14 21:46:02+08 |       41565 |      0 | f
 postgres | base/5 | 33371     |       0 | 2023-11-10 11:48:59+08 |       33371 |      0 | t
 postgres | base/5 | 33399     |    8192 | 2023-11-14 21:45:48+08 |       33399 |      0 | f
 postgres | base/5 | 57946     |    8192 | 2023-11-14 22:04:08+08 |       57946 |      0 | f
 postgres | base/5 | 49765     | 4431872 | 2023-11-14 21:59:02+08 |       49765 |      0 | f
 postgres | base/5 | 49765_fsm |   24576 | 2023-11-14 21:59:02+08 |       49765 |      0 | f
 postgres | base/5 | 25288     | 6832128 | 2023-11-10 11:28:11+08 |       25288 |      0 | t
 postgres | base/5 | 25288_fsm |   24576 | 2023-11-10 11:28:11+08 |       25288 |      0 | t
 postgres | base/5 | 25291     |       0 | 2023-11-10 11:27:20+08 |       25291 |      0 | t
(18 rows)

好傢伙,原來系統中已經有這麼多孤兒檔案了。那讓我們刪除一下

postgres=# select pg_move_orphaned();
 pg_move_orphaned 
------------------
                7
(1 row)

postgres=# select * from pg_list_orphaned();
  dbname  |  path  |   name    |  size   |        mod_time        | relfilenode | reloid | older 
----------+--------+-----------+---------+------------------------+-------------+--------+-------
 postgres | base/5 | 49769     |    8192 | 2023-11-14 21:59:02+08 |       49769 |      0 | f
 postgres | base/5 | 41566     |    8192 | 2023-11-14 21:46:34+08 |       41566 |      0 | f
 postgres | base/5 | 33398     |       0 | 2023-11-14 21:45:06+08 |       33398 |      0 | f
 postgres | base/5 | 49768     |       0 | 2023-11-14 21:58:21+08 |       49768 |      0 | f
 postgres | base/5 | 33395     | 4431872 | 2023-11-14 21:45:48+08 |       33395 |      0 | f
 postgres | base/5 | 33395_fsm |   24576 | 2023-11-14 21:45:48+08 |       33395 |      0 | f
 postgres | base/5 | 41565     |       0 | 2023-11-14 21:46:02+08 |       41565 |      0 | f
 postgres | base/5 | 33399     |    8192 | 2023-11-14 21:45:48+08 |       33399 |      0 | f
 postgres | base/5 | 57946     |    8192 | 2023-11-14 22:09:06+08 |       57946 |      0 | f
 postgres | base/5 | 49765     | 4431872 | 2023-11-14 21:59:02+08 |       49765 |      0 | f
 postgres | base/5 | 49765_fsm |   24576 | 2023-11-14 21:59:02+08 |       49765 |      0 | f
(11 rows)

刪除過後,這些檔案會放到orphaned_backup目錄中,那麼這個目錄在哪裡呢?讓我們看下程式碼

 /* default tablespace */
 if (!restore)
  snprintf(dir, sizeof(dir), "base/%u", dbOid);
 else
  snprintf(dir, sizeof(dir), "%s/%u/base/%u", orphaned_backup_dir, dbOid, dbOid);

很簡單,位於例項目錄下,以資料庫的oid來命名。

[postgres@xiongcc 16data]$ cd orphaned_backup/
[postgres@xiongcc orphaned_backup]$ tree 
.
└── 5
    └── base
        └── 5
            ├── 16998
            ├── 16999
            ├── 25288
            ├── 25288_fsm
            ├── 25291
            ├── 25292
            └── 33371

3 directories, 7 files

小結

很不錯的一個工具,麻雀雖小,短短1000行程式碼,卻能幫助我們解決很大的煩惱。

下一期讓我們聊聊——空間都去哪裡了(下),那些奇奇怪怪的場景。

空間都去哪裡了?(中)





來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/70027826/viewspace-2995588/,如需轉載,請註明出處,否則將追究法律責任。

相關文章