PostgreSQL DBA(19) - REDO point淺析

husthxd發表於2018-12-20

What's REDO point?

REDO point is the point PostgreSQL start to recover from;that is, the location to write the XLOG record at the moment when the latest checkpoint is started.

REDO point是PostgreSQL啟動恢復的起始點,也就是說REDO point是最後一次checkpoint啟動時寫XLOG Record的位置.

一、REDO point的儲存

checkpointer程式啟動時,從pg_control檔案中獲取Redo point並儲存在記憶體中,在執行checkpoint時更新Redo point為當前即將寫XLOG Record的位置,checkpoint執行成功後把Redo point更新到pg_control檔案中.
類似的,資料庫啟動需要執行恢復時,從pg_control檔案中獲取REDO point進行恢復.
pg_control檔案位於$PGDATA/global目錄中,可透過命令pg_controldata檢視檔案中的內容.

[xdb@localhost pg111db]$ find ./ -name pg_control
./global/pg_control
[xdb@localhost pg111db]$ pg_controldata
pg_control version number:            1100
Catalog version number:               201809051
Database system identifier:           6624362124887945794
Database cluster state:               in production
pg_control last modified:             Thu 20 Dec 2018 03:34:05 PM CST
Latest checkpoint location:           1/48447DF0
Latest checkpoint's REDO location:    1/48447DF0
Latest checkpoint's REDO WAL file:    000000010000000100000048
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:1979
Latest checkpoint's NextOID:          25238
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        561
Latest checkpoint's oldestXID's DB:   16402
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 16402
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Thu 20 Dec 2018 03:34:05 PM CST
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    minimal
wal_log_hints setting:                off
max_connections setting:              100
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0
Mock authentication nonce:            90bf37566703859a557b7f20688eb944b6335b5c3d36f5530941ebf1dfa777c1
[xdb@localhost pg111db]$ 

其中

Latest checkpoint's REDO location:    1/48447DF0

記錄的資訊即為REDO point.

二、REDO point的變更

1.檢視當前的REDO point

[xdb@localhost pg111db]$ pg_controldata|grep 'REDO location'
Latest checkpoint's REDO location:    1/48448150

REDO point為1/48448150.

2.執行DML操作
插入3條記錄

testdb=# insert into cp values(7);
INSERT 0 1
testdb=# insert into cp values(8);
INSERT 0 1
testdb=# insert into cp values(9);
INSERT 0 1

檢視1/48448150後的XLOG Record

[xdb@localhost pg_wal]$ pg_waldump -p ./ -s 1/48448150
rmgr: XLOG        len (rec/tot):    106/   106, tx:          0, lsn: 1/48448150, prev 1/484480E0, desc: CHECKPOINT_ONLINE redo 1/48448150; tli 1; prev tli 1; fpw true; xid 0:1982; oid 25238; multi 1; offset 0; oldest xid 561 in DB 16402; oldest multi 1 in DB 16402; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; online
rmgr: Heap        len (rec/tot):     54/   474, tx:       1982, lsn: 1/484481C0, prev 1/48448150, desc: INSERT off 11, blkref #0: rel 1663/16402/17046 blk 0 FPW
rmgr: Transaction len (rec/tot):     34/    34, tx:       1982, lsn: 1/484483A0, prev 1/484481C0, desc: COMMIT 2018-12-20 16:10:17.471639 CST
rmgr: Heap        len (rec/tot):     59/    59, tx:       1983, lsn: 1/484483C8, prev 1/484483A0, desc: INSERT off 12, blkref #0: rel 1663/16402/17046 blk 0
rmgr: Transaction len (rec/tot):     34/    34, tx:       1983, lsn: 1/48448408, prev 1/484483C8, desc: COMMIT 2018-12-20 16:10:20.170594 CST
rmgr: Heap        len (rec/tot):     59/    59, tx:       1984, lsn: 1/48448430, prev 1/48448408, desc: INSERT off 13, blkref #0: rel 1663/16402/17046 blk 0
rmgr: Transaction len (rec/tot):     34/    34, tx:       1984, lsn: 1/48448470, prev 1/48448430, desc: COMMIT 2018-12-20 16:10:22.268365 CST
pg_waldump: FATAL:  error in WAL record at 1/48448470: invalid record length at 1/48448498: wanted 24, got 0

最後一個XLOG Record記錄的位置為1/48448470,加上記錄大小34(十六進位制為0x22),位置為1/48448492,按理論上來說,如果現在執行checkpoint,該位置為REDO point.

3.執行checkpoint

testdb=# checkpoint;
CHECKPOINT

檢視pg_control檔案內容

[xdb@localhost pg111db]$ pg_controldata|grep 'REDO location'
Latest checkpoint's REDO location:    1/48448498

再次檢視XLOG Record記錄

[xdb@localhost pg_wal]$ pg_waldump -p ./ -s 1/48448150
rmgr: XLOG        len (rec/tot):    106/   106, tx:          0, lsn: 1/48448150, prev 1/484480E0, desc: CHECKPOINT_ONLINE redo 1/48448150; tli 1; prev tli 1; fpw true; xid 0:1982; oid 25238; multi 1; offset 0; oldest xid 561 in DB 16402; oldest multi 1 in DB 16402; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; online
rmgr: Heap        len (rec/tot):     54/   474, tx:       1982, lsn: 1/484481C0, prev 1/48448150, desc: INSERT off 11, blkref #0: rel 1663/16402/17046 blk 0 FPW
rmgr: Transaction len (rec/tot):     34/    34, tx:       1982, lsn: 1/484483A0, prev 1/484481C0, desc: COMMIT 2018-12-20 16:10:17.471639 CST
rmgr: Heap        len (rec/tot):     59/    59, tx:       1983, lsn: 1/484483C8, prev 1/484483A0, desc: INSERT off 12, blkref #0: rel 1663/16402/17046 blk 0
rmgr: Transaction len (rec/tot):     34/    34, tx:       1983, lsn: 1/48448408, prev 1/484483C8, desc: COMMIT 2018-12-20 16:10:20.170594 CST
rmgr: Heap        len (rec/tot):     59/    59, tx:       1984, lsn: 1/48448430, prev 1/48448408, desc: INSERT off 13, blkref #0: rel 1663/16402/17046 blk 0
rmgr: Transaction len (rec/tot):     34/    34, tx:       1984, lsn: 1/48448470, prev 1/48448430, desc: COMMIT 2018-12-20 16:10:22.268365 CST
rmgr: XLOG        len (rec/tot):    106/   106, tx:          0, lsn: 1/48448498, prev 1/48448470, desc: CHECKPOINT_ONLINE redo 1/48448498; tli 1; prev tli 1; fpw true; xid 0:1985; oid 25238; multi 1; offset 0; oldest xid 561 in DB 16402; oldest multi 1 in DB 16402; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; online
pg_waldump: FATAL:  error in WAL record at 1/48448498: invalid record length at 1/48448508: wanted 24, got 0
[xdb@localhost pg_wal]$ 

可以看到1/48448498是最後一條checkpoint記錄的起始寫入位置,REDO point為1/48448498,比預想的要多出6個位元組(這6個位元組都是0x00,用於補齊?).

4.dump WAL segment file
再次使用hexdump工具檢視WAL segment file檔案.

1/48448470開始的"COMMIT"記錄

[xdb@localhost pg_wal]$ echo "obase=10;ibase=16;448470"|bc
4490352 --> 檔案位置偏移
[xdb@localhost pg_wal]$ hexdump -C 000000010000000100000048 -s 4490352 -n 34 
00448470  22 00 00 00 c0 07 00 00  30 84 44 48 01 00 00 00  |".......0.DH....|
00448480  00 01 00 00 83 5e 7a d4  ff 08 cd 2f 4a 6e 6e 20  |.....^z..../Jnn |
00448490  02 00                                             |..|
00448492

XLOG Record的頭部首先是XLogRecord結構體,第一個域是uint32的record的大小,即0x00000022,十進位制為34(大小無異).

後續的6個位元組
均為0x00

[xdb@localhost pg_wal]$ hexdump -C 000000010000000100000048 -s 4490386 -n 6 
00448492  00 00 00 00 00 00                                 |......|
00448498

XLOG Record for checkpoint

[xdb@localhost pg_wal]$ hexdump -C 000000010000000100000048 -s 4490392 -n 106 
00448498  6a 00 00 00 00 00 00 00  70 84 44 48 01 00 00 00  |j.......p.DH....|
004484a8  10 00 00 00 ac d2 8b 95  ff 50 98 84 44 48 01 00  |.........P..DH..|
004484b8  00 00 01 00 00 00 01 00  00 00 01 00 00 00 00 00  |................|
004484c8  00 00 c1 07 00 00 96 62  00 00 01 00 00 00 00 00  |.......b........|
004484d8  00 00 31 02 00 00 12 40  00 00 01 00 00 00 12 40  |..1....@.......@|
004484e8  00 00 ba 4e 1b 5c 00 00  00 00 00 00 00 00 00 00  |...N.\..........|
004484f8  00 00 00 00 00 00 00 00  00 00                    |..........|
00448502

大小為0x0000006A,即106B(頭部XLogRecord 24B + XLogRecordDataHeaderShort 2B + Checkpoint 80B ).
從FF 50為XLogRecordDataHeaderShort結構體的內容,0xFF為標誌位,0x50為Data的大小(即80B).
checkpoint記錄的內容詳見Checkpoint結構體,該結構體第一個域欄位為8個位元組的LSN-->0x00000001 48448498,
即REDO point:1/48448498.

三、參考資料

PostgreSQL 原始碼解讀(109)- WAL#5(相關資料結構)
PostgreSQL DBA(15) - WAL檔案結構
PostgreSQL DBA(16) - WAL segment file內部結構
PostgreSQL DBA(17) - XLOG Record data內部結構
PostgreSQL DBA(18) - pg_waldump工具簡介

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/6906/viewspace-2374774/,如需轉載,請註明出處,否則將追究法律責任。

相關文章