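PostgreSQL Source Code Walkthrough (110) - WAL#6 (Insert & WAL - XLogRe...
This section briefly describes how the XLogRecordAssemble function is implemented. The function assembles an XLOG record from the registered data and buffers into an XLogRecData chain, in preparation for inserting the record.
1. Data Structures
Global static variables
The global variables used by XLogRecordAssemble include hdr_rdt, hdr_scratch, rdatas, and others.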
/* flags for the in-progress insertion */
static uint8 curinsert_flags = 0;
/*
 * These are used to hold the record header while constructing a record.
 * 'hdr_scratch' is not a plain variable, but is palloc'd at initialization,
 * because we want it to be MAXALIGNed and padding bytes zeroed.
 *
 * For simplicity, it's allocated large enough to hold the headers for any
 * WAL record.
 */
static XLogRecData hdr_rdt;
static char *hdr_scratch = NULL;
#define SizeOfXlogOrigin (sizeof(RepOriginId) + sizeof(char))
#define HEADER_SCRATCH_SIZE \
(SizeOfXLogRecord + \
MaxSizeOfXLogRecordBlockHeader * (XLR_MAX_BLOCK_ID + 1) + \
SizeOfXLogRecordDataHeaderLong + SizeOfXlogOrigin)
/*
 * An array of XLogRecData structs, to hold registered data.
 */
static XLogRecData *rdatas;
static int num_rdatas; /* entries currently used */
static int max_rdatas; /* allocated size */
// whether XLogBeginInsert has been called
static bool begininsert_called = false;
static XLogCtlData *XLogCtl = NULL;
/*
 * A chain of XLogRecDatas to hold the "main data" of a WAL record, registered
 * with XLogRegisterData(...).
 */
static XLogRecData *mainrdata_head;
static XLogRecData *mainrdata_last = (XLogRecData *) &mainrdata_head;
static uint32 mainrdata_len; /* total # of bytes in chain */
/*
 * ProcLastRecPtr points to the start of the last XLOG record inserted by the
 * current backend. It is updated for all inserts. XactLastRecEnd points to
 * end+1 of the last record, and is reset when we end a top-level transaction,
 * or start a new one; so it can be used to tell if the current transaction has
 * created any XLOG records.
 *
 * While in parallel mode, this may not be fully up to date. When committing,
 * a transaction can assume this covers all xlog records written either by the
 * user backend or by any parallel worker which was present at any point during
 * the transaction. But when aborting, or when still in parallel mode, other
 * parallel backends may have written WAL records at later LSNs than the value
 * stored here. The parallel leader advances its own copy, when necessary,
 * in WaitForParallelWorkersToFinish.
 */
XLogRecPtr ProcLastRecPtr = InvalidXLogRecPtr;
XLogRecPtr XactLastRecEnd = InvalidXLogRecPtr;
XLogRecPtr XactLastCommitEnd = InvalidXLogRecPtr;
/* For WALInsertLockAcquire/Release functions */
static int MyLockNo = 0;
static bool holdingAllLocks = false;
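To get a feel for how large the hdr_scratch buffer is, the HEADER_SCRATCH_SIZE formula above can be evaluated with concrete numbers. The sketch below is only an illustration: every size constant in it is an assumption (typical values for a PostgreSQL 11-era 64-bit build, not stated in this article), and the SKETCH_ names and sketch_header_scratch_size() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed sizes, NOT taken from the article: typical values for a
 * PostgreSQL 11-era 64-bit build. Check xlogrecord.h for your version.
 */
enum
{
    SKETCH_SizeOfXLogRecord = 24,
    SKETCH_SizeOfXLogRecordBlockHeader = 4,
    SKETCH_SizeOfXLogRecordBlockImageHeader = 5,
    SKETCH_SizeOfXLogRecordBlockCompressHeader = 2,
    SKETCH_SizeOfRelFileNode = 12,
    SKETCH_SizeOfBlockNumber = 4,
    SKETCH_SizeOfXLogRecordDataHeaderLong = 5,
    SKETCH_SizeOfRepOriginId = 2,
    SKETCH_XLR_MAX_BLOCK_ID = 32
};

/* Evaluate the HEADER_SCRATCH_SIZE formula with the assumed sizes. */
static size_t
sketch_header_scratch_size(void)
{
    /* Largest possible per-block header: all optional parts present. */
    size_t max_block_header = SKETCH_SizeOfXLogRecordBlockHeader +
        SKETCH_SizeOfXLogRecordBlockImageHeader +
        SKETCH_SizeOfXLogRecordBlockCompressHeader +
        SKETCH_SizeOfRelFileNode +
        SKETCH_SizeOfBlockNumber;
    /* SizeOfXlogOrigin: the origin id plus its one-byte block id marker. */
    size_t origin = SKETCH_SizeOfRepOriginId + sizeof(char);
    size_t total = SKETCH_SizeOfXLogRecord +
        max_block_header * (SKETCH_XLR_MAX_BLOCK_ID + 1) +
        SKETCH_SizeOfXLogRecordDataHeaderLong + origin;

    assert(total > SKETCH_SizeOfXLogRecord);
    return total;
}
```

Under these assumed sizes the buffer comes out a little under 1 kB, which is why it can be allocated once at initialization and reused for every record.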
Macro definitions
Flags used by XLogRegisterBuffer
/* flags for XLogRegisterBuffer */
#define REGBUF_FORCE_IMAGE 0x01 /* force a full-page image */
#define REGBUF_NO_IMAGE 0x02 /* don't take a full-page image */
#define REGBUF_WILL_INIT (0x04 | 0x02) /* page will be re-initialized at
 * replay (implies NO_IMAGE) */
#define REGBUF_STANDARD 0x08 /* page follows "standard" page layout,
 * (data between pd_lower and pd_upper
 * will be skipped) */
#define REGBUF_KEEP_DATA 0x10 /* include data even if a full-page image
 * is taken */
/*
 * Flag bits for the record being inserted, set using XLogSetRecordFlags().
 */
#define XLOG_INCLUDE_ORIGIN 0x01 /* include the replication origin */
#define XLOG_MARK_UNIMPORTANT 0x02 /* record not important for durability */
XLogRecData
The functions in xloginsert.c construct a chain of XLogRecData structs to represent the final WAL record.
/*
 * The functions in xloginsert.c construct a chain of XLogRecData structs
 * to represent the final WAL record.
 */
typedef struct XLogRecData
{
    struct XLogRecData *next; /* next struct in chain, or NULL */
    char *data; /* start of rmgr data to include */
    uint32 len; /* length of rmgr data to include */
} XLogRecData;
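The chain is a plain singly linked list, so appending a piece and summing the lengths are both one-liners. The following is a minimal sketch using a stand-in struct with the same three fields; SketchRecData, sketch_chain_append() and sketch_chain_total_len() are hypothetical names for illustration and are not part of PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for XLogRecData (same three fields). */
typedef struct SketchRecData
{
    struct SketchRecData *next; /* next struct in chain, or NULL */
    const char *data;           /* start of the data to include */
    uint32_t    len;            /* length of the data to include */
} SketchRecData;

/* Link 'node' after 'tail' and return the new tail of the chain. */
static SketchRecData *
sketch_chain_append(SketchRecData *tail, SketchRecData *node)
{
    assert(tail != NULL && node != NULL);
    node->next = NULL;
    tail->next = node;
    return node;
}

/* Sum the lengths of all entries reachable from 'head'. */
static uint32_t
sketch_chain_total_len(const SketchRecData *head)
{
    uint32_t total = 0;

    for (const SketchRecData *rdt = head; rdt != NULL; rdt = rdt->next)
        total += rdt->len;
    return total;
}
```

A caller would keep a tail pointer, much as XLogRecordAssemble keeps rdt_datas_last, and call sketch_chain_append() for each piece. Only pointers are linked; no payload bytes are copied while the chain is built.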
registered_buffer
For each block reference registered with XLogRegisterBuffer, a registered_buffer struct is filled in.
/*
 * For each block reference registered with XLogRegisterBuffer, we fill in
 * a registered_buffer struct.
 */
typedef struct
{
    bool in_use; /* is this slot in use? */
    uint8 flags; /* REGBUF_* flags */
    RelFileNode rnode; /* identifies the relation and block */
    // fork number of the relation (main fork, FSM, VM, ...)
    ForkNumber forkno;
    // block number
    BlockNumber block;
    Page page; /* page content */
    uint32 rdata_len; /* total length of data in rdata chain */
    XLogRecData *rdata_head; /* head of the chain of data registered with
                              * this block */
    XLogRecData *rdata_tail; /* last entry in the chain, or &rdata_head if
                              * empty */
    XLogRecData bkp_rdatas[2]; /* temporary rdatas used to hold references to
                                * backup block data in XLogRecordAssemble() */
    /* buffer to store a compressed version of backup block image */
    char compressed_page[PGLZ_MAX_BLCKSZ];
} registered_buffer;
// array of registered_buffer slots (file-level static)
static registered_buffer *registered_buffers;
static int max_registered_buffers; /* allocated size */
static int max_registered_block_id = 0; /* highest block_id + 1 currently
                                         * registered */
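2. Source Code Walkthrough
XLogRecordAssemble assembles an XLOG record from the registered data and buffers into an XLogRecData chain. Once assembled, the record can be inserted into the WAL buffers with XLogInsertRecord().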
/*
 * Assemble a WAL record from the registered data and buffers into an
 * XLogRecData chain, ready for insertion with XLogInsertRecord().
 *
 * The record header fields are filled in, except for the xl_prev field. The
 * calculated CRC does not include the record header yet.
 *
 * If there are any registered buffers, and a full-page image was not taken
 * of all of them, *fpw_lsn is set to the lowest LSN among such pages. This
 * signals that the assembled record is only good for insertion on the
 * assumption that the RedoRecPtr and doPageWrites values were up-to-date.
 */
static XLogRecData *
XLogRecordAssemble(RmgrId rmid, uint8 info,
                   XLogRecPtr RedoRecPtr, bool doPageWrites,
                   XLogRecPtr *fpw_lsn)
{
    XLogRecData *rdt;                       // cursor used to walk the chain
    uint32      total_len = 0;              // total size of the XLOG record
    int         block_id;                   // block ID
    pg_crc32c   rdata_crc;                  // CRC
    registered_buffer *prev_regbuf = NULL;  // previously processed registered buffer
    XLogRecData *rdt_datas_last;            // current tail of the chain being built
    XLogRecord *rechdr;                     // record header
    char       *scratch = hdr_scratch;

    /*
     * Note: this function can be called multiple times for the same record.
     * All the modifications we do to the rdata chains below must handle that.
     */

    /* The record begins with the fixed-size header */
    rechdr = (XLogRecord *) scratch;
    scratch += SizeOfXLogRecord;            // advance past the fixed-size header

    hdr_rdt.next = NULL;                    // hdr_rdt is the file-level "static XLogRecData hdr_rdt;"
    rdt_datas_last = &hdr_rdt;              // the chain starts with the header entry
    hdr_rdt.data = hdr_scratch;             // the header data starts at the beginning of the scratch buffer

    /*
     * Enforce consistency checks for this record if user is looking for it.
     * Do this before at the beginning of this routine to give the possibility
     * for callers of XLogInsert() to pass XLR_CHECK_CONSISTENCY directly for
     * a record.
     */
    // i.e. if wal_consistency_checking is enabled for this resource manager
    if (wal_consistency_checking[rmid])
        info |= XLR_CHECK_CONSISTENCY;

    /*
     * Make an rdata chain containing all the data portions of all block
     * references. This includes the data for full-page images. Also append
     * the headers for the block references in the scratch buffer.
     */
    *fpw_lsn = InvalidXLogRecPtr;           // initialize the output variable
    for (block_id = 0; block_id < max_registered_block_id; block_id++)  // iterate over the registered blocks
    {
        registered_buffer *regbuf = &registered_buffers[block_id];  // registered buffer for this block_id
        bool        needs_backup;           // is a backup block (FPI) needed?
        bool        needs_data;             // does the block data need to be included?
        XLogRecordBlockHeader bkpb;
        XLogRecordBlockImageHeader bimg;
        XLogRecordBlockCompressHeader cbimg = {0};  // needed when the image is compressed
        bool        samerel;                // same relation as the previous block?
        bool        is_compressed = false;  // was the image compressed?
        bool        include_image;          // include a full-page image?

        if (!regbuf->in_use)                // slot not in use, go on to the next one
            continue;

        /* Determine if this block needs to be backed up */
        if (regbuf->flags & REGBUF_FORCE_IMAGE)
            needs_backup = true;            // an FPI is forced
        else if (regbuf->flags & REGBUF_NO_IMAGE)
            needs_backup = false;           // an FPI is explicitly suppressed
        else if (!doPageWrites)
            needs_backup = false;           // doPageWrites is false
        else                                // doPageWrites is true
        {
            /*
             * We assume page LSN is first data on *every* page that can be
             * passed to XLogInsert, whether it has the standard page layout
             * or not.
             */
            XLogRecPtr  page_lsn = PageGetLSN(regbuf->page);   // read the page LSN

            needs_backup = (page_lsn <= RedoRecPtr);    // page not modified since the last redo point
            if (!needs_backup)
            {
                // track the lowest LSN among pages that did not get an FPI
                if (*fpw_lsn == InvalidXLogRecPtr || page_lsn < *fpw_lsn)
                    *fpw_lsn = page_lsn;
            }
        }

        /* Determine if the buffer data needs to included */
        if (regbuf->rdata_len == 0)         // no data registered
            needs_data = false;
        else if ((regbuf->flags & REGBUF_KEEP_DATA) != 0)   // data is kept even with an FPI
            needs_data = true;
        else
            needs_data = !needs_backup;     // data is needed only when no FPI is taken

        // fill in the block header
        bkpb.id = block_id;                 // block ID
        bkpb.fork_flags = regbuf->forkno;   // fork number
        bkpb.data_length = 0;               // data length
        if ((regbuf->flags & REGBUF_WILL_INIT) == REGBUF_WILL_INIT)
            bkpb.fork_flags |= BKPBLOCK_WILL_INIT;  // set the flag
        /*
         * If needs_backup is true or WAL checking is enabled for current
         * resource manager, log a full-page write for the current block.
         */
        include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0;

        if (include_image)
        {
            // include the block image
            Page        page = regbuf->page;    // the page to back up
            uint16      compressed_len = 0;     // size after compression

            /*
             * The page needs to be backed up, so calculate its hole length
             * and offset.
             */
            if (regbuf->flags & REGBUF_STANDARD)
            {
                /* Assume we can omit data between pd_lower and pd_upper */
                uint16      lower = ((PageHeader) page)->pd_lower;
                uint16      upper = ((PageHeader) page)->pd_upper;

                // lower is past the page header, upper > lower, and upper <= block size
                if (lower >= SizeOfPageHeaderData &&
                    upper > lower &&
                    upper <= BLCKSZ)
                {
                    bimg.hole_offset = lower;
                    cbimg.hole_length = upper - lower;
                }
                else
                {
                    /* No "hole" to remove */
                    bimg.hole_offset = 0;
                    cbimg.hole_length = 0;
                }
            }
            else
            {
                /* Not a standard page header, don't try to eliminate "hole" */
                bimg.hole_offset = 0;
                cbimg.hole_length = 0;
            }

            /*
             * Try to compress a block image if wal_compression is enabled
             */
            if (wal_compression)
            {
                is_compressed =
                    XLogCompressBackupBlock(page, bimg.hole_offset,
                                            cbimg.hole_length,
                                            regbuf->compressed_page,
                                            &compressed_len);
            }

            /*
             * Fill in the remaining fields in the XLogRecordBlockHeader
             * struct
             */
            bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;

            /*
             * Construct XLogRecData entries for the page content.
             */
            rdt_datas_last->next = &regbuf->bkp_rdatas[0];
            rdt_datas_last = rdt_datas_last->next;

            bimg.bimg_info = (cbimg.hole_length == 0) ? 0 : BKPIMAGE_HAS_HOLE;

            /*
             * If WAL consistency checking is enabled for the resource manager
             * of this WAL record, a full-page image is included in the record
             * for the block modified. During redo, the full-page is replayed
             * only if BKPIMAGE_APPLY is set.
             */
            if (needs_backup)
                bimg.bimg_info |= BKPIMAGE_APPLY;

            if (is_compressed)
            {
                bimg.length = compressed_len;               // size after compression
                bimg.bimg_info |= BKPIMAGE_IS_COMPRESSED;   // mark as compressed

                rdt_datas_last->data = regbuf->compressed_page; // buffer inside registered_buffer
                rdt_datas_last->len = compressed_len;
            }
            else
            {
                // not compressed: image size is the block size minus the hole
                bimg.length = BLCKSZ - cbimg.hole_length;

                if (cbimg.hole_length == 0)
                {
                    rdt_datas_last->data = page;    // point directly at the page
                    rdt_datas_last->len = BLCKSZ;   // the whole block
                }
                else
                {
                    /* must skip the hole */
                    rdt_datas_last->data = page;                // first part: start of page
                    rdt_datas_last->len = bimg.hole_offset;     // up to the start of the hole

                    rdt_datas_last->next = &regbuf->bkp_rdatas[1];  // second part
                    rdt_datas_last = rdt_datas_last->next;

                    rdt_datas_last->data =
                        page + (bimg.hole_offset + cbimg.hole_length);  // first byte after the hole
                    rdt_datas_last->len =
                        BLCKSZ - (bimg.hole_offset + cbimg.hole_length);    // rest of the page
                }
            }

            total_len += bimg.length;       // add the image to the total length
        }
        if (needs_data)                     // the block data must be included
        {
            /*
             * Link the caller-supplied rdata chain for this buffer to the
             * overall list.
             */
            bkpb.fork_flags |= BKPBLOCK_HAS_DATA;
            bkpb.data_length = regbuf->rdata_len;
            total_len += regbuf->rdata_len;

            rdt_datas_last->next = regbuf->rdata_head;
            rdt_datas_last = regbuf->rdata_tail;
        }

        // a previous regbuf exists and refers to the same RelFileNode
        // (same tablespace, database, and relation)
        if (prev_regbuf && RelFileNodeEquals(regbuf->rnode, prev_regbuf->rnode))
        {
            samerel = true;
            bkpb.fork_flags |= BKPBLOCK_SAME_REL;
        }
        else
            samerel = false;
        prev_regbuf = regbuf;               // the current regbuf becomes the previous one

        /* Ok, copy the header to the scratch buffer */
        memcpy(scratch, &bkpb, SizeOfXLogRecordBlockHeader);
        scratch += SizeOfXLogRecordBlockHeader;
        if (include_image)
        {
            // an FPI is included: append the block image header
            memcpy(scratch, &bimg, SizeOfXLogRecordBlockImageHeader);
            scratch += SizeOfXLogRecordBlockImageHeader;
            if (cbimg.hole_length != 0 && is_compressed)
            {
                // compressed image with a hole: append the compress header
                memcpy(scratch, &cbimg,
                       SizeOfXLogRecordBlockCompressHeader);
                scratch += SizeOfXLogRecordBlockCompressHeader;
            }
        }
        if (!samerel)
        {
            // not the same relation as the previous block: append the RelFileNode
            memcpy(scratch, &regbuf->rnode, sizeof(RelFileNode));
            scratch += sizeof(RelFileNode);
        }
        // followed by the BlockNumber
        memcpy(scratch, &regbuf->block, sizeof(BlockNumber));
        scratch += sizeof(BlockNumber);
    }
    /* followed by the record's origin, if any */
    if ((curinsert_flags & XLOG_INCLUDE_ORIGIN) &&
        replorigin_session_origin != InvalidRepOriginId)
    {
        *(scratch++) = (char) XLR_BLOCK_ID_ORIGIN;
        memcpy(scratch, &replorigin_session_origin, sizeof(replorigin_session_origin));
        scratch += sizeof(replorigin_session_origin);
    }

    /* followed by main data, if any */
    if (mainrdata_len > 0)
    {
        if (mainrdata_len > 255)            // longer than 255 bytes: use the long format
        {
            *(scratch++) = (char) XLR_BLOCK_ID_DATA_LONG;
            memcpy(scratch, &mainrdata_len, sizeof(uint32));
            scratch += sizeof(uint32);
        }
        else                                // otherwise use the short format
        {
            *(scratch++) = (char) XLR_BLOCK_ID_DATA_SHORT;
            *(scratch++) = (uint8) mainrdata_len;
        }
        rdt_datas_last->next = mainrdata_head;
        rdt_datas_last = mainrdata_last;
        total_len += mainrdata_len;
    }
    rdt_datas_last->next = NULL;

    hdr_rdt.len = (scratch - hdr_scratch);  // size of all headers
    total_len += hdr_rdt.len;               // total record size

    /*
     * Calculate CRC of the data
     *
     * Note that the record header isn't added into the CRC initially since we
     * don't know the prev-link yet. Thus, the CRC will represent the CRC of
     * the whole record in the order: rdata, then backup blocks, then record
     * header.
     */
    INIT_CRC32C(rdata_crc);
    COMP_CRC32C(rdata_crc, hdr_scratch + SizeOfXLogRecord, hdr_rdt.len - SizeOfXLogRecord);
    for (rdt = hdr_rdt.next; rdt != NULL; rdt = rdt->next)
        COMP_CRC32C(rdata_crc, rdt->data, rdt->len);

    /*
     * Fill in the fields in the record header. Prev-link is filled in later,
     * once we know where in the WAL the record will be inserted. The CRC does
     * not include the record header yet.
     */
    rechdr->xl_xid = GetCurrentTransactionIdIfAny();
    rechdr->xl_tot_len = total_len;
    rechdr->xl_info = info;
    rechdr->xl_rmid = rmid;
    rechdr->xl_prev = InvalidXLogRecPtr;
    rechdr->xl_crc = rdata_crc;

    return &hdr_rdt;
}
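The hole handling in the full-page-image branch is easy to get lost in, so here is a minimal sketch of just that calculation, detached from the PostgreSQL types. It assumes the default 8192-byte block size and a 24-byte page header; both constants, the SketchHole struct and the sketch_ functions are hypothetical stand-ins rather than the real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLCKSZ            8192   /* assumed default block size */
#define SKETCH_PAGE_HEADER_SIZE    24   /* assumed SizeOfPageHeaderData */

typedef struct SketchHole
{
    uint16_t    offset;     /* where the hole starts (pd_lower) */
    uint16_t    length;     /* number of bytes skipped */
} SketchHole;

/*
 * Mirror of the hole test in the listing above: only a standard-layout
 * page with sane pd_lower/pd_upper values gets a hole removed.
 */
static SketchHole
sketch_compute_hole(uint16_t lower, uint16_t upper, bool standard_layout)
{
    SketchHole  hole = {0, 0};

    if (standard_layout &&
        lower >= SKETCH_PAGE_HEADER_SIZE &&
        upper > lower &&
        upper <= SKETCH_BLCKSZ)
    {
        hole.offset = lower;
        hole.length = (uint16_t) (upper - lower);
    }
    return hole;
}

/* Bytes of an uncompressed image once the hole has been removed. */
static uint32_t
sketch_image_len(SketchHole hole)
{
    assert(hole.length <= SKETCH_BLCKSZ);
    return SKETCH_BLCKSZ - hole.length;
}
```

For example, with the illustrative values pd_lower = 252 and pd_upper = 2728, the hole is 2476 bytes long and the image shrinks to 5716 bytes, split into a 252-byte piece before the hole and a 5464-byte piece after it. Those two pieces are what bkp_rdatas[0] and bkp_rdatas[1] point at.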
3. Trace Analysis
Scenario 1: the first insert after clearing the table and running a checkpoint
The test script is as follows:
testdb=# truncate table t_wal_partition;
TRUNCATE TABLE
testdb=# checkpoint;
CHECKPOINT
testdb=# insert into t_wal_partition(c1,c2,c3) VALUES(1,'checkpoint','checkpoint');
Set a breakpoint and step into XLogRecordAssemble
(gdb) b XLogRecordAssemble
Breakpoint 1 at 0x565411: file xloginsert.c, line 488.
(gdb) c
Continuing.
Breakpoint 1, XLogRecordAssemble (rmid=10 '\n', info=128 '\200', RedoRecPtr=5507633240, doPageWrites=true,
fpw_lsn=0x7fff05cfe378) at xloginsert.c:488
488 uint32 total_len = 0;
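Input parameters:
rmid=10, i.e. 0x0A --> Heap
RedoRecPtr=5507633240
doPageWrites=true, i.e. full-page writes are enabled (whether an image is actually taken is decided per block)
fpw_lsn=0x7fff05cfe378
Next come the variable assignments.
hdr_scratch is defined as: static char *hdr_scratch = NULL;
hdr_rdt is defined as: static XLogRecData hdr_rdt;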
(gdb) n
491 registered_buffer *prev_regbuf = NULL;
(gdb)
494 char *scratch = hdr_scratch;
(gdb)
502 rechdr = (XLogRecord *) scratch;
(gdb)
503 scratch += SizeOfXLogRecord;
(gdb)
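The XLOG record header (at this point it still holds values left over from the previous record; the fields are overwritten at the end of the function)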
(gdb) p *(XLogRecord *)rechdr
$11 = {xl_tot_len = 114, xl_xid = 1997, xl_prev = 5507669824, xl_info = 128 '\200', xl_rmid = 1 '\001', xl_crc = 3794462175}
scratch now points just past the fixed-size header, while hdr_scratch still points to the start of the buffer
(gdb) p hdr_scratch
$12 = 0x18a24c0 "r"
Assign values to the global variable hdr_rdt
505 hdr_rdt.next = NULL;
(gdb)
506 rdt_datas_last = &hdr_rdt;
(gdb)
507 hdr_rdt.data = hdr_scratch;
(gdb) p hdr_rdt
$5 = {next = 0x0, data = 0x18a24c0 "r", len = 26}
(gdb) p *(XLogRecord *)hdr_rdt.data
$7 = {xl_tot_len = 114, xl_xid = 1997, xl_prev = 5507669824, xl_info = 128 '\200', xl_rmid = 1 '\001', xl_crc = 3794462175}
Consistency checking is not enabled, so XLR_CHECK_CONSISTENCY is not set
(gdb) n
515 if (wal_consistency_checking[rmid])
(gdb)
523 *fpw_lsn = InvalidXLogRecPtr;
(gdb)
Initialize fpw_lsn and start the loop.
Only one block id is registered.
(gdb) n
524 for (block_id = 0; block_id < max_registered_block_id; block_id++)
(gdb) p max_registered_block_id
$13 = 1
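Fetch the registered buffer.
Where:
rnode->RelFileNode struct, spcNode->tablespace/dbNode->database/relNode->relation
block->block number
page->pointer to the data page (char *)
rdata_len->total size of the data in the rdata chain
rdata_head->head of the chain of data registered with this block
rdata_tail->tail of the chain of data registered with this block
bkp_rdatas->temporary rdatas holding references to the backup block data used in XLogRecordAssemble().
bkp_rdatas is used to assemble the block image: bkp_rdatas[0] holds the data before the free space (hole), and bkp_rdatas[1] holds the data after it.
(gdb) n
526 registered_buffer *regbuf = &registered_buffers[block_id];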
(gdb)
531 XLogRecordBlockCompressHeader cbimg = {0};
(gdb) p *regbuf
$14 = {in_use = true, flags = 14 '\016', rnode = {spcNode = 1663, dbNode = 16402, relNode = 25258}, forkno = MAIN_FORKNUM,
block = 0, page = 0x7fb8539e7380 "", rdata_len = 32, rdata_head = 0x18a22c0, rdata_tail = 0x18a22d8, bkp_rdatas = {{
next = 0x18a4230, data = 0x7fb85390f380 "\001", len = 252}, {next = 0x18a22a8, data = 0x7fb85390fe28 "\315\a",
len = 5464}}, compressed_page = '\000' <repeats 8195 times>}
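Note:
In memory, the main data has already been registered by XLogRegisterData and is tracked through the mainrdata_head and mainrdata_last pointers. In this example it holds an xl_heap_insert struct.
The block data is initialized by XLogRegisterBuffer and filled in through XLogRegisterBufData. In this example XLogRegisterBufData was called twice: first for the xl_heap_header struct, then for the actual tuple data (in fact only pointers to the data are registered; which data ends up in the record is decided by the assembler).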
(gdb) p *mainrdata_head
$18 = {next = 0x18a22c0, data = 0x7fff05cfe3f0 "\001", len = 3}
(gdb) p *(xl_heap_insert *)mainrdata_head->data
$20 = {offnum = 1, flags = 0 '\000'}
(gdb) p *regbuf->rdata_head
$32 = {next = 0x18a22d8, data = 0x7fff05cfe3e0 "\003", len = 5}
(gdb) p *(xl_heap_header *)regbuf->rdata_head->data
$28 = {t_infomask2 = 3, t_infomask = 2050, t_hoff = 24 '\030'}
(gdb) p *regbuf->rdata_head->next
$34 = {next = 0x18a22f0, data = 0x18edaef "", len = 27}
Display the 27 bytes starting at address 0x18edaef in character format (the tuple data)
(gdb) x/27bc 0x18edaef
0x18edaef: 0 '\000' 1 '\001' 0 '\000' 0 '\000' 0 '\000' 23 '\027' 99 'c' 104 'h'
0x18edaf7: 101 'e' 99 'c' 107 'k' 112 'p' 111 'o' 105 'i' 110 'n' 116 't'
0x18edaff: 23 '\027' 99 'c' 104 'h' 101 'e' 99 'c' 107 'k' 112 'p' 111 'o'
0x18edb07: 105 'i' 110 'n' 116 't'
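Continue stepping. regbuf->flags is 14 (0x0E), i.e. REGBUF_WILL_INIT | REGBUF_STANDARD. Because REGBUF_WILL_INIT includes the REGBUF_NO_IMAGE bit, no full-page image is taken: this is the first tuple on the page, and the page will be re-initialized at replay.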
(gdb) n
533 bool is_compressed = false;
(gdb)
536 if (!regbuf->in_use)
(gdb)
540 if (regbuf->flags & REGBUF_FORCE_IMAGE)
(gdb) p regbuf->flags
$36 = 14 '\016'
(gdb) n
542 else if (regbuf->flags & REGBUF_NO_IMAGE)
(gdb)
543 needs_backup = false;
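The decision traced above can be replayed outside the server. The following is a minimal sketch of the needs_backup logic only, with the flag values copied from the REGBUF_* macros listed earlier; the SKETCH_ prefix and sketch_needs_backup() are hypothetical names, and LSNs are modelled as plain 64-bit integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as the REGBUF_* macros in the article. */
#define SKETCH_REGBUF_FORCE_IMAGE 0x01
#define SKETCH_REGBUF_NO_IMAGE    0x02
#define SKETCH_REGBUF_WILL_INIT   (0x04 | 0x02)
#define SKETCH_REGBUF_STANDARD    0x08

/*
 * Decide whether a block needs a full-page image, in the same order as
 * XLogRecordAssemble: forced image, suppressed image, full-page writes
 * disabled, and finally the page-LSN versus redo-pointer comparison.
 */
static bool
sketch_needs_backup(uint8_t flags, bool do_page_writes,
                    uint64_t page_lsn, uint64_t redo_rec_ptr)
{
    /* FORCE and NO_IMAGE are mutually exclusive requests. */
    assert(!((flags & SKETCH_REGBUF_FORCE_IMAGE) &&
             (flags & SKETCH_REGBUF_NO_IMAGE)));

    if (flags & SKETCH_REGBUF_FORCE_IMAGE)
        return true;
    if (flags & SKETCH_REGBUF_NO_IMAGE)
        return false;
    if (!do_page_writes)
        return false;
    /* Page untouched since the last redo point: back it up. */
    return page_lsn <= redo_rec_ptr;
}
```

With flags = 14 the function returns false at the NO_IMAGE test, before doPageWrites or the page LSN are consulted, which matches the trace stepping from line 542 to line 543.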
needs_data is true: only the tuple data is written to the WAL record
(gdb) n
564 if (regbuf->rdata_len == 0)
(gdb) p regbuf->rdata_len
$37 = 32
(gdb) n
566 else if ((regbuf->flags & REGBUF_KEEP_DATA) != 0)
(gdb)
569 needs_data = !needs_backup;
(gdb)
571 bkpb.id = block_id;
(gdb) p needs_data
$38 = true
Set the XLogRecordBlockHeader fields. This is the first tuple on the page, so the BKPBLOCK_WILL_INIT flag is set
(gdb) n
572 bkpb.fork_flags = regbuf->forkno;
(gdb)
573 bkpb.data_length = 0;
(gdb)
575 if ((regbuf->flags & REGBUF_WILL_INIT) == REGBUF_WILL_INIT)
(gdb) n
576 bkpb.fork_flags |= BKPBLOCK_WILL_INIT;
(gdb)
582 include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0;
(gdb) p bkpb
$40 = {id = 0 '\000', fork_flags = 64 '@', data_length = 0}
(gdb)
No FPI is needed
(gdb) p info
$41 = 128 '\200'
(gdb) n
584 if (include_image)
(gdb) p include_image
$42 = false
(gdb)
The block data needs to be included
(gdb) n
691 if (needs_data)
(gdb)
697 bkpb.fork_flags |= BKPBLOCK_HAS_DATA;
(gdb)
698 bkpb.data_length = regbuf->rdata_len;
(gdb)
699 total_len += regbuf->rdata_len;
(gdb)
701 rdt_datas_last->next = regbuf->rdata_head;
(gdb)
702 rdt_datas_last = regbuf->rdata_tail;
(gdb) p bkpb
$43 = {id = 0 '\000', fork_flags = 96 '`', data_length = 32}
(gdb) p total_len
$44 = 32
(gdb) p *rdt_datas_last
$45 = {next = 0x18a22c0, data = 0x18a24c0 "r", len = 26}
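The two fork_flags values printed in the trace (64, then 96) pack the fork number into the low four bits and the BKPBLOCK_* flags into the high four. The sketch below decodes such a byte. The bit values are an assumption taken from xlogrecord.h of that PostgreSQL generation and do not appear in this article; the SKETCH_ names, the SketchForkFlags struct and sketch_decode_fork_flags() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of XLogRecordBlockHeader.fork_flags. */
#define SKETCH_BKPBLOCK_FORK_MASK 0x0F
#define SKETCH_BKPBLOCK_HAS_IMAGE 0x10
#define SKETCH_BKPBLOCK_HAS_DATA  0x20
#define SKETCH_BKPBLOCK_WILL_INIT 0x40
#define SKETCH_BKPBLOCK_SAME_REL  0x80

typedef struct SketchForkFlags
{
    uint8_t     fork;       /* fork number (0 is the main fork) */
    bool        has_image;  /* block carries a full-page image */
    bool        has_data;   /* block carries registered data */
    bool        will_init;  /* redo re-initializes the page */
    bool        same_rel;   /* RelFileNode omitted, same as previous */
} SketchForkFlags;

/* Split a fork_flags byte into its fork number and flag bits. */
static SketchForkFlags
sketch_decode_fork_flags(uint8_t fork_flags)
{
    SketchForkFlags out;

    out.fork = fork_flags & SKETCH_BKPBLOCK_FORK_MASK;
    out.has_image = (fork_flags & SKETCH_BKPBLOCK_HAS_IMAGE) != 0;
    out.has_data = (fork_flags & SKETCH_BKPBLOCK_HAS_DATA) != 0;
    out.will_init = (fork_flags & SKETCH_BKPBLOCK_WILL_INIT) != 0;
    out.same_rel = (fork_flags & SKETCH_BKPBLOCK_SAME_REL) != 0;
    assert(out.fork <= SKETCH_BKPBLOCK_FORK_MASK);
    return out;
}
```

Under this layout, 64 (0x40) is WILL_INIT alone on the main fork, and 96 (0x60) adds HAS_DATA, which is consistent with the trace: no image, 32 bytes of block data, page re-initialized at replay.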
OK, copy the header into the scratch buffer
(gdb) n
705 if (prev_regbuf && RelFileNodeEquals(regbuf->rnode, prev_regbuf->rnode))
(gdb) p prev_regbuf
$46 = (registered_buffer *) 0x0
(gdb) n
711 samerel = false;
(gdb)
712 prev_regbuf = regbuf;
(gdb)
715 memcpy(scratch, &bkpb, SizeOfXLogRecordBlockHeader);
Followed by the RelFileNode + BlockNumber
(gdb)
716 scratch += SizeOfXLogRecordBlockHeader;
(gdb)
717 if (include_image)
(gdb)
728 if (!samerel)
(gdb)
730 memcpy(scratch, &regbuf->rnode, sizeof(RelFileNode));
(gdb)
731 scratch += sizeof(RelFileNode);
(gdb)
733 memcpy(scratch, &regbuf->block, sizeof(BlockNumber));
(gdb)
734 scratch += sizeof(BlockNumber);
(gdb)
524 for (block_id = 0; block_id < max_registered_block_id; block_id++)
End of the loop
524 for (block_id = 0; block_id < max_registered_block_id; block_id++)
(gdb)
Next is replorigin_session_origin (not needed here: the body of the if is skipped, so no origin is written)
738 if ((curinsert_flags & XLOG_INCLUDE_ORIGIN) &&
(gdb) p curinsert_flags
$47 = 1 '\001'
(gdb)
$48 = 1 '\001'
(gdb) n
739 replorigin_session_origin != InvalidRepOriginId)
(gdb)
738 if ((curinsert_flags & XLOG_INCLUDE_ORIGIN) &&
Next is the main data
(gdb)
747 if (mainrdata_len > 0)
(gdb)
749 if (mainrdata_len > 255)
(gdb)
757 *(scratch++) = (char) XLR_BLOCK_ID_DATA_SHORT;
(gdb)
(gdb)
758 *(scratch++) = (uint8) mainrdata_len;
(gdb)
760 rdt_datas_last->next = mainrdata_head;
(gdb)
761 rdt_datas_last = mainrdata_last;
(gdb)
762 total_len += mainrdata_len;
(gdb)
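The main data in this trace is 3 bytes long, so the short header format is chosen. The following sketch isolates that encoding step. The marker values 255 and 254 are an assumption taken from xlogrecord.h (XLR_BLOCK_ID_DATA_SHORT and XLR_BLOCK_ID_DATA_LONG) and are not shown in this article; sketch_put_main_data_header() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed marker bytes for the main data header. */
#define SKETCH_XLR_BLOCK_ID_DATA_SHORT 255
#define SKETCH_XLR_BLOCK_ID_DATA_LONG  254

/*
 * Write the main data header at 'scratch' and return the number of
 * bytes written: 0 when there is no main data, 2 for the short format
 * (marker + 1-byte length), 5 for the long format (marker + 4-byte
 * length in native byte order).
 */
static size_t
sketch_put_main_data_header(unsigned char *scratch, uint32_t mainrdata_len)
{
    unsigned char *p = scratch;

    assert(scratch != NULL);
    if (mainrdata_len == 0)
        return 0;

    if (mainrdata_len > 255)
    {
        *(p++) = SKETCH_XLR_BLOCK_ID_DATA_LONG;
        memcpy(p, &mainrdata_len, sizeof(uint32_t));
        p += sizeof(uint32_t);
    }
    else
    {
        *(p++) = SKETCH_XLR_BLOCK_ID_DATA_SHORT;
        *(p++) = (uint8_t) mainrdata_len;
    }
    return (size_t) (p - scratch);
}
```

For the 3-byte xl_heap_insert payload this writes two bytes, the marker and the length. The long form costs three extra bytes, which is why it is reserved for payloads over 255 bytes.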
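Compute the sizes. Note that hdr_rdt.len is printed before line 766 has executed, so it still shows the stale value 26; the new value is scratch - hdr_scratch = 0x18a24ee - 0x18a24c0 = 46. Likewise total_len = 35 is the block data (32) plus the main data (3), before the header size is added.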
764 rdt_datas_last->next = NULL;
(gdb)
766 hdr_rdt.len = (scratch - hdr_scratch);
(gdb) p scratch
$49 = 0x18a24ee ""
(gdb) p hdr_scratch
$50 = 0x18a24c0 "r"
(gdb) p hdr_rdt.len
$51 = 26
(gdb) p total_len
$52 = 35
(gdb)
(gdb) n
767 total_len += hdr_rdt.len;
(gdb)
Compute the CRC
(gdb)
777 INIT_CRC32C(rdata_crc);
(gdb)
778 COMP_CRC32C(rdata_crc, hdr_scratch + SizeOfXLogRecord, hdr_rdt.len - SizeOfXLogRecord);
(gdb)
779 for (rdt = hdr_rdt.next; rdt != NULL; rdt = rdt->next)
(gdb) n
780 COMP_CRC32C(rdata_crc, rdt->data, rdt->len);
(gdb)
779 for (rdt = hdr_rdt.next; rdt != NULL; rdt = rdt->next)
(gdb)
780 COMP_CRC32C(rdata_crc, rdt->data, rdt->len);
(gdb)
779 for (rdt = hdr_rdt.next; rdt != NULL; rdt = rdt->next)
(gdb)
780 COMP_CRC32C(rdata_crc, rdt->data, rdt->len);
(gdb)
779 for (rdt = hdr_rdt.next; rdt != NULL; rdt = rdt->next)
(gdb)
787 rechdr->xl_xid = GetCurrentTransactionIdIfAny();
Fill in the remaining fields of the record header.
(gdb) n
788 rechdr->xl_tot_len = total_len;
(gdb)
789 rechdr->xl_info = info;
(gdb)
790 rechdr->xl_rmid = rmid;
(gdb)
791 rechdr->xl_prev = InvalidXLogRecPtr;
(gdb)
792 rechdr->xl_crc = rdata_crc;
(gdb)
794 return &hdr_rdt;
(gdb)
795 }
(gdb) p rechdr
$62 = (XLogRecord *) 0x18a24c0
(gdb) p *rechdr
$63 = {xl_tot_len = 81, xl_xid = 1998, xl_prev = 0, xl_info = 128 '\200', xl_rmid = 10 '\n', xl_crc = 1852971194}
(gdb)
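The final xl_tot_len = 81 can be reconstructed from the pieces seen in the trace: 46 bytes of headers in the scratch buffer, 32 bytes of block data and 3 bytes of main data. The sketch below redoes that arithmetic. The individual header sizes (24, 4, 12, 4, 2) are assumptions based on a typical PostgreSQL 11-era build, chosen because they add up to the 46 bytes observed as scratch - hdr_scratch; the SKETCH_ constants and both sketch_ functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed header sizes in bytes (not stated in the article). */
#define SKETCH_SIZE_XLOG_RECORD       24    /* fixed record header */
#define SKETCH_SIZE_BLOCK_HEADER       4    /* id, fork_flags, data_length */
#define SKETCH_SIZE_RELFILENODE       12    /* spcNode, dbNode, relNode */
#define SKETCH_SIZE_BLOCK_NUMBER       4
#define SKETCH_SIZE_DATA_HEADER_SHORT  2    /* marker + 1-byte length */

/*
 * Header bytes for a record whose blocks carry no image and each write
 * their own RelFileNode, with optional main data of at most 255 bytes.
 */
static uint32_t
sketch_header_len(uint32_t nblocks, bool has_main_data)
{
    uint32_t    len = SKETCH_SIZE_XLOG_RECORD;

    len += nblocks * (SKETCH_SIZE_BLOCK_HEADER +
                      SKETCH_SIZE_RELFILENODE +
                      SKETCH_SIZE_BLOCK_NUMBER);
    if (has_main_data)
        len += SKETCH_SIZE_DATA_HEADER_SHORT;
    return len;
}

/* Total record length: headers plus block data plus main data. */
static uint32_t
sketch_total_len(uint32_t header_len, uint32_t block_data_len,
                 uint32_t main_data_len)
{
    assert(header_len >= SKETCH_SIZE_XLOG_RECORD);
    return header_len + block_data_len + main_data_len;
}
```

With one block and short main data the header comes to 24 + 20 + 2 = 46 bytes, and 46 + 32 + 3 = 81, matching the xl_tot_len printed by gdb.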
The full-page-write scenario will be analyzed in a later installment.
Source: "ITPUB Blog", link: http://blog.itpub.net/6906/viewspace-2374772/. Please credit the source when reposting; otherwise legal action may be taken.