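PostgreSQL Source Code Walkthrough (109) - WAL#5 (Related Data Structures)
This section briefly introduces the WAL-related data structures, including XLogLongPageHeaderData, XLogPageHeaderData and XLogRecord.
1. Data Structures
XLogPageHeaderData
Every page (8 KB by default) of a transaction log file (WAL segment file) begins with a page header.
Note: the first page of each file carries an XLogLongPageHeaderData header (described below) rather than an XLogPageHeaderData.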
/*
 * Each page of XLOG file has a header like this:
 */
#define XLOG_PAGE_MAGIC 0xD098  /* can be used as WAL version indicator */
typedef struct XLogPageHeaderData
{
    // WAL version magic; 0xD098 in PG 11.1
    uint16      xlp_magic;      /* magic value for correctness checks */
    // flag bits (the XLP_* flags are listed after XLogLongPageHeaderData)
    uint16      xlp_info;       /* flag bits, see below */
    // TimeLineID is a uint32
    TimeLineID  xlp_tli;        /* TimeLineID of first record on page */
    // address of the page within the transaction log; XLogRecPtr is a uint64
    XLogRecPtr  xlp_pageaddr;   /* XLOG address of this page */
    /*
     * When there is not enough space on current page for whole record, we
     * continue on the next page. xlp_rem_len is the number of bytes
     * remaining from a previous page.
     *
     * Note that xl_rem_len includes backup-block data; that is, it tracks
     * xl_tot_len not xl_len in the initial header. Also note that the
     * continuation data isn't necessarily aligned.
     */
    // backup-block data here means full-page images (full-page writes);
    // xlp_rem_len counts the bytes on this page that continue a record
    // which did not fit on the previous page
    uint32      xlp_rem_len;    /* total len of remaining data for record */
} XLogPageHeaderData;
#define SizeOfXLogShortPHD MAXALIGN(sizeof(XLogPageHeaderData))
typedef XLogPageHeaderData *XLogPageHeader;
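As a rough check of the layout above, the following sketch mirrors the struct with fixed-width stand-in types and computes the short header size. The stand-in typedefs, the 8-byte MAXIMUM_ALIGNOF value, this local MAXALIGN definition and the helper usable_bytes_after_short_header are assumptions made for illustration; they are not copied from the PostgreSQL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in typedefs (assumption): widths follow the annotations above. */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

typedef struct XLogPageHeaderData
{
    uint16_t    xlp_magic;
    uint16_t    xlp_info;
    TimeLineID  xlp_tli;
    XLogRecPtr  xlp_pageaddr;
    uint32_t    xlp_rem_len;
} XLogPageHeaderData;

/* Assumption: maximum alignment is 8 bytes, as on common 64-bit platforms. */
#define MAXIMUM_ALIGNOF 8
#define MAXALIGN(LEN) \
    (((size_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & ~((size_t) (MAXIMUM_ALIGNOF - 1)))
#define SizeOfXLogShortPHD MAXALIGN(sizeof(XLogPageHeaderData))

/* Hypothetical helper: bytes left for record data on a page with a short header. */
static size_t
usable_bytes_after_short_header(size_t xlog_blcksz)
{
    assert(xlog_blcksz >= SizeOfXLogShortPHD);
    return xlog_blcksz - SizeOfXLogShortPHD;
}
```

Under these assumptions the five fields occupy 20 bytes, which MAXALIGN rounds up to 24, so an 8 KB page with a short header has 8168 bytes left for record data.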
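XLogLongPageHeaderData
When the XLP_LONG_HEADER flag is set, the page header stores additional fields.
(This is ordinarily done only in the first page of each transaction log file, i.e. each segment file.)
The additional fields serve to identify the file accurately.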
/*
 * When the XLP_LONG_HEADER flag is set, we store additional fields in the
 * page header. (This is ordinarily done just in the first page of an
 * XLOG file.) The additional fields serve to identify the file accurately.
 */
typedef struct XLogLongPageHeaderData
{
    XLogPageHeaderData std;     /* standard header fields */
    uint64      xlp_sysid;      /* system identifier from pg_control */
    // WAL segment size, stored only as a cross-check
    uint32      xlp_seg_size;   /* just as a cross-check */
    // WAL block size, stored only as a cross-check
    uint32      xlp_xlog_blcksz;    /* just as a cross-check */
} XLogLongPageHeaderData;
#define SizeOfXLogLongPHD MAXALIGN(sizeof(XLogLongPageHeaderData))
typedef XLogLongPageHeaderData *XLogLongPageHeader;
/* When record crosses page boundary, set this flag in new page's header */
#define XLP_FIRST_IS_CONTRECORD 0x0001
/* This flag indicates a "long" page header */
#define XLP_LONG_HEADER 0x0002
/* This flag indicates backup blocks starting in this page are optional */
#define XLP_BKP_REMOVABLE 0x0004
/* All defined flag bits in xlp_info (used for validity checking of header) */
#define XLP_ALL_FLAGS 0x0007
#define XLogPageHeaderSize(hdr) \
    (((hdr)->xlp_info & XLP_LONG_HEADER) ? SizeOfXLogLongPHD : SizeOfXLogShortPHD)
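To illustrate how the xlp_info flags drive the header size, here is a minimal sketch. The two size constants assume 8-byte MAXALIGN (24 bytes for the short header, 40 for the long one), and the helper names xlp_info_is_valid and page_header_size are hypothetical; only the flag values come from the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XLP_FIRST_IS_CONTRECORD 0x0001
#define XLP_LONG_HEADER         0x0002
#define XLP_BKP_REMOVABLE       0x0004
#define XLP_ALL_FLAGS           0x0007

/* Assumption: header sizes after MAXALIGN with 8-byte maximum alignment. */
#define SIZE_OF_SHORT_PHD 24
#define SIZE_OF_LONG_PHD  40

/* Hypothetical helper: a header is suspect if any undefined flag bit is set. */
static int
xlp_info_is_valid(uint16_t xlp_info)
{
    return (xlp_info & ~XLP_ALL_FLAGS) == 0;
}

/* Hypothetical helper: same decision as XLogPageHeaderSize, taking xlp_info directly. */
static size_t
page_header_size(uint16_t xlp_info)
{
    assert(xlp_info_is_valid(xlp_info));
    return (xlp_info & XLP_LONG_HEADER) ? SIZE_OF_LONG_PHD : SIZE_OF_SHORT_PHD;
}
```

For example, a first page whose record continues from the previous segment would carry XLP_LONG_HEADER | XLP_FIRST_IS_CONTRECORD (0x0003) and use the 40-byte header.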
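XLogRecord
A transaction log file consists of a sequence of XLOG records; the data structure that corresponds to the logical notion of an XLOG record is XLogRecord.
The overall layout of an XLOG record is:
    Fixed-size header (XLogRecord struct)
    XLogRecordBlockHeader struct
    XLogRecordBlockHeader struct
    ...
    XLogRecordDataHeader[Short|Long] struct
    block data
    block data
    ...
    main data
By the content they store, XLOG records fall roughly into three categories:
1. Record for backup block: stores the full-page image of a block (a full-page write); its purpose is to guard against partial (torn) page writes;
2. Record for (tuple) data block: after the full-page write, later changes to tuples in that page are logged with this kind of record;
3. Record for checkpoint: when a checkpoint occurs, checkpoint information (including the redo point) is written to the transaction log.
XLOG records will be examined in detail in later installments, so they are not discussed further here.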
/*
 * The overall layout of an XLOG record is:
 *      Fixed-size header (XLogRecord struct)
 *      XLogRecordBlockHeader struct
 *      XLogRecordBlockHeader struct
 *      ...
 *      XLogRecordDataHeader[Short|Long] struct
 *      block data
 *      block data
 *      ...
 *      main data
 *
 * There can be zero or more XLogRecordBlockHeaders, and 0 or more bytes of
 * rmgr-specific data not associated with a block. XLogRecord structs
 * always start on MAXALIGN boundaries in the WAL files, but the rest of
 * the fields are not aligned.
 *
 * The XLogRecordBlockHeader, XLogRecordDataHeaderShort and
 * XLogRecordDataHeaderLong structs all begin with a single 'id' byte. It's
 * used to distinguish between block references, and the main data structs.
 */
typedef struct XLogRecord
{
    uint32      xl_tot_len;     /* total len of entire record */
    TransactionId xl_xid;       /* xact id */
    XLogRecPtr  xl_prev;        /* ptr to previous record in log */
    uint8       xl_info;        /* flag bits, see below */
    RmgrId      xl_rmid;        /* resource manager for this record */
    /* 2 bytes of padding here, initialize to zero */
    // CRC of the record; the two padding bytes before it are initialized to zero
    pg_crc32c   xl_crc;         /* CRC for this record */
    /* XLogRecordBlockHeaders and XLogRecordDataHeader follow, no padding */
} XLogRecord;
// size of the fixed XLogRecord header
#define SizeOfXLogRecord (offsetof(XLogRecord, xl_crc) + sizeof(pg_crc32c))
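The fixed header size can be checked with a small sketch. The typedefs below are stand-ins whose widths are assumptions (32-bit TransactionId and pg_crc32c, 64-bit XLogRecPtr, 8-bit RmgrId), and record_payload_len is a hypothetical helper, not a PostgreSQL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in typedefs (assumption): widths chosen to match PostgreSQL 11. */
typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;
typedef uint8_t  RmgrId;
typedef uint32_t pg_crc32c;

typedef struct XLogRecord
{
    uint32_t      xl_tot_len;   /* offset 0 */
    TransactionId xl_xid;       /* offset 4 */
    XLogRecPtr    xl_prev;      /* offset 8 */
    uint8_t       xl_info;      /* offset 16 */
    RmgrId        xl_rmid;      /* offset 17, then 2 bytes of padding */
    pg_crc32c     xl_crc;       /* offset 20 */
} XLogRecord;

#define SizeOfXLogRecord (offsetof(XLogRecord, xl_crc) + sizeof(pg_crc32c))

/* Hypothetical helper: bytes of a record that follow the fixed header. */
static uint32_t
record_payload_len(uint32_t xl_tot_len)
{
    assert(xl_tot_len >= SizeOfXLogRecord);
    return xl_tot_len - (uint32_t) SizeOfXLogRecord;
}
```

With these widths the header is 24 bytes, so a record whose xl_tot_len is 30 carries 6 bytes of block headers, data headers and data after it.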
/*
 * The high 4 bits in xl_info may be used freely by rmgr. The
 * XLR_SPECIAL_REL_UPDATE and XLR_CHECK_CONSISTENCY bits can be passed by
 * XLogInsert caller. The rest are set internally by XLogInsert.
 */
#define XLR_INFO_MASK 0x0F
#define XLR_RMGR_INFO_MASK 0xF0
/*
 * If a WAL record modifies any relation files, in ways not covered by the
 * usual block references, this flag is set. This is not used for anything
 * by PostgreSQL itself, but it allows external tools that read WAL and keep
 * track of modified blocks to recognize such special record types.
 */
#define XLR_SPECIAL_REL_UPDATE 0x01
/*
 * Enforces consistency checks of replayed WAL at recovery. If enabled,
 * each record will log a full-page write for each block modified by the
 * record and will reuse it afterwards for consistency checks. The caller
 * of XLogInsert can use this value if necessary, but if
 * wal_consistency_checking is enabled for a rmgr this is set unconditionally.
 */
#define XLR_CHECK_CONSISTENCY 0x02
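The split of xl_info into an rmgr-owned high nibble and a low nibble of XLogInsert flags can be sketched as follows. The helper names make_xl_info, rmgr_info_bits and has_check_consistency are hypothetical; only the mask and flag values are taken from the listing.

```c
#include <assert.h>
#include <stdint.h>

#define XLR_INFO_MASK           0x0F
#define XLR_RMGR_INFO_MASK      0xF0
#define XLR_SPECIAL_REL_UPDATE  0x01
#define XLR_CHECK_CONSISTENCY   0x02

/* Hypothetical helper: combine rmgr-specific bits with the low-nibble flags. */
static uint8_t
make_xl_info(uint8_t rmgr_bits, uint8_t flags)
{
    assert((rmgr_bits & XLR_INFO_MASK) == 0);   /* rmgr owns only the high 4 bits */
    assert((flags & XLR_RMGR_INFO_MASK) == 0);  /* flags live in the low 4 bits */
    return (uint8_t) (rmgr_bits | flags);
}

/* Hypothetical helper: the part of xl_info that belongs to the resource manager. */
static uint8_t
rmgr_info_bits(uint8_t xl_info)
{
    return (uint8_t) (xl_info & XLR_RMGR_INFO_MASK);
}

/* Hypothetical helper: was a consistency check requested for this record? */
static int
has_check_consistency(uint8_t xl_info)
{
    return (xl_info & XLR_CHECK_CONSISTENCY) != 0;
}
```

For instance, an rmgr value of 0x30 combined with XLR_CHECK_CONSISTENCY yields 0x32, from which the rmgr recovers 0x30 by masking.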
/*
 * Header info for block data appended to an XLOG record.
 *
 * 'data_length' is the length of the rmgr-specific payload data associated
 * with this block. It does not include the possible full page image, nor
 * XLogRecordBlockHeader struct itself.
 *
 * Note that we don't attempt to align the XLogRecordBlockHeader struct!
 * So, the struct must be copied to aligned local storage before use.
 */
typedef struct XLogRecordBlockHeader
{
    uint8       id;             /* block reference ID */
    uint8       fork_flags;     /* fork within the relation, and flags */
    uint16      data_length;    /* number of payload bytes (not including page
                                 * image) */
    /* If BKPBLOCK_HAS_IMAGE, an XLogRecordBlockImageHeader struct follows */
    /* If BKPBLOCK_SAME_REL is not set, a RelFileNode follows */
    /* BlockNumber follows */
} XLogRecordBlockHeader;
#define SizeOfXLogRecordBlockHeader (offsetof(XLogRecordBlockHeader, data_length) + sizeof(uint16))
/*
 * Additional header information when a full-page image is included
 * (i.e. when BKPBLOCK_HAS_IMAGE is set).
 *
 * The XLOG code is aware that PG data pages usually contain an unused "hole"
 * in the middle, which contains only zero bytes. Since we know that the
 * "hole" is all zeros, we remove it from the stored data (and it's not counted
 * in the XLOG record's CRC, either). Hence, the amount of block data actually
 * present is (BLCKSZ - <length of "hole" bytes>).
 *
 * Additionally, when wal_compression is enabled, we will try to compress full
 * page images using the PGLZ compression algorithm, after removing the "hole".
 * This can reduce the WAL volume, but at some extra cost of CPU spent
 * on the compression during WAL logging. In this case, since the "hole"
 * length cannot be calculated by subtracting the number of page image bytes
 * from BLCKSZ, basically it needs to be stored as an extra information.
 * But when no "hole" exists, we can assume that the "hole" length is zero
 * and no such an extra information needs to be stored. Note that
 * the original version of page image is stored in WAL instead of the
 * compressed one if the number of bytes saved by compression is less than
 * the length of extra information. Hence, when a page image is successfully
 * compressed, the amount of block data actually present is less than
 * BLCKSZ - the length of "hole" bytes - the length of extra information.
 */
typedef struct XLogRecordBlockImageHeader
{
    uint16      length;         /* number of page image bytes */
    uint16      hole_offset;    /* number of bytes before "hole" */
    uint8       bimg_info;      /* flag bits, see below */
    /*
     * If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED, an
     * XLogRecordBlockCompressHeader struct follows.
     */
} XLogRecordBlockImageHeader;
#define SizeOfXLogRecordBlockImageHeader \
    (offsetof(XLogRecordBlockImageHeader, bimg_info) + sizeof(uint8))
/* Information stored in bimg_info */
#define BKPIMAGE_HAS_HOLE 0x01  /* page image has "hole" */
#define BKPIMAGE_IS_COMPRESSED 0x02 /* page image is compressed */
#define BKPIMAGE_APPLY 0x04     /* page image should be restored during
                                 * replay */
/*
 * Extra header information used when page image has "hole" and
 * is compressed.
 */
typedef struct XLogRecordBlockCompressHeader
{
    uint16      hole_length;    /* number of bytes in "hole" */
} XLogRecordBlockCompressHeader;
#define SizeOfXLogRecordBlockCompressHeader \
    sizeof(XLogRecordBlockCompressHeader)
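The hole removal described above can be reversed for an uncompressed image roughly as follows. This is a minimal sketch, not the PostgreSQL implementation: BLCKSZ is assumed to be 8192, and restore_page_image and demo_restore_ok are hypothetical names. For an uncompressed image the hole length is derived as BLCKSZ minus the stored image length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192             /* assumption: default data page size */

/* Hypothetical helper: rebuild a full page from an uncompressed image with its hole removed. */
static void
restore_page_image(uint8_t *page, const uint8_t *image,
                   uint16_t length, uint16_t hole_offset)
{
    size_t hole_length = (size_t) BLCKSZ - length;

    assert(length <= BLCKSZ && hole_offset <= length);
    memcpy(page, image, hole_offset);                   /* bytes before the hole */
    memset(page + hole_offset, 0, hole_length);         /* the hole is all zeros */
    memcpy(page + hole_offset + hole_length,            /* bytes after the hole */
           image + hole_offset, (size_t) length - hole_offset);
}

/* Hypothetical self-check: a 100-byte image whose hole starts at offset 40. */
static int
demo_restore_ok(void)
{
    static uint8_t image[100];
    static uint8_t page[BLCKSZ];

    for (int i = 0; i < 100; i++)
        image[i] = (uint8_t) (i + 1);
    memset(page, 0xFF, sizeof(page));
    restore_page_image(page, image, 100, 40);

    return page[39] == 40               /* last byte before the hole */
        && page[40] == 0                /* first hole byte */
        && page[BLCKSZ - 61] == 0       /* last hole byte */
        && page[BLCKSZ - 60] == 41      /* first byte after the hole */
        && page[BLCKSZ - 1] == 100;     /* last image byte */
}
```

A compressed image would first have to be decompressed, with the hole length read from XLogRecordBlockCompressHeader instead of being derived; that path is left out here.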
/*
 * Maximum size of the header for a block reference. This is used to size a
 * temporary buffer for constructing the header.
 */
#define MaxSizeOfXLogRecordBlockHeader \
    (SizeOfXLogRecordBlockHeader + \
     SizeOfXLogRecordBlockImageHeader + \
     SizeOfXLogRecordBlockCompressHeader + \
     sizeof(RelFileNode) + \
     sizeof(BlockNumber))
/*
 * The fork number fits in the lower 4 bits in the fork_flags field. The upper
 * bits are used for flags.
 */
#define BKPBLOCK_FORK_MASK 0x0F
#define BKPBLOCK_FLAG_MASK 0xF0
#define BKPBLOCK_HAS_IMAGE 0x10 /* block data is an XLogRecordBlockImage */
#define BKPBLOCK_HAS_DATA 0x20
#define BKPBLOCK_WILL_INIT 0x40 /* redo will re-init the page */
#define BKPBLOCK_SAME_REL 0x80  /* RelFileNode omitted, same as previous */
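Putting the flags together, the length of one block reference header can be estimated as in this sketch. The byte sizes for RelFileNode (12, three 4-byte OIDs) and BlockNumber (4) are assumptions about PostgreSQL 11, and block_ref_header_len and fork_number are hypothetical helpers; the remaining sizes follow the SizeOf* macros in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BKPBLOCK_FORK_MASK      0x0F
#define BKPBLOCK_HAS_IMAGE      0x10
#define BKPBLOCK_HAS_DATA       0x20
#define BKPBLOCK_WILL_INIT      0x40
#define BKPBLOCK_SAME_REL       0x80
#define BKPIMAGE_HAS_HOLE       0x01
#define BKPIMAGE_IS_COMPRESSED  0x02

/* Sizes in bytes; RELFILENODE_SZ and BLOCKNUMBER_SZ are assumptions. */
enum
{
    BLOCK_HDR_SZ = 4,       /* id + fork_flags + data_length */
    IMAGE_HDR_SZ = 5,       /* length + hole_offset + bimg_info */
    COMPRESS_HDR_SZ = 2,    /* hole_length */
    RELFILENODE_SZ = 12,
    BLOCKNUMBER_SZ = 4
};

/* Hypothetical helper: header bytes occupied by one block reference. */
static size_t
block_ref_header_len(uint8_t fork_flags, uint8_t bimg_info)
{
    size_t len = BLOCK_HDR_SZ;

    /* bimg_info is only meaningful when a page image is present */
    assert((fork_flags & BKPBLOCK_HAS_IMAGE) || bimg_info == 0);

    if (fork_flags & BKPBLOCK_HAS_IMAGE)
    {
        len += IMAGE_HDR_SZ;
        if ((bimg_info & BKPIMAGE_HAS_HOLE) && (bimg_info & BKPIMAGE_IS_COMPRESSED))
            len += COMPRESS_HDR_SZ;
    }
    if (!(fork_flags & BKPBLOCK_SAME_REL))
        len += RELFILENODE_SZ;
    return len + BLOCKNUMBER_SZ;
}

/* Hypothetical helper: the fork number stored in the low 4 bits. */
static unsigned
fork_number(uint8_t fork_flags)
{
    return fork_flags & BKPBLOCK_FORK_MASK;
}
```

Under these assumptions the largest possible header is 27 bytes (image, hole, compression and RelFileNode all present), while a data-only reference to the same relation as the previous block needs just 8.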
/*
 * XLogRecordDataHeaderShort/Long are used for the "main data" portion of
 * the record. If the length of the data is less than 256 bytes, the short
 * form is used, with a single byte to hold the length. Otherwise the long
 * form is used.
 *
 * (These structs are currently not used in the code, they are here just for
 * documentation purposes).
 */
typedef struct XLogRecordDataHeaderShort
{
    uint8       id;             /* XLR_BLOCK_ID_DATA_SHORT */
    uint8       data_length;    /* number of payload bytes */
} XLogRecordDataHeaderShort;
#define SizeOfXLogRecordDataHeaderShort (sizeof(uint8) * 2)
typedef struct XLogRecordDataHeaderLong
{
    uint8       id;             /* XLR_BLOCK_ID_DATA_LONG */
    /* followed by uint32 data_length, unaligned */
} XLogRecordDataHeaderLong;
#define SizeOfXLogRecordDataHeaderLong (sizeof(uint8) + sizeof(uint32))
/*
 * Block IDs used to distinguish different kinds of record fragments. Block
 * references are numbered from 0 to XLR_MAX_BLOCK_ID. A rmgr is free to use
 * any ID number in that range (although you should stick to small numbers,
 * because the WAL machinery is optimized for that case). A couple of ID
 * numbers are reserved to denote the "main" data portion of the record.
 *
 * The maximum is currently set at 32, quite arbitrarily. Most records only
 * need a handful of block references, but there are a few exceptions that
 * need more.
 */
#define XLR_MAX_BLOCK_ID 32
#define XLR_BLOCK_ID_DATA_SHORT 255
#define XLR_BLOCK_ID_DATA_LONG 254
#define XLR_BLOCK_ID_ORIGIN 253
#endif /* XLOGRECORD_H */
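The role of the leading 'id' byte and the choice between the short and long main-data header can be sketched as follows. The FragmentKind enum and the helpers classify_fragment_id and main_data_header_size are hypothetical names introduced for illustration; the ID values and the 256-byte threshold come from the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XLR_MAX_BLOCK_ID        32
#define XLR_BLOCK_ID_DATA_SHORT 255
#define XLR_BLOCK_ID_DATA_LONG  254
#define XLR_BLOCK_ID_ORIGIN     253

/* Block reference IDs must never collide with the reserved IDs. */
static_assert(XLR_MAX_BLOCK_ID < XLR_BLOCK_ID_ORIGIN, "reserved IDs overlap block IDs");

/* Hypothetical classification of a fragment by its leading id byte. */
typedef enum
{
    FRAG_BLOCK_REF,
    FRAG_MAIN_DATA_SHORT,
    FRAG_MAIN_DATA_LONG,
    FRAG_ORIGIN,
    FRAG_UNKNOWN
} FragmentKind;

static FragmentKind
classify_fragment_id(uint8_t id)
{
    if (id <= XLR_MAX_BLOCK_ID)
        return FRAG_BLOCK_REF;
    if (id == XLR_BLOCK_ID_DATA_SHORT)
        return FRAG_MAIN_DATA_SHORT;
    if (id == XLR_BLOCK_ID_DATA_LONG)
        return FRAG_MAIN_DATA_LONG;
    if (id == XLR_BLOCK_ID_ORIGIN)
        return FRAG_ORIGIN;
    return FRAG_UNKNOWN;
}

/* Hypothetical helper: header bytes needed in front of main data of the given length. */
static size_t
main_data_header_size(uint32_t data_length)
{
    /* short form: id + 1-byte length; long form: id + unaligned 4-byte length */
    return (data_length < 256) ? 2 : 5;
}
```

A reader walking the bytes after the fixed XLogRecord header could dispatch on classify_fragment_id for each fragment it meets. The sketch does not cover records that carry no main data at all.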
How these data structures are laid out within a WAL segment file is covered in later installments.
From the ITPUB blog, link: http://blog.itpub.net/6906/viewspace-2374778/. If you repost this article, please credit the source; otherwise legal liability will be pursued.