PostgreSQL Page頁結構解析(7)- B-Tree索引儲存結構#3
本文簡單介紹了在PG資料庫B-Tree索引的物理儲存結構,包括root index block、branch index block、leaf block index等等相關索引結構資訊。
一、測試資料
我們繼續使用上一節使用的測試資料,這一次我們追加插入>1000行的資料。
-- 為方便對比,插入資料前先檢視索引後設資料頁
testdb=# select * from bt_metap('pk_t_index');
magic | version | root | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples
--------+---------+------+-------+----------+-----------+-------------+-------------------------
340322 | 3 | 1 | 0 | 1 | 0 | 0 | -1
(1 row)
testdb=# do $$
testdb$# begin
testdb$# for i in 19..1020 loop
testdb$# insert into t_index (id, c1, c2) values (i, '#'||i||'#', '#'||i||'#');
testdb$# end loop;
testdb$# end $$;
DO
testdb=# select count(*) from t_index;
count
-------
1008
(1 row)
二、索引儲存結構
插入資料後,重新檢視索引後設資料頁資訊:
testdb=# select * from bt_metap('pk_t_index');
magic | version | root | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples
--------+---------+------+-------+----------+-----------+-------------+-------------------------
340322 | 3 | 3 | 1 | 3 | 1 | 0 | -1
(1 row)
root block從原來的block 1變為block 3,檢視block 3的的Special space:
testdb=# select * from bt_page_stats('pk_t_index',3);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
3 | r | 3 | 0 | 13 | 8192 | 8096 | 0 | 0 | 1 | 2
(1 row)
type=r,表示root index block,這個block有3個index entries(live_items=3,該index block只是root block(btpo_flags=BTP_ROOT)。下面我們來看看這個block中的index entries:
testdb=# select * from bt_page_items('pk_t_index',3);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------------------
1 | (1,0) | 8 | f | f |
2 | (2,53) | 16 | f | f | 7b 01 00 00 00 00 00 00
3 | (4,105) | 16 | f | f | e9 02 00 00 00 00 00 00
(3 rows)
root/branch index block儲存的是指向其他index block的指標。第1行,index entries指向第1個index block,由於該block沒有left block,因此,itemlen只有8個位元組,資料範圍為1-\x0000017b(十進位制值為379);第2行,index entries指向第2個index block,資料範圍為380-\x000002e9(745);第3行,index entries指向第4個index block,資料範圍為大於745的值。
這裡有個疑惑,正常來說,root index block中的entries應指向index block,但ctid的值(2,53)和(4,105)指向的卻是Heap Table Block,PG11 Beta2的Bug?
In a B-tree leaf page, ctid points to a heap tuple. In an internal page, the block number part of ctid points to another page in the index itself, while the offset part (the second number) is ignored and is usually 1.
testdb=# select * from heap_page_items(get_raw_page('t_index',2)) where t_ctid = '(2,53)';
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+---------+--------+----------+--------+-------------+------------+--------+--------+-------+------------------------------------------
53 | 5648 | 1 | 43 | 1612755 | 0 | 360 | (2,53) | 3 | 2306 | 24 | | | \x7b0100001323333739232020200d2333373923
(1 row)
testdb=# select * from heap_page_items(get_raw_page('t_index',4)) where t_ctid = '(4,105)';
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
-----+--------+----------+--------+---------+--------+----------+---------+-------------+------------+--------+--------+-------+------------------------------------------
105 | 3152 | 1 | 43 | 1612755 | 0 | 726 | (4,105) | 3 | 2306 | 24 | | | \xe90200001323373435232020200d2337343523
(1 row)
回到正題,我們首先看看index block 1的相關資料:
testdb=# select * from bt_page_stats('pk_t_index',1);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
1 | l | 367 | 0 | 16 | 8192 | 808 | 0 | 2 | 0 | 1
(1 row)
testdb=# select * from bt_page_items('pk_t_index',1) limit 10;
itemoffset | ctid | itemlen | nulls | vars | data
------------+--------+---------+-------+------+-------------------------
1 | (2,53) | 16 | f | f | 7b 01 00 00 00 00 00 00
2 | (0,1) | 16 | f | f | 02 00 00 00 00 00 00 00
3 | (0,2) | 16 | f | f | 04 00 00 00 00 00 00 00
4 | (0,3) | 16 | f | f | 08 00 00 00 00 00 00 00
5 | (0,4) | 16 | f | f | 10 00 00 00 00 00 00 00
6 | (0,6) | 16 | f | f | 11 00 00 00 00 00 00 00
7 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00
8 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00
9 | (0,9) | 16 | f | f | 14 00 00 00 00 00 00 00
10 | (0,10) | 16 | f | f | 15 00 00 00 00 00 00 00
(10 rows)
第1個block的Special space,其中type=l,表示leaf index block,btpo_flags=BTP_LEAF表示該block僅僅為leaf index block,block的index entries指向heap table。同時,這個block裡面有367個items,右邊兄弟block號是2(btpo_next)。
值得注意到,index entries的第1個條目,是最大值\x017b,第2個條目才是最小值,接下來的條目是按順序儲存的其他值。原始碼的README(src/backend/access/nbtree/README)裡面有解釋:
On a page that is not rightmost in its tree level, the "high key" is
kept in the page's first item, and real data items start at item 2.
The link portion of the "high key" item goes unused. A page that is
rightmost has no "high key", so data items start with the first item.
Putting the high key at the left, rather than the right, may seem odd,
but it avoids moving the high key as we add data items.
官方文件也有相關解釋:
Note that the first item on any non-rightmost page (any page with a non-zero value in the btpo_next field) is the page's “high key”, meaning its data serves as an upper bound on all items appearing on the page, while its ctid field is meaningless. Also, on non-leaf pages, the first real data item (the first item that is not a high key) is a “minus infinity” item, with no actual value in its data field. Such an item does have a valid downlink in its ctid field, however.
下面我們再來看看index block 2&4:
testdb=# select * from bt_page_stats('pk_t_index',2);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
2 | l | 367 | 0 | 16 | 8192 | 808 | 1 | 4 | 0 | 1
(1 row)
testdb=# select * from bt_page_items('pk_t_index',2) limit 10;
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------------------
1 | (4,105) | 16 | f | f | e9 02 00 00 00 00 00 00
2 | (2,53) | 16 | f | f | 7b 01 00 00 00 00 00 00
3 | (2,54) | 16 | f | f | 7c 01 00 00 00 00 00 00
4 | (2,55) | 16 | f | f | 7d 01 00 00 00 00 00 00
5 | (2,56) | 16 | f | f | 7e 01 00 00 00 00 00 00
6 | (2,57) | 16 | f | f | 7f 01 00 00 00 00 00 00
7 | (2,58) | 16 | f | f | 80 01 00 00 00 00 00 00
8 | (2,59) | 16 | f | f | 81 01 00 00 00 00 00 00
9 | (2,60) | 16 | f | f | 82 01 00 00 00 00 00 00
10 | (2,61) | 16 | f | f | 83 01 00 00 00 00 00 00
(10 rows)
testdb=# select * from bt_page_stats('pk_t_index',4);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
4 | l | 276 | 0 | 16 | 8192 | 2628 | 2 | 0 | 0 | 1
(1 row)
testdb=# select * from bt_page_items('pk_t_index',4) limit 10;
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------------------
1 | (4,105) | 16 | f | f | e9 02 00 00 00 00 00 00
2 | (4,106) | 16 | f | f | ea 02 00 00 00 00 00 00
3 | (4,107) | 16 | f | f | eb 02 00 00 00 00 00 00
4 | (4,108) | 16 | f | f | ec 02 00 00 00 00 00 00
5 | (4,109) | 16 | f | f | ed 02 00 00 00 00 00 00
6 | (4,110) | 16 | f | f | ee 02 00 00 00 00 00 00
7 | (4,111) | 16 | f | f | ef 02 00 00 00 00 00 00
8 | (4,112) | 16 | f | f | f0 02 00 00 00 00 00 00
9 | (4,113) | 16 | f | f | f1 02 00 00 00 00 00 00
10 | (4,114) | 16 | f | f | f2 02 00 00 00 00 00 00
(10 rows)
相關解析參照index block 1,大同小異,不再累述。
三、小結
知識要點:
1、root index block的儲存結構:指向branch或leaf index block
2、leaf index block的儲存結構:指向heap table block;btpo_next <> 0時,最大值儲存在第1個item中
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/6906/viewspace-2374917/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- PostgreSQL Page頁結構解析(5)- B-Tree索引儲存結構#1SQL索引
- PostgreSQL Page頁結構解析(6)- B-Tree索引儲存結構#2SQL索引
- PostgreSQL Page頁結構解析(3)- 行資料SQL
- PostgreSQL儲存引擎之page結構SQL儲存引擎
- PostgreSQL Page頁結構解析(1)-基礎SQL
- PostgreSQL Page頁結構解析(2)- 頁頭和行資料指標SQL指標
- PostgreSQL Page頁結構解析(4)- 執行DML時表佔用空間解析SQL
- PostgreSQL儲存引擎之heap tuple結構SQL儲存引擎
- 儲存結構
- MyRocks儲存引擎資料結構解析儲存引擎資料結構
- 如何從零學習PostgreSQL Page結構SQL
- VSAN儲存結構解析+儲存資料恢復案例資料恢復
- 《MySQL 基礎篇》十一:索引的儲存結構MySql索引
- 理解SQL Server 2008索引的儲存結構YDSQLServer索引
- JanusGraph -- 儲存結構
- CentOS 儲存結構CentOS
- 資料的儲存結構淺析LSM-Tree和B-tree
- 資料結構知識點--儲存結構與邏輯結構資料結構
- 【資料結構——圖和圖的儲存結構】資料結構
- PostgreSQL 資料庫學習 - 1.資料庫體系結構之儲存結構SQL資料庫
- 深度解析C#中LinkedList<T>的儲存結構C#
- InnoDB記錄儲存結構
- php圖的儲存結構PHP
- redis 儲存結構原理 2Redis
- HBase 資料儲存結構
- 【PHP資料結構】圖的概念和儲存結構PHP資料結構
- 儲存器的層次結構
- innodb表空間儲存結構
- etcd MVCC 儲存結構及流程MVC
- 【RocketMQ】RocketMQ儲存結構設計MQ
- 圖(Graph)——圖的儲存結構
- 【部落格383】etcd儲存結構
- PostgreSQL的B-tree索引SQL索引
- hadoop異構儲存+lucene索引Hadoop索引
- MySQL 索引結構MySql索引
- PostgreSQL:物理結構SQL
- PostgreSQL:程式結構SQL
- PE檔案結構解析3