震精-PostgreSQL10.0preview效能增強-WARM提升一倍效能

德哥發表於2017-03-24

背景

目前，PostgreSQL的MVCC是多版本來實現的，當更新資料時，產生新的版本。(社群正在著手增加基於回滾段的儲存引擎)

由於索引儲存的是KEY+CTID（行號），當tuple的新版本與舊版本不在同一個資料塊(BLOCK)的時候，索引也要隨之變化，當新版本在同一個塊裡面時，則發生HOT UPDATE，索引的值不需要更新，但是因為產生了一條新的記錄，所以也需要插入一條索引item，垃圾回收時，將其回收，因此產生了寫放大。

（HOT指，舊的tuple頭部，CHAIN指向新的TUPLE。）

但是HOT總不能覆蓋100%的更新，當tuple新版本不在同一個BLOCK時，即使索引的欄位值未發生變化，也需要更新索引，因為行號變化了。

這個問題在UBER的某篇文件中反映過

《為PostgreSQL討說法 – 淺析《UBER ENGINEERING SWITCHED FROM POSTGRES TO MYSQL》》

PostgreSQL 10.0會將這個問題解決掉，有兩個手段。

1. 增加間接索引的功能。

《PostgreSQL 10.0 preview 效能增強 – 間接索引(secondary index)》

2. 增加WARM的特性，也就是本文要說的。

索引放大的問題

使用pageinspect觀察page的變化。

postgres=# create extension pageinspect ;  
CREATE EXTENSION

建立一個表，3列，每列一個索引。

postgres=# create table tbl(id int primary key, c1 int, c2 int);  
CREATE TABLE  
postgres=# create index idx1 on tbl (c1);  
^[[ACREATE INDEX  
postgres=# create index idx2 on tbl (c2);  
CREATE INDEX

插入一條記錄

postgres=# insert into tbl values (1,1,1);  
INSERT 0 1

更新一個欄位的值，其他欄位的值不變

postgres=# begin;  
BEGIN  
postgres=# update tbl set c1=2 where id=1 returning ctid,*;  
 ctid  | id | c1 | c2   
-------+----+----+----  
 (0,2) |  1 |  2 |  1  
(1 row)  
UPDATE 1

ctid=0,2，因此發生了HOT，記錄沒有出現在其他PAGE。

在另一個會話，觀察三個索引的leaf page，可以看到，資料沒有變化的索引，也插入了一條ITEM

postgres=# select * from bt_page_items(`idx2`,1);  
 itemoffset | ctid  | itemlen | nulls | vars |          data             
------------+-------+---------+-------+------+-------------------------  
          1 | (0,2) |      16 | f     | f    | 01 00 00 00 00 00 00 00  
          2 | (0,1) |      16 | f     | f    | 01 00 00 00 00 00 00 00  
(2 rows)  
postgres=# select * from bt_page_items(`idx1`,1);  
 itemoffset | ctid  | itemlen | nulls | vars |          data             
------------+-------+---------+-------+------+-------------------------  
          1 | (0,1) |      16 | f     | f    | 01 00 00 00 00 00 00 00  
          2 | (0,2) |      16 | f     | f    | 02 00 00 00 00 00 00 00  
(2 rows)  
postgres=# select * from bt_page_items(`tbl_pkey`,1);  
 itemoffset | ctid  | itemlen | nulls | vars |          data             
------------+-------+---------+-------+------+-------------------------  
          1 | (0,2) |      16 | f     | f    | 01 00 00 00 00 00 00 00  
          2 | (0,1) |      16 | f     | f    | 01 00 00 00 00 00 00 00  
(2 rows)

使用vacuum回收垃圾（或者等其自動回收），垃圾回收時，如果要刪除heap表中的dead tuple，首先要刪除索引對應的item。

postgres=# end;  
COMMIT  
  
postgres=# vacuum verbose tbl;  
INFO:  vacuuming "public.tbl"  
INFO:  scanned index "tbl_pkey" to remove 1 row versions  
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.  
INFO:  scanned index "idx1" to remove 1 row versions  
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.  
INFO:  scanned index "idx2" to remove 1 row versions  
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.  
INFO:  "tbl": removed 1 row versions in 1 pages  
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.  
INFO:  index "tbl_pkey" now contains 1 row versions in 2 pages  
DETAIL:  1 index row versions were removed.  
0 index pages have been deleted, 0 are currently reusable.  
CPU 0.00s/0.00u sec elapsed 0.00 sec.  
INFO:  index "idx1" now contains 1 row versions in 2 pages  
DETAIL:  1 index row versions were removed.  
0 index pages have been deleted, 0 are currently reusable.  
CPU 0.00s/0.00u sec elapsed 0.00 sec.  
INFO:  index "idx2" now contains 1 row versions in 2 pages  
DETAIL:  1 index row versions were removed.  
0 index pages have been deleted, 0 are currently reusable.  
CPU 0.00s/0.00u sec elapsed 0.00 sec.  
INFO:  "tbl": found 1 removable, 1 nonremovable row versions in 1 out of 1 pages  
DETAIL:  0 dead row versions cannot be removed yet.  
There were 0 unused item pointers.  
Skipped 0 pages due to buffer pins.  
0 pages are entirely empty.  
CPU 0.00s/0.00u sec elapsed 0.00 sec.  
VACUUM  
  
  
ostgres=# select * from bt_page_items(`idx1`,1);  
 itemoffset | ctid  | itemlen | nulls | vars |          data             
------------+-------+---------+-------+------+-------------------------  
          1 | (0,2) |      16 | f     | f    | 02 00 00 00 00 00 00 00  
(1 row)  
  
postgres=# select * from bt_page_items(`idx2`,1);  
 itemoffset | ctid  | itemlen | nulls | vars |          data             
------------+-------+---------+-------+------+-------------------------  
          1 | (0,2) |      16 | f     | f    | 01 00 00 00 00 00 00 00  
(1 row)  
  
postgres=# select * from bt_page_items(`tbl_pkey`,1);  
 itemoffset | ctid  | itemlen | nulls | vars |          data             
------------+-------+---------+-------+------+-------------------------  
          1 | (0,2) |      16 | f     | f    | 01 00 00 00 00 00 00 00  
(1 row)

寫放大就是這樣產生的，所以10.0的兩個特性可以解決這樣的問題。

間接索引

之前講過，這裡就不重複了

《PostgreSQL 10.0 preview 效能增強 – 間接索引(secondary index)》

WARM

warm使用chain和recheck來解決寫放大的問題。

建表與索引如下

CREATE TABLE test (col1 int, col2 int, col3 int, col4 text);  
CREATE INDEX testindx_col1 ON test (col1);  
CREATE INDEX testindx_col2 ON test (col2);  
CREATE INDEX testindx_col3 ON test (col3);  
  
INSERT INTO test VALUES (1, 11, 111, `foo`);

當前的索引條目

testindx_col1: (1)    ===> (0,1)  
testindx_col2: (11)   ===> (0,1)  
testindx_col3: (111)  ===> (0,1)

WARM上場，更新col1=2，指向0,2，此時沒有變更的索引，不需要更新，依舊指向0,1，而發生變化的索引，也指向0,1，通過chain指向0,2。

Now if a transaction T1 UPDATEs the table such as, "UPDATE test SET col1 =  
2". This does not satisfy the HOT property since index column `col1` is  
being updated. But as per WARM algorithm, since only testindx_col1`s index  
key has changed, we insert a new index tuple only in testindx_col1. Say the  
new heap tuple has CTID (0,2). So testindx_col1 will have a new index tuple  
with key (2), but still pointing to CTID (0,1)  
  
testindx_col1: (1)    ==>  (0,1)  
               (2)    ===> (0,1)  
testindx_col2: (11)   ===> (0,1)  
testindx_col3: (111)  ===> (0,1)  
  
The heap at this point has two tuples, linked by CTID chain.  
(0,1)* ---> (0,2)

不管怎麼更新，都通過chain來指向，所以不需要修改沒有發生變化的索引。

If T3 later updates col2 and T4 further updates col3,  
T3: UPDATE test SET col2 = 12;  
T4: UPDATE test SET col3 = 112;  
  
The heap will look like:  
(0,1)* ---> (0,2) ---> (0,3) ---> (0,4)  
  
And the index pointers will be:  
  
testindx_col1: (1)   ===> (0,1)  
                       (2)   ===> (0,1)  
testindx_col2: (11)  ===> (0,1)  
                       (12)  ===> (0,1)  
testindx_col3: (111) ===> (0,1)  
                       (112) ===> (0,1)

通過FLAG表示當前行是否有chain。

當查詢到chain tuple時，順藤摸瓜，根據事務快照，判斷可見性。

為什麼需要recheck, 因為warm的引入發生值變化的索引，值與heap root lp中儲存的值不一樣了，因此需要recheck.

warm的引入，對有較多索引的表的更新，效能提升是非常明顯的。