PostgreSQLCREATEINDEXCONCURRENTLY的原理以及哪些操作可能堵塞索引的建立

德哥發表於2018-05-06

原文網址 : https://flycode.co/archives/185747

SQLIndex索引

背景

PostgreSQL支援線上建立索引（CREATE INDEX CONCURRENTLY），不堵塞其他會話對被建立索引表的DML（INSERT,UPDATE,DELETE）操作。特別適合於線上業務。

注意，傳統的建立索引的方法，會堵塞其他會話的DML。

那麼CREATE INDEX CONCURRENTLY的內部實現如何？有有一些什麼需要注意的？

比如我們有一個這樣的CASE，在執行CREATE INDEX CONCURRENTLY前，開啟了一個事務（COPY），雖然操作的並不是建立索引的表，但是卻導致了CREATE INDEX CONCURRENTLY遲遲不能結束。

為什麼？需要研究一下CREATE INDEX CONCURRENTLY的原理。

復現

1、建立測試表

postgres=# create table a(id int);    
CREATE TABLE    
postgres=# create table b(id int);    
CREATE TABLE

2、會話1

postgres=# begin;    
BEGIN    
postgres=# copy a from stdin;    
Enter data to be copied followed by a newline.    
End with a backslash and a period on a line by itself, or an EOF signal.    
>>

3、會話2

postgres=# create index idx_b_1 on b (id);    
CREATE INDEX    
postgres=# create index CONCURRENTLY idx_b_2 on b (id);    
    
hang住(實際已結束)

4、會話3，檢視鎖等待資訊，建議用這個QUERY檢視。

《PostgreSQL 鎖等待監控珍藏級SQL – 誰堵塞了誰》

postgres=# select * from pg_locks where granted is not true;    
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |     mode      | granted | fastpath     
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------+---------+----------    
 virtualxid |          |          |      |       | 33/500     |               |         |       |          | 61/48              | 18690 | ShareLock     | f       | f    
(17 rows)    
    
    
postgres=# select * from pg_locks where virtualxid=`33/500`;    
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |     mode      | granted | fastpath     
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------+---------+----------    
 virtualxid |          |          |      |       | 33/500     |               |         |       |          | 33/500             | 17371 | ExclusiveLock | t       | f    
 virtualxid |          |          |      |       | 33/500     |               |         |       |          | 61/48              | 18690 | ShareLock     | f       | f    
(2 rows)    
    
    
postgres=# select * from pg_stat_activity where pid=17371;    
-[ RECORD 1 ]----+------------------------------------------------------------------------------------------------------    
datid            | 13220    
datname          | postgres    
pid              | 17371    
usesysid         | 10    
usename          | postgres    
application_name | psql    
client_addr      |     
client_hostname  |     
client_port      | -1    
backend_start    | 2018-04-24 19:54:54.838402+08    
xact_start       | 2018-04-24 19:59:10.884774+08    
query_start      | 2018-04-24 19:59:10.884792+08    
state_change     | 2018-04-24 19:59:10.884792+08    
wait_event_type  |     
wait_event       |     
state            | active    
backend_xid      | 27958    
backend_xmin     | 4405    
query            | copy a from stdin;    
backend_type     | client backend

5、檢視pstack

pstack 18690    
#0  0x00007f44f4c4b903 in __epoll_wait_nocancel () from /lib64/libc.so.6    
#1  0x000000000070125e in WaitEventSetWait ()    
#2  0x0000000000701697 in WaitLatchOrSocket ()    
#3  0x000000000070fe76 in ProcSleep ()    
#4  0x000000000070af6f in WaitOnLock ()    
#5  0x000000000070c4f5 in LockAcquireExtended ()    
#6  0x000000000070ec0e in VirtualXactLock ()    
#7  0x00000000005a269e in DefineIndex ()    
#8  0x0000000000725eec in ProcessUtilitySlow.isra.2 ()    
#9  0x0000000000724ac6 in standard_ProcessUtility ()    
#10 0x0000000000722416 in PortalRunUtility ()    
#11 0x0000000000722e57 in PortalRunMulti ()    
#12 0x00000000007239bc in PortalRun ()    
#13 0x000000000071fb57 in exec_simple_query ()    
#14 0x0000000000720e02 in PostgresMain ()    
#15 0x000000000047a96b in ServerLoop ()    
#16 0x00000000006b9029 in PostmasterMain ()    
#17 0x000000000047b321 in main ()

6、幹掉會話，

postgres=# select pg_cancel_backend(17371);    
-[ RECORD 1 ]-----+--    
pg_cancel_backend | t

7、索引結束

postgres=# create index CONCURRENTLY idx_b_2 on b (id);    
CREATE INDEX

CREATE INDEX CONCURRENTLY 原理

為了搞明白前面那個原因，需要了解CREATE INDEX CONCURRENTLY的流程。

https://www.postgresql.org/docs/devel/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

In a concurrent index build, the index is actually entered into the system catalogs in one transaction, then two table scans occur in two more transactions. Before each table scan, the index build must wait for existing transactions that have modified the table to terminate. After the second scan, the index build must wait for any transactions that have a snapshot ( see Chapter 13 ) predating the second scan to terminate. Then finally the index can be marked ready for use, and the CREATE INDEX command terminates. Even then, however, the index may not be immediately usable for queries: in the worst case, it cannot be used as long as transactions exist that predate the start of the index build.

使用CREATE INDEX CONCURRENTLY建立索引，分為三個階段，掃描兩次TABLE。

create index CONCURRENTLY idx_b_2 on b (id);

階段如下：

1、開啟事務1，拿到當前snapshot1。

2、掃描B表前，等待所有修改過B表（寫入、刪除、更新）的事務結束。

3、掃描B表，並建立索引。

4、結束事務1。

5、開啟事務2，拿到當前snapshot2。

6、再次掃描B表前，等待所有修改過B表（寫入、刪除、更新）的事務結束。

7、在snapshot2之後啟動的事務對B表執行的DML，會修改這個idx_b_2的索引。

8、再次掃描B表，更新索引。（從TUPLE中可以拿到版本號，在snapshot1到snapshot2之間變更的記錄，將其合併到索引）

9、上一步更新索引結束後，等待事務2之前開啟的持有snapshot的事務結束。

10、結束索引建立。索引可見。

前面復現的例子，問題出在哪裡呢？

實際上create index CONCURRENTLY需要2次掃描，三次等待。三次等待分別是2次掃描表前，結束建立索引前。前面的例子，實際上是在結束建立索引前，等待第二次SCAN之前持有snapshot的事務結束。

小結

1、注意事項，為了減少等待的時間，需要1. 儘量避免建立索引過程中，兩次SCAN之前對被建立索引表實施長事務，並且長事務中包含修改被建立索引的表。2. 在第二次SCAN前，儘量避免開啟長事務。

2、最後的一次等待，應該還有改進的空間，減少等待。實際上最後一次等待是為了防止那些還存在的事務，在事務中可能查詢B表，如果結束的話，這個索引實際上是SNAPSHOT2後的狀態（可能有一些對snapshot2前的事務STALE的狀態）。

3、注意，因為第一次掃描並建立中間狀態的索引（INVALID）後，索引實際上就對後面的DML起作用了，所以如果是在第二SCAN階段，索引建立失敗了，這個索引會一直影響DML（效能、約束）。

Another caveat when building a unique index concurrently is that the uniqueness constraint is already being enforced against other transactions when the second table scan begins. This means that constraint violations could be reported in other queries prior to the index becoming available for use, or even in cases where the index build eventually fails. Also, if a failure does occur in the second scan, the “invalid” index continues to enforce its uniqueness constraint afterwards.

參考

https://www.postgresql.org/docs/devel/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

MySQL 索引原理以及優化
2019-06-03
MySql索引優化
建立索引後，速度變快原因?以及索引失效總結
2018-10-11
索引
Stream常用操作以及原理探索
2022-06-26
MySQL:FLTWL的堵塞和被堵塞總結
2018-08-07
MySql
ElasticSearch 獲取es資訊以及索引操作
2018-03-29
Elasticsearch索引
剖析 Elasticsearch 的索引原理
2019-05-13
Elasticsearch索引
建立索引的優劣勢
2020-12-09
索引
oracle資料庫建立、刪除索引等操作
2020-09-30
Oracle資料庫索引
淺析InnoDB引擎的索引和索引原理
2021-10-15
索引
什麼情況下需要建立索引？索引的作用？為什麼能夠提高查詢速度？（索引的原理）索引有什麼副作用嗎？
2018-08-21
索引
哪些情況下需要/不需要建立索引
2020-12-09
索引
MySQL：FTWRL一個奇怪的堵塞現象和堵塞總結
2019-09-18
MySql
mysql 索引的原理(超細)
2020-10-27
MySql索引
簡單易懂的索引原理
2020-04-07
索引
MySQL的索引原理及使用
2024-09-01
MySql索引
MySQL建立表的時候建立聯合索引的方法
2024-06-06
MySql索引
【Mongo】MongoDB索引管理－索引的建立、檢視、刪除
2018-05-01
MongoDB索引
python建立elasticsearch索引的探討
2018-11-12
PythonElasticsearch索引
如何清除建立失敗的索引
2019-03-14
索引
Mysql索引的建立與刪除
2021-06-01
MySql索引
61_索引管理_快速上機動手實戰建立、修改以及刪除索引
2024-10-02
索引
資料庫索引的工作原理
2018-04-26
資料庫索引
MySQ索引操作命令總結（建立、重建、查詢和刪除索引命令詳解）
2021-09-09
索引
用SQL建立索引的方法步驟
2020-12-27
SQL索引
不能建立降序索引的問題的解決
2019-04-25
索引
Hive建立索引
2018-11-28
Hive索引
DocumentDB 建立索引
2024-10-11
索引
外來鍵沒有索引哪些DML操作會被阻塞
2018-03-15
索引
索引的知識要點與操作
2024-04-28
索引
mysql 索引的基礎操作彙總
2021-09-09
MySql索引
點陣圖索引的工作原理 - Richard
2021-11-29
索引
GPU的介紹以及原理的分析
2018-08-29
GPU
windows10系統建立索引的方法
2019-04-17
Windows索引
MySQL如何建立一個好索引？建立索引的5條建議【宇哥帶你玩轉MySQL 索引篇(三)】
2020-04-28
MySql索引
常見的導致PG建立索引慢的原因
2023-03-31
索引
elasticsearch索引原理
2019-03-07
Elasticsearch索引
MySQL索引原理
2019-07-25
MySql索引
Sqlserver自動查詢缺失索引及拼出建立索引的語句的指令碼
2021-03-03
SQLServer索引指令碼

PostgreSQLCREATEINDEXCONCURRENTLY的原理以及哪些操作可能堵塞索引的建立

標籤

背景

復現

CREATE INDEX CONCURRENTLY 原理

小結

參考

相關文章