PostgreSQL10.0preview功能增強-後臺執行(pg_background)

德哥發表於2017-03-24

標籤

PostgreSQL , 10.0 , 後臺執行 , pg_background_launch , pg_background_result , pg_background_detach , pg_background


背景

當使用者在管理資料庫時,如果要在互動式介面跑一些QUERY,但是不知道QUERY要執行多久,擔心網路問題或者其他問題導致終端斷開,QUERY執行情況不明的話。就需要後臺執行這個功能了。

後臺執行在LINUX中也很常見,比如

nohup ls -la / >/tmp/result 2>&1 &  

這樣的話,即使斷開會話也沒關係,這條命令會在後臺執行,並將標準輸出存入/tmp/result,標準錯誤也重定向到標準輸出。

對於PostgreSQL資料庫,在10.0的這個特性出來之前,使用者可以使用dblink的非同步呼叫,達到類似的目的,但是不能斷開會話,注意了。

dblink非同步呼叫

使用dblink非同步呼叫,可以達到後臺執行的目的,但是別忘了,dblink連線是當前會話建立的,當前會話退出,連線也會退出。

postgres=# create extension dblink;  
CREATE EXTENSION  
  
建立測試表  
postgres=# create table t(id int);  
CREATE TABLE  
  
建立連線  
postgres=# select dblink_connect(`digoal`,`host=127.0.0.1 port=1921 user=postgres dbname=postgres`);  
 dblink_connect   
----------------  
 OK  
(1 row)  
  
開始事務(你也可以使用autocommit)  
postgres=# select * from dblink_send_query(`digoal`, `begin`);  
 dblink_send_query   
-------------------  
                 1  
(1 row  
  
獲取非同步呼叫結果  
postgres=# select * from dblink_get_result(`digoal`) as t(res text);  
  res    
-------  
 BEGIN  
(1 row)  
  
獲取非同步呼叫結果為0時,才能對這個連線進行下一次非同步呼叫。  
postgres=# select * from dblink_get_result(`digoal`) as t(res text);  
 res   
-----  
(0 rows)  
  
非同步插入資料  
postgres=# select * from dblink_send_query(`digoal`, `insert into t values (1)`);  
 dblink_send_query   
-------------------  
                 1  
(1 row)  
  
獲取非同步呼叫結果  
postgres=# select * from dblink_get_result(`digoal`) as t(res text);  
    res       
------------  
 INSERT 0 1  
(1 row)  
  
postgres=# select * from dblink_get_result(`digoal`) as t(res text);  
 res   
-----  
(0 rows)  
  
檢視資料是否插入,因為非同步事務沒有提交,所以看不到資料  
postgres=# select * from t;  
 id   
----  
(0 rows)  
  
提交非同步事務  
postgres=# select * from dblink_send_query(`digoal`, `commit`);  
 dblink_send_query   
-------------------  
                 1  
(1 row)  
  
檢視資料,有了  
postgres=# select * from t;  
 id   
----  
  1  
(1 row)  

斷開本地會話,非同步會話也會斷開,未提交的非同步事務自動回滾。

postgres=# select dblink_connect(`digoal`,`host=127.0.0.1 port=1921 user=postgres dbname=postgres`);  
-[ RECORD 1 ]--+---  
dblink_connect | OK  
  
postgres=# select * from dblink_send_query(`digoal`, `begin`);  
-[ RECORD 1 ]-----+--  
dblink_send_query | 1  
  
postgres=# select * from dblink_get_result(`digoal`) as t(res text);  
-[ RECORD 1 ]  
res | BEGIN  
  
postgres=# select * from dblink_get_result(`digoal`) as t(res text);  
(0 rows)  
  
postgres=# select * from dblink_send_query(`digoal`, `insert into t values (2)`);  
-[ RECORD 1 ]-----+--  
dblink_send_query | 1  
  
退出當前會話  
postgres=# q  
  
重新連線,非同步會話已斷開,並回滾。  
postgres=# select * from t;  
 id   
----  
  1  
(1 row)  

使用dblink非同步介面,可以完成一些後臺執行的功能,但是比較繁瑣,也不完美(比如當前會話不能退出)

https://www.postgresql.org/docs/9.6/static/contrib-dblink-get-result.html

PostgreSQL 10.0 新增了background session的功能,這個功能可以對標類似Oracle的自治事務的功能。(是plsql函式或儲存過程遷移到PostgreSQL plpgsql的有利相容性,此前需要使用dblink模擬自治事務)

基於background session,開發了一個後臺執行的管理介面。可以方便的執行後臺事務了。

PostgreSQL 10.0 background session(自治事務)功能

參考

《PostgreSQL 10.0 preview 功能增強 – 匿名、自治事務(Oracle 相容性)》

PostgreSQL 10.0 後臺執行介面功能

一開始的設計比較簡單,提供了三個API函式

• pg_background_launch : 開啟後臺work程式與會話,執行使用者提供的SQL,返回後臺會話的PID  
  
• pg_background_result   : 根據提供的PID,返回這個後臺會話執行SQL的結果  
  
• pg_background_detach : 根據提供的PID,返回這個後臺會話執行SQL的結果,同時關閉這個後臺程式。  

最開始的討論細節如下

Hi All,  
  
I would like to take over pg_background patch and repost for  
discussion and review.  
  
Initially Robert Haas has share this for parallelism demonstration[1]  
and abandoned later with  
summary of open issue[2] with this pg_background patch need to be  
fixed, most of them seems to be  
addressed in core except handling of type exists without binary  
send/recv functions and documentation.  
I have added handling for types that don`t have binary send/recv  
functions in the attach patch and will  
work on documentation at the end.  
  
One concern with this patch is code duplication with  
exec_simple_query(), we could  
consider Jim Nasby’s patch[3] to overcome this,  but  certainly we  
will end up by complicating  
exec_simple_query() to make pg_background happy.  
  
As discussed previously[1] pg_background is a contrib module that lets  
you launch arbitrary command in a background worker.  
  
• VACUUM in background  
• Autonomous transaction implementation better than dblink way (i.e.  
no separate authentication required).  
• Allows to perform task like CREATE INDEX CONCURRENTLY from a  
procedural language.  
  
This module comes with following SQL APIs:  
  
• pg_background_launch : This API takes SQL command, which user wants  
to execute, and size of queue buffer.  
  This function returns the process id of background worker.  
• pg_background_result   : This API takes the process id as input  
parameter and returns the result of command  
  executed thought the background worker.  
• pg_background_detach : This API takes the process id and detach the  
background process which is waiting for  
 user to read its results.  
  
  
Here`s an example of running vacuum and then fetching the results.  
Notice that the  
notices from the original session are propagated to our session; if an  
error had occurred,  
it would be re-thrown locally when we try to read the results.  
  
postgres=# create table foo (a int);  
CREATE TABLE  
postgres=# insert into foo values(generate_series(1,5));  
INSERT 0 5  
  
postgres=# select pg_background_launch(`vacuum verbose foo`);  
pg_background_launch  
----------------------  
              65427  
(1 row)  
  
postgres=# select * from pg_background_result(65427) as (x text);  
INFO:  vacuuming "public.foo"  
INFO:  "foo": found 0 removable, 5 nonremovable row versions in 1 out of 1 pages  
DETAIL:  0 dead row versions cannot be removed yet.  
There were 0 unused item pointers.  
Skipped 0 pages due to buffer pins.  
0 pages are entirely empty.  
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.  
 x  
--------  
VACUUM  
(1 row)  
  
  
Thanks to Vibhor kumar, Rushabh Lathia and Robert Haas for feedback.  
  
Please let me know your thoughts, and thanks for reading.  
  
[1]. https://www.postgresql.org/message-id/CA%2BTgmoam66dTzCP8N2cRcS6S6dBMFX%2BJMba%2BmDf68H%3DKAkNjPQ%40mail.gmail.com  
[2]. https://www.postgresql.org/message-id/CA%2BTgmobPiT_3Qgjeh3_v%2B8Cq2nMczkPyAYernF_7_W9a-6T1PA%40mail.gmail.com  
[3]. https://www.postgresql.org/message-id/54541779.1010906%40BlueTreble.com  
  
Regards,  
Amul  

社群討論後,這個架構改成了這樣的,架構更優雅一些。

• pg_background_launch : 這個介面只是用來fork一個後臺程式,並返回PID  
  
• pg_background_run : 根據提供的PID,讓這個後臺程式執行提供的SQL。  
  
• pg_background_result : 根據提供的PID,獲取執行SQL的結果。  
  
• pg_background_detach : 關閉後臺程式與會話。  

討論細節如下

Hi all,  
  
As we have discussed previously, we need to rework this patch as a client of  
Peter Eisentraut`s background sessions code[1].  
  
Attaching trial version patch to discussed possible design and api.  
We could have following APIs :  
  
• pg_background_launch : This function start and stores new background  
session, and returns the process id of background worker.  
  
• pg_background_run : This API takes the process id and SQL command as  
input parameter. Using this process id, stored worker`s session is  
retrieved and give SQL command is executed under it.  
  
• pg_background_result : This API takes the process id as input  
parameter and returns the result of command executed thought the  
background worker session.  Same as it was before but now result can  
be fetch in LIFO order i.e. result of last executed query using  
pg_background_run will be fetched first.  
  
• pg_background_detach : This API takes the process id and detach the  
background process. Stored worker`s session is not dropped until this  
called.  
  
• TBC : API to discard result of last query or discard altogether?  
  
• TBC : How about having one more api to see all existing sessions ?  
  
  
Kindly share your thoughts/suggestions.  Note that attach patch is WIP  
version, code, comments & behaviour could be vague.  
  
------------------  
Quick demo:  
------------------  
Apply attach patch to the top of Peter Eisentraut`s  
0001-Add-background-sessions.patch[1]  
  
postgres=# select pg_background_launch();  
 pg_background_launch  
----------------------  
                21004  
(1 row)  
  
postgres=# select pg_background_run(21004, `vacuum verbose foo`);  
 pg_background_run  
-------------------  
  
(1 row)  
  
postgres=# select * from pg_background_result(21004) as (x text);  
INFO:  vacuuming "public.foo"  
INFO:  "foo": found 0 removable, 5 nonremovable row versions in 1 out of 1 pages  
DETAIL:  0 dead row versions cannot be removed yet.  
There were 0 unused item pointers.  
Skipped 0 pages due to buffer pins.  
0 pages are entirely empty.  
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.  
   x  
--------  
 VACUUM  
(1 row)  
  
postgres=# select pg_background_run(21004, `select * from foo`);  
 pg_background_run  
-------------------  
  
(1 row)  
  
postgres=# select * from pg_background_result(21004) as (x int);  
 x  
---  
 1  
 2  
 3  
 4  
 5  
(5 rows)  
  
postgres=# select pg_background_detach(21004);  
 pg_background_detach  
----------------------  
  
(1 row)  
  
  
References :  
[1] https://www.postgresql.org/message-id/e1c2d331-ee6a-432d-e9f5-dcf85cffaf29%402ndquadrant.com.  
  
  
Regards,  
Amul Sul  

後面的討論又改成了這樣,還是朝著優雅的方向在改進

The following review has been posted through the commitfest application:  
make installcheck-world:  tested, passed  
Implements feature:       tested, passed  
Spec compliant:           tested, passed  
Documentation:            tested, failed  
  
I’ll summarize here my thoughts as a reviewer on the current state of the pg_background:  
1. Current version of a code [1] is fine, from my point of view. I have no suggestions on improving it. There is no documentation, but code is commented.  
2. Patch is dependent on background sessions from the same commitfest.  
3. There can exist more features, but for v1 there is surely enough features.  
4. There is some controversy on where implemented feature shall be: in separate extension (as in this patch), in db_link, in some PL API, in FDW or somewhere else.   
I think that new extension is an appropriate place for the feature. But I’m not certain.  
Summarizing these points, appropriate statuses of the patch are ‘Ready for committer’ or ‘Rejected’.   
Between these two I choose ‘Ready for committer’, I think patch is committable (after bg sessions).  
  
Best regards, Andrey Borodin.  
  
The new status of this patch is: Ready for Committer  

這個patch在commit前,還有一些變數,比如可能將這個功能合併到dblink裡面。而不是新開一個extension外掛.

這個patch的討論,詳見郵件組,本文末尾URL。

PostgreSQL社群的作風非常嚴謹,一個patch可能在郵件組中討論幾個月甚至幾年,根據大家的意見反覆的修正,patch合併到master已經非常成熟,所以PostgreSQL的穩定性也是遠近聞名的。

參考

https://commitfest.postgresql.org/13/893/

https://www.postgresql.org/message-id/flat/CAAJ_b97FRO+Y_-SOgXGj-WPwtuWrmifgfgPvbXMAvUKQykvNvA@mail.gmail.com#CAAJ_b97FRO+Y_-SOgXGj-WPwtuWrmifgfgPvbXMAvUKQykvNvA@mail.gmail.com


相關文章