PostgreSQL10內建分割槽vspg_pathman
標籤
PostgreSQL , 內建分割槽 , pg_pathman , perf , 效能 , 鎖
背景
PostgreSQL 10內建分割槽的效能不如pg_pathman分割槽外掛的效能。有非常大的優化空間,那麼是什麼導致了分割槽的效能問題呢?
編譯PostgreSQL 10.0
1、編譯、開啟debug
CFLAGS="-g -ggdb -fno-omit-frame-pointer" ./configure --prefix=/home/digoal/pgsql10.0
CFLAGS="-g -ggdb -fno-omit-frame-pointer" make world -j 128
CFLAGS="-g -ggdb -fno-omit-frame-pointer" make install-world
2、初始化叢集
initdb -D $PGDATA -U postgres -E SQL_ASCII --locale=C
3、配置postgresql.conf
listen_addresses = `0.0.0.0`
port = 1921
max_connections = 400
superuser_reserved_connections = 3
unix_socket_directories = `.`
shared_buffers = 128GB
work_mem = 1024MB
maintenance_work_mem = 4GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_flush_after = 0
max_worker_processes = 128
max_parallel_workers_per_gather = 0
max_parallel_workers = 128
wal_level = minimal
synchronous_commit = off
full_page_writes = off
wal_buffers = 1GB
wal_writer_delay = 10ms
wal_writer_flush_after = 0
checkpoint_timeout = 55min
max_wal_size = 128GB
min_wal_size = 80MB
checkpoint_completion_target = 0.01
checkpoint_flush_after = 0
max_wal_senders = 0
parallel_tuple_cost = 0
parallel_setup_cost = 0
min_parallel_table_scan_size = 0
min_parallel_index_scan_size = 0
effective_cache_size = 400GB
log_destination = `csvlog`
logging_collector = on
log_truncate_on_rotation = on
log_timezone = `PRC`
log_autovacuum_min_duration = 0
datestyle = `iso, mdy`
timezone = `PRC`
lc_messages = `C`
lc_monetary = `C`
lc_numeric = `C`
lc_time = `C`
default_text_search_config = `pg_catalog.english`
建立range和list分割槽,用於測試
1. range 分割槽表
1.1 主表
create table t_range(id int, info text, crt_time timestamp) partition by range (id);
1.2 分割槽
do language plpgsql $$
declare
i int;
begin
for i in 0..127 loop
execute `create table t_range_`||i||` partition of t_range for values from (`||i*100||`) to (`||(i+1)*100||`)`;
end loop;
end;
$$;
2. list 分割槽表
2.1 主表
create table t_list(id int, info text, crt_time timestamp) partition by list ( mod(hashtext(info), 128) );
2.2 分割槽
do language plpgsql $$
declare
i int;
begin
for i in 0..127 loop
execute `create table t_list_`||i||` partition of t_list for values in (`||i||`)`;
end loop;
end;
$$;
建立測試指令碼
1、測試範圍分割槽
vi test1.sql
set id random(0,12799)
insert into t_range values (:id, `test`, now());
2、測試LIST分割槽
vi test2.sql
set id random(0,127)
insert into t_list values (:id, `test`, now());
壓測
pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 64 -j 64 -T 12800
pgbench -M prepared -n -r -P 1 -f ./test2.sql -c 64 -j 64 -T 12800
progress: 14.0 s, 21361.0 tps, lat 2.996 ms stddev 0.848
progress: 15.0 s, 21302.0 tps, lat 3.004 ms stddev 0.868
progress: 16.0 s, 21349.0 tps, lat 2.998 ms stddev 0.852
progress: 17.0 s, 21366.9 tps, lat 2.996 ms stddev 0.858
progress: 18.0 s, 21346.0 tps, lat 2.998 ms stddev 0.856
收集profiling
perf record -ag
生成perf報告
1、
perf report -g
結果
Samples: 1M of event `cycles`, Event count (approx.): 662397686992
Children Self Command Shared Object Symbol
+ 88.63% 0.00% postgres libc-2.17.so [.] __libc_start_main
+ 88.63% 0.00% postgres postgres [.] startup_hacks
+ 88.63% 0.00% postgres postgres [.] PostmasterMain
+ 88.49% 0.00% postgres postgres [.] ServerLoop
+ 88.49% 0.00% postgres postgres [.] BackendStartup
+ 88.47% 0.00% postgres postgres [.] ExitPostmaster
+ 88.42% 0.04% postgres postgres [.] PostgresMain
+ 74.92% 0.02% postgres postgres [.] exec_execute_message
+ 74.77% 0.01% postgres postgres [.] PortalRun
+ 74.75% 0.02% postgres postgres [.] PortalRunMulti
+ 74.57% 0.01% postgres postgres [.] ProcessQuery
+ 71.56% 0.01% postgres postgres [.] ExecutorStart
+ 71.54% 0.01% postgres postgres [.] standard_ExecutorStart
+ 71.46% 0.02% postgres postgres [.] InitPlan
+ 71.26% 0.02% postgres postgres [.] ExecInitNode
+ 71.23% 0.05% postgres postgres [.] ExecInitModifyTable
+ 70.34% 0.23% postgres postgres [.] ExecSetupPartitionTupleRouting
+ 45.89% 0.23% postgres postgres [.] find_all_inheritors
+ 42.42% 0.28% postgres postgres [.] find_inheritance_children
+ 23.59% 0.08% postgres postgres [.] LockRelationOid
+ 23.33% 0.04% postgres postgres [.] LockAcquire
+ 22.88% 0.79% postgres postgres [.] LockAcquireExtended
+ 12.59% 1.07% postgres postgres [.] InitResultRelInfo
+ 12.10% 0.26% postgres postgres [.] LWLockAcquire
+ 11.50% 0.08% postgres postgres [.] RelationGetPartitionQual
+ 10.99% 0.01% postgres postgres [.] CommitTransaction
+ 10.89% 1.14% postgres postgres [.] _copyList
+ 10.58% 0.05% postgres postgres [.] systable_getnext
+ 10.51% 0.05% postgres postgres [.] index_getnext
+ 10.50% 6.88% postgres postgres [.] hash_search_with_hash_value
+ 10.35% 0.01% postgres postgres [.] ResourceOwnerRelease
+ 10.34% 0.03% postgres postgres [.] ResourceOwnerReleaseInternal
+ 10.29% 0.01% postgres postgres [.] ProcReleaseLocks
+ 10.22% 1.84% postgres postgres [.] LockReleaseAll
+ 9.46% 0.52% postgres postgres [.] LWLockRelease
+ 9.00% 0.21% postgres postgres [.] index_fetch_heap
+ 8.60% 0.01% postgres [kernel.kallsyms] [k] system_call_fastpath
+ 8.34% 0.23% postgres postgres [.] SearchSysCache
+ 7.98% 0.27% swapper [kernel.kallsyms] [k] cpu_startup_entry
+ 7.86% 0.00% swapper [kernel.kallsyms] [k] start_secondary
+ 7.73% 0.04% postgres [kernel.kallsyms] [k] sys_futex
+ 7.68% 0.05% postgres [kernel.kallsyms] [k] do_futex
+ 7.43% 5.59% postgres postgres [.] AllocSetAlloc
+ 7.27% 3.52% postgres postgres [.] SearchCatCache
+ 7.03% 0.22% postgres postgres [.] _copyOpExpr
+ 6.15% 0.36% postgres postgres [.] LWLockWakeup
+ 5.93% 0.38% postgres postgres [.] RelationGetPartitionDispatchInfo
+ 5.62% 0.14% postgres postgres [.] LockBuffer
+ 5.50% 0.28% postgres postgres [.] hash_search
2、
perf report --stdio -g
結果
45.89% 0.23% postgres postgres [.] find_all_inheritors
|
---find_all_inheritors
|
|--99.99%-- ExecSetupPartitionTupleRouting
| ExecInitModifyTable
42.42% 0.28% postgres postgres [.] find_inheritance_children
|
---find_inheritance_children
|
|--99.97%-- find_all_inheritors
| ExecSetupPartitionTupleRouting
| ExecInitModifyTable
| ExecInitNode
23.59% 0.08% postgres postgres [.] LockRelationOid
|
---LockRelationOid
|
|--98.67%-- find_inheritance_children
| find_all_inheritors
| ExecSetupPartitionTupleRouting
| ExecInitModifyTable
23.33% 0.04% postgres postgres [.] LockAcquire
|
---LockAcquire
|
|--98.59%-- LockRelationOid
| |
| |--98.82%-- find_inheritance_children
| | find_all_inheritors
| | ExecSetupPartitionTupleRouting
| | ExecInitModifyTable
| | ExecInitNode
11.50% 0.08% postgres postgres [.] RelationGetPartitionQual
|
---RelationGetPartitionQual
|
|--99.75%-- InitResultRelInfo
| |
| |--99.99%-- ExecSetupPartitionTupleRouting
| | ExecInitModifyTable
| | ExecInitNode
| | InitPlan
| | standard_ExecutorStart
| | ExecutorStart
11.42% 0.07% postgres postgres [.] generate_partition_qual
|
---generate_partition_qual
|
|--99.90%-- RelationGetPartitionQual
| InitResultRelInfo
| ExecSetupPartitionTupleRouting
| ExecInitModifyTable
| ExecInitNode
| InitPlan
| standard_ExecutorStart
| ExecutorStart
pg_pathman perf profiling
postgres=# CREATE EXTENSION pg_pathman;
CREATE EXTENSION
postgres=# create table tbl_range(id int not null, info text, crt_time timestamp);
CREATE TABLE
postgres=# select create_range_partitions(`tbl_range`, `id`, 0, 100, 128);
create_range_partitions
-------------------------
128
(1 row)
壓測,pg_pathman 是native partition 的10幾倍效能。
progress: 11.0 s, 262159.5 tps, lat 0.244 ms stddev 0.135
progress: 12.0 s, 260462.8 tps, lat 0.246 ms stddev 0.124
progress: 13.0 s, 259761.5 tps, lat 0.246 ms stddev 0.130
progress: 14.0 s, 271007.1 tps, lat 0.236 ms stddev 0.118
profiling
perf report -g
Samples: 1M of event `cycles`, Event count (approx.): 529675912799
Children Self Command Shared Object Symbol
+ 78.90% 0.00% postgres libc-2.17.so [.] __libc_start_main
+ 78.90% 0.00% postgres postgres [.] startup_hacks
+ 78.90% 0.00% postgres postgres [.] PostmasterMain
+ 78.00% 0.00% postgres postgres [.] ServerLoop
+ 78.00% 0.00% postgres postgres [.] BackendStartup
+ 78.00% 0.00% postgres postgres [.] ExitPostmaster
+ 77.72% 0.24% postgres postgres [.] PostgresMain
+ 48.13% 0.13% postgres postgres [.] exec_execute_message
+ 47.16% 0.07% postgres postgres [.] PortalRun
+ 47.03% 0.11% postgres postgres [.] PortalRunMulti
+ 44.80% 0.07% postgres postgres [.] ProcessQuery
+ 29.89% 0.01% postgres postgres [.] ExecutorRun
+ 29.87% 0.06% postgres postgres [.] standard_ExecutorRun
+ 29.78% 0.04% postgres postgres [.] ExecutePlan
+ 29.51% 0.03% postgres postgres [.] ExecProcNode
+ 29.47% 0.04% postgres postgres [.] ExecProcNodeFirst
+ 29.41% 0.14% postgres postgres [.] ExecModifyTable
+ 20.70% 0.13% postgres postgres [.] ExecInsert
+ 12.84% 0.02% postgres postgres [.] finish_xact_command
+ 12.80% 0.05% postgres postgres [.] CommitTransactionCommand
+ 12.21% 0.08% postgres postgres [.] CommitTransaction
+ 12.19% 0.05% postgres postgres [.] ExecutorStart
+ 12.09% 0.04% postgres postgres [.] standard_ExecutorStart
+ 11.57% 0.11% postgres postgres [.] InitPlan
+ 10.91% 0.10% postgres postgres [.] ExecConstraints
+ 10.75% 0.07% postgres postgres [.] ExecRelCheck
+ 10.19% 0.16% postgres postgres [.] ExecInitNode
+ 10.04% 0.25% postgres postgres [.] ExecInitModifyTable
+ 10.03% 0.01% postgres [kernel.kallsyms] [k] system_call_fastpath
+ 9.07% 0.11% postgres postgres [.] heap_insert
+ 8.82% 0.11% postgres postgres [.] ExecInitCustomScan
+ 8.46% 0.22% postgres postgres [.] exec_bind_message
+ 8.32% 0.03% postgres postgres [.] ExecProcNode
+ 8.27% 0.06% postgres postgres [.] ExecCustomScan
+ 8.14% 0.09% postgres pg_pathman.so [.] partition_filter_exec
+ 7.07% 0.25% swapper [kernel.kallsyms] [k] cpu_startup_entry
+ 6.99% 0.00% swapper [kernel.kallsyms] [k] start_secondary
+ 6.45% 0.04% postgres postgres [.] stringToNode
+ 6.45% 0.04% postgres pg_pathman.so [.] select_partition_for_insert
+ 6.40% 0.16% postgres postgres [.] nodeRead
+ 6.15% 0.21% postgres postgres [.] parseNodeString
+ 6.02% 0.00% pgbench libpthread-2.17.so [.] start_thread
+ 5.98% 0.04% postgres postgres [.] _readBoolExpr
+ 5.92% 0.05% postgres pg_pathman.so [.] partition_filter_begin
+ 5.21% 0.08% postgres postgres [.] _readOpExpr
+ 4.77% 0.05% postgres postgres [.] XLogInsert
+ 4.74% 0.05% postgres [kernel.kallsyms] [k] sys_futex
+ 4.68% 0.06% postgres [kernel.kallsyms] [k] do_futex
+ 4.64% 0.24% postgres postgres [.] LWLockAcquire
+ 4.42% 4.38% postgres postgres [.] pg_strtok
+ 4.34% 0.10% postgres postgres [.] XLogInsertRecord
+ 4.22% 0.01% pgbench [kernel.kallsyms] [k] system_call_fastpath
+ 4.10% 0.08% postgres postgres [.] RecordTransactionCommit
perf report --stdio -g
分析
從現象看,PG 10 native partition應該是find_all_inheritors效率問題,relation鎖時間過長。效能只有pg_pathman的十幾分之一。
native partition 的罪魁禍首:
/*
* find_all_inheritors -
* Returns a list of relation OIDs including the given rel plus
* all relations that inherit from it, directly or indirectly.
* Optionally, it also returns the number of parents found for
* each such relation within the inheritance tree rooted at the
* given rel.
*
* The specified lock type is acquired on all child relations (but not on the
* given rel; caller should already have locked it). If lockmode is NoLock
* then no locks are acquired, but caller must beware of race conditions
* against possible DROPs of child relations.
*/
List *
find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
{
/* hash table for O(1) rel_oid -> rel_numparents cell lookup */
HTAB *seen_rels;
HASHCTL ctl;
List *rels_list,
*rel_numparents;
ListCell *l;
memset(&ctl, 0, sizeof(ctl));
ctl.keysize = sizeof(Oid);
ctl.entrysize = sizeof(SeenRelsEntry);
ctl.hcxt = CurrentMemoryContext;
seen_rels = hash_create("find_all_inheritors temporary table",
32, /* start small and extend */
&ctl,
HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
/*
* We build a list starting with the given rel and adding all direct and
* indirect children. We can use a single list as both the record of
* already-found rels and the agenda of rels yet to be scanned for more
* children. This is a bit tricky but works because the foreach() macro
* doesn`t fetch the next list element until the bottom of the loop.
*/
rels_list = list_make1_oid(parentrelId);
rel_numparents = list_make1_int(0);
foreach(l, rels_list)
{
Oid currentrel = lfirst_oid(l);
List *currentchildren;
ListCell *lc;
/* Get the direct children of this rel */
currentchildren = find_inheritance_children(currentrel, lockmode);
/*
* Add to the queue only those children not already seen. This avoids
* making duplicate entries in case of multiple inheritance paths from
* the same parent. (It`ll also keep us from getting into an infinite
* loop, though theoretically there can`t be any cycles in the
* inheritance graph anyway.)
*/
foreach(lc, currentchildren)
{
Oid child_oid = lfirst_oid(lc);
bool found;
SeenRelsEntry *hash_entry;
hash_entry = hash_search(seen_rels, &child_oid, HASH_ENTER, &found);
if (found)
{
/* if the rel is already there, bump number-of-parents counter */
lfirst_int(hash_entry->numparents_cell)++;
}
else
{
/* if it`s not there, add it. expect 1 parent, initially. */
rels_list = lappend_oid(rels_list, child_oid);
rel_numparents = lappend_int(rel_numparents, 1);
hash_entry->numparents_cell = rel_numparents->tail;
}
}
}
if (numparents)
*numparents = rel_numparents;
else
list_free(rel_numparents);
hash_destroy(seen_rels);
return rels_list;
}
native partition的執行計劃:
postgres=# explain (analyze,verbose,timing,costs,buffers) insert into t_range values (1);
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Insert on public.t_range (cost=0.00..0.01 rows=1 width=44) (actual time=0.060..0.060 rows=0 loops=1)
Buffers: shared read=2 dirtied=1
-> Result (cost=0.00..0.01 rows=1 width=44) (actual time=0.001..0.001 rows=1 loops=1)
Output: 1, NULL::text, NULL::timestamp without time zone
Planning time: 0.036 ms
Execution time: 6.330 ms
(6 rows)
pg_pathman的執行計劃:
postgres=# explain (analyze,verbose,timing,costs,buffers) insert into tbl_range values (1);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Insert on public.tbl_range (cost=0.00..0.01 rows=1 width=44) (actual time=0.214..0.214 rows=0 loops=1)
Buffers: shared hit=19
-> Custom Scan (PartitionFilter) (cost=0.00..0.01 rows=1 width=44) (actual time=0.067..0.068 rows=1 loops=1)
Output: id, info, crt_time
Buffers: shared hit=8
-> Result (cost=0.00..0.01 rows=1 width=44) (actual time=0.001..0.001 rows=1 loops=1)
Output: 1, NULL::text, NULL::timestamp without time zone
Planning time: 5.177 ms
Execution time: 0.287 ms
(9 rows)
參考
《PostgreSQL 10.0 preview 功能增強 – 內建分割槽表》
《[未完待續] PostgreSQL sharding 套件(pg_pathman, postgres_fdw, logical replication)》
《PostgreSQL 9.5+ 高效分割槽表實現 – pg_pathman》
《pg_pathman extension for postgresql partitioning》
相關文章
- PostgreSQL:內建分割槽表SQL
- Linux分割槽方案、分割槽建議Linux
- mongo 建立使用者 建hash 分割槽 建索引Go索引
- SSD固態硬碟要分割槽嗎?不同容量SSD固態硬碟分割槽方案建議硬碟
- oracle分割槽表和分割槽表exchangeOracle
- PostgreSQL/LightDB 分割槽表之分割槽裁剪SQL
- Linux 分割槽擴容(根分割槽擴容,SWAP 分割槽擴容,掛載新分割槽為目錄)Linux
- Oracle分割槽表基礎運維-07增加分割槽(2 HASH分割槽)Oracle運維
- oracle分割槽表和非分割槽表exchangeOracle
- 非分割槽錶轉換成分割槽表
- [oracle] expdp 匯出分割槽表的分割槽Oracle
- Oracle分割槽表基礎運維-07增加分割槽(1範圍分割槽)Oracle運維
- openGauss 分割槽
- mysql 分割槽MySql
- 分割槽Partition
- lvs 分割槽
- Kafka 分割槽Kafka
- Hive的靜態分割槽與動態分割槽Hive
- Linux 新增LVM分割槽及LVM分割槽擴容LinuxLVM
- MySql資料分割槽操作之新增分割槽操作MySql
- Oracle分割槽表基礎運維-07增加分割槽(3列表分割槽)Oracle運維
- Oracle12c:建立主分割槽、子分割槽,實現自動分割槽插入效果Oracle
- Oracle查詢Interval partition分割槽表內資料Oracle
- 【Linux】MBR磁碟分割槽表只能有四個分割槽?Linux
- PG的非分割槽表線上轉分割槽表
- 理解MySQL分割槽MySql
- linux分割槽方案Linux
- 建立sawp分割槽
- 範圍分割槽
- 配置FLASH分割槽
- 搞懂MySQL分割槽MySql
- 【MYSQL】 分割槽表MySql
- MySql建立分割槽MySql
- 磁碟分割槽管理
- linux硬碟分割槽Linux硬碟
- oracle 分割槽表move和包含分割槽表的lob moveOracle
- 移動分割槽表和分割槽索引的表空間索引
- Oracle分割槽表基礎運維-09刪除分割槽Oracle運維
- Oracle分割槽表基礎運維-05組合分割槽Oracle運維