PostgreSQL PostGIS point join polygon (by ST_xxxx): pglz_decompress performance optimization

Posted by pg小助手 on 2018-10-23

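Background
Spatial data typically includes trajectories, points, and polygons. Suppose there are two tables, one of polygons and one of points, joined with a predicate such as ST_xxxx(c.geom, p.geom) (for example, to count the points that fall in each polygon).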


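This article walks through the performance analysis of a spatial join, the identification of its bottleneck, and the optimization.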

Original article
http://blog.cleverelephant.ca/2018/09/postgis-external-storage.html

Example
Test data:

Setup

First download some polygons and some points.

Admin 0 – Countries

Populated Places

Load the shapes into your database.

shp2pgsql -s 4326 -D -I ne_10m_admin_0_countries.shp countries | psql performance

shp2pgsql -s 4326 -D -I ne_10m_populated_places.shp places | psql performance
Count the spatial objects that contain a large number of points:

SELECT count(*)
FROM countries
WHERE ST_NPoints(geom) > (8192 / 16);
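The threshold 8192 / 16 evaluates to 512 points. One plausible reading, which is an assumption and is not stated in the source, is the default 8 kB PostgreSQL page size divided by 16 bytes per 2D point (two 8-byte doubles). As a rough sketch, the query below lists the largest geometries with their in-memory size next to their stored size. A stored size far below the in-memory size suggests the value is compressed; small differences can come from header bytes alone.

```sql
-- Sketch only: compare the in-memory size of each geometry with its stored size.
-- ST_MemSize is the PostGIS function (named ST_Mem_Size before PostGIS 2.2).
SELECT name,
       ST_NPoints(geom)     AS npoints,
       ST_MemSize(geom)     AS mem_bytes,     -- size of the detoasted geometry
       pg_column_size(geom) AS stored_bytes   -- size as stored, possibly compressed
FROM countries
ORDER BY npoints DESC
LIMIT 10;
```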
1. With the default compressed storage format, the following spatial join query takes 25 seconds.

SELECT count(*), c.name
FROM countries c
JOIN places p
ON ST_Intersects(c.geom, p.geom)
GROUP BY c.name;
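One way to measure the run time and see the join plan is to wrap the same query in EXPLAIN. This is a sketch, not part of the original test procedure, and timings depend on hardware and PostgreSQL/PostGIS versions.

```sql
-- Sketch: run the join and report actual timing and buffer usage.
-- Note that EXPLAIN ANALYZE really executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*), c.name
FROM countries c
JOIN places p
ON ST_Intersects(c.geom, p.geom)
GROUP BY c.name;
```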
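Use perf or OProfile to trace which code the time is spent in.

《PostgreSQL Code Performance Diagnosis: OProfile & Systemtap》

《A Guide to PostgreSQL Source Code Performance Diagnosis (perf profiling): Collector's Edition》

The profile shows that the problem is caused by the decompression function pglz_decompress.

《TOAST, The Oversized-Attribute Storage Technique, with an Introduction to the main, extended, external, and plain Storage Formats》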

2. After changing the geometry column to an uncompressed storage format, the query time drops to 4 seconds.

-- Change the storage type
ALTER TABLE countries
ALTER COLUMN geom
SET STORAGE EXTERNAL;

-- Force the column to rewrite
UPDATE countries
SET geom = ST_SetSRID(geom, 4326);

vacuum full countries;

-- Re-run the query
SELECT count(*), c.name
FROM countries c
JOIN places p
ON ST_Intersects(c.geom, p.geom)
GROUP BY c.name;
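To confirm that the storage change took effect, the column's strategy can be read from the system catalog. This is a minimal sketch, assuming the table and column names used above. Note that SET STORAGE only affects values written afterwards, which is why the UPDATE is needed to rewrite the existing rows.

```sql
-- Sketch: inspect the storage strategy of countries.geom.
-- attstorage codes: 'p' = PLAIN, 'e' = EXTERNAL, 'm' = MAIN, 'x' = EXTENDED
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'countries'::regclass
  AND attname = 'geom';
```

After the ALTER TABLE statement above, attstorage should read 'e'.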
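Summary
1. perf is the tool used here for code-level performance bottleneck analysis.

《A Guide to PostgreSQL Source Code Performance Diagnosis (perf profiling): Collector's Edition》

2. PostGIS spatial computation functions: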

http://postgis.net/docs/manual-dev/reference.html

3. PostgreSQL has four storage formats, which are set per column:

For fixed-length column types, the storage format is:

PLAIN
prevents either compression or out-of-line storage; furthermore it disables use of single-byte headers for varlena types. This is the only possible strategy for columns of non-TOAST-able data types.
For variable-length column types, the following storage formats are available in addition to PLAIN:

EXTENDED
allows both compression and out-of-line storage.
This is the default for most TOAST-able data types.
Compression will be attempted first, then out-of-line storage if the row is still too big.

EXTERNAL
allows out-of-line storage but not compression.
Use of EXTERNAL will make substring operations on wide text and bytea columns faster (at the penalty of increased storage space) because these operations are optimized to fetch only the required parts of the out-of-line value when it is not compressed.

MAIN
allows compression but not out-of-line storage.
(Actually, out-of-line storage will still be performed for such columns, but only as a last resort when there is no other way to make the row small enough to fit on a page.)
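4. The bottleneck found in this article is pglz_decompress, which decompresses the compressed variable-length geometry column. Changing the column's storage format to an uncompressed one therefore gives a large performance gain (from 25 seconds to 4 seconds in this test).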
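As the EXTERNAL description above notes, skipping compression costs storage space. A rough way to see that cost, offered as a sketch and not taken from the original article, is to record the table's size before and after the change:

```sql
-- Sketch: report on-disk size so the space cost of EXTERNAL can be compared.
-- pg_table_size includes TOAST data but excludes indexes;
-- pg_total_relation_size includes TOAST data and indexes.
SELECT pg_size_pretty(pg_table_size('countries'))          AS table_with_toast,
       pg_size_pretty(pg_total_relation_size('countries')) AS table_toast_and_indexes;
```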

References
http://blog.cleverelephant.ca/2018/09/postgis-external-storage.html

http://postgis.net/docs/manual-dev/reference.html

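《TOAST, The Oversized-Attribute Storage Technique, with an Introduction to the main, extended, external, and plain Storage Formats》

《A Guide to PostgreSQL Source Code Performance Diagnosis (perf profiling): Collector's Edition》

《PostgreSQL Code Performance Diagnosis: OProfile & Systemtap》
Reposted from 德哥 at Alibaba Cloud (阿里雲)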

