PostgreSQL空間切割(st_split)功能擴充套件-空間物件網格化

德哥發表於2017-10-28

標籤

PostgreSQL , PostGIS , st_split , ST_GeometryN , ST_NumGeometries , ST_XMax , ST_XMin , ST_YMax , ST_YMin , UDF , st_area , ST_MakeBox2D, 面積佔比


背景

前面介紹了空間包含(st_contains, st_within)搜尋降CPU的優化方法,將長條形(相對於BOUND BOX空間佔比很小)的物件切分成多個空間物件,提升相對於bound box的空間佔比,從而減少掃描範圍,提升命中率。

《PostgreSQL 空間st_contains,st_within空間包含搜尋優化 – 降IO和降CPU(bound box)》

這種優化方法的關鍵是SPLIT,PostGIS提供了一種split函式,但是隻能支援一次切兩片,本文提供一種方法,可以根據使用者的需求進行自由切割。(輸入被切割的目標物件,橫向切割多少刀,縱向切割多少刀,面積佔比高於多少時不切割。)

八星八箭有木有:

pic

空間split,目的是降低無效面積,看看這幅無效面積有多大吧,嚇不嚇人?

pic

空間切割示例

切割邏輯

1、輸入被切割的目標物件,橫向切割多少刀,縱向切割多少刀,面積佔比高於多少時不切割。

2、計算目標物件面積。

2、獲取目標物件的bound box邊界。計算BOUND BOX面積。

3、判斷是否需要切割。

求橫向切割線

切割

4、求切割後的geometry coll有幾個物件。

5、轉換為geo陣列輸出。

切割程式碼

create or replace function split_geo(  
  i_geo geometry,   -- 被切割的目標物件  
  i_srid int,      -- SRID  
  i_x int2,         -- X方向切多少刀  
  i_y int2,         -- Y方向切多少刀  
  i_aratio float4   -- 面積佔比閾值,高於它則不切割  
)   
returns geometry[] as $$  
declare  
  res geometry[];         -- 結果  
  tmp_geo geometry;       -- 切割後的臨時物件(geometry collection)  
  split_geos geometry[];  -- 切割線陣列  
  split_line geometry;    -- 線段  
  v_area_obj float8;      -- 目標物件面積  
  v_area_box float8;      -- 目標物件的bound box的面積  
  v_xmin float8 := ST_XMin(i_geo);          -- 目標物件BOUND BOX,XMIN  
  v_ymin float8 := ST_YMin(i_geo);          -- 目標物件BOUND BOX,YMIN  
  v_xmax float8 := ST_XMax(i_geo);          -- 目標物件BOUND BOX,XMAX  
  v_ymax float8 := ST_YMax(i_geo);          -- 目標物件BOUND BOX,YMAX  
  v_box geometry;         -- 目標物件的BOUND BOX  
  x_geo geometry;         -- 分解geometry collection臨時物件  
begin  
  -- 求邊界  
  v_box := st_setsrid(ST_MakeBox2D(st_makepoint(v_xmin,v_ymin), st_makepoint(v_xmax,v_ymax)),i_srid);  
    
  -- 求面積  
  v_area_obj := st_area(i_geo);  
  v_area_box := st_area(v_box);  
  
  -- split 前的空間佔比  
  raise notice `%`, (v_area_obj/v_area_box);  
    
  -- 計算面積佔比,判斷是否需要切割  
  if (v_area_obj/v_area_box) > i_aratio then    
    -- 大於面積比,不切割  
    return array[i_geo];  
  else  
    -- 計算切割線段X位點  
    for i in 1..i_x  
    loop  
      split_geos := coalesce  
        (   
        array_append(split_geos, st_setsrid(st_makeline(st_makepoint(v_xmin+i*((v_xmax-v_xmin)/(i_x+1)), v_ymin), st_makepoint(v_xmin+i*((v_xmax-v_xmin)/(i_x+1)), v_ymax)), i_srid))  
        ,   
        array[st_setsrid(st_makeline(st_makepoint(v_xmin+i*((v_xmax-v_xmin)/(i_x+1)), v_ymin), st_makepoint(v_xmin+i*((v_xmax-v_xmin)/(i_x+1)), v_ymax)), i_srid)]   
        );   
    end loop;  
  
    -- 計算切割線段Y位點  
    for i in 1..i_y  
    loop  
      split_geos := coalesce  
        (   
        array_append(split_geos, st_setsrid(st_makeline(st_makepoint(v_xmin, v_ymin+i*((v_ymax-v_ymin)/(i_y+1))), st_makepoint(v_xmax, v_ymin+i*((v_ymax-v_ymin)/(i_y+1)))), i_srid))  
        ,   
        array[st_setsrid(st_makeline(st_makepoint(v_xmin, v_ymin+i*((v_ymax-v_ymin)/(i_y+1))), st_makepoint(v_xmax, v_ymin+i*((v_ymax-v_ymin)/(i_y+1)))), i_srid)]   
        );   
    end loop;  
  
    -- 切割  
    foreach split_line in array split_geos  
    loop  
      tmp_geo := coalesce  
        (  
	st_split(tmp_geo, split_line)  
	,  
        st_split(i_geo, split_line)  
	);  
    end loop;  
  end if;  
  
  -- 將geometry collection轉換為geometry陣列  
  for i in 1..ST_NumGeometries(tmp_geo)  
  loop  
    res := coalesce(array_append(res, ST_GeometryN(tmp_geo, i)), array[ST_GeometryN(tmp_geo, i)]);  
  end loop;  
  
  -- split 後的空間佔比  
  v_area_obj := 0;  
  v_area_box := 0;  
  
  foreach x_geo in array res  
  loop  
    v_area_obj := v_area_obj + st_area(x_geo);  
    v_area_box := v_area_box + st_area(st_setsrid(ST_MakeBox2D(st_makepoint(st_xmin(x_geo),st_ymin(x_geo)), st_makepoint(st_xmax(x_geo),st_ymax(x_geo))),i_srid));  
  end loop;  
  
  -- split 後的空間佔比  
  raise notice `%`, (v_area_obj/v_area_box);  
  
  return res;  
end;  
$$ language plpgsql strict immutable;  

驗證切割

1、橫向縱向各切2刀,最多得到9個物件。(當刀下去後沒有切到有效部位時不返回,因此可能少於9個物件。)

select st_astext(  
unnest(  
split_geo(  
  st_setsrid(st_makepolygon(ST_GeomFromText(`LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)`)), 4326),  
  4326::int4,  
  2::int2,  
  2::int2,  
  0.9::float4  
)));  
  
  
NOTICE:  0.242857142857143  
NOTICE:  0.818181818181818  
                                                        st_astext                                                          
-------------------------------------------------------------------------------------------------------------------------  
 POLYGON((2.33333333333333 2.5,1 2.5,1 1.66666666666667,0 1.66666666666667,0 3,2.33333333333333 3,2.33333333333333 2.5))  
 POLYGON((1 1.66666666666667,1 0,0 0,0 1.66666666666667,1 1.66666666666667))  
 POLYGON((2.33333333333333 3,4.66666666666667 3,4.66666666666667 2.5,2.33333333333333 2.5,2.33333333333333 3))  
 POLYGON((4.66666666666667 3,5 3,5 3.33333333333333,6 3.33333333333333,6 2.5,4.66666666666667 2.5,4.66666666666667 3))  
 POLYGON((5 3.33333333333333,5 5,7 5,7 4,6 4,6 3.33333333333333,5 3.33333333333333))  
(5 rows)  

切割前後的面積佔比分別是24%和82%,一下子提升了好多,如果使用這幾個拼接,可以少掃描若干個資料塊。

2、橫向縱向各切2刀,最多得到9個物件。(當刀下去後沒有切到有效部位時不返回,因此可能少於9個物件。)

切割前後的面積佔比分別是24%和98%,一下子提升了好多,如果使用這幾個拼接,可以少掃描若干個資料塊。

postgres=# select st_astext(  
unnest(  
split_geo(  
  st_setsrid(st_makepolygon(ST_GeomFromText(`LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)`)), 4326),  
  4326::int4,  
  4::int2,  
  4::int2,  
  0.9::float4  
)));  
  
  
NOTICE:  0.242857142857143  
NOTICE:  0.977011494252874  
                     st_astext                        
----------------------------------------------------  
 POLYGON((1.4 2.5,1 2.5,1 2,0 2,0 3,1.4 3,1.4 2.5))  
 POLYGON((1 2,1 1,0 1,0 2,1 2))  
 POLYGON((1 1,1 0,0 0,0 1,1 1))  
 POLYGON((1.4 3,2.8 3,2.8 2.5,1.4 2.5,1.4 3))  
 POLYGON((2.8 3,4.2 3,4.2 2.5,2.8 2.5,2.8 3))  
 POLYGON((4.2 3,5 3,5.6 3,5.6 2.5,4.2 2.5,4.2 3))  
 POLYGON((5 3,5 4,5.6 4,5.6 3,5 3))  
 POLYGON((5 4,5 5,5.6 5,5.6 4,5 4))  
 POLYGON((5.6 5,7 5,7 4,6 4,5.6 4,5.6 5))  
 POLYGON((6 4,6 3,5.6 3,5.6 4,6 4))  
 POLYGON((6 3,6 2.5,5.6 2.5,5.6 3,6 3))  
(11 rows)  

切割前後優化對比

不切割,掃描1648個資料塊,過濾26590條無效資料。

explain (analyze,verbose,timing,costs,buffers)  
select * from f where st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText(`LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)`)), 4326));  
  
  
 Index Scan using idx_f on public.f  (cost=0.42..15026.72 rows=3333 width=40) (actual time=1.519..35.773 rows=8491 loops=1)  
   Output: id, pos  
   Index Cond: (`0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000`::geometry ~ f.pos)  
   Filter: _st_contains(`0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000`::geometry, f.pos)  
   Rows Removed by Filter: 26590  
   Buffers: shared hit=1648  
 Planning time: 0.274 ms  
 Execution time: 36.212 ms  

切割,掃描610個資料塊,過濾1932條無效資料。

explain (analyze,verbose,timing,costs,buffers)  
select * from f where st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText(`LINESTRING(2.33333333333333 2.5,1 2.5,1 1.66666666666667,0 1.66666666666667,0 3,2.33333333333333 3,2.33333333333333 2.5)`)), 4326))  
union all  
select * from f where st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText(`LINESTRING(1 1.66666666666667,1 0,0 0,0 1.66666666666667,1 1.66666666666667)`)), 4326))  
union all  
select * from f where st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText(`LINESTRING(2.33333333333333 3,4.66666666666667 3,4.66666666666667 2.5,2.33333333333333 2.5,2.33333333333333 3)`)), 4326))  
union all  
select * from f where st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText(`LINESTRING(4.66666666666667 3,5 3,5 3.33333333333333,6 3.33333333333333,6 2.5,4.66666666666667 2.5,4.66666666666667 3)`)), 4326))  
union all  
select * from f where st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText(`LINESTRING(5 3.33333333333333,5 5,7 5,7 4,6 4,6 3.33333333333333,5 3.33333333333333)`)), 4326))  
;  
  
  
 Append  (cost=0.42..75300.24 rows=16665 width=40) (actual time=0.113..11.690 rows=8491 loops=1)  
   Buffers: shared hit=610  
   ->  Index Scan using idx_f on public.f  (cost=0.42..15026.72 rows=3333 width=40) (actual time=0.113..3.365 rows=2053 loops=1)  
         Output: f.id, f.pos  
         Index Cond: (`0103000020E61000000100000007000000A3AAAAAAAAAA02400000000000000440000000000000F03F0000000000000440000000000000F03FBAAAAAAAAAAAFA3F0000000000000000BAAAAAAAAAAAFA3F00000000000000000000000000000840A3AAAAAAAAAA02400000000000000840A3AAAAAAAAAA02400000000000000440`::geometry ~ f.pos)  
         Filter: _st_contains(`0103000020E61000000100000007000000A3AAAAAAAAAA02400000000000000440000000000000F03F0000000000000440000000000000F03FBAAAAAAAAAAAFA3F0000000000000000BAAAAAAAAAAAFA3F00000000000000000000000000000840A3AAAAAAAAAA02400000000000000840A3AAAAAAAAAA02400000000000000440`::geometry, f.pos)  
         Rows Removed by Filter: 1142  
         Buffers: shared hit=189  
   ->  Index Scan using idx_f on public.f f_1  (cost=0.42..15026.72 rows=3333 width=40) (actual time=0.084..1.734 rows=1699 loops=1)  
         Output: f_1.id, f_1.pos  
         Index Cond: (`0103000020E61000000100000005000000000000000000F03FBAAAAAAAAAAAFA3F000000000000F03F0000000000000000000000000000000000000000000000000000000000000000BAAAAAAAAAAAFA3F000000000000F03FBAAAAAAAAAAAFA3F`::geometry ~ f_1.pos)  
         Filter: _st_contains(`0103000020E61000000100000005000000000000000000F03FBAAAAAAAAAAAFA3F000000000000F03F0000000000000000000000000000000000000000000000000000000000000000BAAAAAAAAAAAFA3F000000000000F03FBAAAAAAAAAAAFA3F`::geometry, f_1.pos)  
         Buffers: shared hit=92  
   ->  Index Scan using idx_f on public.f f_2  (cost=0.42..15026.72 rows=3333 width=40) (actual time=0.075..1.283 rows=1158 loops=1)  
         Output: f_2.id, f_2.pos  
         Index Cond: (`0103000020E61000000100000005000000A3AAAAAAAAAA02400000000000000840AEAAAAAAAAAA12400000000000000840AEAAAAAAAAAA12400000000000000440A3AAAAAAAAAA02400000000000000440A3AAAAAAAAAA02400000000000000840`::geometry ~ f_2.pos)  
         Filter: _st_contains(`0103000020E61000000100000005000000A3AAAAAAAAAA02400000000000000840AEAAAAAAAAAA12400000000000000840AEAAAAAAAAAA12400000000000000440A3AAAAAAAAAA02400000000000000440A3AAAAAAAAAA02400000000000000840`::geometry, f_2.pos)  
         Buffers: shared hit=74  
   ->  Index Scan using idx_f on public.f f_3  (cost=0.42..15026.72 rows=3333 width=40) (actual time=0.095..1.214 rows=986 loops=1)  
         Output: f_3.id, f_3.pos  
         Index Cond: (`0103000020E61000000100000007000000AEAAAAAAAAAA12400000000000000840000000000000144000000000000008400000000000001440A3AAAAAAAAAA0A400000000000001840A3AAAAAAAAAA0A4000000000000018400000000000000440AEAAAAAAAAAA12400000000000000440AEAAAAAAAAAA12400000000000000840`::geometry ~ f_3.pos)  
         Filter: _st_contains(`0103000020E61000000100000007000000AEAAAAAAAAAA12400000000000000840000000000000144000000000000008400000000000001440A3AAAAAAAAAA0A400000000000001840A3AAAAAAAAAA0A4000000000000018400000000000000440AEAAAAAAAAAA12400000000000000440AEAAAAAAAAAA12400000000000000840`::geometry, f_3.pos)  
         Rows Removed by Filter: 127  
         Buffers: shared hit=68  
   ->  Index Scan using idx_f on public.f f_4  (cost=0.42..15026.72 rows=3333 width=40) (actual time=0.105..3.331 rows=2595 loops=1)  
         Output: f_4.id, f_4.pos  
         Index Cond: (`0103000020E610000001000000070000000000000000001440A3AAAAAAAAAA0A40000000000000144000000000000014400000000000001C4000000000000014400000000000001C400000000000001040000000000000184000000000000010400000000000001840A3AAAAAAAAAA0A400000000000001440A3AAAAAAAAAA0A40`::geometry ~ f_4.pos)  
         Filter: _st_contains(`0103000020E610000001000000070000000000000000001440A3AAAAAAAAAA0A40000000000000144000000000000014400000000000001C4000000000000014400000000000001C400000000000001040000000000000184000000000000010400000000000001840A3AAAAAAAAAA0A400000000000001440A3AAAAAAAAAA0A40`::geometry, f_4.pos)  
         Rows Removed by Filter: 663  
         Buffers: shared hit=187  
 Planning time: 0.397 ms  
 Execution time: 12.150 ms  
(32 rows)  

PostgreSQL支援 op anysomeall(array)的操作,因此我們可以將SQL簡化成這樣,9宮格切割,掃描的資料塊降低到278,過濾記錄降到了1932:

explain (analyze,verbose,timing,costs,buffers) select * from f where pos @ any(split_geo(  
  st_setsrid(st_makepolygon(ST_GeomFromText(`LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)`)), 4326),  
  4326::int4,  
  2::int2,  
  2::int2,  
  0.9::float4  
))
and
st_within(pos, st_setsrid(st_makepolygon(ST_GeomFromText(`LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)`)), 4326));

NOTICE:  0.242857142857143
NOTICE:  0.818181818181818


 Bitmap Heap Scan on public.f  (cost=9.09..87.16 rows=17 width=40) (actual time=2.270..10.578 rows=8491 loops=1)
   Output: id, pos
   Recheck Cond: ((f.pos @ ANY (`{0103000020E61000000100000007000000ABAAAAAAAAAA02400000000000000440000000000000F03F0000000000000440000000000000F03FABAAAAAAAAAAFA3F0000000000000000ABAAAAAAAAAAFA3F00000000000000000000000000000840ABAAAAAAAAAA02400000000000000840ABAAAAAAAAAA02400000000000000440:0103000020E61000000100000005000000000000000000F03FABAAAAAAAAAAFA3F000000000000F03F0000000000000000000000000000000000000000000000000000000000000000ABAAAAAAAAAAFA3F000000000000F03FABAAAAAAAAAAFA3F:0103000020E61000000100000005000000ABAAAAAAAAAA02400000000000000840ABAAAAAAAAAA12400000000000000840ABAAAAAAAAAA12400000000000000440ABAAAAAAAAAA02400000000000000440ABAAAAAAAAAA02400000000000000840:0103000020E61000000100000007000000ABAAAAAAAAAA12400000000000000840000000000000144000000000000008400000000000001440ABAAAAAAAAAA0A400000000000001840ABAAAAAAAAAA0A4000000000000018400000000000000440ABAAAAAAAAAA12400000000000000440ABAAAAAAAAAA12400000000000000840:0103000020E610000001000000070000000000000000001440ABAAAAAAAAAA0A40000000000000144000000000000014400000000000001C4000000000000014400000000000001C400000000000001040000000000000184000000000000010400000000000001840ABAAAAAAAAAA0A400000000000001440ABAAAAAAAAAA0A40}`::geometry[])) AND (`0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000`::geometry ~ f.pos))
   Filter: _st_contains(`0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000`::geometry, f.pos)
   Rows Removed by Filter: 1932
   Heap Blocks: exact=133
   Buffers: shared hit=278
   ->  Bitmap Index Scan on idx_f  (cost=0.00..9.09 rows=50 width=0) (actual time=2.244..2.244 rows=10423 loops=1)
         Index Cond: ((f.pos @ ANY (`{0103000020E61000000100000007000000ABAAAAAAAAAA02400000000000000440000000000000F03F0000000000000440000000000000F03FABAAAAAAAAAAFA3F0000000000000000ABAAAAAAAAAAFA3F00000000000000000000000000000840ABAAAAAAAAAA02400000000000000840ABAAAAAAAAAA02400000000000000440:0103000020E61000000100000005000000000000000000F03FABAAAAAAAAAAFA3F000000000000F03F0000000000000000000000000000000000000000000000000000000000000000ABAAAAAAAAAAFA3F000000000000F03FABAAAAAAAAAAFA3F:0103000020E61000000100000005000000ABAAAAAAAAAA02400000000000000840ABAAAAAAAAAA12400000000000000840ABAAAAAAAAAA12400000000000000440ABAAAAAAAAAA02400000000000000440ABAAAAAAAAAA02400000000000000840:0103000020E61000000100000007000000ABAAAAAAAAAA12400000000000000840000000000000144000000000000008400000000000001440ABAAAAAAAAAA0A400000000000001840ABAAAAAAAAAA0A4000000000000018400000000000000440ABAAAAAAAAAA12400000000000000440ABAAAAAAAAAA12400000000000000840:0103000020E610000001000000070000000000000000001440ABAAAAAAAAAA0A40000000000000144000000000000014400000000000001C4000000000000014400000000000001C400000000000001040000000000000184000000000000010400000000000001840ABAAAAAAAAAA0A400000000000001440ABAAAAAAAAAA0A40}`::geometry[])) AND (`0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000`::geometry ~ f.pos))
         Buffers: shared hit=145
 Planning time: 2.828 ms
 Execution time: 11.024 ms
(12 rows)

小結

1、split後,有效面積佔比提升,從而降低了無效資料掃描帶來的IO和CPU開銷,提升了效能。

2、split過多,不同分片的GiST索引的非葉子節點可能被重複掃描,不過基本上都能在CACHE命中的話,可以不CARE這部分開銷。

參考

http://postgis.net/docs/manual-2.4/ST_Split.html

http://postgis.net/docs/manual-2.4/ST_XMin.html

http://postgis.net/docs/manual-2.4/ST_XMax.html

http://postgis.net/docs/manual-2.4/ST_YMin.html

http://postgis.net/docs/manual-2.4/ST_XMax.html

http://postgis.net/docs/manual-2.4/ST_GeometryN.html

http://postgis.net/docs/manual-2.4/ST_NumGeometries.html

http://postgis.net/docs/manual-2.4/ST_Area.html

http://postgis.net/docs/manual-2.4/ST_MakeBox2D.html

http://postgis.net/docs/manual-2.4/ST_SetSRID.html

《PostgreSQL 空間st_contains,st_within空間包含搜尋優化 – 降IO和降CPU(bound box)》

本文提到的切割方法還是比較粗糙的,更好的切割演算法我們可以繼續探討。


相關文章