PostgreSQLcube外掛-多維空間物件

德哥發表於2017-09-11

背景

CUBE是一個多維資料型別，支援兩種多維型別：多維POINT、區間（左下+右上）。以及這些幾何物件的幾何特性搜尋和計算（方位搜尋、距離計算），這些搜尋都支援GiST索引。

我們甚至可以將多個欄位合併成多維POINT，實現對大量資料的高效空間聚集、空間計算。

語法

External	Syntax Meaning
x	點，A one-dimensional point (or, zero-length one-dimensional interval)
(x)	Same as above
x1,x2,…,xn	點，A point in n-dimensional space, represented internally as a zero-volume cube
(x1,x2,…,xn)	Same as above
(x),(y)	一維區間（線段）（兩個括號隔開，分別表示每個維度的最小值(左括號)和最大值(右括號)），A one-dimensional interval starting at x and ending at y or vice versa; the order does not matter
[(x),(y)]	Same as above
(x1,…,xn),(y1,…,yn)	多維區間（平面方形、立方體、多維CUBE）（兩個括號隔開，分別表示每個維度的最小值(左括號)和最大值(右括號)）An n-dimensional cube represented by a pair of its diagonally opposite corners
[(x1,…,xn),(y1,…,yn)]	Same as above

操作符

Operator	Result	Description
a = b	boolean	The cubes a and b are identical.
a && b	boolean	The cubes a and b overlap.
a @> b	boolean	The cube a contains the cube b.
a <@ b	boolean	The cube a is contained in the cube b.
a < b	boolean	The cube a is less than the cube b.
a <= b	boolean	The cube a is less than or equal to the cube b.
a > b	boolean	The cube a is greater than the cube b.
a >= b	boolean	The cube a is greater than or equal to the cube b.
a <> b	boolean	The cube a is not equal to the cube b.
a -> n	float8	Get n-th coordinate(座標) of cube (counting from 1).
a ~> n	float8	Get n-th coordinate in “normalized” cube representation, in which the coordinates have been rearranged into the form “lower left — upper right”; that is, the smaller endpoint along each dimension appears first.
a <-> b	float8	歐幾里得距離。Euclidean distance between a and b.
a <#> b	float8	分別每個座標的距離，求和。Taxicab (L-1 metric) distance between a and b.
a <=> b	float8	分別每個座標的距離，取最大值。Chebyshev (L-inf metric) distance between a and b.

函式

Function	Result	Description	Example
cube(float8)	cube	Makes a one dimensional cube with both coordinates the same.	cube(1) == `(1)`
cube(float8, float8)	cube	Makes a one dimensional cube.	cube(1,2) == `(1),(2)`
cube(float8[])	cube	Makes a zero-volume cube using the coordinates defined by the array.	cube(ARRAY[1,2]) == `(1,2)`
cube(float8[], float8[])	cube	Makes a cube with upper right and lower left coordinates as defined by the two arrays, which must be of the same length.	cube(ARRAY[1,2], ARRAY[3,4]) == `(1,2),(3,4)`
cube(cube, float8)	cube	Makes a new cube by adding a dimension on to an existing cube, with the same values for both endpoints of the new coordinate. This is useful for building cubes piece by piece from calculated values.	cube(`(1,2),(3,4)`::cube, 5) == `(1,2,5),(3,4,5)`
cube(cube, float8, float8)	cube	Makes a new cube by adding a dimension on to an existing cube. This is useful for building cubes piece by piece from calculated values.	cube(`(1,2),(3,4)`::cube, 5, 6) == `(1,2,5),(3,4,6)`
cube_dim(cube)	integer	Returns the number of dimensions of the cube.	cube_dim(`(1,2),(3,4)`) == `2`
cube_ll_coord(cube, integer)	float8	Returns the n-th coordinate value for the lower left corner of the cube.	cube_ll_coord(`(1,2),(3,4)`, 2) == `2`
cube_ur_coord(cube, integer)	float8	Returns the n-th coordinate value for the upper right corner of the cube.	cube_ur_coord(`(1,2),(3,4)`, 2) == `4`
cube_is_point(cube)	boolean	Returns true if the cube is a point, that is, the two defining corners are the same.	–
cube_distance(cube, cube)	float8	Returns the distance between two cubes. If both cubes are points, this is the normal distance function.	–
cube_subset(cube, integer[])	cube	Makes a new cube from an existing cube, using a list of dimension indexes from an array. Can be used to extract the endpoints of a single dimension, or to drop dimensions, or to reorder them as desired.	cube_subset(cube(`(1,3,5),(6,7,8)`), ARRAY[2]) == `(3),(7)` cube_subset(cube(`(1,3,5),(6,7,8)`), ARRAY[3,2,1,1]) == `(5,3,1,1),(8,7,6,6)`
cube_union(cube, cube)	cube	Produces the union of two cubes.	–
cube_inter(cube, cube)	cube	Produces the intersection of two cubes.	–
cube_enlarge(c cube, r double, n integer)	cube	Increases the size of the cube by the specified radius r in at least n dimensions. If the radius is negative the cube is shrunk instead. All defined dimensions are changed by the radius r. Lower-left coordinates are decreased by r and upper-right coordinates are increased by r. If a lower-left coordinate is increased to more than the corresponding upper-right coordinate (this can only happen when r < 0) than both coordinates are set to their average. If n is greater than the number of defined dimensions and the cube is being enlarged (r > 0), then extra dimensions are added to make n altogether; 0 is used as the initial value for the extra coordinates. This function is useful for creating bounding boxes around a point for searching for nearby points.	cube_enlarge(`(1,2),(3,4)`, 0.5, 3) == `(0.5,1.5,-0.5),(3.5,4.5,0.5)`

例子

1、向量聚合（類似多維聚集）

https://github.com/umitanuki/kmeans-postgresql

2、4維（包含）的聚集分析

http://postgis.net/docs/manual-2.3/ST_ClusterKMeans.html

3、求多維點的距離

歐幾里得距離。  
  
postgres=# select `(1,2,3,4)`::cube <-> `(2,2,3,10)`::cube ;  
     ?column?       
------------------  
 6.08276253029822  
(1 row)  
  
分別每個座標的距離，取最大值。  
  
postgres=# select `(1,2,3,4)`::cube <=> `(2,2,3,10)`::cube ;  
 ?column?   
----------  
        6  
(1 row)  
  
分別每個座標的距離，求和。  
  
postgres=# select `(1,2,3,4)`::cube <#> `(2,2,3,10)`::cube ;  
 ?column?   
----------  
        7  
(1 row)

4、按距離排序，輸出附近的多維點。

SELECT c FROM test ORDER BY c <-> cube(array[0.5,0.5,0.5]) LIMIT 1;

5、假設某個表有多個數值型別欄位，基於這幾個欄位構建CUBE，建立gist表示式索引。將來可以根據這個按距離高速檢索附近的多維點（記錄）。

postgres=# create index idx on tbl_tmp using gist (cube(array[c1,c3,c4,c5]));

6、針對以上索引，我們可以對資料進行聚集儲存，實現高效過濾。

create table tbl(c1 int, c2 int, c3 numeric, c4 float4, c5 int, c6 int);

insert into tbl select random()*1000, random()*1000000, random()*100000000, random()*100000, random()*1000000, random()*100 from generate_series(1,10000000);

create index idx_tbl_1 on tbl using gist(cube(array[c1::float8,c2::float8,c3::float8,c4::float8,c5::float8,c6::float8]));

create index idx_tbl_2 on tbl using brin(c1,c2,c3,c4,c5,c6);

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c5 between 1 and 10  and c4 between 1 and 5;
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.tbl  (cost=8.51..218370.51 rows=1 width=31) (actual time=596.689..596.689 rows=0 loops=1)
   Output: c1, c2, c3, c4, c5, c6
   Recheck Cond: ((tbl.c4 >= `1`::double precision) AND (tbl.c4 <= `5`::double precision) AND (tbl.c5 >= 1) AND (tbl.c5 <= 10))
   Rows Removed by Index Recheck: 4980743
   Heap Blocks: lossy=9146
   Buffers: shared hit=9152
   ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..8.51 rows=10000000 width=0) (actual time=0.229..0.229 rows=92160 loops=1)
         Index Cond: ((tbl.c4 >= `1`::double precision) AND (tbl.c4 <= `5`::double precision) AND (tbl.c5 >= 1) AND (tbl.c5 <= 10))
         Buffers: shared hit=6
 Planning time: 0.126 ms
 Execution time: 596.727 ms
(11 rows)
  
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c5 between 1 and 10  and c6 between 1 and 5;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.tbl  (cost=6.25..120154.09 rows=1 width=31) (actual time=106.353..609.540 rows=2 loops=1)
   Output: c1, c2, c3, c4, c5, c6
   Recheck Cond: ((tbl.c5 >= 1) AND (tbl.c5 <= 10) AND (tbl.c6 >= 1) AND (tbl.c6 <= 5))
   Rows Removed by Index Recheck: 5399033
   Heap Blocks: lossy=9914
   Buffers: shared hit=9916
   ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..6.25 rows=5089292 width=0) (actual time=0.207..0.207 rows=99840 loops=1)
         Index Cond: ((tbl.c5 >= 1) AND (tbl.c5 <= 10) AND (tbl.c6 >= 1) AND (tbl.c6 <= 5))
         Buffers: shared hit=2
 Planning time: 0.113 ms
 Execution time: 609.588 ms
(11 rows)

設定聚集儲存

postgres=# cluster tbl USING idx_tbl_1;

聚集後的效果

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c5 between 1 and 10  and c4 between 1 and 5;
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.tbl  (cost=8.51..218375.51 rows=1 width=31) (actual time=219.648..219.648 rows=0 loops=1)
   Output: c1, c2, c3, c4, c5, c6
   Recheck Cond: ((tbl.c4 >= `1`::double precision) AND (tbl.c4 <= `5`::double precision) AND (tbl.c5 >= 1) AND (tbl.c5 <= 10))
   Rows Removed by Index Recheck: 1881220
   Heap Blocks: lossy=3456
   Buffers: shared hit=3458
   ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..8.51 rows=10000000 width=0) (actual time=0.133..0.133 rows=34560 loops=1)
         Index Cond: ((tbl.c4 >= `1`::double precision) AND (tbl.c4 <= `5`::double precision) AND (tbl.c5 >= 1) AND (tbl.c5 <= 10))
         Buffers: shared hit=2
 Planning time: 0.134 ms
 Execution time: 219.685 ms
(11 rows)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c5 between 1 and 10  and c6 between 1 and 5;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.tbl  (cost=6.25..120159.09 rows=1 width=31) (actual time=43.253..315.421 rows=2 loops=1)
   Output: c1, c2, c3, c4, c5, c6
   Recheck Cond: ((tbl.c5 >= 1) AND (tbl.c5 <= 10) AND (tbl.c6 >= 1) AND (tbl.c6 <= 5))
   Rows Removed by Index Recheck: 2857135
   Heap Blocks: lossy=5248
   Buffers: shared hit=5250
   ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..6.25 rows=5089292 width=0) (actual time=0.147..0.147 rows=52480 loops=1)
         Index Cond: ((tbl.c5 >= 1) AND (tbl.c5 <= 10) AND (tbl.c6 >= 1) AND (tbl.c6 <= 5))
         Buffers: shared hit=2
 Planning time: 0.111 ms
 Execution time: 315.462 ms
(11 rows)

參考

https://www.postgresql.org/docs/10/static/cube.html

http://postgis.net/docs/manual-2.3/ST_ClusterKMeans.html

https://github.com/umitanuki/kmeans-postgresql

IMPDP 多個表空間物件匯入到一個表空間中
2011-03-01
物件
Inception Effect for Mac(盜夢空間鏡面效果fcpx外掛)
2021-12-30
Mac
平行宇宙構建-遊戲中的多維空間探索
2021-10-14
遊戲
[外掛擴充套件]微信二維碼展示外掛
2019-05-11
套件
時間線外掛
2014-04-04
[外掛擴充套件]跪求時間軸外掛
2019-05-11
套件
精讀《外掛化思維》
2019-03-04
前端外掛一：jQuery Multi-Select多選外掛
2016-11-24
前端jQuery
oracle表空間管理維護
2012-03-22
Oracle
[外掛擴充套件]工作時間軸（新外掛後臺）
2019-05-11
套件
時間選擇外掛ClockPicker
2017-12-26
KPI
parted掛載硬碟剩餘空間
2016-06-22
硬碟
LeetCode 如何不使用額外空間去更新二維陣列
2018-03-24
LeetCode陣列
RxJava 沉思錄（二）：空間維度
2018-09-05
RxJava
Oracle表空間維護總結
2017-03-31
Oracle
oracle之 SYSAUX表空間維護
2018-01-12
OracleUX
Numpy 高維空間中的軸
2024-03-01
Oracle檢視物件佔用空間
2015-07-22
Oracle物件
11.管理物件空間(筆記)
2006-04-15
物件筆記
php多維陣列去除空元素
2016-11-03
PHP陣列
龍主機-國外PHP空間
2019-05-11
PHP
jQuery的事件機制，事件物件介紹，外掛機制，多庫共存，each（）
2021-09-09
jQuery事件物件
[外掛擴充套件]jQuery二維碼外掛0.2【更新瀏覽器相容】
2019-05-11
套件jQuery瀏覽器
Jquery多選資料字典外掛
2012-04-10
jQuery
如何用物件導向的思維去封裝一個小型輪播圖外掛
2018-08-08
物件封裝
PHP 物件導向（三）名稱空間
2020-01-18
PHP物件
【HotSpot】一個java物件佔多少空間
2021-09-15
HotSpotJava物件
資料庫物件遷移表空間
2010-07-27
資料庫物件
空間資料庫三維空間兩點距離計算錯誤
2009-02-15
資料庫
PostgreSQL空間切割(st_split)功能擴充套件-空間物件網格化
2017-10-28
SQL套件物件
遷移使用者物件從一個表空間到另外表空間
2010-12-01
物件
MyBatis 的外掛物件如何建立出來的
2019-04-19
MyBatis物件
[需求建議]外掛建議：建議出一個時間軸外掛
2019-05-11
[外掛擴充套件]移動端多圖片上傳外掛uploadimages
2020-04-04
套件
[外掛擴充套件]解決onethink多圖上傳的外掛分享！
2019-05-11
套件
[外掛擴充套件]onethink 欄位外掛多圖上傳 UploadImages
2019-05-11
套件
Linux 磁碟掛載和swap空間管理
2022-07-02
Linux
Jquery多個外掛演示和原始碼
2009-08-03
jQuery原始碼

PostgreSQLcube外掛-多維空間物件

標籤

背景

語法

操作符

函式

例子

參考

相關文章