PostgreSQLjson索引實踐-檢索(存在、包含、等值、範圍等)加速

德哥發表於2018-10-05

標籤

PostgreSQL , json , gin , rum


背景

使用者在使用JSON型別時,常見的一些JSON搜尋包括:

1、存在,JSON中是否存在某個KEY,某些KEY,某些KEY的任意一個

存在某個KEY(TOP LEVEL)

`{"a":1, "b":2}`::jsonb ? `b`  

存在所有KEY

`{"a":1, "b":2, "c":3}`::jsonb ?& array[`b`, `c`]  

存在任意KEY、元素

`["a", "b"]`::jsonb ?| array[`a`, `b`]  

2、等值,JSON中是否存在指定的key:value對(支援巢狀JSON)

`{"a":1, "b":2}`::jsonb @> `{"b":2}`::jsonb  

3、包含,JSON中某個路徑下的VALUE(陣列)中,是否包含指定的所有元素。

postgres=# select jsonb `{"a":1, "b": {"c":[1,2,3], "d":["k","y","z"]}, "d":"kbc"}` @> `{"b":{"c":[2,3]}}`;  
 ?column?   
----------  
 t  
(1 row)  

4、相交,JSON中某個路徑下的VALUE(陣列)中,是否包含指定的任意元素。

postgres=# select jsonb `{"a":1, "b": {"c":[1,2,3], "d":["k","y","z"]}, "d":"kbc"}` @> `{"b":{"c":[2]}}`   
or  
jsonb `{"a":1, "b": {"c":[1,2,3], "d":["k","y","z"]}, "d":"kbc"}` @> `{"b":{"c":[3]}}`  
;  
  
 ?column?   
----------  
 t  
(1 row)  

或(注意1,2,3需要雙引號,作為text型別儲存,因為操作符?| ?&暫時只支援了text[],如果是numeric匹配不上)

postgres=# select jsonb `{"a":1, "b": {"c":["1","2","3"], "d":["k","y","z"]}, "d":"kbc"}` -> `b` -> `c` ?& array[`2`,`3`,`4`] ;  
 ?column?   
----------  
 f  
(1 row)  
  
postgres=# select jsonb `{"a":1, "b": {"c":["1","2","3"], "d":["k","y","z"]}, "d":"kbc"}` -> `b` -> `c` ?| array[`2`,`3`,`4`] ;  
 ?column?   
----------  
 t  
(1 row)  

5、範圍查詢,JSON中某個路徑下的VALUE,是否落在某個範圍內。

(js ->> `key1` )::numeric between xx and xx  
  
(js ->> `key2` )::numeric between xx and xx  

這些操作如何加速,或者如何使用索引加速?

一、json 索引支援

GIN的兩個OPS,分別支援JSON:

The default GIN operator class for jsonb supports queries with top-level key-exists operators ?, ?& and ?| operators and path/value-exists operator @>.  
  
The non-default GIN operator class jsonb_path_ops supports indexing the @> operator only.  

1、支援 @> 操作符的索引如下(jsonb_path_ops只支援@>操作符,但是效率高)

postgres=# create table tbl(id int, js jsonb);  
CREATE TABLE  
postgres=# create index idx_tbl_1 on tbl using gin (js jsonb_path_ops);  
CREATE INDEX  

2、支援除範圍查詢以外的所有查詢的索引如下

postgres=# create table tbl(id int, js jsonb);  
CREATE TABLE  
postgres=# create index idx_tbl_1 on tbl using gin (js);  -- 使用預設ops即可  
CREATE INDEX  

二、JSON KEY VALUE值範圍查詢加速

某些使用,需要對VALUE使用範圍查詢,比如時間(如果要建索引,請使用numeric表示,否則需要自定義immutable函式),數值都有這些需求。

通常的做法,把範圍查詢的型別提取出來,建立btree表示式索引,如果有任意組合的範圍查詢,使用gin或rum表示式索引。

例子

create index idx1 on tbl ( ((js->>`k1`)::float8) );  
create index idx2 on tbl ( ((js->>`k2`)::numeric) );  
...  
create index idxn on tbl ( ((js->>`kn`)::float8) );  

create extension btree_gin;  
create index idx1 on tbl using gin( ((js->>`k1`)::float8), ((js->>`k2`)::numeric), ... ((js->>`kn`)::float8) );  

create extension rum;  
create index idx1 on tbl using rum( ((js->>`k1`)::float8), ((js->>`k2`)::numeric), ... ((js->>`kn`)::float8) );  

create or replace function to_timestamp(text) returns timestamp as $$  
  select $1::timestamp;  
$$ language sql strict immutable;  
  
  
create index idx1 on tbl using gin( ((js->>`k1`)::float8), to_timestamp(js->>`k2`), ... ((js->>`kn`)::float8) );  
或  
create index idx1 on tbl using rum( ((js->>`k1`)::float8), to_timestamp(js->>`k2`), ... ((js->>`kn`)::float8) );  

三、索引使用例子

create table tbl(id int, js jsonb);  
create index idx_tbl_1 on tbl using gin (js jsonb_path_ops);  
create index idx_tbl_2 on tbl using gin (js);  
create index idx_tbl_3 on tbl using rum( ((js->>`k1`)::float8), to_timestamp(js->>`k2`), ((js->>`k3`)::numeric) );  
  
postgres=# explain select * from tbl where js ? `a`;  
                               QUERY PLAN                                 
------------------------------------------------------------------------  
 Bitmap Heap Scan on tbl  (cost=2.21..3.32 rows=1 width=36)  
   Recheck Cond: (js ? `a`::text)  
   ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..2.21 rows=1 width=0)  
         Index Cond: (js ? `a`::text)  
(4 rows)  
  
postgres=# explain select * from tbl where js @> `{"a":"b"}`;  
                               QUERY PLAN                                 
------------------------------------------------------------------------  
 Bitmap Heap Scan on tbl  (cost=2.21..3.32 rows=1 width=36)  
   Recheck Cond: (js @> `{"a": "b"}`::jsonb)  
   ->  Bitmap Index Scan on idx_tbl_1  (cost=0.00..2.21 rows=1 width=0)  
         Index Cond: (js @> `{"a": "b"}`::jsonb)  
(4 rows)  

postgres=# explain select * from tbl where to_timestamp(js->>`k2`) between `2018-01-01` and `2018-01-02`;
                                                                                               QUERY PLAN                                                                                                
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_tbl_3 on tbl  (cost=5.50..12.22 rows=6 width=36)
   Index Cond: ((to_timestamp((js ->> `k2`::text)) >= `2018-01-01 00:00:00`::timestamp without time zone) AND (to_timestamp((js ->> `k2`::text)) <= `2018-01-02 00:00:00`::timestamp without time zone))
(2 rows)

postgres=# explain select * from tbl where to_timestamp(js->>`k2`) between `2018-01-01` and `2018-01-02` and ((js->>`k3`)::numeric) between 1 and 200;
                                                                                                                                                     QUERY PLAN                                                                              
                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------
 Index Scan using idx_tbl_3 on tbl  (cost=9.90..12.11 rows=1 width=36)
   Index Cond: ((to_timestamp((js ->> `k2`::text)) >= `2018-01-01 00:00:00`::timestamp without time zone) AND (to_timestamp((js ->> `k2`::text)) <= `2018-01-02 00:00:00`::timestamp without time zone) AND (((js ->> `k3`::text))::numeric >
= `1`::numeric) AND (((js ->> `k3`::text))::numeric <= `200`::numeric))
(2 rows)

  
postgres=# select * from tbl where js @> `{"a": {"b":"c"}}`;  
 id | js   
----+----  
(0 rows)  
  
SELECT doc->`site_name` FROM websites  
  WHERE doc->`tags` @> `[{"term":"paris"}, {"term":"food"}]`;  
  
postgres=# select jsonb `{"a":1, "b": {"c":[1,2,3], "d":["k","y","z"]}, "d":"kbc"}`;  
                               jsonb                                 
-------------------------------------------------------------------  
 {"a": 1, "b": {"c": [1, 2, 3], "d": ["k", "y", "z"]}, "d": "kbc"}  
(1 row)  
  
postgres=# select jsonb `{"a":1, "b": {"c":[1,2,3], "d":["k","y","z"]}, "d":"kbc"}` @> `{"b":{"c":[2,3]}}`;  
 ?column?   
----------  
 t  
(1 row)  
  
postgres=# select jsonb `{"a":1, "b": {"c":[1,2,3], "d":["k","y","z"]}, "d":"kbc"}` @> `{"b":{"c":[2,4]}}`;  
 ?column?   
----------  
 f  
(1 row)  
  
  
postgres=# explain select * from tbl where js @> `{"b":{"c":[2,4]}}`;  
                               QUERY PLAN                                 
------------------------------------------------------------------------  
 Bitmap Heap Scan on tbl  (cost=3.31..4.42 rows=1 width=36)  
   Recheck Cond: (js @> `{"b": {"c": [2, 4]}}`::jsonb)  
   ->  Bitmap Index Scan on idx_tbl_1  (cost=0.00..3.31 rows=1 width=0)  
         Index Cond: (js @> `{"b": {"c": [2, 4]}}`::jsonb)  
(4 rows)  

參考

https://www.postgresql.org/docs/devel/static/datatype-json.html

https://www.postgresql.org/docs/devel/static/functions-json.html

《PostgreSQL 電商業務(任意維度商品圈選應用) – json包range陣列的命中優化 – 展開+索引優化》

《如何讓json裡面的value支援索引範圍檢索》

《PostgreSQL 11 preview – jsonb_plpython, jsonb_plperl 外掛, PG型別jsonb與pyton,perl程式型別的相互轉換》

《PostgreSQL 店鋪運營實踐 – JSON[]陣列 內部標籤資料等值、範圍檢索100倍+加速示例 (含,單值+多值列合成)》

《PostgreSQL json 任意位置 append 功能實現》

《多流實時聚合 – 記錄級實時快照 – JSON聚合與json全文檢索的功能應用》

《HTAP資料庫 PostgreSQL 場景與效能測試之 46 – (OLTP) 大json欄位的高併發更新》

《HTAP資料庫 PostgreSQL 場景與效能測試之 32 – (OLTP) 高吞吐資料進出(堆存、行掃、無需索引) – 閱後即焚(JSON + 函式流式計算)》

《plpgsql 程式設計 – JSON陣列迴圈》

《JSONB 壓縮版本 ZSON》

《PostgreSQL 10.0 preview 功能增強 – SQL:2016標準(之SQL/JSON) Oracle 12c相容》

《PostgreSQL 10.0 preview 功能增強 – JSON 內容全文檢索》

《如何從PostgreSQL json中提取陣列》

《PostgreSQL merge json的正確姿勢》

《PostgreSQL json jsonb 支援的value資料型別,如何構造一個jsonb》


相關文章