Greenplum型別一致性使用規範-索引條件、JOIN的型別一致性限制
標籤
PostgreSQL , Greenplum , 型別一致 , 索引 , inner轉換 , join
背景
在查詢時,有很多使用者會犯渾,發現建立了索引,但是查詢偏偏不走索引。
怎麼不走索引啊?
這裡做容易混淆的是型別的差異,例如欄位型別為字串,但是輸入的是INT型別,這樣就可能不走索引。(除非建立了自動的CAST,自動進行型別轉換)
查詢的輸入型別與索引的型別一致是非常有必要的。
例子
1、建表(使用變長、定長字串型別),寫入測試資料
create table tbl3(id int, c1 text, c2 varchar(64), c3 char(64));
postgres=# insert into tbl3 select id, md5, md5, md5 from (select id, md5(random()::text) as md5 from generate_series(1,1000000) t(id)) t;
INSERT 0 1000000
2、建立索引
postgres=# create index idx_tbl3_1 on tbl3 (c1);
CREATE INDEX
postgres=# create index idx_tbl3_2 on tbl3 (c2);
CREATE INDEX
postgres=# create index idx_tbl3_3 on tbl3 (c3);
CREATE INDEX
3、結構如下
postgres=# d+ tbl3
Table "public.tbl3"
Column | Type | Modifiers | Storage | Description
--------+-----------------------+-----------+----------+-------------
id | integer | | plain |
c1 | text | | extended |
c2 | character varying(64) | | extended |
c3 | character(64) | | extended |
Indexes:
"idx_tbl3_1" btree (c1)
"idx_tbl3_2" btree (c2)
"idx_tbl3_3" btree (c3)
Has OIDs: no
Distributed by: (id)
4、走索引,強制轉換為bpchar。
postgres=# explain analyze select * from tbl3 where c3=`abc`;
QUERY PLAN
------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..201.23 rows=1 width=135)
Rows out: 0 rows at destination with 3.081 ms to end, start offset by 0.294 ms.
-> Index Scan using idx_tbl3_3 on tbl3 (cost=0.00..201.23 rows=1 width=135)
Index Cond: c3 = `abc`::bpchar
Rows out: 0 rows (seg0) with 0.138 ms to end, start offset by 3.211 ms.
Slice statistics:
(slice0) Executor memory: 147K bytes.
(slice1) Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 3.484 ms
(11 rows)
5、不走索引,因為輸入的為text變長,與char定長不一致。
postgres=# explain analyze select * from tbl3 where c3=`abc`::text;
QUERY PLAN
-------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..20032.60 rows=1001 width=135)
Rows out: 0 rows at destination with 305 ms to end, start offset by 0.335 ms.
-> Seq Scan on tbl3 (cost=0.00..20032.60 rows=334 width=135)
Filter: c3::text = `abc`::text
Rows out: 0 rows (seg0) with 302 ms to end, start offset by 4.023 ms.
Slice statistics:
(slice0) Executor memory: 147K bytes.
(slice1) Executor memory: 195K bytes avg x 3 workers, 195K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 305.985 ms
(11 rows)
6、走索引,輸入的型別為變長型別,欄位也是變長型別,與TEXT相容。
postgres=# explain analyze select * from tbl3 where c2=`abc`::text;
QUERY PLAN
------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..200.86 rows=1 width=135)
Rows out: 0 rows at destination with 2.744 ms to end, start offset by 0.275 ms.
-> Index Scan using idx_tbl3_2 on tbl3 (cost=0.00..200.86 rows=1 width=135)
Index Cond: c2::text = `abc`::text
Rows out: 0 rows (seg0) with 0.031 ms to end, start offset by 2.868 ms.
Slice statistics:
(slice0) Executor memory: 147K bytes.
(slice1) Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 3.120 ms
(11 rows)
7、不走索引,輸入的型別為定長型別,但是欄位為變長型別。
postgres=# explain analyze select * from tbl3 where c2=`abc`::bpchar;
QUERY PLAN
-----------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..17531.66 rows=1 width=135)
Rows out: 0 rows at destination with 60 ms to end, start offset by 0.276 ms.
-> Seq Scan on tbl3 (cost=0.00..17531.66 rows=1 width=135)
Filter: c2::bpchar = `abc`::bpchar
Rows out: 0 rows (seg0) with 57 ms to end, start offset by 2.864 ms.
Slice statistics:
(slice0) Executor memory: 147K bytes.
(slice1) Executor memory: 131K bytes avg x 3 workers, 131K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 60.320 ms
(11 rows)
8、內表不走索引,因為JOIN欄位型別不匹配。
postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c3) and t1.c1=`abc`;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Aggregate (cost=25235.81..25235.82 rows=1 width=8)
-> Gather Motion 3:1 (slice2; segments: 3) (cost=25235.74..25235.79 rows=1 width=8)
-> Aggregate (cost=25235.74..25235.75 rows=1 width=8)
-> Nested Loop (cost=0.00..25235.66 rows=11 width=0)
-> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..200.88 rows=1 width=33)
-> Index Scan using idx_tbl3_1 on tbl3 t1 (cost=0.00..200.86 rows=1 width=33)
Index Cond: c1 = `abc`::text AND `abc`::text = c1
-> Seq Scan on tbl3 t2 (cost=0.00..25034.46 rows=11 width=65)
Filter: `abc`::text = c3::text AND c3::text = `abc`::text
(9 rows)
9、內表走索引,因為JOIN欄位型別都是變長,匹配。
postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c2) and t1.c1=`abc`;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=401.86..401.87 rows=1 width=8)
-> Gather Motion 3:1 (slice2; segments: 3) (cost=401.80..401.85 rows=1 width=8)
-> Aggregate (cost=401.80..401.81 rows=1 width=8)
-> Nested Loop (cost=200.88..401.79 rows=2 width=0)
-> Index Scan using idx_tbl3_1 on tbl3 t1 (cost=0.00..200.86 rows=1 width=33)
Index Cond: c1 = `abc`::text AND `abc`::text = c1
-> Materialize (cost=200.88..200.89 rows=1 width=33)
-> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..200.88 rows=1 width=33)
-> Index Scan using idx_tbl3_2 on tbl3 t2 (cost=0.00..200.86 rows=1 width=33)
Index Cond: `abc`::text = c2::text AND c2::text = `abc`::text
(10 rows)
分析
實際上就是要求索引的表示式與條件的表示式一致。
例子,使用表示式,強制轉換為變長型別。
postgres=# create index idx_tbl3_4 on tbl3 ((c3::text));
CREATE INDEX
查詢時,只要表示式的型別與條件型別匹配即可,走索引,同時會使用Implicit轉換。
postgres=# explain analyze select * from tbl3 where (c3::text)=`abc`;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=108.60..52793.25 rows=1001 width=135)
Rows out: 0 rows at destination with 3.796 ms to end, start offset by 0.309 ms.
-> Bitmap Heap Scan on tbl3 (cost=108.60..52793.25 rows=334 width=135)
Recheck Cond: c3::text = `abc`::text
Rows out: 0 rows (seg0) with 0.163 ms to end, start offset by 3.095 ms.
-> Bitmap Index Scan on idx_tbl3_4 (cost=0.00..108.35 rows=334 width=0)
Index Cond: c3::text = `abc`::text
Bitmaps out: Avg 1.0 x 3 workers. Max 1 (seg0) with 0.135 ms to end, start offset by 3.119 ms.
Slice statistics:
(slice0) Executor memory: 155K bytes.
(slice1) Executor memory: 321K bytes avg x 3 workers, 321K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_seqscan=off
Total runtime: 4.219 ms
(15 rows)
postgres=# explain analyze select * from tbl3 where (c3::text) = `123`::int;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=108.60..52793.25 rows=1001 width=135)
Rows out: 0 rows at destination with 2.926 ms to end, start offset by 0.298 ms.
-> Bitmap Heap Scan on tbl3 (cost=108.60..52793.25 rows=334 width=135)
Recheck Cond: c3::text = `123`::text
Rows out: 0 rows (seg0) with 0.110 ms to end, start offset by 3.057 ms.
-> Bitmap Index Scan on idx_tbl3_4 (cost=0.00..108.35 rows=334 width=0)
Index Cond: c3::text = `123`::text
Bitmaps out: Avg 1.0 x 3 workers. Max 1 (seg0) with 0.064 ms to end, start offset by 3.095 ms.
Slice statistics:
(slice0) Executor memory: 155K bytes.
(slice1) Executor memory: 289K bytes avg x 3 workers, 289K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_seqscan=off
Total runtime: 3.327 ms
(15 rows)
postgres=# dC
List of casts
Source type | Target type | Function | Implicit?
-----------------------------+-----------------------------+--------------------+---------------
abstime | date | date | in assignment
abstime | integer | (binary coercible) | no
abstime | timestamp without time zone | timestamp | yes
abstime | timestamp with time zone | timestamptz | yes
abstime | time without time zone | time | in assignment
bigint | bit | bit | no
bigint | character | text | in assignment
bigint | character varying | text | in assignment
bigint | double precision | float8 | yes
bigint | integer | int4 | in assignment
bigint | numeric | numeric | yes
bigint | oid | oid | yes
bigint | real | float4 | yes
...............
小結
要讓索引起作用,最起碼應該關注一下幾點。
1、注意變長字串和定長字串,屬於兩種資料型別,不要混淆。
2、JOIN時操作符兩邊的欄位型別一致。
3、查詢時,WHERE條件與索引表示式一致。
4、查詢時,WHERE條件與索引欄位的型別一致。
5、查詢時,WHERE條件的collate與索引的collate一致。(本地化相關)
6、操作符本身支援對應的索引訪問方法。例如大多數的=操作符,支援b-tree的索引訪問方法。
7、PG是CBO優化,因此索引訪問的成本更低時,才會選擇索引掃描。(設定了特定的優化器開關,或者使用了HINT除外)。
參考
相關文章
- 條件型別型別
- 各種索引型別發生的條件索引型別
- typescript type 分配條件型別TypeScript型別
- 索引的型別索引型別
- MySQL索引的型別MySql索引型別
- TypeScript 官方手冊翻譯計劃【九】:型別操控-條件型別TypeScript型別
- TypeScript 索引型別TypeScript索引型別
- input型別和限制型別
- MIME郵件的multipart型別與text型別的區別型別
- left join,right join,inner join的條件on和where的區別
- 使用條件型別實現TypeScript中的函式過載型別TypeScript函式
- ORACLE MYSQL中join 欄位型別不同索引失效的情況OracleMySql型別索引
- TypeScript 條件型別精讀與實踐TypeScript型別
- 連線條件字元型別不一樣。字元型別
- 從規範看ECMAScript(二):資料型別資料型別
- PLSQL Language Reference-PL/SQL資料型別-SQL資料型別-有限制的子型別SQL資料型別
- WHRER條件裡的資料型別必須和欄位資料型別一致資料型別
- 實戰Elasticsearch6的join型別Elasticsearch型別
- 移動LOB型別的索引型別索引
- Golang的值型別和引用型別的範圍、儲存區域、區別Golang型別
- 資料型別範圍資料型別
- TypeScript 可索引型別介面TypeScript索引型別
- Mysql索引的使用 - 組合索引 + 範圍條件的處理MySql索引
- MyBatis 基本資料型別條件判斷問題MyBatis資料型別
- 資料型別為date作為查詢條件資料型別
- Oracle中left join中右表的限制條件Oracle
- 【開發篇sql】 條件和表示式(二) 資料型別的比較規則SQL資料型別
- varchar or blob:欄位型別的儲存和溢位條件型別
- Python引用型別和值型別的區別與使用Python型別
- Java基本資料型別總結、型別轉換、常量的宣告規範,final關鍵字的用法Java資料型別
- C#的型別——值型別與引用型別C#型別
- 【索引】反向索引--條件 範圍查詢索引
- 淺談程式語言型別的強型別,弱型別,動態型別,靜態型別型別
- 如何理解Linux系統中的資料一致性和IO型別?Linux型別
- <input type="file"> 限制檔案型別型別
- TABLE型別的使用型別
- 基本資料型別與API引用型別的使用資料型別API
- Swift 中的值型別與引用型別使用指北Swift型別