Greenplum型別一致性使用規範-索引條件、JOIN的型別一致性限制

德哥發表於2017-10-28

原文網址 : https://flycode.co/archives/186465

型別索引

背景

在查詢時，有很多使用者會犯渾，發現建立了索引，但是查詢偏偏不走索引。

怎麼不走索引啊？

這裡做容易混淆的是型別的差異，例如欄位型別為字串，但是輸入的是INT型別，這樣就可能不走索引。（除非建立了自動的CAST，自動進行型別轉換）

查詢的輸入型別與索引的型別一致是非常有必要的。

例子

1、建表（使用變長、定長字串型別），寫入測試資料

create table tbl3(id int, c1 text, c2 varchar(64), c3 char(64));  
  
postgres=# insert into tbl3 select id, md5, md5, md5 from (select id, md5(random()::text) as md5 from generate_series(1,1000000) t(id)) t;  
INSERT 0 1000000

2、建立索引

postgres=# create index idx_tbl3_1 on tbl3 (c1);  
CREATE INDEX  
postgres=# create index idx_tbl3_2 on tbl3 (c2);  
CREATE INDEX  
postgres=# create index idx_tbl3_3 on tbl3 (c3);  
CREATE INDEX

3、結構如下

postgres=# d+ tbl3  
                         Table "public.tbl3"  
 Column |         Type          | Modifiers | Storage  | Description   
--------+-----------------------+-----------+----------+-------------  
 id     | integer               |           | plain    |   
 c1     | text                  |           | extended |   
 c2     | character varying(64) |           | extended |   
 c3     | character(64)         |           | extended |   
Indexes:  
    "idx_tbl3_1" btree (c1)  
    "idx_tbl3_2" btree (c2)  
    "idx_tbl3_3" btree (c3)  
Has OIDs: no  
Distributed by: (id)

4、走索引，強制轉換為bpchar。

postgres=# explain analyze select * from tbl3 where c3=`abc`;  
                                     QUERY PLAN                                       
------------------------------------------------------------------------------------  
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..201.23 rows=1 width=135)  
   Rows out:  0 rows at destination with 3.081 ms to end, start offset by 0.294 ms.  
   ->  Index Scan using idx_tbl3_3 on tbl3  (cost=0.00..201.23 rows=1 width=135)  
         Index Cond: c3 = `abc`::bpchar  
         Rows out:  0 rows (seg0) with 0.138 ms to end, start offset by 3.211 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 147K bytes.  
   (slice1)    Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Total runtime: 3.484 ms  
(11 rows)

5、不走索引，因為輸入的為text變長，與char定長不一致。

postgres=# explain analyze select * from tbl3 where c3=`abc`::text;  
                                     QUERY PLAN                                        
-------------------------------------------------------------------------------------  
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..20032.60 rows=1001 width=135)  
   Rows out:  0 rows at destination with 305 ms to end, start offset by 0.335 ms.  
   ->  Seq Scan on tbl3  (cost=0.00..20032.60 rows=334 width=135)  
         Filter: c3::text = `abc`::text  
         Rows out:  0 rows (seg0) with 302 ms to end, start offset by 4.023 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 147K bytes.  
   (slice1)    Executor memory: 195K bytes avg x 3 workers, 195K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Total runtime: 305.985 ms  
(11 rows)

6、走索引，輸入的型別為變長型別，欄位也是變長型別，與TEXT相容。

postgres=# explain analyze select * from tbl3 where c2=`abc`::text;  
                                     QUERY PLAN                                       
------------------------------------------------------------------------------------  
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..200.86 rows=1 width=135)  
   Rows out:  0 rows at destination with 2.744 ms to end, start offset by 0.275 ms.  
   ->  Index Scan using idx_tbl3_2 on tbl3  (cost=0.00..200.86 rows=1 width=135)  
         Index Cond: c2::text = `abc`::text  
         Rows out:  0 rows (seg0) with 0.031 ms to end, start offset by 2.868 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 147K bytes.  
   (slice1)    Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Total runtime: 3.120 ms  
(11 rows)

7、不走索引，輸入的型別為定長型別，但是欄位為變長型別。

postgres=# explain analyze select * from tbl3 where c2=`abc`::bpchar;  
                                    QUERY PLAN                                       
-----------------------------------------------------------------------------------  
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..17531.66 rows=1 width=135)  
   Rows out:  0 rows at destination with 60 ms to end, start offset by 0.276 ms.  
   ->  Seq Scan on tbl3  (cost=0.00..17531.66 rows=1 width=135)  
         Filter: c2::bpchar = `abc`::bpchar  
         Rows out:  0 rows (seg0) with 57 ms to end, start offset by 2.864 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 147K bytes.  
   (slice1)    Executor memory: 131K bytes avg x 3 workers, 131K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Total runtime: 60.320 ms  
(11 rows)

8、內表不走索引，因為JOIN欄位型別不匹配。

postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c3) and t1.c1=`abc`;  
                                                QUERY PLAN                                                   
-----------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=25235.81..25235.82 rows=1 width=8)  
   ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=25235.74..25235.79 rows=1 width=8)  
         ->  Aggregate  (cost=25235.74..25235.75 rows=1 width=8)  
               ->  Nested Loop  (cost=0.00..25235.66 rows=11 width=0)  
                     ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..200.88 rows=1 width=33)  
                           ->  Index Scan using idx_tbl3_1 on tbl3 t1  (cost=0.00..200.86 rows=1 width=33)  
                                 Index Cond: c1 = `abc`::text AND `abc`::text = c1  
                     ->  Seq Scan on tbl3 t2  (cost=0.00..25034.46 rows=11 width=65)  
                           Filter: `abc`::text = c3::text AND c3::text = `abc`::text  
(9 rows)

9、內表走索引，因為JOIN欄位型別都是變長，匹配。

postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c2) and t1.c1=`abc`;  
                                                   QUERY PLAN                                                      
-----------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=401.86..401.87 rows=1 width=8)  
   ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=401.80..401.85 rows=1 width=8)  
         ->  Aggregate  (cost=401.80..401.81 rows=1 width=8)  
               ->  Nested Loop  (cost=200.88..401.79 rows=2 width=0)  
                     ->  Index Scan using idx_tbl3_1 on tbl3 t1  (cost=0.00..200.86 rows=1 width=33)  
                           Index Cond: c1 = `abc`::text AND `abc`::text = c1  
                     ->  Materialize  (cost=200.88..200.89 rows=1 width=33)  
                           ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..200.88 rows=1 width=33)  
                                 ->  Index Scan using idx_tbl3_2 on tbl3 t2  (cost=0.00..200.86 rows=1 width=33)  
                                       Index Cond: `abc`::text = c2::text AND c2::text = `abc`::text  
(10 rows)

分析

實際上就是要求索引的表示式與條件的表示式一致。

例子，使用表示式，強制轉換為變長型別。

postgres=# create index idx_tbl3_4 on tbl3 ((c3::text));  
CREATE INDEX

查詢時，只要表示式的型別與條件型別匹配即可，走索引，同時會使用Implicit轉換。

postgres=# explain analyze select * from tbl3 where (c3::text)=`abc`;  
                                                   QUERY PLAN                                                     
----------------------------------------------------------------------------------------------------------------  
 Gather Motion 3:1  (slice1; segments: 3)  (cost=108.60..52793.25 rows=1001 width=135)  
   Rows out:  0 rows at destination with 3.796 ms to end, start offset by 0.309 ms.  
   ->  Bitmap Heap Scan on tbl3  (cost=108.60..52793.25 rows=334 width=135)  
         Recheck Cond: c3::text = `abc`::text  
         Rows out:  0 rows (seg0) with 0.163 ms to end, start offset by 3.095 ms.  
         ->  Bitmap Index Scan on idx_tbl3_4  (cost=0.00..108.35 rows=334 width=0)  
               Index Cond: c3::text = `abc`::text  
               Bitmaps out:  Avg 1.0 x 3 workers.  Max 1 (seg0) with 0.135 ms to end, start offset by 3.119 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 155K bytes.  
   (slice1)    Executor memory: 321K bytes avg x 3 workers, 321K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_seqscan=off  
 Total runtime: 4.219 ms  
(15 rows)

postgres=# explain analyze select * from tbl3 where (c3::text) = `123`::int;  
                                                   QUERY PLAN                                                     
----------------------------------------------------------------------------------------------------------------  
 Gather Motion 3:1  (slice1; segments: 3)  (cost=108.60..52793.25 rows=1001 width=135)  
   Rows out:  0 rows at destination with 2.926 ms to end, start offset by 0.298 ms.  
   ->  Bitmap Heap Scan on tbl3  (cost=108.60..52793.25 rows=334 width=135)  
         Recheck Cond: c3::text = `123`::text  
         Rows out:  0 rows (seg0) with 0.110 ms to end, start offset by 3.057 ms.  
         ->  Bitmap Index Scan on idx_tbl3_4  (cost=0.00..108.35 rows=334 width=0)  
               Index Cond: c3::text = `123`::text  
               Bitmaps out:  Avg 1.0 x 3 workers.  Max 1 (seg0) with 0.064 ms to end, start offset by 3.095 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 155K bytes.  
   (slice1)    Executor memory: 289K bytes avg x 3 workers, 289K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_seqscan=off  
 Total runtime: 3.327 ms  
(15 rows)

postgres=# dC  
                                         List of casts  
         Source type         |         Target type         |      Function      |   Implicit?     
-----------------------------+-----------------------------+--------------------+---------------  
 abstime                     | date                        | date               | in assignment  
 abstime                     | integer                     | (binary coercible) | no  
 abstime                     | timestamp without time zone | timestamp          | yes  
 abstime                     | timestamp with time zone    | timestamptz        | yes  
 abstime                     | time without time zone      | time               | in assignment  
 bigint                      | bit                         | bit                | no  
 bigint                      | character                   | text               | in assignment  
 bigint                      | character varying           | text               | in assignment  
 bigint                      | double precision            | float8             | yes  
 bigint                      | integer                     | int4               | in assignment  
 bigint                      | numeric                     | numeric            | yes  
 bigint                      | oid                         | oid                | yes  
 bigint                      | real                        | float4             | yes  
 ...............

小結

要讓索引起作用，最起碼應該關注一下幾點。

1、注意變長字串和定長字串，屬於兩種資料型別，不要混淆。

2、JOIN時操作符兩邊的欄位型別一致。

3、查詢時，WHERE條件與索引表示式一致。

4、查詢時，WHERE條件與索引欄位的型別一致。

5、查詢時，WHERE條件的collate與索引的collate一致。(本地化相關)

6、操作符本身支援對應的索引訪問方法。例如大多數的=操作符，支援b-tree的索引訪問方法。

7、PG是CBO優化，因此索引訪問的成本更低時，才會選擇索引掃描。（設定了特定的優化器開關，或者使用了HINT除外）。

參考

《PostgreSQL 自定義自動型別轉換(CAST)》

Greenplum索引設計的規範
2020-11-05
索引
typescript type 分配條件型別
2023-05-18
TypeScript型別
SQL | JOIN 型別使用介紹
2024-08-26
SQL型別
MySQL索引的型別
2019-03-31
MySql索引型別
TypeScript 索引型別
2018-09-13
TypeScript索引型別
TypeScript 官方手冊翻譯計劃【九】：型別操控-條件型別
2021-12-04
TypeScript型別
使用條件型別實現TypeScript中的函式過載
2021-10-26
型別TypeScript函式
TypeScript 條件型別精讀與實踐
2021-10-04
TypeScript型別
Git提交規範中，常見的commit型別
2024-05-21
GitMIT型別
從規範看ECMAScript（二）：資料型別
2019-03-04
資料型別
實戰Elasticsearch6的join型別
2022-09-07
Elasticsearch型別
32. 基本資料型別、約束條件
2024-10-11
資料型別
Golang的值型別和引用型別的範圍、儲存區域、區別
2018-08-14
Golang型別
Oracle中left join中右表的限制條件
2018-12-30
Oracle
react 報錯元素隱式具有 "any" 型別，因為型別為 "string" 的表示式不能用於索引型別 "{}"。在型別 "{}" 上找不到具有型別為 "string" 的引數的索引簽名。
2024-06-03
React型別索引
資料型別範圍
2018-03-29
資料型別
如何理解Linux系統中的資料一致性和IO型別？
2023-11-13
Linux型別
Java基本資料型別總結、型別轉換、常量的宣告規範，final關鍵字的用法
2019-03-29
Java資料型別
postgreSQL 索引（二）型別介紹
2018-12-02
SQL索引型別
<input type="file"> 限制檔案型別
2020-04-02
型別
mysql索引型別Normal,Unique,Full Text區別以及索引方法Btree,Hash的區別
2020-04-06
MySql索引型別ORM
基本資料型別與API引用型別的使用
2020-10-14
資料型別API
淺談程式語言型別的強型別，弱型別，動態型別，靜態型別
2021-01-10
型別
MySQL的索引型別和實現原理
2019-09-07
MySql索引型別
MySQL 索引的型別——《高效能MySQL》
2020-11-23
MySql索引型別
掌握4種SQL索引型別，剖析索引原理
2023-10-23
SQL索引型別
值型別與引用型別的區別
2020-12-13
型別
TypeScript 基本型別和泛型的使用
2019-02-27
TypeScript型別泛型
mysql索引型別：FULLTEXT、NORMAL、SPATIAL、UNIQUE
2020-09-02
MySql索引型別ORM
【Postgresql】索引型別（btree、hash、GIST、GIN）
2022-12-25
SQL索引型別
關於sqlserver字元型別查詢條件區分大小寫
2018-08-16
SQLServer字元型別
申請英國公司需要什麼條件？都有哪些型別？
2022-01-07
型別
SAP MM 自定義條件型別出現在採購資訊記錄的'條件'介面裡？
2019-12-07
型別
TS資料型別：型別別名/聯合型別/字面量型別/型別推論等綱要
2020-09-18
資料型別
Redis Stream型別的使用
2021-11-09
Redis型別
MySQL 的索引型別及如何建立維護
2019-12-24
MySql索引型別
TypeScript 官方手冊翻譯計劃【八】：型別操控-按索引訪問的型別
2021-12-04
TypeScript型別索引
Java的基本型別和引用型別
2020-12-09
Java型別
c#中值型別和引用型別的區別
2024-10-16
C#型別

Greenplum型別一致性使用規範-索引條件、JOIN的型別一致性限制

標籤

背景

例子

分析

小結

參考

相關文章