[20190810]如何索引一個超長欄位.txt
[20190810]如何索引一個超長欄位.txt
--//連結問的一個問題:http://www.itpub.net/thread-2119521-1-1.html
因業務需要,一個欄位的字元要很長,最長有4000, 並且還要用欄位來判斷唯一性,如果直接建立索引他會報錯的,而且怎麼長查詢起
來是否很慢?請問各位前輩,有什麼好的方法來處理這個問題。
--//他還加了一個需求保證唯一性。不知道對方使用什麼版本,自己測試看看:
1.環境:
SCOTT@test01p> @ ver1
PORT_STRING VERSION BANNER CON_ID
------------------------------ -------------- -------------------------------------------------------------------------------- ----------
IBMPC/WIN_NT64-9.1.0 12.2.0.1.0 Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production 0
2.測試例子建立:
SCOTT@test01p> create table t (id number ,v1 varchar2(4000) );
Table created.
SCOTT@test01p> insert into t select level ,lpad('x',100,'x')||level v1 from dual connect by level < 2e5;
199999 rows created.
SCOTT@test01p> create unique index i_t_v1 on t(v1);
Index created.
--//實際上這樣的索引還是能建立的,估計重新rebuild online 就會報錯。
SCOTT@test01p> alter index i_t_v1 rebuild online ;
alter index i_t_v1 rebuild online
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-01450: maximum key length (3215) exceeded
--//因為rebuild online 要建立一張IOT表,導致maximum key length (3215) exceeded.
SCOTT@test01p> alter index i_t_v1 rebuild ;
Index altered.
--//取消online 沒有問題。
3.測試:
--//分析略。
SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and segment_name in ('I_T_V1','T');
SEGMENT_NAME HEADER_FILE HEADER_BLOCK BYTES BLOCKS
-------------------- ----------- ------------ ---------- ----------
T 11 306 27262976 3328
I_T_V1 11 1066 28311552 3456
--//你可以發現在我的例子中,索引佔用的空間比表還大,因為我索引的欄位長度佔10X位元組。如果想原連結的情況也許更大。
SCOTT@test01p> alter session set statistics_level = all;
Session altered.
SCOTT@test01p> select substr( v1,100,3) from t where v1=lpad('x',100,'x')||42;
SUBSTR
------
x42
SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 9bky8g0hf1td9, child number 1
-------------------------------------
select substr( v1,100,3) from t where v1=lpad('x',100,'x')||42
Plan hash value: 4077016850
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 2 (100)| | 1 |00:00:00.01 | 3 |
|* 1 | INDEX UNIQUE SCAN| I_T_V1 | 1 | 1 | 107 | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 3 |
----------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("V1"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx42')
4.繼續測試:
--//12c 提供standard_hash 函式。
SCOTT@test01p> create unique index if_t_v1_1 on t(standard_hash(v1));
Index created.
SCOTT@test01p> create unique index if_t_v1_2 on t(ora_hash(v1));
create unique index if_t_v1_2 on t(ora_hash(v1))
*
ERROR at line 1:
ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found
--//可以發現使用ora_hash出現重複值,無法建立唯一約束。而使用standard_hash函式沒有問題。
SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and segment_name in ('I_T_V1','T','IF_T_V1_1');
SEGMENT_NAME HEADER_FILE HEADER_BLOCK BYTES BLOCKS
-------------------- ----------- ------------ ---------- ----------
T 11 306 27262976 3328
IF_T_V1_1 11 434 7340032 896
I_T_V1 11 1066 28311552 3456
--//可以發現建立standard_hash的函式索引相對小一些。
SCOTT@test01p> select substr( v1,100,3) from t where v1=lpad('x',100,'x')||43;
SUBSTR
------
x43
SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 0jc1sx44ckdj9, child number 0
-------------------------------------
select substr( v1,100,3) from t where v1=lpad('x',100,'x')||43
Plan hash value: 4077016850
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 2 (100)| | 1 |00:00:00.03 | 3 | 2 |
|* 1 | INDEX UNIQUE SCAN| I_T_V1 | 1 | 1 | 111 | 2 (0)| 00:00:01 | 1 |00:00:00.03 | 3 | 2 |
-------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("V1"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
x43')
filter(STANDARD_HASH("V1")=HEXTORAW('FBDADC99CB387566504D65A6003CFA57CA05F238'))
--//產生的執行有點怪怪的,仔細看filter條件就明白了不知道是否存在standard_hash函式索引的原因,也許使用I_T_V1不用回表的原因,
--//但是使用的索引是I_T_V1.修改執行語句如下:
SCOTT@test01p> select id,substr( v1,100,3) from t where v1=lpad('x',100,'x')||44;
ID SUBSTR
---------- ------
44 x44
SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID dazgkpkj95mfs, child number 0
-------------------------------------
select id,substr( v1,100,3) from t where v1=lpad('x',100,'x')||44
Plan hash value: 1893196135
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 3 (100)| | 1 |00:00:00.01 | 4 |
|* 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 1 | 132 | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 4 |
|* 2 | INDEX UNIQUE SCAN | IF_T_V1_1 | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 3 |
-----------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / T@SEL$1
2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("V1"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx44'
2 - access("T"."SYS_NC00003$"=HEXTORAW('CC14E546ACC59A5B03CD75C49F62248C12C6D8D0'))
--//注意我執行的查詢條件是v1=lpad('x',100,'x')||44,並沒有使用standard_hash函式查詢,如果寫成如下:
SCOTT@test01p> select id,substr( v1,100,3) from t where standard_hash(v1)=standard_hash(lpad('x',100,'x')||44);
ID SUBSTR
---------- ------
44 x44
SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 9z17dmg6hk84n, child number 0
-------------------------------------
select id,substr( v1,100,3) from t where
standard_hash(v1)=standard_hash(lpad('x',100,'x')||44)
Plan hash value: 1893196135
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 3 (100)| | 1 |00:00:00.01 | 4 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 1 | 132 | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 4 |
|* 2 | INDEX UNIQUE SCAN | IF_T_V1_1 | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 3 |
-----------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / T@SEL$1
2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T"."SYS_NC00003$"=HEXTORAW('CC14E546ACC59A5B03CD75C49F62248C12C6D8D0'))
--//上下比較可以發現前者多了一個filter("V1"='xx......xx44'.
--//也就是使用standard_hash建立的索引也可以使用像v1=lpad('x',100,'x')||44這樣的謂詞查詢。
--//類似的情況到目前位置我僅僅知道trunc以及substr(v1,1,N)的情況。
5.繼續測試..2:
--//作者並沒有提示使用的是12c,如果不是12c應該沒有standard_hash函式。自己建立1個md5函式看看。
CREATE OR REPLACE FUNCTION FN_MD5(par1 IN VARCHAR) RETURN VARCHAR2
DETERMINISTIC IS
retval varchar2(32);
BEGIN
retval := utl_raw.cast_to_raw(DBMS_OBFUSCATION_TOOLKIT.MD5(INPUT_STRING => par1));
RETURN retval;
END FN_MD5;
SCOTT@test01p> create unique index if_t_v1_2 on t(fn_md5(v1));
Index created.
SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and segment_name in ('I_T_V1','T','IF_T_V1_1','IF_T_V1_2');
SEGMENT_NAME HEADER_FILE HEADER_BLOCK BYTES BLOCKS
-------------------- ----------- ------------ ---------- ----------
T 11 306 27262976 3328
IF_T_V1_1 11 434 7340032 896
I_T_V1 11 1066 28311552 3456
IF_T_V1_2 11 14850 10485760 1280
--//使用fn_md5自定義函式的索引比standard_hash索引稍微大一些。
SCOTT@test01p> select id,substr( v1,100,3) from t where fn_md5(v1)=fn_md5(lpad('x',100,'x')||48);
ID SUBSTR
---------- ------
48 x48
SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID ggv02trp66xas, child number 0
-------------------------------------
select id,substr( v1,100,3) from t where
fn_md5(v1)=fn_md5(lpad('x',100,'x')||48)
Plan hash value: 675596911
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 3 (100)| | 1 |00:00:00.01 | 4 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 1 | 132 | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 4 |
|* 2 | INDEX UNIQUE SCAN | IF_T_V1_2 | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 3 |
-----------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / T@SEL$1
2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T"."SYS_NC00004$"="FN_MD5"('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx48'))
--//OK沒有問題。
6.繼續測試..3:
--//如果不考慮唯一性,可以考慮ora_hash函式。
SCOTT@test01p> create index if_t_v1_3 on t(ora_hash(v1));
Index created.
SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and
segment_name in ('I_T_V1','T','IF_T_V1_1','IF_T_V1_2','IF_T_V1_3');
SEGMENT_NAME HEADER_FILE HEADER_BLOCK BYTES BLOCKS
-------------------- ----------- ------------ ---------- ----------
T 11 306 27262976 3328
IF_T_V1_1 11 434 7340032 896
I_T_V1 11 1066 28311552 3456
IF_T_V1_2 11 14850 10485760 1280
IF_T_V1_3 11 16130 5242880 640
--//使用ora_hash函式索引更小。
SCOTT@test01p> select count(*),ora_hash(v1) from t group by ora_hash(v1) order by 1 desc FETCH FIRST 2 ROWS ONLY;
COUNT(*) ORA_HASH(V1)
---------- ------------
2 3933908345
2 3975299677
SCOTT@test01p> select id,substr(v1,100,10) from t where ora_hash(v1)=3933908345;
ID SUBSTR(V1,100,10)
---------- --------------------
36510 x36510
144288 x144288
SCOTT@test01p> select id,substr( v1,100,3) from t where ora_hash(v1)=ora_hash(lpad('x',100,'x')||36510);
ID SUBSTR
---------- ------
36510 x36
144288 x14
SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 4dvtgan3r97pp, child number 1
-------------------------------------
select id,substr( v1,100,3) from t where
ora_hash(v1)=ora_hash(lpad('x',100,'x')||36510)
Plan hash value: 1660760003
-------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 801 (100)| | 2 |00:00:00.01 | 5 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 2000 | 244K| 801 (0)| 00:00:01 | 2 |00:00:00.01 | 5 |
|* 2 | INDEX RANGE SCAN | IF_T_V1_3 | 1 | 800 | | 1 (0)| 00:00:01 | 2 |00:00:00.01 | 3 |
-------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / T@SEL$1
2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T"."SYS_NC00005$"=3933908345)
--//也可以使用索引,缺點返回多餘的行。
總結:
--//使用standard_hash,自定義md5函式,或者ora_hash,但是ora_hash不保證唯一。我知道就這些,不知道還有什麼其它方法。
--//另外關於ORA-01450: maximum key length (3215) exceeded錯誤,可以考慮定義更大的資料塊db_16k_cache_size,
--//db_32k_cache_size.
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/267265/viewspace-2653279/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- [20181020]lob欄位的索引段.txt索引
- [20180613]縮短欄位長度.txt
- [20210423]建立檢視以及欄位長度.txt
- [20180408]那些函式索引適合欄位的查詢.txt函式索引
- mysql sql同一個欄位多個行轉成一個欄位查詢MySql
- oracle複合索引介紹(多欄位索引)Oracle索引
- MySQL中修改一個資料庫下包含有某個相同欄位的所有表的欄位長度MySql資料庫
- 怎麼給字串欄位加索引?字串索引
- [20231020]增加欄位的問題.txt
- MySQL 更新一個表裡的欄位等於另一個表某欄位的值MySql
- mysql基礎 依據一個欄位查詢另外一個欄位存在不同的值MySql
- MySQL 更新同一個表不同欄位MySql
- PG裡常見的欄位有索引但未使用索引的原因索引
- 將多個JSON欄位對映到單個Java欄位JSONJava
- mysql如何判斷是否存在某個欄位MySql
- [20181031]lob欄位與布隆過濾.txt
- 開發技巧:秀兒,如何才能一個欄位表示八個狀態啊
- [20210603]如何跟蹤索引分裂.txt索引
- SQLServer索引優化(1):對於有order by欄位的建索引策略SQLServer索引優化
- [20220117]超長sql語句.txtSQL
- MySQL null值欄位是否使用索引的總結MySqlNull索引
- [20190227]簡單探究tab$的bojb#欄位.txt
- [20181022]lob欄位的lobid來之那裡.txt
- [20210208]lob欄位與查詢的問題.txt
- MySql 查詢某一天日期格式欄位走索引問題MySql索引
- oracle 修改表欄位的長度Oracle
- 【Mongo】mongo更新欄位為另一欄位的值Go
- milvus 支援一個集合建立多個向量欄位嗎?
- mongodb如何改_id欄位?MongoDB
- Oracle-新增一個欄位並設定日期Oracle
- C#學習筆記-欄位、屬性、索引器C#筆記索引
- Sql查詢 一個表中某欄位的資料在另一個表中某欄位中不存在的SQL
- 請教一下 多欄位值如何排序?排序
- [20201109]11.2.0.4增加欄位與預設值問題.txt
- 一文詳解MySQL如何同時自增自減多個欄位MySql
- MySQL · 最佳實踐 · 如何索引 JSON 欄位 (阿里雲 RDS-資料庫核心組 )MySql索引JSON阿里資料庫
- [20211105]索引分裂塊清除日誌增加(唯一索引).txt索引
- Oracle 修改欄位型別和長度Oracle型別