[20181130]如何猜測那些值存在hash衝突.txt
[20181130]如何猜測那些值存在hash衝突.txt
--//今年6月份開始kerrycode的1個帖子提到子查詢結果快取在雜湊表中情況:
--//連結:http://www.cnblogs.com/kerrycode/p/9099507.html,摘要:
通俗來將,當使用標量子查詢的時候,ORACLE會將子查詢結果快取在雜湊表中, 如果後續的記錄出現同樣的值,最佳化器透過快取在雜湊
表中的值,判斷重複值不用重複呼叫函式,直接使用上次計算結果即可。從而減少呼叫函式次數,從而達到最佳化效能的效果。另外在
ORACLE 10和11中, 雜湊表只包含了255個Buckets,也就是說它能儲存255個不同值,如果超過這個範圍,就會出現雜湊衝突,那些出現
雜湊衝突的值就會重複呼叫函式,即便如此,依然能達到大幅改善效能的效果。
--//我當時就非常想從作者瞭解"雜湊表只包含了255個Buckets",這個觀點的出處.kerrycode給了我一個連結:
https://blogs.oracle.com/oraclemagazine/on-caching-and-evangelizing-sql
Oracle Database will use this hash table to remember the scalar subquery and the inputs to it—just :DEPTNO in this case
—and the output from it. At the beginning of every query execution, this cache is empty, but suppose you run the query
and the first PROJECTS row you retrieve has a DEPTNO value of 10. Oracle Database will assign the number 10 to a hash
value between 1 and 255 (the size of the hash table cache in Oracle Database 10g and Oracle Database 11g currently) and
will look in that hash table slot to see if the answer exists. In this case, it will not, so Oracle Database must run
the scalar subquery with the input of 10 to get the answer. If that answer (count) is 42, the hash table may look
something like this:
Select count(*) from emp where emp.deptno = :deptno
:deptno Count(*)
You'll have saved the DEPTNO value of 10 and the answer (count) of 42 in some slot—probably not the first or last slot,
but whatever slot the hash value 10 is assigned to. Now suppose the second row you get back from the PROJECTS table
includes a DEPTNO value of 20. Oracle Database will again look in the hash table after assigning the value 20, and it
will discover "no result in the cache yet." So it will run the scalar subquery, get the result, and put it into the hash
table cache. Now the cache may look like this:
Select count(*) from emp where emp.deptno = :deptno
:deptno Count(*)
Select count(*) from emp where emp.deptno = :deptno
:deptno Count(*)
… …
10 42
Now suppose the query returns a third row and it again includes a DEPTNO value of 10. This time, Oracle Database will
see DEPTNO = 10, find that it already has that value in the hash table cache, and will simply return 42 from the cache
instead of executing the scalar subquery. In fact, it will never have to run that scalar subquery for the DEPTNO values
of 10 or 20 again for that query—it will already have the answer.
What happens if the number of unique DEPTNO values exceeds the size of the hash table? What if there are more than 255
values? Or, more generally, if more than one DEPTNO value is assigned to the same slot in the hash table, what happens
in a hash collision?
The answer is the same for all these questions and is rather simple: Oracle Database will not be able to cache the
second or nth value to that slot in the hash table. For example, what if the third row returned by the query contains
the DEPTNO = 30 value? Further, suppose that DEPTNO = 30 is to be assigned to exactly the same hash table slot as DEPTNO
= 10. The database won't be able to effectively cache DEPTNO = 30 in this case—the value will never make it into the
hash table. It will, however, be "partially cached." Oracle Database still has the hash table with all the previous
executions, but it also keeps the last scalar subquery result it had "next to" the hash table. That is, if the fourth
row also includes a DEPTNO = 30 value, Oracle Database will discover that the result is not in the hash table but is
"next to" the hash table, because the last time it ran the scalar subquery, it was run with an input of 30. On the other
hand, if the fourth row includes a DEPTNO = 40 value, Oracle Database will run the scalar subquery with the DEPTNO = 40
value (because it hasn't seen that value yet during this query execution) and overwrite the DEPTNO = 30 result. The next
time Oracle Database sees DEPTNO = 30 in the result set, it'll have to run that scalar subquery again.
--//我自己開始瞎嘗試各種方法驗證hash buckets是否是255.我開始先入為主,認為就是255(或者256),經歷許多混亂,最後kerrycode給我
--//一個測試方法,連結如下:
http://blog.itpub.net/267265/viewspace-2156702/
http://www.cnblogs.com/kerrycode/p/9223093.html
--//按照這個方法很容易驗證hash buckets大小,11.2.0.4是1024,10.2.0.4是512,12.1.0.1是1024.
--//我想起開始測試時,75與48存在衝突的情況,當時我沒有想到這麼靠前的值存在衝突,為了驗證我幾乎是1個1個嘗試.
--//因為你根本不知道oracle的演算法.
--//昨天看驗證為什麼4與432存在衝突.
1.環境:
SCOTT@book> @ &r/ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
create or replace function f( x in varchar2 ) return number
as
begin
dbms_application_info.set_client_info(userenv('client_info')+1 );
return length(x);
end;
/
SCOTT@book> create table t as select rownum id1,mod(rownum-1,10000)+1 id2 from dual connect by level<=20000;
Table created.
SCOTT@book> create table t1 ( a number ,b number);
Table created.
--//欄位a 記錄呼叫函式次數.
2.建立測試指令碼:
--//建立指令碼cz.txt
exec dbms_application_info.set_client_info(0);
set term off
exec :x := &&1;
select count(distinct f_id2) from (select id2,(select f(id2) from dual) as f_id2 from t where id2 in (&&2,:x ));
set term on
insert into t1 values (userenv('client_info') ,:x) ;
commit ;
--//建立shell指令碼cz.sh:
#! /bin/bash
sqlplus -s -l scott/book <<EOF >> hz.txt
variable x number;
$(seq 500 | xargs -I{} echo @cz.txt {} $1)
quit
EOF
3.測試:
--//執行指令碼cz.sh:
$ . cz.sh 4
SCOTT@book> select * from t1 where a<>2;
A B
---------- ----------
1 4
3 432
--//可以發現4,432存在衝突.函式呼叫了3次.
SCOTT@book> delete t1;
500 rows deleted.
SCOTT@book> commit ;
Commit complete.
--//驗證1與那個值存在衝突.
$ . cz.sh 1
SCOTT@book> select * from t1 where a<>2;
A B
---------- ----------
3 484
1 1
--//可以驗證1與484存在hash衝突.
4.再拿連結例子做測試:
--//連結:
SCOTT@book> update emp set dept_no=484 where dept_no=432;
1 row updated.
SCOTT@book> commit ;
Commit complete.
SCOTT@book> alter session set statistics_level = all;
Session altered.
select
/*+ gather_plan_statistics post-shrink */
count(*)
from (
select /*+ no_merge */
outer.*
from emp outer
where outer.sal >
(
select /*+ no_unnest */ avg(inner.sal)
from emp inner
where inner.dept_no = outer.dept_no
)
)
;
COUNT(*)
----------
9498
SCOTT@book> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID gx7xb7rhfd2zf, child number 0
-------------------------------------
select /*+ gather_plan_statistics post-shrink */
count(*) from ( select /*+ no_merge */
outer.* from emp outer where outer.sal >
( select /*+ no_unnest */ avg(inner.sal)
from emp inner where
inner.dept_no = outer.dept_no ) )
Plan hash value: 322796046
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 569 (100)| | 1 |00:00:03.43 | 783K|
| 1 | SORT AGGREGATE | | 1 | 1 | | | | 1 |00:00:03.43 | 783K|
| 2 | VIEW | | 1 | 143 | | 569 (1)| 00:00:07 | 9498 |00:00:03.42 | 783K|
|* 3 | FILTER | | 1 | | | | | 9498 |00:00:03.42 | 783K|
| 4 | TABLE ACCESS FULL | EMP | 1 | 20001 | 156K| 71 (0)| 00:00:01 | 19001 |00:00:00.01 | 247 |
| 5 | SORT AGGREGATE | | 3173 | 1 | 8 | | | 3173 |00:00:03.41 | 783K|
|* 6 | TABLE ACCESS FULL| EMP | 3173 | 2857 | 22856 | 71 (0)| 00:00:01 | 10M|00:00:02.71 | 783K|
------------------------------------------------------------------------------------------------------------------------
--//迴圈3173.
SCOTT@book> select dept_no,count(*) from emp group by dept_no order by 1;
DEPT_NO COUNT(*)
---------- ----------
0 3167
1 3167
2 3167
3 3166
4 3166
5 3167
484 1
7 rows selected.
--//dept_no=1出現hash衝突.
--//dept_no=484 迴圈1次
--//dept_no=0 迴圈1次
--//dept_no=1 迴圈3167次
--//dept_no=2 迴圈1次
--//dept_no=3 迴圈1次
--//dept_no=4 迴圈1次
--//dept_no=5 迴圈1次
--//這樣累加: 1+1+3167+1+1+1+1 = 3173,這樣就相互驗證了.
4.我上面的測試純粹是蠻力測試,改寫為PL/SQL指令碼看看,PL/sql確實不熟練....
SCOTT@book> create table t2 ( a number ,b number,c number);
Table created.
--//欄位a 記錄呼叫函式次數.
--//指令碼cy.txt
declare
x number;
begin
for i in 1..10000 loop
dbms_application_info.set_client_info(0);
select count(distinct f_id2) into x from (select id2,(select f(id2) from dual) as f_id2 from t where id2 in (i, &&1 ) );
if ( userenv('client_info') =3 ) then
insert into t2 values (userenv('client_info') ,i,&&1) ;
commit ;
exit;
END IF;
end loop;
end;
/
--//我加入發現後exit(退出).你可以註解或者取消,這樣測試1..10000之間的hash buckets衝突值.
--//執行如下:
@ cy.txt 4
@ cy.txt 1
@ cy.txt 3
@ cy.txt 18
@ cy.txt 48
@ cy.txt 75
SCOTT@book> select * from t2;
A B C
---------- ---------- ----------
3 432 4
3 484 1
3 735 3
3 2071 18
3 75 48
3 48 75
6 rows selected.
--//這樣就很快知道那些值會發生hash衝突了.
--//不知道那位還有什麼更好的方法...
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/267265/viewspace-2222229/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- [20181130]hash衝突導致查詢緩慢.txt
- hash衝突解決方法
- [20200801]sql hint衝突.txtSQL
- hash解決衝突的方法優缺點
- [20231116]如何知道X表存在那些索引.txt索引
- [20181130]control file sequential read.txt
- [20190515]熱備份模式與rman衝突.txt模式
- [20200107]vim 按鍵與金山詞霸衝突.txt
- Hash雜湊競猜介紹
- 使用 etcd 和 grpc 遇到的版本衝突的那些事兒RPC
- Git 衝突了怎麼辦,如何高效快速的解決程式碼衝突?Git
- 如何解決git程式碼衝突Git
- oracle 序列值導致的主鍵衝突問題Oracle
- 如何在VS Code中啟用實時合併衝突檢測?
- iceberg合併小檔案衝突測試
- 雜湊衝突
- [20220121]Hash Aggregation.txt
- HASH雜湊值競猜遊戲開發原始碼丨原始碼示例丨HASH雜湊遊戲程式設計遊戲開發原始碼程式設計
- 什麼是 IP 衝突以及如何解決?
- 關於5G被激烈討論的那些爭端和衝突
- Git 解決衝突Git
- lvm 名稱衝突LVM
- git 解決衝突Git
- 基於C#的多邊形衝突檢測C#
- [20180713]關於hash join 測試中一個疑問.txt
- 如何解決 touchstart 事件與 click 事件的衝突事件
- hash遊戲競猜系統開發方案丨雜湊HASH遊戲競猜系統開發方案/(原始碼部署)遊戲原始碼
- Git衝突解決技巧Git
- 解衝突用到的命令
- 用層級理解衝突
- 處理併發衝突
- 雜湊衝突詳解
- 程式衝突及其解決
- Java依賴版本衝突Java
- git pull 衝突解決Git
- Activemq和Rabbitmq埠衝突MQ
- InnoDB鎖衝突案例演示
- Manjaro更新出現衝突JAR