An experiment with the granularity parameter of dbms_stats.gather_table_stats
In OLAP or data-warehouse environments it is common to create a new partition, load a batch of data (usually a very large one) into it, gather statistics on that partition, and then run reports or data mining. Ideally, statistics would be gathered at both the table (global) level and the partition level, giving the optimizer the most complete information. But such tables are usually very large, and regathering global statistics every time a partition is added would consume enormous system resources. If, on the other hand, only the new partition is analyzed and no global statistics are gathered, Oracle's table-wide information becomes inaccurate: for example, how many rows does the table have? That number will be wrong, and it can easily lead the CBO to produce a bad execution plan. The following example demonstrates the problem.
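For orientation, the granularity parameter of dbms_stats.gather_table_stats accepts the documented values 'ALL', 'AUTO' (the default), 'GLOBAL', 'PARTITION', 'SUBPARTITION' and 'GLOBAL AND PARTITION'. The experiment below exercises three forms of the call, previewed here (run as the table owner, with the table and partition names used in this example); the full session transcript then follows:

-- Default granularity (AUTO): gathers global and partition statistics, plus index statistics via cascade
exec dbms_stats.gather_table_stats(user, 't', cascade => true);
-- Only the statistics of one partition; global statistics are left untouched
exec dbms_stats.gather_table_stats(user, 't', partname => 'pm', granularity => 'PARTITION');
-- Only the table-level (global) statistics
exec dbms_stats.gather_table_stats(user, 't', granularity => 'GLOBAL');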
[oracle@localhost ~]$ clear
[oracle@localhost ~]$ sqlplus /nolog
SQL*Plus: Release 11.2.0.1.0 Production on 星期六 9月 1 10:17:42 2012
Copyright (c) 1982, 2009, Oracle. All rights reserved.
SQL> conn change/change
Connected.
SQL> create table t(id int) partition by range(id)
2 (
3 partition p1 values less than(5),
4 partition p2 values less than(10),
5 partition p3 values less than(15)
6 );
Table created.
SQL> insert into t values(1);
1 row created.
SQL> insert into t values(6);
1 row created.
SQL> insert into t values(11);
1 row created.
SQL> commit;
Commit complete.
SQL> create index ind_t on t(id) local;
Index created.
SQL> exec dbms_stats.gather_table_stats(user,'t',cascade=>true);
PL/SQL procedure successfully completed.
First we create a partitioned table T with three partitions P1, P2 and P3, insert one row into each partition, create a local index on the table, and then gather statistics on it. With the granularity parameter at its default, the DBMS_STATS package analyzes both the table (global) level and the partition level (the value 'PARTITION'):
SQL> select num_rows,avg_row_len,last_analyzed from user_tables where table_name='T';
NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
---------- ----------- ------------
3 3 01-9月 -12
SQL> select partition_name,num_rows,avg_row_len,last_analyzed from user_tab_partitions where
2 table_name='T';
PARTITION_NAME
--------------------------------------------------------------------------------
NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
---------- ----------- ------------
P1
1 3 01-9月 -12
P2
1 3 01-9月 -12
P3
1 3 01-9月 -12
SQL> col PARTITION_NAME for a20
SQL> /
PARTITION_NAME NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 1 3 01-9月 -12
P2 1 3 01-9月 -12
P3 1 3 01-9月 -12
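As an aside, the default granularity in effect can be confirmed from the statistics preferences; a minimal sketch using dbms_stats.get_prefs (a standard 11g function, not part of the original transcript):

-- Shows the granularity preference that applies to table T; the default is 'AUTO'
select dbms_stats.get_prefs('GRANULARITY', user, 'T') from dual;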
With CASCADE set to TRUE, the corresponding index is also analyzed at both the global and the partition level:
SQL> select BLevel,leaf_blocks,last_analyzed from user_indexes where index_name='IND_T';
BLEVEL LEAF_BLOCKS LAST_ANALYZE
---------- ----------- ------------
0 3 01-9月 -12
SQL> select partition_name,BLevel,leaf_blocks,last_analyzed from user_ind_partitions where index_name='IND_T';
PARTITION_NAME BLEVEL LEAF_BLOCKS LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 0 1 01-9月 -12
P2 0 1 01-9月 -12
P3 0 1 01-9月 -12
Now add a new partition to the table, load 10,000 rows into it, and make the data distribution in this partition severely skewed:
SQL> alter table t add partition pm values less than(maxvalue);
Table altered.
SQL> begin for i in 1..10000 loop
2 insert into t values(16);
3 end loop;
4 end;
5 /
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> update t set id=10000 where id=16 and rownum=1;
1 row updated.
SQL> commit;
Commit complete.
SQL> select count(*) from t;
COUNT(*)
----------
10003
SQL> select * from t partition(p1);
ID
----------
1
SQL> select * from t partition(p2);
ID
----------
6
SQL> select * from t partition(p3);
ID
----------
11
SQL> select id,count(*) from t partition(pm) group by id;
ID COUNT(*)
---------- ----------
10000 1
16 9999
The queries above show the rows in each partition. Note that partition PM holds 9,999 rows with the value 16 and a single row with the value 10000.
SQL> select partition_name, num_rows,avg_row_len,last_analyzed from user_tab_partitions where table_name='T';
PARTITION_NAME NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 1 3 01-9月 -12
P2 1 3 01-9月 -12
P3 1 3 01-9月 -12
PM
SQL> select num_rows,avg_row_len,last_analyzed from user_tables where table_name
2 ='T';
NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
---------- ----------- ------------
3 3 01-9月 -12
We can see that the new partition PM has not been analyzed, and the table's global statistics still report only 3 rows in the table. Let's run a SQL statement and see what the CBO decides:
SQL> set linesize 1000
SQL> set autotrace traceonly
SQL> select * from t where id=16;
9999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 657481967
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 1 | 3 | 1 (0)| 00:00:01 | 4 | 4 |
|* 2 | INDEX RANGE SCAN | IND_T | 1 | 3 | 1 (0)| 00:00:01 | 4 | 4 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=16)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
713 consistent gets
0 physical reads
1496 redo size
172445 bytes sent via SQL*Net to client
7850 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
9999 rows processed
Based on the table's statistics, the CBO estimates that only 1 row will be returned. That is far from reality: we know there are actually 9,999 rows with ID=16. Because the newly added partition was never analyzed, the CBO made the wrong judgment.
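It is easy to confirm that the gap really is the missing partition statistics; the following is a hedged sketch against the standard dictionary views, not part of the original transcript:

-- Partition-level table statistics: PM still shows NULL NUM_ROWS/LAST_ANALYZED at this point
select partition_name, num_rows, last_analyzed
  from user_tab_partitions
 where table_name = 'T';
-- Column statistics per partition: no populated row for PM means the CBO has no idea
-- how the ID values are distributed inside the new partition
select partition_name, num_distinct, num_buckets, last_analyzed
  from user_part_col_statistics
 where table_name = 'T' and column_name = 'ID';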
So what happens if we gather statistics only at the partition level, without refreshing the table-level statistics?
SQL> set autotrace off
SQL> exec dbms_stats.gather_table_stats(user,'t',partname=>'pm',granularity=>'partition');
PL/SQL procedure successfully completed.
SQL> select partition_name, num_rows,avg_row_len,last_analyzed from user_tab_partitions where
2 table_name='T';
PARTITION_NAME NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 1 3 01-9月 -12
P2 1 3 01-9月 -12
P3 1 3 01-9月 -12
PM 10000 3 01-9月 -12
Here granularity is set to 'PARTITION', so Oracle gathers statistics only at the partition level; the table (GLOBAL) level is not analyzed:
SQL> select num_rows,avg_row_len,last_analyzed from user_tables where table_name='T';
NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
---------- ----------- ------------
3 3 01-9月 -12
Partition PM's statistics have been updated and now show 10,000 rows, while the table-level statistics still report only 3 rows for table T.
SQL> alter table t add object_name varchar2(10);
Table altered.
SQL> update t set object_name='AAAA';
10003 rows updated.
SQL> commit
2 ;
Commit complete.
SQL> create index ind2_t on t(object_name);
Index created.
SQL> set autotrace traceonly
SQL> select * from t where object_name='AAAA';
10003 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4057582461
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 5 (0)| 00:00:01 | | |
| 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| T | 1 | 3 | 5 (0)| 00:00:01 | ROWID | ROWID |
|* 2 | INDEX RANGE SCAN | IND2_T | 40 | | 28 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_NAME"='AAAA')
Statistics
----------------------------------------------------------
24 recursive calls
0 db block gets
8822 consistent gets
28 physical reads
0 redo size
277692 bytes sent via SQL*Net to client
7850 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10003 rows processed
The CBO estimates a result set of 1 row, so it chooses the index. Thus even after the partition has been analyzed, failing to refresh the table-level (global) statistics at the same time can still lead the CBO to a wrong execution plan.
Now let's gather the global statistics and see how the CBO chooses the execution plan:
SQL> exec dbms_stats.gather_table_stats(user,'t',granularity=>'GLOBAL');
PL/SQL procedure successfully completed.
SQL> set autotrace off
SQL> select partition_name, num_rows,avg_row_len,last_analyzed from user_tab_partitions where table_name='T';
PARTITION_NAME NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 1 3 01-9月 -12
P2 1 3 01-9月 -12
P3 1 3 01-9月 -12
PM 10000 3 01-9月 -12
After the analysis completes, the table's row count in the statistics reflects the newly loaded data, which shows that the global table statistics have been refreshed. Look at the execution plan of the same SQL again:
SQL> set autotrace traceonly
SQL> select * from t where object_name='AAAA';
10003 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3557914527
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10003 | 80024 | 17 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE ALL| | 10003 | 80024 | 17 (0)| 00:00:01 | 1 | 4 |
|* 2 | TABLE ACCESS FULL | T | 10003 | 80024 | 17 (0)| 00:00:01 | 1 | 4 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OBJECT_NAME"='AAAA')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
731 consistent gets
0 physical reads
0 redo size
172557 bytes sent via SQL*Net to client
7850 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10003 rows processed
This time the CBO estimates a result set of 10,003 rows, so it chooses a full table scan without hesitation. In other words, even when partition-level statistics exist, stale global table statistics can still push the CBO into a wrong execution plan.
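In practice, when a new partition is loaded both levels are usually refreshed together; a minimal sketch using the documented granularity value 'GLOBAL AND PARTITION', restricted to the new partition so the old partitions are not re-analyzed:

-- Gather statistics for partition PM and refresh the global (table-level) statistics in one call
exec dbms_stats.gather_table_stats(user, 't', partname => 'pm', granularity => 'GLOBAL AND PARTITION', cascade => true);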
What about the opposite case: how is the CBO affected if we gather only global statistics and never analyze the partition? First, restore partition PM's statistics to the state before the data was loaded, as follows:
SQL> delete from t where id>15;
10000 rows deleted.
Execution Plan
----------------------------------------------------------
Plan hash value: 1929529978
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 10000 | 30000 | 7 (0)| 00:00:01 | | |
| 1 | DELETE | T | | | | | | |
| 2 | PARTITION RANGE SINGLE| | 10000 | 30000 | 7 (0)| 00:00:01 | 4 | 4 |
|* 3 | TABLE ACCESS FULL | T | 10000 | 30000 | 7 (0)| 00:00:01 | 4 | 4 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("ID">15)
Statistics
----------------------------------------------------------
24 recursive calls
54851 db block gets
48 consistent gets
0 physical reads
7668236 redo size
850 bytes sent via SQL*Net to client
782 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
10000 rows processed
SQL> set autotrace off
SQL> commit;
Commit complete.
SQL> delete from t where id>15;
0 rows deleted.
SQL> exec dbms_stats.gather_table_stats(user,'t');
PL/SQL procedure successfully completed.
SQL> select partition_name, num_rows,avg_row_len,last_analyzed from user_tab_partitions where
2 table_name='T';
PARTITION_NAME NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 1 8 01-9月 -12
P2 1 8 01-9月 -12
P3 1 8 01-9月 -12
PM 0 0 01-9月 -12
SQL> select num_rows,avg_row_len,last_analyzed from user_tables where table_name='T';
NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
---------- ----------- ------------
3 8 01-9月 -12
SQL> select count(*) from t;
COUNT(*)
----------
3
SQL> alter table t drop column object_name;
Table altered.
SQL> begin
2 for i in 1..10000 loop
3 insert into t values(16);
4 end loop;
5 end;
6 /
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> update t set id=10000 where id=16 and rownum=1;
1 row updated.
SQL> commit;
Commit complete.
Gather global statistics on the table, without analyzing the new partition:
SQL> exec dbms_stats.gather_table_stats(user,'t',granularity=>'global',method_opt=>'for all columns size 1');
PL/SQL procedure successfully completed.
Setting method_opt=>'for all columns size 1' tells Oracle not to build histograms on the columns, because at this point we only want the global statistics.
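Whether histograms were actually skipped can be verified in the dictionary; a small sketch against USER_TAB_COL_STATISTICS (its HISTOGRAM column reports NONE when only one bucket is gathered):

-- With method_opt => 'for all columns size 1' each column gets one bucket and no histogram
select column_name, num_distinct, num_buckets, histogram
  from user_tab_col_statistics
 where table_name = 'T';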
SQL> select num_rows,avg_row_len,last_analyzed from user_tables where table_name='T';
NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
---------- ----------- ------------
10003 3 01-9月 -12
SQL> select partition_name, num_rows,avg_row_len,last_analyzed from user_tab_partitions where table_name='T';
PARTITION_NAME NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 1 8 01-9月 -12
P2 1 8 01-9月 -12
P3 1 8 01-9月 -12
PM 0 0 01-9月 -12
The global statistics have been updated, while the partition statistics have not.
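Both levels can also be compared side by side in a single dictionary view; a hedged sketch using USER_TAB_STATISTICS, which carries one row for the table and one per partition plus a staleness flag:

-- OBJECT_TYPE = 'TABLE' is the global row; 'PARTITION' rows show each partition's statistics
select object_type, partition_name, num_rows, stale_stats, last_analyzed
  from user_tab_statistics
 where table_name = 'T';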
SQL> set autotrace traceonly exp;
SQL> select * from t where id=16;
Execution Plan
----------------------------------------------------------
Plan hash value: 657481967
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 1 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 1 | 13 | 1 (0)| 00:00:01 | 4 | 4 |
|* 2 | INDEX RANGE SCAN | IND_T | 1 | 13 | 1 (0)| 00:00:01 | 4 | 4 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=16)
Although the CBO knows that table T has 10,003 rows, partition PM was never analyzed (and has no histograms), so the CBO cannot tell how many rows have ID=16. It therefore misestimates the result set as 1 row and chooses the index.
The examples above show that when new data enters the table, how that data is analyzed, in other words how the granularity value is chosen, is a very important question. They also show that whenever some part of the statistics is left ungathered, the CBO may end up with a wrong execution plan.
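On 11g, one way out of this trade-off, offered here as a suggestion rather than something tested in this article, is incremental global statistics: once the table preference INCREMENTAL is set to TRUE, gathering the new partition lets Oracle maintain accurate global statistics from per-partition synopses instead of rescanning the whole table. A minimal sketch:

-- Enable incremental statistics maintenance for table T (11g feature)
exec dbms_stats.set_table_prefs(user, 't', 'INCREMENTAL', 'TRUE');
-- Gathering the freshly loaded partition now also keeps the global statistics accurate
exec dbms_stats.gather_table_stats(user, 't', partname => 'pm', granularity => 'AUTO');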
Source: ITPUB blog, link: http://blog.itpub.net/10701850/viewspace-1246587/. Please credit the source when reproducing.