An experiment with the granularity parameter of dbms_stats.gather_table_stats
In OLAP or data-warehouse environments it is common to create a new partition, load a batch of data (usually a very large one) into it, gather statistics on that partition, and then run reports or data mining. Ideally, statistics would be gathered at both the table (global) level and the partition level, giving the optimizer the most complete information. But such tables are usually very large, and regathering global statistics every time a partition is added would consume enormous system resources. If, on the other hand, only the new partition is analyzed and no global statistics are gathered, Oracle's table-wide information becomes inaccurate: for example, how many rows does the table have? That number will be wrong, and it can easily lead the CBO to produce a bad execution plan. The following example demonstrates the problem.
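For orientation, the granularity parameter of dbms_stats.gather_table_stats accepts the documented values 'ALL', 'AUTO' (the default), 'GLOBAL', 'PARTITION', 'SUBPARTITION' and 'GLOBAL AND PARTITION'. The experiment below exercises three forms of the call, previewed here (run as the table owner, with the table and partition names used in this example); the full session transcript then follows:

-- Default granularity (AUTO): gathers global and partition statistics, plus index statistics via cascade
exec dbms_stats.gather_table_stats(user, 't', cascade => true);
-- Only the statistics of one partition; global statistics are left untouched
exec dbms_stats.gather_table_stats(user, 't', partname => 'pm', granularity => 'PARTITION');
-- Only the table-level (global) statistics
exec dbms_stats.gather_table_stats(user, 't', granularity => 'GLOBAL');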
[oracle@localhost ~]$ clear
[oracle@localhost ~]$ sqlplus /nolog
SQL*Plus: Release 11.2.0.1.0 Production on 星期六 9月 1 10:17:42 2012
Copyright (c) 1982, 2009, Oracle. All rights reserved.
SQL> conn change/change
Connected.
SQL> create table t(id int) partition by range(id)
2 (
3 partition p1 values less than(5),
4 partition p2 values less than(10),
5 partition p3 values less than(15)
6 );
Table created.
SQL> insert into t values(1);
1 row created.
SQL> insert into t values(6);
1 row created.
SQL> insert into t values(11);
1 row created.
SQL> commit;
Commit complete.
SQL> create index ind_t on t(id) local;
Index created.
SQL> exec dbms_stats.gather_table_stats(user,'t',cascade=>true);
PL/SQL procedure successfully completed.
First we create a partitioned table T with three partitions P1, P2 and P3, insert one row into each partition, create a local index on the table, and then gather statistics on it. With the granularity parameter at its default, the DBMS_STATS package analyzes both the table (global) level and the partition level (the value 'PARTITION'):
SQL> select num_rows,avg_row_len,last_analyzed from user_tables where table_name='T';
NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
---------- ----------- ------------
3 3 01-9月 -12
SQL> select partition_name,num_rows,avg_row_len,last_analyzed from user_tab_partitions where
2 table_name='T';
PARTITION_NAME
--------------------------------------------------------------------------------
NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
---------- ----------- ------------
P1
1 3 01-9月 -12
P2
1 3 01-9月 -12
P3
1 3 01-9月 -12
SQL> col PARTITION_NAME for a20
SQL> /
PARTITION_NAME NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 1 3 01-9月 -12
P2 1 3 01-9月 -12
P3 1 3 01-9月 -12
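As an aside, the default granularity in effect can be confirmed from the statistics preferences; a minimal sketch using dbms_stats.get_prefs (a standard 11g function, not part of the original transcript):

-- Shows the granularity preference that applies to table T; the default is 'AUTO'
select dbms_stats.get_prefs('GRANULARITY', user, 'T') from dual;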
With CASCADE set to TRUE, the corresponding index is also analyzed at both the global and the partition level:
SQL> select BLevel,leaf_blocks,last_analyzed from user_indexes where index_name='IND_T';
BLEVEL LEAF_BLOCKS LAST_ANALYZE
---------- ----------- ------------
0 3 01-9月 -12
SQL> select partition_name,BLevel,leaf_blocks,last_analyzed from user_ind_partitions where index_name='IND_T';
PARTITION_NAME BLEVEL LEAF_BLOCKS LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 0 1 01-9月 -12
P2 0 1 01-9月 -12
P3 0 1 01-9月 -12
Now add a new partition to the table, load 10,000 rows into it, and make the data distribution in this partition severely skewed:
SQL> alter table t add partition pm values less than(maxvalue);
Table altered.
SQL> begin for i in 1..10000 loop
2 insert into t values(16);
3 end loop;
4 end;
5 /
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> update t set id=10000 where id=16 and rownum=1;
1 row updated.
SQL> commit;
Commit complete.
SQL> select count(*) from t;
COUNT(*)
----------
10003
SQL> select * from t partition(p1);
ID
----------
1
SQL> select * from t partition(p2);
ID
----------
6
SQL> select * from t partition(p3);
ID
----------
11
SQL> select id,count(*) from t partition(pm) group by id;
ID COUNT(*)
---------- ----------
10000 1
16 9999
The queries above show the rows in each partition. Note that partition PM holds 9,999 rows with the value 16 and a single row with the value 10000.
SQL> select partition_name, num_rows,avg_row_len,last_analyzed from user_tab_partitions where table_name='T';
PARTITION_NAME NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 1 3 01-9月 -12
P2 1 3 01-9月 -12
P3 1 3 01-9月 -12
PM
SQL> select num_rows,avg_row_len,last_analyzed from user_tables where table_name
2 ='T';
NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
---------- ----------- ------------
3 3 01-9月 -12
We can see that the new partition PM has not been analyzed, and the table's global statistics still report only 3 rows in the table. Let's run a SQL statement and see what the CBO decides:
SQL> set linesize 1000
SQL> set autotrace traceonly
SQL> select * from t where id=16;
9999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 657481967
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 1 | 3 | 1 (0)| 00:00:01 | 4 | 4 |
|* 2 | INDEX RANGE SCAN | IND_T | 1 | 3 | 1 (0)| 00:00:01 | 4 | 4 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=16)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
713 consistent gets
0 physical reads
1496 redo size
172445 bytes sent via SQL*Net to client
7850 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
9999 rows processed
Based on the table's statistics, the CBO estimates that only 1 row will be returned. That is far from reality: we know there are actually 9,999 rows with ID=16. Because the newly added partition was never analyzed, the CBO made the wrong judgment.
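It is easy to confirm that the gap really is the missing partition statistics; the following is a hedged sketch against the standard dictionary views, not part of the original transcript:

-- Partition-level table statistics: PM still shows NULL NUM_ROWS/LAST_ANALYZED at this point
select partition_name, num_rows, last_analyzed
  from user_tab_partitions
 where table_name = 'T';
-- Column statistics per partition: no populated row for PM means the CBO has no idea
-- how the ID values are distributed inside the new partition
select partition_name, num_distinct, num_buckets, last_analyzed
  from user_part_col_statistics
 where table_name = 'T' and column_name = 'ID';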
So what happens if we gather statistics only at the partition level, without refreshing the table-level statistics?
SQL> set autotrace off
SQL> exec dbms_stats.gather_table_stats(user,'t',partname=>'pm',granularity=>'partition');
PL/SQL procedure successfully completed.
SQL> select partition_name, num_rows,avg_row_len,last_analyzed from user_tab_partitions where
2 table_name='T';
PARTITION_NAME NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 1 3 01-9月 -12
P2 1 3 01-9月 -12
P3 1 3 01-9月 -12
PM 10000 3 01-9月 -12
Here granularity is set to 'PARTITION', so Oracle gathers statistics only at the partition level; the table (GLOBAL) level is not analyzed:
SQL> select num_rows,avg_row_len,last_analyzed from user_tables where table_name='T';
NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
---------- ----------- ------------
3 3 01-9月 -12
Partition PM's statistics have been updated and now show 10,000 rows, while the table-level statistics still report only 3 rows for table T.
SQL> alter table t add object_name varchar2(10);
Table altered.
SQL> update t set object_name='AAAA';
10003 rows updated.
SQL> commit
2 ;
Commit complete.
SQL> create index ind2_t on t(object_name);
Index created.
SQL> set autotrace traceonly
SQL> select * from t where object_name='AAAA';
10003 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4057582461
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 5 (0)| 00:00:01 | | |
| 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| T | 1 | 3 | 5 (0)| 00:00:01 | ROWID | ROWID |
|* 2 | INDEX RANGE SCAN | IND2_T | 40 | | 28 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_NAME"='AAAA')
Statistics
----------------------------------------------------------
24 recursive calls
0 db block gets
8822 consistent gets
28 physical reads
0 redo size
277692 bytes sent via SQL*Net to client
7850 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10003 rows processed
The CBO estimates a result set of 1 row, so it chooses the index. Thus even after the partition has been analyzed, failing to refresh the table-level (global) statistics at the same time can still lead the CBO to a wrong execution plan.
Now let's gather the global statistics and see how the CBO chooses the execution plan:
SQL> exec dbms_stats.gather_table_stats(user,'t',granularity=>'GLOBAL');
PL/SQL procedure successfully completed.
SQL> set autotrace off
SQL> select partition_name, num_rows,avg_row_len,last_analyzed from user_tab_partitions where table_name='T';
PARTITION_NAME NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 1 3 01-9月 -12
P2 1 3 01-9月 -12
P3 1 3 01-9月 -12
PM 10000 3 01-9月 -12
After the analysis completes, the table's row count in the statistics reflects the newly loaded data, which shows that the global table statistics have been refreshed. Look at the execution plan of the same SQL again:
SQL> set autotrace traceonly
SQL> select * from t where object_name='AAAA';
10003 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3557914527
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10003 | 80024 | 17 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE ALL| | 10003 | 80024 | 17 (0)| 00:00:01 | 1 | 4 |
|* 2 | TABLE ACCESS FULL | T | 10003 | 80024 | 17 (0)| 00:00:01 | 1 | 4 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OBJECT_NAME"='AAAA')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
731 consistent gets
0 physical reads
0 redo size
172557 bytes sent via SQL*Net to client
7850 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10003 rows processed
This time the CBO estimates a result set of 10,003 rows, so it chooses a full table scan without hesitation. In other words, even when partition-level statistics exist, stale global table statistics can still push the CBO into a wrong execution plan.
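In practice, when a new partition is loaded both levels are usually refreshed together; a minimal sketch using the documented granularity value 'GLOBAL AND PARTITION', restricted to the new partition so the old partitions are not re-analyzed:

-- Gather statistics for partition PM and refresh the global (table-level) statistics in one call
exec dbms_stats.gather_table_stats(user, 't', partname => 'pm', granularity => 'GLOBAL AND PARTITION', cascade => true);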
What about the opposite case: how is the CBO affected if we gather only global statistics and never analyze the partition? First, restore partition PM's statistics to the state before the data was loaded, as follows:
SQL> delete from t where id>15;
10000 rows deleted.
Execution Plan
----------------------------------------------------------
Plan hash value: 1929529978
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 10000 | 30000 | 7 (0)| 00:00:01 | | |
| 1 | DELETE | T | | | | | | |
| 2 | PARTITION RANGE SINGLE| | 10000 | 30000 | 7 (0)| 00:00:01 | 4 | 4 |
|* 3 | TABLE ACCESS FULL | T | 10000 | 30000 | 7 (0)| 00:00:01 | 4 | 4 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("ID">15)
Statistics
----------------------------------------------------------
24 recursive calls
54851 db block gets
48 consistent gets
0 physical reads
7668236 redo size
850 bytes sent via SQL*Net to client
782 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
10000 rows processed
SQL> set autotrace off
SQL> commit;
Commit complete.
SQL> delete from t where id>15;
0 rows deleted.
SQL> exec dbms_stats.gather_table_stats(user,'t');
PL/SQL procedure successfully completed.
SQL> select partition_name, num_rows,avg_row_len,last_analyzed from user_tab_partitions where
2 table_name='T';
PARTITION_NAME NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 1 8 01-9月 -12
P2 1 8 01-9月 -12
P3 1 8 01-9月 -12
PM 0 0 01-9月 -12
SQL> select num_rows,avg_row_len,last_analyzed from user_tables where table_name='T';
NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
---------- ----------- ------------
3 8 01-9月 -12
SQL> select count(*) from t;
COUNT(*)
----------
3
SQL> alter table t drop column object_name;
Table altered.
SQL> begin
2 for i in 1..10000 loop
3 insert into t values(16);
4 end loop;
5 end;
6 /
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> update t set id=10000 where id=16 and rownum=1;
1 row updated.
SQL> commit;
Commit complete.
Gather global statistics on the table, without analyzing the new partition:
SQL> exec dbms_stats.gather_table_stats(user,'t',granularity=>'global',method_opt=>'for all columns size 1');
PL/SQL procedure successfully completed.
Setting method_opt=>'for all columns size 1' tells Oracle not to build histograms on the columns, because at this point we only want the global statistics.
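Whether histograms were actually skipped can be verified in the dictionary; a small sketch against USER_TAB_COL_STATISTICS (its HISTOGRAM column reports NONE when only one bucket is gathered):

-- With method_opt => 'for all columns size 1' each column gets one bucket and no histogram
select column_name, num_distinct, num_buckets, histogram
  from user_tab_col_statistics
 where table_name = 'T';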
SQL> select num_rows,avg_row_len,last_analyzed from user_tables where table_name='T';
NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
---------- ----------- ------------
10003 3 01-9月 -12
SQL> select partition_name, num_rows,avg_row_len,last_analyzed from user_tab_partitions where table_name='T';
PARTITION_NAME NUM_ROWS AVG_ROW_LEN LAST_ANALYZE
-------------------- ---------- ----------- ------------
P1 1 8 01-9月 -12
P2 1 8 01-9月 -12
P3 1 8 01-9月 -12
PM 0 0 01-9月 -12
The global statistics have been updated, while the partition statistics have not.
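Both levels can also be compared side by side in a single dictionary view; a hedged sketch using USER_TAB_STATISTICS, which carries one row for the table and one per partition plus a staleness flag:

-- OBJECT_TYPE = 'TABLE' is the global row; 'PARTITION' rows show each partition's statistics
select object_type, partition_name, num_rows, stale_stats, last_analyzed
  from user_tab_statistics
 where table_name = 'T';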
SQL> set autotrace traceonly exp;
SQL> select * from t where id=16;
Execution Plan
----------------------------------------------------------
Plan hash value: 657481967
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 1 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 1 | 13 | 1 (0)| 00:00:01 | 4 | 4 |
|* 2 | INDEX RANGE SCAN | IND_T | 1 | 13 | 1 (0)| 00:00:01 | 4 | 4 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=16)
Although the CBO knows that table T has 10,003 rows, partition PM was never analyzed (and has no histograms), so the CBO cannot tell how many rows have ID=16. It therefore misestimates the result set as 1 row and chooses the index.
The examples above show that when new data enters the table, how that data is analyzed, in other words how the granularity value is chosen, is a very important question. They also show that whenever some part of the statistics is left ungathered, the CBO may end up with a wrong execution plan.
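On 11g, one way out of this trade-off, offered here as a suggestion rather than something tested in this article, is incremental global statistics: once the table preference INCREMENTAL is set to TRUE, gathering the new partition lets Oracle maintain accurate global statistics from per-partition synopses instead of rescanning the whole table. A minimal sketch:

-- Enable incremental statistics maintenance for table T (11g feature)
exec dbms_stats.set_table_prefs(user, 't', 'INCREMENTAL', 'TRUE');
-- Gathering the freshly loaded partition now also keeps the global statistics accurate
exec dbms_stats.gather_table_stats(user, 't', partname => 'pm', granularity => 'AUTO');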
Source: ITPUB blog, link: http://blog.itpub.net/10701850/viewspace-1246587/. Please credit the source when reproducing.