引言：“資料壓縮”以前對我來說還是比較新鮮的詞，並不是沒有聽說過，而是沒有實際使用過，之前一直做專案經理工作上也設計到資料庫的運維，但由於儲存設計的比較充裕，在加上效能運轉的還能讓客戶接受，所以壓縮技術基本上沒怎麼用，當時也怕對DML操作有負面影響！之所以現在要實驗這方面的技術，也是因為二期我們的資料量暴增，對機器對效能對運維都用一定的壓力和衝擊，這也說明了這門技術是在特定環境、特定場合下來使用的，下面我們來詳細講一講如何去使用這門技術---海量資料之利器“壓縮compress”

一、應用場合
答：正如上面所說，壓縮技術一般應用在“海量資料”的範疇，在大資料量下並且重複率高的情境下應該的效果才好，適合於OLAP（報表系統）不適合OLTP（線上交易系統），當資料量小時作用還不是很大，只要資料量達到一定程度的時候使用這門技術才有意義。
二、壓縮技術的好處
1.節約空間：我認為壓縮技術的最大好處就是節約空間，在儲存裝置空間很緊張的情況下更需要使用壓縮技術來緩解
2.減少memory/IO開銷：當使用了壓縮技術後，同等記錄對應的資料塊相對較少，自然而然IO開銷就小了
3.提高資料處理效率：原本10個資料塊的，經過壓縮後變成5個資料塊，但還是那些資料，可效率提高了一倍。
三、壓縮原理
答：我們都用過rar壓縮zip壓縮，其實這些壓縮的原理都很相似，當然在細微處理上有不同，資料塊壓縮原理上是使用長度較短符號來代替列中的每個值，由於符號佔用空間小於實際值，從而達到節約空間的效果，當列中重複值越多壓縮效果越好。
四、資料壓縮不適合OLTP系統
1.OLTP是一種多事務短時間片系統，有大量的DML操作，而DML操作我們都知道是有鎖lock的（insert update delete）當資料塊中記錄數多起來時，導致訪問這個資料塊的使用者就多，就是產生爭用和等待，從而降低了效率。
2.資料插入、更新和索引鍵值更新時，都要先解壓縮再更新，等待的時間會更長。
五、分割槽表的壓縮
1.分割槽表的每個分割槽都可以獨立壓縮
2.有些時候，分割槽表在建立時不設定分割槽壓縮屬性，當某個分割槽需要壓縮時，再獨立進行壓縮
3.如果在表級設定壓縮屬性的話，那麼所有分割槽都會啟動壓縮功能，反正如果是某幾個分割槽獨立壓縮，表級屬性上什麼也不寫的。
實驗
create table leo_partition1 (object_id,object_name) 建立一個六個分割槽的分割槽表，沒有指定壓縮
partition by range (object_id)
(
partition leo_p1 values less than (1000),
partition leo_p2 values less than (2000),
partition leo_p3 values less than (3000),
partition leo_p4 values less than (4000),
partition leo_p5 values less than (5000),
partition leo_max values less than (maxvalue)
)
as select object_id,object_name from dba_objects;
> select count(*) from leo_partition1;

COUNT(*)
----------
10351

> select partition_name,compression from user_tab_partitions where table_name='LEO_PARTITION1';

PARTITION_NAME                 COMPRESS
------------------------------ --------
LEO_MAX                        DISABLED
LEO_P1                         DISABLED
LEO_P2                         DISABLED
LEO_P3                         DISABLED
LEO_P4                         DISABLED
LEO_P5                         DISABLED

> alter table leo_partition1 compress; 例如定義表級壓縮屬性，此時所有分割槽都壓縮

> alter table leo_partition1 modify partition leo_p1 compress; 我們測試準對leo_p1 和 leo_p5分割槽進行壓縮，來說明分割槽可以獨立壓縮

Table altered.

> alter table leo_partition1 modify partition leo_p5 compress;

Table altered.

> select partition_name,compression from user_tab_partitions where table_name='LEO_PARTITION1';

PARTITION_NAME                 COMPRESS
------------------------------ -----------------------
LEO_MAX                       DISABLED
LEO_P1                         ENABLED
LEO_P2                         DISABLED
LEO_P3                         DISABLED
LEO_P4                         DISABLED
LEO_P5                         ENABLED

6 rows selected.

> alter table leo_partition1 modify partition leo_p5 nocompress; 我們還可以取消壓縮

Table altered.

> select partition_name,compression from user_tab_partitions where table_name='LEO_PARTITION1'; leo_p5分割槽已經取消壓縮

PARTITION_NAME                 COMPRESS
------------------------------ -----------------------
LEO_MAX                      DISABLED
LEO_P1                         ENABLED
LEO_P2                         DISABLED
LEO_P3                         DISABLED
LEO_P4                         DISABLED
LEO_P5                         DISABLED

六、分割槽索引的壓縮
1.分割槽表索引中不能對單個獨立分割槽索引設定壓縮，要麼整體壓縮要麼不壓縮。
2.如果建立分割槽表索引時未設定壓縮，以後都不能在設定了
> create index leo_partition1_index on leo_partition1(object_id) local; 建立時沒有設定壓縮

Index created.

> select index_name,compression from user_indexes where table_name='LEO_PARTITION1'; 分割槽表索引沒有壓縮

INDEX_NAME COMPRESS
------------------------------ --------
LEO_PARTITION1_INDEX DISABLED

> select partition_name,compression from user_ind_partitions where index_name='LEO_PARTITION1_INDEX'; 獨立分割槽索引也沒有壓縮

PARTITION_NAME                 COMPRESS
------------------------------ --------
LEO_MAX                        DISABLED
LEO_P1                         DISABLED
LEO_P2                         DISABLED
LEO_P3                         DISABLED
LEO_P4                         DISABLED
LEO_P5                         DISABLED

> alter index leo_partition1_index rebuild partition leo_p5 compress;
alter index leo_partition1_index rebuild partition leo_p5 compress
*
ERROR at line 1:
ORA-28659: COMPRESS must be specified at object level first 報錯，壓縮必須首先在物件級別指定

3.必須建立分割槽表索引時一次性指定好壓縮屬性，以後才能對獨立分割槽索引壓縮排行修改
> create index leo_partition1_index on leo_partition1(object_id) local compress; 建立時指定
Index created.
> select index_name,compression from user_indexes where table_name='LEO_PARTITION1'; 分割槽表索引已壓縮

INDEX_NAME COMPRESS
------------------------------ --------
LEO_PARTITION1_INDEX ENABLED

> select partition_name,index_name,compression from user_ind_partitions where index_name='LEO_PARTITION1_INDEX';獨立分割槽索引也壓

縮

PARTITION_NAME                 INDEX_NAME                     COMPRESS
------------------------------ ------------------------------ --------
LEO_MAX                        LEO_PARTITION1_INDEX           ENABLED
LEO_P1                         LEO_PARTITION1_INDEX           ENABLED
LEO_P2                         LEO_PARTITION1_INDEX           ENABLED
LEO_P3                         LEO_PARTITION1_INDEX           ENABLED
LEO_P4                         LEO_PARTITION1_INDEX           ENABLED
LEO_P5                         LEO_PARTITION1_INDEX           ENABLED
6 rows selected.

> alter index leo_partition1_index rebuild partition leo_p3 nocompress; 獨立分割槽索引取消壓縮
Index altered.
> alter index leo_partition1_index rebuild partition leo_p4 nocompress;
Index altered.
> alter index leo_partition1_index rebuild partition leo_p5 nocompress;
Index altered.
> select partition_name,index_name,compression from user_ind_partitions where index_name='LEO_PARTITION1_INDEX';

PARTITION_NAME                 INDEX_NAME                     COMPRESS
------------------------------ ------------------------------ --------
LEO_MAX                        LEO_PARTITION1_INDEX           ENABLED
LEO_P1                         LEO_PARTITION1_INDEX           ENABLED
LEO_P2                         LEO_PARTITION1_INDEX           ENABLED
LEO_P3                         LEO_PARTITION1_INDEX           DISABLED
LEO_P4                         LEO_PARTITION1_INDEX           DISABLED
LEO_P5                         LEO_PARTITION1_INDEX           DISABLED

七、資料壓縮對空間影響
1.節約空間：資料壓縮會節約大量儲存空間，我們建立一個壓縮表和一個普通表，來對比節約空間的效果
> create table leo_compress_table compress as select * from dba_objects; 壓縮表

Table created.

> create table leo_nocompress_table as select * from dba_objects; 普通表

Table created.

> select segment_name,extent_id,bytes from user_extents where segment_name in ('LEO_COMPRESS_TABLE','LEO_NOCOMPRESS_TABLE');

SEGMENT_NAME            EXTENT_ID      BYTES
---------------------- ---------- ----------
LEO_COMPRESS_TABLE              0      65536
LEO_COMPRESS_TABLE              1      65536
LEO_COMPRESS_TABLE              2      65536
LEO_COMPRESS_TABLE              3      65536
LEO_COMPRESS_TABLE              4      65536
LEO_COMPRESS_TABLE              5      65536
LEO_COMPRESS_TABLE              6      65536      同樣資料量，壓縮表佔用7個區，使用0.43M空間

LEO_NOCOMPRESS_TABLE            0      65536
LEO_NOCOMPRESS_TABLE            1      65536
LEO_NOCOMPRESS_TABLE            2      65536
LEO_NOCOMPRESS_TABLE            3      65536
LEO_NOCOMPRESS_TABLE            4      65536
LEO_NOCOMPRESS_TABLE            5      65536
LEO_NOCOMPRESS_TABLE            6      65536
LEO_NOCOMPRESS_TABLE            7      65536
LEO_NOCOMPRESS_TABLE            8      65536
LEO_NOCOMPRESS_TABLE            9      65536
LEO_NOCOMPRESS_TABLE           10      65536
LEO_NOCOMPRESS_TABLE           11      65536
LEO_NOCOMPRESS_TABLE           12      65536
LEO_NOCOMPRESS_TABLE           13      65536
LEO_NOCOMPRESS_TABLE           14      65536
LEO_NOCOMPRESS_TABLE           15      65536
LEO_NOCOMPRESS_TABLE           16    1048576      同樣資料量，普通表佔用17個區，使用2M空間

小結：我們可以看出，普通表空間是壓縮表空間的4倍多，顯然壓縮可以節約大量的儲存空間，這在磁碟空間不足的情況下很有作用

2.索引壓縮比：索引鍵值重複率越高，壓縮比越大，索引鍵值重複率越低，壓縮比越小
我們來比較一下，列中資料重複率的高低對壓縮比的影響

> create table leo_compress_table1 as select object_id,object_name from dba_objects; object_id列沒有重複值

Table created.

> create table leo_compress_table2 as select 2 object_id,object_name from dba_objects; object_id列全是2，完全重複

Table created.

> create index leo_compress_table1_index on leo_compress_table1 (object_id) compress; 為表1建立壓縮索引

Index created.

> create index leo_compress_table2_index on leo_compress_table2 (object_id) compress; 為表2建立壓縮索引

Index created.

> select segment_name,extent_id,bytes from user_extents where segment_name in

('LEO_COMPRESS_TABLE1_INDEX','LEO_COMPRESS_TABLE2_INDEX');

SEGMENT_NAME                EXTENT_ID      BYTES
-------------------------- ---------- ----------
LEO_COMPRESS_TABLE2_INDEX           0      65536
LEO_COMPRESS_TABLE2_INDEX           1      65536
LEO_COMPRESS_TABLE2_INDEX           2      65536    重複率高表2的壓縮索引，佔用3個區，使用0.18M空間

LEO_COMPRESS_TABLE1_INDEX           0      65536
LEO_COMPRESS_TABLE1_INDEX           1      65536
LEO_COMPRESS_TABLE1_INDEX           2      65536
LEO_COMPRESS_TABLE1_INDEX           3      65536
LEO_COMPRESS_TABLE1_INDEX           4      65536    重複率低表1的壓縮索引，佔用5個區，使用0.31M空間

小結：看來索引鍵值重複率的高低會直接影響壓縮索引的儲存空間，也從側面反映了在高重複率的環境下壓縮意義才大。

3.那麼在索引鍵值重複率低的情況下，壓縮索引與不壓縮索引的區別有多大呢，使用show_space()儲存過程分析空間使用情況，我列出了show_space()儲存過程定義指令碼完美版同時支援ASSM and MSSM 段物件儲存屬性，先要建立show_space()儲存過程才能使用，預設是沒有的，作者 TOM大叔，有好幾個版本，用我這個就可以了，然後重點set serveroutput on;顯示輸出結果，否則只顯示執行完成但沒有結果輸出

> create table leo_table1 as select * from dba_objects; 表1和表3結構資料完全一致，都是普通表

Table created.
> create table leo_table3 as select * from dba_objects;

Table created.

> create index leo_compress_table1_index on leo_table1 (object_id) compress; 建立壓縮索引，object_id列沒有重複值，重複率極低

Index created.

> create index leo_table3_index on leo_table3 (object_id) ; 建立普通索引，object_id列沒有重複值，重複率極低

Index created.

create or replace procedure show_space                               定義show_space()儲存過程，支援ASSM and MSSM 段物件儲存屬性
( p_segname in varchar2,
p_owner   in varchar2 default user,
p_type    in varchar2 default 'TABLE',
p_partition in varchar2 default NULL )
-- this procedure uses authid current user so it can query DBA_*
-- views using privileges from a ROLE and so it can be installed
-- once per database, instead of once per user that wanted to use it
authid current_user
as
    l_free_blks                 number;
    l_total_blocks              number;
    l_total_bytes               number;
    l_unused_blocks             number;
    l_unused_bytes              number;
    l_LastUsedExtFileId         number;
    l_LastUsedExtBlockId        number;
    l_LAST_USED_BLOCK           number;
    l_segment_space_mgmt        varchar2(255);
    l_unformatted_blocks number;
    l_unformatted_bytes number;
    l_fs1_blocks number; l_fs1_bytes number;
    l_fs2_blocks number; l_fs2_bytes number;
    l_fs3_blocks number; l_fs3_bytes number;
    l_fs4_blocks number; l_fs4_bytes number;
    l_full_blocks number; l_full_bytes number;

        -- inline procedure to print out numbers nicely formatted
        -- with a simple label
    procedure p( p_label in varchar2, p_num in number )
    is
    begin
        dbms_output.put_line( rpad(p_label,40,'.') ||
                              to_char(p_num,'999,999,999,999') );
    end;
begin
   -- this query is executed dynamically in order to allow this procedure
   -- to be created by a user who has access to DBA_SEGMENTS/TABLESPACES
   -- via a role as is customary.
   -- NOTE: at runtime, the invoker MUST have access to these two
   -- views!
   -- this query determines if the object is a ASSM object or not
   begin
      execute immediate
          'select ts.segment_space_management
             from dba_segments seg, dba_tablespaces ts
            where seg.segment_name      = :p_segname
              and (:p_partition is null or
                  seg.partition_name = :p_partition)
              and seg.owner = :p_owner
              and seg.tablespace_name = ts.tablespace_name'
             into l_segment_space_mgmt
            using p_segname, p_partition, p_partition, p_owner;
   exception
       when too_many_rows then
          dbms_output.put_line
          ( 'This must be a partitioned table, use p_partition => ');
          return;
   end;

   -- if the object is in an ASSM tablespace, we must use this API
   -- call to get space information, else we use the FREE_BLOCKS
   -- API for the user managed segments
   if l_segment_space_mgmt = 'AUTO'
   then
     dbms_space.space_usage
     ( p_owner, p_segname, p_type, l_unformatted_blocks,
       l_unformatted_bytes, l_fs1_blocks, l_fs1_bytes,
       l_fs2_blocks, l_fs2_bytes, l_fs3_blocks, l_fs3_bytes,
       l_fs4_blocks, l_fs4_bytes, l_full_blocks, l_full_bytes, p_partition);

     p( 'Unformatted Blocks ', l_unformatted_blocks );
     p( 'FS1 Blocks (0-25) ', l_fs1_blocks );
     p( 'FS2 Blocks (25-50) ', l_fs2_blocks );
     p( 'FS3 Blocks (50-75) ', l_fs3_blocks );
     p( 'FS4 Blocks (75-100)', l_fs4_blocks );
     p( 'Full Blocks        ', l_full_blocks );
else
     dbms_space.free_blocks(
       segment_owner     => p_owner,
       segment_name      => p_segname,
       segment_type      => p_type,
       freelist_group_id => 0,
       free_blks         => l_free_blks);

     p( 'Free Blocks', l_free_blks );
end if;

-- and then the unused space API call to get the rest of the
-- information
dbms_space.unused_space
( segment_owner     => p_owner,
    segment_name      => p_segname,
    segment_type      => p_type,
    partition_name    => p_partition,
    total_blocks      => l_total_blocks,
    total_bytes       => l_total_bytes,
    unused_blocks     => l_unused_blocks,
    unused_bytes      => l_unused_bytes,
    LAST_USED_EXTENT_FILE_ID => l_LastUsedExtFileId,
    LAST_USED_EXTENT_BLOCK_ID => l_LastUsedExtBlockId,
    LAST_USED_BLOCK => l_LAST_USED_BLOCK );

    p( 'Total Blocks', l_total_blocks );
    p( 'Total Bytes', l_total_bytes );
    p( 'Total MBytes', trunc(l_total_bytes/1024/1024) );
    p( 'Unused Blocks', l_unused_blocks );
    p( 'Unused Bytes', l_unused_bytes );
    p( 'Last Used Ext FileId', l_LastUsedExtFileId );
    p( 'Last Used Ext BlockId', l_LastUsedExtBlockId );
    p( 'Last Used Block', l_LAST_USED_BLOCK );
end;
/
Procedure created.

> set serveroutput on;                                  必須設定輸出show_space()儲存過程結果，否則只顯示執行完成但沒有結果輸出
> execute show_space('LEO_COMPRESS_TABLE1_INDEX',USER,'INDEX');    壓縮索引leo_compress_table1_index空間使用情況
Unformatted Blocks .....................               0
FS1 Blocks (0-25) .....................               0
FS2 Blocks (25-50) .....................               1
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................              31                 其中用了31個
Total Blocks............................              40                 佔用了40個資料塊
Total Bytes.............................         327,680                 0.31M
Total MBytes............................               0
Unused Blocks...........................               3
Unused Bytes............................          24,576
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           4,569
Last Used Block.........................               5

PL/SQL procedure successfully completed.

> execute show_space('LEO_TABLE3_INDEX',USER,'INDEX');             普通索引leo_table3_index空間使用情況
Unformatted Blocks .....................               0
FS1 Blocks (0-25) .....................               0
FS2 Blocks (25-50) .....................               1
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................              22                 其中用了22個
Total Blocks............................              32                 佔用了32個資料塊
Total Bytes.............................         262,144                 0.25M
Total MBytes............................               0
Unused Blocks...........................               5
Unused Bytes............................          40,960
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           4,601
Last Used Block.........................               3

PL/SQL procedure successfully completed.

小結：那麼在索引鍵值重複率低的情況下，壓縮索引比普通索引佔用的空間還多，這是為什麼呢，壓縮原理就是把鍵值用符號代替，重複鍵值越多冗餘空間越多，但我們測試鍵值幾乎是沒有重複的，還要騰出額外空間給對映列表（鍵值:符號對應表），所以壓縮索引使用的空間就會更多。

八、資料壓縮對效能影響
原因：資料壓縮後，掃描資料塊變少了，導致邏輯I/O（LIO）減少（一致性讀減少），同時有更多的資料塊快取到記憶體中，導致物理I/O（PIO）減少，當然CPU消耗自然也少了。
1.海量資料普通表和海量資料壓縮表空間比較
> create table leo_big_table as select * from dba_objects; 海量資料普通表，16萬條記錄
Table created.
> insert into leo_big_table select * from leo_big_table;
10374 rows created.
> /
20748 rows created.
> /
41496 rows created.
> /
82992 rows created.
> select count(*) from leo_big_table;

COUNT(*)
----------
165984

> commit;
Commit complete.
> create table leo_big_table_compress compress as select * from dba_objects; 海量資料壓縮表，16萬條記錄
Table created.
> insert into leo_big_table_compress select * from leo_big_table_compress;
10375 rows created.
> /
20750 rows created.
> /
41500 rows created.
> /
83000 rows created.
> select count(*) from leo_big_table_compress;

COUNT(*)
----------
166000

>
> commit;
Commit complete.
> execute show_space('LEO_BIG_TABLE');                      海量資料普通表，16萬條記錄
Unformatted Blocks .....................               0
FS1 Blocks (0-25) .....................               1
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               5
Full Blocks        .....................           2,064          其中2064個裝滿了資料
Total Blocks............................           2,176          佔用了2176個資料塊
Total Bytes.............................      17,825,792
Total MBytes............................              17          17M空間
Unused Blocks...........................              64
Unused Bytes............................         524,288
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           6,793
Last Used Block.........................              64

PL/SQL procedure successfully completed.

> execute show_space('LEO_BIG_TABLE_COMPRESS');             海量資料壓縮表，16萬條記錄
Unformatted Blocks .....................               0
FS1 Blocks (0-25) .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               1
FS4 Blocks (75-100).....................              31
Full Blocks        .....................           1,786          其中1786個裝滿了資料
Total Blocks............................           1,920          佔用了1920個資料塊
Total Bytes.............................      15,728,640
Total MBytes............................              15          15M空間
Unused Blocks...........................              64
Unused Bytes............................         524,288
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           8,713
Last Used Block.........................              64

PL/SQL procedure successfully completed.
小結：顯而易見，海量資料的環境下，壓縮表比普通表佔用的空間更少

2.10046事件追蹤查詢select普通表和壓縮表的sql執行計劃（全表掃描）

> alter session set events '10046 trace name context forever,level 12';
> select * from leo_big_table;
165984 rows selected.
> select * from leo_big_table_compress;
166000 rows selected.
tkprof leo_ora_1983.trc leo_20120812.txt sys=no
more leo_20120812.txt

select * from leo_big_table 普通表sql效能指標

call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0           0       硬解析
Execute      1      0.00       0.00          0          0          0           0       執行
Fetch    11067      1.55       1.44          0      13000          0      165984       取操作
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total    11069      1.55       1.44          0      13000          0      165984

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 27

Rows Row Source Operation
------- -----------------------------------------------------------------------
165984 TABLE ACCESS FULL LEO_BIG_TABLE (cr=13000 pr=0 pw=0 time=1992805 us) 輸出165984行，一致性讀13000

********************************************************************************

select * from leo_big_table_compress 壓縮表sql效能指標

call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0           0       硬解析
Execute      1      0.00       0.00          0          0          0           0       執行
Fetch    11068      1.14       1.13          0      12775          0      166000       取操作
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total    11070      1.14       1.13          0      12775          0      166000

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 27

Rows Row Source Operation
------- ---------------------------------------------------
166000 TABLE ACCESS FULL LEO_BIG_TABLE_COMPRESS (cr=12775 pr=0 pw=0 time=1495219 us) 輸出166000行，一致性讀12775

********************************************************************************
小結：這裡磁碟讀disk都是0是因為所有記錄都快取在記憶體中，從一致性讀、CPU、elapsed等指標上看，壓縮表還是優於普通表

3.10046事件追蹤查詢select普通表和壓縮表的sql執行計劃（走索引）

> create index leo_big_table_index on leo_big_table(object_id); 普通表建立索引

Index created.

> create index leo_big_table__compress_index on leo_big_table_compress(object_id); 壓縮表建立索引

Index created.

> select * from leo_big_table where object_id=5276;

> select * from leo_big_table_compress where object_id=5276;

select * from leo_big_table where object_id=5276 普通表走索引檢索sql效能指標

call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0           0            硬解析
Execute      1      0.00       0.00          0          0          0           0            執行
Fetch        3      0.00       0.00          0         20          0          16            取操作
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total        5      0.00       0.00          0         20          0          16

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 27

Rows     Row Source Operation
------- ---------------------------------------------------
     16 TABLE ACCESS BY INDEX ROWID LEO_BIG_TABLE (cr=20 pr=0 pw=0 time=316 us)             輸出16行，走索引，表一致性讀20
     16   INDEX RANGE SCAN LEO_BIG_TABLE_INDEX (cr=4 pr=0 pw=0 time=169 us)(object id 11066) 因為索引沒壓縮，索引一致性讀4

********************************************************************************

select * from leo_big_table_compress where object_id=5276 壓縮表走索引檢索sql效能指標

call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          2          0           0            硬解析
Execute      1      0.00       0.00          0          0          0           0            執行
Fetch        3      0.00       0.00          0         20          0          16            取操作
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total        5      0.00       0.00          0         22          0          16

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 27

Rows     Row Source Operation
------- ---------------------------------------------------
     16 TABLE ACCESS BY INDEX ROWID LEO_BIG_TABLE_COMPRESS (cr=20 pr=0 pw=0 time=140 us)     輸出16行，走索引，一致性讀22
     16   INDEX RANGE SCAN LEO_BIG_TABLE__COMPRESS_INDEX (cr=4 pr=0 pw=0 time=31 us)(object id 11067) 因為索引沒壓縮，索引一致性讀4

********************************************************************************
小結：我們看到壓縮表比普通表的一致性讀要多2個，原因有二，第一查詢的資料太少，不容易看出來。第二先解壓再顯示，解壓需要額外的開銷。在海量資料環境下，壓縮表走索引的效果應該比普通表要好。

4.10046事件追蹤order by group by普通表和壓縮表的sql執行計劃

select * from leo_big_table order by owner 普通表order by 效能指標

call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          1          0           0         硬解析
Execute      1      0.00       0.00          0          0          0           0         執行
Fetch    11067      2.91      13.21       2116       2087         28      165984         取操作
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total    11069      2.91      13.21       2116       2088         28      165984

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 27

Rows     Row Source Operation
------- ---------------------------------------------------
165984 SORT ORDER BY (cr=2087 pr=2116 pw=2116 time=13536800 us)                全表掃描排序後輸出165984行，一致性讀2087，磁碟讀2116
165984   TABLE ACCESS FULL LEO_BIG_TABLE (cr=2076 pr=0 pw=0 time=7635325 us)

********************************************************************************

select * from leo_big_table_compress order by owner 壓縮表order by 效能指標

call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0           0          硬解析
Execute      1      0.00       0.00          0          0          0           0          執行
Fetch    11068      1.65       2.16       2117       1834          4      166000          取操作
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total    11070      1.65       2.17       2117       1834          4      166000

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 27

Rows     Row Source Operation
------- ---------------------------------------------------
166000 SORT ORDER BY (cr=1834 pr=2117 pw=2117 time=2356632 us)                  全表掃描排序後輸出166000行，一致性讀1834，磁碟讀2117
166000   TABLE ACCESS FULL LEO_BIG_TABLE_COMPRESS (cr=1823 pr=0 pw=0 time=1494518 us)

********************************************************************************

select owner,count(*) from leo_big_table group by owner 普通表group by 效能指標

call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0           0           硬解析
Execute      1      0.00       0.00          0         15          0           0           執行
Fetch        2      0.10       0.10          0       2076          0           9           取操作
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total        4      0.10       0.10          0       2091          0           9

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 27

Rows     Row Source Operation
------- ---------------------------------------------------
      9 HASH GROUP BY (cr=2091 pr=0 pw=0 time=116605 us)                        分組計算後輸出9行，一致性讀2091
165984   TABLE ACCESS FULL LEO_BIG_TABLE (cr=2076 pr=0 pw=0 time=1825987 us)    全表掃描165984行

********************************************************************************

select owner,count(*) from leo_big_table_compress group by owner 壓縮表group by 效能指標

call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0           0            硬解析
Execute      1      0.00       0.00          0          0          0           0            執行
Fetch        2      0.08       0.08          0       1823          0           9            取操作
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total        4      0.08       0.08          0       1823          0           9

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 27

Rows     Row Source Operation
------- ---------------------------------------------------
      9 HASH GROUP BY (cr=1838 pr=0 pw=0 time=94680 us)                           分組計算後輸出9行，一致性讀1838
166000   TABLE ACCESS FULL LEO_BIG_TABLE_COMPRESS (cr=1823 pr=0 pw=0 time=1162061 us) 全表掃描166000行

小結：從上面測試結果來看，壓縮表的效能更好一些，尤其在邏輯讀I/O（LIO）、一致性讀、物理讀I/O（PIO）、CPU等指標上。

九、資料壓縮和DML操作
答：不管從實際應用還是官方說明，“壓縮”技術都是屬於OLAP資料倉儲領域
問：那麼在OLTP領域就不能使用“壓縮”技術了嘛？其實不然，特定場合DML操作可以壓縮處理的，但大部分是不作壓縮的
特定場合
（1）alter table 表名 move compress           給普通表進行壓縮，透過MOVE方式進行資料重組
（2）alter table 表名 move partition 分割槽名 compress 給分割槽表進行壓縮，每個分割槽都可以獨立壓縮，結尾必須新增compress關鍵字否則不做壓縮
（3）create table 表名 compress as select     CTAS 物件屬性
（4）insert /*+ append */                    直接載入插入資料，必壓縮。常規 insert into 不壓縮資料
（5）sqlldr 引數 direct path                  sqlload 直接路徑載入，必壓縮

實驗 1 CTAS

> create table leo_test1 compress as select * from dba_objects;                CTAS 建立初就壓縮
Table created.
> create table leo_test2 compress as select * from dba_objects where 1=2;      只要結構不要資料
Table created.
> select count(*) from leo_test1;                                              有10378行
COUNT(*)
----------
     10378
> select count(*) from leo_test2;                                              只要結構不要資料 successful

COUNT(*)
----------
         0
> insert into leo_test2 select * from dba_objects;                             常規insert into 不壓縮資料
10379 rows created.
> commit;                                                                      提交

> set serveroutput on;                                                         開啟螢幕輸出
> execute show_space('LEO_TEST1');                                            使用CTAS直接壓縮
Unformatted Blocks .....................               0
FS1 Blocks (0-25) .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................              45                            其中用了45個裝滿資料
Total Blocks............................              56                            佔用了56個資料塊
Total Bytes.............................         458,752                            0.43M空間，已壓縮
Total MBytes............................               0
Unused Blocks...........................               5
Unused Bytes............................          40,960
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           9,649
Last Used Block.........................               3

PL/SQL procedure successfully completed.

> execute show_space('LEO_TEST2');                                           常規insert into 不壓縮資料
Unformatted Blocks .....................               0
FS1 Blocks (0-25) .....................               1
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               2
Full Blocks        .....................             115                            其中用了115個裝滿資料
Total Blocks............................             128                            佔用了128個資料塊
Total Bytes.............................       1,048,576
Total MBytes............................               1                            1M空間
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           9,777
Last Used Block.........................               8

PL/SQL procedure successfully completed.

小結：我們看到leo_test2表比leo_test1表佔用的空間幾乎多了一倍，為什麼都用CTAS建立的表空間會不一樣呢，leo_test2表在建立時只有結構沒有資料，壓縮只針對資料壓縮因此沒有資料就沒有壓縮。後來我們使用了常規insert into 插入資料但資料是不壓縮的，這是導致表2比表1佔用空間多的原因。

實驗 2 insert /*+ append */ 直接載入插入資料，必壓縮

> truncate table leo_test2;                                                   清空leo_test2表
Table truncated.
> insert /*+ append */ into leo_test2 select * from dba_objects;              直接載入插入資料必壓縮
10379 rows created.
> commit;                                                                     提交
> execute show_space('LEO_TEST2');                                            分析空間使用情況
Unformatted Blocks .....................               0
FS1 Blocks (0-25) .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................              45                            其中用了45個裝滿資料
Total Blocks............................              56                            佔用了56個資料塊
Total Bytes.............................         458,752                            0.43M空間，已壓縮
Total MBytes............................               0
Unused Blocks...........................               5
Unused Bytes............................          40,960
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           9,705
Last Used Block.........................               3

小結：使用insert /*+ append */ 直接載入插入資料方式，資料必須壓縮狀態

實驗 3 update操作對壓縮資料作解壓縮處理，空間立刻恢復為未壓縮時的容量

> update leo_test1 set object_name=object_name||'leo_nocompress';         leo_test1已經是壓縮狀態，現在對其進行update操作
10378 rows updated.
> commit;
Commit complete.
> execute show_space('LEO_TEST1');                                        update操作對壓縮資料進行解壓縮處理
Unformatted Blocks .....................               0
FS1 Blocks (0-25) .....................               0
FS2 Blocks (25-50) .....................               1
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................              36
Full Blocks        .....................             143                        其中用了143個裝滿資料
Total Blocks............................             256                        佔用了256個資料塊
Total Bytes.............................       2,097,152
Total MBytes............................               2                        2M空間，已經解壓縮了，空間恢復為未壓縮是的容量
Unused Blocks...........................              64
Unused Bytes............................         524,288
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           9,865
Last Used Block.........................              64

PL/SQL procedure successfully completed.

小結：此時leo_test1表的空間立刻恢復為未壓縮時的容量，但比原來的空間還要大呢，是因為update操作將記錄的長度變大了，空間自然也變大了！

實驗 4 給普通表進行壓縮，透過MOVE方式進行資料重組

> alter table leo_test1 move compress;                                   現在對普通表進行壓縮重組資料
Table altered.
> execute show_space('LEO_TEST1');
Unformatted Blocks .....................               0
FS1 Blocks (0-25) .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................              65                       其中用了65個裝滿資料
Total Blocks............................              72                       佔用了72個資料塊
Total Bytes.............................         589,824                       0.56M空間，已經壓縮
Total MBytes............................               0
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           9,849
Last Used Block.........................               8

PL/SQL procedure successfully completed.

小結：同樣的資料經過重組資料進行壓縮，可以比未壓縮時節約了4倍空間，在一些特定場合使用壓縮技術還是非常有效果的。

實驗 5 給分割槽表進行壓縮，每個分割槽都可以獨立壓縮

> create table leo_test_partition1 (object_id,object_name) 我們先建立一個分割槽表，共6個分割槽
partition by range (object_id)
(
partition leo_p1 values less than (1000),
partition leo_p2 values less than (2000),
partition leo_p3 values less than (3000),
partition leo_p4 values less than (4000),
partition leo_p5 values less than (5000),
partition leo_max values less than (maxvalue)
)
as select object_id,object_name from dba_objects;
Table created.

> insert into leo_test_partition1 select * from leo_test_partition1; 多次載入資料

> select count(*) from leo_test_partition1; 33萬行

COUNT(*)
----------
332576

> select count(*) from leo_test_partition1 partition (leo_p2); leo_p2分割槽有32000行

COUNT(*)
----------
32000

> commit; 提交

> select partition_name,compression from user_tab_partitions where table_name='LEO_TEST_PARTITION1'; 6個分割槽都是未壓縮狀態

PARTITION_NAME                 COMPRESS
------------------------------ --------
LEO_MAX                        DISABLED
LEO_P1                         DISABLED
LEO_P2                         DISABLED
LEO_P3                         DISABLED
LEO_P4                         DISABLED
LEO_P5                         DISABLED

> execute show_space('LEO_TEST_PARTITION1',USER,'TABLE PARTITION','LEO_P2');        leo_p2分割槽未壓縮時空間情況
Unformatted Blocks .....................               0
FS1 Blocks (0-25) .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               5
Full Blocks        .....................             105                                  其中用了105個裝滿資料
Total Blocks............................             120                                  佔用了120個資料塊
Total Bytes.............................         983,040                                  0.9375M
Total MBytes............................               0
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               7
Last Used Ext BlockId...................          10,601
Last Used Block.........................               8

PL/SQL procedure successfully completed.

> alter table leo_test_partition1 move partition leo_p2 compress; 單獨對leo_p2分割槽進行壓縮，結尾必須新增compress關鍵字否則不做壓縮
Table altered.
> select partition_name,compression from user_tab_partitions where table_name='LEO_TEST_PARTITION1'; leo_p2分割槽已經壓縮

PARTITION_NAME                 COMPRESS
------------------------------ --------
LEO_MAX                        DISABLED
LEO_P1                         DISABLED
LEO_P2                         ENABLED
LEO_P3                         DISABLED
LEO_P4                         DISABLED
LEO_P5                         DISABLED

> execute show_space('LEO_TEST_PARTITION1',USER,'TABLE PARTITION','LEO_P2');         leo_p2分割槽是壓縮時空間情況
Unformatted Blocks .....................               0
FS1 Blocks (0-25) .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................             104                                   其中用了104個裝滿資料，只比原來少了1個塊
Total Blocks............................             120                                   佔用了120個資料塊
Total Bytes.............................         983,040
Total MBytes............................               0
Unused Blocks...........................               6
Unused Bytes............................          49,152
Last Used Ext FileId....................               7
Last Used Ext BlockId...................          11,297
Last Used Block.........................               2

PL/SQL procedure successfully completed.

小結：我們看到leo_p2分割槽壓縮時比未壓縮時只在使用的資料塊上少了一個，感覺基本上沒怎麼壓，後來分析資料發現，記錄重複率非常小，樣本資料條數也不夠大，導致壓縮比也小，效果稍差。
原理出處：表壓縮比：列中資料重複率越高，壓縮比越大，列中資料重複率越低，壓縮比越小
大家在做測試的時候可以和我的結果對比一下，看看是不是一樣，如果分析的不足，請提出寶貴意見，共同進步：）

十、資料壓縮應用場景
1.靜態資料：這種資料處在不在改變的狀態，只用於查詢和分析，可以壓縮
2.歷史資料：這種資料當前已經不在使用，只用於以後歸檔查詢，可以壓縮
3.海量資料：提高資料處理效率，減少記憶體和磁碟I/O開銷，可以壓縮
4.備份資料：壓縮備份可以節約大量空間給其他物件使用，可以壓縮

Leonarding
2012.8.14
天津&autumn
分享技術~收穫快樂
Blog：http://space.itpub.net/26686207

oracle 海量資料之利器“資料壓縮”實際應用及優缺點彙總

相關文章