oracle 海量資料之利器“資料壓縮”實際應用及優缺點彙總

leonarding發表於2012-08-14
引言:“資料壓縮”以前對我來說還是比較新鮮的詞,並不是沒有聽說過,而是沒有實際使用過,之前一直做專案經理工作上也設計到資料庫的運維,但由於儲存設計的比較充裕,在加上效能運轉的還能讓客戶接受,所以壓縮技術基本上沒怎麼用,當時也怕對DML操作有負面影響!之所以現在要實驗這方面的技術,也是因為二期我們的資料量暴增,對機器對效能對運維都用一定的壓力和衝擊,這也說明了這門技術是在特定環境、特定場合下來使用的,下面我們來詳細講一講如何去使用這門技術---海量資料之利器“壓縮compress”
一、應用場合
答:正如上面所說,壓縮技術一般應用在“海量資料”的範疇,在大資料量下並且重複率高的情境下應該的效果才好,適合於OLAP(報表系統)不適合OLTP(線上交易系統),當資料量小時作用還不是很大,只要資料量達到一定程度的時候使用這門技術才有意義。
二、壓縮技術的好處
1.節約空間:我認為壓縮技術的最大好處就是節約空間,在儲存裝置空間很緊張的情況下更需要使用壓縮技術來緩解
2.減少memory/IO開銷:當使用了壓縮技術後,同等記錄對應的資料塊相對較少,自然而然IO開銷就小了
3.提高資料處理效率:原本10個資料塊的,經過壓縮後變成5個資料塊,但還是那些資料,可效率提高了一倍。
三、壓縮原理
答:我們都用過rar壓縮zip壓縮,其實這些壓縮的原理都很相似,當然在細微處理上有不同,資料塊壓縮原理上是使用長度較短符號來代替列中的每個值,由於符號佔用空間小於實際值,從而達到節約空間的效果,當列中重複值越多壓縮效果越好。
四、資料壓縮不適合OLTP系統
1.OLTP是一種多事務短時間片系統,有大量的DML操作,而DML操作我們都知道是有鎖lock的(insert  update  delete)當資料塊中記錄數多起來時,導致訪問這個資料塊的使用者就多,就是產生爭用和等待,從而降低了效率。
2.資料插入、更新和索引鍵值更新時,都要先解壓縮再更新,等待的時間會更長。
五、分割槽表的壓縮
1.分割槽表的每個分割槽都可以獨立壓縮
2.有些時候,分割槽表在建立時不設定分割槽壓縮屬性,當某個分割槽需要壓縮時,再獨立進行壓縮
3.如果在表級設定壓縮屬性的話,那麼所有分割槽都會啟動壓縮功能,反正如果是某幾個分割槽獨立壓縮,表級屬性上什麼也不寫的。
實驗
create table leo_partition1 (object_id,object_name)       建立一個六個分割槽的分割槽表,沒有指定壓縮
partition by range (object_id)
(
partition leo_p1 values less than (1000),
partition leo_p2 values less than (2000),
partition leo_p3 values less than (3000),
partition leo_p4 values less than (4000),
partition leo_p5 values less than (5000),
partition leo_max values less than (maxvalue)
)
as select object_id,object_name from dba_objects;
> select count(*) from leo_partition1;
  COUNT(*)
----------
     10351
> select partition_name,compression from user_tab_partitions where table_name='LEO_PARTITION1'; 
PARTITION_NAME                 COMPRESS
------------------------------ --------
LEO_MAX                        DISABLED
LEO_P1                         DISABLED
LEO_P2                         DISABLED
LEO_P3                         DISABLED
LEO_P4                         DISABLED
LEO_P5                         DISABLED
> alter table leo_partition1 compress;                            例如 定義表級壓縮屬性,此時所有分割槽都壓縮
> alter table leo_partition1 modify partition leo_p1 compress;    我們測試準對leo_p1 和 leo_p5分割槽進行壓縮,來說明分割槽可以獨立壓縮
Table altered.
> alter table leo_partition1 modify partition leo_p5 compress;
Table altered.
> select partition_name,compression from user_tab_partitions where table_name='LEO_PARTITION1';
PARTITION_NAME                 COMPRESS
------------------------------ -----------------------
LEO_MAX                       DISABLED
LEO_P1                         ENABLED
LEO_P2                         DISABLED
LEO_P3                         DISABLED
LEO_P4                         DISABLED
LEO_P5                         ENABLED
6 rows selected.
> alter table leo_partition1 modify partition leo_p5 nocompress;      我們還可以取消壓縮
Table altered.
> select partition_name,compression from user_tab_partitions where table_name='LEO_PARTITION1';   leo_p5分割槽已經取消壓縮
PARTITION_NAME                 COMPRESS
------------------------------ -----------------------
LEO_MAX                      DISABLED
LEO_P1                         ENABLED
LEO_P2                         DISABLED
LEO_P3                         DISABLED
LEO_P4                         DISABLED
LEO_P5                         DISABLED
六、分割槽索引的壓縮
1.分割槽表索引中不能對單個獨立分割槽索引設定壓縮,要麼整體壓縮要麼不壓縮。
2.如果建立分割槽表索引時未設定壓縮,以後都不能在設定了
> create index leo_partition1_index on leo_partition1(object_id) local;                建立時沒有設定壓縮
Index created.
> select index_name,compression from user_indexes where table_name='LEO_PARTITION1';   分割槽表索引沒有壓縮
INDEX_NAME                     COMPRESS
------------------------------ --------
LEO_PARTITION1_INDEX           DISABLED
> select partition_name,compression from user_ind_partitions where index_name='LEO_PARTITION1_INDEX';   獨立分割槽索引也沒有壓縮
PARTITION_NAME                 COMPRESS
------------------------------ --------
LEO_MAX                        DISABLED
LEO_P1                         DISABLED
LEO_P2                         DISABLED
LEO_P3                         DISABLED
LEO_P4                         DISABLED
LEO_P5                         DISABLED
> alter index leo_partition1_index rebuild partition leo_p5 compress;       
alter index leo_partition1_index rebuild partition leo_p5 compress
*
ERROR at line 1:
ORA-28659: COMPRESS must be specified at object level first                     報錯,壓縮必須首先在物件級別指定
3.必須建立分割槽表索引時一次性指定好壓縮屬性,以後才能對獨立分割槽索引壓縮排行修改
> create index leo_partition1_index on leo_partition1(object_id) local compress;       建立時指定
Index created.
> select index_name,compression from user_indexes where table_name='LEO_PARTITION1';   分割槽表索引已壓縮
INDEX_NAME                     COMPRESS
------------------------------ --------
LEO_PARTITION1_INDEX           ENABLED
> select partition_name,index_name,compression from user_ind_partitions where index_name='LEO_PARTITION1_INDEX';獨立分割槽索引也壓
PARTITION_NAME                 INDEX_NAME                     COMPRESS
------------------------------ ------------------------------ --------
LEO_MAX                        LEO_PARTITION1_INDEX           ENABLED
LEO_P1                         LEO_PARTITION1_INDEX           ENABLED
LEO_P2                         LEO_PARTITION1_INDEX           ENABLED
LEO_P3                         LEO_PARTITION1_INDEX           ENABLED
LEO_P4                         LEO_PARTITION1_INDEX           ENABLED
LEO_P5                         LEO_PARTITION1_INDEX           ENABLED
6 rows selected.
> alter index leo_partition1_index rebuild partition leo_p3 nocompress;         獨立分割槽索引取消壓縮
Index altered.
> alter index leo_partition1_index rebuild partition leo_p4 nocompress;
Index altered.
> alter index leo_partition1_index rebuild partition leo_p5 nocompress;
Index altered.
> select partition_name,index_name,compression from user_ind_partitions where index_name='LEO_PARTITION1_INDEX';
PARTITION_NAME                 INDEX_NAME                     COMPRESS
------------------------------ ------------------------------ --------
LEO_MAX                        LEO_PARTITION1_INDEX           ENABLED
LEO_P1                         LEO_PARTITION1_INDEX           ENABLED
LEO_P2                         LEO_PARTITION1_INDEX           ENABLED
LEO_P3                         LEO_PARTITION1_INDEX           DISABLED
LEO_P4                         LEO_PARTITION1_INDEX           DISABLED
LEO_P5                         LEO_PARTITION1_INDEX           DISABLED
七、資料壓縮對空間影響
1.節約空間:資料壓縮會節約大量儲存空間,我們建立一個壓縮表和一個普通表,來對比節約空間的效果
> create table leo_compress_table compress as select * from dba_objects;            壓縮表
Table created.
> create table leo_nocompress_table  as select * from dba_objects;                  普通表
Table created.
> select segment_name,extent_id,bytes from user_extents where segment_name in ('LEO_COMPRESS_TABLE','LEO_NOCOMPRESS_TABLE');
SEGMENT_NAME            EXTENT_ID      BYTES
---------------------- ---------- ----------
LEO_COMPRESS_TABLE              0      65536
LEO_COMPRESS_TABLE              1      65536
LEO_COMPRESS_TABLE              2      65536
LEO_COMPRESS_TABLE              3      65536
LEO_COMPRESS_TABLE              4      65536
LEO_COMPRESS_TABLE              5      65536
LEO_COMPRESS_TABLE              6      65536      同樣資料量,壓縮表佔用7個區,使用0.43M空間
LEO_NOCOMPRESS_TABLE            0      65536
LEO_NOCOMPRESS_TABLE            1      65536
LEO_NOCOMPRESS_TABLE            2      65536
LEO_NOCOMPRESS_TABLE            3      65536
LEO_NOCOMPRESS_TABLE            4      65536
LEO_NOCOMPRESS_TABLE            5      65536
LEO_NOCOMPRESS_TABLE            6      65536
LEO_NOCOMPRESS_TABLE            7      65536
LEO_NOCOMPRESS_TABLE            8      65536
LEO_NOCOMPRESS_TABLE            9      65536
LEO_NOCOMPRESS_TABLE           10      65536
LEO_NOCOMPRESS_TABLE           11      65536
LEO_NOCOMPRESS_TABLE           12      65536
LEO_NOCOMPRESS_TABLE           13      65536
LEO_NOCOMPRESS_TABLE           14      65536
LEO_NOCOMPRESS_TABLE           15      65536
LEO_NOCOMPRESS_TABLE           16    1048576      同樣資料量,普通表佔用17個區,使用2M空間
小結:我們可以看出,普通表空間是壓縮表空間的4倍多,顯然壓縮可以節約大量的儲存空間,這在磁碟空間不足的情況下很有作用
2.索引壓縮比:索引鍵值重複率越高,壓縮比越大,索引鍵值重複率越低,壓縮比越小
我們來比較一下,列中資料重複率的高低對壓縮比的影響
> create table leo_compress_table1 as select object_id,object_name from dba_objects;     object_id列沒有重複值
Table created.
> create table leo_compress_table2 as select 2 object_id,object_name from dba_objects;   object_id列全是2,完全重複
Table created.
> create index leo_compress_table1_index on leo_compress_table1 (object_id) compress;    為表1建立壓縮索引
Index created.
> create index leo_compress_table2_index on leo_compress_table2 (object_id) compress;    為表2建立壓縮索引
Index created.
> select segment_name,extent_id,bytes from user_extents where segment_name in
('LEO_COMPRESS_TABLE1_INDEX','LEO_COMPRESS_TABLE2_INDEX');
SEGMENT_NAME                EXTENT_ID      BYTES
-------------------------- ---------- ----------
LEO_COMPRESS_TABLE2_INDEX           0      65536
LEO_COMPRESS_TABLE2_INDEX           1      65536
LEO_COMPRESS_TABLE2_INDEX           2      65536    重複率高表2的壓縮索引,佔用3個區,使用0.18M空間
LEO_COMPRESS_TABLE1_INDEX           0      65536
LEO_COMPRESS_TABLE1_INDEX           1      65536
LEO_COMPRESS_TABLE1_INDEX           2      65536
LEO_COMPRESS_TABLE1_INDEX           3      65536
LEO_COMPRESS_TABLE1_INDEX           4      65536    重複率低表1的壓縮索引,佔用5個區,使用0.31M空間
小結:看來索引鍵值重複率的高低會直接影響壓縮索引的儲存空間,也從側面反映了在高重複率的環境下壓縮意義才大。
3.那麼在索引鍵值重複率低的情況下,壓縮索引與不壓縮索引的區別有多大呢,使用show_space()儲存過程分析空間使用情況,我列出了show_space()儲存過程定義指令碼完美版同時支援ASSM and MSSM 段物件儲存屬性,先要建立show_space()儲存過程才能使用,預設是沒有的,作者 TOM大叔,有好幾個版本,用我這個就可以了,然後重點set serveroutput on;顯示輸出結果,否則只顯示執行完成但沒有結果輸出
> create table leo_table1 as select *  from dba_objects;       表1和表3結構資料完全一致,都是普通表
Table created.
> create table leo_table3 as select *  from dba_objects;
Table created.
> create index leo_compress_table1_index on leo_table1 (object_id) compress;    建立壓縮索引,object_id列沒有重複值,重複率極低
Index created.
> create index leo_table3_index on leo_table3 (object_id) ;    建立普通索引,object_id列沒有重複值,重複率極低
Index created.
create or replace procedure show_space                               定義show_space()儲存過程,支援ASSM and MSSM 段物件儲存屬性
( p_segname in varchar2,
  p_owner   in varchar2 default user,
  p_type    in varchar2 default 'TABLE',
  p_partition in varchar2 default NULL )
-- this procedure uses authid current user so it can query DBA_*
-- views using privileges from a ROLE and so it can be installed
-- once per database, instead of once per user that wanted to use it
authid current_user
as
    l_free_blks                 number;
    l_total_blocks              number;
    l_total_bytes               number;
    l_unused_blocks             number;
    l_unused_bytes              number;
    l_LastUsedExtFileId         number;
    l_LastUsedExtBlockId        number;
    l_LAST_USED_BLOCK           number;
    l_segment_space_mgmt        varchar2(255);
    l_unformatted_blocks number;
    l_unformatted_bytes number;
    l_fs1_blocks number; l_fs1_bytes number;
    l_fs2_blocks number; l_fs2_bytes number;
    l_fs3_blocks number; l_fs3_bytes number;
    l_fs4_blocks number; l_fs4_bytes number;
    l_full_blocks number; l_full_bytes number;
        -- inline procedure to print out numbers nicely formatted
        -- with a simple label
    procedure p( p_label in varchar2, p_num in number )
    is
    begin
        dbms_output.put_line( rpad(p_label,40,'.') ||
                              to_char(p_num,'999,999,999,999') );
    end;
begin
   -- this query is executed dynamically in order to allow this procedure
   -- to be created by a user who has access to DBA_SEGMENTS/TABLESPACES
   -- via a role as is customary.
   -- NOTE: at runtime, the invoker MUST have access to these two
   -- views!
   -- this query determines if the object is a ASSM object or not
   begin
      execute immediate
          'select ts.segment_space_management
             from dba_segments seg, dba_tablespaces ts
            where seg.segment_name      = :p_segname
              and (:p_partition is null or
                  seg.partition_name = :p_partition)
              and seg.owner = :p_owner
              and seg.tablespace_name = ts.tablespace_name'
             into l_segment_space_mgmt
            using p_segname, p_partition, p_partition, p_owner;
   exception
       when too_many_rows then
          dbms_output.put_line
          ( 'This must be a partitioned table, use p_partition => ');
          return;
   end;

   -- if the object is in an ASSM tablespace, we must use this API
   -- call to get space information, else we use the FREE_BLOCKS
   -- API for the user managed segments
   if l_segment_space_mgmt = 'AUTO'
   then
     dbms_space.space_usage
     ( p_owner, p_segname, p_type, l_unformatted_blocks,
       l_unformatted_bytes, l_fs1_blocks, l_fs1_bytes,
       l_fs2_blocks, l_fs2_bytes, l_fs3_blocks, l_fs3_bytes,
       l_fs4_blocks, l_fs4_bytes, l_full_blocks, l_full_bytes, p_partition);
    
     p( 'Unformatted Blocks ', l_unformatted_blocks );
     p( 'FS1 Blocks (0-25)  ', l_fs1_blocks );
     p( 'FS2 Blocks (25-50) ', l_fs2_blocks );
     p( 'FS3 Blocks (50-75) ', l_fs3_blocks );
     p( 'FS4 Blocks (75-100)', l_fs4_blocks );
     p( 'Full Blocks        ', l_full_blocks );
  else
     dbms_space.free_blocks(
       segment_owner     => p_owner,
       segment_name      => p_segname,
       segment_type      => p_type,
       freelist_group_id => 0,
       free_blks         => l_free_blks);
    
     p( 'Free Blocks', l_free_blks );
  end if;
  -- and then the unused space API call to get the rest of the
  -- information
  dbms_space.unused_space
  ( segment_owner     => p_owner,
    segment_name      => p_segname,
    segment_type      => p_type,
    partition_name    => p_partition,
    total_blocks      => l_total_blocks,
    total_bytes       => l_total_bytes,
    unused_blocks     => l_unused_blocks,
    unused_bytes      => l_unused_bytes,
    LAST_USED_EXTENT_FILE_ID => l_LastUsedExtFileId,
    LAST_USED_EXTENT_BLOCK_ID => l_LastUsedExtBlockId,
    LAST_USED_BLOCK => l_LAST_USED_BLOCK );
    p( 'Total Blocks', l_total_blocks );
    p( 'Total Bytes', l_total_bytes );
    p( 'Total MBytes', trunc(l_total_bytes/1024/1024) );
    p( 'Unused Blocks', l_unused_blocks );
    p( 'Unused Bytes', l_unused_bytes );
    p( 'Last Used Ext FileId', l_LastUsedExtFileId );
    p( 'Last Used Ext BlockId', l_LastUsedExtBlockId );
    p( 'Last Used Block', l_LAST_USED_BLOCK );
end;
/
Procedure created.
> set serveroutput on;                                  必須設定輸出show_space()儲存過程結果,否則只顯示執行完成但沒有結果輸出
> execute show_space('LEO_COMPRESS_TABLE1_INDEX',USER,'INDEX');    壓縮索引leo_compress_table1_index空間使用情況
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               1
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................              31                 其中用了31個
Total Blocks............................              40                 佔用了40個資料塊
Total Bytes.............................         327,680                 0.31M
Total MBytes............................               0
Unused Blocks...........................               3
Unused Bytes............................          24,576
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           4,569
Last Used Block.........................               5
PL/SQL procedure successfully completed.
> execute show_space('LEO_TABLE3_INDEX',USER,'INDEX');             普通索引leo_table3_index空間使用情況
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               1
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................              22                 其中用了22個
Total Blocks............................              32                 佔用了32個資料塊
Total Bytes.............................         262,144                 0.25M
Total MBytes............................               0
Unused Blocks...........................               5
Unused Bytes............................          40,960
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           4,601
Last Used Block.........................               3
PL/SQL procedure successfully completed.
小結:那麼在索引鍵值重複率低的情況下,壓縮索引比普通索引佔用的空間還多,這是為什麼呢,壓縮原理就是把鍵值用符號代替,重複鍵值越多冗餘空間越多,但我們測試鍵值幾乎是沒有重複的,還要騰出額外空間給對映列表(鍵值:符號對應表),所以壓縮索引使用的空間就會更多。
八、資料壓縮對效能影響
原因:資料壓縮後,掃描資料塊變少了,導致邏輯I/O(LIO)減少(一致性讀減少),同時有更多的資料塊快取到記憶體中,導致物理I/O(PIO)減少,當然CPU消耗自然也少了。
1.海量資料普通表和海量資料壓縮表空間比較
> create table leo_big_table as select *  from dba_objects;                        海量資料普通表,16萬條記錄
Table created.
> insert into leo_big_table select * from leo_big_table;
10374 rows created.
> /
20748 rows created.
> /
41496 rows created.
> /
82992 rows created.
> select count(*) from leo_big_table;
  COUNT(*)
----------
    165984
> commit;
Commit complete.
> create table leo_big_table_compress compress  as select *  from dba_objects;     海量資料壓縮表,16萬條記錄
Table created.
> insert into leo_big_table_compress select * from leo_big_table_compress;
10375 rows created.
> /
20750 rows created.
> /
41500 rows created.
> /
83000 rows created.
> select count(*) from leo_big_table_compress;
  COUNT(*)
----------
    166000
>
> commit;
Commit complete.
> execute show_space('LEO_BIG_TABLE');                      海量資料普通表,16萬條記錄
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               1
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               5
Full Blocks        .....................           2,064          其中2064個裝滿了資料
Total Blocks............................           2,176          佔用了2176個資料塊
Total Bytes.............................      17,825,792
Total MBytes............................              17          17M空間
Unused Blocks...........................              64
Unused Bytes............................         524,288
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           6,793
Last Used Block.........................              64
PL/SQL procedure successfully completed.
> execute show_space('LEO_BIG_TABLE_COMPRESS');             海量資料壓縮表,16萬條記錄
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               1
FS4 Blocks (75-100).....................              31
Full Blocks        .....................           1,786          其中1786個裝滿了資料
Total Blocks............................           1,920          佔用了1920個資料塊
Total Bytes.............................      15,728,640
Total MBytes............................              15          15M空間
Unused Blocks...........................              64
Unused Bytes............................         524,288
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           8,713
Last Used Block.........................              64
PL/SQL procedure successfully completed.
小結:顯而易見,海量資料的環境下,壓縮表比普通表佔用的空間更少
2.10046事件追蹤查詢select普通表和壓縮表的sql執行計劃(全表掃描)
> alter session set events '10046 trace name context forever,level 12';
> select * from leo_big_table;
165984 rows selected.
> select * from leo_big_table_compress;
166000 rows selected.
tkprof leo_ora_1983.trc leo_20120812.txt sys=no
more leo_20120812.txt
select * from leo_big_table                                                            普通表sql效能指標
call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0       硬解析
Execute      1      0.00       0.00          0          0          0           0       執行
Fetch    11067      1.55       1.44          0      13000          0      165984       取操作
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    11069      1.55       1.44          0      13000          0      165984
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 27
Rows     Row Source Operation
-------  -----------------------------------------------------------------------
 165984  TABLE ACCESS FULL LEO_BIG_TABLE (cr=13000 pr=0 pw=0 time=1992805 us)          輸出165984行,一致性讀13000
********************************************************************************
select * from leo_big_table_compress                                                   壓縮表sql效能指標
call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0       硬解析
Execute      1      0.00       0.00          0          0          0           0       執行
Fetch    11068      1.14       1.13          0      12775          0      166000       取操作
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    11070      1.14       1.13          0      12775          0      166000
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 27
Rows     Row Source Operation
-------  ---------------------------------------------------
 166000  TABLE ACCESS FULL LEO_BIG_TABLE_COMPRESS (cr=12775 pr=0 pw=0 time=1495219 us)  輸出166000行,一致性讀12775
********************************************************************************
小結:這裡磁碟讀disk都是0是因為所有記錄都快取在記憶體中,從一致性讀、CPU、elapsed等指標上看,壓縮表還是優於普通表
3.10046事件追蹤查詢select普通表和壓縮表的sql執行計劃(走索引)
> create index leo_big_table_index on leo_big_table(object_id);                       普通表建立索引
Index created.
> create index leo_big_table__compress_index on leo_big_table_compress(object_id);    壓縮表建立索引
Index created.
> select * from leo_big_table where object_id=5276;                                  
> select * from leo_big_table_compress where object_id=5276;                         
select * from leo_big_table where object_id=5276                                            普通表走索引檢索sql效能指標
call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0            硬解析
Execute      1      0.00       0.00          0          0          0           0            執行
Fetch        3      0.00       0.00          0         20          0          16            取操作
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.00       0.00          0         20          0          16
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 27
Rows     Row Source Operation
-------  ---------------------------------------------------
     16  TABLE ACCESS BY INDEX ROWID LEO_BIG_TABLE (cr=20 pr=0 pw=0 time=316 us)             輸出16行,走索引,表一致性讀20
     16   INDEX RANGE SCAN LEO_BIG_TABLE_INDEX (cr=4 pr=0 pw=0 time=169 us)(object id 11066) 因為索引沒壓縮,索引一致性讀4
********************************************************************************
select * from leo_big_table_compress where object_id=5276                                   壓縮表走索引檢索sql效能指標
call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          2          0           0            硬解析
Execute      1      0.00       0.00          0          0          0           0            執行
Fetch        3      0.00       0.00          0         20          0          16            取操作
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.00       0.00          0         22          0          16
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 27
Rows     Row Source Operation
-------  ---------------------------------------------------
     16  TABLE ACCESS BY INDEX ROWID LEO_BIG_TABLE_COMPRESS (cr=20 pr=0 pw=0 time=140 us)     輸出16行,走索引,一致性讀22
     16   INDEX RANGE SCAN LEO_BIG_TABLE__COMPRESS_INDEX (cr=4 pr=0 pw=0 time=31 us)(object id 11067) 因為索引沒壓縮,索引一致性讀4
********************************************************************************
小結:我們看到壓縮表比普通表的一致性讀要多2個,原因有二,第一 查詢的資料太少,不容易看出來。第二 先解壓再顯示,解壓需要額外的開銷。在海量資料環境下,壓縮表走索引的效果應該比普通表要好。
4.10046事件追蹤order by group by普通表和壓縮表的sql執行計劃
select * from leo_big_table order by owner                                               普通表order by 效能指標
call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0         硬解析
Execute      1      0.00       0.00          0          0          0           0         執行
Fetch    11067      2.91      13.21       2116       2087         28      165984         取操作
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    11069      2.91      13.21       2116       2088         28      165984
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 27
Rows     Row Source Operation
-------  ---------------------------------------------------
 165984  SORT ORDER BY (cr=2087 pr=2116 pw=2116 time=13536800 us)                全表掃描排序後輸出165984行,一致性讀2087,磁碟讀2116
 165984   TABLE ACCESS FULL LEO_BIG_TABLE (cr=2076 pr=0 pw=0 time=7635325 us)
********************************************************************************
select * from leo_big_table_compress order by owner                                       壓縮表order by 效能指標
call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0          硬解析
Execute      1      0.00       0.00          0          0          0           0          執行
Fetch    11068      1.65       2.16       2117       1834          4      166000          取操作
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    11070      1.65       2.17       2117       1834          4      166000
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 27
Rows     Row Source Operation
-------  ---------------------------------------------------
 166000  SORT ORDER BY (cr=1834 pr=2117 pw=2117 time=2356632 us)                  全表掃描排序後輸出166000行,一致性讀1834,磁碟讀2117
 166000   TABLE ACCESS FULL LEO_BIG_TABLE_COMPRESS (cr=1823 pr=0 pw=0 time=1494518 us)
********************************************************************************
select owner,count(*) from leo_big_table group by owner                                    普通表group by 效能指標
call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0           硬解析
Execute      1      0.00       0.00          0         15          0           0           執行
Fetch        2      0.10       0.10          0       2076          0           9           取操作
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.10       0.10          0       2091          0           9
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 27
Rows     Row Source Operation
-------  ---------------------------------------------------
      9  HASH GROUP BY (cr=2091 pr=0 pw=0 time=116605 us)                        分組計算後輸出9行,一致性讀2091
 165984   TABLE ACCESS FULL LEO_BIG_TABLE (cr=2076 pr=0 pw=0 time=1825987 us)    全表掃描165984行
********************************************************************************
select owner,count(*) from leo_big_table_compress group by owner                            壓縮表group by 效能指標
call     count       cpu    elapsed 磁碟讀disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0            硬解析
Execute      1      0.00       0.00          0          0          0           0            執行
Fetch        2      0.08       0.08          0       1823          0           9            取操作
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.08       0.08          0       1823          0           9
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 27
Rows     Row Source Operation
-------  ---------------------------------------------------
      9  HASH GROUP BY (cr=1838 pr=0 pw=0 time=94680 us)                           分組計算後輸出9行,一致性讀1838
 166000   TABLE ACCESS FULL LEO_BIG_TABLE_COMPRESS (cr=1823 pr=0 pw=0 time=1162061 us)  全表掃描166000行
小結:從上面測試結果來看,壓縮表的效能更好一些,尤其在邏輯讀I/O(LIO)、一致性讀、物理讀I/O(PIO)、CPU等指標上。
 
九、資料壓縮和DML操作
答:不管從實際應用還是官方說明,“壓縮”技術都是屬於OLAP資料倉儲領域
問:那麼在OLTP領域就不能使用“壓縮”技術了嘛? 其實不然,特定場合DML操作可以壓縮處理的,但大部分是不作壓縮的
特定場合
(1)alter table 表名 move compress           給普通表進行壓縮,透過MOVE方式進行資料重組
(2)alter table 表名 move partition 分割槽名 compress  給分割槽表進行壓縮,每個分割槽都可以獨立壓縮,結尾必須新增compress關鍵字否則不做壓縮
(3)create table 表名 compress as select     CTAS  物件屬性
(4)insert /*+ append */                    直接載入插入資料,必壓縮。 常規 insert into 不壓縮資料
(5)sqlldr 引數 direct path                  sqlload 直接路徑載入,必壓縮
實驗 1   CTAS
> create table leo_test1 compress as select * from dba_objects;                CTAS 建立初就壓縮
Table created.
> create table leo_test2 compress as select * from dba_objects where 1=2;      只要結構不要資料
Table created.
> select count(*) from leo_test1;                                              有10378行
  COUNT(*)
----------
     10378
> select count(*) from leo_test2;                                              只要結構不要資料 successful
  COUNT(*)
----------
         0
> insert into leo_test2 select * from dba_objects;                             常規insert  into  不壓縮資料
10379 rows created.
> commit;                                                                      提交
> set serveroutput on;                                                         開啟螢幕輸出
> execute show_space('LEO_TEST1');                                            使用CTAS直接壓縮
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................              45                            其中用了45個裝滿資料
Total Blocks............................              56                            佔用了56個資料塊
Total Bytes.............................         458,752                            0.43M空間,已壓縮
Total MBytes............................               0                              
Unused Blocks...........................               5
Unused Bytes............................          40,960
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           9,649
Last Used Block.........................               3
PL/SQL procedure successfully completed.
> execute show_space('LEO_TEST2');                                           常規insert into 不壓縮資料
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               1
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               2
Full Blocks        .....................             115                            其中用了115個裝滿資料
Total Blocks............................             128                            佔用了128個資料塊
Total Bytes.............................       1,048,576
Total MBytes............................               1                            1M空間
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           9,777
Last Used Block.........................               8
PL/SQL procedure successfully completed.
小結:我們看到leo_test2表比leo_test1表佔用的空間幾乎多了一倍,為什麼都用CTAS建立的表空間會不一樣呢,leo_test2表在建立時只有結構沒有資料,壓縮只針對資料壓縮因此沒有資料就沒有壓縮。後來我們使用了常規insert into 插入資料但資料是不壓縮的,這是導致表2比表1佔用空間多的原因。
實驗 2   insert /*+ append */               直接載入插入資料,必壓縮
> truncate table leo_test2;                                                   清空leo_test2表
Table truncated.
> insert /*+ append */ into leo_test2 select * from dba_objects;              直接載入插入資料必壓縮
10379 rows created.
> commit;                                                                     提交
> execute show_space('LEO_TEST2');                                            分析空間使用情況
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................              45                            其中用了45個裝滿資料
Total Blocks............................              56                            佔用了56個資料塊
Total Bytes.............................         458,752                            0.43M空間,已壓縮
Total MBytes............................               0
Unused Blocks...........................               5
Unused Bytes............................          40,960
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           9,705
Last Used Block.........................               3
小結:使用insert /*+ append */ 直接載入插入資料方式,資料必須壓縮狀態
實驗 3   update操作對壓縮資料作解壓縮處理,空間立刻恢復為未壓縮時的容量
> update leo_test1 set object_name=object_name||'leo_nocompress';         leo_test1已經是壓縮狀態,現在對其進行update操作
10378 rows updated.
> commit;
Commit complete.
> execute show_space('LEO_TEST1');                                        update操作對壓縮資料進行解壓縮處理
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               1
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................              36
Full Blocks        .....................             143                        其中用了143個裝滿資料
Total Blocks............................             256                        佔用了256個資料塊
Total Bytes.............................       2,097,152
Total MBytes............................               2                        2M空間,已經解壓縮了,空間恢復為未壓縮是的容量
Unused Blocks...........................              64
Unused Bytes............................         524,288
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           9,865
Last Used Block.........................              64
PL/SQL procedure successfully completed.
小結:此時leo_test1表的空間立刻恢復為未壓縮時的容量,但比原來的空間還要大呢,是因為update操作將記錄的長度變大了,空間自然也變大了!
實驗 4   給普通表進行壓縮,透過MOVE方式進行資料重組
> alter table leo_test1 move compress;                                   現在對普通表進行壓縮重組資料
Table altered.
> execute show_space('LEO_TEST1');
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................              65                       其中用了65個裝滿資料
Total Blocks............................              72                       佔用了72個資料塊
Total Bytes.............................         589,824                       0.56M空間,已經壓縮
Total MBytes............................               0
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               7
Last Used Ext BlockId...................           9,849
Last Used Block.........................               8
PL/SQL procedure successfully completed.
小結:同樣的資料經過重組資料進行壓縮,可以比未壓縮時節約了4倍空間,在一些特定場合使用壓縮技術還是非常有效果的。
實驗 5   給分割槽表進行壓縮,每個分割槽都可以獨立壓縮
> create table leo_test_partition1 (object_id,object_name)             我們先建立一個分割槽表,共6個分割槽
partition by range (object_id)
(
partition leo_p1 values less than (1000),
partition leo_p2 values less than (2000),
partition leo_p3 values less than (3000),
partition leo_p4 values less than (4000),
partition leo_p5 values less than (5000),
partition leo_max values less than (maxvalue)
)
as select object_id,object_name from dba_objects;
Table created.
> insert into leo_test_partition1 select * from leo_test_partition1;    多次載入資料
> select count(*) from leo_test_partition1;                             33萬行
  COUNT(*)
----------
    332576
> select count(*) from leo_test_partition1 partition (leo_p2);          leo_p2分割槽有32000行
  COUNT(*)
----------
     32000
> commit;                                                               提交
> select partition_name,compression from user_tab_partitions where table_name='LEO_TEST_PARTITION1';   6個分割槽都是未壓縮狀態
PARTITION_NAME                 COMPRESS
------------------------------ --------
LEO_MAX                        DISABLED
LEO_P1                         DISABLED
LEO_P2                         DISABLED
LEO_P3                         DISABLED
LEO_P4                         DISABLED
LEO_P5                         DISABLED
> execute show_space('LEO_TEST_PARTITION1',USER,'TABLE PARTITION','LEO_P2');        leo_p2分割槽未壓縮時空間情況
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               5
Full Blocks        .....................             105                                  其中用了105個裝滿資料
Total Blocks............................             120                                  佔用了120個資料塊
Total Bytes.............................         983,040                                  0.9375M
Total MBytes............................               0
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               7
Last Used Ext BlockId...................          10,601
Last Used Block.........................               8
PL/SQL procedure successfully completed.
> alter table leo_test_partition1 move partition leo_p2 compress; 單獨對leo_p2分割槽進行壓縮,結尾必須新增compress關鍵字否則不做壓縮
Table altered.
> select partition_name,compression from user_tab_partitions where table_name='LEO_TEST_PARTITION1';  leo_p2分割槽已經壓縮
PARTITION_NAME                 COMPRESS
------------------------------ --------
LEO_MAX                        DISABLED
LEO_P1                         DISABLED
LEO_P2                         ENABLED
LEO_P3                         DISABLED
LEO_P4                         DISABLED
LEO_P5                         DISABLED
> execute show_space('LEO_TEST_PARTITION1',USER,'TABLE PARTITION','LEO_P2');         leo_p2分割槽是壓縮時空間情況
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................             104                                   其中用了104個裝滿資料,只比原來少了1個塊
Total Blocks............................             120                                   佔用了120個資料塊
Total Bytes.............................         983,040
Total MBytes............................               0
Unused Blocks...........................               6
Unused Bytes............................          49,152
Last Used Ext FileId....................               7
Last Used Ext BlockId...................          11,297
Last Used Block.........................               2
PL/SQL procedure successfully completed.
小結:我們看到leo_p2分割槽壓縮時比未壓縮時只在使用的資料塊上少了一個,感覺基本上沒怎麼壓,後來分析資料發現,記錄重複率非常小,樣本資料條數也不夠大,導致壓縮比也小,效果稍差。
原理出處:表壓縮比:列中資料重複率越高,壓縮比越大,列中資料重複率越低,壓縮比越小
大家在做測試的時候可以和我的結果對比一下,看看是不是一樣,如果分析的不足,請提出寶貴意見,共同進步:)
 
十、資料壓縮應用場景
1.靜態資料:這種資料處在不在改變的狀態,只用於查詢和分析,可以壓縮
2.歷史資料:這種資料當前已經不在使用,只用於以後歸檔查詢,可以壓縮
3.海量資料:提高資料處理效率,減少記憶體和磁碟I/O開銷,可以壓縮
4.備份資料:壓縮備份可以節約大量空間給其他物件使用,可以壓縮
 
Leonarding
2012.8.14
天津&autumn
分享技術~收穫快樂
Blog:http://space.itpub.net/26686207

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/26686207/viewspace-740928/,如需轉載,請註明出處,否則將追究法律責任。

相關文章