使用DataPump的Sample引數來進行資料抽樣

realkid4發表於2015-11-23

 

資料泵DatapumpOracle推出的一項非常方便的資料邏輯備份還原工具。同傳統的Exp/Imp相比,Datapump無論在效能、功能上都提供了很多優勢進步。在實際工作中,大資料抽樣是一個經常出現的業務需求。對一個海量資料表,也許抽取有代表性的資料量就可以完成相應工作。

 

select語句中,我們的確可以使用sample語句去實現對應的一些功能。藉助Datapump,我們也可以實現取樣方式的資料表備份處理。

 

1、環境說明

 

我們選擇Oracle 11gR2進行測試實驗,具體版本為11.2.0.4

 

 

SQL> select * from v$version;

 

BANNER

--------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

PL/SQL Release 11.2.0.4.0 - Production

CORE     11.2.0.4.0     Production

 

TNS for Linux: Version 11.2.0.4.0 - Production

NLSRTL Version 11.2.0.4.0 – Production

 

 

建立資料表SM_TEST

 

 

SQL> create table scott.sm_test as select * from dba_objects;

Table created

 

SQL> select count(*) from scott.sm_test;

  COUNT(*)

----------

133391

 

 

2、測試實驗

 

ExpdpSample引數,官方的解釋是很少的。

 

 

SAMPLE

Percentage of data to be exported.

 

 

Datapump是可以工作在三個模式下,全庫模式、Schema模式和表模式。針對三種模式,我們都可以使用Sample引數來進行設定。下面透過最簡單的比例值指定來設定取樣比例。

 

 

[oracle@localhost ~]$ expdp \"/ as sysdba\" dumpfile=TEST_1.dmp tables=scott.sm_test sample=50

 

Export: Release 11.2.0.4.0 - Production on Mon Nov 23 13:50:02 2015

 

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

 

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

Starting "SYS"."SYS_EXPORT_TABLE_01":  "/******** AS SYSDBA" dumpfile=TEST_1.dmp tables=scott.sm_test sample=50

Estimate in progress using BLOCKS method...

Processing object type TABLE_EXPORT/TABLE/TABLE_DATA

Total estimation using BLOCKS method: 8 MB

Processing object type TABLE_EXPORT/TABLE/TABLE

. . exported "SCOTT"."SM_TEST"                           6.402 MB   66587 rows

Master table "SYS"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded

******************************************************************************

Dump file set for SYS.SYS_EXPORT_TABLE_01 is:

  /u01/app/admin/sicsdb/dpdump/TEST_1.dmp

Job "SYS"."SYS_EXPORT_TABLE_01" successfully completed at Mon Nov 23 13:50:11 2015 elapsed 0 00:00:07

 

 

指定匯出的資料表名稱的表模式中,我們透過sample的百分比引數,來確定特定表的內容。從匯出資料6萬左右的情況,正好是13萬總資料的50%,是合理的結果。

 

如果執行在schema工作模式下,直接指定比例份額,結果如何呢?

 

 

SQL> select count(*) from scott.emp;

 

  COUNT(*)

----------

        14

 

 

[oracle@localhost ~]$ expdp \"/ as sysdba\" dumpfile=TEST_2.dmp schemas=scott sample=50

 

Export: Release 11.2.0.4.0 - Production on Mon Nov 23 13:51:31 2015

 

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

 

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

Starting "SYS"."SYS_EXPORT_SCHEMA_01":  "/******** AS SYSDBA" dumpfile=TEST_2.dmp schemas=scott sample=50

Estimate in progress using BLOCKS method...

Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA

Total estimation using BLOCKS method: 8.093 MB

Processing object type SCHEMA_EXPORT/USER

Processing object type SCHEMA_EXPORT/SYSTEM_GRANT

Processing object type SCHEMA_EXPORT/ROLE_GRANT

Processing object type SCHEMA_EXPORT/DEFAULT_ROLE

Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA

Processing object type SCHEMA_EXPORT/TABLE/TABLE

Processing object type SCHEMA_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT

Processing object type SCHEMA_EXPORT/TABLE/INDEX/INDEX

Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/CONSTRAINT

Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS

Processing object type SCHEMA_EXPORT/VIEW/VIEW

Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/REF_CONSTRAINT

Processing object type SCHEMA_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS

. . exported "SCOTT"."SM_TEST"                           6.406 MB   66611 rows

. . exported "SCOTT"."DEPT"                              5.859 KB       1 rows

. . exported "SCOTT"."EMP"                               8.359 KB       9 rows

. . exported "SCOTT"."SALGRADE"                          5.835 KB       3 rows

. . exported "SCOTT"."BONUS"                                 0 KB       0 rows

Master table "SYS"."SYS_EXPORT_SCHEMA_01" successfully loaded/unloaded

******************************************************************************

Dump file set for SYS.SYS_EXPORT_SCHEMA_01 is:

  /u01/app/admin/sicsdb/dpdump/TEST_2.dmp

Job "SYS"."SYS_EXPORT_SCHEMA_01" successfully completed at Mon Nov 23 13:51:48 2015 elapsed 0 00:00:16

 

 

取樣比例在Schema下所有資料表範圍內生效,所有資料表基本上以50%的比例進行取樣處理。

 

最後,如果我們匯出多張資料表,只對某些資料表進行取樣處理,那麼如何指定sample引數呢?

 

 

[oracle@localhost ~]$ expdp \"/ as sysdba\" dumpfile=TEST_3.dmp schemas=scott sample=scott.sm_test:30

 

Export: Release 11.2.0.4.0 - Production on Mon Nov 23 13:52:50 2015

 

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

 

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

Starting "SYS"."SYS_EXPORT_SCHEMA_01":  "/******** AS SYSDBA" dumpfile=TEST_3.dmp schemas=scott sample=scott.sm_test:30

Estimate in progress using BLOCKS method...

Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA

Total estimation using BLOCKS method: 4.987 MB

Processing object type SCHEMA_EXPORT/USER

Processing object type SCHEMA_EXPORT/SYSTEM_GRANT

Processing object type SCHEMA_EXPORT/ROLE_GRANT

Processing object type SCHEMA_EXPORT/DEFAULT_ROLE

Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA

Processing object type SCHEMA_EXPORT/TABLE/TABLE

Processing object type SCHEMA_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT

Processing object type SCHEMA_EXPORT/TABLE/INDEX/INDEX

Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/CONSTRAINT

Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS

Processing object type SCHEMA_EXPORT/VIEW/VIEW

Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/REF_CONSTRAINT

Processing object type SCHEMA_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS

. . exported "SCOTT"."SM_TEST"                           3.861 MB   40146 rows

. . exported "SCOTT"."DEPT"                              5.960 KB       5 rows

. . exported "SCOTT"."EMP"                               8.562 KB      14 rows

. . exported "SCOTT"."SALGRADE"                          5.859 KB       5 rows

. . exported "SCOTT"."BONUS"                                 0 KB       0 rows

Master table "SYS"."SYS_EXPORT_SCHEMA_01" successfully loaded/unloaded

******************************************************************************

Dump file set for SYS.SYS_EXPORT_SCHEMA_01 is:

  /u01/app/admin/sicsdb/dpdump/TEST_3.dmp

Job "SYS"."SYS_EXPORT_SCHEMA_01" successfully completed at Mon Nov 23 13:53:00 2015 elapsed 0 00:00:09

 

 

從上面實驗中,如果只是針對特定資料表進行取樣處理,需要透過資料表名:取樣比例方式來指定。

 

3、結論

 

根據筆者的猜測,sample引數在expdp中主要是傳導到select語句的sample語句來實現。

 

 


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/17203031/viewspace-1844553/,如需轉載,請註明出處,否則將追究法律責任。

相關文章