SQL*Loader 筆記 (二) 效能最佳化

dbasdk發表於2014-07-25
最近客戶有個測試專案,需要將大量的文字檔案用sqlloder載入到oracle資料庫中。特此做出如下實驗,演練一下sqlloader的幾個重要引數。

主要參考官方文件上關於redo產生的控制
Minimize use of the redo log 

One way to speed a direct load dramatically is to minimize use of the redo log. There are three ways to do this. You can disable archiving, you can specify that the load is unrecoverable, or you can set the SQL NOLOGGING parameter for the objects being loaded. This section discusses all methods. 


關閉歸檔日誌模式

關閉所有例項
# srvctl stop instance -d devdb -i "devdb1,devdb2”
mount節點1
# srvctl start instance -d devdb -n node1 -o mount
關閉歸檔
SYS@devdb1 >alter database noarchivelog;
關閉所有例項
# srvctl stop instance -d devdb -i "devdb1,devdb2”
啟動所有例項
# srvctl start instance -d devdb -i "devdb1,devdb2"



檢視sqlloader所需的檔案及大小

$ du -sh *
4.0K    c_stock.ctl                <==控制檔案
1021M   c_stock_large.txt   <==資料檔案為大小為 1021MB將近1GB,同等大小的檔案還有N個
4.0K    c_stock.log               <==輸入日誌檔案

控制檔案
$ cat c_stock.ctl
load data
into table tpcc.c_stock
append
FIELDS TERMINATED BY '|'
(
s_i_id,
s_w_id,
s_quantity,
s_dist_01,
s_dist_02,
s_dist_03,
s_dist_04,
s_dist_05,
s_dist_06,
s_dist_07,
s_dist_08,
s_dist_09,
s_dist_10,
s_data,
s_ytd,
s_order_cnt,
s_remote_cnt
)

資料檔案
$ cat c_stock.txt
1|1|80|RfIBeBlKPqdPouSIcIdvFmh3|3gZUHn1oqzPvY03Df4iroZZo|x2J0ePT5VJPsUGH93ksRbjie|drvpekDBabpcQ85BFKUqepn5|zFKyOSAP7V4b3xA1wP3cMWEh|wmvyBIrYbZnNINsDxkDoG3g2|H75jUxoeJKB8gSUiI5Te
JcH4|m8iQ0mJxkEE2ZS4wqvsxccDK|8mMFmhzxgFd6QMrYeTQ1BKnQ|4Fb6GEEEOg9ZnwmtZnnFYEt0|7SLI5rAzsw5U0zy6wnhQxBFLM9zfspqq71W1imqyIjiwGHq|0|0|0
2|1|71|SxGgW9Rz6eCprSDpKneQzsf5|znr5SFVB2pHPmmcjoEwFmXSU|aWzxcEsBPHuycfX2ssFDCHMQ|96lkT22ROrePTuez1zV1EGRV|YnoqSY4SUf2B7UiKaknTErgt|QZkjv12hcgvV2pDKttbobhWb|r9VVrZe5Pme8b6geazX2
PoC8|HBmIG7IX6rGlgIhTSjSSeWWc|lJ3YVvWsV7Yr4wdYNHaSefBW|mhEq3ordVk1GFMYoHLGB5HpJ|npisQ5FWxTNMaQvkG2OiO2Pnwej3rS6F5eXL|0|0|0
…….
……. 略去剩餘部分

觀察當前的log序列號
SYS@devdb1 >select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARCHIVED  STATUS     FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- ------------------- ------------ -------------------
         1          1        133   52428800        512          2 NO        CURRENT          1935103 2014/07/23 19:16:57   2.8147E+14
         2          1        132   52428800        512          2 NO        INACTIVE         1929910 2014/07/23 19:16:45      1935103 2014/07/23 19:16:57
         3          2         51   52428800        512          2 NO        CURRENT          1696072 2014/07/23 18:39:50   2.8147E+14
         4          2         50   52428800        512          2 NO        INACTIVE         1660718 2014/07/23 13:53:34      1696072 2014/07/23 18:39:50 


在oracle使用者下執行sqlloader,引數silent=all 不輸出螢幕資訊

$ sqlldr userid=tpcc/tpcc control=c_stock.ctl data=c_stock_large.txt silent=all


檢視日誌輸出
$ cat c_stock.log

SQL*Loader: Release 11.2.0.3.0 - Production on Wed Jul 23 19:31:46 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Control File:   c_stock.ctl
Data File:      c_stock_large.txt
Bad File:     c_stock_large.bad
Discard File:  none specified
 
 (Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array:     64 rows, maximum of 256000 bytes
Continuation:    none specified
Path used:      Conventional
Silent options: FEEDBACK, ERRORS and DISCARDS

Table TPCC.C_STOCK, loaded from every logical record.
Insert option in effect for this table: APPEND

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
S_I_ID                              FIRST     *   |       CHARACTER           
S_W_ID                               NEXT     *   |       CHARACTER           
S_QUANTITY                           NEXT     *   |       CHARACTER           
S_DIST_01                            NEXT     *   |       CHARACTER           
S_DIST_02                            NEXT     *   |       CHARACTER           
S_DIST_03                            NEXT     *   |       CHARACTER           
S_DIST_04                            NEXT     *   |       CHARACTER           
S_DIST_05                            NEXT     *   |       CHARACTER           
S_DIST_06                            NEXT     *   |       CHARACTER           
S_DIST_07                            NEXT     *   |       CHARACTER           
S_DIST_08                            NEXT     *   |       CHARACTER           
S_DIST_09                            NEXT     *   |       CHARACTER           
S_DIST_10                            NEXT     *   |       CHARACTER           
S_DATA                               NEXT     *   |       CHARACTER           
S_YTD                                NEXT     *   |       CHARACTER           
S_ORDER_CNT                          NEXT     *   |       CHARACTER           
S_REMOTE_CNT                         NEXT     *   |       CHARACTER           

value used for ROWS parameter changed from 64 to 58

Table TPCC.C_STOCK:
  3500000 Rows successfully loaded.      <==成功讀取350萬條記錄
  0 Rows not loaded due to data errors.   <==沒有錯誤
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.


Space allocated for bind array:                 254388 bytes(58 rows)
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:       3500000
Total logical records rejected:         0
Total logical records discarded:        0

Run began on Wed Jul 23 19:31:46 2014
Run ended on Wed Jul 23 19:36:11 2014

Elapsed time was:     00:04:24.27     <==執行時間4分鐘24秒27
CPU time was:         00:00:50.02

檢視執行sqlloader後log的變化,測試環境沒有其他應用,可以認為log的變化全部來自於sqlloder載入資料產生。

SYS@devdb1 >select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARCHIVED  STATUS     FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- ------------------- ------------ -------------------
         1          1        159   52428800        512          2 NO        CURRENT          2153818 2014/07/23 19:36:03   2.8147E+14
         2          1        158   52428800        512          2 NO        INACTIVE         2148648 2014/07/23 19:35:53      2153818 2014/07/23 19:36:03
         3          2         51   52428800        512          2 NO        CURRENT          1696072 2014/07/23 18:39:50   2.8147E+14
         4          2         50   52428800        512          2 NO        INACTIVE         1660718 2014/07/23 13:53:34      1696072 2014/07/23 18:39:50

本機是rac雙節點環境,sqlloader在節點一的伺服器上執行,可以看出日誌的序列從剛才的133,132變化到了159,158。 節點二的日誌不發生變化。


下面開始最佳化sqlloader

一,最佳化:直接路徑載入

SYS@devdb1 >truncate table tpcc.c_stock;    <==清理環境

$ sqlldr userid=tpcc/tpcc control=c_stock.ctl data=c_stock_large.txt direct=true

$ cat c_stock.log
…...
Number to load: ALL
Number to skip: 0
Errors allowed: 50
Continuation:    none specified
Path used:      Direct    <==直接路徑載入
…..
  3500000 Rows successfully loaded.     <==成功讀取350萬條記錄
…...
Run began on Wed Jul 23 19:54:40 2014
Run ended on Wed Jul 23 19:55:10 2014

Elapsed time was:     00:00:30.31   <==執行時間0分鐘30秒31, 效果顯著!
CPU time was:         00:00:14.94


SYS@devdb1 >select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARCHIVED  STATUS     FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- ------------------- ------------ -------------------
         1          1        159   52428800        512          2 NO        INACTIVE         2153818 2014/07/23 19:36:03      2161051 2014/07/23 19:54:58
         2          1        160   52428800        512          2 NO        CURRENT          2161051 2014/07/23 19:54:58   2.8147E+14
         3          2         51   52428800        512          2 NO        CURRENT          1696072 2014/07/23 18:39:50   2.8147E+14
         4          2         50   52428800        512          2 NO        INACTIVE         1660718 2014/07/23 13:53:34      1696072 2014/07/23 18:39:50

雖然直接載入的方式不經過sql引擎,不會產生redo和undo,但是我們看到還是會有微乎其微的日誌增長。

二,最佳化:nologging
SYS@devdb1 >truncate table tpcc.c_stock;    <==清理環境

SYS@devdb1 >alter table tpcc.c_stock nologging;  <==更改表屬性,強制不生成日誌

$ cat c_stock.log
…..
Path used:      Direct
Run began on Wed Jul 23 20:07:54 2014
Run ended on Wed Jul 23 20:08:30 2014

Elapsed time was:     00:00:35.18  <==執行時間0分鐘35秒18, 比剛才慢了將近5秒
CPU time was:         00:00:17.18

SYS@devdb1 >select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARCHIVED  STATUS     FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- ------------------- ------------ -------------------
         1          1        159   52428800        512          2 NO        INACTIVE         2153818 2014/07/23 19:36:03      2161051 2014/07/23 19:54:58
         2          1        160   52428800        512          2 NO        CURRENT          2161051 2014/07/23 19:54:58   2.8147E+14
         3          2         51   52428800        512          2 NO        CURRENT          1696072 2014/07/23 18:39:50   2.8147E+14
         4          2         50   52428800        512          2 NO        INACTIVE         1660718 2014/07/23 13:53:34      1696072 2014/07/23 18:39:50

日誌序列已經不再變化了,可是並沒有使速度變得更快,反而慢了5秒

三,最佳化:unrecoverable  
SYS@devdb1 >truncate table tpcc.c_stock;

unrecoverable  <==在控制檔案頭加入該引數,不再向資料庫控制檔案中寫入SCN
load data
into table tpcc.c_stock
append
FIELDS TERMINATED BY '|'
(
s_i_id,
s_w_id,
s_quantity,
s_dist_01,
s_dist_02,
s_dist_03,
s_dist_04,
s_dist_05,
s_dist_06,
s_dist_07,
s_dist_08,
s_dist_09,
s_dist_10,
s_data,
s_ytd,
s_order_cnt,
s_remote_cnt



$ sqlldr userid=tpcc/tpcc control=c_stock.ctl data=c_stock_large.txt direct=true silent=all
$ cat c_stock.log
…..
Path used:      Direct
,,,
  3500000 Rows successfully loaded.
 ….
Run began on Wed Jul 23 20:17:57 2014
Run ended on Wed Jul 23 20:18:29 2014

Elapsed time was:     00:00:32.90  <==執行時間0分鐘32秒90
CPU time was:         00:00:16.33
效果平平,沒有什麼變化


四,最佳化 — parallel
SYS@devdb1 >truncate table tpcc.c_stock;

$ sqlldr userid=tpcc/tpcc control=c_stock.ctl data=c_stock_large.txt direct=true parallel=true
cat c_stock.log
…..
Path used:      Direct - with parallel option.  <==直接路徑載入,並行全開啟了
…..
Run began on Wed Jul 23 20:23:36 2014
Run ended on Wed Jul 23 20:24:12 2014
...
Elapsed time was:     00:00:35.53  <==執行時間0分鐘35秒53, 最慢的一次。
CPU time was:         00:00:19.03


實驗總結:
0,如果是測試環境做資料載入,一定要先關閉歸檔日誌(alter database no archivelog)。如果歸檔日誌不關閉,後面的所有最佳化都沒有效果。
1, sqlloder的direct 載入方式效果最突出。 
2,設定表為nologing雖然可以徹底避免日誌的生成,但是最佳化的效果已經不明顯了。
3,parallel需要在同時執行多個控制檔案,訪問多個資料檔案是才會有效果。例如:
sqlldr USERID=scott CONTROL=load1.ctl DIRECT=TRUE PARALLEL=true
sqlldr USERID=scott CONTROL=load2.ctl DIRECT=TRUE PARALLEL=true
sqlldr USERID=scott CONTROL=load3.ctl DIRECT=TRUE PARALLEL=true

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29734436/viewspace-1226639/,如需轉載,請註明出處,否則將追究法律責任。

相關文章