GoldenGate Extract程式hang問題解決一例

Jet_Zhang發表於2015-05-25
一直執行正常的extract程式突然hang住了,起初懷疑是不是找不到歸檔所致,但是細細的檢查了下,extract所需的歸檔都還在,而且日誌中也沒報任何錯。而且將extract程式停掉後,也能正常起來,沒有任何錯誤或者告警:

  1. // ggserr.log
  2. 2015-05-25 11:03:27 INFO OGG-01513 Oracle GoldenGate Capture for Oracle, elis.prm: Positioning to (Thread 1) Sequence 5824, RBA 290544656, SCN 1415.2504802626.
  3. 2015-05-25 11:03:27 INFO OGG-01516 Oracle GoldenGate Capture for Oracle, elis.prm: Positioned to (Thread 1) Sequence 5824, RBA 290544656, SCN 1415.2504802626, May 20, 2015 4:50:25 PM.
  4. 2015-05-25 11:03:27 INFO OGG-01513 Oracle GoldenGate Capture for Oracle, elis.prm: Positioning to (Thread 2) Sequence 4798, RBA 18781712, SCN 1415.2504799356.
  5. 2015-05-25 11:03:27 INFO OGG-01516 Oracle GoldenGate Capture for Oracle, elis.prm: Positioned to (Thread 2) Sequence 4798, RBA 18781712, SCN 1415.2504799356, May 20, 2015 4:50:19 PM.
  6. 2015-05-25 11:03:27 INFO OGG-00993 Oracle GoldenGate Capture for Oracle, elis.prm: EXTRACT ELIS started.
  7. //extract程式的rpt檔案
  8. 2015-05-25 11:03:27 INFO OGG-00546 Default thread stack size: 196608.
  9. 2015-05-25 11:03:27 INFO OGG-00547 Increasing thread stack size from 196608 to 1048576.
  10. 2015-05-25 11:03:27 INFO OGG-01513 Positioning to (Thread 1) Sequence 5824, RBA 290544656, SCN 1415.2504802626.
  11. 2015-05-25 11:03:27 INFO OGG-01516 Positioned to (Thread 1) Sequence 5824, RBA 290544656, SCN 1415.2504802626, May 20, 2015 4:50:25 PM.
  12. 2015-05-25 11:03:27 INFO OGG-01513 Positioning to (Thread 2) Sequence 4798, RBA 18781712, SCN 1415.2504799356.
  13. 2015-05-25 11:03:27 INFO OGG-01516 Positioned to (Thread 2) Sequence 4798, RBA 18781712, SCN 1415.2504799356, May 20, 2015 4:50:19 PM.
  14. 2015-05-25 11:03:27 INFO OGG-01517 Position of first record processed for Thread 1, Sequence 5824, RBA 290544656, SCN 1415.2504802623, May 20, 2015 4:50:25 PM.
  15. 2015-05-25 11:03:27 INFO OGG-01644 BOUNDED RECOVERY: COMPLETE: for object pool 1: p33619994_Redo Thread 1 at SeqNo: 5824, RBA: 290545168, SCN: 1415.2504802626 (6079883526466), Thread: 1.
  16. 2015-05-25 11:03:27 INFO OGG-01055 Recovery initialization completed for target file /ogg/ggate/data/dirext/lis/ea002534, at RBA 1109.
  17. 2015-05-25 11:03:27 INFO OGG-01478 Output file /ogg/ggate/data/dirext/lis/ea is using format RELEASE 11.2.
  18. 2015-05-25 11:03:27 INFO OGG-01517 Position of first record processed for Thread 2, Sequence 4798, RBA 18781712, SCN 1415.2504799356, May 20, 2015 4:50:19 PM.
  19. 2015-05-25 11:03:27 INFO OGG-01026 Rolling over remote file /ogg/ggate/data/dirext/lis/ea002534.
  20. 2015-05-25 11:03:27 INFO OGG-01053 Recovery completed for target file /ogg/ggate/data/dirext/lis/ea002535, at RBA 1109.
  21. 2015-05-25 11:03:27 INFO OGG-01057 Recovery completed for all targets.
從日誌中看,extract程式沒有問題,但是extract程式實際上是執行不正常的:

  1. GGSCI (hxddlis01) 8> info all
  2. Program Status Group Lag at Chkpt Time Since Chkpt
  3. MANAGER RUNNING
  4. JAGENT RUNNING
  5. EXTRACT RUNNING ELIS 96:14:58 00:00:04
  6. EXTRACT RUNNING PL2IS 00:00:00 00:00:00
  7. EXTRACT RUNNING PLIS 00:00:00 00:00:09
extract程式的lag越來越大,而且trail檔案也不再增長。這情況真是讓人一頭霧水。我們先來看看當前系統的執行環境:

  1. GoldenGate:11.2.1.0.27
  2. Database: 11.2.0.3 兩節點的RAC,使用ASM
  3. OS: AIX 6.1
然後搜尋Oracle Support,找到一篇文章1432994.1,和碰到的情形有點相像。在我們的環境中也是使用了DBLOGREADER,但是不同的是DBLOGREADERBUFSIZE已經配置成1M:

  1. ...
  2. TRANLOGOPTIONS BUFSIZE 1048576
  3. TRANLOGOPTIONS DBLOGREADER, DBLOGREADERBUFSIZE 1048576
  4. --TRANLOGOPTIONS _NOREADAHEAD ANY
  5. EXTTRAIL /ogg/ggate/data/dirext/lis/ea
  6. ...
這也是AIX下能配置的最大值。難道該值還是太大的緣故?所以就嘗試將該值改成了512K:

  1. ...
  2. TRANLOGOPTIONS BUFSIZE 1048576
  3. TRANLOGOPTIONS DBLOGREADER, DBLOGREADERBUFSIZE 524288
  4. --TRANLOGOPTIONS _NOREADAHEAD ANY
  5. EXTTRAIL /ogg/ggate/data/dirext/lis/ea
  6. ...
然後重啟extract程式,結果就正常了。估計使用DBLOGREADER還是有不少的bug。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/13885898/viewspace-1670715/,如需轉載,請註明出處,否則將追究法律責任。

相關文章