OGG-01031錯誤處理一例 (目的端檔案被其他程式鎖住導致)

perfychi發表於2014-06-05
From:http://blog.itpub.net/25618347/viewspace-730797/


剛剛觀察測試庫gg執行狀況發現,pump程式異常abend,err log報錯日誌如下:
 
2012-05-24 10:45:54  ERROR   OGG-01031  Oracle GoldenGate Capture for Oracle, dpump.prm:  There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "/home/oracle/ggs/dirdat/k1005466" (error 11, Resource temporarily unavailable)).
2012-05-24 10:45:54  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, dpump.prm:  PROCESS ABENDING.

 
該錯誤官方說明如下:
 
OGG-01031: There is a problem in network communication, a remote file problem,
encryption keys for target and source do not match (if using ENCRYPT) or an
unknown error. (Reply received is {0})
Cause: Extract was not able to send data to the target.
Action: Examine the network for an outage between the source and target system.
Make certain that MGRPORT in RMTHOST matches the one in the parameter file
of the remote Manager. Check for errors on the target system that indicate the
Manager or Collector process is not running, or that Collector does not have
privileges to write to the remote trail. Check for encryption errors if the ENCRYPT
option is specified with the RMTHOST or RMTHOSTOPTIONS parameter: This
might indicate that the encryption key that was sent from the source does not
match the one in the ENCKEYS file on the target, or that the key or ENCKEYS file
on the target is missing.
 
貌似是網路原因造成的,排查源庫和目標庫的相關引數發現,都是一致的,也沒什麼問題,再到目標庫檢視相關程式都是running狀態,但是出現了一個比較明顯的告警資訊,如下:
 
2012-05-24 10:57:50  WARNING OGG-01223  Oracle GoldenGate Collector for Oracle:  Unable to lock file "/home/oracle/ggs/dirdat/k1005466" (error 11, Resource temporarily unavailable).  Lock currently held by process id (PID) 1731.
2012-05-24 10:57:50  WARNING OGG-01223  Oracle GoldenGate Collector for Oracle:  Unable to open file "/home/oracle/ggs/dirdat/k1005466" (error 11, Resource temporarily unavailable).
2012-05-24 10:57:50  INFO    OGG-01676  Oracle GoldenGate Collector for Oracle:  Terminating after client disconnect.
2012-05-24 10:59:22  INFO    OGG-00963  Oracle GoldenGate Manager for Oracle, mgr.prm:  Command received from EXTRACT on host (START SERVER CPU -1 PRI -1  TIMEOUT 300 PARAMS ).
2012-05-24 10:59:22  INFO    OGG-01677  Oracle GoldenGate Collector for Oracle:  Waiting for connection (started dynamically).
2012-05-24 10:59:22  INFO    OGG-00963  Oracle GoldenGate Manager for Oracle, mgr.prm:  Command received from SERVER on host localhost.localdomain (REPORT 10710 7810).
2012-05-24 10:59:22  INFO    OGG-00974  Oracle GoldenGate Manager for Oracle, mgr.prm:  Manager started collector process (Port 7810).
2012-05-24 10:59:22  INFO    OGG-01228  Oracle GoldenGate Collector for Oracle:  Timeout in 300 seconds.

 
仔細觀察發現,源和目標庫的相關操作都是對目標庫的trail檔案k1005466進行操作的,推測大致的過程是這樣的:
啟動pump程式後,該程式要往目標庫的第5466號trail檔案寫資料,而此時目標庫的該檔案被程式編號為1731的程式所鎖定,pump無法獲得該檔案上的鎖,造成等待,延時,最終超時而abend掉(這種情況表面看來很像是由於網路原因造成的,其實不然),之後在目標庫檢視1731號程式的具體情況如下:
 
[oracle@trptdb ggs]$ ps -ef | grep 1731
oracle    1731     1  0 May23 ?        00:00:13 ./server -w 300 -p 7819-12818 -m 7809 -k -l /home/oracle/ggs/ggserr.log
oracle   10699 10451  0 10:58 pts/1    00:00:00 grep 1731

 
發現是ogg的server程式鎖住了第5466號檔案,推測應該是讀取5466號檔案寫入ggserr log所造成吧,具體有待查證,之後乾脆直接kill掉該程式,然後再次啟動源庫的pump程式,2端開始正常應用:
 
源庫ggserr.log:
 
2012-05-24 10:52:26  INFO    OGG-00987  Oracle GoldenGate Command Interpreter for Oracle:  GGSCI command (oracle): start dpump.
2012-05-24 10:52:26  INFO    OGG-00963  Oracle GoldenGate Manager for Oracle, mgr.prm:  Command received from GGSCI on host 127.0.0.1 (START EXTRACT DPUMP ).
2012-05-24 10:52:26  INFO    OGG-00975  Oracle GoldenGate Manager for Oracle, mgr.prm:  EXTRACT DPUMP starting.
2012-05-24 10:52:26  INFO    OGG-00992  Oracle GoldenGate Capture for Oracle, dpump.prm:  EXTRACT DPUMP starting.
2012-05-24 10:52:26  INFO    OGG-00993  Oracle GoldenGate Capture for Oracle, dpump.prm:  EXTRACT DPUMP started.
2012-05-24 10:52:31  INFO    OGG-01226  Oracle GoldenGate Capture for Oracle, dpump.prm:  Socket buffer size set to 27985 (flush size 27985).
2012-05-24 10:52:32  INFO    OGG-01056  Oracle GoldenGate Capture for Oracle, dpump.prm:  Recovery initialization completed for target file /home/oracle/ggs/dirdat/k1005466, at RBA 5777739, CSN 76400322526.
2012-05-24 10:52:32  INFO    OGG-01478  Oracle GoldenGate Capture for Oracle, dpump.prm:  Output file /home/oracle/ggs/dirdat/k1 is using format RELEASE 10.4/11.1.
2012-05-24 10:52:32  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, dpump.prm:  Rolling over remote file /home/oracle/ggs/dirdat/k1005467.
2012-05-24 10:52:32  INFO    OGG-01054  Oracle GoldenGate Capture for Oracle, dpump.prm:  Recovery completed for target file /home/oracle/ggs/dirdat/k1005467, at RBA 1393, CSN 76400322526.
2012-05-24 10:52:32  INFO    OGG-01057  Oracle GoldenGate Capture for Oracle, dpump.prm:  Recovery completed for all targets.
2012-05-24 10:52:33  WARNING OGG-00869  Oracle GoldenGate Capture for Oracle, dpump.prm:  No unique key is defined for table M_GOODS_STOCK2. All viable columns will be used to represent the key, but may not guarantee uniqueness.  KEYCOLS may be used to define the key.
2012-05-24 10:52:33  WARNING OGG-00869  Oracle GoldenGate Capture for Oracle, dpump.prm:  No unique key is defined for table M_GOODS_STOCK. All viable columns will be used to represent the key, but may not guarantee uniqueness.  KEYCOLS may be used to define the key.
2012-05-24 10:52:34  WARNING OGG-00869  Oracle GoldenGate Capture for Oracle, dpump.prm:  No unique key is defined for table M_PAGE_CLICK. All viable columns will be used to represent the key, but may not guarantee uniqueness.  KEYCOLS may be used to define the key.

 
目標庫ggserr.log:
 
2012-05-24 10:59:27  INFO    OGG-01229  Oracle GoldenGate Collector for Oracle:  Connected to :22983.
2012-05-24 10:59:27  INFO    OGG-01669  Oracle GoldenGate Collector for Oracle:  Opening /home/oracle/ggs/dirdat/k1005466 (byte -1, current EOF 5777739).
2012-05-24 10:59:27  INFO    OGG-01670  Oracle GoldenGate Collector for Oracle:  Closing /home/oracle/ggs/dirdat/k1005466.
2012-05-24 10:59:27  INFO    OGG-01669  Oracle GoldenGate Collector for Oracle:  Opening /home/oracle/ggs/dirdat/k1005466 (byte 5777739, current EOF 5777739).
2012-05-24 10:59:27  INFO    OGG-01735  Oracle GoldenGate Collector for Oracle:  Synchronizing /home/oracle/ggs/dirdat/k1005466 to disk.
2012-05-24 10:59:27  INFO    OGG-01735  Oracle GoldenGate Collector for Oracle:  Synchronizing /home/oracle/ggs/dirdat/k1005466 to disk.
2012-05-24 10:59:27  INFO    OGG-01670  Oracle GoldenGate Collector for Oracle:  Closing /home/oracle/ggs/dirdat/k1005466.
2012-05-24 10:59:27  INFO    OGG-01669  Oracle GoldenGate Collector for Oracle:  Opening /home/oracle/ggs/dirdat/k1005467 (byte -1, current EOF 0).
2012-05-24 10:59:27  INFO    OGG-01020  Oracle GoldenGate Delivery for Oracle, rep1.prm:  Processed extract process RESTART_ABEND record at seq 5467, rba 1333 (aborted 0 records).
2012-05-24 10:59:30  INFO    OGG-01735  Oracle GoldenGate Collector for Oracle:  Synchronizing /home/oracle/ggs/dirdat/k1005467 to disk.
2012-05-24 10:59:30  INFO    OGG-01735  Oracle GoldenGate Collector for Oracle:  Synchronizing /home/oracle/ggs/dirdat/k1005467 to disk.
2012-05-24 10:59:30  INFO    OGG-01670  Oracle GoldenGate Collector for Oracle:  Closing /home/oracle/ggs/dirdat/k1005467.
2012-05-24 10:59:30  INFO    OGG-01669  Oracle GoldenGate Collector for Oracle:  Opening /home/oracle/ggs/dirdat/k1005468 (byte -1, current EOF 0).
2012-05-24 10:59:31  INFO    OGG-01735  Oracle GoldenGate Collector for Oracle:  Synchronizing /home/oracle/ggs/dirdat/k1005468 to disk.
2012-05-24 10:59:31  INFO    OGG-01735  Oracle GoldenGate Collector for Oracle:  Synchronizing /home/oracle/ggs/dirdat/k1005468 to disk.
2012-05-24 10:59:31  INFO    OGG-01670  Oracle GoldenGate Collector for Oracle:  Closing /home/oracle/ggs/dirdat/k1005468.
2012-05-24 10:59:31  INFO    OGG-01669  Oracle GoldenGate Collector for Oracle:  Opening /home/oracle/ggs/dirdat/k1005469 (byte -1, current EOF 0).
2012-05-24 10:59:33  INFO    OGG-01735  Oracle GoldenGate Collector for Oracle:  Synchronizing /home/oracle/ggs/dirdat/k1005469 to disk.
2012-05-24 10:59:33  INFO    OGG-01735  Oracle GoldenGate Collector for Oracle:  Synchronizing /home/oracle/ggs/dirdat/k1005469 to disk.
2012-05-24 10:59:33  INFO    OGG-01670  Oracle GoldenGate Collector for Oracle:  Closing /home/oracle/ggs/dirdat/k1005469.
2012-05-24 10:59:33  INFO    OGG-01669  Oracle GoldenGate Collector for Oracle:  Opening /home/oracle/ggs/dirdat/k1005470 (byte -1, current EOF 0).
2012-05-24 10:59:41  WARNING OGG-00869  Oracle GoldenGate Delivery for Oracle, rep1.prm:  No unique key is defined for table 'M_GOODS_STOCK2'. All viable columns will be used to represent the key, but may not guarantee uniqueness.  KEYCOLS may be used to define the key.

 
由此看來遇到問題的時候,errlog裡面會提供大量有用的排錯資訊,這一點也需要細心和耐心,同時可以查閱官方對錯誤和警告的詳細說明,對解決問題很有幫助。
 
記錄一下~~
 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/27042095/viewspace-1176477/,如需轉載,請註明出處,否則將追究法律責任。

相關文章