ERROR OGG-01028
最近客戶1套OGG系統,抽取程式經常報OGG-01028錯誤,該錯誤一般是由於OGG Bug 10356426引起的,日誌如下:
ERROR OGG-01028 Oracle GoldenGate Capture for Oracle, extoa.prm: encountered commit SCN 2561.1344492794 (11000755737850) that is not greater than the highest SCN already processed 2561.1344493093 (11000755738149) Redo Thread 1 (1) xid 78.31.31562 (0x004e.001f.00007b4a), starting seq.rba 12835.396542992, scn 2561.1344492791 (11000755737847), commit seq.rba 12835.396544448 commit timestamp 2013-03-28 22:26:20.000000.
2013-03-28 22:26:22 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, extoa.prm: PROCESS ABENDING.
2013-03-28 22:27:22 WARNING OGG-01475 Oracle GoldenGate Manager for Oracle, mgr.prm: Cannot automatically restart EXTRACT EXTOA, which abended due to an out of order transaction. Issue ETROLLOVER to advance the output trail sequence past the current trail sequence and restart. Then, use ALTER EXTSEQNO on the subsequent pump EXTRACT, or REPLICAT, process group to start reading from the new trail file created by ALTER ETROLLOVER; the downstream process will not automatically switch to the new trail file.
2013-03-28 22:28:22 WARNING OGG-01475 Oracle GoldenGate Manager for Oracle, mgr.prm: Cannot automatically restart EXTRACT EXTOA, which abended due to an out of order transaction. Issue ETROLLOVER to advance the output trail sequence past the current trail sequence and restart. Then, use ALTER EXTSEQNO on the subsequent pump EXTRACT, or REPLICAT, process group to start reading from the new trail file created by ALTER ETROLLOVER; the downstream process will not automatically switch to the new trail file.
2013-03-28 22:29:13 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpeoa.prm: Command received from GGSCI: STATS daily totalsonly *.
參考 MOS ID 957112.1文章
2013-03-28 22:28:22 WARNING OGG-01475 Oracle GoldenGate Manager for Oracle, mgr.prm: Cannot automatically restart EXTRACT EXTOA, which abended due to an out of order transaction. Issue ETROLLOVER to advance the output trail sequence past the current trail sequence and restart. Then, use ALTER EXTSEQNO on the subsequent pump EXTRACT, or REPLICAT, process group to start reading from the new trail file created by ALTER ETROLLOVER; the downstream process will not automatically switch to the new trail file.
2013-03-28 22:29:13 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpeoa.prm: Command received from GGSCI: STATS daily totalsonly *.
參考 MOS ID 957112.1文章
處理步驟如下
1.檢查2個節點的SCN 情況
select thread#,LAST_REDO_CHANGE# from v$thread;
1.檢查2個節點的SCN 情況
select thread#,LAST_REDO_CHANGE# from v$thread;
2.檢查2個節點的OS TIME 是否一致
SQL> alter session set nls_date_format='YYYY-MM-DD hh24:mi:ss';
SQL> alter session set nls_date_format='YYYY-MM-DD hh24:mi:ss';
Session altered.
SQL> select sysdate,dbtimezone from dual;
node1$ date
node2$ date
node1$ date
node2$ date
3.檢查2個節點的redo genration 是否有很大差距
select inst_id,round(sum(blocks*block_size/(1024*1024*1024)))
from gv$archived_log where completion_time> sysdate-7
group by inst_id;
4.透過2個節點的AWR報告 或SQL檢視7天內的log file parallel write及log file sync延遲時間
--在node1 執行
select 1 inst_id,avg(avg_ms)
from (
select
round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),1) avg_ms
from (
select
s.dbid,
to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI') btime,
total_waits count_end,
time_waited_micro/1000 time_ms_end,
Lag (e.time_waited_micro/1000)
OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg,
Lag (e.total_waits)
OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg
from
DBA_HIST_SYSTEM_EVENT e,
DBA_HIST_SNAPSHOT s
where
s.BEGIN_INTERVAL_TIME and s.snap_id=e.snap_id
and e.event_name in (
'log file sync',
'log file parallel write'
)
and s.dbid=e.dbid
)
order by btime
)
/
--在node2 再執行一次
select 2 inst_id,avg(avg_ms)
();
--在node1 執行
select 1 inst_id,avg(avg_ms)
from (
select
round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),1) avg_ms
from (
select
s.dbid,
to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI') btime,
total_waits count_end,
time_waited_micro/1000 time_ms_end,
Lag (e.time_waited_micro/1000)
OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg,
Lag (e.total_waits)
OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg
from
DBA_HIST_SYSTEM_EVENT e,
DBA_HIST_SNAPSHOT s
where
s.BEGIN_INTERVAL_TIME
and e.event_name in (
'log file sync',
'log file parallel write'
)
and s.dbid=e.dbid
)
order by btime
)
/
--在node2 再執行一次
select 2 inst_id,avg(avg_ms)
();
5.修改extract 引數檔案
THREADOPTIONS MAXCOMMITPROPAGATIONDELAY 90000 IOLATENCY 180000
--MAXCOMMITPROPAGATIONDELAY範圍為0-90000 預設為3000ms (3s)
--IOLATENCY最大為180000 預設為1500ms (1.5s)
--MAXCOMMITPROPAGATIONDELAY範圍為0-90000 預設為3000ms (3s)
--IOLATENCY最大為180000 預設為1500ms (1.5s)
透過以上5個步驟,確認排除故障隱患。
6.具體操作Solution Details:
6.1. Do an ETROLLOVER on Extract, and take note of the new sequence number of the trail file.
操作之前在源端檢視最新的trail file號
ls -lt /goldengate/dirdat/oaxxxx,那麼這裡需要記錄的應該是 xxxx+1,即ALTER EXTRACT extoa, ETROLLOVER後生成的。
6.1. Do an ETROLLOVER on Extract, and take note of the new sequence number of the trail file.
操作之前在源端檢視最新的trail file號
ls -lt /goldengate/dirdat/oaxxxx,那麼這裡需要記錄的應該是 xxxx+1,即ALTER EXTRACT extoa, ETROLLOVER後生成的。
ALTER EXTRACT extoa, ETROLLOVER
info extoa detail
info dpeoa detail
info extoa detail
info dpeoa detail
6.2. Start extract
START EXTRACT extoa
6.3. Send PUMP, LOGEND, to see if it's at the end of the previous trail.
SEND EXTRACT DPEOA, LOGEND
6.4. Once it is at the end of the trail file, You must stop the pump, and do an ETROLLOVER for it too. Take note of the new trail file sequence number that is created from this step.
操作之前在再次在源端檢視最新的trail file號/goldengate/dirdat/oaxxxx,並在操作以下步驟後記錄最新的sequence number
STOP EXTRACT DPEOA
ALTER EXTRACT DPEOA, ETROLLOVER
操作之前在再次在源端檢視最新的trail file號/goldengate/dirdat/oaxxxx,並在操作以下步驟後記錄最新的sequence number
STOP EXTRACT DPEOA
ALTER EXTRACT DPEOA, ETROLLOVER
6.5. Alter the pump to SEQNO to the new trail file created from step #1.
這裡的#####取6.1 記錄的xxxx+1
這裡的#####取6.1 記錄的xxxx+1
ALTER EXTRACT DPEOA, EXTSEQNO ##### EXTRBA 0
6.6. Restart pump
START EXTRACT DPEOA
6.7. Send Replicat, LOGEND to make sure it has processed all the remaining data, and stop Replicat.
SEND REPLICAT REPOA, LOGEND ----提示yes時,執行下一步
STOP REPLICAT REPOA
SEND REPLICAT REPOA, LOGEND ----提示yes時,執行下一步
STOP REPLICAT REPOA
6.8. If replicat is not at end of trail, generate a report and forcestop replicat
SEND REPLICAT REPOA, REPORT
STOP REPLICAT REPOA!
SEND REPLICAT REPOA, REPORT
STOP REPLICAT REPOA!
6.9. Add the following parameters to replicat parameter file to allow replicat to process each trail record as a single transaction, or set them to 1 if you have any of these parameters:
GROUPTRANSOPS 1
MAXTRANSOPS 1
GROUPTRANSOPS 1
MAXTRANSOPS 1
6.10. Restart replicat
START REPLICAT REPOA
6.11. Once replicat has completely processed the trail, stop the replicat
STOP REPLICAT REPOA
STOP REPLICAT REPOA
6.12. Edit the replicat parameter file:
- Add parameter HANDLECOLLISIONS to Replicat parameter file
- Remove or comment out GROUPTRANSOPS and MAXTRANSOPS or revert them back to their original values.
- Add parameter HANDLECOLLISIONS to Replicat parameter file
- Remove or comment out GROUPTRANSOPS and MAXTRANSOPS or revert them back to their original values.
Note: There are pre-conditions for using HANDLECOLLISIONS . Ther emust be either
a. no pkupdates
or
b. extract has "FETCHOPTIONS FETCHPKUPDATECOLS"
b. extract has "FETCHOPTIONS FETCHPKUPDATECOLS"
Also all the tables should have Primary key or unique index on the table to avoid data integrity issues when using handlecollisions.
要確保所有的同步的表均有主鍵或者唯一索引才能避免資料完整性
要確保所有的同步的表均有主鍵或者唯一索引才能避免資料完整性
6.13. ALTER REPLICAT, SEQNO to the new trail file created in step #4.
ALTER REPLICAT REPOA, EXTSEQNO ###### EXTRBA 0
6.14. Start Replicat
START REPLICAT REPOA
6.15. Once Replicat has processed the out of order SCN operations, disable HANDLECOLLISIONS. You could also look for the CSN and wait for Replicat to checkpoint past it.
一但replicat處理完無序的SCN操作後就可以取消掉這個引數了。
SEND REPLICAT REPOA, NOHANDLECOLLISIONS.
一但replicat處理完無序的SCN操作後就可以取消掉這個引數了。
SEND REPLICAT REPOA, NOHANDLECOLLISIONS.
6.16.Edit the replicat parameter and comment out the HANDLECOLLISIONS parameter. You do not need to stop/restart replicat. This will ensure that on any subsequent replicat restarts the parameter is disabled.
Note:
If the out of order transactions happen to the SAME record on the SAME table, then the Replicat will probably abend in step 7 or step 10. If that is the case, look at the discard file find out that particular record (the primary key value). Then add handlecollisions, continue with the steps below. Later, once the Replicat caught up, that particular record needs to be manually synced.
However the chance of an out of order transactions happening to the SAME record on SAME table is rare. Given an example, this will mean that the user application insert record A from node 1, then immediately it updated this record A on node 2. Out of order transactions usually happens to different tables or different records on same table.
In other words, if you do see collisions been handled in this recovery procedure, that means you have to re-sync ONLY those particular records that have collisions.
If the out of order transactions happen to the SAME record on the SAME table, then the Replicat will probably abend in step 7 or step 10. If that is the case, look at the discard file find out that particular record (the primary key value). Then add handlecollisions, continue with the steps below. Later, once the Replicat caught up, that particular record needs to be manually synced.
However the chance of an out of order transactions happening to the SAME record on SAME table is rare. Given an example, this will mean that the user application insert record A from node 1, then immediately it updated this record A on node 2. Out of order transactions usually happens to different tables or different records on same table.
In other words, if you do see collisions been handled in this recovery procedure, that means you have to re-sync ONLY those particular records that have collisions.
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/27571661/viewspace-761121/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- OGG-01028 Opening file errorError
- OGG-01028 Recovery record is missing ERRORError
- OGG-01028 Recovery record is missing ERROR[續]Error
- GoldenGate extract process abended with error OGG-01028的處理GoError
- OGG-01028錯誤處理
- OGG-01028 Record position is beyond end of recovery
- OGG-01028問題處理記錄
- OGG-01028 Recovery record is missing from log 的BUG處理
- mount error(5): Input/output errorError
- 【ERROR】OPatch failed with error code 73ErrorAI
- error:slave communication error with ASMErrorASM
- ERROR: slave communication error with ASMErrorASM
- MMON encountered error 959, clearing the errorError
- 如何解決"Parse error: syntax error"Error
- GoldenGate不支援壓縮表問題OGG-01028 Table compression is not supportedGo
- DB error due to HP-UX Error:23ErrorUX
- CRS ERROR - PRKC-1073 - ErrorError
- Error page: / Error infos: DedeCms錯誤警告Error
- Error for iOSErroriOS
- error事件Error事件
- Error: no such columnError
- DepthClipEnable errorError
- Error StackError
- innobackupex: Error: noError
- ERROR 1045 (28000): ProxySQL Error: 報錯ErrorSQL
- Original error: Error: socket hang upError
- ERROR 2026 (HY000): SSL connection error: unknown error numberError
- OGG-01028 和ORA-00308: 開啟歸檔檔案失敗
- Golang 學習——error 和建立 error 原始碼解析GolangError原始碼
- Setup had an error Error: At least one of these paths should existErrorAST
- MySQL ERROR Got an error reading communication packetsMySqlErrorGo
- [ERROR] InnoDB: Unable to lock ./ibdata1, error: 11Error
- Install Error : ERROR:SQLException null Resultset [ID 875002.1]ErrorSQLExceptionNull
- FATAL ERROR IN TWO-TASK SERVER: error = 12569ErrorServer
- Last_IO_Error: Got fatal error 1236ASTErrorGo
- ORACLE STREAM ERROROracleError
- Exception和ErrorExceptionError
- jQuery.error()jQueryError