OGG EXTRACT / REPLICAT CHECKPOINT RBA IS LARGER THAN LOCAL TRAIL SIZE
Applies to:
Oracle GoldenGate - Version: 10.0.0.0 and later [Release: 10.0.0 and later ]
Information in this document applies to any platform.
Symptoms
After a system outage, a pump Extract or Replicat may get stuck on a trail file even though more trail files are available on the machine for the reader to process. This issue occurs when the pump Extract or Replicat checkpoint RBA is larger than the local trail file size.
Cause
In general, Datapump Extracts and Replicats read the current trail file data from the disk cache instead of from the physical file. There is therefore a chance that the Datapump or Replicat will checkpoint an RBA whose data is still only in cache. If there is a disk or system outage, the data in the cache may be lost, because the system does not get a chance to flush it to disk.
On restart of the writer process (archive/redo log reading Extract or data pump), the writer process finds that the trail file size is smaller than the reader's (Datapump or Replicat) checkpoint RBA. The Datapump or Replicat hangs waiting for more data to arrive. The writer process then goes through the Full Audit Recovery (FAR) process: it recovers the current trail file, closes the existing trail file, and creates a new trail file (with the next sequence number). The writer then continues writing to the new trail file. It puts the "old data", which has already been processed or committed by the reader, into the new sequence number trail file. This old data was originally in the disk cache and had not been written to the disk file when the system outage and restart occurred. However, part of the data was already processed by the reader (Datapump or Replicat) before it died.
Solution
In the scenario described above, records already processed by the reader (Datapump or Replicat) may be written to the new trail file. Manual intervention is required to avoid duplicate processing.
The manual recovery process requires finding the records that will be duplicated (from the reader's point of view) in seqno X+1 (assuming the current seqno is X), with a total length of ((reader's too-big checkpoint) - (actual size of seqno X)). The reader should be altered to a point in the trail file following the one with the too-large checkpoint reference. Specifically, the reader is altered to (the RBA of the record just after the RestartAbend record) + (total length of the duplicated records).
The calculated RBA to which the reader trail file is altered should be the address of a record that starts a transaction. The start of a transaction is indicated by a TransInd value of (x00- first record in transaction) or (x03- only record in the transaction).
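The arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not an Oracle-supplied tool: the function names restart_rba and is_transaction_start are hypothetical, and the inputs are values you read manually from GGSCI, the file system, and Logdump.

```python
# TransInd values that mark the start of a transaction, per this note:
# x00 = first record in a transaction, x03 = only record in a transaction.
TRANSACTION_START = (0x00, 0x03)


def restart_rba(checkpoint_rba: int, trail_size: int, first_data_rba: int) -> int:
    """Return the RBA in seqno X+1 to which the reader should be altered.

    checkpoint_rba : the reader's (too large) checkpoint RBA in seqno X
    trail_size     : actual size in bytes of trail file seqno X on disk
    first_data_rba : RBA of the record just after RestartAbend in seqno X+1
    """
    extra = checkpoint_rba - trail_size
    if extra <= 0:
        raise ValueError("checkpoint RBA is not beyond the end of the trail file")
    return first_data_rba + extra


def is_transaction_start(trans_ind: int) -> bool:
    """True if a record's TransInd marks the start of a transaction."""
    return trans_ind in TRANSACTION_START


# Figures taken from the live example in this note.
print(restart_rba(53966725, 53966568, 823))  # 980
print(is_transaction_start(0x03))            # True
```

The result still has to be confirmed in Logdump: the record at the computed RBA must exist and must carry one of the two TransInd values above.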
A live example which illustrates this scenario:
1. GGSCI (ORACLEREP) 6> info repdb
REPLICAT REPDB Last Started 2010-06-17 18:25 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:04 ago)
Log Read Checkpoint File ./dirdat/rp000012
First Record RBA 53966725
2. sh>ls -tlr dirdat
total 3822488
-rw-rw-rw- 1 ggs dba 299999891 Jun 16 22:09 rp000011
-rw-rw-rw- 1 ggs dba 53966568 Jun 17 18:15 rp000012 << where the checkpoint is pointing
-rw-rw-rw- 1 ggs dba 256319510 Jun 17 18:51 rp000013 << the next available trail file
------------------------------------------------------------------------------------------------------------------------------------------------
The actual trail size is 53966568, whereas the Replicat checkpoint RBA is at 53966725.
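The comparison made by hand above (file size from ls against the checkpoint RBA from GGSCI) could be scripted roughly as follows. This is a minimal sketch: the function name checkpoint_overshoot is hypothetical, and the checkpoint RBA is assumed to have been copied manually from the "info" output.

```python
import os


def checkpoint_overshoot(trail_path: str, checkpoint_rba: int) -> int:
    """Return how many bytes the checkpoint RBA lies beyond the end of the
    trail file on disk, or 0 if the checkpoint is within the file."""
    size = os.path.getsize(trail_path)  # same byte count that ls -l reports
    return max(0, checkpoint_rba - size)
```

For the example above, a call such as checkpoint_overshoot("./dirdat/rp000012", 53966725) would be expected to return 157 when the file is 53966568 bytes long. A non-zero result indicates the condition described in this note.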
Current LogTrail is /home/pjacob/rp000013
Logdump 101 >n
2010/06/17 11:15:13.810.439 FileHeader Len 753 RBA 0
Name: *FileHeader*
3000 0199 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0...0...GG..TL..1...
0002 3200 0004 ffff ffff 3300 0008 02f1 af71 469a | ..2.......3......qF.
5c07 3400 001d 001b 7572 693a 4f52 4143 4c45 2d30 | \.4.....uri:ORACLE-0
312d 5241 433a 3a68 6f6d 653a 6767 7336 0000 1300 | 1-RAC::home:ggs6....
112e 2f64 6972 6461 742f 7270 3030 3030 3133 3700 | ../dirdat/rp0000137.
0001 0138 0000 0400 0000 0d39 0000 0800 0000 0011 | ...8.......9........
e19d 9c3a 0000 8109 3438 3535 3038 3736 3600 0000 | ...:....485508766...
Logdump 102 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x00)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 0 (x0000) IO Time : 2010/06/17 11:15:13.761.084
IOType : 150 (x96) OrigNode : 0 (x00)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 0 AuditPos : 0
Continued : N (x00) RecCount : 0 (x00)
2010/06/17 11:15:13.761.084 RestartAbend Len 0 RBA 761
Name:
After Image: Partition 0 G s
Logdump 103 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 28 (x001c) IO Time : 2010/06/17 03:00:26.000.150
IOType : 15 (x0f) OrigNode : 255 (xff)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 685 AuditPos : 181997072
Continued : N (x00) RecCount : 1 (x01)
2010/06/17 03:00:26.000.150 FieldComp Len 28 RBA 823
Name: SCHEMA.XXXX
After Image: Partition 4 G s
0000 000a 0000 0006 4152 532d 3034 0002 000a 0000 | ........ARS-04......
0000 0000 0000 4ab0 | ......J.
Column 0 (x0000), Len 10 (x000a)
Column 2 (x0002), Len 10 (x000a)
Logdump 112 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 28 (x001c) IO Time : 2010/06/17 03:00:35.000.150
IOType : 15 (x0f) OrigNode : 255 (xff)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 654 AuditPos : 225996876
Continued : N (x00) RecCount : 1 (x01)
2010/06/17 03:00:35.000.150 FieldComp Len 28 RBA 980
Name: SCHEMA.XXXYY
After Image: Partition 4 G s
0000 000a 0000 0006 4152 532d 3032 0002 000a 0000 | ........ARS-02......
0000 0000 0000 4ab9 | ......J.
-----------------------------------------------------------------------------------------------------------
The actual trail size is 53966568 while the Replicat checkpoint RBA is at 53966725.
Extra byte count = (reader's too-big checkpoint) - (actual size of seqno X)
The extra byte count is 53966725 - 53966568 = 157
The calculated RBA is the RBA of the record just after the RestartAbend record (823) + the total length of the duplicated records (157) = 980
The Replicat should be altered to trail file sequence number 13 and RBA 980 by using the following command:
GGSCI> alter rep <rep name>, extseqno 13, extrba 980
Note: If the X+1 trail file does not contain any actual data, then you need to do the same for the X+2 trail file, and so on.
Note: If you just do an alter to the next sequence number (without following the above procedure), you can create data integrity issues.
Note: If using an OGG build greater than or equal to 10.4.0.81, the Datapump Extract / Replicat will abend if the read checkpoint is beyond the current EOF. You can then use the above procedure to get the Datapump or Replicat running.
The issue is tracked via bug- 9669344 and development is working on a solution.
NOTE -- If, after the calculation, you get an RBA in the new trail file that does not point to the start of a record, please contact Support for further help.
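The record-boundary check in the last note can be sketched as below. This is an illustration only: find_resume_rba is a hypothetical helper, and record_rbas is assumed to be a list of record start RBAs that you have collected by stepping through the new trail file in Logdump.

```python
from typing import List, Sequence, Tuple


def find_resume_rba(
    record_rbas: Sequence[int], first_data_rba: int, extra_bytes: int
) -> Tuple[int, List[int]]:
    """Walk the record start RBAs of seqno X+1 and return
    (resume_rba, rbas_of_duplicated_records).

    Raises ValueError if first_data_rba + extra_bytes does not fall exactly
    on the start of a record, which is the case this note refers to Support.
    """
    target = first_data_rba + extra_bytes
    duplicated: List[int] = []
    for rba in sorted(record_rbas):
        if rba < first_data_rba:
            continue  # file header and RestartAbend records
        if rba == target:
            return target, duplicated
        if rba > target:
            break  # the target lies inside the previous record
        duplicated.append(rba)
    raise ValueError(f"RBA {target} is not the start of a record")


# Record start RBAs seen in the Logdump output of the example: 0, 761, 823, 980.
print(find_resume_rba([0, 761, 823, 980], 823, 157))  # (980, [823])
```

If the function raises, or if the list runs out before the target is reached, do not alter the reader to a guessed position.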
Special Case Reported
*********************
A case has been reported in which the source server crashed, resulting in the target Replicat hitting the issue described above. In this case, the source pump Extract's write RBA is also larger than the remote trail file size. This results in there being no next trail file available.
The steps to get the pump started are:
1. Get the last record in the remote trail to which the pump Extract is writing.
2. Find the corresponding record in the local trail, and get the RBA of the next record.
3. Make a backup of the checkpoint files (./dirchk/pump-name.cpe) of all pumps.
4. Position the pump to the RBA obtained from step 2, and do an ETROLLOVER.
For example:
alter <pump ext name>, extseqno <the sequence number to which the pump is currently pointing>, extrba <obtained from step 2>
alter <pump ext name>, etrollover
Keep in mind that the respective Replicat processes must be altered manually because of the pump ETROLLOVER.
5. Start the pump.
6. The steps for the Replicat are the same as previously described in this note.
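Step 3 (backing up the checkpoint files) could be scripted along these lines. This is a sketch under assumptions: backup_checkpoints is a hypothetical helper, the backup directory is your own choice, and the pattern *.cpe copies every Extract checkpoint file in dirchk (pumps and primary Extracts alike), assuming the extension is lowercase as shown in this note.

```python
import shutil
from pathlib import Path
from typing import List


def backup_checkpoints(dirchk: str, backup_dir: str) -> List[str]:
    """Copy every *.cpe file from dirchk into backup_dir.

    Returns the sorted list of file names that were copied.
    """
    src = Path(dirchk)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for cpe in sorted(src.glob("*.cpe")):
        shutil.copy2(cpe, dst / cpe.name)  # copy2 also keeps the timestamps
        copied.append(cpe.name)
    return copied
```

A call such as backup_checkpoints("./dirchk", "./dirchk_backup") would be made before the alter commands in step 4, so that the original pump positions can be restored if needed.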
References
BUG:9673276 - REPLICAT WAITING FOR MORE DATA
BUG:9857982 - DATAPUMP EXTRACT SHOULD ABEND IF THE READ CHECKPOINT IS BEYOND CURRENT EOF
Source: "ITPUB Blog", link: http://blog.itpub.net/22531473/viewspace-742548/. If you reproduce this article, please credit the source; otherwise legal liability will be pursued.