OGG EXTRACT / REPLICAT CHECKPOINT RBA IS LARGER THAN LOCAL TRAIL SIZE

Posted by xiexingzhi on 2012-09-03

Applies to:

Oracle GoldenGate - Version: 10.0.0.0 and later [Release: 10.0.0 and later ]

Information in this document applies to any platform.

Symptoms

After a system outage, a pump Extract or Replicat may get stuck on a trail file even though more trail files are available on the machine for the reader to process. This occurs when the pump Extract or Replicat checkpoint RBA is larger than the local trail file size.
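The symptom can be checked with a short script. The following is a minimal sketch, not an OGG utility: the function names are hypothetical, and it assumes the checkpoint RBA has been read by hand from the GGSCI `info` output and that the trail file is readable on the local file system.

```python
import os


def extra_bytes(trail_size: int, checkpoint_rba: int) -> int:
    """Bytes by which the checkpoint overshoots the on-disk trail (0 if none)."""
    return max(0, checkpoint_rba - trail_size)


def checkpoint_beyond_eof(trail_path: str, checkpoint_rba: int) -> bool:
    """True when the reader's checkpoint RBA lies past the end of the trail file."""
    # os.path.getsize reports the size of the file as it currently exists on disk.
    return extra_bytes(os.path.getsize(trail_path), checkpoint_rba) > 0
```

A result of `True` for the trail file named in the reader's checkpoint suggests the condition described in this note.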

Cause

In general, data pump Extracts and Replicats read the current trail file data from the disk cache instead of from the physical file. There is therefore a chance that the data pump or Replicat will checkpoint an RBA whose data is still only in the cache. If there is a disk or system outage, the data in the cache may be lost because the system does not get a chance to flush it to disk.

On restart of the writer process (the archive/redo log reading Extract or the data pump), the writer finds that the trail file is smaller than the reader's (data pump or Replicat) checkpoint RBA. The data pump or Replicat hangs waiting for more data to arrive. The writer then goes through the Full Audit Recovery (FAR) process: it recovers the current trail file, closes the existing trail file, and creates a new trail file with the next sequence number. The writer then continues writing to the new trail file, and it puts into that file the "old data" that the reader had already processed or committed. This old data was in the disk cache and had not been written to the disk file at the time of the outage and restart. However, part of it had already been processed by the reader (data pump or Replicat) before it died.

Solution

In the scenario described above, records already processed by the reader (Datapump or Replicat) may be written to the new trail file. Manual intervention is required to avoid duplicate processing.

The manual recovery process requires finding the records in seqno X+1 (assuming the current seqno is X) that would be duplicated from the reader's point of view. Their total length is ((reader's too-big checkpoint) - (actual size of seqno X)). The reader must be altered to a point in the trail file following the one with the too-large checkpoint reference, namely the RBA of the record just after the RestartAbend record + (total length of the duplicated records).

The calculated RBA to which the reader is altered must be the address of a record that starts a transaction. The start of a transaction is indicated by a TransInd value of x00 (first record in the transaction) or x03 (only record in the transaction).
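The TransInd rule can be expressed as a small check. This is only a sketch: `TrailRecord` and `is_valid_restart_point` are hypothetical names, and the record list is assumed to have been transcribed by hand from Logdump output (RBA and TransInd of each record).

```python
from typing import NamedTuple, Sequence

# TransInd values that mark the start of a transaction, as stated above:
# x00 = first record in the transaction, x03 = only record in the transaction.
TRANSACTION_START = {0x00, 0x03}


class TrailRecord(NamedTuple):
    rba: int        # record address as shown by Logdump
    trans_ind: int  # TransInd value from the record header


def is_valid_restart_point(records: Sequence[TrailRecord], target_rba: int) -> bool:
    """True only if target_rba is the start of a record that begins a transaction."""
    for rec in records:
        if rec.rba == target_rba:
            return rec.trans_ind in TRANSACTION_START
    # The target does not coincide with any known record boundary.
    return False
```

If the check returns `False`, the calculated RBA should not be used as the new read position.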

A live example that illustrates this scenario:

GGSCI (ORACLEREP) 6> info repdb

REPLICAT REPDB Last Started 2010-06-17 18:25 Status RUNNING

Checkpoint Lag 00:00:00 (updated 00:00:04 ago)

Log Read Checkpoint File ./dirdat/rp000012

First Record RBA 53966725

sh> ls -tlr dirdat

total 3822488

-rw-rw-rw- 1 ggs dba 299999891 Jun 16 22:09 rp000011

-rw-rw-rw- 1 ggs dba 53966568 Jun 17 18:15 rp000012 << where the checkpoint is pointing

-rw-rw-rw- 1 ggs dba 256319510 Jun 17 18:51 rp000013 << the next available trail file

------------------------------------------------------------------------------------------------------------------------------------------------

The actual trail size is 53966568, whereas the Replicat RBA is at 53966725.

Current LogTrail is /home/pjacob/rp000013

Logdump 101 >n

2010/06/17 11:15:13.810.439 FileHeader Len 753 RBA 0

Name: *FileHeader*

3000 0199 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0...0...GG..TL..1...

0002 3200 0004 ffff ffff 3300 0008 02f1 af71 469a | ..2.......3......qF.

5c07 3400 001d 001b 7572 693a 4f52 4143 4c45 2d30 | \.4.....uri:ORACLE-0

312d 5241 433a 3a68 6f6d 653a 6767 7336 0000 1300 | 1-RAC::home:ggs6....

112e 2f64 6972 6461 742f 7270 3030 3030 3133 3700 | ../dirdat/rp0000137.

0001 0138 0000 0400 0000 0d39 0000 0800 0000 0011 | ...8.......9........

e19d 9c3a 0000 8109 3438 3535 3038 3736 3600 0000 | ...:....485508766...

Logdump 102 >n

___________________________________________________________________

Hdr-Ind : E (x45) Partition : . (x00)

UndoFlag : . (x00) BeforeAfter: A (x41)

RecLength : 0 (x0000) IO Time : 2010/06/17 11:15:13.761.084

IOType : 150 (x96) OrigNode : 0 (x00)

TransInd : . (x03) FormatType : R (x52)

SyskeyLen : 0 (x00) Incomplete : . (x00)

AuditRBA : 0 AuditPos : 0

Continued : N (x00) RecCount : 0 (x00)

2010/06/17 11:15:13.761.084 RestartAbend Len 0 RBA 761

Name:

After Image: Partition 0 G s

Logdump 103 >n

___________________________________________________________________

Hdr-Ind : E (x45) Partition : . (x04)

UndoFlag : . (x00) BeforeAfter: A (x41)

RecLength : 28 (x001c) IO Time : 2010/06/17 03:00:26.000.150

IOType : 15 (x0f) OrigNode : 255 (xff)

TransInd : . (x03) FormatType : R (x52)

SyskeyLen : 0 (x00) Incomplete : . (x00)

AuditRBA : 685 AuditPos : 181997072

Continued : N (x00) RecCount : 1 (x01)

2010/06/17 03:00:26.000.150 FieldComp Len 28 RBA 823

Name: SCHEMA.XXXX

After Image: Partition 4 G s

0000 000a 0000 0006 4152 532d 3034 0002 000a 0000 | ........ARS-04......

0000 0000 0000 4ab0 | ......J.

Column 0 (x0000), Len 10 (x000a)

Column 2 (x0002), Len 10 (x000a)

Logdump 112 >n

___________________________________________________________________

Hdr-Ind : E (x45) Partition : . (x04)

UndoFlag : . (x00) BeforeAfter: A (x41)

RecLength : 28 (x001c) IO Time : 2010/06/17 03:00:35.000.150

IOType : 15 (x0f) OrigNode : 255 (xff)

TransInd : . (x03) FormatType : R (x52)

SyskeyLen : 0 (x00) Incomplete : . (x00)

AuditRBA : 654 AuditPos : 225996876

Continued : N (x00) RecCount : 1 (x01)

2010/06/17 03:00:35.000.150 FieldComp Len 28 RBA 980

Name: SCHEMA.XXXYY

After Image: Partition 4 G s

0000 000a 0000 0006 4152 532d 3032 0002 000a 0000 | ........ARS-02......

0000 0000 0000 4ab9 | ......J.

-----------------------------------------------------------------------------------------------------------

The actual trail size is 53966568, while the Replicat checkpoint RBA is at 53966725.

(reader's too-big checkpoint) - (actual size of seqno X)

The extra byte count is 53966725 - 53966568 = 157.

The RBA of the record just after the RestartAbend record (823) + total length of the duplicated records (157) = 980.

The Replicat should be altered to trail file sequence number 13 and RBA 980 by using the following command:

GGSCI> alter replicat <replicat name>, extseqno 13, extrba 980
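The arithmetic of this example can be reproduced as follows. This is a minimal sketch under stated assumptions: `restart_rba` is a hypothetical helper, and the RBA list is assumed to have been copied by hand from the Logdump output for the records that follow the RestartAbend record. Note that the length being totaled is the on-disk span between record RBAs (980 - 823 = 157 here), not the RecLength field (28).

```python
from typing import Sequence


def restart_rba(checkpoint_rba: int, trail_size: int,
                rbas_after_restart: Sequence[int]) -> int:
    """Return the RBA in seqno X+1 to which the reader should be altered.

    rbas_after_restart: ascending RBAs of the records that follow the
    RestartAbend record in the new trail file, as listed by Logdump.
    """
    extra = checkpoint_rba - trail_size
    if extra <= 0:
        raise ValueError("checkpoint is not beyond the end of the trail file")
    if not rbas_after_restart:
        raise ValueError("no records after RestartAbend")
    target = rbas_after_restart[0] + extra
    # The target must land exactly on a record boundary; otherwise stop.
    if target not in rbas_after_restart:
        raise ValueError(f"RBA {target} is not the start of a record")
    return target


print(restart_rba(53966725, 53966568, [823, 980]))  # prints 980
```

The function raises an error rather than guessing when the result does not fall on a record boundary.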

Note: If the X+1 trail file does not contain any actual data, repeat the same procedure for the X+2 trail file, and so on.

Note: If you simply alter the reader to the next sequence number without following the above procedure, you can create data integrity issues.

Note: With an OGG build greater than or equal to 10.4.0.81, the data pump Extract or Replicat abends if the read checkpoint is beyond the current EOF. You can then use the above procedure to get the data pump or Replicat running.

The issue is tracked via bug 9669344, and development is working on a solution.

Note: If the calculated RBA on the new trail does not point to the start of a record, contact Support for further help.

Special Case Reported

*********************

A case has been reported in which a source server crash resulted in the target Replicat hitting the issue described above. In this case, the source pump Extract's write RBA is also larger than the remote trail file size, which results in there being no next trail file available.

The steps to get the pump started are:

1. Get the last record in the remote trail to which the pump Extract is writing.

2. Find the corresponding record in the local trail, and get the RBA of the next record.

3. Make a backup of the checkpoint files (./dirchk/pump-name.cpe) of all pumps.

4. Position the pump to the RBA obtained in step 2, and do an ETROLLOVER.

For example:

alter <pump ext name>, extseqno <the sequence number to which the pump is currently pointing>, extrba <RBA obtained from step 2>

alter <pump ext name>, etrollover

Keep in mind that the respective Replicat processes must be altered manually because of the pump ETROLLOVER.

5. Start the pump.

6. The steps for the Replicat are the same as described earlier in this note.
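Step 3 (backing up the pump checkpoint files) could be scripted along these lines. This is a sketch only: `backup_checkpoints` and the backup directory are hypothetical, and it assumes the checkpoint files carry the `.cpe` extension under `./dirchk` as mentioned in step 3.

```python
import glob
import os
import shutil
from typing import List


def backup_checkpoints(dirchk: str, backup_dir: str) -> List[str]:
    """Copy every *.cpe checkpoint file from dirchk into backup_dir.

    Returns the paths of the copies, in sorted order.
    """
    os.makedirs(backup_dir, exist_ok=True)
    copied = []
    for src in sorted(glob.glob(os.path.join(dirchk, "*.cpe"))):
        dst = os.path.join(backup_dir, os.path.basename(src))
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
        copied.append(dst)
    return copied
```

It would be run before the pump is altered, for example as `backup_checkpoints("./dirchk", "./dirchk_backup")`.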

References

BUG:9673276 - REPLICAT WAITING FOR MORE DATA

BUG:9857982 - DATAPUMP EXTRACT SHOULD ABEND IF THE READ CHECKPOINT IS BEYOND CURRENT EOF

Source: ITPUB Blog, http://blog.itpub.net/22531473/viewspace-742548/. Please credit the source when reposting; otherwise legal action may be pursued.
