在goldengate中,傳輸程式和挖掘程式在交換資料的時候傳輸程式是從cache區域中去抓取資料並傳送到目標端,而不是等到真的寫到磁碟後(可以提高效能)。當意外down機時系統來不及將cache中的內容寫到磁碟,出現了傳輸程式新建的檢查點是基於cache中的資訊更新的,而trail檔案的大小實際上是要比檢查點寫的RBA小的。
透過公式:程式新RBA=報錯資訊的過大RBA( 變數A)+新trail中的第一條記錄的RBA( 變數B)-程式失敗trail中包含下個trail第一條記錄事務的起始RBA( 變數C)
1.檢視報錯資訊中(或者INFO檢視程式當前RBA)為: 變數A;
EXTRACT P1_PD Last Started 2022-02-08 09:52 Status ABENDED
Checkpoint Lag 08:46:00 (updated 00:00:53 ago)
Log Read Checkpoint File ./dirdat/pd016425
2022-02-08 01:06:11.000000 RBA
45970013(變數A)
或者:
2022-02-08 09:52:11 ERROR OGG-01705 Oracle GoldenGate Capture for Oracle, p1_pd.prm: Input checkpoint position
45970013 for input trail file './dirdat/pd016425' is greater than the size of the file (45941215). Please consult Oracle Knowledge Management Doc ID 1138409.1. for instructions.
2.使用Logdump工具找到下一個trail中的第一條記錄: 變數B;
Logdump 1 >open ./dirdat/pd016426
Current LogTrail is /oggdata/ogg12c/dirdat/pd016426
Logdump 2 >ghdr on
Logdump 3 >n
2022/02/08 08:56:01.354.224 FileHeader Len 1707 RBA 0
Name: *FileHeader*
3000 02f9 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0...0...GG..TL..1...
0004 3200 0004 2000 0000 3300 0008 02f2 fdaa a308 | ..2... ...3.........
a1f0 3400 0024 0022 7572 693a 6463 7064 6462 3031 | ..4..$."uri:dcpddb01
3a3a 6f67 6764 6174 613a 6f67 6731 3263 3a45 315f | ::oggdata:ogg12c:E1_
5044 3600 0013 0011 2e2f 6469 7264 6174 2f70 6430 | PD6....../dirdat/pd0
3136 3432 3637 0000 0101 3800 0004 0000 402a 3900 | 164267....8.....@*9.
0008 0000 0000 0615 655e 3a00 0081 0130 0000 0000 | ........e^:....0....
Logdump 4 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x00)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 0 (x0000) IO Time : 2022/02/08 08:56:01.320.554
IOType : 150 (x96) OrigNode : 0 (x00)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 0 AuditPos : 0
Continued : N (x00) RecCount : 0 (x00)
2022/02/08 08:56:01.320.554 RestartAbend Len 0 RBA 1715
Name:
After Image: Partition 0 G s
Logdump 5 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 43 (x002b) IO Time : 2022/02/08 01:06:07.000.000
IOType : 15 (x0f) OrigNode : 255 (xff)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 44481 AuditPos : 37992464
Continued : N (x00) RecCount : 1 (x01)
2022/02/08 01:06:07.000.000 FieldComp Len 43 RBA
1775
Name:
TOPS5A.SYS_JOBRUNINFOS
After Image: Partition 4 G s
0000 000e 0000 000a 4a6f 6273 4368 616e 6765 0001 | ........JobsChange..
0015 0000 3230 3232 2d30 322d 3038 3a30 313a 3036 | ....2022-02-08:01:06
3a30 36 | :06
注:如果新trail沒有第一條記錄,則依次往後找,直到找到有記錄的trail為止。
3.找到新trail事務在舊trail中的起始位置:變數C:
1).透過步驟2,我們找到了新trail的第一條記錄RBA為1775.而這條記錄對應的AuditRBA=44481, AuditPos= 37992464 這表明此事務在Oracle資料庫中的44481號redo的 RBA: 37992464處。另外還需要記錄此記錄的表名。
2).在舊的trail中查詢以上事務的起始點:
Logdump 6 >open ./dirdat/pd016425
Current LogTrail is /oggdata/ogg12c/dirdat/pd016425
Logdump 7 >ghdr on
Logdump 8 >filter include AuditRBA 44481
Logdump 9 >filter include filename TOPS5A.SYS_JOBRUNINFOS
Logdump 10 >filter match all
Logdump 11 >n
--下面一段為官網摘抄,不是本次具體數值,本次C值為方法二獲得
找到記錄:
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 114 (x0072) IO Time : 2011/11/28 05:54:08.000.919
IOType : 15 (x0f) OrigNode : 255 (xff)
TransInd : . (x00) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 14735 AuditPos : 272667152 ------------ same as the one in step 2
Continued : N (x00) RecCount : 1 (x01)
2011/11/28 05:54:08.000.919 FieldComp Len 114 RBA 160284255 -->我們要找的RBA
Name: SCHEMA.XXXX
After Image: Partition 4 G b
0000 000c 0000 0008 3130 3532 3030 3839 0001 000c | ........10520089....
0000 0008 3130 3534 3639 3532 0002 0008 0000 0004 | ....10546952........
3633 3737 0003 0010 0000 000c 4461 7669 732c 204d | 6377........Jacob, M
696c 6573 0004 0015 0000 3230 3039 2d30 342d 3235 | iles......2009-05-25
3a30 323a 3338 3a35 3900 0500 1500 0032 3031 312d | :02:38:59......2011-
3131 2d32 383a 3035 3a35 343a 3037 | 11-28:05:54:07
此方法直接參照MOS
This option should only be used if option 1 and option 2 failed.
Please refer the bug- 13955891
A live example which explains this scenario:
1) GGSCI (OGGORACLE)1> info rep01prd, detail
REPLICAT REP01PRD Last Started 2012-04-11 12:05 Status ABENDED
Checkpoint Lag 00:00:00 (updated 08:24:18 ago)
Log Read Checkpoint File /opt/GoldenGateV11/dirdat/rp002289
2012-04-11 04:38:15.998847 RBA 499003921
2. sh>ls -tlr dirdat
-rw-rw-rw- 1 oggadm dba 499999882 Apr 11 01:52 /opt/GoldenGateV11/dirdat/rp002288
-rw-rw-rw- 1 oggadm dba 498003983 Apr 11 06:15 /opt/GoldenGateV11/dirdat/rp002289 << where the checkpoint is pointing
-rw-rw-rw- 1 oggadm dba 499999920 Apr 11 06:16 /opt/GoldenGateV11/dirdat/rp002290 << the next available trail file
Here the actual trail file size, End of File (EOF), is 498003983 where the replicat read checkpoint rba is at 499003921
The formula to find the correct RBA is
New datapump / Replicat RBA = Reader's too-big checkpoint RBA(A) - Repaired record RBA in existing trail (B)+ First record RBA in the new trail file (after the restart abend)(c)
Let us call
Reader's too-big checkpoint RBA -- A
Repaired record RBA in existing trail -- B
First record RBA in the new trail file (after the restart abend) -- C
We know the Reader's too-big checkpoint RBA (A) =
499003921( from the info output) -- step 1
We could get the Repaired record RBA in existing trail-- B -- step 2
----------------------------------------------------------
Logdump 979 >open rp002289
Current LogTrail is C:\GG\V11111\rp002289
Logdump 980 >ghdr on
Logdump 981 >count
LogTrail C:\GG\V11111\rp002289 has
2352689 records
Total Data Bytes 272734228
Avg Bytes/Record 115
Delete 14237
Insert 2192433
FieldComp 146014
GGSPKUpdate 1
LargeObject 3
Repaired Records 1
Others 1
Before Images 14237
After Images 2338451
Average of 265671 Transactions
Bytes/Trans ..... 1451
Records/Trans ... 8
Files/Trans ..... 0
Logdump 982 >skip
2352688 -- has 2352689 records -1
Logdump 983 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 251 (x00fb) IO Time : 2012/04/11 05:38:08.999.658
IOType : 5 (x05) OrigNode : 255 (xff)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 336307 AuditPos : 89724944
Continued : N (x00) RecCount : 1 (x01)
2012/04/11 05:38:08.999.658 Insert Len 251 RBA 498003230
Name: PRODDTA.F0911T
After Image: Partition 4 G s
0000 0004 0000 4841 0001 000a 0000 0006 3739 3436 | ......HA........7946
3536 0002 0007 0000 3030 3333 3000 0300 0a00 0000 | 56......00330.......
0000 0000 01b5 e600 0400 0400 0041 4100 0500 0500 | .............AA.....
0000 0138 0006 0004 0000 2020 0007 0003 0000 2000 | ...8...... ...... .
0800 0300 0020 0009 0003 0000 2000 0a00 0300 0020 | ..... ...... ......
000b 0003 0000 2000 0c00 0300 0020 000d 0003 0000 | ...... ...... ......
2000 0e00 0300 0020 000f 0003 0000 2000 1000 0300 | ...... ...... .....
Logdump 984 >n
Repaired Len 369 RBA
498003606
4800 0031 4504 0041 00fb 05ff 02f1 e397 7402 50ea | H..1E..A........t.P.
0000 0000 0559 2010 0005 21b3 0352 0000 0001 5052 | .....Y ...!..R....PR
4f44 4454 412e 4630 3931 3154 0044 0000 fb00 0000 | ODDTA.F0911T.D......
0400 0048 4100 0100 0a00 0000 0637 3934 3635 3600 | ...HA........794656.
0200 0700 0030 3033 3330 0003 000a 0000 0000 0000 | .....00330..........
0001 b5e6 0004 0004 0000 4141 0005 0005 0000 0001 | ..........AA........
3900 0600 0400 0020 2000 0700 0300 0020 0008 0003 | 9...... ...... ....
0000 2000 0900 0300 0020 000a 0003 0000 2000 0b00 | .. ...... ...... ...
0300 0020 000c 0003 0000 2000 0d00 0300 0020 000e | ... ...... ...... ..
0003 0000 2000 0f00 0300 0020 0010 0003 0000 2000 | .... ...... ...... .
1100 0300 0020 0012 000e 0000 2020 2020 2020 2020 | ..... ......
2020 2020 0013 0003 0000 2000 1400 0e00 0020 2020 | ...... ......
2020 2020 2020 2020 2000 1500 0300 0020 0016 000e | ...... ....
0000 2020 2020 2020 2020 2020 2020 0017 0003 0000 | .. ......
2000 1800 0e00 0020 2020 2020 2020 2020 2020 2000 | ...... .
1900 0500 0000 0130 5400 0039 5200 0014 4141 4142 | .......0T..9R...AAAB
3936 4143 5441 4147 3156 5541 417a 0001 4c00 000b | 96ACTAAG1VUAAz..L...
3731 3331 3831 3530 3739 3636 0000 0e31 312e 0000 | 713181507966...11...
0000 0000 0000 0000 00 | .........
Logdump 985 >
-----------------------------------------------
Here we could get the RBA for (B) as 498003606
-----------------------------------------------------------------------------------------------------
First record RBA in the new trail file (after the restart abend)(c) --- Step 3
-----------------------------------------------------------------------------------------------------
Logdump 985 >open rp002290
Current LogTrail is C:\GG\V11111\rp002290
Logdump 986 >n
2012/04/11 07:15:49.756.239 FileHeader Len 1307 RBA 0
Name: *FileHeader*
3000 01f3 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0...0...GG..TL..1...
0002 3200 0004 2000 0000 3300 0008 02f1 e398 d156 | ..2... ...3........V
5d4f 3400 001e 001c 7572 693a 7468 6562 653a 3a6f | ]O4.....uri:thebe::o
7074 3a47 6f6c 6465 6e47 6174 6556 3131 3500 0044 | pt:GoldenGateV115..D
3500 001e 001c 7572 693a 7468 6562 653a 3a6f 7074 | 5.....uri:thebe::opt
3a47 6f6c 6465 6e47 6174 6556 3131 3500 001e 001c | :GoldenGateV115.....
7572 693a 7468 6562 653a 3a6f 7074 3a47 6f6c 6465 | uri:thebe::opt:Golde
Logdump 987 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x00)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 0 (x0000) IO Time : 2012/04/11 07:15:49.762.966
IOType : 150 (x96) OrigNode : 0 (x00)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 0 AuditPos : 0
Continued : N (x00) RecCount : 0 (x00)
2012/04/11 07:15:49.762.966 RestartAbend Len 0 RBA 1315
Name:
After Image: Partition 0 G s
Logdump 988 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 251 (x00fb) IO Time : 2012/04/11 05:38:09.014.430
IOType : 5 (x05) OrigNode : 255 (xff)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 336307 AuditPos : 89726992
Continued : N (x00) RecCount : 1 (x01)
2012/04/11 05:38:09.014.430 Insert Len 251 RBA
1378
Name: PRODDTA.F0911T
After Image: Partition 4 G s
0000 0004 0000 4841 0001 000a 0000 0006 3739 3436 | ......HA........7946
3536 0002 0007 0000 3030 3333 3000 0300 0a00 0000 | 56......00330.......
0000 0000 01b5 e600 0400 0400 0041 4100 0500 0500 | .............AA.....
0000 0139 0006 0004 0000 2020 0007 0003 0000 2000 | ...9...... ...... .
0800 0300 0020 0009 0003 0000 2000 0a00 0300 0020 | ..... ...... ......
000b 0003 0000 2000 0c00 0300 0020 000d 0003 0000 | ...... ...... ......
2000 0e00 0300 0020 000f 0003 0000 2000 1000 0300 | ...... ...... .....
Logdump 989 >
-------------------------------------------------------------------------------
Here we could get the RBA for (C) as 1378
Now we have all the three values needed for the calculation
New datapump / Replicat RBA = Reader's too-big checkpoint RBA(A) - Repaired record RBA in existing trail (B)+ First record RBA in the new trail file (after the restart abend)(c)
New datapump / Replicat RBA = 499003921 - 498003606 + 1378 = 1001693
Ensure that you have a good record at RBA 1001693 with TransInd x00 or x03.
This option should only be used if option 1 and option 2 does not work.
If it is a good record, then the datapump / replicat should be altered to trail file sequence number 2290 and rba 1001693 by using the following command
Ggsci> alter rep < rep name>, extseqno 2290, extrba 1001693
備註:我們發現Oracle對於同一個問題給了我們三個方法來解決,而且得出的結果極大可能是不一致的,如果考慮到資料量或者說恢復時間的緊急行,那麼可以使用上面其中一種方法先恢復,後續遇到資料不一致的情況再手工進行處理。但是,如果條件允許的話,個人建議還是重新初始化最保險。