OGG的replicat程式的Time Since Chkpt一直增加,程式處於假死狀態

xueshancheng發表於2022-04-01

1 概述

DBA關閉所有資料庫例項,但未關閉OGG,重啟OGG所有程式,抽取程式,MGR程式、大部分複製程式都恢復正常,但還有幾個程式的Time Since Chkpt 時間一直處於增加,等待一個小時,重啟MGR及相關程式,進行觀察;再次等待15小時,發現Time Since Chkpt還是處於增加狀態,OGG沒有應用任何資料,故認定由於資料庫重啟,導致OGG的REPLICAT程式異常,處於假死狀態。


2 OGG程式狀態檢視

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

REPLICAT    RUNNING     repaa    00:00:05      00:12:02  


REPLICAT    RUNNING     repaa    00:00:05      00:44:50   


GGSCI (testdb3) 25> info repaa


REPLICAT   repaa  Last Started 2022-03-30 17:05   Status RUNNING

Checkpoint Lag       00:0 0:05 (updated 01:04:53 ago)

Log Read Checkpoint  File ./dirdat/aa559442

                     2022-03-30 16:53:00.000921  RBA 1169746574

                     

檢視report 報告,已經讀取到559446檔案了

Opened trail file ./dirdat/aa559442 at 2022-03-30 17:05:01


Switching to next trail file ./dirdat/aa559443 at 2022-03-30 17:05:02 due to EOF, with current RBA 1999998495

Opened trail file ./dirdat/aa559443 at 2022-03-30 17:05:02



Switching to next trail file ./dirdat/aa559444 at 2022-03-30 17:05:43 due to EOF, with current RBA 1999998562

Opened trail file ./dirdat/aa559444 at 2022-03-30 17:05:43



Switching to next trail file ./dirdat/aa559445 at 2022-03-30 17:07:33 due to EOF, with current RBA 1999998898

Opened trail file ./dirdat/aa559445 at 2022-03-30 17:07:33



Switching to next trail file ./dirdat/aa559446 at 2022-03-30 17:12:51 due to EOF, with current RBA 1999998960

Opened trail file ./dirdat/aa559446 at 2022-03-30 17:12:51



檢視程式狀態,目前程式讀取trail檔案到559446,但事務處理的資料為0

GGSCI (testdb3) 9> send repaa status


Sending STATUS request to REPLICAT repaa ...

  Current status: Processing data

  Sequence #: 559446

  RBA: 876964647

  0 records in current transaction


根據如上資訊,暫時認為OGG程式是正常的,等待1小時後,發現時間還是處於增長狀態,

根據經驗,重啟MGR及此程式,一般程式就會恢復正常。


2 重啟相關程式


GGSCI (testdb3) 24> start repaa


Sending START request to MANAGER ...

REPLICAT repaa starting



GGSCI (testdb3) 25> info repaa


REPLICAT   repaa  Last Started 2022-03-30 17:57   Status RUNNING

Checkpoint Lag       00:00:05 (updated 01:04:53 ago)

Log Read Checkpoint  File ./dirdat/aa559442

                     2022-03-30 16:53:00.000921  RBA 1169746574


3 等待15小時,發現此程式延遲還是增加,如下


REPLICAT    RUNNING     repaa    00:00:05      15:42:26  


4 檢視OGG程式執行的語句,發現一直執行SELECT COUNT (DISTINCT thread#)  FROM v$log

,不進行任何其它的SQL


SYS@testdb3 > SELECT /*+ ORDERED */ sql_id,sql_text  FROM v$sqltext a WHERE (a.hash_value, a.address) IN      

 (SELECT DECODE(sql_hash_value, 0, prev_hash_value, sql_hash_value),               

DECODE(sql_hash_value, 0, prev_sql_addr, sql_address)     

      FROM v$session b         WHERE b.sid='&SID'  and b.SERIAL#='&SERIAL') order by piece asc; 

Enter value for sid: 2103

Enter value for serial: 5

old   1:  SELECT /*+ ORDERED */ sql_id,sql_text  FROM v$sqltext a WHERE (a.hash_value, a.address) IN   

    (SELECT DECODE(sql_hash_value, 0, prev_hash_value, sql_hash_value),     

DECODE(sql_hash_value, 0, prev_sql_addr, sql_address)   

            FROM v$session b         WHERE b.sid='&SID'  and b.SERIAL#='&SERIAL') order by piece asc

new   1:  SELECT /*+ ORDERED */ sql_id,sql_text  FROM v$sqltext a WHERE (a.hash_value, a.address) IN  

     (SELECT DECODE(sql_hash_value, 0, prev_hash_value, sql_hash_value),     

DECODE(sql_hash_value, 0, prev_sql_addr, sql_address)  

              FROM v$session b         WHERE b.sid='2103'  and b.SERIAL#='5') order by piece asc


SQL_ID                         SQL_TEXT

------------------------------ ----------------------------------------------------------------

gnxg91rmnxtdj                  SELECT COUNT (DISTINCT thread#)  FROM v$log



5 對程式做相關trace,未發現任何異常,如下:


send repaa,trace2 ./dirtmp/repaa_20220331.log


09:20:27.875 (58526527) General statistics:

                         0.00%  Checking messages (includes checkpointing)

                                 0.00%  Checking periodic tasks

                         0.00%  Waiting for more data

                         0.00%  Converting ASCII header to internal

                         0.00%  Converting ASCII data to internal

                         0.00%  Reading input records

                         0.00%  Writing output records (replicate_io)

                                 0.00%  Mapping columns

                                 0.00%  Outputting data records

                                 0.00%  Performing SQL statements

                         0.00%  Performing BATCHSQL statements

                         0.00%  Performing actual DB op

                         0.00%  Preparing SQL statements

                         0.00%  Performing transaction commits

                         0.00%  Checkpointing

                         

                         

6 查詢官方文件及各大網站,未發現解決方法,經過分析,認為有可能是資料庫關閉,

導致OGG某些 程式異常,如果重置RBA,一般OGG會從某個位置重新開始錄入資料,

那我指定相同的EXTSEQNO和 RBA號,是不OGG也會恢復正常?

進行測試,重新執行相同的RBA號, 驗證程式是否正常, 發現OGG程式恢復正常。

                       

GGSCI (testdb3) 77> info repaa


REPLICAT   repaa  Last Started 2022-03-30 17:57   Status RUNNING

Checkpoint Lag       00:00:05 (updated 15:47:14 ago)

Log Read Checkpoint  File ./dirdat/aa559442

                     2022-03-30 16:53:00.000921  RBA 1169746574

                      

由於不能正常關閉OGG程式,需要在作業系統層面KILL,KILL完OGG程式後,執行如下名

 

GGSCI (testdb3) 18> alter   repaa extseqno    559442 extrba 1169746574

REPLICAT altered.



GGSCI (testdb3) 19> info repaa


REPLICAT   repaa  Initialized   2022-03-31 09:32   Status STOPPED

Checkpoint Lag       00:00:00 (updated 00:00:10 ago)

Log Read Checkpoint  File ./dirdat/aa559442

                     First Record  RBA 1169746574 

 

經過幾分鐘的等待,OGG恢復正常,如下:

REPLICAT    RUNNING     repaa    09:40:41      00:00:04 

等待幾小時,延遲變為0:

REPLICAT    RUNNING     repaa    00:00:04      00:00:04 

 

GGSCI (testdb3) 73> info repaa


REPLICAT   repaa  Last Started 2022-03-31 08:59   Status RUNNING

Checkpoint Lag       00:00:03 (updated 00:00:06 ago)

Log Read Checkpoint  File ./dirdat/aa559881

                     2022-03-31 15:53:58.000975  RBA 14563502




來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/69996316/viewspace-2885424/,如需轉載,請註明出處,否則將追究法律責任。

相關文章