Checkpoint Not Running - Checkpointing Not Occurring [TimesTen Operations Basics]

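Posted by tangyunoracle on 2014-06-09
Today I got a call from a customer who said that the checkpoint history timestamps on one of their databases looked odd, and that the transaction logs were never being purged.
1. I looked at the transaction log holds, and they were indeed odd: the logs were being held by the checkpoint files, and there was neither replication nor a long-running transaction.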
Command> call ttlogholds;
< 11302, 54794696, Checkpoint                   , ocstt.ds0 >
< 11831, 3753696, Checkpoint                    , ocstt.ds1 >
2 rows found.
2. I then checked the checkpoint history:
Command> select sysdate from dual;
< 2014-05-19 17:28:54 >
1 row found.
Command> call ttckpthistory;
< 2014-06-09 09:40:32.625312, 2014-06-09 09:40:33.600128, Fuzzy           , Completed       , Checkpointer    , , 0, 11032, 54794784, 94, 629145600, 80, 5586352, 40, 2354992, 2346248, >
< 2014-06-09 09:30:32.410709, 2014-06-09 09:30:32.531441, Fuzzy           , Completed       , Checkpointer    , , 1, 11032, 13583360, 94, 629145600, 80, 5586352, 40, 2354992, 2346248, >
< 2014-06-06 16:16:18.869326, 2014-06-06 16:16:19.013980, Fuzzy           , Completed       , Checkpointer    , , 0, 10864, 12890096, 94, 629145600, 80, 5586352, 35, 2198336, 2243848, >
< 2014-06-06 16:06:18.671965, 2014-06-06 16:06:18.853700, Fuzzy           , Completed       , Checkpointer    , , 1, 10863, 50556048, 91, 629145600, 74, 5296896, 27, 1934448, 1879304, >
< 2014-06-06 15:56:18.577550, 2014-06-06 15:56:18.659042, Fuzzy           , Completed       , Checkpointer    , , 0, 10863, 21179784, 91, 629145600, 76, 5504656, 25, 1689048, 1867016, >
< 2014-06-06 15:46:18.444551, 2014-06-06 15:46:18.564260, Fuzzy           , Completed       , Checkpointer    , , 1, 10862, 58908472, 91, 629145600, 76, 5504656, 33, 2225416, 2215176, >
< 2014-06-06 15:36:18.280709, 2014-06-06 15:36:18.431814, Fuzzy           , Completed       , Checkpointer    , , 0, 10862, 29515448, 91, 629145600, 76, 5504656, 33, 2225416, 2215176, >
< 2014-05-18 10:22:21.430088, 2014-05-18 10:22:23.745732, Static          , Completed       , Subdaemon       , , 1, 11831, 3753784, 775, 629145600, 763, 53787560, 775, 629145600, 53789960, >
8 rows found.
The first several rows of the checkpoint history are dated 2014-06-09 and 2014-06-06, yet the system date is only 2014-05-19. No wonder the automatic checkpoint had not been running.
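The check in step 2 can be automated. The following is a minimal sketch, assuming the rows are captured as text exactly as ttIsql prints them and that the first two comma-separated fields are the start and end timestamps, as in the output above. The helper names `parse_ckpt_row` and `future_entries` are hypothetical and not part of TimesTen.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

def parse_ckpt_row(row: str) -> dict:
    """Parse one '< ..., ... >' row as printed by 'call ttCkptHistory;'."""
    # Drop the surrounding angle brackets, then split on commas.
    fields = [f.strip() for f in row.strip().strip("<>").split(",")]
    return {
        "start": datetime.strptime(fields[0], TS_FORMAT),
        "end": datetime.strptime(fields[1], TS_FORMAT),
        "type": fields[2],
        "status": fields[3],
        "initiator": fields[4],
    }

def future_entries(rows, now):
    """Return the parsed rows whose start time is later than `now`."""
    parsed = [parse_ckpt_row(r) for r in rows]
    return [p for p in parsed if p["start"] > now]

# Two rows copied from the history above; `now` is the sysdate shown in step 2.
rows = [
    "< 2014-06-09 09:40:32.625312, 2014-06-09 09:40:33.600128, Fuzzy           , Completed       , Checkpointer    , , 0, 11032, 54794784, 94, 629145600, 80, 5586352, 40, 2354992, 2346248, >",
    "< 2014-05-18 10:22:21.430088, 2014-05-18 10:22:23.745732, Static          , Completed       , Subdaemon       , , 1, 11831, 3753784, 775, 629145600, 763, 53787560, 775, 629145600, 53789960, >",
]
now = datetime(2014, 5, 19, 17, 28, 54)
suspicious = future_entries(rows, now)
for entry in suspicious:
    print("future-dated checkpoint:", entry["start"], entry["initiator"])
```

Any row reported here is dated later than the current system time, which is the symptom described in this post.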
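3. I checked the settings in sys.odbc.ini and the database configuration, and ran the ttconfig built-in procedure to view them. The checkpoint settings were all normal, and the customer confirmed that the operating system time had been adjusted.
I suspected that the change to the operating system time was the cause, and finally confirmed it with MetaLink Doc ID 1379020.1 (Checkpointing Not Occurring).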

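4. I ran a checkpoint manually twice. Both completed normally and the transaction logs were purged, but when the next scheduled checkpoint came due it still did not start automatically. Automatic checkpointing resumes only once the operating system time has passed the timestamps recorded in CkptHistory.
5. Solution:
There are three possible fixes:
a) Export the data with ttBulkCp or ttMigrate, re-create the DSN, and then import the data again.

b) Use a crontab job to run a checkpoint at a fixed interval.

c) Change the checkpoint mode so that checkpoints are triggered automatically by the volume of transaction log generated (for example: call ttCkptConfig (0,1000,0); ).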
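Option b) could be implemented with a small script that cron invokes. This is only a sketch: it assumes `ttIsql` is on the PATH and is started with the `-e` option followed by the DSN, and the DSN name `my_dsn`, the script path in the comment, and both function names are hypothetical placeholders.

```python
import subprocess

def build_ckpt_command(dsn: str, ttisql: str = "ttIsql") -> list:
    """Build the argv that asks ttIsql to run one manual checkpoint and exit."""
    return [ttisql, "-e", "call ttCkpt; exit;", dsn]

def run_checkpoint(dsn: str) -> int:
    """Run the checkpoint command and return the ttIsql exit code."""
    result = subprocess.run(build_ckpt_command(dsn), capture_output=True, text=True)
    return result.returncode

# A crontab entry such as the following (path is hypothetical) would call
# run_checkpoint("my_dsn") every 10 minutes:
#   */10 * * * * python3 /path/to/ckpt_job.py
# Here we only print the command so the sketch has no side effects.
command = build_ckpt_command("my_dsn")
print(" ".join(command))
```

The command is built separately from the call that executes it, so the job can be checked on a machine without TimesTen before it is scheduled.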

----------------------------------End---------------------------------------------
Reference document:

Checkpointing Not Occurring (Doc ID 1379020.1)

Applies to:
TimesTen Data Server - Version 7.0.5.0.0 to 11.2.1 [Release 7.0 to 11.2]
Information in this document applies to any platform.
This problem could potentially occur in any TimesTen data store.


Symptoms

-Customer reported that automatic checkpointing was not occurring in a production application. All attempts to restart checkpointing failed.

-The problem was occurring in 2 different data stores existing on the same server. The problem had previously been observed by another customer, where it occurred on 8 different data stores running on the same server.

-The customer was using time-interval based checkpointing, which is the default TimesTen checkpoint configuration. The default is for the checkpointer to execute a checkpoint once every 600 seconds (10 minutes).
Changes

No changes were specifically made to either of the data stores themselves. However, it turns out that an FE was making changes to the system clock while TimesTen was operational.
Cause


Checkpoint histories from both nodes showed a checkpoint entry dated 8 years in the future. Both data stores had a checkpoint history showing that a checkpoint was performed on 11-Nov-2019:

/* NODE_1 */

Command> call ttCkptHistory;
< 2019-11-11 17:03:00.711478, 2019-11-11 17:03:23.156722, Fuzzy , Completed , Checkpointer , , 0, 746589, 30757968, 212794, 3221225472, 210492, 2312255864, 212794, 3221225472, 2315922824, >
< 2011-11-17 01:19:31.976204, 2011-11-17 01:19:32.489286, Fuzzy , Completed , Checkpointer , , 0, 746498, 3216136, 212794, 3221225472, 210478, 2311956160, 28, 1373176, 1224072, >
< 2011-11-17 01:14:31.758497, 2011-11-17 01:14:54.678123, Fuzzy , Completed , Checkpointer , , 1, 746498, 3215984, 212794, 3221225472, 210478, 2311956160, 104516, 941230496, 1181552008, >
< 2011-11-17 01:09:31.809785, 2011-11-17 01:09:56.464715, Fuzzy , Completed , Checkpointer , , 0, 746498, 3148336, 212794, 3221225472, 210492, 2312255864, 119230, 1106716584, 1348410760, >
< 2011-11-17 01:04:31.866130, 2011-11-17 01:04:56.486027, Fuzzy , Completed , Checkpointer , , 1, 746493, 8825888, 212794, 3221225472, 210492, 2312255864, 122153, 1135240656, 1373654408, >
< 2011-11-17 00:59:31.964454, 2011-11-17 00:59:56.541392, Fuzzy , Completed , Checkpointer , , 0, 746487, 61057424, 212794, 3221225472, 210492, 2312255864, 124353, 1161729472, 1404247432, >
< 2011-11-17 00:54:31.498796, 2011-11-17 00:54:58.654976, Fuzzy , Completed , Checkpointer , , 1, 746482, 38247720, 212794, 3221225472, 210492, 2312255864, 153016, 1407583008, 1602108808, >
< 2011-11-17 22:40:01.549312, 2011-11-17 22:40:25.057321, Fuzzy , Completed , User , , 1, 749263, 14956640, 212793, 3221225472, 210490, 2312125496, 193103, 1889295352, 1961270664, >
8 rows found.


/* NODE_2 */

Command> call ttCkptHistory;
< 2019-11-11 17:03:00.711478, 2019-11-11 17:03:23.156722, Fuzzy , Completed , Checkpointer , , 0, 746589, 30757968, 212794, 3221225472, 210492, 2312255864, 212794, 3221225472, 2315922824, >
< 2011-11-17 01:19:31.976204, 2011-11-17 01:19:32.489286, Fuzzy , Completed , Checkpointer , , 0, 746498, 3216136, 212794, 3221225472, 210478, 2311956160, 28, 1373176, 1224072, >
< 2011-11-17 01:14:31.758497, 2011-11-17 01:14:54.678123, Fuzzy , Completed , Checkpointer , , 1, 746498, 3215984, 212794, 3221225472, 210478, 2311956160, 104516, 941230496, 1181552008, >
< 2011-11-17 01:09:31.809785, 2011-11-17 01:09:56.464715, Fuzzy , Completed , Checkpointer , , 0, 746498, 3148336, 212794, 3221225472, 210492, 2312255864, 119230, 1106716584, 1348410760, >
< 2011-11-17 01:04:31.866130, 2011-11-17 01:04:56.486027, Fuzzy , Completed , Checkpointer , , 1, 746493, 8825888, 212794, 3221225472, 210492, 2312255864, 122153, 1135240656, 1373654408, >
< 2011-11-17 00:59:31.964454, 2011-11-17 00:59:56.541392, Fuzzy , Completed , Checkpointer , , 0, 746487, 61057424, 212794, 3221225472, 210492, 2312255864, 124353, 1161729472, 1404247432, >
< 2011-11-17 00:54:31.498796, 2011-11-17 00:54:58.654976, Fuzzy , Completed , Checkpointer , , 1, 746482, 38247720, 212794, 3221225472, 210492, 2312255864, 153016, 1407583008, 1602108808, >
< 2011-11-17 22:40:01.777759, 2011-11-17 22:40:28.797874, Fuzzy , Completed , User , , 0, 749261, 15723336, 213130, 3221225472, 210827, 2315085008, 195219, 2827231536, 1997065608, >
8 rows found.



Customer subsequently determined that changes had been made to the server system clock which resulted in a checkpoint being registered as having been performed on Nov 11, 2019, i.e., 8 years in the future.

Because the checkpoint history data structure contains a checkpoint dated 8 years in the future, the next time-interval based checkpoint will not occur until one checkpoint interval after that Nov 11, 2019 entry. In this case that means no automatic checkpoint will happen until about 17:09 on Nov 11, 2019. Because of the logic used to update the internal checkpoint history structure, this bad checkpoint date will not be flushed out of the structure until 8 checkpoints with timestamps later than it have been performed. So unless the customer chooses to rebuild the data stores, the customer will have to operate for the next 8 years with a corrupted checkpoint history structure in the data store header.
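The timing rule described above can be worked through with the figures from the blog author's own case. This is a sketch of the arithmetic only, not TimesTen's internal code. It assumes the next time-based checkpoint is due one interval after the end time of the newest history entry, and it uses the 600-second default interval mentioned under Symptoms. The function names are hypothetical.

```python
from datetime import datetime, timedelta

def next_interval_checkpoint(latest_history_end: datetime, interval_seconds: int) -> datetime:
    """Assumed rule: the next time-based checkpoint is due one interval after the newest entry."""
    return latest_history_end + timedelta(seconds=interval_seconds)

def wait_until_due(now: datetime, latest_history_end: datetime, interval_seconds: int) -> timedelta:
    """How long the checkpointer would stay idle; zero if a checkpoint is already due."""
    due = next_interval_checkpoint(latest_history_end, interval_seconds)
    return max(due - now, timedelta(0))

# Newest (future-dated) history entry and the system date from the blog author's case.
latest_end = datetime(2014, 6, 9, 9, 40, 33, 600128)
now = datetime(2014, 5, 19, 17, 28, 54)

due = next_interval_checkpoint(latest_end, 600)
wait = wait_until_due(now, latest_end, 600)
print("next automatic checkpoint due:", due)
print("idle time until then:", wait)
```

Under these assumptions the checkpointer stays idle for roughly three weeks in the blog author's case. The same rule applied to the 2019 entry in this note gives a wait of about 8 years.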
Solution

Customer has the following possible solutions and workarounds:

(1) Rebuild the affected data stores. Export the data from the data store using ttBulkCp or ttMigrate, destroy the current data store, create a new data store with the same attributes as the old one, and import the data back in. This is the safest solution and also the most time-consuming.

(2) Enable a cron job which wakes up at a defined interval, connects to the data store and performs a manual checkpoint by calling 'ttCkpt'.

(3) Modify the automatic checkpointing algorithm of the data store so that it is dependent on accumulated transaction log volume instead of a time interval. If the customer executes the following command in ttIsql:
 
call ttCkptConfig (0,1000,0);


then the checkpointer will automatically execute a checkpoint each time the amount of transaction log data generated since the last checkpoint exceeds 1000 megabytes (about 1 gigabyte). Enabling a checkpoint algorithm based on accumulated log volume causes the checkpointer thread to ignore date stamp information in the checkpoint history structure, thus working around the date corruption in the checkpoint history. Check the TimesTen Reference Manual for more information on the use of 'ttCkptConfig' to change default checkpointing behavior.

References
BUG:13402829 - CORRUPTED DATES IN CHECKPOINT HISTORY ARE BLOCKING AUTOMATIC CHECKPOINTING
=======================End=================================================================

From "ITPUB Blog", link: http://blog.itpub.net/24930246/viewspace-1179185/. If you repost this article, please credit the source; otherwise legal liability may be pursued.
