Handling Oracle RAC cluster logs that fail to rotate

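Posted by 流浪的野狼 on 2015-09-29
With the holiday approaching, I ran a full check of our systems and found quite a few problems. While checking one of the RAC nodes, I found the following errors: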
2015-08-23 04:34:18.962:
[client(18715)]CRS-0009:log file "/oracle/app/11.2.0/grid/log/d4jt6csvpra04/client/olsnodes.log" reopened
2015-08-23 04:34:18.962:
[client(18715)]CRS-0019:file rotation terminated. log file: "/oracle/app/11.2.0/grid/log/d4jt6csvpra04/client/olsnodes.log"
2015-08-23 04:39:19.095:
[client(19507)]CRS-0014:An error occurred while attempting to delete file "/oracle/app/11.2.0/grid/log/d4jt6csvpra04/client/olsnodes.l03" during log file rotation. Additional diagnostics: LFI-00142: Unable to delete an existing file [olsnodes][l03] not owned by Oracle.

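Judging from the log, log file rotation has failed: according to the message, a rotated log file could not be deleted because it is not owned by Oracle. Checking the log directories showed that some of the logs had grown very large: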
total 4.5G
-rw-r--r-- 1 grid oinstall    0 Dec  7  2013 orarootagent_rootOUT.log
-rw-r--r-- 1 grid oinstall  11M Dec 10  2013 orarootagent_root.l10
-rw-r--r-- 1 grid oinstall    5 Jan 16  2015 orarootagent_root.pid
-rw-r--r-- 1 grid oinstall 4.5G Sep 29 11:58 orarootagent_root.log
[grid@d4jt6csvpra04 orarootagent_root]$ 
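
To find every oversized log rather than inspecting directories one by one, something like the following can help. This is a minimal sketch, not part of the original check: `list_big_logs` is a hypothetical helper name, and the `k` size suffix assumes GNU find (or a compatible implementation).

```shell
# List regular files under a log directory that exceed a size threshold.
# Usage: list_big_logs <log_dir> <min_size_mb>
list_big_logs() {
    log_dir=$1
    min_kb=$(( $2 * 1024 ))
    # -size +Nk matches files larger than N KiB (sizes are rounded up to whole KiB).
    # ls is only invoked when at least one file matches.
    find "$log_dir" -type f -size +"${min_kb}"k -exec ls -lh {} +
}
```

For example, `list_big_logs /oracle/app/11.2.0/grid/log/d4jt6csvpra04 1024` would list files larger than 1 GiB under this node's Clusterware log directory.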

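There were quite a few oversized logs like this one. I looked the problem up on MOS and it is related to a bug. My environment is 11.2.0.4.0 with no patches applied. MOS advises the following: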

It is caused by unpublished Bug 18700935 - CLOUD:ACLDX0085 OCSSD LOG IS NOT ROTATED

At some point in time, the Clusterware alert log reports an attempted logfile rotation failure.

As a result, the last logfile 'ocssd.l10' is never deleted. This may be due to the logfile being open during logfile delete or a permissions issue on the file itself.

The ocssd.bin thread that performs log file rotation 'clsd_logThread' encounters the delete failure and this causes the logfile never to be deleted/rotated, resulting in ocssd.log continually growing in size.
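
Since the note mentions a possible permissions issue and the LFI-00142 diagnostic refers to file ownership, it can be useful to list files whose owner differs from the one you expect. The following is a rough sketch under assumptions: `list_foreign_owned` is a hypothetical helper, and which user "should" own the files in a given log directory is something you have to determine for your own installation.

```shell
# List files under a directory that are NOT owned by the expected user.
# Usage: list_foreign_owned <dir> <expected_owner>
# <expected_owner> may be a user name or a numeric UID (assumption: you know
# which OS user is supposed to own the logs in this directory).
list_foreign_owned() {
    find "$1" -type f ! -user "$2" -exec ls -l {} +
}
```

An empty result means every file under the directory is owned by the expected user.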


Extract of the error in $GRID_HOME/log/<hostname>/alert<hostname>.log

[cssd(29355)]CRS-1713:CSSD daemon is started in clustered mode
2014-06-05 15:37:44.512:
[cssd(29355)]CRS-0009:log file "/u01/app/11.2.0.3/grid/log/ed28db01/cssd/ocssd.log" reopened
2014-06-05 15:37:44.512:
[cssd(29355)]CRS-0019:file rotation terminated. log file: "/u01/app/11.2.0.3/grid/log/ed28db01/cssd/ocssd.log"
[cssd(29355)]CRS-0014:An error occurred while attempting to delete file "/u01/app/11.2.0.3/grid/log/ed28db01/cssd/ocssd.l10" during log file rotation. Additional diagnostics: LFI-00142: Unable to delete an existing file [ocssd][l10] not owned by Oracle.

 

 

SOLUTION

The CSSD thread that encountered the LFI-00142 error needs to be restarted to ensure log rotation works again.

Manually deleting the logfile will not resolve the log rotation problem.


1).  Shut down CRS on the node reporting the problem.

# crsctl stop crs

2).  Once CRS is down,  proceed to manually delete the 'ocssd.l10' file, or copy the logfile to another location if you need to keep a backup.

# rm  $GRID_HOME/log/<hostname>/cssd/ocssd.l10

3).  Start up Clusterware again.

# crsctl start crs

 

If you are NOT able to schedule downtime and file size growth in the GRID Home is causing a space issue, then copy the logs to another location and do the following:
% echo 0 > ocssd.l10
  
Please note this does not resolve the log rotation problem but only allows you to free up some space.
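
The copy-then-empty workaround above can be wrapped so that the file is only emptied when the backup copy succeeded. This is a minimal sketch, not the MOS procedure itself: `backup_and_truncate` is a hypothetical helper, and it empties the file with `: >` instead of writing a `0` into it as the `echo 0 >` command does.

```shell
# Copy a log file to a backup directory, then empty the original in place.
# Usage: backup_and_truncate <log_file> <backup_dir>
backup_and_truncate() {
    log_file=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    # -p preserves mode and timestamps; stop here if the copy fails.
    cp -p "$log_file" "$backup_dir/" || return 1
    # ': >' truncates the file to zero bytes without removing it.
    : > "$log_file"
}
```

As with the original command, this only frees space and does not repair log rotation.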


4). Bug 18700935 has been fixed in the 11.2.0.4.5 PSU for Unix/Linux platforms and the 11.2.0.4.12 Bundle for Windows platforms. Please apply the patch if required.
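
Whether an installation already includes the fix comes down to comparing dotted version strings. The sketch below assumes you already have the installed PSU level as a string such as `11.2.0.4.0`. How you obtain it is not shown here, and `version_ge` is a hypothetical helper, not an Oracle tool.

```shell
# Return success (exit status 0) if version $1 is greater than or equal to $2.
# Versions are compared field by field as numbers; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}
```

For the environment described in this post, `version_ge 11.2.0.4.0 11.2.0.4.5 || echo "fix for Bug 18700935 not included"` would print the message, since 11.2.0.4.0 is below the Unix/Linux fix level.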

Source: "ITPUB Blog", link: http://blog.itpub.net/28612416/viewspace-1811489/. If you repost this article, please credit the source; otherwise legal action will be pursued.
