資料庫監聽不定期出現異常故障處理

yuntui發表於2016-11-03
環境:
作業系統為:SunOS 5.10
資料庫版本:ORACLE RAC 11.2.0.3.0
該主機上有2個庫,一套RAC的節點1在上面,還有另外一個庫,之所以使用的是DB下的監聽而沒用GRID下的監聽,
是為了避免停掉CRS時影響另外一個庫的使用。

故障現象:
資料庫監聽不定期出現異常 ,從應用tnsping資料庫,時間花費很長甚至連不上

故障分析處理過程:

從應用主機tnsping如下:

racdb1_scenemon$tnsping racdb_new

TNS Ping Utility for Solaris: Version 11.2.0.3.0 - Production on 19-2月 -2014 15:46:23

Copyright (c) 1997, 2011, Oracle. All rights reserved.

已使用的引數檔案:


已使用 TNSNAMES 介面卡來解析別名
嘗試連線 (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 11.111.11.1)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 11.111.11.2)(PORT = 1521)) (LOAD_BALANCE = yes) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 180) (DELAY = 5))))

OK (60000 毫秒)

從資料庫伺服器上檢視監聽狀態:

$ lsnrctl status

LSNRCTL for Solaris: Version 11.2.0.3.0 - Production on 19-FEB-2014 15:45:58

Copyright (c) 1991, 2011, Oracle. All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))



TNS-12535: TNS:operation timed out
TNS-12560: TNS:protocol adapter error
TNS-00505: Operation timed out   

將監聽程式予以kill並重啟監聽:

$ ps -ef |grep tns
oracle 18504 18468 0 15:45:15 pts/10 0:00 grep tns
oracle 24037 1 0 Nov 28 ? 6:08 /oracle/app/asm/11.2.0/grid/bin/tnslsnr LISTENER -inherit
oracle 9488 1 3 Jan 24 ? 4447:11 /oracle/app/db/product/11.2.0/db/bin/tnslsnr LISTENER -inherit
$ 
$ kill -9 9488
$ lsnrctl start

LSNRCTL for Solaris: Version 11.2.0.3.0 - Production on 19-FEB-2014 15:48:22

Copyright (c) 1991, 2011, Oracle. All rights reserved.

Starting /oracle/app/db/product/11.2.0/db/bin/tnslsnr: please wait...

TNSLSNR for Solaris: Version 11.2.0.3.0 - Production
Log messages written to /oracle/app/db/product/11.2.0/db/log/diag/tnslsnr/racdb/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=racdb)(PORT=1521)))

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Solaris: Version 11.2.0.3.0 - Production
Start Date 19-FEB-2014 15:48:22
Uptime 0 days 0 hr. 0 min. 0 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Log File /oracle/app/db/product/11.2.0/db/log/diag/tnslsnr/racdb/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=racdb)(PORT=1521)))
The listener supports no services
The command completed successfully   

 

檢查listener.log如下:

17-FEB-2014 16:02:26 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=racdb)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=180)(DELAY=5))(CID=(PROGRAM=sqlplus@racdb1)(HOST=racdb1)(USER=tmn))) * (ADDRESS=(PROTOCOL=tcp)(HOST=11.111.11.89)(PORT=41603)) * establish * racdb * 0
17-FEB-2014 16:02:26 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=racdb)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=180)(DELAY=5))(CID=(PROGRAM=sqlplus@racdb1)(HOST=racdb1)(USER=tmn))) * (ADDRESS=(PROTOCOL=tcp)(HOST=11.111.11.89)(PORT=41659)) * establish * racdb * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Solaris Error: 24: Too many open files <<<<<<<<<
17-FEB-2014 16:02:26 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=racdb)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=180)(DELAY=5))(CID=(PROGRAM=sqlplus@racdb1)(HOST=racdb1)(USER=tmn))) * (ADDRESS=(PROTOCOL=tcp)(HOST=11.111.11.89)(PORT=41652)) * establish * racdb * 0
17-FEB-2014 16:02:26 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=racdb)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=180)(DELAY=5))(CID=(PROGRAM=sqlplus@racdb1)(HOST=racdb1)(USER=tmn))) * (ADDRESS=(PROTOCOL=tcp)(HOST=11.111.11.89)(PORT=41669)) * establish * racdb * 0
17-FEB-2014 16:02:26 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=racdb)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=180)(DELAY=5))(CID=(PROGRAM=sqlplus@racdb1)(HOST=racdb1)(USER=tmn))) * (ADDRESS=(PROTOCOL=tcp)(HOST=11.111.11.89)(PORT=41682)) * establish * racdb * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Solaris Error: 24: Too many open files <<<<<<<<<<

Too many open files意味著Maximum Number Of Open Files Per Process 達到了上限。因此listener hang住的原因可能是該limit設定過小,進一步檢視硬體限制如下:

$ ulimit -Ha
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
coredump(blocks) unlimited
nofiles(descriptors) 65536
memory(kbytes) unlimited

檢視軟體限制如下:

$ ulimit -Sa
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 8192
coredump(blocks) unlimited
nofiles(descriptors) 256
memory(kbytes) unlimited

檢視/etc/system如下:

*ident  "@(#)system     1.18    97/06/27 SMI" /* SVR4 1.5 */

*

* SYSTEM SPECIFICATION FILE

*


* moddir:

*

*       Set the search path for modules.  This has a format similar to the

*       csh path variable. If the module isn't found in the first directory

*       it tries the second and so on. The default is /kernel /usr/kernel

*

*       Example:

*               moddir: /kernel /usr/kernel /other/modules




* root device and root filesystem configuration:

*

*       The following may be used to override the defaults provided by

*       the boot program:

*

*       rootfs:         Set the filesystem type of the root.

*

*       rootdev:        Set the root device.  This should be a fully

*                       expanded physical pathname.  The default is the

*                       physical pathname of the device where the boot

*                       program resides.  The physical pathname is

*                       highly platform and configuration dependent.

*

*       Example:

*               rootfs:ufs

*               rootdev:/sbus@1,f8000000/esp@0,800000/sd@3,0:a

*

*       (Swap device configuration should be specified in /etc/vfstab.)




* exclude:

*

*       Modules appearing in the moddir path which are NOT to be loaded,

*       even if referenced. Note that `exclude' accepts either a module name,

*       or a filename which includes the directory.

*

*       Examples:

*               exclude: win

*               exclude: sys/shmsys




* forceload:

*

*       Cause these modules to be loaded at boot time, (just before mounting

*       the root filesystem) rather than at first reference. Note that

*       forceload expects a filename which includes the directory. Also

*       note that loading a module does not necessarily imply that it will

*       be installed.

*

*       Example:

*               forceload: drv/foo




* set:

*

*       Set an integer variable in the kernel or a module to a new value.

*       This facility should be used with caution.  See system(4).

*

*       Examples:

*

*       To set variables in 'unix':

*

*               set nautopush=32

*               set maxusers=40

*

*       To set a variable named 'debug' in the module named 'test_module'

*

*               set test_module:debug = 0x13


* Begin FJSVssf (do not edit)

set ftrace_atboot = 1

set kmem_flags = 0x100

set kmem_lite_maxalign = 8192

* End FJSVssf (do not edit)

forceload: drv/fjpfca

* Begin MDD root info (do not edit)

rootdev:/pseudo/md@0:0,0,blk

* End MDD root info (do not edit)

*** Begin EMCpower added lines *** DO NOT EDIT BELOW THIS LINE ***

forceload: drv/emcpsf

forceload: drv/sd

forceload: drv/ssd

forceload: drv/emcp

forceload: misc/emcpgpx

forceload: misc/emcpmpx

forceload: misc/emcpvlumd

forceload: misc/emcpxcrypt

forceload: misc/emcpdm

forceload: misc/emcpioc

set emcp:bPxEnableInit=1

*** End EMCpower added lines *** DO NOT EDIT ABOVE THIS LINE ***



exec_user_stack = 1                           

set noexec_user_stack_log = 1

檢視/etc/project如下:

system:0::::

user.root:1::::

noproject:2::::

default:3::::

group.staff:10::::

user.oracle:100::oracle::


綜上所述,從listener.log的輸出看,nofilessoft limit過小,只有256,這會導致Solaris Error: 24: Too many open files 錯誤。從/etc/system/etc/project,我們沒有看到設定了soft limit

 

故障處理小結及後續建議

oracle使用者的soft limit提升為至少1024,然後重新oracle使用者登入,檢驗ulimit合格後,重新啟動資料庫和監聽。

具體解決辦法如下:

1、在/etc/system增加以下行

set rlim_fd_max=65536

set rlim_fd_cur=4096

2、重新登入ORACLE並檢驗oracle使用者的限制

su – oracle

ulimit -Ha

ulimit –Sa

3、重新啟動資料庫和監聽


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/30633755/viewspace-2127667/,如需轉載,請註明出處,否則將追究法律責任。

相關文章