【故障|監聽】TNS-12518、TNS-00517和 Linux Error:32:Broken pipe
【故障|監聽】TNS-12518、TNS-00517和 Linux Error:32:Broken pipe
1.1 BLOG文件結構圖
1.2 前言部分
1.2.1 導讀和注意事項
各位技術愛好者,看完本文後,你可以掌握如下的技能,也可以學到一些其它你所不知道的知識,~O(∩_∩)O~:
① Linux Error: 32: Broken pipe的可能原因(重點)
② TNS-12518: TNS:listener could not hand off client connection的一般解決過程
③ SQL*Plus登入報錯:ORA-12537: TNS:connection closed
Tips:
① 本文在itpub(http://blog.itpub.net/26736162)、部落格園(http://www.cnblogs.com/lhrbest)和微信公眾號(xiaomaimiaolhr)上有同步更新。
② 文章中用到的所有程式碼、相關軟體、相關資料及本文的pdf版本都請前往小麥苗的雲盤下載,小麥苗的雲盤地址見:http://blog.itpub.net/26736162/viewspace-1624453/。
③ 若網頁文章程式碼格式有錯亂,請下載pdf格式的文件來閱讀。
④ 在本篇BLOG中,程式碼輸出部分一般放在一行一列的表格中。
本文如有錯誤或不完善的地方請大家多多指正,ITPUB留言或QQ皆可,您的批評指正是我寫作的最大動力。
1.3 故障分析及解決過程
1.3.1 故障環境介紹
專案 |
source db |
db 型別 |
單機 |
db version |
11.2.0.3.0 |
db 儲存 |
ASM |
OS版本及kernel版本 |
RHEL 6.5 |
1.3.2 故障發生現象及報錯資訊
登陸報錯:
[oracle@orcltest ~]$ sqlplus 'sys/"l@h\r/0"'@LHRDB as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Thu Mar 16 12:06:36 2017
Copyright (c) 1982, 2011, Oracle. All rights reserved.
ERROR: ORA-12537: TNS:connection closed
|
查監聽日誌(/u01/app/11.2.0/grid/network/admin/listener.ora),報錯如下:
16-MAR-2017 12:06:36 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=lhrdb)(CID=(PROGRAM=sqlplus)(HOST=orcltest)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.59.129)(PORT=12333)) * establish * lhrdb * 12518 TNS-12518: TNS:listener could not hand off client connection TNS-12547: TNS:lost contact TNS-12560: TNS:protocol adapter error TNS-00517: Lost contact Linux Error: 32: Broken pipe
|
1.3.3 故障分析
瀟湘隱者:http://www.cnblogs.com/kerrycode/p/4164838.html
我的解決參考:http://www.itpub.net/thread-1870217-1-1.html
根據MOS:Troubleshooting Guide for TNS-12518 TNS listener could not hand off client connection (文件 ID 550859.1)中,
Error: 32: Broken pipe
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection TNS-12547: TNS:lost contact TNS-12560: TNS:protocol adapter error TNS-00517: Lost contact IBM/AIX RISC System/6000 Error: 32: Broken pipe
Cause: The error 32 indicates the communication has been broken while the listener is trying to hand off the client connection to the server process or dispatcher process.
Action: 1. One of reason would be processes parameter being low, and can be verified by the v$resource_limit view. 2. In Shared Server mode, check the 'lsnrctl services' output and see if the dispatcher has refused any connections, if so, then consider increasing the number of dispatchers. 3. Check the alert log for any possible errors. 4. Memory resource is also another cause for this issue. Check the swap, memory usage of the OS. 5. If RAC/SCAN or listener is running in separate home, check the following note:
Note: 1069517.1 ORA-12537 or TNS-12518 if Listener (including SCAN Listener) and Database are Owned by Different OS User
|
根據第5條,因為我的grid是11.2.0.1,而Oracle是11.2.0.3的,查詢MOS:ORA-12537 / ORA-12547 or TNS-12518 if Listener (including SCAN Listener) and Database are Owned by Different OS User (文件 ID 1069517.1),裡邊明確指出是由於$RDBMS_HOME/bin/oracle檔案的許可權問題導致。
1.3.4 故障解決
$ORACLE_HOME/bin/oracle檔案的許可權問題,應該為-rwsr-s--x,屬主應該是oracle:asmadmin。
解決辦法:
方法:
1、修改oracle的許可權為6751
[root@orcltest bin]# ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle -rwxr-s--x. 1 oracle asmadmin 210823844 May 18 2014 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle [root@orcltest ~]# chmod 6751 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle [root@orcltest bin]# ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle -rwsr-s--x. 1 oracle asmadmin 210823844 May 18 2014 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle [root@orcltest bin]#
|
2、重新用setasmgidwrap設定oracle
[root@orcltest bin]# ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle -rwxr-s--x. 1 oracle asmadmin 210823844 May 18 2014 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle [root@orcltest ~]# /u01/app/11.2.0/grid/bin/setasmgidwrap -o /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle [root@orcltest bin]# ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle -rwsr-s--x. 1 oracle asmadmin 210823844 May 18 2014 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle [root@orcltest bin]# [root@orcltest bin]# stat /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle File: `/u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle' Size: 210823844 Blocks: 411776 IO Block: 4096 regular file Device: 802h/2050d Inode: 1717737 Links: 1 Access: (6751/-rwsr-s--x) Uid: ( 501/ oracle) Gid: ( 504/asmadmin) Access: 2017-03-16 11:05:44.809363974 +0800 Modify: 2014-05-18 17:09:50.508549983 +0800 Change: 2017-03-16 12:33:15.733816820 +0800 [root@orcltest bin]# [root@orcltest bin]#
|
重新連線:
[oracle@orcltest ~]$ sqlplus 'sys/"l@h\r/0"'@LHRDB as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Thu Mar 16 13:32:48 2017
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options
SYS@LHRDB>
|
1.4 故障處理總結
如果可執行檔案$ORACLE_HOME/bin/oracle的屬主或許可權設定出了問題,那麼可能會造成很多問題。例如:無法登陸到資料庫、ora-600錯誤、“TNS-12518: TNS:listener could not hand off client connection”、“Linux Error: 32: Broken pipe”、“ORA-12537: TNS:connection closed”、訪問ASM磁碟出錯等。解決辦法很簡單,可以在grid使用者下執行setasmgidwrap命令重新配置$ORACLE_HOME/bin/oracle可執行檔案的許可權和屬主或者直接將oracle檔案的許可權修改為6751。$ORACLE_HOME/bin/oracle可執行檔案正確屬主應該是oracle:asmadmin,並且許可權必須有s共享才可以,如下所示:
[root@orcltest ~]$ which setasmgidwrap /u01/app/11.2.0/grid/bin/setasmgidwrap [root@orcltest ~]$ setasmgidwrap -o /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle [root@orcltest ~]$ ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle -rwsr-s--x 1 oracle asmadmin 232399083 Apr 21 2015 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle [root@orcltest ~]# ll /u01/app/11.2.0/grid/bin/oracle -rwsr-s--x. 1 grid oinstall 203972117 Jan 5 2015 /u01/app/11.2.0/grid/bin/oracle [root@orcltest ~]# chmod 6751 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle [root@orcltest ~]# ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle -rwsr-s--x 1 oracle asmadmin 232399083 Apr 21 2015 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle [root@orcltest bin]# cd /u01/app/oracle/product/11.2.0/dbhome_1/bin/ [root@orcltest bin]# which stat /usr/bin/stat [root@orcltest bin]# stat oracle File: `oracle' Size: 210823844 Blocks: 411776 IO Block: 4096 regular file Device: 802h/2050d Inode: 1717737 Links: 1 Access: (6751/-rwsr-s--x) Uid: ( 501/ oracle) Gid: ( 504/asmadmin) Access: 2017-03-16 12:33:44.809363974 +0800 Modify: 2014-05-18 17:09:50.508549983 +0800 Change: 2017-03-16 11:05:15.733816820 +0800
|
問:在UNIX/LINUX環境中,oracle資料庫啟動後存在許多後臺程式和前臺程式,雖然相關程式產生一些trace檔案也是常有的事情,但是真正是什麼決定了oracle相關程式的屬性呢?
答:通常來說,oracle的後臺程式的呼叫是依賴於$ORACLE_HOME/bin/oracle這個二進位制檔案,但它從遠端連入而分配的伺服器程式(server process)相關屬主的屬性則是繼承自listener程式,而listener程式的屬主屬性同樣是程式自其啟動的使用者(分oracle使用者和grid使用者)$ORACLE_HOME/bin/oracle的屬主屬性。
其他原因:
啟動ORACLE監聽的時候 報錯 Linux Error: 32: Broken pipe
原因:原來/home/oracle/product/9.2.0.1.0/network/log/listener.log 檔案超過2G,監聽就會斷掉。
解決辦法:清空日誌檔案
[oracle@localhost oracle]$ cd $ORACLE_HOME/network/log
[oracle@chicago log]$ cat </dev/null> listener.log
至此問題解決
有個命令可以對這個寫日誌進行關閉
LSNRCTL> set log_status off
要繼續寫日誌,只需要
LSNRCTL> set log_status on
也可以在 listener.ora 檔案裡面新增一項
LOGGING_LISTENER = OFF
新增了這個之後無論你怎麼重啟監聽,都不會寫日誌了。
1.5 用到的SQL集合
setasmgidwrap -o /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle chmod 6751 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle stat oracle |
1.6 參考文章
1.6.1 MOS
1.6.1.1 Troubleshooting Guide for TNS-12518 TNS listener could not hand off client connection (文件 ID 550859.1)
In this Document
Purpose |
Troubleshooting Steps |
Section I: Steps to approach ORA-12518/TNS-12518 Error: |
Section II: Commonly Known Errors: |
Error: 32: Broken pipe |
Error: 11: Resource temporarily unavailable |
Error: Connection Pooling limit reached |
Error: 24: Too many open files |
Section III: Errors Specific to Windows |
Error: 2: No such file or directory |
Error: 233: Unknown error |
Error: 54: Unknown error |
Error: 10022: Unknown error |
References |
APPLIES TO:
Oracle Net Services - Version 10.1.0.5 to 12.1.0.2 [Release 10.1 to 12.1]Information in this document applies to any platform.
PURPOSE
This article discusses how to troubleshoot the ORA-12518/TNS-12518 listener errors.
TROUBLESHOOTING STEPS
Section I: Steps to approach ORA-12518/TNS-12518 Error:
ORA-12518/TNS-12518 indicates a problem while listener hands off the client connection to the server process or dispatcher process.TNS-12518 is logged in the listener log. The client might receive ORA-12518 or some other disconnection errors like ORA-12537. Once TNS-12518 is noted in the listener log, follow the below steps to approach and resolve this error.
Let us have a small discussion about how actually database connections are made:
In Dedicated mode, database client contacts listener and supplies the SERVICE NAME of the database. Then listener spawns a dedicated server process and hands off the client connection to this dedicated server process. TNS-12518 indicates a problem while handing off the client connection to the server process.
In Shared Server mode, database client contacts listener and supplies the SERVICE NAME of the database. Then listener hands off the client connection to one of the dispatcher configured for that service. TNS-12518 indicates a problem while handing off the client connection to the dispatcher server process.
Though this error is logged in the listener log, the listener is just the messenger, ORA-12518/TNS-12518 is mostly related to RDBMS and OS resources.
Step 1. Is listener version compatible to the database
Check if you are using compatible listener version for your database version.
If the database is 8i then use 8i or 9i listeners only. 10g listeners are not compatible to work with 8i databases.
For 9i databases, 9i or 10g listeners can be used.
For 10g databases, only 10g listeners can be used.
The general rule is that use the higher version of the listener when there is a version mismatch between database and the listener.
Step 2. Gather more information from listener log
The first place you would look for the TNS-12518 error is the listener log. Usually the listener log would be located under $ORACLE_HOME/network/log directory. You can use 'lsnrctl status' command output to look for the location of the listener log file.
- - -
- - -
Listener Parameter File /ora10g/home_ora10g/network/admin/listener.ora
Listener Log File /ora10g/home_ora10g/network/log/listener.log
- - -
- - -
listener log gives the complete error stack and the database service name to which the client tried to connect to.
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Linux Error: 32: Broken pipe
In the above example, listener log shows the complete error stack, the bottom error being 32 is the OS error. It also shows that the jdbc client from IP 10.10.10.3 has tried to connect to the database service 'test.oracle.com' and failed with the error 12518.
Look for the lowest error in the stack. That is the error we have to concentrate on and try to resolve it. In the above example, the lower error is 'Linux Error: 32: Broken pipe'.
Step 3. Are service handlers in blocked state
Check if the handlers are in blocked state. Check the output of the 'lsnrctl services'. Examine the status information under the database service name. From the listener log you would know which database service was affected by the 12518 error, now with the output of the 'lsnrctl services' under that service name gather more information.
Instance "db10g", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:9 refused:0 state:ready
LOCAL SERVER
"D000" established:10 refused:0 current:0 max:972 state:ready
DISPATCHER <machine: oid.mohan.com, pid: 25908>
(ADDRESS=(PROTOCOL=tcp)(HOST=oid.mohan.com)(PORT=33487))
The highlighted state should be in 'ready' state for the connection to be successful. If the state is in 'blocked' then the connection are not possible. The state of a handler could be in blocked state in the following scenario:
i. The database parameter processes reached its value.
ii. The database is in the process of startup or shutting down.
In shared server mode, the number of dispatchers should be set according to the load that you expect. 'lsnrctl services' output shows the maximum number(max:997) of connections that the dispatcher would accept and the number connections refused (refused:0) by this dispatcher. If any connections refused by the dispatcher, then consider increasing the number of dispatchers.
If you are using PFILE edit init.ora and increase the dispatchers parameter. If you are using SPFILE you can dynamically increase the dispatchers parameter by the'alter system set' command.
Step 4. Is a local BEQ connection successful
Check if local BEQ connection to the database works fine. It also verifies if the database is up and in good condition to accept the connection. If the database is down or in a hung state then a connection request to the database by the listener will not be possible.
Connect to the database server via telnet or ssh and check if a local bequeath SQL*Plus connection works. In other words, issue:
sqlplus username/password [Enter]
This connection bypasses the listener and directly connects to the database via the BEQ (bequeath) protocol. If this fails, then the TNS-12518 listener error is simply a result of the database issue.
One such error is:
ORA-12560: TNS:protocol adapter error
A possible cause for this error on Microsoft Windows servers, is that the Windows Database Service has not yet been created (common when creating a "standby" instance).
Resolution for this would be to create the Windows Service first by using the "oradim" command (see the Database Admin guide for details on oradim and service creation).
Step 5. Has number of processes reached its limit?
If local BEQ is successful, check the below query
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_AL LIMIT_VALU
-------------------- ------------------- --------------- ---------- ----------
processes 249 250 250 250
sessions 54 82 280 280
- - - - - -
- - - - - -
Verify if the processes or sessions reached its limit value. If these database parameters reached its limit value, then consider increasing it accordingly.
In the above example, the processes parameter has been set to 250. It's MAX_UTILIZATION has reached the limit value of 250, so the processes parameter should be increased further to accomodate the number of incoming connections.
Edit the init.ora and set the processes parameter to a higher value. By default, if you just increase the processes it is enough, the sessions value would automatically be increased.
Check the alert.log for a corresponding error such as "ksvcreate" process failed, etc. Check timestamp against listener.log timestamps for causal relationship.
Step 6. Are OS kernel parameters configured for optimum?
Database is operated by a single user, normally it would be 'oracle' user. At the Operating System level, there is a limit for the number of process spawned by a user. And also there is a limit for the total number of process running on the entire OS.
The Oracle Database and the newly spawned processes would be owned by the 'oracle' user. And so make sure that these values are set accordingly.On Unix these values are configured through the configurable OS kernel parameters and is specific to Operating Systems. You will have check your corresponding OS documentation for your OS.
For example for HP-UX the configurable kernel parameters are,
maxuprc Maximum number of processes for each non-root user
nproc Maximum number of processes on the system
Step 7. Does alert log have any errors?
Look in alert log and look for any errors related to memory or process during the time the error TNS-12518 occurred in the listener log. If the alert log has any memory related errors, there there is a potential memory resource issue at the OS level.
OS memory issues can be addressed by the below:
i. Make sure that the OS has been configured with the enough Swap memory. In case of Windows it is called as Virtual Memory.
ii. Reduce the size of SGA, thus the newly spawned server process will have some more system memory available.
iii. Reduce the PGA size, so that the newly spawned server process would occupy less memory.
iv. If you are in DEDICATED mode, try switching to MTS mode.
To address memory issue for 32-bit Windows: Refer Note 371983.1
However, the errors in the alert logs are not being discussed in this article, they are out of the scope of this article.
Step 8. If using a statically defined SID_DESC in the listener.ora file for your sid, ensure that it is configured properly.
A common mistake is to include a (PROGRAM=EXTPROC) parameter:
(SID_LIST=
(SID_DESC =
(GLOBAL_DBNAME = ORCL.oracle.com)
(SID_NAME = ORCL)
(PROGRAM=extproc)
(ORACLE_HOME = D:\oracle\product\10.2.0\db_1)))
This misconfiguration occurs when the PLSExtproc SID_DESC is copied, pasted and edited in the listener.ora file. The inclusion of the PROGRAM line will cause an ORA-12518 to be returned to the client. Here's the corrected SID_DESC for our example SID:
(SID_LIST=
(SID_DESC =
(GLOBAL_DBNAME = ORCL.oracle.com)
(SID_NAME = ORCL)
(ORACLE_HOME = D:\oracle\product\10.2.0\db_1)))
Section II: Commonly Known Errors:
This section lists some of the known and reported errors. It is also recommended that you refer the Section I above for a generic troubleshooting approach to the error TNS-12518.
Below shows example error stack that can be found in the listener log. The last line in the error stack shows the actual operating system name. Depending on the OS, only the operating system name would be different. For example, if you encounter error 32: Broken pipe, according to the OS, the last line in the error stack would be different only by the OS name, as shown below.
Solaris Error: 32: Broken pipe
HPUX Error: 32: Broken pipe
Linux Error: 32: Broken pipe
_______________________________________________________________________________________________________________________________________
Error: 32: Broken pipe
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
IBM/AIX RISC System/6000 Error: 32: Broken pipe
Cause:
The error 32 indicates the communication has been broken while the listener is trying to hand off the client connection to the server process or dispatcher process.
Action:
1. One of reason would be processes parameter being low, and can be verified by the v$resource_limit view.
2. In Shared Server mode, check the 'lsnrctl services' output and see if the dispatcher has refused any connections, if so, then consider increasing the number of dispatchers.
3. Check the alert log for any possible errors.
4. Memory resource is also another cause for this issue. Check the swap, memory usage of the OS.
5. If RAC/SCAN or listener is running in separate home, check the following note:
Note: 1069517.1 ORA-12537 or TNS-12518 if Listener (including SCAN Listener) and Database are Owned by Different OS User
______________________________________________________________________________________________________________________________________
Error: 11: Resource temporarily unavailable
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12549: TNS:operating system resource quota exceeded
TNS-12560: TNS:protocol adapter error
TNS-00519: Operating system resource quota exceeded
IBM/AIX RISC System/6000 Error: 11: Resource temporarily unavailable
Cause:
As the error indicates operating system resource has exceeded.
Action:
1. Increase the appropriate OS kernel parameters for 'maximum number of processes allowed per user'.
For example for HP-UX the parameters are maxuprc and nproc.
______________________________________________________________________________________________________________________________________
Error: 12: Not enough space
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12549: TNS:operating system resource quota exceeded
TNS-12560: TNS:protocol adapter error
TNS-00519: Operating system resource quota exceeded
IBM/AIX RISC System/6000 Error: 12: Not enough space
Cause:
This is a memory related issue. The error indicates that there is not enough memory available to spawn and hand off the client connections.
Typical problems are:
- Out of system memory / swap
- Out of process slots in the process table
- Streams resources depleted
- Out of File Handles
- sga memory usage
Action:
1. Check in the alert log for any possible memory related error.
2. Increase swap/Virtual memory if possible the available memory.
3. SGA and PGA can be reduced to address the memory consumption.
4. MTS mode can be used to reduce the amount of process and memory consumption.
________________________________________________________________________________________________________________________________________
Error: Connection Pooling limit reached
Error stack in listener log:
TNS-12518 TNS:listener could not hand off client connection
TNS-12564 TNS:connection refused
TNS-12602 TNS: Connection Pooling limit reached
Action:
1. Try increasing initial number of dispatchers.
_______________________________________________________________________________________________________________________________________
Error: 24: Too many open files
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Linux Error: 24: Too many open files
or
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Solaris Error: 24: Too many open files
Error Description:
Out of file descriptors
Cause:
Can be caused by the following unpublished defect:
BUG 13078786 - LISTENER GOES DOWN SUDDENLY W/ LINUX ERROR: 32: BROKEN PIPE
Action:
See Note 1527483.1 11.2 : ORA-12518 Listener Hangs and Reports "Too Many Open Files"
Use prescribed workaround OR apply one-off patch to your environment if available.
_______________________________________________________________________________________________________________________________________
Error: 1: Not Owner
Error Description:
The listener doesn't own or have permissions to write a necessary file
Cause:
The listener doesn't have adequate permission on socket files
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
HPUX Error: 1: Not owner
Action: Clear /var/tmp/.oracle/ directory
IMPORTANT NOTE: In RAC environment, please use caution when removing existing socket files. See Note 2099377.1 How to remove Network socket files in a RAC Environment for Cluster/Resource startup issues
=========================================================================================================================
Section III: Errors Specific to Windows
It is also recommended that you refer the Section I above for a generic troubleshooting approach to the error TNS-12518.
This section briefly describes about the errors that are encountered on Windows Operating System. TNS-12518 most commonly occurs on 32-bit OS due to its memory constraint, however TNS-12518 can occur on 64-bit OS as well. See Note 873752.1 for more information on Windows memory addressing and the 3GB switch.
_______________________________________________________________________________________________________________________________________
Error: 2: No such file or directory
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:proto adapter error
TNS-00530: Proto adapter error
32-bit Windows Error: 2: No such file or directory
Error Description: ERROR_FILE_NOT_FOUND 2 The system cannot find the file specified.
Cause: This indicates the database service is not actually available
Action: 1.Verify if the intended database really up and accepting local BEQ connections.
_______________________________________________________________________________________________________________________________________
Error: 233: Unknown error
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
32-bit Windows Error: 233: Unknown error
Error Description:
ERROR_PIPE_NOT_CONNECTED
233
No process is on the other end of the pipe.
Cause:
The communication has been broken while the listener is trying to hand off the client connection to the server process or dispatcher process.
Action:
Refer Note 371983.1
_______________________________________________________________________________________________________________________________________
Error: 54: Unknown error
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12571: TNS:packet writer failure
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
32-bit Windows Error: 54: Unknown error
Error Description:
ERROR_NETWORK_BUSY
54
0x36
The network is busy.
Cause:
This indicates a bottleneck at the network layer(TCP/IP).
Action:
1.Try increasing dispatchers and shared servers.
Error: 10022: Unknown error
Error stack in listener log:
12518 TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error TNS-00534: Failed to grant connection ownership to child
64-bit Windows Error: 10022: Unknown error
Error Description:
Error: 10022: Invalid Argument
Cause:
An invalid argument was supplied.
Action:
Check the Control Panel Services Properties and confirm both listener and instance are started with same "Log on As" account.
Macromedia}{Oracle JDBC Driver][Oracle]Connection refused, (DESCRIPTION=(TMP=)(VSNNUM=186646784)(ERR=12518)(ERROR_STACK= (ERROR=(CODE=12518)(EMFI=4))(ERROR=(CODE=12560)(EMFI=4))(ERROR=(CODE=530)(EMFI=4))(ERROR=(BUF='64-bit Windows Error:203: Unknown error'))))
REFERENCES
NOTE:873752.1 - Windows Memory Configuration: 32-bit and 64-bitNOTE:1069517.1 - ORA-12537 / ORA-12547 or TNS-12518 if Listener (including SCAN Listener) and Database are Owned by Different OS User
NOTE:371983.1 - 10gR2 Dedicated Connections Intermittently Fail with TNS-12518
NOTE:1527483.1 - 11.2 : ORA-12518 Listener Hangs and Reports "Too Many Open Files"
1.6.1.2 ORA-12537 / ORA-12547 or TNS-12518 if Listener (including SCAN Listener) and Database are Owned by Different OS User (文件 ID 1069517.1)
In this Document
Symptoms |
Cause |
Solution |
References |
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.3 and laterOracle Database Configuration Assistant - Version 11.1.0.7 and later
Oracle Net Services - Version 11.2.0.2 and later
Information in this document applies to any platform.
SYMPTOMS
In environment where listener home (including SCAN listener which resides in GRID Infrastructure/ASM home) and database home are owned by different OS user, ORA-12537 could happen when connecting through listener, when creating database through DBCA, or when installing database software and creating a database in runInstaller. Job Role Separation is a typical example as SCAN and local grid home listener is owned differently than database.
- Error detail
12537, 00000, "TNS:connection closed"
// *Cause: "End of file" condition has been reached; partner has disconnected.
// *Action: None needed; this is an information message.
Before the error is reported, connection could hang for a while, and a core file may also be generated.
- Screen output:
SQL> conn system/xxx@OSES
ERROR:
ORA-12537: TNS:connection closed
OR
PRCR-1079 : Failed to start resource ora.db11.db
CRS-5011: Check of resource db11 failed: details at ..
CRS-2674: Start of ora.db11.db on racnode1 failed
ORA-12546: TNS:permission denied
- listener_scann.log or listener.log
TNS-12518: TNS:listener could not hand off client connection
TNS-12546: TNS:permission denied
TNS-12560: TNS:protocol adapter error
TNS-00516: Permission denied
Linux Error: 13: Permission denied
OR
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Linux Error: 32: Broken pipe
- runInstaller error if installing
INFO: Starting Output Reader Threads for process /ocw/grid/bin/kfod
INFO: Parsing KFOD-00300: OCI error [-1] [OCI error] [ORA-12547: TNS:lost contact
INFO: Parsing ] [12547]
INFO: Parsing
INFO: The process /ocw/grid/bin/kfod exited with code 1
..
SEVERE: [FATAL] [INS-30502] No ASM disk group found.
CAUSE: There were no disk groups managed by the ASM instance +ASM1.
CAUSE
1. In environments where the listener is not run in the same ORACLE_HOME where the database resides, the listener owner (including SCAN listener) may not be able to access the oracle binary in the database home. This is common in RAC or whenever a GRID_HOME and a database ORACLE_HOME are installed.
As listener owner:
ls: /home/oracle/app/oracle/product/11.2/db/bin/oracle: Permission denied
2. Oracle binary in database home has wrong permission:
-rwxr-x--x 1 oracle asmadmin 184286251 Aug 9 16:25 /home/oracle/app/oracle/product/11.2/db/bin/oracle
The permission "-rwxr-x--x" is wrong as it's missing suid bit, oracle binary should have permission of 6751:
Note: If Job Role Separation is in place, the group will be <asmadmin>, otherwise it will be <oinstall>
3. File System for database home does not support setuid/suid or has nosetuid/nosuid set:
/home/oracle on /dev/dsk/diskoracle read/write/nosuid..
4. RDBMS_HOME/lib has wrong ownership/permission:
As listener owner:
ls: /home/oracle/app/oracle/product/11.2/db/lib: Permission denied
5. Another cause may be permissions on the RDBMS Directory structure or Home directory which needs to be accessed by the CRS user.
Check that the RDBMS $ORACLE_HOME is set to 755.
This can be seen from an OS trace such as strace or truss when using it to trace the CRS user running the "oracle" executable which fails with the "Permission denied" error.
Also:
a) Log in as the "GRID" user on each node, and issue the following (on each directory under the RDBMS Home) :-
ls: cannot open directory /u01/app/oracle: Permission denied
***NOTE: the Oracle directory has 700 for the permissions, which should be changed to 755:
Current (incorrect) -->
drwx------. 8 oracle oinstall 4096 Oct 12 08:38 oracle
Should be (correct) -->
drwxr-xr-x. 8 oracle oinstall 4096 Oct 12 08:38 oracle
b) Likewise the /product directory has 700 perms, so change to 755 -->
ls: cannot open directory /u01/app/oracle/product: Permission denied
Current (incorrect) -->
drwx------. 3 oracle oinstall 4096 Oct 12 08:58 product
Should be (correct) -->
drwxr-xr-x. 3 oracle oinstall 4096 Oct 12 08:58 product
c) Now a connection should work:
SQL>
SOLUTION
Solution is to make sure file system for database home has setuid/suid set, database binary($RDBMS_HOME/bin/oracle) has correct ownership and permission, and listener owner is able to access database oracle binary (as listener owner, "ls -l $RDBMS_HOME/bin/oracle" will tell)
If its Job Role Separation environment, please refer to the following document:
Oracle? Grid Infrastructure
Installation Guide
11g Release 2 (11.2)
Oracle ASM Job Role Separation Option with SYSASM
REFERENCES
NOTE:422173.1 - Local SQL*Plus Connection and DBCA Fails With: ORA-12547: TNS:Lost ContactNOTE:975457.1 - How to Troubleshoot Connectivity Issue with 11gR2 SCAN Name
NOTE:970619.1 - ORA-12545 or ORA-12537 While Connecting to RAC through SCAN name
1.6.1.3 TNS Listener Crash with Core Dump (文件 ID 549932.1)
In this Document
Symptoms |
Cause |
Solution |
References |
APPLIES TO:
Oracle Net Services - Version 10.2.0.1 to 10.2.0.4 [Release 10.2]Generic UNIX
***Checked for relevance on 6-JUL-2016***
SYMPTOMS
- There may be heavy load on the CPU shooting up to 100%.
- The number of sessions in the database is well below the upper or maximum limit defined in the parameter file.
- The listener crashes suddenly during this heavy CPU load generating the core.
- (Optional) Listener.Ora has SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF.
Listener Log:
-------------
.....
19-NOV-2007 13:40:49 * (CONNECT_DATA=(SID=ORAC)(CID=(PROGRAM=C:\pegasos\te\usys\bin\uniface.exe)(HOST=TERVI-NB179)(USER=kjokioja))) * (ADDRESS= (PROTOCOL=tcp)(HOST=10.12.152.5)(PORT=1670)) * establish * ORAC * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12571: TNS:packet writer failure
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Linux Error: 104: Connection reset by peer
19-NOV-2007 13:40:49 * (CONNECT_DATA=(SID=ORAC)(CID=(PROGRAM=C:\pegaos\te\usys\bin\uniface.exe)(HOST=TERVI-0184A)(USER=paitasal))) * (ADDRESS=(PROTO COL=tcp)(HOST=10.12.176.136)(PORT=1574)) * establish * ORAC * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Linux Error: 32: Broken pipe
.........
........
The Operating system log (/var/log/messages) may show the following :
tnslsnr[6469]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9420 error 4
tnslsnr[7375]: segfault at 0000000000000018 rip 0000003eab668bb3 rsp 0000007fbfff9c70 error 4
Gdb on the core file generated shows the following stack :
Program terminated with signal 11, Segmentation fault.
Reading symbols from /opt/oracle/ora102/lib/libclntsh.so.10.1...(no debugging symbols found)...done.
........
.........
(gdb) bt
#0 0x00000032b7468bb3 in _int_free () from /lib64/tls/libc.so.6
#1 0x00000032b74691f6 in free () from /lib64/tls/libc.so.6
#2 0x000000000040f01f in nsglhe ()
#3 0x00000000004114e9 in nsglma ()
#4 0x00000000004061cb in main ()
(gdb) where
#0 0x00000032b7468bb3 in _int_free () from /lib64/tls/libc.so.6
#1 0x00000032b74691f6 in free () from /lib64/tls/libc.so.6
#2 0x000000000040f01f in nsglhe ()
#3 0x00000000004114e9 in nsglma ()
#4 0x00000000004061cb in main ()
The core indicates that the program terminated with signal 11, Segmentation fault .
SIGSEGV is reported for improper memory handling .The default action for a program upon receiving
SIGSEGV is abnormal termination. This action will end the process.
The virtual memory stats (vmstat) output may show huge paging/swapping activity .
For Example:
-------------
$vmstat 5 7 The si and so column under the swap section suggests the amount of memory swapped in from disk (/s)and amount of memory swapped to disk (/s) respectively.
r b w swpd free buff cache si so bi bo in cs us sy id
2 0 0 27124 3940 1160 74528 0 1304 956 5360 272 191 51 21 98
3 0 0 27080 3756 1180 72580 0 260 2552 388 218 429 90 10 89
14 2 1 26808 5096 1188 69868 84 1108 2016 9064 490 567 59 22 96
4 0 0 25548 5912 1192 73032 0 0 436 0 478 736 50 50 0
5 0 0 25548 3940 1192 73548 0 0 1560 0 301 385 93 7 0
1 1 1 25548 3336 1192 71800 8 176 2848 432 258 147 91 9 88
1 0 0 25544 4124 1200 70480 116 60 836 60 171 200 97 3 92
CAUSE
Extensive paging/swapping activity is a clear indication that the system is running out of the physical memory.
SOLUTION
Solution :
1. Increase the physical memory of the system.
OR
2. Apply the for unpublished Bug 6139856 if available for your platform.
OR
3. Configure Hugepages on the OS. Ref : Note 361323.1
A Note on HugePages:
Hugepages is a mechanism that allows the Linux kernel to utilise the multiple page size capabilities of modern hardware architectures. Its important to know that utilising a huge amount of physical memory with the default page size consumes the TLB and adds processing overhead. The Linux kernel is able to set aside a portion of physical memory to be able be addressed using a larger page size. Since the page size is higher, there will be less overhead managing the pages with the TLB. In the Linux 2.6 series of kernels, hugepages is enabled using the CONFIG_HUGETLB_PAGE feature when the kernel is built. Systems with large amount of memory can be configured to utilise the memory more efficiently by setting aside a portion dedicated for hugepages. The actual size of the page is dependent on the system architecture. A typical x86 system will have a Huge Page Size of 2048 kBytes. The huge page size may be found by looking at the /proc/meminfo :
# cat /proc/meminfo |grep Hugepagesize Hugepagesize: 2048 kB
For more on configuring Hugepages please refer Note 361323.1
REFERENCES
- LISTENER DIED BY SEGFAULT AFTER TNS ERRORNOTE:361323.1 - HugePages on Linux: What It Is... and What It Is Not...
1.6.2 部落格
瀟湘隱者:http://www.cnblogs.com/kerrycode/p/4164838.html
最近一週,有一臺ORACLE資料庫伺服器的監聽服務在凌晨2點過幾分的時間點突然崩潰,以前從沒有出現過此類情況,但是最近一週出現了兩次這種情況,檢查時發現瞭如下一些資訊:
檢查監聽服務的日誌資訊(/u01/app/oracle/product/10.2.0/db_1/network/log/listener.log),發現如下資訊
08-DEC-2014 02:07:41 * (CONNECT_DATA=(SERVICE_NAME=epps)(CID=(PROGRAM=D:\Tasks\FGIS2MES\CEK\gmt_auto.exe)(HOST=CEGWEB1)(USER=cegadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.21.34)(PORT=4419)) * establish * epps * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Linux Error: 32: Broken pipe
第二次錯誤資訊截圖
在metalink上查到ID 550859.1,出現Linux Error: 32: Broken pipe這個錯誤的原因大概有下面一些因素導致:
Cause:
The error 32 indicates the communication has been broken while the listener is trying to hand off the client connection to the server process or dispatcher process.
Action:
1. One of reason would be processes parameter being low, and can be verified by the v$resource_limit view.
2. In Shared Server mode, check the 'lsnrctl services' output and see if the dispatcher has refused any connections, if so, then consider increasing the number of dispatchers.
3. Check the alert log for any possible errors.
4. Memory resource is also another cause for this issue. Check the swap, memory usage of the OS.
1:錯誤原因之一:processes引數的值偏低
關於v$resource_limit,如下所示
Column |
Datatype |
Description |
|
RESOURCE_NAME |
VARCHAR2(30) |
資源名 |
Name of the resource |
CURRENT_UTILIZATION |
NUMBER |
資源的當前使用量 |
Number of (resources, locks, or processes) currently being used |
MAX_UTILIZATION |
NUMBER |
例項啟動以來,資源使用達到的最大值 |
Maximum consumption of this resource since the last instance start-up |
INITIAL_ALLOCATION |
VARCHAR2(10) |
初始化值,一般等於limit_value |
Initial allocation. This will be equal to the value specified for the resource in the initialization parameter file (UNLIMITED for infinite allocation). |
LIMIT_VALUE |
VARCHAR2(10) |
系統設定的資源值 |
Unlimited for resources and locks. This can be greater than the initial allocation value (UNLIMITED for infinite limit). |
如下所示,我們可以發現自從例項啟動以來,processes的最大值為152,而LIMIT_VALUE的值為170,所以可以排除這個原因導致上述錯誤。
SQL> select * from v$resource_limit;
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
--------------------- ------------------- --------------- ------------------ ------------
processes 113 152 170 170
sessions 115 154 192 192
enqueue_locks 57 117 2480 2480
enqueue_resources 40 86 1064 UNLIMITED
ges_procs 0 0 0 0
ges_ress 0 0 0 UNLIMITED
ges_locks 0 0 0 UNLIMITED
ges_cache_ress 0 0 0 UNLIMITED
ges_reg_msgs 0 0 0 UNLIMITED
ges_big_msgs 0 0 0 UNLIMITED
ges_rsv_msgs 0 0 0 0
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
--------------------- ------------------- --------------- ------------------ ------------
gcs_resources 0 0 0 0
gcs_shadows 0 0 0 0
dml_locks 0 76 844 UNLIMITED
temporary_table_locks 0 3 UNLIMITED UNLIMITED
transactions 2 12 211 UNLIMITED
branches 0 1 211 UNLIMITED
cmtcallbk 0 1 211 UNLIMITED
sort_segment_locks 39 131 UNLIMITED UNLIMITED
max_rollback_segments 11 11 211 65535
max_shared_servers 1 1 UNLIMITED UNLIMITED
parallel_max_servers 0 0 0 3600
22 rows selected.
SQL>
2: 如下所示,shared_servers的值大於1,伺服器是共享模式
客戶端連線資料庫都是專用伺服器模式dedicated server connection,如下所示
lsnrctl services 檢視時並沒有發現dispatcher has refused any connections,所以也可以排除dispatchers數量偏少的原因。這個如下截圖所示
3: 關於記憶體資源不足會導致該錯誤,如下所示,伺服器RAM 16G, 實體記憶體基本使用完,但是Swap基本上沒有使用多少,有點嫌疑,但是還是沒有明顯的證據。
檢視系統日誌messages,如下所示,第一次在02:07有一條錯誤記錄,檢視了一些資料後,判定應該是由記憶體訪問越界造成的。這個倒是一個支援記憶體不足導致監聽服務錯誤的有力證據。
[root@ceglnx01 log]# more /var/log/messages
Dec 7 04:02:13 ceglnx01 syslogd 1.4.1: restart.
Dec 8 02:07:41 ceglnx01 kernel: tnslsnr[8040]: segfault at 18 ip 0000003aee8705d5 sp 00007fff010a07f0 error 4 in libc-2.5.so[3aee80
0000+14e000]
Dec 8 07:53:22 ceglnx01 avahi-daemon[3706]: Invalid query packet.
Dec 8 07:54:02 ceglnx01 last message repeated 7 times
Dec 8 08:12:18 ceglnx01 avahi-daemon[3706]: Invalid query packet.
Dec 8 08:20:16 ceglnx01 last message repeated 9 times
第二次錯誤資訊,也是發生在凌晨2:07,具體錯誤資訊如下所示:
Dec 9 02:10:51 ceglnx01 kernel: INFO: task gdb:17053 blocked for more than 120 seconds.
Dec 9 02:10:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 9 02:10:51 ceglnx01 kernel: gdb D ffff880427d52840 0 17053 17019 0x00000080
Dec 9 02:10:51 ceglnx01 kernel: ffff8800bbb1dd50 0000000000000086 ffff8800bbb1dd20 0000000000000202
Dec 9 02:10:51 ceglnx01 kernel: ffff880003f78800 ffff88008d93c6c0 ffff880003f78bd0 ffffffff81231b4a
Dec 9 02:10:51 ceglnx01 kernel: 0000000000004800 0000000000000100 ffff8800bbb1de00 7fffffffffffffff
Dec 9 02:10:51 ceglnx01 kernel: Call Trace:
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81231b4a>] ? cpumask_next_and+0x20/0x32
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff814527e9>] schedule_timeout+0x36/0xe7
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff8104541f>] ? __enqueue_entity+0x79/0x7b
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81045684>] ? enqueue_entity+0x263/0x270
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff8104367e>] ? need_resched+0x23/0x2d
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff8145265b>] wait_for_common+0xb7/0x12c
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff8104cf2f>] ? default_wake_function+0x0/0x19
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff810448c9>] ? task_rq_unlock+0x11/0x13
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81452773>] wait_for_completion+0x1d/0x1f
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81056ec5>] do_fork+0x2d0/0x339
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81456080>] ? do_page_fault+0x28a/0x299
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff8101923b>] sys_vfork+0x25/0x27
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81012113>] stub_vfork+0x13/0x20
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81011db2>] ? system_call_fastpath+0x16/0x1b
Dec 9 02:10:51 ceglnx01 kernel: INFO: task gdb:17069 blocked for more than 120 seconds.
Dec 9 02:10:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 9 02:10:51 ceglnx01 kernel: gdb D 0000000000000000 0 17069 17022 0x00000080
Dec 9 02:10:51 ceglnx01 kernel: ffff88008d93fd50 0000000000000082 0000000000000000 0000000000000202
Dec 9 02:10:51 ceglnx01 kernel: ffff88008d93c6c0 ffffffff81aa84c0 ffff88008d93ca90 000000054227a93b
Dec 9 02:10:51 ceglnx01 kernel: 0000000000004c00 0000000000000000 0000000000000000 7fffffffffffffff
Dec 9 02:10:51 ceglnx01 kernel: Call Trace:
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff814527e9>] schedule_timeout+0x36/0xe7
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff8104541f>] ? __enqueue_entity+0x79/0x7b
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81045684>] ? enqueue_entity+0x263/0x270
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff8104367e>] ? need_resched+0x23/0x2d
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff8145265b>] wait_for_common+0xb7/0x12c
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff8104cf2f>] ? default_wake_function+0x0/0x19
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff810448c9>] ? task_rq_unlock+0x11/0x13
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81452773>] wait_for_completion+0x1d/0x1f
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81056ec5>] do_fork+0x2d0/0x339
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81456080>] ? do_page_fault+0x28a/0x299
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff8101923b>] sys_vfork+0x25/0x27
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81012113>] stub_vfork+0x13/0x20
Dec 9 02:10:51 ceglnx01 kernel: [<ffffffff81011db2>] ? system_call_fastpath+0x16/0x1b
Dec 9 02:12:51 ceglnx01 kernel: INFO: task gdb:17053 blocked for more than 120 seconds.
Dec 9 02:12:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 9 02:12:51 ceglnx01 kernel: gdb D ffff880427d52840 0 17053 17019 0x00000080
Dec 9 02:12:51 ceglnx01 kernel: ffff8800bbb1dd50 0000000000000086 ffff8800bbb1dd20 0000000000000202
Dec 9 02:12:51 ceglnx01 kernel: ffff880003f78800 ffff88008d93c6c0 ffff880003f78bd0 ffffffff81231b4a
Dec 9 02:12:51 ceglnx01 kernel: 0000000000004800 0000000000000100 ffff8800bbb1de00 7fffffffffffffff
Dec 9 02:12:51 ceglnx01 kernel: Call Trace:
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81231b4a>] ? cpumask_next_and+0x20/0x32
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff814527e9>] schedule_timeout+0x36/0xe7
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff8104541f>] ? __enqueue_entity+0x79/0x7b
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81045684>] ? enqueue_entity+0x263/0x270
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff8104367e>] ? need_resched+0x23/0x2d
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff8145265b>] wait_for_common+0xb7/0x12c
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff8104cf2f>] ? default_wake_function+0x0/0x19
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff810448c9>] ? task_rq_unlock+0x11/0x13
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81452773>] wait_for_completion+0x1d/0x1f
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81056ec5>] do_fork+0x2d0/0x339
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81456080>] ? do_page_fault+0x28a/0x299
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff8101923b>] sys_vfork+0x25/0x27
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81012113>] stub_vfork+0x13/0x20
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81011db2>] ? system_call_fastpath+0x16/0x1b
Dec 9 02:12:51 ceglnx01 kernel: INFO: task gdb:17069 blocked for more than 120 seconds.
Dec 9 02:12:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 9 02:12:51 ceglnx01 kernel: gdb D 0000000000000000 0 17069 17022 0x00000080
Dec 9 02:12:51 ceglnx01 kernel: ffff88008d93fd50 0000000000000082 0000000000000000 0000000000000202
Dec 9 02:12:51 ceglnx01 kernel: ffff88008d93c6c0 ffffffff81aa84c0 ffff88008d93ca90 000000054227a93b
Dec 9 02:12:51 ceglnx01 kernel: 0000000000004c00 0000000000000000 0000000000000000 7fffffffffffffff
Dec 9 02:12:51 ceglnx01 kernel: Call Trace:
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff814527e9>] schedule_timeout+0x36/0xe7
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff8104541f>] ? __enqueue_entity+0x79/0x7b
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81045684>] ? enqueue_entity+0x263/0x270
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff8104367e>] ? need_resched+0x23/0x2d
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff8145265b>] wait_for_common+0xb7/0x12c
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff8104cf2f>] ? default_wake_function+0x0/0x19
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff810448c9>] ? task_rq_unlock+0x11/0x13
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81452773>] wait_for_completion+0x1d/0x1f
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81056ec5>] do_fork+0x2d0/0x339
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81456080>] ? do_page_fault+0x28a/0x299
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff8101923b>] sys_vfork+0x25/0x27
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81012113>] stub_vfork+0x13/0x20
Dec 9 02:12:51 ceglnx01 kernel: [<ffffffff81011db2>] ? system_call_fastpath+0x16/0x1b
Dec 9 02:14:51 ceglnx01 kernel: INFO: task gdb:17053 blocked for more than 120 seconds.
Dec 9 02:14:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 9 02:14:51 ceglnx01 kernel: gdb D ffff880427d52840 0 17053 17019 0x00000080
Dec 9 02:14:51 ceglnx01 kernel: ffff8800bbb1dd50 0000000000000086 ffff8800bbb1dd20 0000000000000202
Dec 9 02:14:51 ceglnx01 kernel: ffff880003f78800 ffff88008d93c6c0 ffff880003f78bd0 ffffffff81231b4a
Dec 9 02:14:51 ceglnx01 kernel: 0000000000004800 0000000000000100 ffff8800bbb1de00 7fffffffffffffff
Dec 9 02:14:51 ceglnx01 kernel: Call Trace:
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81231b4a>] ? cpumask_next_and+0x20/0x32
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff814527e9>] schedule_timeout+0x36/0xe7
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff8104541f>] ? __enqueue_entity+0x79/0x7b
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81045684>] ? enqueue_entity+0x263/0x270
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff8104367e>] ? need_resched+0x23/0x2d
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff8145265b>] wait_for_common+0xb7/0x12c
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff8104cf2f>] ? default_wake_function+0x0/0x19
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff810448c9>] ? task_rq_unlock+0x11/0x13
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81452773>] wait_for_completion+0x1d/0x1f
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81056ec5>] do_fork+0x2d0/0x339
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81456080>] ? do_page_fault+0x28a/0x299
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff8101923b>] sys_vfork+0x25/0x27
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81012113>] stub_vfork+0x13/0x20
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81011db2>] ? system_call_fastpath+0x16/0x1b
Dec 9 02:14:51 ceglnx01 kernel: INFO: task gdb:17069 blocked for more than 120 seconds.
Dec 9 02:14:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 9 02:14:51 ceglnx01 kernel: gdb D 0000000000000000 0 17069 17022 0x00000080
Dec 9 02:14:51 ceglnx01 kernel: ffff88008d93fd50 0000000000000082 0000000000000000 0000000000000202
Dec 9 02:14:51 ceglnx01 kernel: ffff88008d93c6c0 ffffffff81aa84c0 ffff88008d93ca90 000000054227a93b
Dec 9 02:14:51 ceglnx01 kernel: 0000000000004c00 0000000000000000 0000000000000000 7fffffffffffffff
Dec 9 02:14:51 ceglnx01 kernel: Call Trace:
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff814527e9>] schedule_timeout+0x36/0xe7
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff8104541f>] ? __enqueue_entity+0x79/0x7b
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81045684>] ? enqueue_entity+0x263/0x270
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff8104367e>] ? need_resched+0x23/0x2d
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff8145265b>] wait_for_common+0xb7/0x12c
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff8104cf2f>] ? default_wake_function+0x0/0x19
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff810448c9>] ? task_rq_unlock+0x11/0x13
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81452773>] wait_for_completion+0x1d/0x1f
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81056ec5>] do_fork+0x2d0/0x339
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81456080>] ? do_page_fault+0x28a/0x299
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff8101923b>] sys_vfork+0x25/0x27
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81012113>] stub_vfork+0x13/0x20
Dec 9 02:14:51 ceglnx01 kernel: [<ffffffff81011db2>] ? system_call_fastpath+0x16/0x1b
Dec 9 02:16:51 ceglnx01 kernel: INFO: task gdb:17053 blocked for more than 120 seconds.
Dec 9 02:16:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 9 02:16:51 ceglnx01 kernel: gdb D ffff880427d52840 0 17053 17019 0x00000080
Dec 9 02:16:51 ceglnx01 kernel: ffff8800bbb1dd50 0000000000000086 ffff8800bbb1dd20 0000000000000202
Dec 9 02:16:51 ceglnx01 kernel: ffff880003f78800 ffff88008d93c6c0 ffff880003f78bd0 ffffffff81231b4a
Dec 9 02:16:51 ceglnx01 kernel: 0000000000004800 0000000000000100 ffff8800bbb1de00 7fffffffffffffff
Dec 9 02:16:51 ceglnx01 kernel: Call Trace:
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff81231b4a>] ? cpumask_next_and+0x20/0x32
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff814527e9>] schedule_timeout+0x36/0xe7
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff8104541f>] ? __enqueue_entity+0x79/0x7b
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff81045684>] ? enqueue_entity+0x263/0x270
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff8104367e>] ? need_resched+0x23/0x2d
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff8145265b>] wait_for_common+0xb7/0x12c
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff8104cf2f>] ? default_wake_function+0x0/0x19
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff810448c9>] ? task_rq_unlock+0x11/0x13
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff81452773>] wait_for_completion+0x1d/0x1f
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff81056ec5>] do_fork+0x2d0/0x339
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff81456080>] ? do_page_fault+0x28a/0x299
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff8101923b>] sys_vfork+0x25/0x27
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff81012113>] stub_vfork+0x13/0x20
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff81011db2>] ? system_call_fastpath+0x16/0x1b
Dec 9 02:16:51 ceglnx01 kernel: INFO: task gdb:17069 blocked for more than 120 seconds.
Dec 9 02:16:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 9 02:16:51 ceglnx01 kernel: gdb D 0000000000000000 0 17069 17022 0x00000080
Dec 9 02:16:51 ceglnx01 kernel: ffff88008d93fd50 0000000000000082 0000000000000000 0000000000000202
Dec 9 02:16:51 ceglnx01 kernel: ffff88008d93c6c0 ffffffff81aa84c0 ffff88008d93ca90 000000054227a93b
Dec 9 02:16:51 ceglnx01 kernel: 0000000000004c00 0000000000000000 0000000000000000 7fffffffffffffff
Dec 9 02:16:51 ceglnx01 kernel: Call Trace:
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff814527e9>] schedule_timeout+0x36/0xe7
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff8104541f>] ? __enqueue_entity+0x79/0x7b
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff81045684>] ? enqueue_entity+0x263/0x270
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff8104367e>] ? need_resched+0x23/0x2d
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff8145265b>] wait_for_common+0xb7/0x12c
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff8104cf2f>] ? default_wake_function+0x0/0x19
Dec 9 02:16:51 ceglnx01 kernel: [<ffffffff810448c9>] ? task_rq_unlock+0x11/0x13
然後在metalink文件 ID 549932.1中發現如下一些資訊
APPLIES TO:
Oracle Net Services - Version 10.2.0.1 to 11.1.0.7 [Release 10.2 to 11.1]
Generic UNIX
***Checked for relevance on 22-MAR-2013***
SYMPTOMS
· There may be heavy load on the CPU shooting up to 100%.
· The number of sessions in the database is well below the upper or maximum limit defined in the parameter file.
· The listener crashes suddenly during this heavy CPU load generating the core.
· (Optional) Listener.Ora has SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF.
Listener Log:
-------------
.....
.....
19-NOV-2007 13:40:49 * (CONNECT_DATA=(SID=ORAC)(CID=(PROGRAM=C:\pegasos\te\usys\bin\uniface.exe)(HOST=TERVI-NB179)(USER=kjokioja))) * (ADDRESS= (PROTOCOL=tcp)(HOST=10.12.152.5)(PORT=1670)) * establish * ORAC * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12571: TNS:packet writer failure
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Linux Error: 104: Connection reset by peer
19-NOV-2007 13:40:49 * (CONNECT_DATA=(SID=ORAC)(CID=(PROGRAM=C:\pegaos\te\usys\bin\uniface.exe)(HOST=TERVI-0184A)(USER=paitasal))) * (ADDRESS=(PROTO COL=tcp)(HOST=10.12.176.136)(PORT=1574)) * establish * ORAC * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Linux Error: 32: Broken pipe
.........
........
The Operating system log (/var/log/messages) may show the following :
tnslsnr[5841]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9230 error 4
tnslsnr[6469]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9420 error 4
tnslsnr[7375]: segfault at 0000000000000018 rip 0000003eab668bb3 rsp 0000007fbfff9c70 error 4
Gdb on the core file generated shows the following stack :
#gdb /home/oracle/db_1/bi/tnslsnr core.7375 Core was generated by `/opt/oracle/ora102/bin/tnslsnr LISTENER -inherit'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /opt/oracle/ora102/lib/libclntsh.so.10.1...(no debugging symbols found)...done.
........
.........
(gdb) bt
#0 0x00000032b7468bb3 in _int_free () from /lib64/tls/libc.so.6
#1 0x00000032b74691f6 in free () from /lib64/tls/libc.so.6
#2 0x000000000040f01f in nsglhe ()
#3 0x00000000004114e9 in nsglma ()
#4 0x00000000004061cb in main ()
(gdb) where
#0 0x00000032b7468bb3 in _int_free () from /lib64/tls/libc.so.6
#1 0x00000032b74691f6 in free () from /lib64/tls/libc.so.6
#2 0x000000000040f01f in nsglhe ()
#3 0x00000000004114e9 in nsglma ()
#4 0x00000000004061cb in main ()
The core indicates that the program terminated with signal 11, Segmentation fault .
SIGSEGV is reported for improper memory handling .The default action for a program upon receiving
SIGSEGV is abnormal termination. This action will end the process.
The virtual memory stats (vmstat) output may show huge paging/swapping activity .
For Example:
-------------
$vmstat 5 7 The si and so column under the swap section suggests the amount of memory swapped in from disk (/s)and amount of memory swapped to disk (/s) respectively.
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 0 0 27124 3940 1160 74528 0 1304 956 5360 272 191 51 21 98
3 0 0 27080 3756 1180 72580 0 260 2552 388 218 429 90 10 89
14 2 1 26808 5096 1188 69868 84 1108 2016 9064 490 567 59 22 96
4 0 0 25548 5912 1192 73032 0 0 436 0 478 736 50 50 0
5 0 0 25548 3940 1192 73548 0 0 1560 0 301 385 93 7 0
1 1 1 25548 3336 1192 71800 8 176 2848 432 258 147 91 9 88
1 0 0 25544 4124 1200 70480 116 60 836 60 171 200 97 3 92
Note: You may also use the top command to check the system memory usage.
CAUSE
Extensive paging/swapping activity is a clear indication that the system is running out of the physical memory.
SOLUTION
Solution :
1. Increase the physical memory of the system.
OR
2. Apply the Patch 6139856 for unpublished Bug 6139856 if available for your platform.
OR
3. Configure Hugepages on the OS. Ref : Note 361323.1
A Note on HugePages:
Hugepages is a mechanism that allows the Linux kernel to utilise the multiple page size capabilities of modern hardware architectures. Its important to know that utilising a huge amount of physical memory with the default page size consumes the TLB and adds processing overhead. The Linux kernel is able to set aside a portion of physical memory to be able be addressed using a larger page size. Since the page size is higher, there will be less overhead managing the pages with the TLB. In the Linux 2.6 series of kernels, hugepages is enabled using the CONFIG_HUGETLB_PAGE feature when the kernel is built. Systems with large amount of memory can be configured to utilise the memory more efficiently by setting aside a portion dedicated for hugepages. The actual size of the page is dependent on the system architecture. A typical x86 system will have a Huge Page Size of 2048 kBytes. The huge page size may be found by looking at the /proc/meminfo :
# cat /proc/meminfo |grep Hugepagesize Hugepagesize: 2048 kB
For more on configuring Hugepages please refer Note 361323.1
REFERENCES
BUG:6752308 - LISTENER DIED BY SEGFAULT AFTER TNS ERROR
NOTE:361323.1 - HugePages on Linux: What It Is... and What It Is Not...
如上官方文件所示,有三個解決方案:
1: 增加系統實體記憶體
2: 打補丁Patch 6139856
3: 啟用HugePage設定
鑑於當時作業系統有100多天沒有重啟過了,於是在2014-12-12 23:00重啟了一下Linux伺服器,到目前為止已經執行了3天,暫時沒有出現這個錯誤。因為有可能一些記憶體洩露也會導致記憶體資源不足情況,例如,TNSListener Leaking Memory Using Dedicated Server (文件 ID 785742.1)。所以暫時沒有實施上面方案2、3、想執行一段時間,驗證一下自己的想法,如果還是出現這個錯誤,則嘗試方案2、3、
[root@ceglnx01 ~]# who -b
system boot 2014-12-12 23:11
從另外一方面分析,在凌晨2點過幾分出現這個錯誤,是因為,有兩個比較大的作業在此時執行。消耗的伺服器資源比較大,所以也從側面驗證了記憶體資源不足。
另外,在網上看到listener.log日誌檔案大於2G時,也可能出現這個問題。所以有時候也需要排查“監聽日誌檔案過大可能也是導致監聽器自己服務宕掉”。
http://blog.csdn.net/killvoon/article/details/49420485
oracle 10.2.0.4
一生產系統監聽異常停止了,listener.log中報出如下錯誤:
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Linux Error: 32: Broken pipe
並且作業系統日誌/var/log/messages中丟擲類似如下錯誤:
tnslsnr[5841]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9230 error 4
在metalink上有這篇文件:549932.1
版本:
Oracle Net Services - Version 10.2.0.1 to 11.1.0.7 [Release 10.2 to 11.1]
Generic UNIX
***Checked for relevance on 22-MAR-2013***
問題現象:
- There may be heavy load on the CPU shooting up to 100%.
- The number of sessions in the database is well below the upper or maximum limit defined in the parameter file.
- The listener crashes suddenly during this heavy CPU load generating the core.
- (Optional) Listener.Ora has SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF.
Extensive paging/swapping activity is a clear indication that the system is running out of the physical memory.
解決方法:
1. Increase the physical memory of the system.
OR
2. Apply the for unpublished Bug 6139856 if available for your platform.
OR
3. Configure Hugepages on the OS. Ref : Note 361323.1
--------------------------------------------------------------------------------------------------------------算是oracle bug問題了,當作業系統實體記憶體不足,swap/page 耗盡,將會導致listener異常崩潰。
而且從作業系統日誌中,可以看到linux自己kill 程式的資訊(由於事後總結,且資訊在內網內,許可權有限,貼不出日誌內容)。
所以我的理解就是,當作業系統實體記憶體居高不下,作業系統會自己殺掉一些他認為的空閒程式之類,而不巧,殺掉的恰好是oracle的監聽程式,
從而導致監聽異常崩潰。
之所以說恰好是監聽程式,是因為在/var/log/messages中,看到之前也有殺掉oracle程式的資訊,但當時監聽並未停掉,所以懷疑當時殺掉的並不是oracle監聽程式,可能是其他非本地程式。
我的解決參考:http://www.itpub.net/thread-1870217-1-1.html
論壇各位大拿,幫幫忙吧。
問題描述:
伺服器斷電重啟後客戶端無法登陸。資料庫本地 sqlplus / as sysdba可以登入。
C:\Users\duchanglei>sqlplus
SQL*Plus: Release 11.2.0.1.0 Production on Tue Jun 3 16:42:05 2014
Copyright (c) 1982, 2010, Oracle. All rights reserved.
ERROR:
ORA-12537: TNS:connection closed
環境:資料庫
[oracle@jkxtrac1 bin]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Tue Jun 3 16:46:12 2014
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
環境:作業系統
[root@jkxtrac1 log]# lsb_release -a
LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 5.6 (Tikanga)
Release: 5.6
Codename: Tikanga
監聽日誌:
<msg time='2014-06-03T16:48:57.448+08:00' org_id='oracle' comp_id='tnslsnr'
type='UNKNOWN' level='16' host_id='jkxtrac1'
host_addr='10.199.102.21'>
<txt>TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Linux Error: 32: Broken pipe
</txt>
</msg>
alert日誌
沒有相關資訊,沒有報任何錯誤。
作業系統日誌:
[root@jkxtrac1 log]# tail -n50 messages
Jun 3 14:27:42 jkxtrac1 avahi-daemon[8081]: Registering new address record for 10.199.102.23 on eth0.
Jun 3 14:27:42 jkxtrac1 avahi-daemon[8081]: Withdrawing address record for 10.199.102.23 on eth0.
Jun 3 14:27:42 jkxtrac1 avahi-daemon[8081]: Registering new address record for 10.199.102.23 on eth0.
Jun 3 14:27:42 jkxtrac1 avahi-daemon[8081]: Withdrawing address record for 10.199.102.23 on eth0.
Jun 3 14:27:42 jkxtrac1 avahi-daemon[8081]: Registering new address record for 10.199.102.23 on eth0.
Jun 3 14:28:56 jkxtrac1 gconfd (root-10454): 正在啟動(版本 2.14.0),pid 10454 使用者“root”
Jun 3 14:28:56 jkxtrac1 gconfd (root-10454): 地址“xml:readonly:/etc/gconf/gconf.xml.mandatory”解析為位於 0 的只讀配置源
Jun 3 14:28:56 jkxtrac1 gconfd (root-10454): 地址“xml:readwrite:/root/.gconf”解析為位於 1 的可寫入配置源
Jun 3 14:28:56 jkxtrac1 gconfd (root-10454): 地址“xml:readonly:/etc/gconf/gconf.xml.defaults”解析為位於 2 的只讀配置源
Jun 3 14:28:58 jkxtrac1 gconfd (root-10454): 地址“xml:readwrite:/root/.gconf”解析為位於 0 的可寫入配置源
Jun 3 14:28:58 jkxtrac1 hcid[7308]: Default passkey agent (:1.6, /org/bluez/applet) registered
Jun 3 14:28:58 jkxtrac1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
Jun 3 14:28:58 jkxtrac1 last message repeated 2 times
Jun 3 14:28:59 jkxtrac1 nm-system-settings: Loaded plugin ifcfg-rh: (c) 2007 - 2008 Red Hat, Inc. To report bugs please use the NetworkManager mailing list.
Jun 3 14:28:59 jkxtrac1 nm-system-settings: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-lo ...
Jun 3 14:28:59 jkxtrac1 nm-system-settings: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth1 ...
Jun 3 14:28:59 jkxtrac1 nm-system-settings: ifcfg-rh: read connection 'System eth1'
Jun 3 14:28:59 jkxtrac1 nm-system-settings: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-usb0 ...
Jun 3 14:28:59 jkxtrac1 nm-system-settings: ifcfg-rh: read connection 'System usb0'
Jun 3 14:28:59 jkxtrac1 nm-system-settings: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth0 ...
Jun 3 14:28:59 jkxtrac1 nm-system-settings: ifcfg-rh: read connection 'System eth0'
Jun 3 14:29:01 jkxtrac1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
Jun 3 14:39:53 jkxtrac1 kernel: qla2xxx 0000:8b:00.1: scsi(4:1:3): Abort command issued -- 1 1fe 2002.
Jun 3 14:53:04 jkxtrac1 kernel: qla2xxx 0000:8b:00.1: scsi(4:1:3): Abort command issued -- 1 34f 2002.
Jun 3 15:00:18 jkxtrac1 CLSD: The clock on host jkxtrac1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Jun 3 15:32:43 jkxtrac1 CLSD: The clock on host jkxtrac1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Jun 3 15:49:19 jkxtrac1 scim-bridge: The lockfile is destroied
Jun 3 15:49:19 jkxtrac1 scim-bridge: Cleanup, done. Exitting...
Jun 3 16:03:49 jkxtrac1 CLSD: The clock on host jkxtrac1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Jun 3 16:14:00 jkxtrac1 kernel: qla2xxx 0000:8b:00.1: scsi(4:1:3): Abort command issued -- 1 cd7 2002.
Jun 3 16:28:59 jkxtrac1 scim-bridge: Failed to open the panel socket
Jun 3 16:29:00 jkxtrac1 scim-bridge: Failed to open the panel socket
Jun 3 16:29:00 jkxtrac1 scim-bridge: Panel client has not yet been prepared
Jun 3 16:29:00 jkxtrac1 last message repeated 3 times
Jun 3 16:29:00 jkxtrac1 scim-bridge: The lockfile is destroied
Jun 3 16:29:00 jkxtrac1 scim-bridge: Cleanup, done. Exitting...
Jun 3 16:29:00 jkxtrac1 Cleanup, done. Exitting...
Jun 3 16:29:27 jkxtrac1 gconfd (root-10454): GConf 伺服器不在使用,正在關閉。
Jun 3 16:29:27 jkxtrac1 gconfd (root-10454): 退出
Jun 3 16:30:43 jkxtrac1 kernel: scim-launcher[10521] trap stack segment rip:2b38d4ea70f1 rsp:7fffacbf26f0 error:0
Jun 3 16:32:13 jkxtrac1 gconfd (root-1975): 正在啟動(版本 2.14.0),pid 1975 使用者“root”
Jun 3 16:32:13 jkxtrac1 gconfd (root-1975): 地址“xml:readonly:/etc/gconf/gconf.xml.mandatory”解析為位於 0 的只讀配置源
Jun 3 16:32:13 jkxtrac1 gconfd (root-1975): 地址“xml:readwrite:/root/.gconf”解析為位於 1 的可寫入配置源
Jun 3 16:32:13 jkxtrac1 gconfd (root-1975): 地址“xml:readonly:/etc/gconf/gconf.xml.defaults”解析為位於 2 的只讀配置源
Jun 3 16:32:14 jkxtrac1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
Jun 3 16:32:14 jkxtrac1 last message repeated 2 times
Jun 3 16:32:14 jkxtrac1 gconfd (root-1975): 地址“xml:readwrite:/root/.gconf”解析為位於 0 的可寫入配置源
Jun 3 16:32:14 jkxtrac1 hcid[7308]: Default passkey agent (:1.18, /org/bluez/applet) registered
Jun 3 16:32:17 jkxtrac1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
Jun 3 16:34:54 jkxtrac1 CLSD: The clock on host jkxtrac1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
網路上說資源問題,我的資源很多啊:
top - 16:53:12 up 2:28, 2 users, load average: 0.21, 0.29, 0.27
Tasks: 665 total, 2 running, 662 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.7%us, 0.1%sy, 0.0%ni, 99.0%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32889308k total, 5147440k used, 27741868k free, 181984k buffers
Swap: 34933332k total, 0k used, 34933332k free, 3341088k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2154 root 15 0 328m 19m 10m R 20.7 0.1 0:35.01 gnome-terminal
471 root 10 -5 0 0 0 S 0.9 0.0 0:36.27 kacpid
5947 root 15 0 13160 1568 828 R 0.7 0.0 0:00.58 top
8764 grid 15 0 245m 31m 12m S 0.7 0.1 0:15.21 gipcd.bin
9342 grid 15 0 1323m 57m 40m S 0.7 0.2 0:48.72 oracle
7476 root 15 0 3824 600 496 S 0.5 0.0 0:06.27 acpid
8723 grid 15 0 306m 37m 16m S 0.5 0.1 0:29.02 oraagent.bin
8763 root 16 0 292m 35m 15m S 0.5 0.1 0:35.45 orarootagent.bi
9615 root 15 0 272m 25m 14m S 0.5 0.1 0:42.74 orarootagent.bi
7036 root 10 -5 0 0 0 S 0.2 0.0 0:07.41 kondemand/9
7042 root 10 -5 0 0 0 S 0.2 0.0 0:03.46 kondemand/15
7050 root 10 -5 0 0 0 S 0.2 0.0 0:00.06 kondemand/23
8750 grid 15 0 174m 25m 11m S 0.2 0.1 0:04.80 gpnpd.bin
8798 root RT 0 249m 92m 56m S 0.2 0.3 0:04.35 cssdmonitor
9019 root 17 0 241m 22m 10m S 0.2 0.1 0:02.35 octssd.bin
9346 grid 15 0 1314m 30m 21m S 0.2 0.1 0:02.10 oracle
30771 oracle 15 0 12.8g 51m 36m S 0.2 0.2 0:09.80 oracle
30955 oracle 15 0 12.8g 42m 39m S 0.2 0.1 0:01.83 oracle
SQL> select * from v$resource_limit;
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
------------------------------ ------------------- --------------- ---------------------------------------- ----------------------------------------
processes 51 54 1000 1000
sessions 59 64 1536 1536
enqueue_locks 39 52 19523 19523
enqueue_resources 36 36 6976 UNLIMITED
ges_procs 47 49 1001 1001
ges_ress 9448 9448 32571 UNLIMITED
ges_locks 4771 5092 49133 UNLIMITED
ges_cache_ress 1649 1649 0 UNLIMITED
ges_reg_msgs 112 134 2730 UNLIMITED
ges_big_msgs 35 78 2730 UNLIMITED
ges_rsv_msgs 0 0 1000 1000
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
------------------------------ ------------------- --------------- ---------------------------------------- ----------------------------------------
gcs_resources 36726 36726 UNLIMITED UNLIMITED
gcs_shadows 36561 36563 UNLIMITED UNLIMITED
smartio_overhead_memory 0 0 0 UNLIMITED
smartio_buffer_memory 0 0 0 UNLIMITED
smartio_metadata_memory 0 0 0 UNLIMITED
smartio_sessions 0 0 0 UNLIMITED
dml_locks 0 0 6756 UNLIMITED
temporary_table_locks 0 0 UNLIMITED UNLIMITED
transactions 0 0 1689 UNLIMITED
branches 0 0 1689 UNLIMITED
cmtcallbk 0 2 1689 UNLIMITED
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
------------------------------ ------------------- --------------- ---------------------------------------- ----------------------------------------
max_rollback_segments 11 11 1689 65535
sort_segment_locks 0 1 UNLIMITED UNLIMITED
k2q_locks 0 0 3072 UNLIMITED
max_shared_servers 1 1 UNLIMITED UNLIMITED
parallel_max_servers 1 1 970 3600
27 rows selected.
重啟資料庫伺服器2次,也沒弄好、。
問題解決,結貼。 方法: 1、修改oracle.exe的許可權為6751; [oracle@jkxtrac1 bin]$ ls -l ./oracle -rwxr-s--x 1 oracle asmadmin 239627031 Jan 21 18:59 ./oracle [oracle@jkxtrac1 bin]$ ls -l oracle -rwxr-s--x 1 oracle asmadmin 239627031 Jan 21 18:59 oracle [oracle@jkxtrac1 bin]$ chmod 6751 oracle [oracle@jkxtrac1 bin]$ ls -l oracle -rwsr-x--x 1 oracle asmadmin 239627031 Jan 21 18:59 oracle 2、重新用setasmgidwrap設定oracle.exe [root@jkxtrac1 ~]# cd /data/app/11.2.0/grid_1/bin/ [root@jkxtrac1 bin]# ./setasmgidwrap o=/home/oracle/app/oracle/product/11.2.0/dbhome_1/bin/oracle 感謝各位朋友的幫助;; |
About Me
...............................................................................................................................
● 本文作者:小麥苗,只專注於資料庫的技術,更注重技術的運用
● 本文在itpub(http://blog.itpub.net/26736162)、部落格園(http://www.cnblogs.com/lhrbest)和個人微信公眾號(xiaomaimiaolhr)上有同步更新
● 本文itpub地址:http://blog.itpub.net/26736162/viewspace-2135468/
● 本文部落格園地址:http://www.cnblogs.com/lhrbest/p/6559092.html
● 本文pdf版及小麥苗雲盤地址:http://blog.itpub.net/26736162/viewspace-1624453/
● QQ群:230161599 微信群:私聊
● 聯絡我請加QQ好友(642808185),註明新增緣由
● 於 2017-03-16 08:00 ~ 2017-03-16 19:00 在泰興公寓完成
● 文章內容來源於小麥苗的學習筆記,部分整理自網路,若有侵權或不當之處還請諒解
● 版權所有,歡迎分享本文,轉載請保留出處
...............................................................................................................................
拿起手機使用微信客戶端掃描下邊的左邊圖片來關注小麥苗的微信公眾號:xiaomaimiaolhr,掃描右邊的二維碼加入小麥苗的QQ群,學習最實用的資料庫技術。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/26736162/viewspace-2135468/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- Broken pipe
- java.net.SocketException: Broken pipeJavaException
- rac的vip和監聽莫名故障
- 【故障|監聽】TNS-12545、TNS-12560、TNS-00515和Linux Error: 111LinuxError
- 解決ssh的"Write failed: Broken pipe"問題AI
- Kubelet 錯誤日誌 broken pipe 和 connection reset by peer 的原因分析
- RedisClient 報出java.net.SocketException: Broken pipe異常RedisclientJavaException
- 資料庫監聽夯故障分析資料庫
- 動態監聽和靜態監聽
- Oracle 修改預設監聽埠故障分析Oracle
- Oracle RAC Database 11.1.0.6監聽故障案例OracleDatabase
- oracle靜態監聽和動態監聽Oracle
- TNS-12518: TNS: 監聽程式無法分發客戶機連線
- grid監聽故障,本地地址不能連線
- oracle 11.2.0.3.6升級故障---監聽報錯Oracle
- 【OSX】解決Terminal ssh連線"Write failed Broken pipe"問題AI
- swift 訊息監聽和鍵值監聽(kvo)Swift
- Linux用inotify監聽檔案和目錄Linux
- 屬性和監聽
- Oracle 10g 兩個監聽程式的故障Oracle 10g
- org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipeApacheclientExceptionJava
- 【listener】oracle靜態監聽和動態監聽 【轉載】Oracle
- [android]MonkeyRunner“java.net.SocketException: Broken pipe”錯誤解決辦法AndroidJavaException
- Linux pipe功能Linux
- 事件和事件監聽器事件
- Vue3.0的遞迴監聽和非遞迴監聽Vue遞迴
- TNS-12547TNS-12560 TNS-00517 Linux Error: 104: Connection reset by peerLinuxError
- 監聽乙太網(四) Packet32函式SDK (轉)函式
- 監聽器和過濾器過濾器
- Spring Boot 事件和監聽Spring Boot事件
- oracle例項和監聽器Oracle
- 監聽 watch props物件屬性監聽 或深度監聽物件
- Linux中的pipe(管道)與named pipe(FIFO 命名管道)Linux
- 動態監聽與靜態監聽
- 在rhel5上啟動監聽器lsnrctl start報錯_Linux Error: 113: No route to hostLinuxError
- Flutter - 生命週期監聽和管理Flutter
- nginx建立和監聽套接字分析Nginx
- ASM例項配置監聽和TNSASM