oracle不定期的出現當機的問題診斷
環境:
AIX5L+HACMP+ORACLE9I
分析alert,沒有發現錯誤:
Fri Apr 27 17:08:05 2007
Thread 1 advanced to log sequence 612
Current log# 2 seq# 612 mem# 0: /oracle2/oracle/oradata/mssdb2/redo02.log
Fri Apr 27 17:08:05 2007
ARC0: Evaluating archive log 1 thread 1 sequence 611
ARC0: Beginning to archive log 1 thread 1 sequence 611
Creating archive destination LOG_ARCHIVE_DEST_1: '/oracle2/oracle/oradata/oraclearchivelog/1_611.dbf'
ARC0: Completed archiving log 1 thread 1 sequence 611
Fri Apr 27 17:18:52 2007
Shutting down instance: further logons disabled
Shutting down instance (immediate)
License high water mark = 117
Waiting for dispatcher 'D000' to shutdown
All dispatchers and shared servers shutdown
Fri Apr 27 17:20:49 2007
ALTER DATABASE CLOSE NORMAL
Fri Apr 27 17:20:49 2007
SMON: disabling tx recovery
SMON: disabling cache recovery
Fri Apr 27 17:20:49 2007
Shutting down archive processes
Archiving is disabled
Fri Apr 27 17:20:49 2007
ARCH shutting down
Fri Apr 27 17:20:49 2007
ARCH shutting down
Fri Apr 27 17:20:49 2007
ARC1: Archival stopped
Fri Apr 27 17:20:49 2007
ARC0: Archival stopped
Fri Apr 27 17:20:49 2007
Thread 1 closed at log sequence 612
Successful close of redo thread 1.
Fri Apr 27 17:20:50 2007
Completed: ALTER DATABASE CLOSE NORMAL
Fri Apr 27 17:20:50 2007
ALTER DATABASE DISMOUNT
Completed: ALTER DATABASE DISMOUNT
ARCH: Archiving is disabled
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
ARCH: Archiving is disabled
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Fri Apr 27 17:22:39 2007
檢查hacmp的日誌,發現出錯時刻hacmp正在執行切換:
hacmp的out檔案:
:server_restart[144] :server_restart[144] odmget -qmonitor=ora2_db and name=NOTIFY_METHOD HACMPmonitor
:server_restart[144] cut -d" -f2
:server_restart[144] grep value =
NOTIFY=
:server_restart[146] :server_restart[146] cut -d -f1
:server_restart[146] echo
NOTIFY_SCRIPT=
:server_restart[146] [[ -n ]]
:server_restart[162] :server_restart[162] odmget -qmonitor=ora2_db and name=CLEANUP_METHOD HACMPmonitor
:server_restart[162] cut -d" -f2
:server_restart[162] egrep value =
CLEANUP=/hacmpscript/ora2stop-db.sh
:server_restart[163] :server_restart[163] cut -d -f1
:server_restart[163] echo /hacmpscript/ora2stop-db.sh
CLEANUP_SCRIPT=/hacmpscript/ora2stop-db.sh
:server_restart[163] [[ -n /hacmpscript/ora2stop-db.sh ]]
:server_restart[163] [[ -x /hacmpscript/ora2stop-db.sh ]]
:server_restart[167] dspmsg scripts.cat 9334 server_restart: Calling user specified cleanup method /hacmpscript/ora2stop-db.shn server_restart /hacmpscript/ora2stop-db.sh
server_restart: Calling user specified cleanup method /hacmpscript/ora2stop-db.sh
:server_restart[168] /hacmpscript/ora2stop-db.sh
:server_restart[168] ODMDIR=/etc/objrepos
SQL*Plus: Release 9.2.0.1.0 - Production on Wed May 9 16:26:13 2007
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> Connected.
SQL> Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> Disconnected from Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
檢查hacmp的test指令碼,沒有發現問題:
test-db-mssdb2.sh的內容:
#!/bin/sh
#
# Copyright (c) MDCL Inc. All Rights Reserved.
# Created by Li Han Guang 2006.01.16
#
# Should return 0 if the service was start succesfully; otherwise,
# return non-zero
#
# check the following process:
# ora_smon_$sid
# ora_pmon_$sid
# ora_dbw0_$sid
# ora_lgwr_$sid
# ora_reco_$sid
# database connection
ORACLE_SID="mssdb2"
ORACLE_HOME=/oracle2/oracle/product/9.2.0
ORA_USER="oracle2"
#ISQL=svrmgrl
ISQL="sqlplus /nolog"
LOG_FILE="/tmp/test_mssdb2.log"
#
# log messages, need var $LOG_FILE
#
log()
{
echo [`date +"%Y-%m-%d %H:%M:%S"`] $1 >> $LOG_FILE
echo [`date +"%Y-%m-%d %H:%M:%S"`] $1
}
get_status()
{
str=$1
# ps -ef | cut -c52- | grep $str | grep -v grep > /dev/null 2>&1
ps -ef | grep $str | grep -v grep > /dev/null 2>&1
STATUS=$?
case $STATUS in
0) return 0;;
*) return 1;;
esac
}
#
#checks for smon process
#
check_smon()
{
sid=$1
get_status ora_smon_$sid
return $?
}
#
#checks for pmon process
#
check_pmon()
{
sid=$1
get_status ora_pmon_$sid
return $?
}
#
#checks for dbw0 process
#
check_dbw0()
{
sid=$1
get_status ora_dbw0_$sid
return $?
}
#checks for lgwr process
#
check_lgwr()
{
sid=$1
get_status ora_lgwr_$sid
status=$?
return ${status}
}
#
#checks for reco process
#
check_reco()
{
sid=$1
get_status ora_reco_$sid
return $?
}
####################################
# start to check oracle processes
####################################
#checks for database connection
#
su - $ORA_USER -c "${ISQL} > /tmp/$$.out 2> /tmp/$$.err " <
select sysdate from dual;
exit;
EOF
grep SYSDATE /tmp/$$.out >/dev/null 2>&1
if test "$?" -ne 0
then
# cat /tmp/$$.*
log "ORACLE: Cannot connect to database for ORACLE_SID=${ORACLE_SID}."
exit 1
else
log "Oracle instance ${ORACLE_SID} can be connected"
fi
rm -f /tmp/$$.*
check_smon ${ORACLE_SID}
if test "$?" -ne 0
then
log "ORACLE: ora_smon_${ORACLE_SID} is NOT running."
exit 1
else
log "ORACLE: ora_smon_${ORACLE_SID} is running"
fi
check_pmon ${ORACLE_SID}
if test "$?" -ne 0
then
log "ORACLE: ora_pmon_${ORACLE_SID} is NOT running."
exit 1
else
log "ORACLE: ora_pmon_${ORACLE_SID} is running"
fi
check_dbw0 ${ORACLE_SID}
if test "$?" -ne 0
then
log "ORACLE: ora_dbw0_${ORACLE_SID} is NOT running."
exit 1
else
log "ORACLE: ora_dbw0_${ORACLE_SID} is running"
fi
check_lgwr ${ORACLE_SID}
if test "$?" -ne 0
then
log "ORACLE: ora_lgwr_${ORACLE_SID} is NOT running."
exit 1
else
log "ORACLE: ora_lgwr_${ORACLE_SID} is running"
fi
check_reco ${ORACLE_SID}
if test "$?" -ne 0
then
log "ORACLE: ora_reco_${ORACLE_SID} is NOT running."
exit 1
else
log "ORACLE: ora_reco_${ORACLE_SID} is running"
fi
exit 0
根據hacmp的test指令碼檢查test指令碼對應的日誌778500.out發現連線資料庫存在問題:
SQL*Plus: Release 9.2.0.1.0 - Production on Fri Apr 27 17:12:32 2007
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> ERROR:
ORA-12540: TNS:internal limit restriction exceeded
SQL> SP2-0640: Not connected
SQL>
繼續檢查ORACLE資料庫的監聽日誌LISTERNER.LOG,發現如下錯誤:
27-APR-2007 17:10:33 * (CONNECT_DATA=(SID=mssdb2)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.192.139.194)(PORT=40707)) * establish * mssdb2 * 12500
TNS-12500: TNS:listener failed to start a dedicated server process
TNS-12540: TNS:internal limit restriction exceeded
TNS-12560: TNS:protocol adapter error
TNS-00510: Internal limit restriction exceeded
IBM/AIX RISC System/6000 Error: 11: Resource temporarily unavailable
27-APR-2007 17:12:25 * 12502
TNS-12502: TNS:listener received no CONNECT_DATA from client
No longer listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.192.139.202)(PORT=1521)))
27-APR-2007 17:14:43 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=Unicom_ora2)(USER=oracle2))(COMMAND=stop)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153092352)) * stop * 0
TNSLSNR for IBM/AIX RISC System/6000: Version 9.2.0.1.0 - Production on 27-APR-2007 17:22:47
Copyright (c) 1991, 2002, Oracle Corporation. All rights reserved.
最後分析該問題引起的可能原因:
一、 AIX作業系統引數maxuproc:
執行相應的檢查,發現該引數使用的預設值,需要調整:
$ lsattr -El sys0|grep maxuproc
maxuproc 128 Maximum number of PROCESSES allowed per user True
二、 ORACLE資料庫的引數PROCESSES和PGA_AGGREGATE_TARGET
執行相應的檢查,發現該2個引數都為預設值,需要調整:
SQL> show parameter processes
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 1
db_writer_processes integer 1
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 150
SQL> show parameter pga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 25165824
彙總建議,提交客戶。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/7916042/viewspace-915337/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- GC機制和OutOfMemory問題的診斷GC
- Oracle Stream實戰(10)—問題診斷Oracle
- Oracle效能問題診斷一例Oracle
- GC BUFFER BUSY問題的診斷GC
- SQL問題診斷SQL
- 診斷Oracle資料庫Hanging問題Oracle資料庫
- 診斷oracle clusterware問題,需要收集並上傳的檔案Oracle
- 在Oracle10g中診斷效能問題Oracle
- MySQL資料庫診斷:InnoDB關機問題MySql資料庫
- 操作oracle出現的問題Oracle
- GreysJava線上問題診斷工具Java
- 問題診斷和PLSQL方面SQL
- 一次gc buffer busy問題的診斷GC
- 一次網路問題的診斷(二)
- J2EE效能問題的診斷示例
- oracle安裝出現的問題Oracle
- Xcode斷點除錯出現的問題XCode斷點除錯
- 使用crsctl工具診斷cluster問題
- Oracle中診斷阻塞的sessionOracleSession
- ORACLE診斷事件的總結Oracle事件
- Oracle透過AWR的SQL ordered by Gets和SQL ordered by Reads診斷問題OracleSQL
- 如何診斷oracle資料庫執行緩慢或hang住的問題Oracle資料庫
- 表空間檢測異常的問題診斷
- ORACLE診斷案例Oracle
- Oracle故障診斷Oracle
- ORACLE診斷事件Oracle事件
- 使用MTR命令診斷網路問題
- RAC系統的問題診斷最佳實踐,及常見問題分析
- JProfiler for Mac:提升效能和診斷問題的終極工具Mac
- 使用strace診斷奇怪的sqlplus登入問題SQL
- 使用pt-stalk診斷MySQL的間歇性問題MySql
- 診斷和解決CPU利用率高的問題(zt)
- oracle 效能診斷工具Oracle
- Oracle診斷事件列表Oracle事件
- ORACLE診斷事件(zt)Oracle事件
- 如何診斷和解決db2問題DB2
- Timesten問題診斷手冊總結
- bea記憶體洩漏問題診斷記憶體