一次Oracle診斷案例-SGA與Swap

yingyifeng306發表於2021-06-17

Oracle 診斷案例 -SGA Swap

 

案例描述 :

 

使用者報告,伺服器啟動一段時間以後,無法建立資料庫連線

重新啟動幾分鐘以後,再次無法連線

 

系統無法正常使用 .

 

1. 登陸系統

 

SunOS 5.8

 

login: root

Password:

Last login: Tue Mar 23 13:56:59 from 172.16.31.41

Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001

You have new mail.

 

2. su Oracle 使用者

檢查啟動的 Oracle 程式

 

發現後臺程式正常,有一定量的使用者連線

 

wapplatform:/>su - oracle

Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001

You have new mail.

/export/home1/oracle>ls

admin codesyndealt31 exp.sh local.cshrc local.profile oraclebak oui v6_database

app exp.log jre local.login nsmail oradata swan

export/home1/oracle>cd admin

/export/home1/oracle/admin>ps -ef|grep ora

oracle 25269 25258 0 13:58:36 pts/3 0:00 grep ora

oracle 25257 24906 0 13:58:31 pts/4 0:00 vi alert_HSWAPDB.log

oracle 25267 1 1 13:58:34 ? 0:00 oracleHSWAPDB (LOCAL=NO)

oracle 25184 1 0 13:56:57 ? 0:00 ora_p007_HSWAPDB

oracle 25182 1 0 13:56:57 ? 0:00 ora_p006_HSWAPDB

oracle 25193 1 0 13:57:03 ? 0:01 oracleHSWAPDB (LOCAL=NO)

oracle 25209 1 0 13:57:09 ? 0:00 oracleHSWAPDB (LOCAL=NO)

oracle 25176 1 0 13:56:57 ? 0:00 ora_p003_HSWAPDB

oracle 25180 1 0 13:56:57 ? 0:00 ora_p005_HSWAPDB

oracle 25172 1 0 13:56:56 ? 0:00 ora_p001_HSWAPDB

oracle 25178 1 0 13:56:57 ? 0:00 ora_p004_HSWAPDB

oracle 25170 1 0 13:56:56 ? 0:00 ora_p000_HSWAPDB

oracle 24254 24240 0 12:08:25 pts/2 0:00 -ksh

oracle 25174 1 0 13:56:56 ? 0:00 ora_p002_HSWAPDB

oracle 25244 1 1 13:58:23 ? 0:00 oracleHSWAPDB (LOCAL=NO)

oracle 25218 1 0 13:57:23 ? 0:00 oracleHSWAPDB (LOCAL=NO)

oracle 25159 1 0 13:56:42 ? 0:02 ora_qmn0_HSWAPDB

oracle 25230 1 0 13:57:40 ? 0:01 oracleHSWAPDB (LOCAL=NO)

oracle 25161 1 0 13:56:42 ? 0:00 ora_s000_HSWAPDB

oracle 25149 1 0 13:56:41 ? 0:01 ora_lgwr_HSWAPDB

oracle 25157 1 0 13:56:42 ? 0:00 ora_cjq0_HSWAPDB

oracle 24906 3698 0 13:47:47 pts/4 0:00 -ksh

oracle 25153 1 0 13:56:42 ? 0:01 ora_smon_HSWAPDB

oracle 25058 7464 0 13:55:14 pts/1 0:00 -ksh

oracle 25163 1 0 13:56:42 ? 0:00 ora_d000_HSWAPDB

oracle 25155 1 0 13:56:42 ? 0:00 ora_reco_HSWAPDB

oracle 25151 1 0 13:56:41 ? 0:00 ora_ckpt_HSWAPDB

oracle 25145 1 0 13:56:41 ? 0:00 ora_dbw0_HSWAPDB

oracle 25199 1 15 13:57:04 ? 0:49 ora_j000_HSWAPDB

oracle 4149 4146 0 12:05:11 pts/5 0:00 -ksh

oracle 25232 1 0 13:57:41 ? 0:00 oracleHSWAPDB (LOCAL=NO)

oracle 25119 1 0 13:56:29 ? 0:00 oraclehswapdb (LOCAL=NO)

oracle 25075 1 0 13:55:34 ? 0:00 /export/home1/oracle/app/bin/tnslsnr LISTENER -inherit

oracle 24374 4149 0 12:21:56 pts/5 0:00 sqlplus /nolog

oracle 25143 1 0 13:56:41 ? 0:00 ora_pmon_HSWAPDB

oracle 25258 25242 0 13:58:31 pts/3 0:00 -ksh

/export/home1/oracle/admin>ps -ef|grep ora_

oracle 25275 25258 0 13:58:42 pts/3 0:00 grep ora_

oracle 25184 1 0 13:56:57 ? 0:00 ora_p007_HSWAPDB

oracle 25182 1 0 13:56:57 ? 0:00 ora_p006_HSWAPDB

oracle 25176 1 0 13:56:57 ? 0:00 ora_p003_HSWAPDB

oracle 25180 1 0 13:56:57 ? 0:00 ora_p005_HSWAPDB

oracle 25172 1 0 13:56:56 ? 0:00 ora_p001_HSWAPDB

oracle 25178 1 0 13:56:57 ? 0:00 ora_p004_HSWAPDB

oracle 25170 1 0 13:56:56 ? 0:00 ora_p000_HSWAPDB

oracle 25174 1 0 13:56:56 ? 0:00 ora_p002_HSWAPDB

oracle 25159 1 0 13:56:42 ? 0:02 ora_qmn0_HSWAPDB

oracle 25161 1 0 13:56:42 ? 0:00 ora_s000_HSWAPDB

oracle 25149 1 0 13:56:41 ? 0:01 ora_lgwr_HSWAPDB

oracle 25157 1 0 13:56:42 ? 0:00 ora_cjq0_HSWAPDB

oracle 25153 1 0 13:56:42 ? 0:01 ora_smon_HSWAPDB

oracle 25163 1 0 13:56:42 ? 0:00 ora_d000_HSWAPDB

oracle 25155 1 0 13:56:42 ? 0:00 ora_reco_HSWAPDB

oracle 25151 1 0 13:56:41 ? 0:00 ora_ckpt_HSWAPDB

oracle 25145 1 0 13:56:41 ? 0:00 ora_dbw0_HSWAPDB

oracle 25199 1 13 13:57:04 ? 0:51 ora_j000_HSWAPDB

oracle 25143 1 0 13:56:41 ? 0:00 ora_pmon_HSWAPDB

 

3. 檢查Alert.log 警報日誌檔案

 

/export/home1/oracle/admin>ls

hswapdb

/export/home1/oracle/admin>cd *

/export/home1/oracle/admin/hswapdb>ls

bdump cdump create pfile udump

/export/home1/oracle/admin/hswapdb>cd bdump

/export/home1/oracle/admin/hswapdb/bdump>

 

/export/home1/oracle/admin/hswapdb/bdump>ls -l *.log

 

-rw-r--r-- 1 oracle dba 813396 Mar 23 13:57 alert_HSWAPDB.log

/export/home1/oracle/admin/hswapdb/bdump>vi *.log

"alert_HSWAPDB.log" 18888 lines, 813396 characters (115 null)

Tue Jun 24 21:17:14 2003

Starting ORACLE instance (normal)

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

SCN scheme 3

Using log_archive_dest parameter default value

LICENSE_MAX_USERS = 0

SYS auditing is disabled

Starting up ORACLE RDBMS Version: 9.2.0.3.0.

System parameters with non-default values:

processes = 400

timed_statistics = TRUE

shared_pool_size = 117440512

large_pool_size = 83886080

java_pool_size = 33554432

control_files = /export/home1/oracle/oradata/hswapdb/control01.ctl,

 

/export/home1/oracle/oradata/hswapdb/control02.ctl,

/export/home1/oracle/oradata/hswapdb/control03.ctl

db_block_size = 8192

db_cache_size = 352321536

compatible = 9.2.0.0.0

db_file_multiblock_read_count= 16

fast_start_mttr_target = 300

undo_management = AUTO

undo_tablespace = UNDOTBS1

undo_retention = 10800

remote_login_passwordfile= EXCLUSIVE

db_domain = eygle.com

instance_name = hswapdb

dispatchers = (PROTOCOL=TCP) (SERVICE=hswapdbXDB)

job_queue_processes = 10

hash_join_enabled = TRUE

background_dump_dest = /export/home1/oracle/admin/hswapdb/bdump

user_dump_dest = /export/home1/oracle/admin/hswapdb/udump

core_dump_dest = /export/home1/oracle/admin/hswapdb/cdump

sort_area_size = 524288

db_name = hswapdb

open_cursors = 300

star_transformation_enabled= FALSE

query_rewrite_enabled = FALSE

pga_aggregate_target = 154140672

aq_tm_processes = 1

 

.................

 

Tue Mar 23 13:40:45 2004

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 11, op = fork, loc = skgpspawn5

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

Tue Mar 23 13:42:02 2004

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

Tue Mar 23 13:55:38 2004

Starting ORACLE instance (normal)

Shutting down instance: further logons disabled

Tue Mar 23 13:56:20 2004

Shutting down instance (abort)

License high water mark = 26

Instance terminated by USER, pid = 25112

Tue Mar 23 13:56:37 2004

Starting ORACLE instance (normal)

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

SCN scheme 3

Using log_archive_dest parameter default value

LICENSE_MAX_USERS = 0

SYS auditing is disabled

Starting up ORACLE RDBMS Version: 9.2.0.3.0.

System parameters with non-default values:

processes = 400

timed_statistics = TRUE

shared_pool_size = 117440512

large_pool_size = 83886080

java_pool_size = 33554432

control_files = /export/home1/oracle/oradata/hswapdb/control01.ctl,

 

/export/home1/oracle/oradata/hswapdb/control02.ctl,

/export/home1/oracle/oradata/hswapdb/control03.ctl

db_block_size = 8192

db_cache_size = 352321536

compatible = 9.2.0.0.0

db_file_multiblock_read_count= 16

fast_start_mttr_target = 300

undo_management = AUTO

undo_tablespace = UNDOTBS1

undo_retention = 10800

remote_login_passwordfile= EXCLUSIVE

db_domain = eygle.com

instance_name = hswapdb

dispatchers = (PROTOCOL=TCP) (SERVICE=hswapdbXDB)

remote_dependencies_mode = SIGNATURE

job_queue_processes = 10

hash_join_enabled = TRUE

background_dump_dest = /export/home1/oracle/admin/hswapdb/bdump

user_dump_dest = /export/home1/oracle/admin/hswapdb/udump

core_dump_dest = /export/home1/oracle/admin/hswapdb/cdump

sort_area_size = 524288

db_name = hswapdb

open_cursors = 300

star_transformation_enabled= FALSE

parallel_automatic_tuning= TRUE

query_rewrite_enabled = FALSE

pga_aggregate_target = 154140672

aq_tm_processes = 1

PMON started with pid=2

DBW0 started with pid=3

LGWR started with pid=4

CKPT started with pid=5

SMON started with pid=6

RECO started with pid=7

CJQ0 started with pid=8

QMN0 started with pid=9

Tue Mar 23 13:56:42 2004

starting up 1 shared server(s) ...

Tue Mar 23 13:56:42 2004

starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...

Tue Mar 23 13:56:43 2004

ALTER DATABASE MOUNT

Tue Mar 23 13:56:47 2004

Successful mount of redo thread 1, with mount id 3253076635.

Tue Mar 23 13:56:47 2004

Database mounted in Exclusive Mode.

Completed: ALTER DATABASE MOUNT

Tue Mar 23 13:56:47 2004

Current log# 2 seq# 2136 mem# 0: /export/home1/oracle/oradata/hswapdb/redo02.log

Successful open of redo thread 1.

Tue Mar 23 12:24:54 2004

SMON: enabling cache recovery

Tue Mar 23 12:24:56 2004

Undo Segment 1 Onlined

Undo Segment 2 Onlined

Undo Segment 3 Onlined

Undo Segment 4 Onlined

Undo Segment 5 Onlined

Undo Segment 6 Onlined

Undo Segment 7 Onlined

Undo Segment 8 Onlined

Undo Segment 9 Onlined

Undo Segment 10 Onlined

Successfully onlined Undo Tablespace 1.

Tue Mar 23 12:24:56 2004

SMON: enabling tx recovery

Tue Mar 23 12:24:56 2004

Database Characterset is ZHS16GBK

Tue Mar 23 12:25:01 2004

SMON: Parallel transaction recovery tried

Tue Mar 23 12:25:01 2004

replication_dependency_tracking turned off (no async multimaster replication found)

Completed: ALTER DATABASE OPEN

Tue Mar 23 12:28:26 2004

/* OracleOEM */ ALTER DATABASE DATAFILE '/export/home1/oracle/oradata/hswapdb/users01.dbf' RESIZE 2501760K

Tue Mar 23 12:28:26 2004

ORA-3297 signalled during: /* OracleOEM */ ALTER DATABASE DATAFILE '/export/h...

Tue Mar 23 12:28:32 2004

/* OracleOEM */ ALTER DATABASE DATAFILE '/export/home1/oracle/oradata/hswapdb/users01.dbf' RESIZE 2501760K

ORA-3297 signalled during: /* OracleOEM */ ALTER DATABASE DATAFILE '/export/h...

Tue Mar 23 12:28:53 2004

/* OracleOEM */ ALTER DATABASE DATAFILE '/export/home1/oracle/oradata/hswapdb/users01.dbf' RESIZE 3501760K

Tue Mar 23 12:28:53 2004

ORA-3297 signalled during: /* OracleOEM */ ALTER DATABASE DATAFILE '/export/h...

Tue Mar 23 13:40:45 2004

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 11, op = fork, loc = skgpspawn5

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

Tue Mar 23 13:42:02 2004

skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

:q

 

發現資料庫多次重起,並記錄了部分錯誤資訊

 

該提示說明資料庫無法 spawn a new session.

 

4. 嘗試連線資料庫

 

收到錯誤資訊,無法連線資料庫

 

$ sqlplus "/ as sysdba"

 

SQL*Plus: Release 9.2.0.3.0 - Production on 星期二 3 23 14:14:06 2004

 

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

 

ERROR:

ORA-12540: TNS: 超出內部限制

 

 

請輸入使用者名稱 :

ERROR:

ORA-12540: TNS: 超出內部限制

 

 

請輸入使用者名稱 :

ERROR:

ORA-12540: TNS: 超出內部限制

 

 

SP2-0157: 3 次嘗試之後無法 CONNECT ORACLE, 退出 SQL*Plus

 

 

內部限制超過,通常說明某些系統資源不足 .

 

5. 檢查監聽器

 

發現部分連線被拒絕

 

 

/export/home1/oracle>lsnrctl services

 

LSNRCTL for Solaris: Version 9.2.0.3.0 - Production on 23-3 -2004 14:37:23

 

Copyright (c) 1991, 2002, Oracle Corporation. All rights reserved.

 

正在連線到 (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))

服務摘要 ..

服務 "PLSExtProc" 包含 1 個例程。

例程 "PLSExtProc", 狀態 UNKNOWN, 包含此服務的 1 個處理程式 ...

處理程式 :

"DEDICATED" 已建立 :0 已被拒絕 :0

LOCAL SERVER

服務 "hswapdb.eygle.com" 包含 2 個例程。

例程 "hswapdb", 狀態 UNKNOWN, 包含此服務的 1 個處理程式 ...

處理程式 :

"DEDICATED" 已建立 :6 已被拒絕 :0

LOCAL SERVER

例程 "hswapdb", 狀態 READY, 包含此服務的 1 個處理程式 ...

處理程式 :

"DEDICATED" 已建立 :21 已拒絕 :6 狀態 :ready

LOCAL SERVER

服務 "hswapdbXDB.eygle.com" 包含 1 個例程。

例程 "hswapdb", 狀態 READY, 包含此服務的 1 個處理程式 ...

處理程式 :

"D000" 已建立 :0 已被拒絕 :0 當前 : 0 最大 : 972 狀態 : ready

DISPATCHER <machine: wapplatform, pid: 25839>

(ADDRESS=(PROTOCOL=tcp)(HOST=wapplatform)(PORT=32869))

命令執行成功

 

檢查 listener.log

 

23-3\324\302 -2004 12:19:40 * (CONNECT_DATA=(SID=hswapdb)(CID=(PROGRAM=C:\WINNT\Microsoft.NET\Framework\v1.1.4322\aspnet_wp.e

xe)(HOST=SWAN)(USER=SYSTEM))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.30.125)(PORT=1291)) * establish * hswapdb * 12500

TNS-12500: TNS\243\272\274\340\314\375\306\367\316\264\304\334\306\364\266\257\327\250\323\303\265\304\267\376\316\361\306\36

7\275\370\263\314

TNS-12540: TNS\243\272\263\254\263\366\304\332\262\277\274\253\317\336\317\336\326\306

TNS-12560: TNS: \320\255\322\351\312\312\305\344\306\367\264\355\316\363

TNS-00510: \263\254\263\366\304\332\262\277\274\253\317\336\317\336\326\306

Solaris Error: 12: Not enough space

23-3\324\302 -2004 12:19:50 * (CONNECT_DATA=(SID=hswapdb)(CID=(PROGRAM=C:\Program Files\PLSQL Developer\PLSQLDev.exe)(HOST=SW

AN)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.30.125)(PORT=1292)) * establish * hswapdb * 12500

TNS-12500: TNS\243\272\274\340\314\375\306\367\316\264\304\334\306\364\266\257\327\250\323\303\265\304\267\376\316\361\306\36

7\275\370\263\314

TNS-12540: TNS\243\272\263\254\263\366\304\332\262\277\274\253\317\336\317\336\326\306

TNS-12560: TNS: \320\255\322\351\312\312\305\344\306\367\264\355\316\363

TNS-00510: \263\254\263\366\304\332\262\277\274\253\317\336\317\336\326\306

Solaris Error: 12: N pace

 

 

6. 退出 Oracle 使用者檢查

 

檢查系統日誌資訊,發現大量失敗的 su 操作

swap 區不足的報告

 

/export/home1/oracle/admin/hswapdb/bdump>exit

wapplatform:/>dmesg

 

2004 03 23 星期二 14 00 32 CST

Mar 22 22:52:36 wapplatform elfexec: [ID 700856 kern.notice] ps: Cannot find ^?ELF^A^B^A

Mar 22 22:53:00 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

Mar 22 22:53:09 wapplatform elfexec: [ID 700856 kern.notice] w: Cannot find ^?ELF^A^B^A

Mar 22 22:53:53 wapplatform last message repeated 4 times

Mar 22 22:56:28 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A

Mar 22 22:58:00 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

Mar 22 22:59:54 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A

Mar 22 23:02:26 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

Mar 22 23:03:00 wapplatform last message repeated 1 time

Mar 22 23:08:00 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

Mar 22 23:08:34 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A

Mar 22 23:10:27 wapplatform last message repeated 3 times

Mar 22 23:11:49 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A

Mar 22 23:11:52 wapplatform last message repeated 1 time

Mar 22 23:13:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

Mar 22 23:18:01 wapplatform last message repeated 1 time

Mar 22 23:23:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

Mar 22 23:28:01 wapplatform last message repeated 1 time

Mar 22 23:33:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

Mar 22 23:38:01 wapplatform last message repeated 1 time

Mar 22 23:43:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

Mar 22 23:48:01 wapplatform last message repeated 1 time

Mar 22 23:53:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

Mar 22 23:58:01 wapplatform last message repeated 1 time

Mar 23 00:00:00 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full

Mar 23 00:00:00 wapplatform sendmail[3075]: [ID 702911 mail.crit] My unqualified host name (wapplatform) unknown; sleeping

 

for retry

Mar 23 00:01:00 wapplatform sendmail[3075]: [ID 702911 mail.alert] unable to qualify my own domain name (wapplatform) --

 

using short name

Mar 23 00:02:36 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

Mar 23 00:03:02 wapplatform last message repeated 1 time

Mar 23 00:08:02 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

....

 

Mar 23 10:18:15 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

Mar 23 10:20:41 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full

Mar 23 10:20:47 wapplatform last message repeated 1 time

Mar 23 10:23:15 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full

Mar 23 10:24:38 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full

Mar 23 10:24:43 wapplatform last message repeated 1 time

Mar 23 10:24:55 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full

Mar 23 10:25:06 wapplatform last message repeated 2 times

Mar 23 11:09:31 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3118 (su)

Mar 23 11:09:39 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3121 (su)

Mar 23 11:10:48 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3137 (su)

Mar 23 11:18:02 wapplatform sshd[3620]: [ID 800047 auth.error] error: grantpt: Not enough space

Mar 23 11:18:02 wapplatform sshd[3620]: [ID 800047 auth.error] error: session_pty_req: session 0 alloc failed

Mar 23 11:18:43 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3636 (su)

Mar 23 11:19:47 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3672 (su)

Mar 23 11:20:20 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3694 (su)

Mar 23 11:22:23 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3736 (sshd)

Mar 23 11:23:17 wapplatform tmpfs: [ID 518458 kern.warning] WARNING: /tmp: File system full, swap space limit exceeded

Mar 23 11:23:40 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3804 (su)

Mar 23 11:23:40 wapplatform last message repeated 8 times

Mar 23 11:23:56 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3806 (ps)

Mar 23 11:23:56 wapplatform last message repeated 12 times

Mar 23 11:24:01 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3808 (w)

Mar 23 11:24:01 wapplatform last message repeated 8 times

Mar 23 13:40:56 wapplatform su: [ID 810491 auth.crit] 'su root' failed for root on /dev/pts/2

Mar 23 13:46:26 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 24888

 

(sqlplus)

Mar 23 13:49:18 wapplatform su: [ID 810491 auth.crit] 'su oracle' failed for root on /dev/pts/6

Mar 23 13:54:03 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 25035 (su)

Mar 23 13:54:08 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 25036 (su)

 

 

現在基本可以判斷是交換區的問題,當然和 Oracle SGA 設定有關 .

 

7. 檢查系統記憶體及交換區使用

 

$ top

 

last pid: 25456; load averages: 0.67, 0.70, 0.69

 

14:10:03

93 processes: 91 sleeping, 2 on cpu

CPU states: 72.7% idle, 14.9% user, 2.7% kernel, 9.7% iowait, 0.0% swap

Memory: 1024M real, 34M free, 752M swap in use, 10M swap free

 

PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND

25199 oracle 1 40 0 674M 631M cpu/2 8:03 16.32% oracle

25209 oracle 1 30 0 675M 630M sleep 0:03 0.13% oracle

25159 oracle 1 48 0 674M 628M sleep 0:03 0.06% oracle

25384 oracle 1 58 0 2632K 1736K cpu/0 0:01 0.05% top

25145 oracle 143 58 0 682M 630M sleep 0:01 0.03% oracle

25446 oracle 1 58 0 674M 628M sleep 0:00 0.03% oracle

25149 oracle 15 58 0 682M 626M sleep 0:00 0.02% oracle

25075 oracle 1 48 0 17M 7208K sleep 0:00 0.01% tnslsnr

25151 oracle 11 58 0 676M 624M sleep 0:00 0.01% oracle

25366 oracle 1 10 0 674M 628M sleep 0:00 0.00% oracle

25356 oracle 1 18 0 674M 628M sleep 0:00 0.00% oracle

25360 oracle 1 20 0 674M 628M sleep 0:00 0.00% oracle

25364 oracle 1 20 0 674M 628M sleep 0:00 0.00% oracle

25362 oracle 1 20 0 674M 628M sleep 0:00 0.00% oracle

25330 oracle 1 28 0 674M 628M sleep 0:00 0.00% oracle

 

 

發現實體記憶體僅為 1G free 部分為 34M ,交換區使用了 752M ,僅 10M free

系統記憶體嚴重不足, Swap 區不足

 

8. 檢查資料庫的 SGA 設定

 

發現 SGA 設定為 : 622299344 bytes

接近 600M

 

wapplatform:/>su - oracle

Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001

You have new mail.

/export/home1/oracle>sqlplus "/ as sysdba"

 

SQL*Plus: Release 9.2.0.3.0 - Production on 星期二 3 23 14:02:30 2004

 

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

 

 

連線到 :

Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production

With the Partitioning, OLAP and Oracle Data Mining options

JServer Release 9.2.0.3.0 - Production

 

SQL> show sga

 

Total System Global Area 622299344 bytes

Fixed Size 731344 bytes

Variable Size 268435456 bytes

Database Buffers 352321536 bytes

Redo Buffers 811008 bytes

SQL>

 

對於 RAM 小於 1G 的系統, Dedicated 模式下 ,Oracle SGA 一般不應超過 1/2 實體記憶體 .

 

9. 第一步調整

減小 SGA ,為系統保留足夠的記憶體。重起 oracle

 

10. 增加 swap

 

wapplatform:/>df -k

檔案系統 千位元組 用了 可用 容量 掛接在

/dev/dsk/c0t1d0s0 3099093 105421 2931691 4% /

/dev/dsk/c0t2d0s0 10325760 8359637 1862866 82% /usr

/proc 0 0 0 0% /proc

fd 0 0 0 0% /dev/fd

mnttab 0 0 0 0% /etc/mnttab

/dev/dsk/c0t1d0s3 1018382 285914 671366 30% /var

swap 3904 24 3880 1% /var/run

swap 3936 56 3880 2% /tmp

/dev/dsk/c0t1d0s5 1671823 459202 1162467 29% /opt

/dev/dsk/c0t2d0s7 7087473 6068462 948137 87% /export/home

/dev/dsk/c2t1d0s7 17413250 15900222 1338896 93% /export/home2

/dev/dsk/c0t3d0s7 17413250 13749782 3489336 80% /export/home1

/dev/dsk/c0t1d0s1 771110 382410 334723 54% /usr/openwin

/export/home/wapgw/luke

7087473 6068462 948137 87% /home/wap

 

wapplatform:/var/swap>cd /export/home1

wapplatform:/export/home1>ls

TT_DB lost+found oracle oracli9

wapplatform:/export/home1>mkdir swap

wapplatform:/export/home1>cd swap

wapplatform:/export/home1/swap>mkfile -v 1g swapfile1

swapfile1 1073741824 bytes

wapplatform:/export/home1/swap>id

uid=0(root) gid=1(other)

wapplatform:/export/home1/swap>swap -a /export/home1/swap/swapfile1

wapplatform:/export/home1/swap>swap -s

總數:分配了 623160k 位元組 + 保留 162704k = 已使用 785864k 1010936k 可用

 

11. 連線測試

 

系統恢復正常,問題解決

 

wapplatform:/export/home1/swap>su - oracle

Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001

You have new mail.

/export/home1/oracle>sqlplus "/ as sysdba"

 

SQL*Plus: Release 9.2.0.3.0 - Production on 星期四 3 25 11:56:28 2004

 

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

 

 

連線到 :

Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production

With the Partitioning, OLAP and Oracle Data Mining options

JServer Release 9.2.0.3.0 - Production

 

SQL> exit

Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production

With the Partitioning, OLAP and Oracle Data Mining options

JServer Release 9.2.0.3.0 - Production 中斷開

/export/home1/oracle>top

 

last pid: 5372; load averages: 0.25, 0.22, 0.29

 

11:57:58

148 processes: 137 sleeping, 9 zombie, 2 on cpu

CPU states: 98.8% idle, 0.2% user, 0.7% kernel, 0.2% iowait, 0.0% swap

Memory: 1024M real, 17M free, 824M swap in use, 934M swap free

 

PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND

5363 root 1 58 0 2680K 1736K sleep 0:00 0.24% top

5370 oracle 1 58 0 514M 469M sleep 0:00 0.18% oracle

5366 oracle 1 28 0 514M 469M sleep 0:00 0.11% oracle

5341 oracle 1 58 0 2680K 1736K cpu/2 0:00 0.10% top

5372 oracle 1 48 0 61M 3288K cpu/3 0:00 0.06% oracle

1288 oracle 1 48 0 514M 468M sleep 5:33 0.05% oracle

607 root 12 48 0 2768K 2312K sleep 1:48 0.03% mibiisa

25075 oracle 1 48 0 17M 7208K sleep 0:16 0.02% tnslsnr

1278 oracle 15 58 0 522M 466M sleep 0:49 0.02% oracle

374 root 11 53 0 3504K 2888K sleep 0:16 0.01% nscd

1280 oracle 19 58 0 518M 466M sleep 0:28 0.00% oracle

5361 root 1 46 0 1024K 680K sleep 0:00 0.00% sleep

5362 root 1 46 0 1024K 680K sleep 0:00 0.00% sleep

5469 root 1 36 0 1952K 1176K sleep 30:09 0.00% monithttp

4167 oracle 1 40 0 515M 471M sleep 29:38 0.00% oracle

 

問題總結 :

 

Oracle 資料庫問題的解決從來就離不開作業系統

 

很多時候我們必須透過作業系統一級的手段來診斷並解決問題 .

 

關於作業系統

 

一般 Swap 區的推薦值為 2XRAM

如果 Ram 很大,不一定非要把 Swap 設定為 2xSwap

但是通常至少設定 Swap = Ram

 

如果 Swap 區過小,在系統繁忙期間

產生大量交換無法換到磁碟,就會出現問題 .

如本案例就是這樣。

 

另外,如果系統 Ram 較小

通常設定 SGA < 1/2 Ram

 

要為 Server process OS 保留足夠的記憶體空間 .


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/23732248/viewspace-2777110/,如需轉載,請註明出處,否則將追究法律責任。

相關文章