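Oracle process killed by mistake because the system ran out of memory: terminating the instance due to error 822
Today I received an alert email saying the Oracle process no longer existed.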
Alarm Time:2015-09-21 17:45:38
Trigger: Alive xyxdb_oa
Trigger status: PROBLEM
Trigger severity: High
Trigger URL:
Item values:
1. Alive (x.x.x.x:alive): 0
2. *UNKNOWN* (x.x.x.x :*UNKNOWN*): *UNKNOWN*
Original event ID: 760121
The alert log showed:
System state dump requested by (instance=1, osid=2044 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xyxdbp/xyxdb/trace/xyxdb_diag_2062_20150921174417.trc
Mon Sep 21 17:44:18 2015
PMON (ospid: 2044): terminating the instance due to error 822
Dumping diagnostic data in directory=[cdmp_20150921174417], requested by (instance=1, osid=2044 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 2044
Mon Sep 21 17:46:39 2015
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 64 KB
Total Shared Global Region in Large Pages = 0 KB (0%)
Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
RECOMMENDATION:
Total System Global Area size is 3282 MB. For optimal performance,
prior to the next instance restart:
1. Increase the number of unused large pages by
at least 1641 (page size 2048 KB, total size 3282 MB) system wide to
get 100% of the System Global Area allocated with large pages
2. Large pages are automatically locked into physical memory.
Increase the per process memlock (soft) limit to at least 3290 MB to lock
100% System Global Area's large pages into physical memory
********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 6
Number of processor cores in the system is 6
Number of processor sockets in the system is 1
CELL communication is configured to use 0 interface(s):
CELL IP affinity details:
NUMA status: non-NUMA system
cellaffinity.ora status: N/A
CELL communication will use 1 IP group(s):
Grp 0:
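As a side note on the Large Pages recommendation in the alert log above, the figure of 1641 pages is simply the SGA size divided by the large page size. A minimal sketch of that arithmetic follows; the function name `hugepages_needed` is made up for illustration, and the two inputs are the values printed in the log.

```shell
#!/bin/bash
# Number of large pages needed to hold an SGA of a given size.
# Arguments: SGA size in MB, large page size in KB (both read from the alert log).
hugepages_needed() {
    local sga_mb=$1 page_kb=$2
    # Convert MB to KB, then round up to a whole number of pages.
    echo $(( (sga_mb * 1024 + page_kb - 1) / page_kb ))
}

# Values reported in the alert log: 3282 MB SGA, 2048 KB pages.
hugepages_needed 3282 2048   # prints 1641
```

This only reproduces the number Oracle suggests; whether to configure large pages at all is a separate decision.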
[root@OA01-1-24 scripts]# cat /proc/50966/oom_
oom_adj oom_score oom_score_adj
[root@OA01-1-24 scripts]# cat /proc/50966/oom_adj
0
[root@OA01-1-24 scripts]# vim oomscore.sh
[root@OA01-1-24 scripts]# chmod u+x oomscore.sh
[root@OA01-1-24 scripts]# ./oomscore.sh
63 37608 /usr/bin/java -Djava.util.logging.config.file=/usr
31 51010 ora_mman_xyxdb
20 37579 /usr/bin/java -Djava.util.logging.config.file=/usr
16 51938 /usr/java/jdk1.7.0_79/jre/bin/java -Djava.util.log
14 51496 oraclexyxdb (LOCAL=NO)
13 51026 ora_smon_xyxdb
8 51167 oraclexyxdb (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROT
7 51034 ora_mmon_xyxdb
7 51014 ora_dbw0_xyxdb
6 51480 oraclexyxdb (LOCAL=NO)
The system log showed:
Sep 21 17:44:15 OA01-1-24 kernel: [39519] 500 39519 900699 5848 2 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [39521] 500 39521 900699 5877 5 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [42514] 500 42514 900846 10963 1 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [42578] 500 42578 900706 9012 1 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [43519] 0 43519 24998 1489 5 0 0 sshd
Sep 21 17:44:15 OA01-1-24 kernel: [43533] 0 43533 14309 550 5 0 0 sftp-server
Sep 21 17:44:15 OA01-1-24 kernel: [43557] 0 43557 14432 671 5 0 0 sftp-server
Sep 21 17:44:15 OA01-1-24 kernel: [44331] 89 44331 20234 861 2 0 0 pickup
Sep 21 17:44:15 OA01-1-24 kernel: [44491] 0 44491 1107908 148835 4 0 0 java
Sep 21 17:44:15 OA01-1-24 kernel: [44684] 500 44684 900015 4658 0 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45199] 500 45199 900699 5525 3 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45201] 500 45201 900699 5548 4 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45203] 500 45203 900704 8184 5 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45211] 500 45211 900699 5506 0 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45213] 500 45213 900699 5504 4 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45901] 0 45901 1051478 117538 2 0 0 java
Sep 21 17:44:15 OA01-1-24 kernel: [45943] 500 45943 900956 7194 0 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45945] 500 45945 900315 5444 1 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45947] 500 45947 900315 5423 5 0 0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [46232] 0 46232 25226 152 4 0 0 sleep
Sep 21 17:44:15 OA01-1-24 kernel: Out of memory: Kill process 2074 (oracle) score 125 or sacrifice child
Sep 21 17:44:15 OA01-1-24 kernel: Killed process 2074, UID 500, (oracle) total-vm:3600064kB, anon-rss:3444kB, file-rss:1510892kB
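This usually happens when an application requests a large amount of memory at some moment and the system runs short. That typically triggers the Out of Memory (OOM) killer in the Linux kernel, which kills a process to free memory for the system so that it does not crash outright.
Later we found that developers had started two Tomcat applications on this DB server, and a program fault in them had caused heavy memory use.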
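To see which programs were holding memory when the OOM killer fired, the per-process lines of the kernel dump can be totalled by command name. The sketch below rests on two assumptions that the log excerpt does not state: that the fifth column from the end is the resident set size in pages, and that a page is 4 KB. The function name `rss_by_name` is hypothetical.

```shell
#!/bin/bash
# Sum the rss column of an OOM process dump per command name and print MB.
# Reads syslog lines on stdin; optional argument is the page size in KB (default 4).
rss_by_name() {
    awk -v page_kb="${1:-4}" '
        # Only the per-process lines start with "[pid]" after "kernel:".
        /kernel: \[ *[0-9]+\]/ {
            # Count columns from the end so padding inside "[ pid]" does not matter.
            rss[$NF] += $(NF - 4)
        }
        END {
            for (name in rss)
                printf "%s %d\n", name, rss[name] * page_kb / 1024
        }' | sort -k2,2nr
}

# Demo with three lines taken from the system log above.
printf '%s\n' \
    'Sep 21 17:44:15 OA01-1-24 kernel: [44491] 0 44491 1107908 148835 4 0 0 java' \
    'Sep 21 17:44:15 OA01-1-24 kernel: [45901] 0 45901 1051478 117538 2 0 0 java' \
    'Sep 21 17:44:15 OA01-1-24 kernel: [42514] 500 42514 900846 10963 1 0 0 oracle' |
    rss_by_name
# prints:
# java 1040
# oracle 42
```

Treat the Oracle total with caution: each Oracle process maps the same shared memory, so adding up their rss values counts those pages more than once. The "Killed process" line points the same way, with file-rss far larger than anon-rss.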
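For how the OOM killer works, see http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
We can adjust the per-process OOM settings that the kernel exposes under /proc to keep important processes from being killed.
First, use a script to find the processes most likely to be killed: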
# vi oomscore.sh
#!/bin/bash
# List the ten processes with the highest oom_score: score, PID, command line.
for proc in $(find /proc -maxdepth 1 -regex '/proc/[0-9]+'); do
    printf "%2d %5d %s\n" \
        "$(cat "$proc/oom_score")" \
        "$(basename "$proc")" \
        "$(tr '\0' ' ' < "$proc/cmdline" | head -c 50)"
done 2>/dev/null | sort -nr | head -n 10
[root@OA01-1-24 scripts]# ./oomscore.sh
63 37608 /usr/bin/java -Djava.util.logging.config.file=/usr
31 51010 ora_mman_xyxdb
20 37579 /usr/bin/java -Djava.util.logging.config.file=/usr
16 51938 /usr/java/jdk1.7.0_79/jre/bin/java -Djava.util.log
14 51496 oraclexyxdb (LOCAL=NO)
13 51026 ora_smon_xyxdb
8 51167 oraclexyxdb (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROT
7 51034 ora_mmon_xyxdb
7 51014 ora_dbw0_xyxdb
6 51480 oraclexyxdb (LOCAL=NO)
[root@OA01-1-24 scripts]# cat /proc/51034/oom_score
7
[root@OA01-1-24 scripts]# cat /proc/51034/oom_score_adj
0
[root@OA01-1-24 scripts]# echo -15 >/proc/51034/oom_adj
[root@OA01-1-24 scripts]# cat /proc/51034/oom_score
1
[root@OA01-1-24 scripts]# cat /proc/51026/oom_adj
0
[root@OA01-1-24 scripts]# cat /proc/51026/oom_score
13
[root@OA01-1-24 scripts]# echo -15 >/proc/51026/oom_adj
[root@OA01-1-24 scripts]# cat /proc/51026/oom_adj
-15
[root@OA01-1-24 scripts]# cat /proc/51026/oom_score
1
[root@OA01-1-24 scripts]# echo -15 >/proc/51010/oom_adj
[root@OA01-1-24 scripts]# cat /proc/51010/oom_score
1
[root@OA01-1-24 scripts]# ./
alertbyday.sh oracle_cron.sh sendrman.py updatedb/
installora/ rmanbackup.sh sync_date.sh uploadbackup.sh
oomscore.sh senderrorlog.py tablespace_monitor.py
[root@OA01-1-24 scripts]# ./oomscore.sh
63 37608 /usr/bin/java -Djava.util.logging.config.file=/usr
20 37579 /usr/bin/java -Djava.util.logging.config.file=/usr
16 51938 /usr/java/jdk1.7.0_79/jre/bin/java -Djava.util.log
14 51496 oraclexyxdb (LOCAL=NO)
8 51167 oraclexyxdb (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROT
7 51014 ora_dbw0_xyxdb
6 51480 oraclexyxdb (LOCAL=NO)
5 52007 oraclexyxdb (LOCAL=NO)
5 51474 oraclexyxdb (LOCAL=NO)
5 51466 oraclexyxdb (LOCAL=NO)
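The session above sets oom_adj by hand, one PID at a time. A rough sketch of doing the same for every matching process in one go follows. It writes to oom_score_adj, the newer file that also appears in the tab completion earlier, and converts the value the way I understand the kernel to scale oom_adj internally (multiply by 1000/17, with 15 mapped to 1000); take that mapping as an assumption. Both function names and the example pattern are hypothetical.

```shell
#!/bin/bash
# Convert a legacy oom_adj value (-17..15) to the oom_score_adj scale (-1000..1000).
oom_adj_to_score_adj() {
    local adj=$1
    if [ "$adj" -eq 15 ]; then
        echo 1000                       # the maximum is mapped straight to 1000
    else
        echo $(( adj * 1000 / 17 ))     # integer division, truncating toward zero
    fi
}

# Write the converted value for every process whose command line matches a pattern.
# Needs root; the pattern is an extended regular expression passed to pgrep -f.
protect_processes() {
    local pattern=$1 adj=$2 score pid
    score=$(oom_adj_to_score_adj "$adj")
    for pid in $(pgrep -f "$pattern"); do
        echo "$score" > "/proc/$pid/oom_score_adj" && echo "pid $pid -> $score"
    done
}

# Example (as root): protect_processes 'ora_(smon|mman|mmon)_xyxdb' -15
```

These values live under /proc/<pid>, so they disappear when the process exits; after an instance restart the new PIDs start from the default again and the adjustment has to be reapplied.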
Later we found one more problem, this time with the swap configuration:
[root@OA01-1-24 scripts]# cat /proc/sys/vm/swappiness
0
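A value of 0 here tells the kernel to avoid using swap as far as it possibly can.
The system engineer overlooked this when changing the setting; for Oracle it is better not to turn swap off.
After changing it back: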
[root@OA01-1-24 scripts]# cat /proc/sys/vm/swappiness
60
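The command used to change the value is not shown above. One common way to do it is with sysctl, sketched below; the function names are made up for illustration, and the 0 to 100 range check is an assumption that matches kernels of this era.

```shell
#!/bin/bash
# Succeed only if the argument is a whole number between 0 and 100.
valid_swappiness() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;    # empty, or contains a non-digit
    esac
    [ "$1" -ge 0 ] && [ "$1" -le 100 ]
}

# Apply the value to the running kernel (needs root).
set_swappiness() {
    valid_swappiness "$1" || { echo "invalid swappiness: $1" >&2; return 1; }
    sysctl -w vm.swappiness="$1"
}

# Example (as root): set_swappiness 60
# To keep the value across reboots, add "vm.swappiness = 60" to /etc/sysctl.conf.
```

sysctl -w only changes the running kernel, which is why the sysctl.conf entry is needed as well if the setting should survive a reboot.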
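Summary: keep a DB server dedicated to the database as far as possible, otherwise all kinds of unexpected things will happen.
From the ITPUB blog, link: http://blog.itpub.net/24486203/viewspace-1805598/. If you repost this, please credit the source; otherwise legal action will be pursued.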