某新裝業務1庫節點2啟動關閉資料庫很慢且sqlplus連線也很慢故障分析處理

yuntui發表於2016-11-03
一、環境描述
系統:HP-UX B.11.31 U ia64
資料庫:Oracle 11.2.0.4.5 RAC 檔案系統

二、故障描述
節點1正常,節點啟停庫很慢,大概需要11分鐘,而且connect很慢,大概需要7秒左右。

三、分析處理步驟
1、檢視db alert日誌如下:
Tue Dec 02 11:28:39 2014                                      <<=====資料庫開始啟動
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 32
Number of processor cores in the system is 32
Number of processor sockets in the system is 16
Private Interface 'lan101:101' configured from GPnP for use as a private interconnect.
  [name='lan101:101', type=1, ip=192.1.1.29, mac=00-23-7d-3e-22-d5, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'lan100:101' configured from GPnP for use as a public interface.
  [name='lan100:101', type=1, ip=15.133.55.16, mac=00-23-7d-3e-22-29, net=15.133.55.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'lan100' configured from GPnP for use as a public interface.
  [name='lan100', type=1, ip=15.133.55.12, mac=00-23-7d-3e-22-29, net=15.133.55.0/24, mask=255.255.255.0, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /oracle/app/oracle/product/11.2.0/db/dbs/arch
Tue Dec 02 11:28:49 2014
Autotune of undo retention is turned off. 
LICENSE_MAX_USERS = 0
。。。。
Tue Dec 02 11:29:27 2014
PMON started with pid=2, OS id=2671 
Tue Dec 02 11:29:42 2014
PSP0 started with pid=3, OS id=4085 
Tue Dec 02 11:29:58 2014
VKTM started with pid=4, OS id=4139 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Tue Dec 02 11:30:12 2014
GEN0 started with pid=5, OS id=4208 
Tue Dec 02 11:30:27 2014
DIAG started with pid=6, OS id=4251 
Tue Dec 02 11:30:42 2014
PING started with pid=7, OS id=5285 
Tue Dec 02 11:30:57 2014
ACMS started with pid=8, OS id=5341 
Tue Dec 02 11:31:12 2014
DIA0 started with pid=9, OS id=5419 
Tue Dec 02 11:31:27 2014
LMON started with pid=10, OS id=5461 
Tue Dec 02 11:31:42 2014
LMD0 started with pid=11, OS id=6448 
* Load Monitor used for high load check 
* New Low - High Load Threshold Range = [30720 - 40960] 
Tue Dec 02 11:31:56 2014
LMS0 started with pid=12, OS id=6516 at elevated priority
Tue Dec 02 11:32:11 2014
LMS1 started with pid=13, OS id=6564 at elevated priority
Tue Dec 02 11:32:26 2014
LMS2 started with pid=14, OS id=6626 at elevated priority
Tue Dec 02 11:32:40 2014
RMS0 started with pid=15, OS id=7619 
Tue Dec 02 11:32:55 2014
LMHB started with pid=16, OS id=7686 
Tue Dec 02 11:33:10 2014
MMAN started with pid=17, OS id=7733 
Tue Dec 02 11:33:32 2014
DBW0 started with pid=18, OS id=7789 
Tue Dec 02 11:33:54 2014
DBW1 started with pid=19, OS id=8781 
Tue Dec 02 11:34:17 2014                    <<=====啟動一個程式幾十秒,正常情況下秒啟的!
DBW2 started with pid=20, OS id=8896 
Tue Dec 02 11:34:38 2014
DBW3 started with pid=21, OS id=8993 
Tue Dec 02 11:34:53 2014
LGWR started with pid=22, OS id=10241 
Tue Dec 02 11:35:08 2014
CKPT started with pid=23, OS id=10320 
Tue Dec 02 11:35:23 2014
SMON started with pid=24, OS id=10434 
Tue Dec 02 11:35:38 2014
RECO started with pid=25, OS id=10593 
Tue Dec 02 11:35:53 2014
MMON started with pid=26, OS id=11488 
Tue Dec 02 11:36:08 2014
MMNL started with pid=27, OS id=11592 
Tue Dec 02 11:36:08 2014
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
Tue Dec 02 11:36:38 2014
starting up 1 shared server(s) ...
Tue Dec 02 11:36:40 2014
Restarting dead background process DIA0   <<=====從29分到37分之間都是在啟動後臺程式,而這裡DIA0啟動之後又dead了、重啟。
Tue Dec 02 11:37:08 2014                                             
lmon registered with NM - instance number 2 (internal mem no 1)
Reconfiguration started (old inc 0, new inc 8)
List of instances:
 1 2 (myinst: 2) 
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE 
 Communication channels reestablished
Tue Dec 02 11:37:09 2014
 * domain 0 valid = 1 according to instance 1 
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Tue Dec 02 11:37:09 2014
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

二、透過分析alert日誌,我們發現建立程式很慢,由於建立一個連線會話ORACLE也會建立一個程式來處理,所以sqlplus也很慢。由於庫是新裝的還沒上線,很可能是系統引數設定有問題造成的。思路已經有了,下一步就是對比好的節點1跟有問題的節點2的核心引數了。
 
透過對比發現節點2的shmmax才1073741824 B(1G),初步懷疑故障由於該引數導致。
(引數shmmax釋義:每個共享記憶體段的最大位元組值,單位B,官方建議:為了避免效能下降,該值需大於或等於可用記憶體大小。)

三、修改引數並重啟主機後資料庫啟動關閉恢復正常,連線也很快了。
修改引數命令:
kctune shmmax=274598985728


如果以後遇到在生產庫前面是正常的,後面啟動關閉都很慢,這個時候就需考慮其他因素了,詳細請見:
Troubleshooting Database Startup/Shutdown Problems (文件 ID 851057.1)

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/30633755/viewspace-2127675/,如需轉載,請註明出處,否則將追究法律責任。

相關文章