由一條日誌警告所做的調優分析
這個案例發生有段時間了,但是今天無意中看到當時的郵件,感覺還是收益匪淺,看來還是細節決定成敗啊。從一些日誌或trace 檔案中的警告資訊中我們可以發掘出潛在的問題。
當時系統中的使用者數很小,所以每天都能抽時間看看日誌記錄,看有沒有明顯的問題。結果在grep的時候發現trace檔案中有一些警告記錄。
當時的庫是10gR2的庫,現在已經升級到了11gR2.
/xxxx/oracle/xxxxx/oradmp/udump/xxxxx01_ora_15070.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /opt/app/oracle/xxxxxx/product/10.2.0
System name: Linux
Node name: xxxx
Release: 2.6.18-308.el5
Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
Machine: x86_64
Instance name: xxxxxx
Redo thread mounted by this instance: 1
Oracle process number: 472
Unix process pid: 15070, image: oracle@xxxxx (TNS V1-V3)
*** 2013-03-29 20:18:54.711
*** ACTION NAME:() 2013-03-29 20:18:54.707
*** MODULE NAME:(OGG-ECC_ARCG-GLOPEN_DATA_SOURCE) 2013-03-29 20:18:54.707
*** SERVICE NAME:(SYS$USERS) 2013-03-29 20:18:54.707
*** SESSION ID:(5029.3122) 2013-03-29 20:18:54.707
WARNING:Could not increase the asynch I/O limit to 224 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 256 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 224 for SQL direct I/O. It is set to 128
*** 2013-03-29 22:25:26.506
WARNING:Could not increase the asynch I/O limit to 224 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 256 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 256 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 288 for SQL direct I/O. It is set to 128
從警告的資訊可以看出,和IO相關,可能是什麼引數的設定出問題了。檢視MOS 發現有一篇相關的文章。
Warning:Could Not Increase The Asynch I/O Limit To XX For Sql Direct I/O (Doc ID 1302633.1)
其中給出的solution如下:
This feature is now "automatic" in the 2.6.x kernel, based on the physical capabilities of the disk device driver.
This should mean that the Linux Kernel is ready to perform ASYNC I/O.
All install requirements should be met.
To ensure ASYNC I/O can be performed by Oracle Database you need to verify or set the following parameters in the Database:
Then shutdown and startup the database and check if the warning reappears.
An HCVE report (refer to Note 250262.1) should report no remaining issues
If the above doesn't resolve the problem, then increase fs.aio-max-nr
關於檢視aio的內容可以使用如下的方式來檢視。
cat /proc/sys/fs/aio-max-nr
/sbin/sysctl -a |grep aio
當時得到的值是
fs.aio-max-nr = 65536
fs.aio-nr = 65472
可能一般來說問題處理到這個地方就告一段落了。
這個問題當時是叫給一個資深的專家來做的調優,他又要了一些其他的資訊。
最後給出的建議如下:
- Increase fs.aio-max-nr as advised in my initial e-mail to 3145728
- Change Oracle filesystemio_options to “SETALL” (enable both asynch and direct IO)
Following is suggested:
- VXFS – change mount options to:
mincache=direct,convosync=direct (Must be done along with the change of the Oracle filesystemio_options)
- Enable ODM
Pending upon the VXFS version – and True VXFS license supporting ODM (should be by default with VXFS 5.X)
Enable Veritas ODM - by doing “ln –s /opt/VRTSodm/lib64/libodm.so $ORACLE_HOME/lib/libodm10.so”
- Memory
As server has sufficient memory – consult with DBAs if it is worthwhile increasing Oracle cache
- Power
Consider disabling power management savings for better response / latency (BIOS level)
- Disable C-State
RH 5.X do not support C-State (BIOS level)
Kernel parameters:
vm.min_free_kbytes = 32768
vm.dirty_expire_centisecs = 500
vm.dirty_ratio = 15
vm.dirty_writeback_centisecs = 100
vm.swappiness = 0
vm.hugetlb_shm_group = <set to group id of user Oracle)
vm.nr_hugepages = 16384 # set hugepages to 32 GB - if total of Oracle instances cache grows beyond must increase it accordingly it is 2 MB pages)
# any increase requires a reboot !!
其中關於huge page的部分還是有點意思。其中關於設定的Hugepage數還可以透過一個指令碼來計算。
當時看得雲裡霧裡,詳細的學習了一下關於hugepage的設定,還是有所收穫。
其實在官方文件中已經有詳細的描述。hugepage的配置可以參考文件。
發現官方文件還是很有權威性和價值的。
計算hugepage的指令碼如下:
所以問題的分析和解決思路還是最重要的,可能大多數時候碰到問題都是頭疼醫頭,腳疼醫腳。處理問題還是沒有從整體的情況來做,以點帶面,能夠舉一反三才是問題處理的高手。
當時系統中的使用者數很小,所以每天都能抽時間看看日誌記錄,看有沒有明顯的問題。結果在grep的時候發現trace檔案中有一些警告記錄。
當時的庫是10gR2的庫,現在已經升級到了11gR2.
/xxxx/oracle/xxxxx/oradmp/udump/xxxxx01_ora_15070.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /opt/app/oracle/xxxxxx/product/10.2.0
System name: Linux
Node name: xxxx
Release: 2.6.18-308.el5
Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
Machine: x86_64
Instance name: xxxxxx
Redo thread mounted by this instance: 1
Oracle process number: 472
Unix process pid: 15070, image: oracle@xxxxx (TNS V1-V3)
*** 2013-03-29 20:18:54.711
*** ACTION NAME:() 2013-03-29 20:18:54.707
*** MODULE NAME:(OGG-ECC_ARCG-GLOPEN_DATA_SOURCE) 2013-03-29 20:18:54.707
*** SERVICE NAME:(SYS$USERS) 2013-03-29 20:18:54.707
*** SESSION ID:(5029.3122) 2013-03-29 20:18:54.707
WARNING:Could not increase the asynch I/O limit to 224 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 256 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 224 for SQL direct I/O. It is set to 128
*** 2013-03-29 22:25:26.506
WARNING:Could not increase the asynch I/O limit to 224 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 256 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 256 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 288 for SQL direct I/O. It is set to 128
從警告的資訊可以看出,和IO相關,可能是什麼引數的設定出問題了。檢視MOS 發現有一篇相關的文章。
Warning:Could Not Increase The Asynch I/O Limit To XX For Sql Direct I/O (Doc ID 1302633.1)
其中給出的solution如下:
[root@xyz ~]# cat /proc/sys/fs/aio-max-nr
65536
65536
SOLUTION
The aio-max-size kernel parameter doesn't exist in the 2.6.x Linux kernels.This feature is now "automatic" in the 2.6.x kernel, based on the physical capabilities of the disk device driver.
This should mean that the Linux Kernel is ready to perform ASYNC I/O.
All install requirements should be met.
To ensure ASYNC I/O can be performed by Oracle Database you need to verify or set the following parameters in the Database:
sql> alter system set disk_asynch_io=true scope=spfile;
sql> alter system set filesystemio_options=setall scope=spfile;
sql> alter system set filesystemio_options=setall scope=spfile;
Then shutdown and startup the database and check if the warning reappears.
An HCVE report (refer to Note 250262.1) should report no remaining issues
If the above doesn't resolve the problem, then increase fs.aio-max-nr
關於檢視aio的內容可以使用如下的方式來檢視。
cat /proc/sys/fs/aio-max-nr
/sbin/sysctl -a |grep aio
當時得到的值是
fs.aio-max-nr = 65536
fs.aio-nr = 65472
可能一般來說問題處理到這個地方就告一段落了。
這個問題當時是叫給一個資深的專家來做的調優,他又要了一些其他的資訊。
最後給出的建議如下:
- Increase fs.aio-max-nr as advised in my initial e-mail to 3145728
- Change Oracle filesystemio_options to “SETALL” (enable both asynch and direct IO)
Following is suggested:
- VXFS – change mount options to:
mincache=direct,convosync=direct (Must be done along with the change of the Oracle filesystemio_options)
- Enable ODM
Pending upon the VXFS version – and True VXFS license supporting ODM (should be by default with VXFS 5.X)
Enable Veritas ODM - by doing “ln –s /opt/VRTSodm/lib64/libodm.so $ORACLE_HOME/lib/libodm10.so”
- Memory
As server has sufficient memory – consult with DBAs if it is worthwhile increasing Oracle cache
- Power
Consider disabling power management savings for better response / latency (BIOS level)
- Disable C-State
RH 5.X do not support C-State (BIOS level)
Kernel parameters:
vm.min_free_kbytes = 32768
vm.dirty_expire_centisecs = 500
vm.dirty_ratio = 15
vm.dirty_writeback_centisecs = 100
vm.swappiness = 0
vm.hugetlb_shm_group = <set to group id of user Oracle)
vm.nr_hugepages = 16384 # set hugepages to 32 GB - if total of Oracle instances cache grows beyond must increase it accordingly it is 2 MB pages)
# any increase requires a reboot !!
其中關於huge page的部分還是有點意思。其中關於設定的Hugepage數還可以透過一個指令碼來計算。
當時看得雲裡霧裡,詳細的學習了一下關於hugepage的設定,還是有所收穫。
其實在官方文件中已經有詳細的描述。hugepage的配置可以參考文件。
發現官方文件還是很有權威性和價值的。
計算hugepage的指令碼如下:
#!/bin/bash # # hugepages_settings.sh # # Linux bash script to compute values for the # recommended HugePages/HugeTLB configuration # # Note: This script does calculation for all shared memory # segments available when the script is run, no matter it # is an Oracle RDBMS shared memory segment or not. # Check for the kernel version KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'` # Find out the HugePage size HPG_SZ=`grep Hugepagesize /proc/meminfo | awk {'print $2'}` # Start from 1 pages to be on the safe side and guarantee 1 free HugePage NUM_PG=1 # Cumulative number of pages required to handle the running shared memory segments for SEG_BYTES in `ipcs -m | awk {'print $5'} | grep "[0-9][0-9]*"` do MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q` if [ $MIN_PG -gt 0 ]; then NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q` fi done # Finish with results case $KERN in '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`; echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;; '2.6') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;; *) echo "Unrecognized kernel version $KERN. Exiting." ;; esac # End
所以問題的分析和解決思路還是最重要的,可能大多數時候碰到問題都是頭疼醫頭,腳疼醫腳。處理問題還是沒有從整體的情況來做,以點帶面,能夠舉一反三才是問題處理的高手。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/25462274/viewspace-1963454/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- alert日誌中的一條ora警告資訊的分析
- 一條看似平常的報警郵件所做的分析
- 一條sql語句的建議調優分析SQL
- 有條件分析oracle日誌Oracle
- 一條大sql的調優SQL
- oracle 11g的警告日誌Oracle
- Oracle之外部表警告日誌Oracle
- alert日誌中出現ash size的警告
- 刪除oracle 11g的警告日誌和監聽日誌Oracle
- 日誌分析-apache日誌分析Apache
- oracle效能調優:管理oracle日誌之調整線上日誌檔案Oracle
- 由sqlplus中的一個小細節所做的折騰SQL
- 警告日誌檔案alert_.log
- 使用Oracle的外部表查詢警告日誌Oracle
- 一條日誌訊息的現代生活
- 日誌分析一例
- 分析一段日誌
- log buffer及日誌管理深入分析及效能調整(一)
- ELK一個優秀的日誌收集、搜尋、分析的解決方案
- 對一次 GC日誌的分析GC
- [日誌分析篇]-利用ELK分析jumpserver日誌-日誌拆分篇Server
- ASM的優點總結--關於日誌檔案調整ASM
- 使用Oracle的外部表查詢警告日誌檔案Oracle
- 【11g】11g 中警告日誌的位置
- 使用外部表訪問警告日誌檔案
- Beego框架的一條神祕日誌引發的思考Go框架
- oracle日誌分析從列表中移去一個日誌檔案Oracle
- 玄機-第二章日誌分析-apache日誌分析Apache
- Apche日誌系列(4):日誌分析(轉)
- SAP 錯誤日誌的調查
- Android優化系列一:日誌清理Android優化
- FDOAGENT日誌分析
- crash日誌分析
- awk分析日誌
- pg日誌分析
- Oracle 警告日誌 (alert log) 中包含哪些內容 ?Oracle
- [awstats]一個基於perl的日誌分析工具
- MySQL慢日誌功能分析及優化增強MySql優化