Configuring Nagios with check_oracle_health

Posted by fei890910 on 2014-03-12
Environment: 192.168.10.101 (monitoring server)
             192.168.10.10 (monitored host), which runs the Oracle database.

Preparation: create a user in the database and grant it privileges
CREATE USER nagios IDENTIFIED BY oradbmon; 
GRANT CREATE SESSION TO nagios;
GRANT SELECT any dictionary TO nagios;
GRANT SELECT ON V_$SYSSTAT TO nagios;
GRANT SELECT ON V_$INSTANCE TO nagios;
GRANT SELECT ON V_$LOG TO nagios;
GRANT SELECT ON SYS.DBA_DATA_FILES TO nagios;
GRANT SELECT ON SYS.DBA_FREE_SPACE TO nagios;

GRANT SELECT ON sys.dba_tablespaces TO nagios;
GRANT SELECT ON dba_temp_files TO nagios;
GRANT SELECT ON sys.v_$Temp_extent_pool TO nagios;
GRANT SELECT ON sys.v_$TEMP_SPACE_HEADER  TO nagios;
GRANT SELECT ON sys.v_$session TO nagios;

1. Check that Perl is installed on the monitored host, then install DBI there.
Run perl -v; output like the following means Perl is installed:
This is perl, v5.8.8 built for x86_64-linux-thread-multi
Copyright 1987-2006, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at the Perl Home Page.
Download and install DBI:
wget
tar zxvf DBI-1.609.tar.gz 
cd DBI-1.609
perl Makefile.PL 
make all
make install
2. If there were no errors, go on to install DBD-Oracle:
wget
tar zxvf DBD-Oracle-1.52.tar.gz 
cd DBD-Oracle-1.52
perl Makefile.PL
If ORACLE_HOME is not set, the command above fails with an error like this:
Using DBI 1.605 (for perl 5.008005 on i386-linux-thread-multi) installed in /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi/auto/DBI/
Configuring DBD::Oracle for perl 5.008005 on linux (i386-linux-thread-multi) 
Remember to actually *READ* the README file! Especially if you have any problems. 
Trying to find an ORACLE_HOME
Your LD_LIBRARY_PATH env var is set to '' 
      The ORACLE_HOME environment variable is not set and I couldn't guess it.
      It must be set to hold the path to an Oracle installation directory
      on this machine (or a machine with a compatible architecture).
      See the appropriate README file for your OS for more information.
      ABORTED!
In that case, set ORACLE_HOME for the current shell. Take the value from the oracle user's environment, for example:
export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
Then run perl Makefile.PL again; it should now succeed. Continue with:
make 
make install
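
The ABORTED error above comes from a missing ORACLE_HOME, so it can help to check the variable before running perl Makefile.PL. The following is a minimal sketch; the function name oracle_home_ok is a hypothetical helper, not part of DBD-Oracle.

```shell
# Return 0 when ORACLE_HOME points at an existing directory, 1 otherwise.
# oracle_home_ok is a hypothetical helper name used only for this sketch.
oracle_home_ok() {
    if [ -z "${ORACLE_HOME:-}" ]; then
        echo "ORACLE_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$ORACLE_HOME" ]; then
        echo "ORACLE_HOME=$ORACLE_HOME is not a directory" >&2
        return 1
    fi
    return 0
}
```

A possible use is: oracle_home_ok && perl Makefile.PL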
3. The last step on the monitored host is to install the plugin itself, check_oracle_health:
wget
tar zxvf check_oracle_health-1.6.3.tar.gz 
cd check_oracle_health-1.6.3
./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-mymodules-dir=/usr/local/nagios/libexec --with-mymodules-dyndir=/usr/local/nagios/libexec
make all
make install
Use your own Nagios installation path in the step above.
Then check that the check_oracle_health plugin now exists in /usr/local/nagios/libexec on the monitored host.

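4. Switch to the oracle user and try running the plugin. The database listener should be running for this test.
/usr/local/nagios/libexec/check_oracle_health --connect=<your_oracle_SID> --user=<oracle_user> --password=<oracle_password> --mode=tnsping
Output like the following means it works:
OK - connection established to <your_oracle_SID>.
You can also replace --mode=tnsping with --mode=tablespace-usage to check that usage is reported for all tablespaces.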

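5. The plugin works as the oracle user, but here it will be run as root, so all of the oracle user's environment variables must also be added to root's environment. Then repeat step 4 as root. If it works, the setup is fine; if it fails, the environment variables have not been set correctly.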
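
One way to supply the Oracle variables without editing root's profile is to set them only for the plugin call. This is a sketch under assumptions: with_oracle_env is a hypothetical helper, the default ORACLE_HOME is the path used earlier in this article, and the set of variables your client needs may be larger (for example TNS_ADMIN).

```shell
# Run a command with the Oracle client environment set for that call only.
# with_oracle_env is a hypothetical helper; the default path is the
# ORACLE_HOME used earlier in this article and must be adjusted to yours.
with_oracle_env() {
    (
        # The subshell keeps these changes out of the caller's environment.
        ORACLE_HOME="${ORACLE_HOME:-/u01/app/oracle/product/10.2.0/db_1}"
        LD_LIBRARY_PATH="$ORACLE_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
        PATH="$ORACLE_HOME/bin:$PATH"
        export ORACLE_HOME LD_LIBRARY_PATH PATH
        "$@"
    )
}

# Example call (not executed here):
# with_oracle_env /usr/local/nagios/libexec/check_oracle_health \
#     --connect=prod --user=nagios --password=oradbmon --mode=tnsping
```

Because the variables are set inside a subshell, the exit status of the plugin is passed through unchanged, which is what Nagios relies on.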

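6. The plugin now works locally on the monitored host. To let the monitoring server call it, add command definitions to nrpe.cfg on the monitored host. I added three services to begin with, after a generic entry:
vi /usr/local/nagios/etc/nrpe.cfg
command[check_oracle_health]=/usr/local/nagios/libexec/check_oracle_health --connect=<your_oracle_SID> --user=<oracle_user> --password=<oracle_password> --mode=tablespace-usage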

command[check_oracle_health_tbs]=/usr/local/nagios/libexec/check_oracle_health --connect=prod --user=nagios --password=oradbmon --mode=tablespace-usage
command[check_oracle_health_tnsping]=/usr/local/nagios/libexec/check_oracle_health --connect=prod --user=nagios --password=oradbmon --mode=tnsping
command[check_oracle_health_soft]=/usr/local/nagios/libexec/check_oracle_health --connect=prod --user=nagios --password=oradbmon --mode=soft-parse-ratio
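Save and exit, then restart NRPE on the monitored host. Restart xinetd if NRPE runs under xinetd; if it runs as a standalone daemon, start it with the nrpe command shown after the xinetd output: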
[root@James10g etc]# /etc/rc.d/init.d/xinetd restart
Stopping xinetd:                                           [  OK  ]
Starting xinetd:                                           [  OK  ]

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
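
The three nrpe.cfg entries in step 6 differ only in the command name and the --mode value, so they can be generated rather than typed by hand. A minimal sketch follows; nrpe_oracle_command is a hypothetical helper, and the variable values simply mirror the ones used in this article.

```shell
# Values taken from the nrpe.cfg entries in this article; adjust to your setup.
PLUGIN=/usr/local/nagios/libexec/check_oracle_health
CONNECT=prod
DB_USER=nagios
DB_PASS=oradbmon

# Print one nrpe.cfg command definition.
# $1: suffix for the NRPE command name, $2: plugin mode.
# nrpe_oracle_command is a hypothetical helper name.
nrpe_oracle_command() {
    printf 'command[check_oracle_health_%s]=%s --connect=%s --user=%s --password=%s --mode=%s\n' \
        "$1" "$PLUGIN" "$CONNECT" "$DB_USER" "$DB_PASS" "$2"
}

# Prints the three definitions; redirect the output into nrpe.cfg if wanted.
nrpe_oracle_command tbs tablespace-usage
nrpe_oracle_command tnsping tnsping
nrpe_oracle_command soft soft-parse-ratio
```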
7. On the monitoring server, add the following to two files under /usr/local/nagios/etc/objects:
/usr/local/nagios/etc/objects/hosts.cfg
define host{
        use                     linux-server
        host_name               James10g_oracle
        alias                   Oracle_10g
        address                 192.168.10.10
        }
/usr/local/nagios/etc/objects/services.cfg
define service{
        use                                     generic-service         ; Name of service template to use
        host_name                       James10g_oracle
        service_description          check-oracle-tablespace
        check_command              check_nrpe!check_oracle_health_tbs
        }
define service{
        use                                    generic-service         ; Name of service template to use
        host_name                       James10g_oracle
        service_description          check-oracle-tnsping
        check_command              check_nrpe!check_oracle_health_tnsping
        }
define service{
        use                                    generic-service         ; Name of service template to use
        host_name                       James10g_oracle
        service_description          check-oracle-soft-parse-ratio
        check_command              check_nrpe!check_oracle_health_soft
        }

8. Now test the plugin from the monitoring server:
/usr/local/nagios/libexec/check_nrpe -H <monitored_host_IP> -c check_oracle_health
[root@node1 objects]# /usr/local/nagios/libexec/check_nrpe -H 192.168.10.10 -c check_oracle_health_tbs
OK - tbs USERS usage is 4.87%
tbs UNDOTBS1 usage is 0.00%
tbs TOOLS usage is 54.94%
tbs TEMP usage is 0.00%
tbs SYSTEM usage is 1.47%
tbs SYSAUX usage is 0.77%
tbs EXAMPLE usage is 0.21% | 'tbs_users_usage_pct'=4.87%;90;98
'tbs_users_usage'=1597MB;29491;32112;0;32767
'tbs_users_alloc'=1601MB;;;0;32767
'tbs_undotbs1_usage_pct'=0.00%;90;98
'tbs_undotbs1_usage'=0MB;29491;32112;0;32767
'tbs_undotbs1_alloc'=1135MB;;;0;32767
'tbs_tools_usage_pct'=54.94%;90;98
'tbs_tools_usage'=164MB;270;294;0;300
'tbs_tools_alloc'=300MB;;;0;300
'tbs_temp_usage_pct'=0.00%;90;98
'tbs_temp_usage'=0MB;29491;32112;0;32767
'tbs_temp_alloc'=462MB;;;0;32767
'tbs_system_usage_pct'=1.47%;90;98
'tbs_system_usage'=481MB;29491;32112;0;32767
'tbs_system_alloc'=490MB;;;0;32767
'tbs_sysaux_usage_pct'=0.77%;90;98
'tbs_sysaux_usage'=253MB;29491;32112;0;32767
'tbs_sysaux_alloc'=260MB;;;0;32767
'tbs_example_usage_pct'=0.21%;90;98
'tbs_example_usage'=68MB;29491;32112;0;32767
'tbs_example_alloc'=100MB;;;0;32767
[root@node1 objects]# /usr/local/nagios/libexec/check_nrpe -H 192.168.10.10 -c check_oracle_health_tnsping
OK - connection established to prod.
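
In the tablespace output above, everything after the | character is Nagios performance data in the form 'label'=value;warn;crit;min;max. If you want only the percentage figures, for example for a quick manual check, they can be filtered out with standard tools. This is a sketch; perfdata_pct is a hypothetical helper, not part of the plugin.

```shell
# Read plugin output on stdin and print "label value" for every
# percentage metric, e.g. "tbs_users_usage_pct 4.87".
# perfdata_pct is a hypothetical helper name.
perfdata_pct() {
    # Put each space-separated token on its own line, then keep only
    # tokens of the form 'something_pct'=NUMBER%...
    tr ' ' '\n' | sed -n "s/^'\([^']*_pct\)'=\([0-9.]*\)%.*/\1 \2/p"
}

# Example call (not executed here):
# /usr/local/nagios/libexec/check_nrpe -H 192.168.10.10 -c check_oracle_health_tbs | perfdata_pct
```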

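If you see the error NRPE: Command 'check_oracle_health_tbs' not defined
    the NRPE daemon on the monitored host has no command with that name. Check that the command[check_oracle_health_tbs] line exists in nrpe.cfg on the monitored host, that the name matches the one passed to check_nrpe -c, and that NRPE was restarted after the file was edited.
    Separately, if you pass arguments to a command from the monitoring server (check_nrpe -a), nrpe.cfg on the monitored host also needs:
    dont_blame_nrpe=1
    This allows commands to be run with arguments. The definitions in this article hard-code their options, so they do not require it.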
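
To check quickly whether a command name is actually defined on the monitored host, you can list the names found in nrpe.cfg. The sketch below assumes each definition is written on a single line as command[name]=...; nrpe_commands and nrpe_has_command are hypothetical helper names.

```shell
# List the command names defined in an NRPE config file.
# $1: path to nrpe.cfg. Commented-out lines are ignored.
nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)][[:space:]]*=.*/\1/p' "$1"
}

# Return 0 if the named command is defined in the file, 1 otherwise.
# $1: path to nrpe.cfg, $2: command name as passed to check_nrpe -c.
nrpe_has_command() {
    nrpe_commands "$1" | grep -Fxq -- "$2"
}

# Example call (not executed here):
# nrpe_has_command /usr/local/nagios/etc/nrpe.cfg check_oracle_health_tbs \
#     || echo "check_oracle_health_tbs is not defined"
```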
Restart Nagios on the monitoring server:
[root@node1 objects]# /etc/init.d/nagios restart

Finally, once the new services show up in the Nagios web interface, the setup is essentially complete.

From the ITPUB blog, link: http://blog.itpub.net/29108064/viewspace-1108306/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
