Building a Nagios Monitoring System

Posted by stonebox1122 on 2015-03-15

 

1 Environment Setup

1.1 Host Information

Hostname      Version    IP                Installed software                               Role
linuxserver   RHEL 6.5   192.168.230.136   nagios-4.0.8, nagios-plugins-2.0.3, nrpe-2.15    Monitoring server
linuxclient   RHEL 6.5   192.168.230.137   nagios-plugins-2.0.3, nrpe-2.15                  Monitored client

Configure YUM as described at http://blog.itpub.net/28536251/viewspace-1444918/.

Server network configuration:

[root@linuxserver ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0

HWADDR=00:0c:29:68:ab:26

TYPE=Ethernet

ONBOOT=yes

NM_CONTROLLED=no

BOOTPROTO=none

IPADDR=192.168.230.136

NETMASK=255.255.255.0

GATEWAY=192.168.230.2

DNS1=192.168.230.2

[root@linuxserver ~]# cat /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=linuxserver

[root@linuxserver ~]# cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

192.168.230.136   linuxserver

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

 

Client network configuration:

[root@linuxclient ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0

HWADDR=00:0c:29:ba:4f:56

TYPE=Ethernet

ONBOOT=yes

NM_CONTROLLED=no

BOOTPROTO=none

IPADDR=192.168.230.137

NETMASK=255.255.255.0

GATEWAY=192.168.230.2

DNS1=192.168.230.2

[root@linuxclient ~]# cat /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=linuxclient

[root@linuxclient ~]# cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

192.168.230.137   linuxclient

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

 

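1.2 Firewall Settings

To simplify testing, temporarily turn off the firewall on both the server and the client. Once the system is up and running, re-enable the firewall and set appropriate rules.

Disable the firewall on the server: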

[root@linuxserver ~]# /etc/init.d/iptables stop

iptables: Setting chains to policy ACCEPT: filter          [  OK  ]

iptables: Flushing firewall rules:                         [  OK  ]

iptables: Unloading modules:                               [  OK  ]

[root@linuxserver ~]# /etc/init.d/iptables status

iptables: Firewall is not running.

[root@linuxserver ~]# chkconfig iptables off

[root@linuxserver ~]# chkconfig --list iptables

iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off

 

Disable the firewall on the client:

[root@linuxclient ~]# /etc/init.d/iptables stop

iptables: Setting chains to policy ACCEPT: filter          [  OK  ]

iptables: Flushing firewall rules:                         [  OK  ]

iptables: Unloading modules:                               [  OK  ]

[root@linuxclient ~]# /etc/init.d/iptables status

iptables: Firewall is not running.

[root@linuxclient ~]# chkconfig iptables off

[root@linuxclient ~]# chkconfig --list iptables

iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off
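
Once the setup is finished and the firewall is switched back on, rules along the following lines might be used. This is only a sketch under assumptions: the helper name print_nagios_firewall_rules is made up for illustration, it prints the commands instead of running them, and it assumes that TCP port 80 on the server (web interface) and TCP port 5666 on the client (NRPE, limited to the server's address) are the only ports this setup needs.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) iptables commands for this setup.
# $1 = role ("server" or "client"), $2 = monitoring server IP (used for "client").
print_nagios_firewall_rules() {
    role=$1
    server_ip=$2
    case "$role" in
        server)
            # Let browsers reach the Nagios web interface served by httpd.
            rule="iptables -I INPUT -p tcp --dport 80 -j ACCEPT"
            ;;
        client)
            # Let only the monitoring server reach the NRPE port.
            rule="iptables -I INPUT -p tcp -s ${server_ip} --dport 5666 -j ACCEPT"
            ;;
        *)
            echo "usage: print_nagios_firewall_rules server|client [server_ip]" >&2
            return 1
            ;;
    esac
    echo "/etc/init.d/iptables start"   # bring the firewall back up
    echo "$rule"                        # insert the exception at the top of INPUT
    echo "/etc/init.d/iptables save"    # persist the running rules
    echo "chkconfig iptables on"        # start the firewall at boot again
}

print_nagios_firewall_rules client 192.168.230.136
```

Reviewing the printed commands before running them as root keeps the step easy to audit; the same function called with "server" prints the rule for the web interface.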

 

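1.3 SELinux Settings

    To simplify testing, temporarily disable SELinux on both the server and the client. Once the system is up and running, re-enable SELinux and configure its policy.

Disable SELinux on the server:

Edit /etc/selinux/config with vim, change SELINUX=enforcing to SELINUX=disabled, then reboot.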

[root@linuxserver ~]# cat /etc/selinux/config

# This file controls the state of SELinux on the system.

# SELINUX= can take one of these three values:

#     enforcing - SELinux security policy is enforced.

#     permissive - SELinux prints warnings instead of enforcing.

#     disabled - No SELinux policy is loaded.

SELINUX=disabled

# SELINUXTYPE= can take one of these two values:

#     targeted - Targeted processes are protected,

#     mls - Multi Level Security protection.

SELINUXTYPE=targeted

[root@linuxserver ~]# init 6

 

Disable SELinux on the client:

[root@linuxclient ~]# cat /etc/selinux/config

# This file controls the state of SELinux on the system.

# SELINUX= can take one of these three values:

#     enforcing - SELinux security policy is enforced.

#     permissive - SELinux prints warnings instead of enforcing.

#     disabled - No SELinux policy is loaded.

SELINUX=disabled

# SELINUXTYPE= can take one of these two values:

#     targeted - Targeted processes are protected,

#     mls - Multi Level Security protection.

SELINUXTYPE=targeted

[root@linuxclient ~]# init 6
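
The manual edit can also be scripted. The sketch below assumes a helper named set_selinux_mode (a name invented here) that rewrites the SELINUX= line of a given file. It is demonstrated on a scratch copy rather than on the live /etc/selinux/config, and a reboot is still needed on a real host for the change to take effect.

```shell
#!/bin/sh
# Hypothetical helper: set the SELINUX= line in an SELinux config file.
# $1 = config file path, $2 = mode (enforcing, permissive or disabled).
set_selinux_mode() {
    file=$1
    mode=$2
    tmp="${file}.tmp.$$"
    # Only the SELINUX= line is rewritten; SELINUXTYPE= and comments stay as they are.
    sed "s/^SELINUX=.*/SELINUX=${mode}/" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Demonstration on a scratch copy, not on /etc/selinux/config itself.
demo=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo"
set_selinux_mode "$demo" disabled
cat "$demo"
```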

 

2 Installing and Configuring Nagios to Monitor linuxserver

2.1 Installing Prerequisite Packages

[root@linuxserver ~]# yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel

 

2.2 Installing httpd and PHP

[root@linuxserver ~]# yum install -y httpd php

[root@linuxserver ~]# /etc/init.d/httpd start

Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.230.136 for ServerName

                                                           [  OK  ]

    Following the warning above, edit the httpd configuration file so that ServerName is set as follows:

[root@linuxserver ~]# grep ServerName /etc/httpd/conf/httpd.conf | grep 80

ServerName linuxserver:80

httpd now restarts without the warning.

[root@linuxserver ~]# /etc/init.d/httpd restart

Stopping httpd:                                            [  OK  ]

Starting httpd:                                            [  OK  ]
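
The ServerName change can be applied without opening an editor. The following is a rough sketch, assuming a helper called set_server_name (a hypothetical name) and a value without "/" characters; it is exercised on a scratch file here instead of /etc/httpd/conf/httpd.conf.

```shell
#!/bin/sh
# Hypothetical helper: set the global ServerName in an httpd.conf-style file.
# $1 = config file, $2 = value such as linuxserver:80 (must not contain "/").
set_server_name() {
    file=$1
    name=$2
    tmp="${file}.tmp.$$"
    if grep -q '^#\{0,1\}ServerName ' "$file"; then
        # Replace the existing directive, including a commented-out "#ServerName" line.
        sed "s/^#\{0,1\}ServerName .*/ServerName ${name}/" "$file" > "$tmp" && mv "$tmp" "$file"
    else
        # No directive present yet: append one.
        echo "ServerName ${name}" >> "$file"
    fi
}

# Demonstration on a scratch file.
demo=$(mktemp)
printf '# ServerName gives the name of the server.\n#ServerName www.example.com:80\n' > "$demo"
set_server_name "$demo" linuxserver:80
grep ServerName "$demo"
```

The pattern deliberately skips explanatory comment lines that begin with "# ServerName" (hash followed by a space), so only the directive itself is touched.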

Open the server address http://192.168.230.136/ in a browser. If the default httpd page is displayed, httpd is installed correctly.

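    Run the following command to create a PHP test page, then open http://192.168.230.136/phpinfo.php in a browser. If the PHP information page is displayed, PHP is installed correctly.

[root@linuxserver ~]# echo "<?php phpinfo(); ?>" > /var/www/html/phpinfo.php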

 

2.3 User and Directory Setup

[root@linuxserver ~]# useradd -s /sbin/nologin nagios

[root@linuxserver ~]# mkdir /usr/local/nagios

[root@linuxserver ~]# chown -R nagios:nagios /usr/local/nagios

[root@linuxserver ~]# ll -d /usr/local/nagios

drwxr-xr-x 2 nagios nagios 4096 Mar 14 11:17 /usr/local/nagios

 

2.4 Installing Nagios

    Download the installation packages that Nagios requires before proceeding.

[root@linuxserver ~]# tar -xvzf nagios-4.0.8.tar.gz

[root@linuxserver ~]# cd nagios-4.0.8

[root@linuxserver nagios-4.0.8]# ./configure --prefix=/usr/local/nagios/

[root@linuxserver nagios-4.0.8]# make all

[root@linuxserver nagios-4.0.8]# make install

# This installs the main program, CGIs, and HTML files

[root@linuxserver nagios-4.0.8]# make install-init

# This installs the init script in /etc/rc.d/init.d

[root@linuxserver nagios-4.0.8]# make install-commandmode

#This installs and configures permissions on the directory for holding the external command file

[root@linuxserver nagios-4.0.8]# make install-config

# This installs *SAMPLE* config files in /usr/local/nagios/etc

[root@linuxserver nagios-4.0.8]# make install-webconf

# This installs the Apache config file for the Nagios web interface

 

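    Change to the installation directory. If the six directories listed in the table below are present, the installation succeeded.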

[root@linuxserver nagios-4.0.8]# cd /usr/local/nagios/

[root@linuxserver nagios]# ls

bin  etc  libexec  sbin  share  var

No.   Directory   Purpose
1     bin         Executable programs
2     etc         Configuration files
3     libexec     External plugins
4     sbin        CGI files
5     share       Web page files
6     var         Log files

 

   Add the nagios service:

[root@linuxserver ~]# chkconfig --add nagios

[root@linuxserver ~]# chkconfig nagios on

[root@linuxserver ~]# chkconfig --list nagios

nagios          0:off   1:off   2:on    3:on    4:on    5:on    6:off

 

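Because /etc/httpd/conf.d/nagios.conf specifies an authentication file, that file must be created together with a user name and password, so that users are authenticated when they access Nagios through the web. Using nagiosadmin as the user name is recommended; the reason is explained in section 2.5.2.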

[root@linuxserver ~]# cat /etc/httpd/conf.d/nagios.conf

# SAMPLE CONFIG SNIPPETS FOR APACHE WEB SERVER

#

# This file contains examples of entries that need

# to be incorporated into your Apache web server

# configuration file.  Customize the paths, etc. as

# needed to fit your system.

 

ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"

<Directory "/usr/local/nagios/sbin">

#  SSLRequireSSL

   Options ExecCGI

   AllowOverride None

   Order allow,deny

   Allow from all

#  Order deny,allow

#  Deny from all

#  Allow from 127.0.0.1

   AuthName "Nagios Access"

   AuthType Basic

   AuthUserFile /usr/local/nagios/etc/htpasswd.users

   Require valid-user

</Directory>

Alias /nagios "/usr/local/nagios/share"

<Directory "/usr/local/nagios/share">

#  SSLRequireSSL

   Options None

   AllowOverride None

   Order allow,deny

   Allow from all

#  Order deny,allow

#  Deny from all

#  Allow from 127.0.0.1

   AuthName "Nagios Access"

   AuthType Basic

   AuthUserFile /usr/local/nagios/etc/htpasswd.users

   Require valid-user

</Directory>

[root@linuxserver ~]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

New password:

Re-type new password:

Adding password for user nagiosadmin

 

    Start nagios:

[root@linuxserver ~]# /etc/init.d/nagios start

Starting nagios: done.

Restart httpd:

[root@linuxserver ~]# /etc/init.d/httpd restart

Stopping httpd:                                            [  OK  ]

Starting httpd:                                            [  OK  ]

Open http://192.168.230.136/nagios/ in a browser and enter the user name and password. If the Nagios web interface is displayed, Nagios is installed correctly.

 

2.5 Configuring Nagios

    The Nagios configuration files are located in /usr/local/nagios/etc/. The table below describes the purpose of each file:

[root@linuxserver ~]# tree /usr/local/nagios/etc/

/usr/local/nagios/etc/

├── cgi.cfg

├── htpasswd.users

├── nagios.cfg

├── objects

│   ├── commands.cfg

│   ├── contacts.cfg

│   ├── localhost.cfg

│   ├── printer.cfg

│   ├── switch.cfg

│   ├── templates.cfg

│   ├── timeperiods.cfg

│   └── windows.cfg

└── resource.cfg

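No.   File              Purpose
1     cgi.cfg           Controls access to the CGIs
2     nagios.cfg        Main configuration file
3     resource.cfg      Variable definitions that other configuration files can reference, such as $USER1$
4     commands.cfg      Command definitions that other configuration files can reference
5     contacts.cfg      Contacts and contact groups
6     localhost.cfg     Monitoring configuration for the local host
7     printer.cfg       Template for monitoring printers; not enabled by default
8     switch.cfg        Template for monitoring routers and switches; not enabled by default
9     templates.cfg     Host and service templates that other configuration files can reference
10    timeperiods.cfg   Time periods used by Nagios monitoring
11    windows.cfg       Template for monitoring Windows hosts; not enabled by default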

The most important configuration files are described below.

2.5.1 nagios.cfg

[root@linuxserver ~]#  grep -v '^#' /usr/local/nagios/etc/nagios.cfg | grep -v '^$'

log_file=/usr/local/nagios/var/nagios.log

cfg_file=/usr/local/nagios/etc/objects/commands.cfg

cfg_file=/usr/local/nagios/etc/objects/contacts.cfg

cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg

cfg_file=/usr/local/nagios/etc/objects/templates.cfg

cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

object_cache_file=/usr/local/nagios/var/objects.cache

precached_object_file=/usr/local/nagios/var/objects.precache

resource_file=/usr/local/nagios/etc/resource.cfg

status_file=/usr/local/nagios/var/status.dat

status_update_interval=10

nagios_user=nagios

nagios_group=nagios

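(some parameters omitted)

    nagios.cfg is the core Nagios configuration file. Its cfg_file directives pull in the object configuration files; any additional object configuration file must be added here before it takes effect.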
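
To see at a glance which object files a given nagios.cfg pulls in, and whether they all exist, something like the following could be used. It is a minimal sketch: list_cfg_files and missing_cfg_files are hypothetical helper names, and only cfg_file= lines are considered (cfg_dir= entries are ignored).

```shell
#!/bin/sh
# Hypothetical helper: print every object file referenced by a cfg_file= line.
# $1 = path to nagios.cfg; commented-out lines are skipped automatically.
list_cfg_files() {
    sed -n 's/^cfg_file=//p' "$1"
}

# Hypothetical helper: report referenced object files that do not exist.
missing_cfg_files() {
    list_cfg_files "$1" | while IFS= read -r path; do
        [ -f "$path" ] || echo "missing: $path"
    done
}

# Demonstration in a scratch directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/commands.cfg"
printf 'cfg_file=%s/commands.cfg\n#cfg_file=%s/switch.cfg\ncfg_file=%s/hosts.cfg\n' \
    "$demo_dir" "$demo_dir" "$demo_dir" > "$demo_dir/nagios.cfg"
list_cfg_files "$demo_dir/nagios.cfg"
missing_cfg_files "$demo_dir/nagios.cfg"
```

On the monitoring server the same functions would be pointed at /usr/local/nagios/etc/nagios.cfg.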

 

2.5.2 cgi.cfg

[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/cgi.cfg | grep -v '^$'

main_config_file=/usr/local/nagios/etc/nagios.cfg

physical_html_path=/usr/local/nagios/share

url_html_path=/nagios

show_context_help=0

use_pending_states=1

use_authentication=1

use_ssl_authentication=0

authorized_for_system_information=nagiosadmin

authorized_for_configuration_information=nagiosadmin

authorized_for_system_commands=nagiosadmin

authorized_for_all_services=nagiosadmin

authorized_for_all_hosts=nagiosadmin

authorized_for_all_service_commands=nagiosadmin

authorized_for_all_host_commands=nagiosadmin

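(some parameters omitted)

    In this file, the authorized_* parameters all default to nagiosadmin, which is why the password file was created for nagiosadmin earlier. If you use a different user name, append it after nagiosadmin in these parameters. See the comments in the configuration file for the meaning of each parameter.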
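
As an illustration of granting a second web user the same rights, the sketch below appends a user name to every authorized_for_* line. The helper name authorize_cgi_user and the user opsuser are assumptions made up for this example, and the demonstration runs on a scratch file rather than the real cgi.cfg.

```shell
#!/bin/sh
# Hypothetical helper: append a user to every authorized_for_* line of a cgi.cfg.
# $1 = cgi.cfg path, $2 = user name to add. Lines already listing the user are kept as-is.
authorize_cgi_user() {
    file=$1
    user=$2
    tmp="${file}.tmp.$$"
    awk -v user="$user" '
        /^authorized_for_[a-z_]+=/ {
            value = substr($0, index($0, "=") + 1)   # text after the first "="
            n = split(value, names, ",")             # existing comma-separated users
            found = 0
            for (i = 1; i <= n; i++) {
                if (names[i] == user) found = 1
            }
            if (!found) {
                if (value == "") $0 = $0 user
                else             $0 = $0 "," user
            }
        }
        { print }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Demonstration on a scratch file.
demo=$(mktemp)
printf 'use_authentication=1\nauthorized_for_all_hosts=nagiosadmin\n' > "$demo"
authorize_cgi_user "$demo" opsuser
cat "$demo"
```

The new user also needs a password entry; running htpasswd against /usr/local/nagios/etc/htpasswd.users without the -c option adds a user to the existing file instead of recreating it.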

 

2.5.3 resource.cfg

[root@linuxserver ~]#  grep -v '^#' /usr/local/nagios/etc/resource.cfg | grep -v '^$'

$USER1$=/usr/local/nagios/libexec

    In this file, the $USER1$ variable specifies the path where the Nagios plugins are installed.

 

2.5.4 localhost.cfg

[root@linuxserver ~]#  grep -v '^#' /usr/local/nagios/etc/objects/localhost.cfg | grep -v '^$'

define host{

        use                     linux-server

        host_name               localhost

        alias                    localhost

        address                 127.0.0.1

        }

define hostgroup{

        hostgroup_name         linux-servers

        alias                   Linux Servers

        members                localhost

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                PING

        check_command                  check_ping!100.0,20%!500.0,60%

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                Root Partition

        check_command                  check_local_disk!20%!10%!/

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                Current Users

        check_command                  check_local_users!20!50

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                Total Processes

        check_command                  check_local_procs!250!400!RSZDT

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                Current Load

        check_command                  check_local_load!5.0,4.0,3.0!10.0,6.0,4.0

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                Swap Usage

        check_command                  check_local_swap!20!10

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                SSH

        check_command                  check_ssh

        notifications_enabled              0

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                HTTP

        check_command                  check_http

        notifications_enabled              0

        }

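    This file defines the parameters and services monitored on the local host. Here "linux-server" is a host template defined in templates.cfg, and "local-service" is a service template defined in templates.cfg.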

 

2.5.5 templates.cfg

[root@linuxserver ~]# sed -i 's/;.*$//g' /usr/local/nagios/etc/objects/templates.cfg

[root@linuxserver ~]#  grep -v '^#' /usr/local/nagios/etc/objects/templates.cfg | grep -v '^$'

define contact{

        name                            generic-contact

        service_notification_period     24x7

        host_notification_period        24x7

        service_notification_options    w,u,c,r,f,s

        host_notification_options       d,u,r,f,s

        service_notification_commands   notify-service-by-email

        host_notification_commands      notify-host-by-email

        register                        0

        }

define host{

        name                            generic-host

        notifications_enabled           1

        event_handler_enabled           1

        flap_detection_enabled          1

        process_perf_data               1

        retain_status_information       1

        retain_nonstatus_information    1

        notification_period             24x7

        register                        0

        }

define host{

        name                            linux-server

        use                             generic-host

        check_period                    24x7

        check_interval                  5

        retry_interval                  1

        max_check_attempts              10

        check_command                   check-host-alive

        notification_period             workhours

 

 

        notification_interval           120

        notification_options            d,u,r

        contact_groups                  admins

        register                        0

        }

define host{

        name                    windows-server

        use                     generic-host

        check_period            24x7

        check_interval          5

        retry_interval          1

        max_check_attempts      10

        check_command           check-host-alive

        notification_period     24x7

        notification_interval   30

        notification_options    d,r

        contact_groups          admins

        hostgroups              windows-servers

        register                0

        }

define host{

        name                    generic-printer

        use                     generic-host

        check_period            24x7

        check_interval          5

        retry_interval          1

        max_check_attempts      10

        check_command           check-host-alive

        notification_period     workhours

        notification_interval   30

        notification_options    d,r

        contact_groups          admins

        register                0

        }

define host{

        name                    generic-switch

        use                     generic-host

        check_period            24x7

        check_interval          5

        retry_interval          1

        max_check_attempts      10

        check_command           check-host-alive

        notification_period     24x7

        notification_interval   30

        notification_options    d,r

        contact_groups          admins

        register                0

        }

define service{

        name                            generic-service

        active_checks_enabled           1

        passive_checks_enabled          1

        parallelize_check               1

        obsess_over_service             1

        check_freshness                 0

        notifications_enabled           1

        event_handler_enabled           1

        flap_detection_enabled          1

        process_perf_data               1

        retain_status_information       1

        retain_nonstatus_information    1

        is_volatile                     0

        check_period                    24x7

        max_check_attempts              3

        normal_check_interval           10

        retry_check_interval            2

        contact_groups                  admins

        notification_options            w,u,c,r

        notification_interval           60

        notification_period             24x7

         register                        0

        }

define service{

        name                            local-service

        use                             generic-service

        max_check_attempts              4

        normal_check_interval           5

        retry_check_interval            1

        register                        0

        }

    This file defines the contact (notification), host, and service templates.

 

2.5.6 commands.cfg

[root@linuxserver ~]#  grep -v '^#' /usr/local/nagios/etc/objects/commands.cfg | grep -v '^$'

define command{

        command_name    notify-host-by-email

        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$

        }

define command{

        command_name    notify-service-by-email

        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$

        }

define command{

        command_name    check-host-alive

        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5

        }

define command{

        command_name    check_local_disk

        command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$

        }

define command{

        command_name    check_local_load

        command_line    $USER1$/check_load -w $ARG1$ -c $ARG2$

        }

define command{

        command_name    check_local_procs

        command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$

        }

define command{

        command_name    check_local_users

        command_line    $USER1$/check_users -w $ARG1$ -c $ARG2$

        }

define command{

        command_name    check_local_swap

        command_line    $USER1$/check_swap -w $ARG1$ -c $ARG2$

        }

define command{

        command_name    check_local_mrtgtraf

        command_line    $USER1$/check_mrtgtraf -F $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -e $ARG5$

        }

define command{

        command_name    check_ftp

        command_line    $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_hpjd

        command_line    $USER1$/check_hpjd -H $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_snmp

        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_http

        command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_ssh

        command_line    $USER1$/check_ssh $ARG1$ $HOSTADDRESS$

        }

define command{

        command_name    check_dhcp

        command_line    $USER1$/check_dhcp $ARG1$

        }

define command{

        command_name    check_ping

        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5

        }

define command{

        command_name    check_pop

        command_line    $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_imap

        command_line    $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_smtp

        command_line    $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_tcp

        command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$

        }

define command{

        command_name    check_udp

        command_line    $USER1$/check_udp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$

        }

define command{

        command_name    check_nt

        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$

        }

define command{

        command_name    process-host-perfdata

        command_line    /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios/var/host-perfdata.out

        }

define command{

        command_name    process-service-perfdata

        command_line    /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out

        }

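    This file defines the names and command lines of the commands used to monitor services. It relies on the $USER1$ definition from resource.cfg, and localhost.cfg references some of these commands.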
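
How a check_command value such as check_local_disk!20%!10%!/ turns into a real command line can be illustrated with a small expansion routine. This is only a sketch of the substitution idea, not Nagios's own code: expand_check is a hypothetical name, and it handles just $USER1$ and the $ARGn$ macros, with the "!"-separated fields mapped to $ARG1$, $ARG2$ and so on.

```shell
#!/bin/sh
# Hypothetical helper: expand a command_line template for one check_command value.
# $1 = command_line template, $2 = check_command value, $3 = value of $USER1$.
expand_check() {
    printf '%s\n' "$1" | awk -v check="$2" -v user1="$3" '{
        n = split(check, parts, "!")        # parts[1] is the command name itself
        gsub(/\$USER1\$/, user1)            # plugin directory from resource.cfg
        for (i = 2; i <= n; i++) {
            # parts[2] replaces $ARG1$, parts[3] replaces $ARG2$, and so on.
            gsub("\\$ARG" (i - 1) "\\$", parts[i])
        }
        print
    }'
}

expand_check '$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' \
             'check_local_disk!20%!10%!/' \
             /usr/local/nagios/libexec
# prints: /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
```

Running the printed command by hand on the server is a convenient way to debug a service definition whose result looks wrong in the web interface.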

 

2.5.7 contacts.cfg

[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/objects/contacts.cfg | grep -v '^$'

define contact{

        contact_name                    nagiosadmin           

        use                             generic-contact       

        alias                           Nagios Admin          

        email                           nagios@localhost      

        }

define contactgroup{

        contactgroup_name       admins

        alias                   Nagios Administrators

        members                 nagiosadmin

        }

    This file references the generic-contact definition in templates.cfg.

 

2.5.8 timeperiods.cfg

[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/objects/timeperiods.cfg | grep -v '^$'

define timeperiod{

        timeperiod_name 24x7

        alias           24 Hours A Day, 7 Days A Week

        sunday          00:00-24:00

        monday          00:00-24:00

        tuesday         00:00-24:00

        wednesday       00:00-24:00

        thursday        00:00-24:00

        friday          00:00-24:00

        saturday        00:00-24:00

        }

define timeperiod{

        timeperiod_name workhours

        alias           Normal Work Hours

        monday          09:00-17:00

        tuesday         09:00-17:00

        wednesday       09:00-17:00

        thursday        09:00-17:00

        friday          09:00-17:00

        }

define timeperiod{

        timeperiod_name none

        alias           No Time Is A Good Time

        }

define timeperiod{

        name                    us-holidays

        timeperiod_name         us-holidays

        alias                   U.S. Holidays

        january 1               00:00-00:00    

        monday -1 may           00:00-00:00    

        july 4                  00:00-00:00    

        monday 1 september      00:00-00:00    

        thursday 4 november     00:00-00:00     

        december 25             00:00-00:00    

        }

define timeperiod{

        timeperiod_name 24x7_sans_holidays

        alias           24x7 Sans Holidays

        use             us-holidays             ; Get holiday exceptions from other timeperiod

        sunday          00:00-24:00

        monday          00:00-24:00

        tuesday         00:00-24:00

        wednesday       00:00-24:00

        thursday        00:00-24:00

        friday          00:00-24:00

        saturday        00:00-24:00

        }

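This file defines the monitoring time periods. The templates shown earlier use "24x7" for checks and, in a few cases, "workhours" for notifications.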

 

 

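2.6 Installing the Nagios Plugins

After Nagios is installed and configured, the web interface shows that none of the services on the monitoring server is actually being checked. This is because no external plugins have been installed in /usr/local/nagios/libexec/ yet. The next step is to install the Nagios plugin package, nagios-plugins.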

[root@linuxserver ~]# ll /usr/local/nagios/libexec/

total 0


[root@linuxserver ~]# tar -xvzf nagios-plugins-2.0.3.tar.gz

[root@linuxserver ~]# cd nagios-plugins-2.0.3

[root@linuxserver nagios-plugins-2.0.3]# ./configure --prefix=/usr/local/nagios/

[root@linuxserver nagios-plugins-2.0.3]# make && make install

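Listing /usr/local/nagios/libexec/ again shows that many external plugins have been added.

Restart nagios and refresh the page; the status of the services on the monitoring server is now displayed.

If the HTTP service reports "HTTP WARNING: HTTP/1.1 403 Forbidden", the cause is that the HTTP check requests the site root, and httpd answers with 403 when /var/www/html/ contains no index.html. Creating the file resolves the warning: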

[root@linuxserver ~]# touch /var/www/html/index.html

[root@linuxserver ~]# /etc/init.d/httpd restart

Stopping httpd:                                            [  OK  ]

Starting httpd:                                            [  OK  ]

 

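3 Installing and Configuring Nagios to Monitor linuxclient

3.1 How It Works

    The monitoring server monitors the client through an add-on called NRPE.

NRPE consists of two parts:

·         the check_nrpe plugin, which resides on the monitoring host

·         the NRPE daemon, which runs on the remote Linux host (usually the monitored machine)

The overall monitoring process is as follows.

When Nagios needs to check a service or resource on a remote Linux host:

1.    Nagios runs the check_nrpe plugin and tells it what to check;

2.    the check_nrpe plugin connects to the remote NRPE daemon over SSL;

3.    the NRPE daemon runs the corresponding Nagios plugin to perform the check;

4.    the NRPE daemon returns the result to the check_nrpe plugin, which hands it to Nagios for processing.

Note: the NRPE daemon requires the Nagios plugins to be installed on the remote Linux host; without them, the daemon cannot perform any checks.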
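
The result that travels back in step 4 is the plugin's output text plus its exit status, and by the standard plugin convention the status 0, 1, 2 or 3 means OK, WARNING, CRITICAL or UNKNOWN. The sketch below illustrates that mapping; nagios_state_name and run_check are hypothetical helper names, and any status outside 0 to 2 is simply labelled UNKNOWN here.

```shell
#!/bin/sh
# Map a plugin exit status to the conventional Nagios state name.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3 by convention; other values are treated alike here
    esac
}

# Hypothetical wrapper: run any check command and label its result.
run_check() {
    output=$("$@")
    status=$?
    echo "$(nagios_state_name "$status"): $output"
    return "$status"
}

# Stand-in for a real plugin: prints a message and exits with status 1.
run_check sh -c 'echo "load average is high"; exit 1'
# prints: WARNING: load average is high
```

Once NRPE is installed as described in the following sections, the same wrapper could be pointed at the real plugin, for example run_check /usr/local/nagios/libexec/check_nrpe -H 192.168.230.137 -c check_load.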

 

3.2 Installing Prerequisite Packages on the Client

[root@linuxclient ~]# yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel

 

3.3 Client User Setup

[root@linuxclient ~]# useradd nagios

[root@linuxclient ~]# passwd nagios

Changing password for user nagios.

New password:

BAD PASSWORD: it is too simplistic/systematic

BAD PASSWORD: is too simple

Retype new password:

passwd: all authentication tokens updated successfully.

 

3.4 Installing the Nagios Plugins on the Client

[root@linuxclient ~]# tar -xvzf nagios-plugins-2.0.3.tar.gz

[root@linuxclient ~]# cd nagios-plugins-2.0.3

[root@linuxclient nagios-plugins-2.0.3]# ./configure --prefix=/usr/local/nagios

[root@linuxclient nagios-plugins-2.0.3]# make && make install

[root@linuxclient nagios-plugins-2.0.3]# chown nagios:nagios /usr/local/nagios/

[root@linuxclient nagios-plugins-2.0.3]# chown -R nagios:nagios /usr/local/nagios/libexec/

 

3.5 Installing and Configuring NRPE on the Client

[root@linuxclient nagios-plugins-2.0.3]# cd

[root@linuxclient ~]# tar -xvzf nrpe-2.15.tar.gz

[root@linuxclient ~]# cd nrpe-2.15

[root@linuxclient nrpe-2.15]# ./configure

[root@linuxclient nrpe-2.15]# make all

[root@linuxclient nrpe-2.15]# make install-plugin

[root@linuxclient nrpe-2.15]# make install-daemon

[root@linuxclient nrpe-2.15]# make install-daemon-config

[root@linuxclient nrpe-2.15]# make install-xinetd

    Edit /etc/xinetd.d/nrpe and add the monitoring server's address to the only_from parameter.

[root@linuxclient nrpe-2.15]# cat /etc/xinetd.d/nrpe

# default: on

# description: NRPE (Nagios Remote Plugin Executor)

service nrpe

{

        flags           = REUSE

        socket_type     = stream

        port            = 5666

        wait            = no

        user            = nagios

        group           = nagios

        server          = /usr/local/nagios/bin/nrpe

        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd

        log_on_failure  += USERID

        disable         = no

        only_from       = 127.0.0.1 192.168.230.136

}

    Edit /etc/services and add the NRPE service.

[root@linuxclient nrpe-2.15]# tail -1 /etc/services

nrpe        5666/tcp               # nrpe

Restart the xinetd service.

[root@linuxclient nrpe-2.15]# /etc/init.d/xinetd restart

Stopping xinetd:                                           [FAILED]

Starting xinetd:                                           [  OK  ]

 

    Check whether nrpe started successfully.

[root@linuxclient nrpe-2.15]# netstat -tunlp | grep 5666

tcp        0      0 :::5666        :::*            LISTEN      42522/xinetd

    The nrpe service has started, but it is listening on IPv6, and a test reports the following error:

[root@linuxclient nrpe-2.15]# /usr/local/nagios/libexec/check_nrpe -H localhost

CHECK_NRPE: Error - Could not complete SSL handshake.

    Add the following two lines to /etc/modprobe.d/dist.conf to disable IPv6. After a reboot, the test succeeds.

[root@linuxclient nrpe-2.15]# tail -2 /etc/modprobe.d/dist.conf

alias net-pf-10 off

options ipv6 disable=1

[root@linuxclient nrpe-2.15]# init 6

[root@linuxclient ~]# netstat -tunlp | grep 5666

tcp        0      0 0.0.0.0:5666   0.0.0.0:*         LISTEN      1729/xinetd

[root@linuxclient ~]# /usr/local/nagios/libexec/check_nrpe -H localhost

NRPE v2.15

 

3.6 Client NRPE Configuration File

    The NRPE configuration file is nrpe.cfg. After adjusting it for this environment, its contents are as follows:

[root@linuxclient ~]# grep -v '^#' /usr/local/nagios/etc/nrpe.cfg | grep -v '^$'

log_facility=daemon

pid_file=/var/run/nrpe.pid

server_port=5666

nrpe_user=nagios

nrpe_group=nagios

allowed_hosts=127.0.0.1

 

dont_blame_nrpe=0

allow_bash_command_substitution=0

debug=0

command_timeout=60

connection_timeout=300

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

command[check_sda3]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda3

command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z

command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200   

command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%

    A swap check has been added here. The client is now fully configured; the next step is to add the client's checks on the server.
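
Each command[NAME] entry is what the server later refers to as check_nrpe!NAME, so it helps to list the names a client actually exposes. A minimal sketch, assuming a helper called list_nrpe_commands (a made-up name) and run here against a scratch file rather than the real nrpe.cfg:

```shell
#!/bin/sh
# Hypothetical helper: list the command names defined in an nrpe.cfg-style file.
# $1 = path to the file; lines of the form command[NAME]=... yield NAME.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Demonstration on a scratch file with two of the entries shown above.
demo=$(mktemp)
cat > "$demo" <<'EOF'
server_port=5666
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
EOF
list_nrpe_commands "$demo"
```

On the client the function would be called with /usr/local/nagios/etc/nrpe.cfg; any name used on the server that is missing from this list cannot be executed by the NRPE daemon.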

 

3.7 Installing NRPE on the Server

[root@linuxserver ~]# tar -xvzf nrpe-2.15.tar.gz

[root@linuxserver ~]# cd nrpe-2.15

[root@linuxserver nrpe-2.15]# ./configure

[root@linuxserver nrpe-2.15]# make all

[root@linuxserver nrpe-2.15]# make install-plugin

    Test the communication between check_nrpe on the server and the NRPE daemon on the client.

[root@linuxserver nrpe-2.15]# /usr/local/nagios/libexec/check_nrpe -H 192.168.230.137

NRPE v2.15

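    The version information is returned, so communication is working.

 

3.8 Modifying the Server Configuration Files

3.8.1 commands.cfg

    Add the check_nrpe command. Run check_nrpe -h to see its usage.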

[root@linuxserver etc]# tail -5 objects/commands.cfg

# 'check_nrpe' command definition

define command{

        command_name    check_nrpe

        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

        }

 

3.8.2 services.cfg

    Create a new services.cfg file that defines the checks for the linuxclient client.

[root@linuxserver etc]# cat objects/services.cfg

define service{

        use                     local-service

        host_name               linuxclient

        service_description     check-host-alive

        check_command           check-host-alive

        }

 

define service{

        use                     local-service

        host_name               linuxclient

        service_description     Current Load

        check_command           check_nrpe!check_load

        }

 

define service{

        use                     local-service

        host_name               linuxclient

        service_description     Check Disk sda3

        check_command           check_nrpe!check_sda3

        }

 

define service{

        use                     local-service

        host_name               linuxclient

        service_description     Total Processes

        check_command           check_nrpe!check_total_procs

        }

 

define service{

        use                     local-service

        host_name               linuxclient

        service_description     Current Users

        check_command           check_nrpe!check_users

        }

 

define service{

        use                     local-service

        host_name               linuxclient

        service_description     Check Zombie Procs

        check_command           check_nrpe!check_zombie_procs

        }

 

define service{

        use                     local-service

        host_name               linuxclient

        service_description     Check Swap

        check_command           check_nrpe!check_swap

        }

3.8.3 hosts.cfg

    Create a new hosts.cfg file that defines the address and related attributes of the monitored client.

[root@linuxserver etc]# cat /usr/local/nagios/etc/objects/hosts.cfg

define host{

        use                     linux-server

        host_name               linuxclient

        alias                   linuxclient

        address                 192.168.230.137

        }

 

define hostgroup{

        hostgroup_name          bsmart-servers

        alias                   bsmart servers

        members                 linuxclient

        }

 

3.8.4 nagios.cfg

    Add entries for services.cfg and hosts.cfg to nagios.cfg.

[root@linuxserver etc]# grep 'hosts.cfg' nagios.cfg

cfg_file=/usr/local/nagios/etc/objects/hosts.cfg

[root@linuxserver etc]# grep 'services.cfg' nagios.cfg

cfg_file=/usr/local/nagios/etc/objects/services.cfg
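
Adding these entries can be done in a way that is safe to repeat. The sketch below assumes a helper named add_cfg_file (hypothetical) that appends a cfg_file= line only when the exact line is not already present; it is demonstrated on a scratch file.

```shell
#!/bin/sh
# Hypothetical helper: add a cfg_file= entry to a nagios.cfg only if it is missing.
# $1 = path to nagios.cfg, $2 = full path of the object file to reference.
add_cfg_file() {
    cfg=$1
    entry="cfg_file=$2"
    # -x matches whole lines only, -F treats the entry as a fixed string.
    grep -qxF "$entry" "$cfg" || echo "$entry" >> "$cfg"
}

# Demonstration on a scratch file; calling twice still leaves one entry per file.
demo=$(mktemp)
add_cfg_file "$demo" /usr/local/nagios/etc/objects/hosts.cfg
add_cfg_file "$demo" /usr/local/nagios/etc/objects/services.cfg
add_cfg_file "$demo" /usr/local/nagios/etc/objects/hosts.cfg
cat "$demo"
```

After editing the real file, running /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg checks the whole configuration before the service is restarted.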

 

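3.9 Configuration File Relationships

    The final relationships between the configuration files are shown in the figure below:

[Figure: relationships between the Nagios configuration files]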

 

3.10 Restarting the Service on the Server

[root@linuxserver etc]# /etc/init.d/nagios restart

Running configuration check...

Stopping nagios: done.

Starting nagios: done.

    Shortly after the restart, the client's status appears in the web interface.

 

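This article draws on http://www.cnblogs.com/mchina/archive/2013/02/20/2883404.html. Thank you!

From the "ITPUB blog", link: http://blog.itpub.net/28536251/viewspace-1460822/. If you repost this article, please credit the source; otherwise legal action may be taken.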
