Nagios plugin check_sys 1.6

Purpose:

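check_sys is a multi-purpose plugin which performs several checks in a single run: process table, free disk space, full file systems, swap (paging space) utilization, system load (run queue), system log entries and, on AIX, errpt entries. The results are combined into a single-line message, and the most critical status among the individual checks is returned as the exit code of the plugin. The system log check only considers entries which have been added to the log since the last execution of the plugin.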
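
The way several results collapse into one status can be sketched as follows. This is a minimal Python illustration, not the plugin's C code: the function name combine, the message format and the sample results are hypothetical, and only the standard Nagios exit codes 0 (OK), 1 (WARNING) and 2 (CRITICAL) are taken as given.

```python
# Standard Nagios plugin exit codes (UNKNOWN = 3 is left out of this sketch).
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def combine(results):
    """Merge (code, message) pairs into one exit code and one message line."""
    if not results:
        return OK, "OK"
    # The most critical code of all sub-checks becomes the final code.
    worst = max(code for code, _ in results)
    # Assumption: only non-OK messages are reported; the real output may differ.
    problems = [msg for code, msg in results if code != OK]
    line = LABELS[worst] + (": " + "; ".join(problems) if problems else "")
    return worst, line


# Hypothetical sub-check results, for illustration only.
code, line = combine([
    (OK, "load ok"),
    (WARNING, "sshd: 0 processes"),
    (CRITICAL, "/var: 512 kB free"),
])
print(code, line)
```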

Advantages:

Installation:

  1. Edit file config.h:

  2. Compile file check_sys.c:
       cc -O2 -o check_sys check_sys.c
       strip check_sys

  3. Log in as root
       su -

  4. Create group nagios and user nagios

  5. Install executable file and example files to /usr/local/nagios and prepare directory /var/local/nagios:
       ./install.sh

  6. AIX only: Prepare swap (paging space) utilization checks:
       cp -p /usr/bin/svmon /usr/local/nagios/libexec
       chown root /usr/local/nagios/libexec/svmon
       chmod 4750 /usr/local/nagios/libexec/svmon

Configuration:

/usr/local/nagios/etc/proctab:
This configuration file defines the checks of the process table and contains 5 fields separated by white space:
  1. Name of the process
  2. Critical minimum number of processes
  3. Warning minimum number of processes
  4. Warning maximal number of processes
  5. Critical maximal number of processes
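
The four limits can be read as two nested ranges: an inner range outside of which a warning is raised, and an outer range outside of which the alert becomes critical. The Python sketch below illustrates that reading. The sample entry, the function names and the assumption that a count exactly equal to a limit is still acceptable are illustrative and not taken from the plugin source.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def parse_proctab_line(line):
    """Split one proctab line into its 5 whitespace-separated fields."""
    name, crit_min, warn_min, warn_max, crit_max = line.split()
    return name, int(crit_min), int(warn_min), int(warn_max), int(crit_max)


def check_process_count(count, crit_min, warn_min, warn_max, crit_max):
    """Map a process count to a status using the 4 limits."""
    # Assumption: limits are inclusive, so a count equal to a limit passes.
    if count < crit_min or count > crit_max:
        return CRITICAL
    if count < warn_min or count > warn_max:
        return WARNING
    return OK


# Hypothetical entry: critical below 1 or above 100 processes,
# warning below 2 or above 50 processes.
sample = "httpd 1 2 50 100"
name, *limits = parse_proctab_line(sample)
for count in (0, 1, 10, 75, 101):
    print(name, count, check_process_count(count, *limits))
```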

/usr/local/nagios/etc/disctab:
This configuration file defines the checks of the amount of free disk space (mount points) and contains 3 fields separated by white space:
  1. Mountpoint
  2. Critical minimum amount of free space (in kB)
  3. Warning minimum amount of free space (in kB)
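
A disctab entry can be evaluated along the same lines. The Python sketch below is only an illustration: the sample entry is hypothetical, and measuring free space with the f_bavail value of statvfs (space available to unprivileged users) is an assumption, since the plugin may measure it differently.

```python
import os

OK, WARNING, CRITICAL = 0, 1, 2


def free_kb(mountpoint):
    """Free space of a mount point in kB (assumption: based on f_bavail)."""
    st = os.statvfs(mountpoint)
    return st.f_bavail * st.f_frsize // 1024


def check_free_space(free, crit_min_kb, warn_min_kb):
    """Map an amount of free space (kB) to a status."""
    if free < crit_min_kb:
        return CRITICAL
    if free < warn_min_kb:
        return WARNING
    return OK


# Hypothetical entry: critical below 100 MB free, warning below 500 MB free.
sample = "/var 102400 512000"
mountpoint, crit_min_kb, warn_min_kb = sample.split()
status = check_free_space(300000, int(crit_min_kb), int(warn_min_kb))
print(mountpoint, status)
```

On a live system the first argument would come from free_kb(mountpoint) instead of a fixed number.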

/usr/local/nagios/etc/romtab:
This configuration file lists all mount points which are allowed to be 100% full without triggering a critical alert. Usually these are mount points for CD-ROM drives.

/usr/local/nagios/etc/errpttab: (AIX only)
This configuration file defines the alert type (OK, WARNING, CRITICAL) which should be triggered by a certain type of errpt entry (identified by its LABEL). By default every errpt entry which belongs to the classes H, O or U triggers a critical alert, while the remaining entries trigger warning alerts. The configuration file contains 2 fields separated by white space:
  1. Label
  2. Alert type
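
The interplay between the class-based default and the per-label override can be sketched in Python as follows. The labels are placeholders rather than real errpt labels, and the exact spelling of the alert types inside the file is an assumption.

```python
# Classes which trigger a critical alert by default, as described above.
DEFAULT_CRITICAL_CLASSES = {"H", "O", "U"}


def parse_errpttab(text):
    """Build a label -> alert type mapping from 2-field lines."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            table[fields[0]] = fields[1]
    return table


def alert_for(label, err_class, table):
    """Return the alert type for one errpt entry."""
    if label in table:
        return table[label]  # explicit override by LABEL
    return "CRITICAL" if err_class in DEFAULT_CRITICAL_CLASSES else "WARNING"


# Hypothetical errpttab content with placeholder labels.
sample = """\
EXAMPLE_LABEL_A OK
EXAMPLE_LABEL_B CRITICAL
"""
table = parse_errpttab(sample)
print(alert_for("EXAMPLE_LABEL_A", "H", table))  # overridden by the table
print(alert_for("EXAMPLE_LABEL_C", "H", table))  # default for class H
print(alert_for("EXAMPLE_LABEL_C", "S", table))  # default for other classes
```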

/usr/local/nagios/etc/loadrc:
This configuration file defines the critical (C_LOAD) and warning (W_LOAD) thresholds of the system load (run queue). C_LOAD and W_LOAD each take 3 comma-separated values: the mean value of the last 5 minutes, the mean value of the last 10 minutes and the mean value of the last 15 minutes.
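
One plausible way to compare three load values against such thresholds is sketched below in Python. The threshold numbers are made up, and the rule that a single exceeded threshold is enough to raise the alert is an assumption; the plugin may combine the three values differently.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def parse_thresholds(value):
    """Turn a comma-separated string such as '6,5,4' into 3 floats."""
    return tuple(float(v) for v in value.split(","))


def check_load(load, w_load, c_load):
    """Compare 3 load averages with the matching W_LOAD / C_LOAD values."""
    # Assumption: one exceeded threshold is enough to raise the alert.
    if any(l > c for l, c in zip(load, c_load)):
        return CRITICAL
    if any(l > w for l, w in zip(load, w_load)):
        return WARNING
    return OK


# Hypothetical thresholds and load values.
w_load = parse_thresholds("6,5,4")
c_load = parse_thresholds("12,10,8")
status = check_load((7.2, 3.1, 2.0), w_load, c_load)
print(status)
```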

/usr/local/nagios/etc/memrc:
This configuration file defines the critical (C_SWAP) and warning (W_SWAP) thresholds of the amount of free swap (paging file) space in kB.

Syslog configuration:

Two different flavors of syslog are in general use: syslog and syslog-ng. Here is a configuration example for each of them:
/etc/syslog.conf: (syslog)
*.err                            /var/local/nagios/critical.log
*.notice;*.!err;kern.none        /var/local/nagios/warning.log
/etc/syslog-ng/syslog-ng.conf: (syslog-ng)
filter f_nagios_critical { level(err, crit, alert, emerg); };
filter f_nagios_warning { level(notice, warning) and not facility(kern) and not match("STATS: dropped"); };

destination nagios_critical { file("/var/local/nagios/critical.log" owner(nagios) group(nagios) perm(0644)); };
destination nagios_warning { file("/var/local/nagios/warning.log"  owner(nagios) group(nagios) perm(0644)); };

log { source(src); filter(f_nagios_critical); destination(nagios_critical); };
log { source(src); filter(f_nagios_warning); destination(nagios_warning); };
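
Since the plugin only reports entries added to these log files since its last execution, it has to remember how far it has already read. The Python sketch below shows one common technique for this, a stored byte offset per log file. It is not taken from the plugin source: the state file, the function name and the handling of a shrunken (rotated) log are assumptions.

```python
import os
import tempfile


def read_new_entries(log_path, state_path):
    """Return the lines appended to log_path since the previous call."""
    try:
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0  # first run or unreadable state: start at the beginning
    if not os.path.exists(log_path):
        return []
    # A log shorter than the stored offset was probably rotated or truncated.
    if os.path.getsize(log_path) < offset:
        offset = 0
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
        new_offset = f.tell()
    with open(state_path, "w") as f:
        f.write(str(new_offset))
    return data.decode(errors="replace").splitlines()


# Demonstration in a temporary directory with a hypothetical state file.
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "critical.log")
    state = os.path.join(tmp, "critical.offset")
    with open(log, "w") as f:
        f.write("first entry\n")
    first = read_new_entries(log, state)
    with open(log, "a") as f:
        f.write("second entry\n")
    second = read_new_entries(log, state)
    third = read_new_entries(log, state)
print(first, second, third)
```

In this sketch the first call reports the whole file, each later call reports only what was appended in between, and a call without new entries returns an empty list.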

Nagios configuration:

A possible service configuration:
define command {
    command_name            check-nrpe-u
    command_line            /usr/local/nagios/libexec/check_nrpe -n -H $HOSTADDRESS$ -u -c $ARG1$
}

define service {
    host_name               event_template
    name                    event_template
    max_check_attempts      1
    notification_interval   0
    notification_options    c
    stalking_options        c
    register                0
    use                     dummy_template
}

define service {
    host_name               intranet
    service_description     check-sys
    check_command           check-nrpe-u!check_sys
    contact_groups          intranet-admins
    use                     event_template
}

Failover cluster services:

If you are running a failover cluster, some processes or mount points may only be available on the cluster node which currently holds a specific failover service IP address. To monitor the processes and mount points which are always available on a node, query NRPE using that node's host IP address. To monitor processes, mount points, etc. which are only available on the node holding a given failover service IP address, query NRPE using that service IP address and use the '-s' option to specify a different configuration directory. The '-s' switch also suppresses the disk full (romtab), swap utilization, load (run queue), syslog and errpt checks, so that these checks are not performed twice on the same node.
In the following example we have 2 nodes using 2 host IP addresses (10.0.0.16 and 10.0.0.17) and a failover service which may run on node 1 or node 2 using the failover service IP address 10.0.0.32:
define command {
    command_name            check-nrpe
    command_line            /usr/local/nagios/libexec/check_nrpe -n -H $HOSTADDRESS$ -c $ARG1$
}

define command {
    command_name            check-nrpe-u
    command_line            /usr/local/nagios/libexec/check_nrpe -n -H $HOSTADDRESS$ -u -c $ARG1$
}

define host {
    host_name               host_template
    alias                   Host Template
    address                 0.0.0.0
    check_command           check-ping
    max_check_attempts      10
    notification_interval   0
    notification_period     24x7
    notification_options    d, r
    contact_groups          dummy
    name                    host_template
    register                0
}

define service {
    host_name               event_template
    name                    event_template
    max_check_attempts      1
    notification_interval   0
    notification_options    c
    stalking_options        c
    register                0
    use                     dummy_template
}

define host {
    host_name               clusternode1
    alias                   Database Cluster Node 1
    address                 10.0.0.16
    parents                 10.0.0.1
    contact_groups          db-admins
    use                     host_template
}

define service {
    host_name               clusternode1
    service_description     check-sys
    check_command           check-nrpe-u!check_sys
    contact_groups          db-admins
    use                     event_template
}

define host {
    host_name               clusternode2
    alias                   Database Cluster Node 2
    address                 10.0.0.17
    parents                 10.0.0.1
    contact_groups          db-admins
    use                     host_template
}

define service {
    host_name               clusternode2
    service_description     check-sys
    check_command           check-nrpe-u!check_sys
    contact_groups          db-admins
    use                     event_template
}

define host {
    host_name               failoverservice
    alias                   Database Service
    address                 10.0.0.32
    parents                 clusternode1,clusternode2
    contact_groups          db-admins
    use                     host_template
}

define service {
    host_name               failoverservice
    service_description     check-srv
    check_command           check-nrpe!check_srv
    contact_groups          db-admins
    use                     event_template
}
Using the '-s' switch we override the default /usr/local/nagios/etc configuration directory by specifying a different configuration directory /usr/local/nagios/etc.srv used to monitor the failover service. Example of an nrpe.cfg file:
command[check_sys]=/usr/local/nagios/libexec/check_sys
command[check_srv]=/usr/local/nagios/libexec/check_sys -s /usr/local/nagios/etc.srv

Changelog

Changes since version 1.6:
Changes since version 1.5:
Changes since version 1.4:
Changes since version 1.3:

Author:

Patrick Kaell
patrick.kaell@police.etat.lu