check_snmp_cisco_wlc

Check the availability of Cisco WLC Access Points

check_snmp_cisco_wlc is an Icinga/Nagios plugin to monitor the status of
Cisco Wireless LAN Controller (formerly Airespace) access points.

Author:
Martin Fuerstenau, Oce Printing Systems GmbH, martin.fuerstenau_at_oce.com

History and Changes

  • 13 Dec 2017 Version 1.5
      - Added the amount of APs to output.
  • 18 Apr 2016 Version 1.4
      - Added --showerror_only. This will only show WLCs causing trouble.

  • 21 Aug 2014 Version 1.3
      - Bugfix for blacklisted items. The blacklisted AP was still written
        to the AP list because it was still in the hash storing all elements.
        Now it is deleted from the hash instead of skipped only.

  • 26 Jun 2014 Version 1.2
      - Fixed some small issues in help and usage.
      - Added blacklist support (-B|--blacklist) for AP names. The blacklist
        is a case-sensitive, comma-separated list. If used with --isregexp,
        every item of the list is interpreted as a regular expression.

  • 04 Dec 2013 Version 1.1 (Thanks to Mihail Karageorgiev.)
      - Added SNMPv3 support.
      - Fixed last 3 bytes AP.

  • 10 Aug 2012 Version 1
      - First released version.

    Syntax

    check_snmp_cisco_wlc -H <hostname or IP address> -C <community-string> [--showerror_only]
    For other options see --help or readme.txt.

    General

    A Cisco Wireless LAN Controller (WLC) is in some respects a bit tricky to monitor. At present this plugin focuses on the availability of the access points (APs).

    The plugin tests the status of each AP. An AP that is downloading is not available, so this raises a warning alert. A disassociated AP raises a critical alert. When a new AP joins the WLC, it is automatically added with a default name (ap_name.MAC-address). The plugin detects this and raises a warning, which disappears once the AP is configured and has a "real" name.
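
    The status rules above can be sketched as a small shell function. This is only an illustration, not the plugin's own Perl code: the function name classify_ap, the state strings and the default-name pattern "ap_name.*" are assumptions made for this example.

```shell
#!/bin/sh
# Illustrative sketch only, not the plugin's Perl code.
# Usage: classify_ap NAME STATE [DEFAULT_NAME_PATTERN]
# Prints a Nagios status code: 0 = OK, 1 = WARNING, 2 = CRITICAL.
classify_ap() {
    ap_name=$1
    ap_state=$2
    # Assumption: shell pattern that identifies a still unconfigured AP.
    default_pattern=${3:-"ap_name.*"}

    # The operational state is checked first.
    case "$ap_state" in
        disassociated) echo 2; return 0 ;;
        downloading)   echo 1; return 0 ;;
    esac

    # An AP that still carries its default name only raises a warning.
    # The pattern is left unquoted on purpose so it is matched as a glob.
    case "$ap_name" in
        $default_pattern) echo 1 ;;
        *)                echo 0 ;;
    esac
}
```

    Called as classify_ap lobby-ap downloading it prints the warning code; a different default-name pattern can be passed as the third argument.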

    The main problem in monitoring APs is getting an alert when an AP breaks down or is powered off. Normally this cannot be monitored directly, because the AP simply disappears from the WLC and reappears after power-on. There is no "offline" status to monitor. One way to solve this is to pass the number of APs to the check.

    The other, more flexible method is to compare against historical data. For this, a file caches the old results (variable $plugin_cache, around line 88). In my case the cache directory is a tmpfs, to speed up access to the cached results. The plugin compares the old data with the current data. If there is no old data (first check), the current data is stored and used as old data on the next run. If the old data is a subset of the current data, the old data is overwritten with the current data. If there are APs in the old data but not in the current data, a critical alert is raised.
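
    The comparison with the cached data can be sketched roughly as follows. Again this is a shell illustration under assumptions, not the plugin's Perl implementation: the function name compare_ap_lists, the file layout (one AP name per line) and the messages are made up for this example.

```shell
#!/bin/sh
# Illustrative sketch only, not the plugin's Perl code.
# Usage: compare_ap_lists CACHE_FILE CURRENT_FILE   (one AP name per line)
# Returns 0 (OK) or 2 (CRITICAL), following the Nagios exit code convention.
compare_ap_lists() {
    cache_file=$1
    current_file=$2
    sorted_current=$(mktemp) || return 3
    sort -u "$current_file" > "$sorted_current"

    # First check: nothing cached yet, so the current list becomes the cache.
    if [ ! -f "$cache_file" ]; then
        mv "$sorted_current" "$cache_file"
        echo "OK - cache initialised"
        return 0
    fi

    # comm -23 prints lines that exist only in the first (cached) file.
    # Both inputs are sorted because the cache is always written by sort -u.
    missing=$(comm -23 "$cache_file" "$sorted_current" | tr '\n' ' ')
    missing=${missing% }
    if [ -n "$missing" ]; then
        # Keep the old cache so the alert persists until it is reset.
        rm -f "$sorted_current"
        echo "CRITICAL - missing APs: $missing"
        return 2
    fi

    # Old data is a subset of the current data: overwrite the cache.
    mv "$sorted_current" "$cache_file"
    echo "OK - all cached APs present"
    return 0
}
```

    Because the cache is left untouched in the critical case, the alert stays until the cache file is removed or overwritten, which is what the reset described next is for.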

    To reset the alarm, the cached (old) data can be removed by hand (but I am too lazy for this), by calling the plugin with option -r, with the host address and without a community string (this does the same), or via a trick: by acknowledging the problem.

    Here is how it goes.

    Command definition

    define command{
           command_name        check_cisco_wlc
           command_line    /usr/lib/nagios/my_plugins/check_snmp_cisco_wlc -H $HOSTADDRESS$ -C $ARG1$ --showerror
           }

    Service check definition

    define service{
        active_checks_enabled        1
        passive_checks_enabled        1
        parallelize_check        1
        obsess_over_service        1
        check_freshness            0
        notifications_enabled        1
        event_handler_enabled        1
        flap_detection_enabled        1
        process_perf_data               1
        retain_status_information    1
        retain_nonstatus_information    1
           
        host_name            cisco-wlc
        service_description        AccessPoints
        is_volatile            0
        check_period            24x7
        max_check_attempts              5
        normal_check_interval        5
        retry_check_interval            2
        contact_groups            network-adm,wlc-recover
        notification_interval           1440
        notification_period        24x7
        notification_options            c,w,r
        check_command            check_cisco_wlc!public
        }

    contactgroup wlc-recover

    The contactgroup wlc-recover is important. A direct contact is also possible.

    This contactgroup only contains one member:

    define contactgroup{
            contactgroup_name    wlc-recover
            alias            Removes WLC historic data
            members            wlc-recover
            }

    contact wlc-recover

    Look at service_notification_commands. Of the service_notification_options we only need option r, but unfortunately sending a notification only on recovery is not possible.

    define contact{
            contact_name            wlc-recover
            alias                wlc-recover
            service_notification_period    24x7
            host_notification_period    24x7
            service_notification_options    c,w,r
            host_notification_options       n
            service_notification_commands    recover-cisco-wlc
            host_notification_commands    host-notify-by-email
            email                dummy@dummy.com
            }

    Definition of recover-cisco-wlc
     
    This definition calls a wrapper shell script because we must filter out all notifications other than r:

    define command{
           command_name        recover-cisco-wlc
           command_line    /usr/lib/nagios/my_plugins/wlc-recover "$NOTIFICATIONTYPE$" "$HOSTADDRESS$"
           }

    The little wrapper script

    #!/bin/bash

    NOTIFICATIONTYPE=$1
    HOSTADDRESS=$2

    if [ "$NOTIFICATIONTYPE" = "RECOVERY" ]
       then
       /usr/lib/nagios/my_plugins/check_snmp_cisco_wlc -H "$HOSTADDRESS" -r
    fi

    With this trick a member of the contactgroup network-adm can acknowledge the problem (which means "Yeah - I kicked out the AP. It's ok") and reset it to green.