NAN

Nagios Notification Daemon

nand

The Nagios Notification Daemon.

Description:

NAN is a front end for the Nagios paging system. Its main design goal is to limit the number of notifications that Nagios sends. It does this by concatenating pages together and sending only one every x seconds. How it concatenates and aggregates the messages is configurable through a flexible syntax of variables and grouping operators. With NAN it is possible to pack a lot of information into a short string that suits cell phones and pagers. Our company has used NAN internally for over a year and it has been rock solid.

Concepts:

Timing:

There are three main things that NAN considers when it receives a notification from nagios: the delivery method (page, email, etc), the destination address (phone number, email address, etc) and the notification type (PROBLEM, ACKNOWLEDGEMENT and RECOVERY).

There are 2 main states for each destination address and notification type: clear and sending. It is in a clear state when no notifications have been sent for longer than the reset timer. It is in a sending state between the time of the last notification and when the reset time expires.

Three timers are used: resend time, initial delay and reset time.

  • Resend time: The time before we can send another notification to the current destination. This timer is used for all notification types. So once a specific notification has been sent to a destination, another of the same type won't get sent again until the resend timer expires.

  • Initial delay: The time NAN waits after receiving the first notification of a given type before it sends out the first message. All messages received during this time are concatenated. Each notification type will use this timer first, then the resend timer.

  • Reset time: The time of no activity before we reset to the clear state for the current address and notification type.
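The interplay of the two states and three timers can be sketched in Python. This is an illustrative model only, not NAN's actual code: the `Channel` class, its field names, and the polling style (`receive` and `due` called with an explicit `now`) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Channel:
    """Timer state for one (destination address, notification type) pair."""

    initial_delay: float
    resend_time: float
    reset_time: float
    pending: List[str] = field(default_factory=list)
    last_sent: Optional[float] = None      # None means the clear state
    first_queued: Optional[float] = None   # arrival time of the oldest pending message

    def _maybe_reset(self, now: float) -> None:
        # Fall back to the clear state after reset_time without activity.
        idle = self.last_sent is not None and not self.pending
        if idle and now - self.last_sent > self.reset_time:
            self.last_sent = None

    def receive(self, message: str, now: float) -> None:
        """Queue a notification; nothing is delivered here."""
        self._maybe_reset(now)
        if not self.pending:
            self.first_queued = now
        self.pending.append(message)

    def due(self, now: float) -> List[str]:
        """Return the batch to deliver at time `now`, or an empty list."""
        self._maybe_reset(now)
        if not self.pending:
            return []
        if self.last_sent is None:
            # Clear state: wait for the initial delay, collecting messages.
            ready = now - self.first_queued >= self.initial_delay
        else:
            # Sending state: wait for the resend timer since the last send.
            ready = now - self.last_sent >= self.resend_time
        if not ready:
            return []
        batch, self.pending = self.pending, []
        self.last_sent = now
        return batch
```

With `initial_delay=30`, `resend_time=120` and `reset_time=600`, two notifications received at 0 and 10 seconds go out together at 30 seconds; a third received at 40 seconds is held until 150 seconds, when the resend timer has run out.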

Commands:

Commands are shell commands that NAN executes to do the final delivery of the message. NAN selects which command to use based on the following criteria:

  • The delivery method (page, email, etc.).

  • The number of notifications being sent in this one message.

You can create as many commands as you want for each delivery method; which one is used depends on how many notifications are being concatenated. For example, you can have a big message with lots of info if there is only 1 notification to send. If there are between 2 and 5 you can have a more terse message with less info, and for 6 and above you can have really terse info. The ranges are completely configurable, so you can tailor them to fit the method of delivery: more terse messages for pagers, more verbose ones for email, and so on.
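As a rough illustration of range-based command selection, here is a Python sketch. The rule table, the template names and the `select_command` function are all hypothetical; NAN's real configuration syntax is not shown in this document.

```python
from typing import List, Optional, Tuple

# (delivery method, minimum count, maximum count or None for "and above", command)
Rule = Tuple[str, int, Optional[int], str]

# Hypothetical rules mirroring the 1 / 2-5 / 6+ example above.
COMMANDS: List[Rule] = [
    ("page", 1, 1, "verbose-page-command"),
    ("page", 2, 5, "terse-page-command"),
    ("page", 6, None, "very-terse-page-command"),
    ("email", 1, None, "verbose-email-command"),
]


def select_command(method: str, count: int,
                   rules: List[Rule] = COMMANDS) -> Optional[str]:
    """Return the first command whose method and count range match."""
    for rule_method, low, high, command in rules:
        if rule_method != method:
            continue
        if count >= low and (high is None or count <= high):
            return command
    return None  # no command configured for this method and count
```

Here `select_command("page", 4)` picks the terse command, while `select_command("page", 9)` falls into the open-ended "6 and above" range.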

Summary line:

NAN includes a variable that summarizes how many notifications of each type are included in the message. Example: P2 R1 A3 for 2 Problems, 1 Recovery and 3 Acknowledgements.
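A summary in the style of the example can be computed with a simple counter. The sketch below is an assumption about how such a line could be built, not NAN's implementation; the function name `summary` and the fixed P, R, A ordering are chosen to match the example.

```python
from collections import Counter
from typing import Iterable


def summary(notification_types: Iterable[str]) -> str:
    """Count notifications by the first letter of their type."""
    counts = Counter(name[0] for name in notification_types)
    # Fixed order taken from the example: Problems, Recoveries, Acknowledgements.
    return " ".join(f"{letter}{counts[letter]}"
                    for letter in "PRA" if counts[letter])
```

Types with no notifications are left out of the line in this sketch.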

Sorting:

When multiple notifications are concatenated together, you can configure how they are sorted so that the notifications most important to you appear at the top.
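One way to picture configurable sorting is a sort key built from priority tables. The tables and the `sort_notifications` function below are hypothetical examples, not NAN's configuration format.

```python
from typing import Dict, List

# Hypothetical priorities: a lower number sorts closer to the top.
TYPE_PRIORITY = {"PROBLEM": 0, "ACKNOWLEDGEMENT": 1, "RECOVERY": 2}
STATE_PRIORITY = {"CRITICAL": 0, "WARNING": 1, "UNKNOWN": 2, "OK": 3}


def sort_notifications(notes: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort by notification type, then service state, then host alias."""
    return sorted(
        notes,
        key=lambda n: (
            TYPE_PRIORITY.get(n["NOTIFICATIONTYPE"], 99),
            STATE_PRIORITY.get(n["SERVICESTATE"], 99),
            n["HOSTALIAS"],
        ),
    )
```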

New Variables:

NAN creates several new variables based on information from Nagios. Many of these are just different ways of saying the same information, such as ACK instead of ACKNOWLEDGEMENT. The original variables are never touched, and are passed straight through from Nagios.

NOTIFICATIONTYPE2:            PROBLEM         => PROB
                              RECOVERY        => RECVR
                              ACKNOWLEDGEMENT => ACK

HOSTSTATE2:                   UNREACHABLE     => UNRCHBL

SERVICESTATE2:                WARNING         => WARN
                              CRITICAL        => CRIT
                              UNKNOWN         => UNKN

TIMESINCELASTSTATECHANGE:      time in seconds since the LASTSTATECHANGE
TIMESINCELASTSTATECHANGE_LONG: HOURS:MINUTES:SECONDS since LASTSTATECHANGE

SERVICEINFO: If SERVICEACKAUTHOR is not set, then this is the value of SERVICEOUT,
             otherwise it is: $SERVICEACKAUTHOR$: $SERVICEACKCOMMENT$

HOSTINFO:    If HOSTACKAUTHOR is not set, then this is the value of HOSTOUT,
             otherwise it is: $HOSTACKAUTHOR$: $SERVICEACKCOMMENT$

All variables get a _SHORT version created, which consists solely of the first character of the value. For example, NOTIFICATIONTYPE_SHORT for PROBLEM would be P.
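The derivations above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function `derive_variables` is hypothetical, values without a listed abbreviation are assumed to pass through unchanged (for example OK stays OK), and the zero-padded H:MM:SS format is a guess at what the _LONG form looks like.

```python
from typing import Dict

ABBREVIATIONS = {
    "NOTIFICATIONTYPE": {"PROBLEM": "PROB", "RECOVERY": "RECVR",
                         "ACKNOWLEDGEMENT": "ACK"},
    "HOSTSTATE": {"UNREACHABLE": "UNRCHBL"},
    "SERVICESTATE": {"WARNING": "WARN", "CRITICAL": "CRIT", "UNKNOWN": "UNKN"},
}


def derive_variables(original: Dict[str, str], now: int) -> Dict[str, str]:
    """Return the original variables plus the derived ones."""
    derived = dict(original)  # originals are copied, never modified

    # NOTIFICATIONTYPE2, HOSTSTATE2, SERVICESTATE2
    for name, table in ABBREVIATIONS.items():
        if name in original:
            value = original[name]
            derived[name + "2"] = table.get(value, value)

    # TIMESINCELASTSTATECHANGE and its HOURS:MINUTES:SECONDS form
    if "LASTSTATECHANGE" in original:
        elapsed = max(0, now - int(original["LASTSTATECHANGE"]))
        hours, rest = divmod(elapsed, 3600)
        minutes, seconds = divmod(rest, 60)
        derived["TIMESINCELASTSTATECHANGE"] = str(elapsed)
        derived["TIMESINCELASTSTATECHANGE_LONG"] = (
            f"{hours}:{minutes:02d}:{seconds:02d}")

    # _SHORT versions: first character of every non-empty value
    for name, value in list(derived.items()):
        if value:
            derived[name + "_SHORT"] = value[0]
    return derived
```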

Any variable values that look like a unix timestamp (seconds since epoch) will get a _SHORTDATE and _LONGDATE version created. Example for the timestamp 1091139706:

_SHORTDATE: 2004-07-29 15:21:46
_LONGDATE:  Thu Jul 29 15:21:46 2004
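
The two date formats can be reproduced with Python's standard `time` module. The sketch below is illustrative: the rule for what "looks like" a timestamp (a string of 9 or 10 digits) and the function name `add_date_variables` are assumptions. The example above appears to use the local time of a UTC-7 host, so the output depends on the time zone of the machine running the code.

```python
import re
import time
from typing import Callable, Dict

# Assumption: a value of 9 or 10 digits is treated as seconds since the epoch.
TIMESTAMP = re.compile(r"\d{9,10}")


def add_date_variables(
    variables: Dict[str, str],
    convert: Callable[[float], time.struct_time] = time.localtime,
) -> Dict[str, str]:
    """Add _SHORTDATE and _LONGDATE for every timestamp-like value."""
    result = dict(variables)
    for name, value in variables.items():
        if TIMESTAMP.fullmatch(value):
            moment = convert(int(value))
            result[name + "_SHORTDATE"] = time.strftime(
                "%Y-%m-%d %H:%M:%S", moment)
            # asctime produces the "Thu Jul 29 15:21:46 2004" layout.
            result[name + "_LONGDATE"] = time.asctime(moment)
    return result
```

Passing `convert=time.gmtime` gives UTC output, which is convenient when the result must not depend on the host's time zone.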

Iteration and Grouping syntax:

Each command can have optional iteration and group syntax. The format is:

$[iteration_text][separator][group_variables][group_separator]$

When NAN has multiple messages that it wants to concatenate together, it uses blocks like the above to format the resulting message. Any text that is outside of an iteration operator is evaluated only against the first message NAN tries to send. So if you don't include any iteration operators in a given command, only the first message will get printed.

The parts of the block are:

  • iteration_text: All text and variables in this part are evaluated for each of the messages NAN is trying to send.

  • separator: Text that is put between each iteration of the iteration_text.

  • group_variables: A comma-delimited list of variables (without the $ symbols) that the iteration will group on. For each variable given, values that match across messages are printed only once.

  • group_separator: For any variables that are not listed as a group variable, the values are concatenated together with the group_separator separating them.

This is best understood with an example (ignoring sorting):

<$SUMMARY_SHORT$> $[$NOTIFICATIONTYPE_SHORT$: $HOSTALIAS$/$SERVICEDESC$ $SERVICESTATE2$][ - ][NOTIFICATIONTYPE_SHORT,HOSTALIAS,SERVICESTATE2][,]$\n$SHORTDATETIME$\n*****\n

Given the following messages to send:

PROBLEM           host1   HTTP  CRITICAL
PROBLEM           host1   FTP   CRITICAL
RECOVERY          host2   FTP   OK
ACKNOWLEDGEMENT   host3   SMTP  WARNING

NAN will send the following email/page:

<P:2 R:1 A:1> P: host1/HTTP,FTP CRIT - R: host2/FTP OK - A: host3/SMTP WARN
2004-07-15 14:12:43
*****

As you can see, it's a lot of information pretty tightly packed, perfect for devices with limited capacity, such as pagers and cell phones.
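
The grouping behaviour in this example can be modelled with a short Python sketch. It is not NAN's parser: the function `render_iteration` takes the four parts of the block as separate arguments instead of parsing the `$[...][...][...][...]$` syntax, and it assumes that duplicate values of a non-group variable are printed only once.

```python
import re
from typing import Dict, List, Sequence, Tuple

VARIABLE = re.compile(r"\$(\w+)\$")


def render_iteration(
    iteration_text: str,
    separator: str,
    group_variables: Sequence[str],
    group_separator: str,
    messages: List[Dict[str, str]],
) -> str:
    """Expand iteration_text once per group of messages."""
    # Messages that agree on every group variable land in the same group.
    # A dict keeps the groups in first-seen order.
    groups: Dict[Tuple[str, ...], List[Dict[str, str]]] = {}
    for message in messages:
        key = tuple(message.get(name, "") for name in group_variables)
        groups.setdefault(key, []).append(message)

    def expand(members: List[Dict[str, str]]) -> str:
        def substitute(match: "re.Match[str]") -> str:
            name = match.group(1)
            values: List[str] = []
            for member in members:
                value = member.get(name, "")
                if value not in values:  # group variables collapse to one value
                    values.append(value)
            return group_separator.join(values)

        return VARIABLE.sub(substitute, iteration_text)

    return separator.join(expand(members) for members in groups.values())
```

Called with the iteration text, separators and group variables from the command above, and with the four messages already carrying their derived variables, this returns `P: host1/HTTP,FTP CRIT - R: host2/FTP OK - A: host3/SMTP WARN`.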

Copyright

NAN is Copyright (C) 2010 Tekco Management Group, LLC. Nagios is a registered trademark of Ethan Galstad.

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Changelog:

2010-10-12 - Verified to work with Nagios 3.x

2010-08-05 - 0.04 released - Fixed a regex problem for variable values with colons and spaces in just the right place. Thanks to Brad Guillory for the fix.

2006-11-13 - Finally updated to work with Nagios 2.x!