check_snmp_temperature

This is a temperature check plugin that retrieves temperature sensor values via SNMP and can issue alerts if selected sensors are above given thresholds. It also returns performance data for further Nagios 2.0 post-processing.

Setup

Check the path to utils.pm in the plugin and adjust it if necessary. Make sure you have the Net::SNMP Perl module installed.

If you want to check Dell servers, HP servers, Juniper routers or Cisco switches/routers (Cisco 7500, 5500, 2948), you can skip much of the configuration hassle by using pre-programmed settings with the "--type" (or -T) parameter. You still need to specify whether you want output in C or F with the '-o' option (see examples). The plugin currently does not support finding the critical and warning thresholds that most systems also report in SNMP, so you will need to specify the actual thresholds as well.

TYPES COMPATIBILITY NOTE: If you've previously used a 0.2x version of this plugin to check HP equipment, beware that version 0.3 has an "incompatible" change: it returns human-readable sensor names rather than using HP locale IDs to enumerate sensors. If you need the old behavior, then instead of '-T hp' use '-N 1.3.6.1.4.1.232.6.2.6.8.1.3 -D 1.3.6.1.4.1.232.6.2.6.8.1.4'.

If you're using some other device, check its documentation to figure out the correct parameters for this plugin, then specify the base OID of the temperature sensor names table with '-N' and the base OID of the values table with '-D'. You also need to specify the base temperature data type of the sensors with '-i' (see below).

The plugin works by walking the SNMP tree from the base names OID to find all the sensor names. It then compares the names given with '-a' (separated by ',') to those found in the tree. For each '-a' entry you are expected to specify one word that appears in the full sensor name and is unique to that sensor. For each match, the plugin takes the OID ending (i.e. the part of the OID after the base) and appends it to the base values table OID to build the OID to retrieve. This is similar to how you find Ethernet statistics OIDs from the name of the interface, and in fact many SNMP parameters are organized like that.
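The suffix-matching step can be sketched as follows. This is a minimal illustration, not the plugin's Perl code: it assumes the table walk has already produced a dict of full OID to sensor name, and the function name build_data_oids is hypothetical. The base OIDs in the usage line are the Dell ones from the Examples section; the walked names and indexes are made up.

```python
def build_data_oids(names_base, data_base, walked, wanted):
    """Map each requested name fragment to the data OID of the matching sensor.

    names_base: base OID of the sensor names table (what -N specifies)
    data_base:  base OID of the sensor values table (what -D specifies)
    walked:     dict of full OID -> sensor name, as returned by a table walk
    wanted:     list of name fragments, as given (comma-separated) with -a
    """
    # Trailing dot so that base ...1.8 does not also match ...1.80
    prefix = names_base.rstrip(".") + "."
    result = {}
    for fragment in wanted:
        for oid, name in walked.items():
            if oid.startswith(prefix) and fragment in name:
                suffix = oid[len(prefix):]  # index part after the names base
                result[fragment] = data_base.rstrip(".") + "." + suffix
                break  # each fragment is expected to identify one sensor
    return result


if __name__ == "__main__":
    names = ".1.3.6.1.4.1.674.10892.1.700.20.1.8"
    data = ".1.3.6.1.4.1.674.10892.1.700.20.1.6"
    # Hypothetical walk result; real names and indexes depend on the device
    walked = {names + ".1.1": "PROC_1 Temp", names + ".1.2": "Ambient Temp"}
    print(build_data_oids(names, data, walked, ["Ambient"]))
```

Fragments that match nothing are simply absent from the result, which is where a default value ('-u') would come into play.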

Note: If you don't know the temperature sensor names on your system, run:

 check_snmp_temperature -v -a '*' ...

(the -v option enables debugging output, which should help further)

If your system does not have a table with sensor names, you can still use this plugin if you know the exact temperature data OIDs. In that case, specify the list of names the sensors should be known by with the '-n' option and the list of data OIDs with the '-d' option. This can also be useful if you want to avoid having the plugin do an SNMP table walk on each run, as retrieving a specific list of OIDs is faster. You will still need to specify, with the '-a' or '-A' option, names that are likely the same as those you put in '-n'.
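With '-n' and '-d' no walk is needed: each name is paired with the OID in the same position, and '-a' then selects among those names. A minimal sketch, assuming both lists are comma-separated like '-a' (the text only says "list"); the function names and the placeholder OIDs are hypothetical.

```python
def pair_names_with_oids(names_arg, oids_arg):
    """Pair each -n name with the -d OID in the same position."""
    names = [n.strip() for n in names_arg.split(",")]
    oids = [o.strip() for o in oids_arg.split(",")]
    if len(names) != len(oids):
        raise ValueError("-n and -d must list the same number of entries")
    return dict(zip(names, oids))


def select_sensors(pairs, wanted):
    """Keep only sensors whose name contains one of the -a fragments."""
    selected = {}
    for fragment in wanted:
        for name, oid in pairs.items():
            if fragment in name:
                selected[name] = oid
                break  # each fragment is expected to identify one sensor
    return selected


if __name__ == "__main__":
    # Placeholder OIDs for illustration only
    pairs = pair_names_with_oids("CPU Temp,Ambient Temp", ".1.2.3.1,.1.2.3.2")
    print(select_sensors(pairs, ["Ambient"]))
```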

Request: If you have a new type of device and, as described above, you figured out SNMP parameters that work, please email me this information so that I can add it as a new system type.

The retrieved values are compared to the specified warning and critical values, but first each temperature is converted from the base measurement units to the units you want: Celsius (C), Fahrenheit (F) or Kelvin (K). The input unit is specified with '-i' and the output unit with '-o'. Some sensors report 10 times the real value, i.e. 33.5C is reported as 335; this is supported too, by specifying the input type as '-i 10C'.
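The conversion can be sketched by going through Celsius as an intermediate unit. This is an illustration of the arithmetic, not the plugin's code; the function names are hypothetical, and accepting a "10" prefix on F and K as well as C is an assumption (the text only mentions '-i 10C').

```python
def to_celsius(raw, unit):
    """Convert a raw reading in unit ('C', 'F', 'K', optionally '10'-prefixed) to Celsius."""
    scale = 1
    if unit.startswith("10"):  # sensor reports 10 x the real value
        scale, unit = 10, unit[2:]
    value = raw / scale
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    if unit == "K":
        return value - 273.15
    raise ValueError(f"unknown input unit: {unit}")


def from_celsius(value, unit):
    """Convert a Celsius value to the output unit ('C', 'F' or 'K')."""
    if unit == "C":
        return value
    if unit == "F":
        return value * 9 / 5 + 32
    if unit == "K":
        return value + 273.15
    raise ValueError(f"unknown output unit: {unit}")


def convert(raw, in_unit, out_unit):
    """Equivalent of '-i in_unit -o out_unit' applied to one reading."""
    return from_celsius(to_celsius(raw, in_unit), out_unit)


if __name__ == "__main__":
    # The Dell example uses '-i 10C -o F': a raw 335 is 33.5C, about 92.3F
    print(convert(335, "10C", "F"))
```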

Warning and critical values are specified with '-w' and '-c', and each must have exactly the same number of values (separated by ',') as the number of sensor names specified with '-a'. Any value you don't want to compare, specify as 0 or leave empty (e.g. -w ',50,'). In some cases you might not get data for a specific sensor and want to substitute a default value; this is supported with the '-u' option (note that default values are in fact compared against -w and -c).
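The threshold rules above can be sketched like this. It is a minimal illustration under stated assumptions: "above" is taken as strictly greater than, and a missing reading with no default is skipped here, which may not match how the plugin itself reports it. The function names are hypothetical; 0/1/2 are the standard Nagios OK/WARNING/CRITICAL exit codes.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def parse_thresholds(arg, count):
    """Split a -w or -c list into `count` floats; empty or 0 entries become None."""
    parts = arg.split(",")
    if len(parts) != count:
        raise ValueError(f"expected {count} values, got {len(parts)}")
    result = []
    for part in parts:
        part = part.strip()
        value = float(part) if part else 0.0
        result.append(None if value == 0 else value)  # None = do not compare
    return result


def evaluate(readings, warn, crit, default=None):
    """Return the worst status over all sensors.

    readings: values already converted to the -o unit; None means no data
    default:  substitute for missing readings (like -u); None disables it
    """
    status = OK
    for value, w, c in zip(readings, warn, crit):
        if value is None:
            if default is None:
                continue  # assumption: skipped in this sketch
            value = default  # substituted defaults are still compared
        if c is not None and value > c:
            status = max(status, CRITICAL)
        elif w is not None and value > w:
            status = max(status, WARNING)
    return status


if __name__ == "__main__":
    # Thresholds from the Dell example: CPU,Ambient,Bottom
    warn = parse_thresholds("110,90,0", 3)
    crit = parse_thresholds("135,110,0", 3)
    print(evaluate([100, 95, 200], warn, crit))  # Ambient is over its warning value
```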

Additionally, if you want performance output, use the '-f' option to get data for all the sensors specified in '-a', or specify a particular list of sensors for performance data with '-A' (this list can include names not found in '-a'). The special option -A '*' gets data from all sensors found, which is very useful for finding out what sensors you have during a manual run.
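Performance data is the part of Nagios plugin output after the '|' character. A minimal sketch of the label=value format described in the Nagios plugin development guidelines; whether this plugin quotes labels, appends units, or adds warn/crit fields exactly this way is an assumption, and format_perf_data is a hypothetical name.

```python
def format_perf_data(readings):
    """Format {sensor name: value} as space-separated Nagios label=value pairs."""
    parts = []
    for name, value in readings.items():
        label = name.replace("'", "''")  # a literal quote is written as two quotes
        if " " in name or "'" in name:
            label = f"'{label}'"  # labels containing spaces must be quoted
        parts.append(f"{label}={value:g}")
    return " ".join(parts)


if __name__ == "__main__":
    # Sensor names from the Dell example; the values are illustrative only
    print("OK | " + format_perf_data({"CPU": 92.3, "BMC Planar": 86}))
```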

Examples

 define command {
     command_name check_cisco_temperature
     command_line $USER1$/check_snmp_temperature.pl -f -H $HOSTADDRESS$ --type=cisco1 -o F -C $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$
 }

 define service {
     use std-service
     hostgroup_name cs2948
     service_description Temperature
     check_command check_cisco_temperature!foo!Chassis!160!190
 }

 define command {
     command_name check_dell_temperature
     command_line $USER1$/check_snmp_temperature.pl -H $HOSTADDRESS$ -C public -N .1.3.6.1.4.1.674.10892.1.700.20.1.8 -D .1.3.6.1.4.1.674.10892.1.700.20.1.6 -i 10C -o F -u 0 -a $ARG1$ -w $ARG2$ -c $ARG3$ -f
 }

 define service {
     use std-service
     hostgroup_name dell_1750
     service_description Temperature
     check_command check_dell_temperature!CPU,Ambient,Bottom!110,90,0!135,110,0
 }

For some Dell systems with all sensors enabled, you can replace the check_command above with:

 check_command check_dell_temperature!'CPU,PROC_1,PROC_2,Ambient,Bottom,BMC Planar,BMC Riser'!110,120,120,90,90,105,105!135,140,140,110,110,125,125