I wrote this simple check script for NRPE/Nagios. It reports the status of the zpool volumes in a box and lists any failed volumes first.
Syntax
$ path/check_zpool.sh [email] [email]
If no arguments are specified, the script assumes it is being run by NRPE.
If one or more email addresses are specified, the script will send an email in case an array reports an error.
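The argument handling described above could look roughly like this. This is a minimal sketch, not the script's actual code: the function name report_mode is hypothetical, and the only assumption is that the mode depends on whether any arguments were given.

```shell
#!/bin/sh
# Hypothetical helper: pick the reporting mode from the argument count.
# No arguments  -> print a status line for NRPE.
# One or more   -> treat every argument as an email recipient.
report_mode() {
    if [ "$#" -eq 0 ]; then
        echo "nrpe"
    else
        echo "email"
    fi
}

# Pass the script's own arguments through unchanged.
report_mode "$@"
```

Called without arguments this prints nrpe; called with one or more addresses it prints email.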
Output
tank: DEGRADED / data: rebuilding / system: ok
Failed/rebuilding volumes are always listed first in the output string, to help diagnose the problem when receiving the output via pager/SMS.
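One way to get that ordering is to collect problem volumes and healthy volumes separately and print the problem ones first. The sketch below assumes input lines of the form "name state" on stdin; the function name format_status and that input format are made up for illustration and are not taken from the script.

```shell
#!/bin/sh
# Hypothetical helper: read "name state" lines from stdin and print one
# summary line with every non-ok volume ahead of the ok ones.
format_status() {
    bad=""
    good=""
    while read -r pool state; do
        [ -n "$pool" ] || continue          # skip empty lines
        entry="$pool: $state"
        if [ "$state" = "ok" ]; then
            good="${good:+$good / }$entry"  # append with " / " separator
        else
            bad="${bad:+$bad / }$entry"
        fi
    done
    if [ -n "$bad" ] && [ -n "$good" ]; then
        echo "$bad / $good"
    else
        echo "$bad$good"
    fi
}

# Example input, deliberately not in the final order.
printf 'system ok\ntank DEGRADED\ndata rebuilding\n' | format_status
```

With the example input this prints the same line as the sample output above: tank: DEGRADED / data: rebuilding / system: ok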
Various outputs explained:
ok | The device is reported as ok by zpool. |
DEGRADED | The RAID volume is degraded. It is still working, but without the safety of RAID and in some cases with severe performance loss. |
rebuilding | The RAID is rebuilding and will return to ok when done. |
unknown state | The volume is in an unknown state. Please report this to me (soren at klintrup.dk) so I can update the script, and include the output of the following commands: zpool status, zpool list |
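The mapping in the table could be sketched as below. This is an illustration under stated assumptions, not the script's real logic: the function name classify_pool is hypothetical, the health value is assumed to come from something like zpool list -H -o name,health, and a rebuild is assumed to show up as the text "resilver in progress" in zpool status output.

```shell
#!/bin/sh
# Hypothetical helper: map zpool information to the labels in the table.
# $1 = pool health, e.g. ONLINE or DEGRADED (assumed source: zpool list)
# $2 = optional text of `zpool status <pool>`, used to spot a rebuild
classify_pool() {
    health=$1
    status_text=${2:-}
    # Assumption: a running rebuild is reported as "resilver in progress".
    case "$status_text" in
        *"resilver in progress"*) echo "rebuilding"; return ;;
    esac
    case "$health" in
        ONLINE)   echo "ok" ;;
        DEGRADED) echo "DEGRADED" ;;
        *)        echo "unknown state" ;;   # anything not handled above
    esac
}

classify_pool ONLINE
classify_pool DEGRADED "scan: resilver in progress since ..."
```

Checking for the resilver text before looking at the health value matters here, because a pool that is rebuilding typically also reports DEGRADED health.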