check_systemd

Nagios / Icinga monitoring plugin to check systemd.

check_systemd is a Nagios / Icinga monitoring plugin to check systemd for failed units.

This Python script will report a degraded system to your monitoring solution. It requires only the nagiosplugin library.

You can also test a single service with the -u, --unit parameter.

Installation

pip3 install check_systemd

Packages

Command line interface

usage: check_systemd [-h] [-u UNIT | -e UNIT] [-n] [-w SECONDS] [-c SECONDS]
                     [-t] [-W SECONDS] [-C SECONDS] [-i] [-v] [-V]

Copyright (c) 2014-18 Andrea Briganti a.k.a 'Kbyte' 
Copyright (c) 2019-21 Josef Friedrich 

Nagios / Icinga monitoring plugin to check systemd.

optional arguments:
  -h, --help            show this help message and exit
  -u UNIT, --unit UNIT  Name of the systemd unit that is being tested.
  -e UNIT, --exclude UNIT
                        Exclude a systemd unit from the checks. This option can
                        be applied multiple times, for example: -e
                        mnt-data.mount -e task.service. Regular expressions can
                        be used to exclude multiple units at once, for example:
                        -e 'user@\d+\.service'. For more information see the
                        Python documentation about regular expressions
                        (https://docs.python.org/3/library/re.html).
  -n, --no-startup-time
                        Don’t check the startup time. Using this option the
                        options '-w, --warning' and '-c, --critical' have no
                        effect. Performance data about the startup time is
                        collected, but no critical, warning etc. states are
                        triggered.
  -w SECONDS, --warning SECONDS
                        Startup time in seconds to result in a warning status.
                        The default is 60 seconds.
  -c SECONDS, --critical SECONDS
                        Startup time in seconds to result in a critical status.
                        The default is 120 seconds.
  -t, --dead-timers     Detect dead / inactive timers. See the corresponding
                        options '-W, --dead-timers-warning' and
                        '-C, --dead-timers-critical'. Dead timers are detected
                        by parsing the output of 'systemctl list-timers'. A
                        timer is considered dead if its row displays 'n/a' in
                        the NEXT and LEFT columns and the time span in the
                        PASSED column exceeds the values specified with the
                        options '-W, --dead-timers-warning' and
                        '-C, --dead-timers-critical'.
  -W SECONDS, --dead-timers-warning SECONDS
                        Time ago in seconds for dead / inactive timers to
                        trigger a warning state (by default 6 days).
  -C SECONDS, --dead-timers-critical SECONDS
                        Time ago in seconds for dead / inactive timers to
                        trigger a critical state (by default 7 days).
  -i, --ignore-inactive-state
                        Ignore an inactive state on a specific unit. Oneshot
                        services for example are only active while running and
                        not enabled. The rest of the time they are inactive.
                        This option only has an effect if it is used with the
                        option -u.
  -v, --verbose         Increase output verbosity (use up to 3 times).
  -V, --version         show program's version number and exit

Performance data:
  - count_units
  - startup_time
  - units_activating
  - units_active
  - units_failed
  - units_inactive
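
The -e, --exclude option takes Python regular expressions. The following minimal sketch shows how such patterns could filter a list of unit names. The helper is_excluded and the choice of re.fullmatch are assumptions made for illustration; the plugin itself may anchor or apply the patterns differently.

```python
import re

# Patterns as they would be passed with repeated -e options
# (taken from the examples in the help text above).
exclude_patterns = ["mnt-data.mount", "task.service", r"user@\d+\.service"]


def is_excluded(unit_name, patterns):
    """Return True if any pattern matches the whole unit name.

    Matching the full name is an assumption of this sketch.
    """
    return any(re.fullmatch(pattern, unit_name) for pattern in patterns)


# Hypothetical unit names used only to exercise the patterns.
units = ["nginx.service", "user@1000.service", "mnt-data.mount", "user@.service"]
checked = [unit for unit in units if not is_excluded(unit, exclude_patterns)]
print(checked)
```

Note that 'user@.service' is kept, because \d+ requires at least one digit after the @ sign.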

Project pages

Behind the scenes

To detect failed units this monitoring script runs:

systemctl list-units --all
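
As a rough illustration of how failed units could be picked out of that output, here is a minimal sketch. The sample text is hand-written to resemble the table printed by systemctl, and parse_units is a hypothetical helper, not the plugin's actual parser.

```python
# Hand-written sample resembling the table printed by
# 'systemctl list-units --all' (an assumption, not captured output).
SAMPLE_OUTPUT = """\
  UNIT            LOAD   ACTIVE   SUB     DESCRIPTION
  cron.service    loaded active   running Regular background program processing daemon
● smartd.service  loaded failed   failed  Self Monitoring and Reporting Technology Daemon
  backup.service  loaded inactive dead    Nightly backup

LOAD   = Reflects whether the unit definition was properly loaded.
3 loaded units listed.
"""


def parse_units(output):
    """Yield (unit, load, active, sub) tuples from list-units style text."""
    for line in output.splitlines():
        # Drop the status bullet that can precede a row.
        fields = line.replace("●", " ").split()
        # Skip the header, legend and blank lines: real unit names carry a
        # type suffix such as '.service', so they always contain a dot.
        if len(fields) < 4 or "." not in fields[0]:
            continue
        yield tuple(fields[:4])


failed = [
    unit
    for unit, _load, active, _sub in parse_units(SAMPLE_OUTPUT)
    if active == "failed"
]
print(failed)
```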

To get the startup time it executes:

systemd-analyze
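
The total startup time appears after the equals sign in the first line of that output. The sketch below extracts it and compares it against the default thresholds from the help text (60 and 120 seconds). The sample line, the helper names and the use of >= for the comparison are assumptions for illustration.

```python
import re

# Sample line in the style printed by systemd-analyze (assumed for this sketch).
SAMPLE = (
    "Startup finished in 2.584s (kernel) + 19.176s (initrd) "
    "+ 1min 17.153s (userspace) = 1min 38.914s"
)

# Seconds per time unit handled by this sketch.
UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}


def parse_startup_time(text):
    """Return the total startup time in seconds, or None if no total is found."""
    match = re.search(r"=\s*(.+)$", text, re.MULTILINE)
    if match is None:
        return None
    total = 0.0
    # A time span such as '1min 38.914s' is a sequence of value/unit pairs.
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)(h|min|ms|s)\b", match.group(1)):
        total += float(value) * UNIT_SECONDS[unit]
    return total


def startup_state(seconds, warning=60, critical=120):
    """Map a startup time to a monitoring state (comparison rule is assumed)."""
    if seconds >= critical:
        return "CRITICAL"
    if seconds >= warning:
        return "WARNING"
    return "OK"


startup_seconds = parse_startup_time(SAMPLE)
state = startup_state(startup_seconds)
print(startup_seconds, state)
```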

To check a specific unit (-u, --unit) this command is executed:

systemctl is-active <unit-name>
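
A minimal sketch of how the result of systemctl is-active could be turned into a monitoring state follows. Both helper functions are hypothetical, and the mapping (anything other than 'active' is critical, unless an inactive state is ignored as with -i) is an assumption rather than the plugin's documented behaviour.

```python
import subprocess


def get_active_state(unit):
    """Run 'systemctl is-active <unit>' and return the printed state."""
    # systemctl exits non-zero for units that are not active, so the
    # return code is deliberately not checked here.
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    return result.stdout.strip()


def state_to_status(active_state, ignore_inactive=False):
    """Map an active state to a monitoring status (assumed mapping)."""
    if active_state == "active":
        return "OK"
    if active_state == "inactive" and ignore_inactive:
        return "OK"
    return "CRITICAL"


# No systemd is needed to try the mapping itself:
print(state_to_status("inactive", ignore_inactive=True))
```

On a machine running systemd, the two helpers could be combined as state_to_status(get_active_state("cron.service")), where cron.service is only an example unit name.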

To find dead timers this plugin launches:

systemctl list-timers --all
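
The dead timer rule from the help text (n/a in the NEXT and LEFT columns, PASSED older than the thresholds) can be sketched as below. The sample rows, the regular expression and the unit lengths are assumptions for illustration. In particular, the sketch assumes that the LAST column consists of four whitespace-separated tokens (weekday, date, time, time zone) and uses average month and year lengths.

```python
import re

# Hand-written sample in the style of 'systemctl list-timers --all'.
SAMPLE = """\
NEXT                         LEFT       LAST                         PASSED              UNIT                  ACTIVATES
Sat 2020-05-16 15:11:15 CEST 34min left Sat 2020-05-16 14:31:56 CEST 4min 20s ago        backup.timer          backup.service
n/a                          n/a        Thu 2020-03-12 03:09:01 CET  2 months 5 days ago phpsessionclean.timer phpsessionclean.service
"""

# Row with 'n/a' in NEXT and LEFT, a four-token LAST timestamp,
# then the PASSED span ending in 'ago', then the unit name.
DEAD_ROW = re.compile(
    r"^n/a\s+n/a\s+(?:\S+\s+){4}(?P<passed>.+?)\s+ago\s+(?P<unit>\S+)"
)

DAY = 86400
# Seconds per unit; month and year values are averages (assumed).
SECONDS = {
    "us": 0, "ms": 0, "s": 1, "min": 60, "h": 3600,
    "day": DAY, "days": DAY, "week": 7 * DAY, "weeks": 7 * DAY,
    "month": 2629800, "months": 2629800,
    "year": 31557600, "years": 31557600,
}


def passed_seconds(passed):
    """Convert a span such as '2 months 5 days' to whole seconds."""
    pairs = re.findall(r"(\d+)\s*([a-z]+)", passed)
    return sum(int(number) * SECONDS[unit] for number, unit in pairs)


def dead_timers(output, warning=6 * DAY, critical=7 * DAY):
    """Return {unit: state} for dead timers older than the thresholds."""
    result = {}
    for line in output.splitlines():
        match = DEAD_ROW.match(line)
        if match is None:
            continue
        age = passed_seconds(match.group("passed"))
        if age >= critical:
            result[match.group("unit")] = "CRITICAL"
        elif age >= warning:
            result[match.group("unit")] = "WARNING"
    return result


dead = dead_timers(SAMPLE)
print(dead)
```

The default thresholds mirror the help text: 6 days for a warning and 7 days for a critical state.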

To learn how systemd produces the text output on the command line, it is worthwhile to take a look at systemd’s source code. Files relevant for text output are: basic/time-util.c, analyze/analyze.c.

Testing

pyenv install 3.6.12
pyenv install 3.7.9
pyenv local 3.6.12 3.7.9
pip3 install tox
tox

Deploying

Edit the version number in check_systemd.py (without the leading v), then tag the release and push the tag:

git tag v2.0.11
git push --tags