check_borg

This is an Icinga2 plugin to monitor Borg backups. This documentation does not cover Nagios, but the plugin can be used with it as well.

This plugin checks three things:

  • the time of your most recent backup
  • return codes of your automatic backup script
  • repository consistency

Dependencies

This plugin needs borg to run properly.

How to use it

Return codes

This plugin checks the return codes of your last borg run. Since these return codes cannot be recovered after the fact, you need to log them somewhere when your automatic borg script runs.

By default, the plugin looks for a file in /var/log/borg/borg-rc.log that looks like this:

create: 0
prune: 0

To log the return codes, you can use something like this in your backup script:

#!/bin/sh
REPOSITORY=username@remoteserver.com:backup

borg create -v --stats                          \
    $REPOSITORY::'{hostname}-{now:%Y-%m-%d}'    \
    /home                                       \
    /var/www                                    \
    --exclude '/home/*/.cache'                  \
    --exclude /home/Ben/Music/Justin\ Bieber    \
    --exclude '*.pyc'
create_rc=$?

borg prune -v --list $REPOSITORY --prefix '{hostname}-' \
    --keep-daily=7 --keep-weekly=4 --keep-monthly=6
prune_rc=$?

# printf is used instead of echo because echo does not expand \n in every shell
printf 'create: %s\nprune: %s\n' "$create_rc" "$prune_rc" > /var/log/borg/borg-rc.log

If you do not want to check the return code of borg create or borg prune, you can pass the -C or -P argument respectively. More on this below.
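To make the expected log format concrete, here is a minimal sketch of how a value could be read back from that file. The read_rc helper is hypothetical and only illustrates the "name: code" layout; it is not taken from check_borg itself.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_borg: print the return code logged
# for one borg command ("create" or "prune") in the given log file.
read_rc() {
    key=$1
    file=$2
    # Match lines such as "create: 0" and print only the number.
    # If the key appears several times, keep the last occurrence.
    sed -n "s/^${key}: *\([0-9][0-9]*\)\$/\1/p" "$file" | tail -n 1
}

# Example:
#   read_rc create /var/log/borg/borg-rc.log
```

An empty result means the key is missing, which would correspond to an incomplete log.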

CLI options

USAGE: 
  check_borg [-c 187200] [-w 93600] [-r repository] [-p repository_password] [-l return_code_log] [-CHPSh]
    -c critical threshold in seconds (default: 187200 == 52 hours)
    -C don't check for borg create return code
    -h show this help message
    -H don't check for repository consistency with borg check
    -l return code log file (default: /var/log/borg/borg-rc.log)
    -p repository password
    -P don't check for borg prune return code
    -r repository
    -S run borg commands as super user (sudo)
    -w warning threshold in seconds (default: 93600 == 26 hours)

For example, you could call this plugin manually this way:

check_borg -r user@remoteserver:/data/borg/myrepository -p my_secret_password -l /var/log/borg/myrepository_rc.log
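Since the -c and -w thresholds are given in seconds, it can be convenient to compute them from hours. The sketch below uses a hypothetical hours_to_seconds helper; the 36 and 72 hour values in the comment are arbitrary examples, not recommendations.

```shell
#!/bin/sh
# Hypothetical helper: convert a number of hours to seconds for -c and -w.
hours_to_seconds() {
    echo $(( $1 * 3600 ))
}

# Example: warn after 36 hours, go critical after 72 hours.
#   check_borg -w "$(hours_to_seconds 36)" -c "$(hours_to_seconds 72)" \
#       -r user@remoteserver:/data/borg/myrepository -p my_secret_password
```

With this helper, the documented defaults correspond to hours_to_seconds 26 and hours_to_seconds 52.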

Output

Depending on how the checks went, the plugin will output one of these messages:

  • OK
  • UNKNOWN: return code log not found, not readable or incomplete
  • WARNING: borg create reached its normal end, but there were warnings
  • WARNING: borg prune reached its normal end, but there were warnings
  • WARNING: borg check reached its normal end, but there were warnings
  • WARNING: last complete backup was $time_since_last seconds ago. Warn is $WARN
  • CRITICAL: borg create did not reach its normal end
  • CRITICAL: borg prune did not reach its normal end
  • CRITICAL: borg check did not reach its normal end
  • CRITICAL: last complete backup was $time_since_last seconds ago. Crit is $CRIT
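The messages above suggest a mapping along the following lines. This is only a sketch under stated assumptions: borg documents return code 0 as success, 1 as warning and 2 or higher as an error, and the thresholds are assumed to be inclusive. The function names are hypothetical and the actual plugin may be implemented differently.

```shell
#!/bin/sh
# Sketch only: the real check_borg may be implemented differently.

# Map a borg return code to a monitoring state name.
state_for_rc() {
    case $1 in
        0) echo OK ;;                    # success
        1) echo WARNING ;;               # normal end, with warnings
        ''|*[!0-9]*) echo UNKNOWN ;;     # missing or non-numeric value
        *) echo CRITICAL ;;              # error, or killed by a signal
    esac
}

# Map the age of the last backup to a state name.
# $1 = age in seconds, $2 = warning threshold, $3 = critical threshold
state_for_age() {
    if [ "$1" -ge "$3" ]; then
        echo CRITICAL
    elif [ "$1" -ge "$2" ]; then
        echo WARNING
    else
        echo OK
    fi
}
```

For example, state_for_age 100000 93600 187200 falls between the default thresholds and yields WARNING.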

Icinga2 integration

To use check_borg with Icinga2, you need to add two things to your configuration.

First, add a CheckCommand object. This typically resides in your commands.conf file in the global-templates zone:

object CheckCommand "borg" {
  command = [ PluginDir + "/check_borg" ] //constants.conf -> const PluginDir

  arguments = {
    "-c" = "$borg_critical_threshold$"
    "-w" = "$borg_warning_threshold$"
    "-l" = "$borg_rc_log$"
    "-p" = "$borg_password$"
    "-r" = "$borg_repository$"
    "-C" = {
      set_if = "$borg_dont_check_create_rc$"
      description = "Don't check for borg create return code."
    }
    "-H" = {
      set_if = "$borg_dont_check_consistency$"
      description = "Don't check for repository consistency with borg check."
    }
    "-P" = {
      set_if = "$borg_dont_check_purge_rc$"
      description = "Don't check for borg prune return code."
    }
    "-S" = {
      set_if = "$borg_sudo$"
      description = "Run borg commands as super user (sudo)."
    }
  }
  timeout = 1h

  vars.borg_dont_check_create_rc = false
  vars.borg_dont_check_consistency = false
  vars.borg_dont_check_purge_rc = false
  vars.borg_sudo = false
}

You might want to monitor your first runs to see how long the check takes. borg check can take a long time on large repositories, and if the timeout is reached, Icinga2 will kill the process and leave borg lock files in place.
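One way to get a feel for a suitable timeout is to time a manual run first. The sketch below uses a hypothetical elapsed_seconds helper; it assumes date +%s is available, which is the case with GNU and BSD date but is not strictly POSIX.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print how many seconds it took.
# The command's own output is discarded so only the duration is printed.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# Example (needs borg and access to the repository):
#   elapsed_seconds check_borg -r user@remoteserver:/data/borg/myrepository -p my_secret_password
```

If the reported duration comes close to the configured timeout, raising the timeout or skipping the consistency check with -H are the two options the plugin offers.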

Once that is done, you need to define a service. Here is an example of a service you could declare:

object Service "borg" {
  import "generic-service"

  host_name = "my_server"
  check_command = "borg"
  check_interval = 1d
  vars.borg_repository = "username@remoteserver:/data/borg/myrepository"
  vars.borg_password = "my_secret_password"
}

Remote repositories

Note that if you connect to a remote server via SSH with borg, the user running icinga2 (nagios on Debian) will also need to be able to connect to this server.

You can also bypass this issue by using the -S parameter, which runs all borg commands during the check as root.

To give the nagios user permission to run borg as root, you can add this to /etc/sudoers.d/20_nagios:

nagios ALL = (root) NOPASSWD:SETENV: /usr/bin/borg

Downtimes

It might be a good idea to schedule a global downtime for Icinga2 checks while you back up your machine.

Please be aware that if you run check_borg while a backup is underway, at best the check will fail because of the lock files and at worst it might affect the backup procedures. Proceed with caution.

Thanks

Most of the structure of this plugin comes from the great work Alexander Swen and others did on the check_puppet_agent plugin. Thank you!