check_pve

A Naemon/Icinga/Nagios plugin that checks the health of a Proxmox Virtual Environment cluster, its nodes, and its VMs via the Proxmox API (v2).

Tested with: Naemon 1.0.6; Ruby 2.3.0, 2.3.3; PVE 5.0, 5.1

Requirements

Ruby

  • Ruby >= 2.3

PVE

A user/role with appropriate rights. See User Management for more information.

# /etc/pve/user.cfg
user:monitoring@pve:1:0::::::

role:PVE_monitoring:Datastore.Audit,Sys.Audit,Sys.Modify,VM.Audit:
acl:1:/:monitoring@pve:PVE_monitoring:

Usage

check_pve v0.2.2 [https://gitlab.com/6uellerBpanda/check_pve]

This plugin checks various parameters of Proxmox Virtual Environment via API(v2)

Mode:
  Cluster:
    cluster         Checks quorum of cluster
  Node:
    smart           Checks SMART health of disks
    updates         Checks for available updates
    subscription    Checks for valid subscription
    services        Checks if services are running
    storage         Checks storage usage in percentage
    cpu             Checks CPU usage in percentage
    memory          Checks Memory usage in gigabytes
    io_wait         Checks IO wait in percentage
  VM:
    vm_cpu          Checks CPU usage in percentage
    vm_disk_read    Checks how many kb last 60s was read (timeframe: hour)
    vm_disk_write   Checks how many kb last 60s was written (timeframe: hour)
    vm_net_in       Checks incoming kb from last 60s (timeframe: hour)
    vm_net_out      Checks outgoing kb from last 60s (timeframe: hour)

Usage: check_pve.rb [mode] [options]

Options:
    -s, --address ADDRESS            PVE host address
    -k, --insecure                   No SSL verification
    -m, --mode MODE                  Mode to check
    -n, --node NODE                  PVE Node name
    -u, --username USERNAME          Username with auth realm e.g. monitoring@pve
    -p, --password PASSWORD          Password
    -w, --warning WARNING            Warning threshold
    -c, --critical CRITICAL          Critical threshold
        --name NAME                  Name for storage
    -i, --vmid VMID                  Vmid of lxc,qemu
    -t, --type TYPE                  VM type lxc or qemu
    -x, --exclude EXCLUDE            Supported with following checks: services
        --timeframe TIMEFRAME        Timeframe for vm checks: hour,day,week,month or year
        --cf CONSOLIDATION_FUNCTION  RRD cf: average or max
    -v, --version                    Print version information
    -h, --help                       Show this help message

Options

  • -s: PVE host address, only https supported, e.g. pve-01.test.at

  • -k: Don't validate certificate

  • -n: PVE Node name

  • -u: Username with auth realm, e.g. monitoring@pve, root@pam

  • -i: vmid

  • -x: Exclude items. Separate multiple values with commas, e.g. ksmtuned,pveproxy. Supported by the following checks: services

  • --name: Storage name, e.g. local, local-lvm

  • --type: either lxc or qemu

  • --timeframe: time frame for rrd data; hour, day, week, month or year

  • --cf: consolidation function for rrd data; average or max
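As a rough illustration of how the -s, -k and -x options could map onto Ruby's standard library, here is a minimal sketch. The helper names `build_client` and `parse_exclude` are hypothetical and not taken from the plugin, and port 8006 (the PVE default) is an assumption.

```ruby
require 'net/http'
require 'openssl'

# Hypothetical helper: maps -s (address) and -k (insecure) onto a Net::HTTP
# client. Port 8006 is assumed; no connection is opened here.
def build_client(address, insecure: false, port: 8006)
  http = Net::HTTP.new(address, port)
  http.use_ssl = true # only https is supported
  http.verify_mode = insecure ? OpenSSL::SSL::VERIFY_NONE : OpenSSL::SSL::VERIFY_PEER
  http
end

# Hypothetical helper: turns the comma-separated -x value into a list of names.
def parse_exclude(value)
  value.to_s.split(',').map(&:strip).reject(&:empty?)
end

client = build_client('pve-01.test.at', insecure: true)
puts client.use_ssl?
puts parse_exclude('ksmtuned,pveproxy').inspect
```

Keeping certificate verification on by default and disabling it only when -k is given mirrors the option description above.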

Modes

Cluster

Checks whether the cluster is quorate and returns a warning if it is not. (/cluster/status)

./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -m cluster
OK - LNZ: Cluster ready - quorum is ok
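The quorum decision can be sketched as follows. This is not the plugin's actual code: the sample payload is made up, it assumes that /cluster/status returns a list containing one entry of type "cluster" with a "quorate" field, and the warning message wording is hypothetical.

```ruby
require 'json'

# Made-up sample shaped like the "data" array of /cluster/status (assumption).
SAMPLE_CLUSTER_STATUS = <<~JSON
  [
    {"type": "cluster", "name": "LNZ", "quorate": 1, "nodes": 3},
    {"type": "node", "name": "hv-vm-01", "online": 1}
  ]
JSON

# Returns [state, message] based on the "quorate" flag of the cluster entry.
def cluster_check(entries)
  cluster = entries.find { |entry| entry['type'] == 'cluster' }
  return ['Unknown', 'No cluster entry found'] if cluster.nil?

  if cluster['quorate'].to_i == 1
    ['OK', "#{cluster['name']}: Cluster ready - quorum is ok"]
  else
    # The wording of this message is hypothetical.
    ['Warning', "#{cluster['name']}: Cluster not ready - no quorum"]
  end
end

state, message = cluster_check(JSON.parse(SAMPLE_CLUSTER_STATUS))
puts "#{state} - #{message}"
```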

Node

The node name (via -n option) is required for all node checks.

SMART

Checks SMART status of the disks. (/nodes/{node}/disks/list)

./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m smart
OK - No SMART errors detected

Updates

Displays a warning if new updates are available. (/nodes/{node}/apt/update)

./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m updates
Warning - New updates available

Subscription

Checks if subscription is valid. (/nodes/{node}/subscription)

Specify a warning threshold for the minimum number of days the subscription has to remain valid. The status is critical if the subscription has expired.

./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m subscription -w 303
Warning - Subscription will end at 2018-10-13
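The day-based threshold can be sketched like this. The function name `subscription_check` is hypothetical. The sketch also assumes the due date arrives as an ISO date string (such as the `nextduedate` field of the API response), and that the subscription counts as expired once that date has passed. Only the warning message is taken from the example above; the other messages are invented.

```ruby
require 'date'

# Compares the remaining days of the subscription against the -w threshold.
# `today` is a parameter so the result is reproducible.
def subscription_check(next_due_date, warn_days, today: Date.today)
  due = Date.parse(next_due_date)
  days_left = (due - today).to_i

  if days_left < 0
    ['Critical', "Subscription expired on #{due}"] # hypothetical wording
  elsif days_left < warn_days
    ['Warning', "Subscription will end at #{due}"]
  else
    ['OK', "Subscription is valid until #{due}"] # hypothetical wording
  end
end

state, message = subscription_check('2018-10-13', 303, today: Date.new(2018, 1, 1))
puts "#{state} - #{message}"
```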

Services

Displays a warning if a service isn't running. (/nodes/{node}/services)

./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m services
Warning - postfix, spiceproxy not running

To exclude services:

./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m services -x postfix,spiceproxy
OK - All services running
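A minimal sketch of how the exclude list might be applied, assuming each entry returned by /nodes/{node}/services carries a "name" and a "state" field and that any state other than "running" counts as not running. The helper name `services_check` and the sample data are illustrative only.

```ruby
# Drops excluded services, then reports every remaining service
# whose state is not "running".
def services_check(services, exclude = [])
  stopped = services
            .reject { |service| exclude.include?(service['name']) }
            .reject { |service| service['state'] == 'running' }
            .map { |service| service['name'] }

  return 'OK - All services running' if stopped.empty?

  "Warning - #{stopped.join(', ')} not running"
end

# Made-up sample data for illustration.
sample_services = [
  { 'name' => 'pveproxy',   'state' => 'running' },
  { 'name' => 'postfix',    'state' => 'stopped' },
  { 'name' => 'spiceproxy', 'state' => 'stopped' }
]

puts services_check(sample_services)
puts services_check(sample_services, %w[postfix spiceproxy])
```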

Storage

Checks storage usage in percentage. Value will be rounded. (/nodes/{node}/storage/{storage}/status)

Specify datastore/storage with "--name" option.

./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m storage --name local -w 40 -c 60
Warning - Storage usage: 45% | Usage=45%;40;60
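The percentage and threshold logic behind this output can be sketched as below. It assumes the status response exposes "used" and "total" in bytes and that a value equal to a threshold already triggers that state; the plugin may compare differently. All helper names are hypothetical.

```ruby
# Rounded usage in percent; returns 0 for an empty or unknown total.
def usage_percent(used, total)
  return 0 if total.to_f.zero?

  (used.to_f / total * 100).round
end

# Assumption: thresholds are inclusive (value >= threshold).
def threshold_state(value, warning, critical)
  return 'Critical' if value >= critical
  return 'Warning' if value >= warning

  'OK'
end

# Builds the plugin line: "<state> - <text> | <perfdata>".
def storage_check(status, warning, critical)
  percent = usage_percent(status['used'], status['total'])
  state = threshold_state(percent, warning, critical)
  "#{state} - Storage usage: #{percent}% | Usage=#{percent}%;#{warning};#{critical}"
end

# Made-up sample: 45 GiB used of 100 GiB.
sample_status = { 'used' => 45 * 1024**3, 'total' => 100 * 1024**3 }
puts storage_check(sample_status, 40, 60)
```

The part after the pipe follows the usual Nagios performance data layout of value;warn;crit, as seen in the sample output above.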

CPU

Checks CPU usage in percentage. Value will be rounded. (/nodes/{node}/status)

./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m cpu -w 40 -c 60
OK - CPU usage: 30% | Usage=30%;40;60

Memory

Checks memory usage in gigabytes. (/nodes/{node}/status)

./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m memory -w 40 -c 60
Warning - Memory Usage: 45GB | Usage=45GB;40;60

IO Wait

Checks IO wait/delay usage in percentage. Value will be rounded. (/nodes/{node}/status)

./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m io_wait -w 1 -c 3
OK - IO Wait: 0% | Wait=0%;1;3

VM

QEMU/KVM and LXC are supported.

The following options are required for all VM checks:

  • node (-n)
  • type (--type)
  • vmid (-i)
  • timeframe (--timeframe)
  • consolidation function (--cf)

> Note: These checks parse the rrddata from PVE and do not reflect the actual values at the time the check runs. They always use the last item in the rrddata array.
>
> Example: If you specify the timeframe hour with the disk read check, it displays how much read IO (kb) was done in the last 60s.
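To make the "last item" behaviour concrete, here is a small sketch. It assumes that the API expects the consolidation function in upper case, that the rrddata entries use keys such as "diskread", and that the raw value is converted to kb by dividing by 1024. None of this is confirmed by the plugin source, and the helper names and sample values are made up.

```ruby
require 'uri'

# Builds the rrddata path with the --timeframe and --cf options as query
# parameters. Upcasing cf is an assumption about what the API expects.
def rrddata_path(node, type, vmid, timeframe, cf)
  query = URI.encode_www_form(timeframe: timeframe, cf: cf.upcase)
  "/nodes/#{node}/#{type}/#{vmid}/rrddata?#{query}"
end

# Always reads the last entry of the rrddata array, as described in the note.
# Dividing by 1024 to get kb is an assumption.
def last_value_kb(rrddata, key)
  (rrddata.last[key].to_f / 1024).round
end

# Made-up sample with one entry per 60s, as for the hour timeframe.
sample_rrddata = [
  { 'time' => 1_520_000_000, 'diskread' => 120_000.0 },
  { 'time' => 1_520_000_060, 'diskread' => 301_056.0 }
]

puts rrddata_path('hv-vm-01', 'qemu', 126, 'hour', 'average')
puts last_value_kb(sample_rrddata, 'diskread')
```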

CPU

Checks CPU usage in percentage. Value will be rounded. (/nodes/{node}/{type}/{vmid}/rrddata)

./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m vm_cpu -t qemu --timeframe hour --cf average -i 126 -w 80 -c 90
OK - CPU usage: 5% | Usage=5%;80;90

Disk read, write

Checks how much read/write io was done in kb. Value will be rounded. (/nodes/{node}/{type}/{vmid}/rrddata)

# read
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m vm_disk_read -t qemu --timeframe hour --cf average -i 126 -w 80 -c 90
Critical - Disk read: 294kb | Usage=294KB;80;90
# write
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m vm_disk_write -t qemu --timeframe hour --cf average -i 126 -w 80 -c 90
OK - Disk write: 66kb | Usage=66KB;80;90

Network traffic incoming, outgoing

Checks how much incoming/outgoing network traffic there was, in kb. Value will be rounded. (/nodes/{node}/{type}/{vmid}/rrddata)

# in
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m vm_net_in -t qemu --timeframe hour --cf average -i 126 -w 80 -c 90
OK - Network usage in: 70kb | Usage=70KB;80;90
# out
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m vm_net_out -t qemu --timeframe hour --cf average -i 126 -w 80 -c 90
OK - Network usage out: 50kb | Usage=50KB;80;90