Check PVE

Icinga check command for Proxmox VE via API

check_pve

Icinga check command for Proxmox VE via API

Setup

Requirements

This check command depends on the following python modules:

  • enum
  • requests
  • argparse

Installation on Debian / Ubuntu

apt install python-enum34 python-requests

Installation on Redhat 6 / CentOS 6

yum install python-argparse python-enum34 python34-requests

Installation on Redhat 7 / CentOS 7

yum install python-enum34 python-requests

Installation on FreeBSD

pkg install py36-requests

Create a API user in Proxmox VE

Create a role named Monitoring and assign neccesarry privileges:

pveum roleadd Monitoring
pveum rolemod Monitoring --privs Sys.Modify,VM.Monitor,Sys.Audit,Datastore.Audit,VM.Audit

Create a user named monitoring and set password:

pveum useradd monitoring@pve --comment "The ICINGA 2 monitoring user"
pveum passwd monitoring@pve

Assign monitoring role to user monitoring

pveum aclmod / -user monitoring@pve -role Monitoring

For further information about the Proxmox VE privilege system have a look into the documentation.

Usage

The icinga2 folder contains the command defintion and service examples for use with Icinga2.

usage: check_pve.py [-h] -e API_ENDPOINT -u API_USER -p API_PASSWORD [-k] -m
                    {cluster,cpu,memory,storage,io_wait,updates,services,subscription,vm,replication,disk-health}
                    [-n NODE] [--name NAME] [--vmid VMID]
                    [--expected-vm-status {running,stopped,paused}]
                    [--ignore-vm-status] [--ignore-service NAME]
                    [--ignore-disk NAME] [-w TRESHOLD_WARNING]
                    [-c TRESHOLD_CRITICAL] [-M]

Check command for PVE hosts via API

optional arguments:
  -h, --help            show this help message and exit

API Options:
  -e API_ENDPOINT, --api-endpoint API_ENDPOINT
                        PVE api endpoint hostname
  -u API_USER, --username API_USER
                        PVE api user (e.g. icinga2@pve or icinga2@pam,
                        depending on which backend you have chosen in proxmox)
  -p API_PASSWORD, --password API_PASSWORD
                        PVE api user password
  -k, --insecure        Don't verify HTTPS certificate

Check Options:
  -m {cluster,cpu,memory,storage,io_wait,updates,services,subscription,vm,vm_status,replication,disk-health}
                        Mode to use.
  -n NODE, --node NODE  Node to check (necessary for all modes except cluster)
  --name NAME           Name of storage or vm
  --vmid VMID           ID of virtual machine or container
  --expected-vm-status {running,stopped,paused}
                        Expected VM status
  --ignore-vm-status    Ignore VM status in checks
  --ignore-service NAME
                        Ignore service NAME in checks
  --ignore-disk NAME    Ignore disk NAME in health check
  -w TRESHOLD_WARNING, --warning TRESHOLD_WARNING
                        Warning treshold for check value
  -c TRESHOLD_CRITICAL, --critical TRESHOLD_CRITICAL
                        Critical treshold for check value
  -M                    Values are shown in MB (if available). Tresholds are
                        also treated as MB values

Examples

Check cluster health

./check_pve.py -u  -p  -e  -m cluster
OK - Cluster 'proxmox1' is healthy'

Check CPU load

./check_pve.py -u  -p  -e  -m cpu -n node1
OK - CPU usage is 2.4%|usage=2.4%;;

Check memory usage

./check_pve.py -u  -p  -e  -m memory -n node1
OK - Memory usage is 37.44%|usage=37.44%;; used=96544.72MB;;;257867.91

Check disk-health

./check_pve.py -u  -p  -e  -m disk-health -n node1
OK - All disks are healthy|wearout_sdb=96%;; wearout_sdc=96%;; wearout_sdd=96%;; wearout_sde=96%;;

Check storage usage

./check_pve.py -u  -p  -e  -m storage -n node1 --name local
OK - Storage usage is 54.23%|usage=54.23%;; used=128513.11MB;;;236980.36

./check_pve.py -u  -p  -e  -m storage -n node1 --name vms-disx
CRITICAL - Storage 'vms-disx' doesn't exist on node 'node01'

Check subscription status

./check_pve.py -u  -p  -e  -m subscription -n node1 -w 50 -c 10
OK - Subscription of level 'Community' is valid until 2019-01-09

Check VM status

Without specifying a node name:

./check_pve.py -u  -p  -e  -m vm --name test-vm
OK - VM 'test-vm' is running on 'node1'|cpu=1.85%;; memory=8.33%;;

With memory thresholds:

./check_pve.py -u  -p  -e  -m vm --name test-vm -w 50 -c 80
OK - VM 'test-vm' is running on 'node1'|cpu=1.85%;; memory=40.33%;50.0;80.0

With a specified node name, the check plugin verifies on which node the VM runs.

./check_pve.py -u  -p  -e  -m vm -n node1 --name test-vm
OK - VM 'test-vm' is running on node 'node1'|cpu=1.85%;; memory=8.33%;;

./check_pve.py -u  -p  -e  -m vm -n node1 --name test-vm
WARNING - VM 'test-vm' is running on node 'node2' instead of 'node1'|cpu=1.85%;; memory=8.33%;;

If you only want to gather metrics and don't care about the vm status add the --ignore-vm-status flag:

./check_pve.py -u  -p  -e  -m vm --name test-vm --ignore-vm-status
OK - VM 'test-vm' is not running

Specify the expected VM status:

./check_pve.py -u  -p  -e  -m vm --name test-vm --expected-vm-status stopped
OK - VM 'test-vm' is not running

For hostalive checks without gathering performance data use vm_status instead of vm. The parameters are the same as with vm.

Check storage replication status

./check_pve.py -u  -p  -e  -m replication -n node1
OK - No failed replication jobs on node1

FAQ

Could not connect to PVE API: Failed to resolve hostname

Verify that your DNS server is working and can resolve your hostname. If everything is fine check for proxyserver environment variables (HTTP_PROXY,HTTPS_PROXY), which maybe not allow communication to port 8006.

Contributors