check_pve
A Naemon/Icinga/Nagios plugin that checks the health and resource usage of Proxmox Virtual Environment (PVE) clusters, nodes and VMs via the Proxmox API (v2).
Tested with: Naemon 1.0.6; Ruby 2.3.0, 2.3.3; PVE 5.0, 5.1, 6.1
Requirements
Ruby
- Ruby >= 2.3
PVE
A user/role with appropriate rights. See User Management for more information.
# /etc/pve/user.cfg
user:monitoring@pve:1:0::::::
role:PVE_monitoring:Datastore.Audit,Sys.Audit,Sys.Modify,VM.Audit:
acl:1:/:monitoring@pve:PVE_monitoring:
How to add user and role
pveum useradd monitoring@pve -comment "Monitoring User"
pveum passwd monitoring@pve
pveum roleadd PVE_monitoring -privs "Datastore.Audit,Sys.Audit,Sys.Modify,VM.Audit"
pveum aclmod / -user monitoring@pve -role PVE_monitoring
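Once the user exists, its credentials can be tried against the API login endpoint. The following is a minimal sketch, not the plugin's own code: it only builds the login request and does not send it. The helper name `build_ticket_request` is hypothetical, and the port (8006) and path (`/api2/json/access/ticket`) are the usual PVE defaults rather than something stated in this README.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a PVE login request.
# Assumes the default API port 8006 and the /access/ticket endpoint.
def build_ticket_request(host:, username:, password:, port: 8006)
  uri = URI("https://#{host}:#{port}/api2/json/access/ticket")
  request = Net::HTTP::Post.new(uri)
  # Credentials are sent as a URL-encoded form body.
  request.set_form_data('username' => username, 'password' => password)
  request
end

request = build_ticket_request(host: 'pve-01.test.at',
                               username: 'monitoring@pve',
                               password: 'test1234')
puts request.path

# Sending it requires a reachable PVE host, for example:
#   http = Net::HTTP.new('pve-01.test.at', 8006)
#   http.use_ssl = true
#   response = http.request(request)
```

A successful response contains a ticket that is passed as a cookie on later requests. A failed login usually points to a wrong password or realm.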
Usage
check_pve v0.2.4 [https://gitlab.com/6uellerBpanda/check_pve]
This plugin checks various parameters of Proxmox Virtual Environment via API(v2)
Mode:
Cluster:
cluster Checks quorum of cluster
Node:
smart Checks SMART health of disks
updates Checks for available updates
subscription Checks for valid subscription
services Checks if services are running
storage Checks storage usage in percentage
cpu Checks CPU usage in percentage
memory Checks Memory usage in gigabytes
io_wait Checks IO wait in percentage
net_in Checks inbound network usage in kilobytes
net_out Checks outbound network usage in kilobytes
ksm Checks KSM sharing usage in megabytes
VM:
vm_cpu Checks CPU usage in percentage
vm_disk_read Checks how many kb last 60s was read (timeframe: hour)
vm_disk_write Checks how many kb last 60s was written (timeframe: hour)
vm_net_in Checks incoming kb from last 60s (timeframe: hour)
vm_net_out Checks outgoing kb from last 60s (timeframe: hour)
Usage: check_pve.rb [options]
Options:
-s, -H, --address ADDRESS PVE host address
-k, --insecure No SSL verification
-m, --mode MODE Mode to check
-n, --node NODE PVE Node name
-u, --username USERNAME Username with auth realm e.g. monitoring@pve
-p, --password PASSWORD Password
-w, --warning WARNING Warning threshold
-c, --critical CRITICAL Critical threshold
--name NAME Name for storage
-i, --vmid VMID Vmid of lxc,qemu
-t, --type TYPE VM type lxc or qemu
-x, --exclude EXCLUDE Exclude (regex)
--timeframe TIMEFRAME Timeframe for vm checks: hour,day,week,month or year
--cf CONSOLIDATION_FUNCTION RRD cf: average or max
-v, --version Print version information
-h, --help Show this help message
Options
- -s, -H: PVE host address, only https supported, e.g. pve-01.test.at
- -k: Don't validate certificate
- -n: PVE Node name
- -u: Username with auth realm, e.g. monitoring@pve, root@pam
- -i: vmid
- -x: Exclude items (regex)
- --name: Storage name, e.g. local, local-lvm
- --type: either lxc or qemu
- --timeframe: time frame for rrd data; hour, day, week, month or year
- --cf: consolidation function for rrd data; average or max
Modes
Cluster
Checks whether the cluster is quorate and returns a warning if it is not. (/cluster/status)
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -m cluster
OK - LNZ: Cluster ready - quorum is ok
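As a rough illustration of how such a check can be derived from the API response, the sketch below parses a sample payload and looks for the cluster entry. The payload layout (an entry with `type` set to `cluster` and a numeric `quorate` field) is an assumption about the /cluster/status response, and the function name and the non-OK messages are hypothetical.

```ruby
require 'json'

# Sample payload, assumed to resemble a /cluster/status response.
CLUSTER_SAMPLE = <<~JSON
  {"data":[
    {"type":"cluster","name":"LNZ","quorate":1,"nodes":3},
    {"type":"node","name":"hv-vm-01","online":1}
  ]}
JSON

# Hypothetical helper: returns a status line based on the quorate flag.
def cluster_status(body)
  cluster = JSON.parse(body)['data'].find { |item| item['type'] == 'cluster' }
  return 'Unknown - no cluster entry found' if cluster.nil?

  if cluster['quorate'] == 1
    "OK - #{cluster['name']}: Cluster ready - quorum is ok"
  else
    "Warning - #{cluster['name']}: Cluster not ready - no quorum"
  end
end

puts cluster_status(CLUSTER_SAMPLE)
```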
Node
The node name (via -n option) is required for all node checks.
SMART
Checks SMART status of the disks. (/nodes/{node}/disks/list)
Allows exclude option: --exclude '^/dev/sda'
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m smart
OK - No SMART errors detected
Updates
Displays a warning if new updates are available. (/nodes/{node}/apt/update)
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m updates
Warning - New updates available
Subscription
Checks if subscription is valid. (/nodes/{node}/subscription)
Specify a warning threshold for the minimum number of days the subscription must remain valid. The status is critical if the subscription has expired.
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m subscription -w 303
Warning - Subscription will end at 2018-10-13
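The day-based threshold can be pictured as a simple date difference. The sketch below is an illustration under assumptions, not the plugin's implementation: the function name, the OK and Critical messages, and the exact boundary handling (strictly fewer days than the threshold triggers the warning) are all hypothetical.

```ruby
require 'date'

# Hypothetical helper: compares the days left until the due date
# against the warning threshold given in days.
def subscription_status(due_date:, warn_days:, today: Date.today)
  due = Date.parse(due_date)
  days_left = (due - today).to_i

  if days_left < 0
    "Critical - Subscription expired at #{due}"
  elsif days_left < warn_days
    "Warning - Subscription will end at #{due}"
  else
    "OK - Subscription is valid until #{due}"
  end
end

# 12 days left with a 30 day threshold results in a warning.
puts subscription_status(due_date: '2018-10-13', warn_days: 30,
                         today: Date.new(2018, 10, 1))
```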
Services
Displays a warning if a service isn't running. (/nodes/{node}/services)
Allows exclude option: --exclude 'ksmtuned'
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m services
Warning - postfix, spiceproxy not running
To exclude services:
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m services -x postfix,spiceproxy
OK - All services running
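To make the exclude behaviour more concrete, here is a small sketch of regex-based filtering over a service list. It assumes each service entry carries `name` and `state` fields, and the helper names are hypothetical. How the plugin itself interprets the exclude value may differ.

```ruby
# Hypothetical helper: names of services that are not running,
# skipping any whose name matches the exclude pattern.
def not_running(services, exclude: nil)
  pattern = exclude && Regexp.new(exclude)
  services
    .reject { |svc| pattern && svc['name'] =~ pattern }
    .reject { |svc| svc['state'] == 'running' }
    .map { |svc| svc['name'] }
end

# Hypothetical helper: formats the status line.
def services_status(services, exclude: nil)
  stopped = not_running(services, exclude: exclude)
  return 'OK - All services running' if stopped.empty?

  "Warning - #{stopped.join(', ')} not running"
end

# Sample data with an assumed field layout.
services = [
  { 'name' => 'pveproxy',   'state' => 'running' },
  { 'name' => 'postfix',    'state' => 'stopped' },
  { 'name' => 'spiceproxy', 'state' => 'stopped' }
]

puts services_status(services)
puts services_status(services, exclude: 'postfix|spiceproxy')
```

In this sketch the exclude value is treated purely as a regular expression, so several names are combined with alternation (`postfix|spiceproxy`).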
Storage
Checks storage usage in percentage. Value will be rounded. (/nodes/{node}/storage/{storage}/status)
Specify the datastore/storage with the "--name" option.
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m storage --name local -w 40 -c 60
Warning - Storage usage: 45% | Usage=45%;40;60
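The threshold checks all follow the common Nagios pattern: compare a value against the warning and critical thresholds, print a status line with performance data, and exit with 0 (OK), 1 (Warning) or 2 (Critical). The sketch below illustrates that pattern. The function name and the use of greater-or-equal comparisons are assumptions rather than the plugin's actual code.

```ruby
# Hypothetical helper: returns the status line and the Nagios exit code.
# Assumes a value at or above a threshold triggers that state.
def check_threshold(label:, value:, warn:, crit:, unit: '%')
  state, code =
    if value >= crit
      ['Critical', 2]
    elsif value >= warn
      ['Warning', 1]
    else
      ['OK', 0]
    end
  perfdata = "Usage=#{value}#{unit};#{warn};#{crit}"
  ["#{state} - #{label}: #{value}#{unit} | #{perfdata}", code]
end

# Example: 45 GiB used of 100 GiB, rounded to a whole percentage.
used  = 45 * 1024**3
total = 100 * 1024**3
percent = (used.to_f / total * 100).round

message, exit_code = check_threshold(label: 'Storage usage', value: percent,
                                     warn: 40, crit: 60)
puts message
# A real plugin would finish with: exit exit_code
```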
CPU
Checks CPU usage in percentage. Value will be rounded. (/nodes/{node}/status)
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m cpu -w 40 -c 60
OK - CPU usage: 30% | Usage=30%;40;60
Memory
Checks memory usage in gigabytes. (/nodes/{node}/status)
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m memory -w 40 -c 60
Warning - Memory Usage: 45GB | Usage=45GB;40;60
IO Wait
Checks IO wait/delay usage in percentage. Value will be rounded. (/nodes/{node}/status)
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m io_wait -w 1 -c 3
OK - IO Wait: 0% | Wait=0%;1;3
Network usage
Checks network usage (In/Out) in kilobytes. Value will be rounded. (/nodes/{node}/rrddata)
# inbound
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m net_in --timeframe hour --cf max -w 100 -c 1000
Critical - Network usage in: 1276KB | Usage=1276KB;100;1000
# outbound
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m net_out --timeframe hour --cf max -w 100 -c 1000
Critical - Network usage out: 26893KB | Usage=26893KB;100;1000
VM
QEMU/KVM and LXC are supported.
The following options are required for all VM checks:
- node (-n)
- type (--type)
- vmid (-i)
- timeframe (--timeframe)
- consolidation function (--cf)
> Note: These checks parse the rrddata from PVE, so they do not reflect the live value at the moment the check runs. The check always uses the last item in the rrddata array.
> Example: If you specify the timeframe hour with the disk read check, it displays how much read I/O (in kb) was done in the last 60s.
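The idea of taking the last rrddata item can be sketched as follows. The sample payload, the `diskread` field name, the conversion from bytes to kilobytes by dividing by 1024, and the helper name are all assumptions made for illustration.

```ruby
require 'json'

# Sample payload, assumed to resemble an rrddata response with
# one data point per minute for the "hour" timeframe.
RRD_SAMPLE = <<~JSON
  {"data":[
    {"time":1514761200,"diskread":102400.0},
    {"time":1514761260,"diskread":301056.0}
  ]}
JSON

# Hypothetical helper: takes the most recent data point and
# converts the value from bytes to rounded kilobytes.
def last_value_kb(body, key)
  last = JSON.parse(body)['data'].last
  (last[key].to_f / 1024).round
end

puts last_value_kb(RRD_SAMPLE, 'diskread')
```

Only the final data point is used, so earlier entries in the array have no influence on the result.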
CPU
Checks CPU usage in percentage. Value will be rounded. (/nodes/{node}/{type}/{vmid}/rrddata)
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m vm_cpu -t qemu --timeframe hour --cf average -i 126 -w 80 -c 90
OK - CPU usage: 5% | Usage=5%;80;90
Disk read, write
Checks how much read/write I/O was done, in kb. Value will be rounded. (/nodes/{node}/{type}/{vmid}/rrddata)
# read
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m vm_disk_read -t qemu --timeframe hour --cf average -i 126 -w 80 -c 90
Critical - Disk read: 294kb | Usage=294KB;80;90
# write
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m vm_disk_write -t qemu --timeframe hour --cf average -i 126 -w 80 -c 90
OK - Disk write: 66kb | Usage=66KB;80;90
Network traffic incoming, outgoing
Checks how much incoming/outgoing network traffic was transferred, in kb. Value will be rounded. (/nodes/{node}/{type}/{vmid}/rrddata)
# incoming
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m vm_net_in -t qemu --timeframe hour --cf average -i 126 -w 80 -c 90
OK - Network usage in: 70kb | Usage=70KB;80;90
# outgoing
./check_pve.rb -s hv-vm-01.test.at -u monitoring@pve -p test1234 -n hv-vm-01 -m vm_net_out -t qemu --timeframe hour --cf average -i 126 -w 80 -c 90
OK - Network usage out: 50kb | Usage=50KB;80;90