Check graphite
Check Graphite metrics from Nagios, Icinga 2, or a compatible monitoring solution.
Source code at https://gitlab.com/samuelbf/check_graphite/. Forked from the Disqus nagios-plugins repository.
Usage
% ./check_graphite.py -h
usage: check_graphite.py [-h] [-U URL] -t TARGETS --from _FROM
[--until _UNTIL] [-W WARN] [-C CRIT]
[-c COUNT | --percentile PERCENT] [--over] [--under]
[--empty-ok]
Check metrics from graphite API
optional arguments:
-h, --help show this help message and exit
-U URL, --graphite-url URL
Graphite URL [http://localhost/]
-t TARGETS, --target TARGETS
Target to check
--from _FROM From timestamp/date
--until _UNTIL Until timestamp/date [now]
-W WARN, --warning WARN
Warning if datapoints over WARNING
-C CRIT, --critical CRIT
Critical if datapoints over CRITICAL
-c COUNT, --count COUNT
Alert when at least COUNT metrics are over/under thresholds [1]
--percentile PERCENT Use nPercentile Graphite function on the target (returns one datapoint)
--over Alert when data OVER specified WARNING or CRITICAL threshold [True]
--under Alert when data UNDER specified WARNING or CRITICAL threshold [False]
--empty-ok Empty data from Graphite is OK
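The way -W/-C, --count, --over/--under and --empty-ok combine can be sketched in Python as follows. This is a minimal illustration, not the script's actual code: the names `evaluate` and `nth_extreme` are hypothetical, and returning UNKNOWN for an empty series unless --empty-ok is set is an assumption based on the behaviour described in this README.

```python
from typing import Optional, Sequence

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nth_extreme(datapoints: Sequence[Optional[float]], count: int = 1,
                over: bool = True) -> Optional[float]:
    """Return the COUNT-th highest (over) or lowest (under) non-null value."""
    values = sorted((v for v in datapoints if v is not None), reverse=over)
    return values[count - 1] if len(values) >= count else None


def evaluate(datapoints: Sequence[Optional[float]],
             warn: Optional[float] = None, crit: Optional[float] = None,
             count: int = 1, over: bool = True, empty_ok: bool = False) -> int:
    """Alert when at least `count` datapoints are beyond a threshold."""
    values = [v for v in datapoints if v is not None]
    if not values:
        # Assumption: no data is UNKNOWN unless --empty-ok was given.
        return OK if empty_ok else UNKNOWN

    def breaches(threshold: Optional[float]) -> int:
        # Number of datapoints strictly beyond the threshold.
        if threshold is None:
            return 0
        return sum(1 for v in values if (v > threshold if over else v < threshold))

    if breaches(crit) >= count:
        return CRITICAL
    if breaches(warn) >= count:
        return WARNING
    return OK


# Datapoints taken from the metricsReceived series shown in this README.
series = [None, None, 1403.0, 1387.0, 615.0, 618.0, 615.0, 621.0, None, None]
print(evaluate(series, warn=1200))                        # 1 (WARNING)
print(evaluate(series, warn=1200, crit=1400, count=3))    # 0 (OK)
print(nth_extreme(series, count=3))                       # 621.0
```

Counting breaches directly and comparing the COUNT-th extreme value against the threshold are equivalent; the second form is convenient when the reported value should be "the third highest value".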
Examples
- Check "metricsReceived <= 1200" in the last 10 minutes:
$ check_graphite -U http://localhost:8888/ --from=-10minutes -t metricsReceived -W 1200
GRAPHITE WARNING : metricsReceived is 1403.0 (highest value) | metricsReceived=1403.0;1200.0;;; metricsReceived=None/None/1403.0/1387.0/615.0/618.0/615.0/621.0/None/None
- Check "metricsReceived <= 1200" yesterday for at least 70% of values:
$ check_graphite -U http://localhost:8888/ -t metricsReceived -W 1200 -C 1400 --percentile=70 --from=yesterday --until=today
GRAPHITE WARNING : metricsReceived is 1387.0 (highest value) | 'nPercentile(metricsReceived, 70)'=1387.0;1200.0;1400.0;;
- Check "metricsReceived <= 1200" and "committedPoints <= 1200" for at least 70% of values in the last 10 minutes:
$ check_graphite -U http://localhost:8888/ --from=-10minutes -t metricsReceived -t committedPoints -W 1200 -C 1400 --percentile=70
GRAPHITE WARNING : metricsReceived is 1387.0 (70th percentile) | 'nPercentile(metricsReceived, 70)'=1387.0;1200.0;1400.0;; OK : committedPoints is 639.0 (70th percentile) | 'nPercentile(committedPoints, 70)'=639.0;1200.0;1400.0;;
- Check that "metricsReceived" and "committedPoints" do not exceed 1200 more than 2 times in the last 10 minutes:
$ check_graphite -U http://localhost:8888/ --from=-10minutes -t "aliasByMetric(carbon.agents.*.{metricsReceived,committedPoints})" -W 1200 -C 1400 --count=3
GRAPHITE OK : committedPoints is 636.0 (third highest value) | committedPoints_3=636.0;1200.0;1400.0;; committedPoints=None/None/0.0/2692.0/599.0/636.0/633.0/639.0/None/None OK : metricsReceived committedPoints is 621.0 (third highest value) metricsReceived=None/None/1403.0/1387.0/615.0/618.0/615.0/621.0/None/None | metricsReceived_3=621.0;1200.0;1400.0;;
Alternatives
Comparison with Disqus' version
This script is based on a little script by Disqus, ported to Python 3 by Debian, and aims to be a drop-in replacement for it. It does not support the --confidence, --beyond, or --compare flags, though.
The main difference is compliance with the Nagios plugin guidelines for performance data. It also returns UNKNOWN status on Graphite server errors or missing values (rather than CRITICAL).
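As an illustration of the performance-data format referred to here, the following sketch formats one perfdata item as `label=value;warn;crit;min;max` and builds a plugin status line in the style of the example outputs. The helper names are hypothetical and the code is not taken from check_graphite; the quoting rule (single quotes around labels that contain spaces, with embedded quotes doubled) follows the Nagios plugin guidelines.

```python
# Standard Nagios plugin exit codes and their names.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def perfdata(label, value, warn=None, crit=None, minimum=None, maximum=None):
    """Format one performance-data item; unset fields are left empty."""
    def fmt(x):
        return "" if x is None else str(float(x))

    if " " in label or "'" in label:
        # Labels with spaces must be quoted; a literal quote is doubled.
        label = "'" + label.replace("'", "''") + "'"
    fields = [fmt(value), fmt(warn), fmt(crit), fmt(minimum), fmt(maximum)]
    return f"{label}=" + ";".join(fields)


def status_line(code, message, perf):
    """Build the first output line: status text, a pipe, then perfdata."""
    return f"GRAPHITE {STATUS_NAMES[code]} : {message} | {perf}"


perf = perfdata("nPercentile(metricsReceived, 70)", 1387.0, warn=1200, crit=1400)
print(status_line(1, "metricsReceived is 1387.0 (70th percentile)", perf))
```

A wrapper around such helpers would exit with code 3 (UNKNOWN) when the Graphite request fails or returns no usable values, matching the behaviour described above.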
Other implementations
Other check_graphite implementations include:
- JKrauska's version, which relies on NagAconda, which is not Python 3-compatible yet
- NETWAYS' version
- kali's shell/curl version, not tested
- Obfuscurity's version, in Ruby, referenced by Icinga's own check_graphite command