check_graphite

Check graphite

Check Graphite metrics from Nagios, Icinga 2, or any compatible monitoring solution.

Source code is available at https://gitlab.com/samuelbf/check_graphite/. Forked from the Disqus nagios-plugins repository.

Usage

% ./check_graphite.py -h
usage: check_graphite.py [-h] [-U URL] -t TARGETS --from _FROM
                         [--until _UNTIL] [-W WARN] [-C CRIT]
                         [-c COUNT | --percentile PERCENT] [--over] [--under]
                         [--empty-ok]

Check metrics from graphite API

optional arguments:
  -h, --help            show this help message and exit
  -U URL, --graphite-url URL
                        Graphite URL [http://localhost/]
  -t TARGETS, --target TARGETS
                        Target to check
  --from _FROM          From timestamp/date
  --until _UNTIL        Until timestamp/date [now]
  -W WARN, --warning WARN
                        Warning if datapoints over WARNING
  -C CRIT, --critical CRIT
                        Critical if datapoints over CRITICAL
  -c COUNT, --count COUNT
                        Alert when at least COUNT metrics are over/under thresholds [1]
  --percentile PERCENT  Use nPercentile Graphite function on the target (returns one datapoint)
  --over                Alert when data OVER specified WARNING or CRITICAL threshold [True]
  --under               Alert when data UNDER specified WARNING or CRITICAL threshold [False]
  --empty-ok            Empty data from Graphite is OK
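
The -t, --from, --until and --percentile options correspond closely to parameters of Graphite's render API. The following is only a minimal sketch of how such a request URL could be assembled, not the code used by check_graphite.py: build_render_url is a hypothetical helper, and the assumption is that the plugin queries the /render endpoint with format=json.

```python
from urllib.parse import urlencode, urljoin


def build_render_url(base_url, targets, from_, until="now", percentile=None):
    """Build a Graphite render API URL (hypothetical helper, for illustration)."""
    if percentile is not None:
        # Assumption: --percentile wraps each target in Graphite's nPercentile().
        targets = [f"nPercentile({t}, {percentile})" for t in targets]
    # One "target" parameter per metric, then the time window and output format.
    params = [("target", t) for t in targets]
    params += [("from", from_), ("until", until), ("format", "json")]
    return urljoin(base_url, "render") + "?" + urlencode(params)


print(build_render_url("http://localhost:8888/", ["metricsReceived"], "-10minutes"))
```

Passing several targets simply repeats the target parameter, which is how the render API accepts more than one series in a single request.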

Examples

  • Check "metricsReceived <= 1200" in the last 10 minutes:

    $ check_graphite -U http://localhost:8888/ --from=-10minutes -t metricsReceived -W 1200
    GRAPHITE WARNING : metricsReceived is 1403.0 (highest value) | metricsReceived=1403.0;1200.0;;;
    metricsReceived=None/None/1403.0/1387.0/615.0/618.0/615.0/621.0/None/None

  • Check "metricsReceived <= 1200" yesterday for at least 70% of values:

    $ check_graphite -U http://localhost:8888/ -t metricsReceived -W 1200 -C 1400 --percentile=70 --from=yesterday --until=today
    GRAPHITE WARNING : metricsReceived is 1387.0 (highest value) | 'nPercentile(metricsReceived, 70)'=1387.0;1200.0;1400.0;;

  • Check "metricsReceived <= 1200" and "committedPoints <= 1200" for at least 70% of values in the last 10 minutes:

    $ check_graphite -U http://localhost:8888/ --from=-10minutes -t metricsReceived -t committedPoints -W 1200 -C 1400 --percentile=70
    GRAPHITE WARNING : metricsReceived is 1387.0 (70th percentile) | 'nPercentile(metricsReceived, 70)'=1387.0;1200.0;1400.0;;
    OK : committedPoints is 639.0 (70th percentile) | 'nPercentile(committedPoints, 70)'=639.0;1200.0;1400.0;;

  • Check that "metricsReceived" and "committedPoints" do not exceed 1200 more than 2 times in the last 10 minutes:

    $ check_graphite -U http://localhost:8888/ --from=-10minutes -t "aliasByMetric(carbon.agents.*.{metricsReceived,committedPoints})" -W 1200 -C 1400 --count=3
    GRAPHITE OK : committedPoints is 636.0 (third highest value) | committedPoints_3=636.0;1200.0;1400.0;;
    committedPoints=None/None/0.0/2692.0/599.0/636.0/633.0/639.0/None/None
    OK : metricsReceived committedPoints is 621.0 (third highest value)
    metricsReceived=None/None/1403.0/1387.0/615.0/618.0/615.0/621.0/None/None | metricsReceived_3=621.0;1200.0;1400.0;;
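
The --count examples above compare the COUNT-th most extreme datapoint against the thresholds. A rough sketch of that idea follows, using the datapoints shown in the example output. The function names nth_extreme and classify are hypothetical, and treating a series with fewer than COUNT datapoints as UNKNOWN is a simplifying assumption rather than documented behaviour.

```python
def nth_extreme(values, count=1, under=False):
    """Return the count-th highest (or lowest, if under) datapoint, skipping None."""
    present = [v for v in values if v is not None]
    if len(present) < count:
        return None  # not enough datapoints to decide
    ordered = sorted(present, reverse=not under)
    return ordered[count - 1]


def classify(value, warn=None, crit=None, under=False):
    """Map a datapoint to a Nagios status name."""
    if value is None:
        return "UNKNOWN"  # assumption: missing data is reported as UNKNOWN

    def breached(threshold):
        if threshold is None:
            return False
        return value < threshold if under else value > threshold

    if breached(crit):
        return "CRITICAL"
    if breached(warn):
        return "WARNING"
    return "OK"


# Datapoints copied from the example output above.
metrics_received = [None, None, 1403.0, 1387.0, 615.0, 618.0, 615.0, 621.0, None, None]

# --count=1 with -W 1200: the highest value (1403.0) exceeds the warning threshold.
print(classify(nth_extreme(metrics_received, 1), warn=1200))
# --count=3 with -W 1200 -C 1400: the third highest value (621.0) is within bounds.
print(classify(nth_extreme(metrics_received, 3), warn=1200, crit=1400))
```

With these inputs the first call prints WARNING and the second prints OK, matching the statuses in the first and last examples.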

Alternatives

Comparison with Disqus' version

This script is based on a little script by Disqus, ported to Python 3 by Debian, and aims to be a drop-in replacement for it. It does not, however, support the --confidence, --beyond or --compare flags.

The main differences are that it complies with the Nagios plugin guidelines for performance data, and that it returns UNKNOWN status on a Graphite server error or missing values (rather than CRITICAL).
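
To illustrate what these two points mean in practice, here is a small sketch of the performance data format (label=value;warn;crit;min;max) and the standard exit codes from the Nagios plugin guidelines. The helpers format_perfdata and plugin_line are hypothetical and are not taken from this script; quoting only labels that contain a space is an assumption based on the example output.

```python
# Exit codes defined by the Nagios plugin guidelines.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_perfdata(label, value, warn=None, crit=None):
    """Format one perfdata item; min and max are left empty."""
    def field(x):
        return "" if x is None else str(float(x))

    # Labels containing spaces must be wrapped in single quotes.
    if " " in label:
        label = f"'{label}'"
    return f"{label}={field(value)};{field(warn)};{field(crit)};;"


def plugin_line(status, text, perf):
    """Return the output line and the matching exit code for a status."""
    return f"GRAPHITE {status} : {text} | {perf}", EXIT_CODES[status]


line, code = plugin_line(
    "WARNING",
    "metricsReceived is 1403.0 (highest value)",
    format_perfdata("metricsReceived", 1403.0, warn=1200),
)
print(line)
print(code)
```

A real plugin would pass the code to sys.exit(); returning 3 (UNKNOWN) rather than 2 (CRITICAL) on a server error lets the monitoring system distinguish a failed check from a failed service.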

Other implementations

Other check_graphite implementations include :