check_graphite

Check graphite

Check Graphite metrics from Nagios, Icinga 2, or a compatible monitoring solution.

Source code is at https://gitlab.com/samuelbf/check_graphite/. Forked from the Disqus nagios-plugins repository.

Usage

    % ./check_graphite.py -h
    usage: check_graphite.py [-h] [-U URL] -t TARGETS --from _FROM
                             [--until _UNTIL] [-W WARN] [-C CRIT]
                             [-c COUNT | --percentile PERCENT] [--over] [--under]
                             [--empty-ok]

    Check metrics from graphite API

    optional arguments:
      -h, --help            show this help message and exit
      -U URL, --graphite-url URL
                            Graphite URL [http://localhost/]
      -t TARGETS, --target TARGETS
                            Target to check
      --from _FROM          From timestamp/date
      --until _UNTIL        Until timestamp/date [now]
      -W WARN, --warning WARN
                            Warning if datapoints over WARNING
      -C CRIT, --critical CRIT
                            Critical if datapoints over CRITICAL
      -c COUNT, --count COUNT
                            Alert when at least COUNT metrics are over/under thresholds [1]
      --percentile PERCENT  Use nPercentile Graphite function on the target (returns one datapoint)
      --over                Alert when data OVER specified WARNING or CRITICAL threshold [True]
      --under               Alert when data UNDER specified WARNING or CRITICAL threshold [False]
      --empty-ok            Empty data from Graphite is OK
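
As a rough illustration of how -W, -C, --count, --over and --under interact, here is a minimal sketch. It is not the plugin's actual code: the `evaluate` helper and its parameters are hypothetical, and it assumes "over" means strictly greater than the threshold and that empty datapoints are ignored.

```python
from typing import Optional, Sequence

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def evaluate(datapoints: Sequence[Optional[float]],
             warn: Optional[float],
             crit: Optional[float],
             count: int = 1,
             under: bool = False) -> int:
    """Return a Nagios status for one series (hypothetical helper)."""
    # Assumption: gaps in the series (None) never count as a breach.
    values = [v for v in datapoints if v is not None]

    def breaches(threshold: Optional[float]) -> bool:
        if threshold is None:
            return False
        if under:
            hits = sum(1 for v in values if v < threshold)
        else:
            hits = sum(1 for v in values if v > threshold)
        # Alert only when at least `count` datapoints cross the threshold.
        return hits >= count

    # Check the most severe threshold first.
    if breaches(crit):
        return CRITICAL
    if breaches(warn):
        return WARNING
    return OK
```

With a series containing two datapoints above 1200, `evaluate(series, 1200, 1400, count=3)` returns OK under these assumptions, because fewer than three datapoints cross either threshold.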

Examples

  • Check "metricsReceived <= 1200" in the last 10 minutes:

    $ check_graphite -U http://localhost:8888/ --from=-10minutes -t metricsReceived -W 1200
    GRAPHITE WARNING : metricsReceived is over 1200.0 at least once | metricsReceived=1403.0;1200.0;;;
    metricsReceived=None/None/1403.0/1387.0/615.0/618.0/615.0/621.0/None/None
  • Check that "metricsReceived <= 1200" held for at least 70% of yesterday's values:

    $ check_graphite -U http://localhost:8888/ -t metricsReceived -W 1200 -C 1400 --percentile=70 --from=yesterday --until=today
    GRAPHITE WARNING : metricsReceived is over 1200.0 at least 30% of points | nPercentile(metricsReceived,70)=1387.0;1200.0;1400.0;;
  • Check that "metricsReceived <= 1200" and "committedPoints <= 1200" held for at least 70% of values in the last 10 minutes:

    $ check_graphite -U http://localhost:8888/ --from=-10minutes -t metricsReceived -t committedPoints -W 1200 -C 1400 --percentile=70
    GRAPHITE WARNING : metricsReceived is over 1200.0 at least 30% of points | nPercentile(metricsReceived,70)=1387.0;1200.0;1400.0;;
    OK : committedPoints is under 1200.0 | nPercentile(committedPoints,70)=639.0;1200.0;1400.0;;
  • Check that "metricsReceived" and "committedPoints" were not over 1200 more than 2 times in the last 10 minutes:

    $ check_graphite -U http://localhost:8888/ --from=-10minutes -t "aliasByMetric(carbon.agents.*.{metricsReceived,committedPoints})" -W 1200 -C 1400 --count=3
    GRAPHITE OK : committedPoints is under 1200.0 | committedPoints_3=636.0;1200.0;1400.0;;
    committedPoints=None/None/0.0/2692.0/599.0/636.0/633.0/639.0/None/None
    OK : metricsReceived is under 1200.0
    metricsReceived=None/None/1403.0/1387.0/615.0/618.0/615.0/621.0/None/None | metricsReceived_3=621.0;1200.0;1400.0;;
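
The text after the `|` in the outputs above is Nagios performance data, in the form `label=value;warn;crit;min;max` with unset fields left empty. The following sketch shows one way such a string could be built. The `perfdata` helper is hypothetical and not taken from the plugin; it assumes min and max are never set.

```python
from typing import Optional


def perfdata(label: str, value: float,
             warn: Optional[float] = None,
             crit: Optional[float] = None) -> str:
    """Format one Nagios performance data item (hypothetical helper)."""
    # The Nagios guidelines require single quotes when the label has spaces.
    if " " in label:
        label = f"'{label}'"
    # Field order is value;warn;crit;min;max; min and max are left empty here.
    fields = [value, warn, crit, None, None]
    return f"{label}=" + ";".join(
        "" if field is None else str(float(field)) for field in fields
    )


print(perfdata("metricsReceived", 1403.0, 1200.0))
```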

Alternatives

Comparison with Disqus's version

This script is based on a little script by Disqus, ported to Python 3 by Debian, and aims to be a drop-in replacement for it. It does not, however, support the --confidence, --beyond, or --compare flags.

The main differences are that it complies with the Nagios plugin guidelines for performance data and that it returns UNKNOWN (rather than CRITICAL) on a Graphite server error or missing values.
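
To illustrate the UNKNOWN behaviour, here is a minimal sketch of querying the Graphite render API and mapping failures to Nagios exit code 3. It is an assumption about how this could be done, not the plugin's implementation: `render_url`, `fetch_status`, the `opener` parameter and the 10-second timeout are all hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes


def render_url(base_url, targets, from_, until="now"):
    """Build a Graphite render API URL that asks for JSON output."""
    query = [("target", target) for target in targets]
    query += [("from", from_), ("until", until), ("format", "json")]
    return base_url.rstrip("/") + "/render?" + urlencode(query)


def fetch_status(url, opener=urlopen, empty_ok=False):
    """Return (status, series); UNKNOWN on server error or missing data."""
    try:
        with opener(url, timeout=10) as response:
            series = json.load(response)
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; bad JSON is a ValueError.
        return UNKNOWN, []
    if not series and not empty_ok:
        # No series came back: report UNKNOWN rather than CRITICAL.
        return UNKNOWN, []
    return OK, series
```

A caller would then evaluate thresholds only when the returned status is OK, and otherwise exit with the UNKNOWN code directly.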

Other implementations

Other check_graphite implementations include: