fetch_elastic

A Nagios/Icinga plugin to alert on search hits in Elasticsearch

fetch_elastic

This is a Nagios-compatible module which will perform a search against Elasticsearch and alert if the number of hits are above the specified values. It takes a json payload compatible with Elastic's REST API.

Usage of ./fetch_elastic: [options] [elastic hosts]
Host format: http://host1:9200
  -I string
        CloudID
  -a string
        API Token
  -c int
        Critical number of hits
  -ca string
        CA Certificate file
  -cf string
        Counter file for persistent errors
  -e    Run as event handler to remove counter file
  -i string
        Elasticsearch Index (default "index")
  -o string
        Custom phrasing for output (default "hits for elastic search.")
  -p string
        Password
  -q string
        JSON file holding the query
  -s int
        Current status according to Icinga. Remove counter file if 'event' and 'OK'
  -u string
        Username
  -w int
        Warning number of hits

The command itself is straightforward. Writing the json config for your searches will require an understanding of Elastic beyond just Kibana's query language. I recommend reading their documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html

Here is an example. This query will look for hosts starting with "database" in the name, and check for errors that include postgresql's default port.

{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase_prefix": {
            "host.name": {
              "query": "database"
            }
          }
        },
        {
          "query_string": {
            "default_field": "log",
            "query": "ERROR AND 5432"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-10m",
              "lte": "now"
            }
          }
        }
      ]
    }
  }
}

match_phrase_prefix selects the beginning of the host name. query_string uses a Lucene type search string. range in this case is set to look at the last 10 minutes, and I would accordingly set Nagios or Icinga to check every 10 minutes. For size I've put 1, because we don't actually need any of this data; we're going off the total hits value, not counting the number of returned results. By default, total hits will cap at 10,000. You can adjust this if needed, but it will be more taxing on your cluster and will slow the check down.

In this case, I might want to watch for persisting errors but warn right away. Here's an example of how this would run warning the first time and going critical if it sees 5.

/usr/lib/nagios/plugins/fetch_elastic -i my_index -q /opt/elastic_queries/example.json -w 1 -c 5 http://myelasticserver:9200

Counter vs gauge

This check can continue to count across checks and increment from the last check. It stores the count in a file and adds to it on the next run. Removing the counter file will reset. An example of this:

/usr/lib/nagios/plugins/fetch_elastic -i my_index -q /opt/elastic_queries/example.json -w 1 -c 5
-cf /var/run/icinga2/cmd/mycounter.bin http://myelasticserver:9200

This command can also be used as an event handler when the -e flag is specified. In this case, if you process check result to OK status, the plugin will remove the counter file for you. This will save you the trouble of logging into the montioring server to remove it by hand. Example usage:

/usr/lib/nagios/plugins/fetch_elastic -e -s 0 -cf /var/run/icinga2/cmd/mycounter.bin