check-journald

Icinga/Nagios check script for systemd journal entries

check_journald.sh

Nagios / Icinga plugin to count error-level log entries in systemd journald over a configurable time window.

Description

check_journald.sh queries journalctl for log entries at or above a configurable severity level within a configurable lookback window and alerts when the entry count reaches the warning or critical threshold.

It evaluates:

Error count — number of log lines at or above the configured priority level
Unit scope — optionally restricted to one or more specific systemd units
Exclusion filter — optional regex to suppress known-noisy patterns
Context lines — recent matching log lines are included in the plugin output for fast triage
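
The counting, exclusion, and context steps listed above can be sketched as a filter over journal lines read from standard input. This is a minimal illustration, not the plugin's actual code: `summarize_entries` is a hypothetical helper name, and it assumes one log entry per line.

```shell
# Sketch only: summarize_entries is a hypothetical helper.
# Usage: <journal lines on stdin> | summarize_entries EXCLUDE_REGEX SHOW_LINES
summarize_entries() {
    local exclude="$1" show_lines="$2" lines count
    if [ -n "$exclude" ]; then
        # Drop lines matching the extended regex; "|| true" covers the
        # case where every line was excluded (grep exits 1).
        lines=$(grep -Ev -- "$exclude" || true)
    else
        lines=$(cat)
    fi
    # grep -c . counts non-empty lines, so empty input yields 0.
    count=$(printf '%s\n' "$lines" | grep -c . || true)
    echo "$count"
    # Append the most recent lines, indented, for triage.
    if [ "$show_lines" -gt 0 ] && [ -n "$lines" ]; then
        printf '%s\n' "$lines" | tail -n "$show_lines" | sed 's/^/  /'
    fi
}
```

The first output line is the count; any following lines are the context lines. Journal output could be piped in, for example `journalctl -p err --since "1h ago" --no-pager -q | summarize_entries 'Timeout' 3`.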

Requirements

Linux system with systemd / journald
journalctl available in PATH
The monitoring user must have read access to the journal:
```
usermod -aG systemd-journal nagios
```
bash, grep
Nagios / Icinga (or compatible monitoring system)
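
A quick pre-flight check for the journal access requirement might look like the sketch below. `in_group` is a hypothetical helper, and `nagios` is simply the account name used in this README; substitute the user your monitoring agent runs as.

```shell
# Hypothetical pre-flight helper: succeeds if USER belongs to GROUP.
in_group() {
    local user="$1" group="$2"
    # id -nG prints the user's group names separated by spaces.
    id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$group"
}

if in_group nagios systemd-journal; then
    echo "nagios is a member of systemd-journal"
else
    echo "nagios is not in systemd-journal (or the user does not exist)"
fi
```

Group changes only apply to newly started processes, so the monitoring agent has to be restarted after `usermod` before the check can read the journal.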

Installation

Copy the script to your plugin directory:

```
/usr/lib/nagios/plugins/check_journald.sh
```

Set executable permissions:

```
chmod 755 /usr/lib/nagios/plugins/check_journald.sh
```

Grant journal read access to the monitoring user:
```
usermod -aG systemd-journal nagios
```

Usage

```
check_journald.sh [--since <window>] [--unit <unit>] [--priority <level>]
                  [--warn <count>] [--crit <count>] [--show-lines <n>]
                  [--exclude <regex>]
```

Parameters

| Parameter | Description |
|---|---|
| `--since` | Lookback window (default: `1h`). A relative time span such as `30m`, `2h`, `1d`; it is passed to `journalctl --since` with `ago` appended |
| `--unit` | Filter to a specific systemd unit. Can be repeated for multiple units |
| `--priority` | Minimum log priority to count (default: `err`). Entries at this level and all more severe levels are counted |
| `--warn` | Warning threshold for matching entry count (default: `1`) |
| `--crit` | Critical threshold for matching entry count (default: `10`) |
| `--show-lines` | Number of recent matching lines to append to the output for triage (default: `3`, set to `0` to disable) |
| `--exclude` | Extended regex (`grep -E`); matching lines are excluded from the count |
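
Option handling for these parameters could be sketched as follows. The variable names and the comma-joined unit list are assumptions made for illustration; only the option names and defaults come from the table above.

```shell
# Sketch of option handling. Variable names are assumptions; defaults are
# the documented ones. Repeated --unit values are joined with commas.
parse_args() {
    SINCE=1h PRIORITY=err WARN=1 CRIT=10 SHOW_LINES=3 EXCLUDE="" UNITS=""
    while [ $# -gt 0 ]; do
        # Every option takes a value, so at least two words must remain.
        if [ $# -lt 2 ]; then
            echo "[UNKNOWN]: missing value for $1"; return 3
        fi
        case "$1" in
            --since)      SINCE=$2 ;;
            --unit)       UNITS=${UNITS:+$UNITS,}$2 ;;
            --priority)   PRIORITY=$2 ;;
            --warn)       WARN=$2 ;;
            --crit)       CRIT=$2 ;;
            --show-lines) SHOW_LINES=$2 ;;
            --exclude)    EXCLUDE=$2 ;;
            *) echo "[UNKNOWN]: unknown option: $1"; return 3 ;;
        esac
        shift 2
    done
}
```

Returning 3 on a bad option maps usage errors to the UNKNOWN state rather than to a false OK.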

Default Values

| Parameter | Default |
|---|---|
| since | `1h` |
| priority | `err` |
| warn | `1` |
| crit | `10` |
| show-lines | `3` |

Priority Levels

Journald uses syslog-compatible priority levels. The --priority flag sets the minimum severity to count — all entries at that level and above (more severe) are included.

| Level | Name | Typical use |
|---|---|---|
| 0 | `emerg` | System is unusable |
| 1 | `alert` | Immediate action required |
| 2 | `crit` | Critical condition |
| 3 | `err` | Error condition (default) |
| 4 | `warning` | Warning condition |
| 5 | `notice` | Normal but significant |
| 6 | `info` | Informational |
| 7 | `debug` | Debug messages |
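
Because more severe levels have lower numbers, "at or above" a priority means a numeric level less than or equal to the configured one. The sketch below makes that comparison explicit; `priority_to_level` and `is_counted` are hypothetical helpers for illustration, since journalctl's `-p` option already applies this rule itself.

```shell
# Map a syslog priority name (or number) to its numeric level.
priority_to_level() {
    case "$1" in
        emerg|0)   echo 0 ;;
        alert|1)   echo 1 ;;
        crit|2)    echo 2 ;;
        err|3)     echo 3 ;;
        warning|4) echo 4 ;;
        notice|5)  echo 5 ;;
        info|6)    echo 6 ;;
        debug|7)   echo 7 ;;
        *) return 1 ;;
    esac
}

# is_counted ENTRY_PRIORITY MIN_PRIORITY
# Succeeds when the entry is at least as severe as the minimum.
is_counted() {
    [ "$(priority_to_level "$1")" -le "$(priority_to_level "$2")" ]
}
```

With `--priority err`, `is_counted crit err` succeeds while `is_counted warning err` fails.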

Threshold Semantics

| Metric | Behavior |
|---|---|
| Log entry count | higher is worse |

Example with --warn 1 --crit 10:

0 entries → OK
1–9 entries → WARNING
≥ 10 entries → CRITICAL
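
The threshold rule in this example can be written as a small comparison, sketched below. `classify_count` is a hypothetical name; the logic assumes both thresholds are inclusive, as the example ranges indicate.

```shell
# classify_count COUNT WARN CRIT
# Prints the state name and returns the matching plugin exit code.
classify_count() {
    if [ "$1" -ge "$3" ]; then
        echo CRITICAL; return 2
    elif [ "$1" -ge "$2" ]; then
        echo WARNING; return 1
    else
        echo OK; return 0
    fi
}
```

The critical threshold is tested first so that a count meeting both thresholds reports CRITICAL.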

Example Usage

```
# System-wide, err+ in last 1h (defaults)
check_journald.sh

# Shorter window, lower threshold
check_journald.sh --since 30m --warn 1 --crit 5

# Monitor a specific unit
check_journald.sh --unit sshd --priority err --warn 1 --crit 5

# Monitor multiple units
check_journald.sh --unit nginx.service --unit php-fpm.service --since 2h

# Catch warnings too, with tighter thresholds
check_journald.sh --priority warning --warn 10 --crit 50

# Exclude known-noisy patterns
check_journald.sh --unit postfix.service --exclude "Connection reset by peer|Timeout"

# No context lines in output (cleaner for some dashboards)
check_journald.sh --show-lines 0 --warn 5 --crit 20
```

Example Output

```
[OK]: 0 err+ log entries in last 1h [system-wide] | log_errors=0;1;10;0;

[WARNING]: 3 err+ log entries in last 1h [system-wide] | log_errors=3;1;10;0;
  2025-10-01T08:12:44+0200 myhost kernel: EXT4-fs error (device sdb1)
  2025-10-01T08:13:01+0200 myhost sshd[2341]: error: Could not load host key
  2025-10-01T08:14:22+0200 myhost postfix[9812]: error: connect to smtp.example.com

[CRITICAL]: 14 err+ log entries in last 1h [unit=nginx.service] | log_errors=14;1;10;0;
  2025-10-01T08:19:11+0200 myhost nginx[441]: [error] connect() failed (111: Connection refused)
  2025-10-01T08:19:14+0200 myhost nginx[441]: [error] connect() failed (111: Connection refused)
  2025-10-01T08:19:17+0200 myhost nginx[441]: [error] no live upstreams while connecting to upstream

[OK]: 0 err+ log entries in last 2h [units=nginx.service,php-fpm.service] (exclude: Timeout) | log_errors=0;1;10;0;
```
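
The status lines above follow the usual plugin layout: human-readable text, a `|` separator, then performance data in the form `label=value;warn;crit;min;max`. A sketch of how such a line could be assembled is shown below; `format_status` is a hypothetical helper, and the optional `(exclude: ...)` suffix is left out for brevity.

```shell
# format_status STATE COUNT PRIORITY SINCE SCOPE WARN CRIT
# Perfdata fields: value;warn;crit;min;max (max is left empty).
format_status() {
    printf '[%s]: %s %s+ log entries in last %s [%s] | log_errors=%s;%s;%s;0;\n' \
        "$1" "$2" "$3" "$4" "$5" "$2" "$6" "$7"
}

format_status WARNING 3 err 1h system-wide 1 10
```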

Return Codes

| Code | State |
|---|---|
| 0 | OK |
| 1 | WARNING |
| 2 | CRITICAL |
| 3 | UNKNOWN |
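
UNKNOWN is conventionally used when the check itself cannot run. As an illustration only (whether the plugin handles this case in exactly this way is an assumption), a dependency guard could look like this, with `require_cmd` as a hypothetical helper:

```shell
# Return 3 (UNKNOWN) with a status line when a required command is missing.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "[UNKNOWN]: required command not found: $1"
        return 3
    fi
}

# In a plugin this would typically be used as:
#   require_cmd journalctl || exit $?
```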

NRPE Integration

Example /etc/nagios/nrpe.cfg:

```
# System-wide error check
command[check_journald]=/usr/lib/nagios/plugins/check_journald.sh --since 1h --warn 1 --crit 10

# Per-unit check
command[check_journald_nginx]=/usr/lib/nagios/plugins/check_journald.sh --unit nginx.service --warn 1 --crit 5
```

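Or with dynamic arguments (NRPE must be built with command-argument support and have `dont_blame_nrpe=1` set in nrpe.cfg):

```
command[check_journald]=/usr/lib/nagios/plugins/check_journald.sh $ARG1$
```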

(Restart NRPE after changes)

Notes

The --since value is passed to journalctl as --since "<value> ago", so it must be a relative time span in systemd's time syntax (see systemd.time(7)), for example 30m, 1h, 2h, 1d
The monitoring user needs to be a member of the systemd-journal group to read the journal without root. Add with usermod -aG systemd-journal nagios
--warn 1 (default) means a single error in the window triggers a WARNING. Raise this threshold for noisy services where some errors are expected
--exclude is applied after journalctl filtering — it does not affect what journald returns, only what gets counted
The --show-lines context is appended as extra lines after the main status line, compatible with Icinga2's $output$ and $long_output$ variables
When multiple --unit flags are given, journalctl filters by any of the specified units (OR logic)
Pair the check interval with your --since window to avoid counting the same entries twice (e.g. 1h check interval with --since 1h)
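
To make the notes on `--since` and multiple `--unit` flags concrete, the sketch below assembles a journalctl command line. `build_journalctl_cmd` and the `DRY_RUN` switch are hypothetical, and the choice of `-o short-iso` is an assumption based on the timestamp format in the example output; the journalctl options themselves (`--since`, `-p`, `-u`, `-o`, `-q`, `--no-pager`) are standard.

```shell
# build_journalctl_cmd SINCE PRIORITY UNITS
# UNITS is a comma-separated list (may be empty); each entry becomes its
# own -u flag, which journalctl combines with OR logic.
build_journalctl_cmd() {
    local since="$1" priority="$2" units="$3" unit old_ifs
    set -- journalctl --no-pager -q -o short-iso \
        --since "$since ago" -p "$priority"
    old_ifs=$IFS; IFS=,
    for unit in $units; do
        set -- "$@" -u "$unit"
    done
    IFS=$old_ifs
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$@"      # print one argument per line for inspection
    else
        "$@"                    # run the query
    fi
}
```

Running `DRY_RUN=1 build_journalctl_cmd 2h err nginx.service,php-fpm.service` prints the arguments without touching the journal, which is handy for checking that `2h ago` stays a single argument.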

License

MIT
