check_spectrum_scale

This Python script checks various aspects of an IBM Spectrum Scale (a.k.a. GPFS) cluster. It verifies the node state, filesystem mount state, capacity and inode utilization, and physical disk state of IBM Spectrum Scale systems.

Note

This script is still under development, and not all functionality is implemented yet. Please report bugs to Ph.Posovszky@gmail.com or in the issue tracker on GitHub.

Open Source Release

https://github.com/theGidy/check_spectrum_scale

Permissions for Nagios/Icinga

Allow the icinga user to run the required GPFS commands without a password by adding these entries with visudo:

icinga  ALL=(ALL) NOPASSWD: /usr/lpp/mmfs/bin/mmgetstate
icinga  ALL=(ALL) NOPASSWD: /usr/lpp/mmfs/bin/mmrepquota
icinga  ALL=(ALL) NOPASSWD: /usr/lpp/mmfs/bin/mmlsquota
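For illustration, a plugin running as the icinga user could call one of the commands listed above through sudo roughly as follows. This is only a sketch: the helper names `build_sudo_command` and `run_gpfs_command` are hypothetical, and how check_spectrum_scale.py actually invokes these commands is not shown in this README. The `-n` option makes sudo fail instead of prompting when the NOPASSWD rule is missing.

```python
import subprocess

# Directory taken from the sudoers entries above.
GPFS_BIN = "/usr/lpp/mmfs/bin"


def build_sudo_command(tool, *args):
    """Build the argv for running a GPFS tool through non-interactive sudo."""
    return ["sudo", "-n", f"{GPFS_BIN}/{tool}", *args]


def run_gpfs_command(tool, *args, timeout=30):
    """Run the tool and return its stdout.

    Raises subprocess.CalledProcessError on a non-zero exit code and
    subprocess.TimeoutExpired if the command takes longer than `timeout`.
    """
    result = subprocess.run(
        build_sudo_command(tool, *args),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout
```

Calling `run_gpfs_command("mmgetstate")` on a cluster node would then fail fast with an error if the sudoers entry has not been added.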

NRPE.cfg Example

command[check_quota_user]=/usr/lib/nagios/plugins/check_spectrum_scale.py quota -w 95 -c 97 -d Processing_1 -t u -L
command[check_fileset_linked]=/usr/lib/nagios/plugins/check_spectrum_scale.py filesets -d Processing_1 -l  -L -w 0 -c 2
command[check_fileset_inode]=/usr/lib/nagios/plugins/check_spectrum_scale.py filesets -d Processing_1 -i  -L -w 90 -c 96
command[check_status_quorum]=/usr/lib/nagios/plugins/check_spectrum_scale.py status -q
command[check_status_nodes]=/usr/lib/nagios/plugins/check_spectrum_scale.py status -n -w 2 -c 1
command[check_status_node]=/usr/lib/nagios/plugins/check_spectrum_scale.py status -s

Example

Status

Status of the GPFS system

This check returns warning/critical if fewer than the warning/critical number of nodes are online.

./check_spectrum_scale.py status -n -w 2 -c 1
OK - 3 Nodes are up.|nodesUp=3;5;3;; totalNodes=3 nodesDown=0
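The threshold rule described above can be sketched as a small function. This is an illustration, not the script's actual code: the function name `node_status` is hypothetical, and the strict "fewer than" comparison is an assumption based on the description.

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def node_status(nodes_up, warning, critical):
    """Map the number of online nodes to a plugin exit code.

    Assumption: the check fires when FEWER than the threshold are online.
    """
    if nodes_up < critical:
        return CRITICAL
    if nodes_up < warning:
        return WARNING
    return OK
```

With `-w 2 -c 1` and three nodes online, `node_status(3, 2, 1)` returns `OK`, which matches the example output above.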

Check Quorum nodes

This check returns critical if fewer than n/2+1 quorum nodes are online.

./check_spectrum_scale.py status -q
OK - (2/2) nodes are online!|quorumUp=2;2;2;;
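As a worked example of the n/2+1 rule, the following sketch computes the required number of quorum nodes. The function names are hypothetical, and rounding n/2 down (integer division) is an assumption; the script itself may compute this differently.

```python
def quorum_required(quorum_nodes):
    """Minimum number of online quorum nodes: n/2 + 1, with n/2 rounded down."""
    return quorum_nodes // 2 + 1


def quorum_reached(quorum_up, quorum_nodes):
    """True if enough quorum nodes are online to keep quorum."""
    return quorum_up >= quorum_required(quorum_nodes)
```

For the two quorum nodes in the example above, both must be online (2 // 2 + 1 = 2); with three quorum nodes, two are enough.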

Check node GPFS status

This check returns critical if the node is in any state other than "active".

./check_spectrum_scale.py status -s
OK - Node gpfs-node1.test.de is in state:active|nodesUp=3;5;3;; totalNodes=3 nodesDown=0 quorumUp=2;2;;;

Filesystem

FileSet

Check link status

Checks the link status of all filesets. If more than 4/6 filesets are unlinked, the result is warning/critical.

./check_spectrum_scale.py filesets -d Processing_1 -l  -L -w 4 -c 6
OK - 9/9 filesets are linked|Linked=14;4;6;0;14 Unlinked=0;4;6;0;14 Deleted=0;4;6;0;14 
Linked FileSets: root, largeHome, set1, set2, set3, set4, set5, set6, temp
Unlinked FileSets: 
Deleted FileSets: 

Check link status of a specific fileset

Checks the link status of the largeHome fileset only, with warning/critical thresholds of 1/1 unlinked filesets.

./check_spectrum_scale.py filesets -d Processing_1 -f largeHome -l  -L -w 1 -c 1
OK - 1/1 filesets are linked|Linked=1;4;6;0;1 Unlinked=0;4;6;0;1 Deleted=0;4;6;0;1
Linked FileSets: largeHome
Unlinked FileSets: 
Deleted FileSets: 

Check inode utilization

Checks the inode utilization of all filesets. If more than 90/96 percent of the inodes are in use, the result is warning/critical.

./check_spectrum_scale.py filesets -d Processing_1 -i  -L -w 90 -c 96
OK - Inode utilization is normal|root=64266752;6476697.6;2590679.04;;64766976 blockSiz:0KB;;;;largeHome=10014720;2000025.6;800010.24;;20000256 blockSiz:0KB;;;;Geo_Data=19499520;2000025.6;800010.24;;20000256 blockSiz:0KB;;;;Cal_Sentinel=19499520;2000025.6;800010.24;;20000256 blockSiz:0KB;;;;Pol-InSAR_InfoRetrieval=19899904;2000025.6;800010.24;;20000256 blockSiz:0KB;;;;TSM_TDM_SARData=24161792;3000012.8;1200005.12;;30000128 blockSiz:0KB;;;;TDL_Workspace=1939968;209715.2;83886.08;;2097152 blockSiz:0KB;;;;TAXI=1996800;209715.2;83886.08;;2097152 blockSiz:0KB;;;;Software_Linux=1971712;209715.2;83886.08;;2097152 blockSiz:0KB;;;;Processing_Server_Access=0;0.0;0.0;;0 blockSiz:0KB;;;;TDM_SEC_Cal=19693056;2000025.6;800010.24;;20000256 blockSiz:0KB;;;;TDM_SEC=10277376;2000025.6;800010.24;;20000256 blockSiz:0KB;;;;HR_Projekte=630272;100044.8;40017.92;;1000448 blockSiz:0KB;;;;temp=1789952;209715.2;83886.08;;2097152 blockSiz:0KB;;;;
Critical FileSets: 
Warning FileSets: 

Pools

Check all pools

This check tests whether any pool exceeds 95/97 percent utilization of its data/metadata space on the device Processing_1, with long output.

./check_spectrum_scale.py pools -d Processing_1 -w 95 -c 97 -L
Critical - Data Pool: 1 Meta Pool: 0|Data_Pool_2=7261755392;9367607705.6;3747043082.24;;62419992576 Data_Pool_1=2315413504;9367607705.6;3747043082.24;;93676077056 Meta_system=3773308928;0.0;0.0;;3901249536
Critical Data Pool: Pool_1
Warning Data Pool: 
Critical Meta Pool: 
Warning Meta Pool: 
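The part of the first output line after the `|` is Nagios performance data in the form `label=value;warn;crit;min;max`. If you want to post-process it yourself, a minimal parser could look like the sketch below. The function name `parse_perfdata` is hypothetical, and the sketch assumes purely numeric fields without unit suffixes, as in the pool output above.

```python
def parse_perfdata(plugin_output):
    """Parse the performance data on the first line of plugin output.

    Returns {label: {"value", "warn", "crit", "min", "max"}}, with None
    for fields that are empty or missing.
    """
    lines = plugin_output.splitlines()
    first_line = lines[0] if lines else ""
    # Everything after the first "|" is performance data.
    _, _, perfdata = first_line.partition("|")

    keys = ("value", "warn", "crit", "min", "max")
    metrics = {}
    for item in perfdata.split():
        label, _, fields = item.partition("=")
        # Pad to five fields so short entries still map onto all keys.
        parts = (fields.split(";") + [""] * 5)[:5]
        metrics[label] = {k: (float(p) if p else None) for k, p in zip(keys, parts)}
    return metrics


sample = (
    "Critical - Data Pool: 1 Meta Pool: 0|"
    "Data_Pool_1=2315413504;9367607705.6;3747043082.24;;93676077056 "
    "Meta_system=3773308928;0.0;0.0;;3901249536"
)
pools = parse_perfdata(sample)
```

Here `pools["Meta_system"]["max"]` holds the last field of the `Meta_system` entry, and the empty `min` field is returned as `None`.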

Check one or more specific pools

This check tests whether the specified pools exceed 95/97 percent utilization of their data/metadata space on the device Processing_1, with long output.

./check_spectrum_scale.py pools -d Processing_1 -w 95 -c 97 -p Pool_1,Pool_2 -L
Critical - Data Pool: 1 Meta Pool: 0|Data_Pool_2=7261755392;9367607705.6;3747043082.24;;62419992576 Data_Pool_1=2315413504;9367607705.6;3747043082.24;;93676077056 Meta_system=3773308928;0.0;0.0;;3901249536
Critical Data Pool: Pool_1
Warning Data Pool: 
Critical Meta Pool: 
Warning Meta Pool: 

Quota

Usage

This check tests whether any quota exceeds 95/97 percent utilization on the filesystem Processing_1.

./check_spectrum_scale.py quota -d Processing_1 -w 95 -c 97
WARNING - Block: 1 File: 0|blockViolation=1 blockCritical=0 fileViolation=0 fileCritical=0
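The percentage thresholds can be read as in the following sketch. It is an illustration under assumptions, not the script's implementation: the function name `classify_usage` is hypothetical, and this README does not state which quota limit the script compares against or how it treats entries without a limit.

```python
def classify_usage(used, limit, warning_pct, critical_pct):
    """Classify usage as OK, WARNING or CRITICAL by percent of the limit."""
    if limit <= 0:
        # Assumption: no limit configured means nothing can be violated.
        return "OK"
    percent = 100.0 * used / limit
    if percent > critical_pct:
        return "CRITICAL"
    if percent > warning_pct:
        return "WARNING"
    return "OK"
```

With `-w 95 -c 97`, a quota at 96 percent of its limit would count as a warning and one at 98 percent as critical.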

Usage only for specific fileset

This check tests whether any quota exceeds 95/97 percent utilization on the filesystem Processing_1 for the fileset largeHome.

./check_spectrum_scale.py quota -d Processing_1 -w 95 -c 97 -fs largeHome
WARNING - Block: 1 File: 0|blockViolation=1 blockCritical=0 fileViolation=0 fileCritical=0

Usage only for specific user

This check tests whether any quota exceeds 95/97 percent utilization for the user "user1".

./check_spectrum_scale.py quota -w 95 -c 97 -d Processing_1 -n user1 -t u
WARNING - Block: 1 File: 0|blockViolation=1 blockCritical=0 fileViolation=0 fileCritical=0

Usage only for specific group

This check tests whether any quota exceeds 95/97 percent utilization for the group "admins".

./check_spectrum_scale.py quota -w 95 -c 97 -d Processing_1 -n admins -t g
OK - No Violations detected|blockViolation=0 blockCritical=0 fileViolation=0 fileCritical=0