Sean Carolan
2009-01-29 20:13:21 UTC
We have two Nagios servers, each monitoring a different network.
The production network has over 1200 service checks and the average
host check time is around 4 seconds:
Host Check Execution Time: 4.03 / 4.15 / 4.039 sec
The UAT network has only 120 checks. For some reason, starting
yesterday we have seen a huge spike in the average Host Check
Execution Time:
Host Check Execution Time: 4.03 / 24.09 / 16.236 sec
This is causing all sorts of false alarms. I logged onto the server
and ran some checks from the command line and, indeed, the check_ping
plugin runs really slowly. The odd thing is that a standard
"ping hostname" is nice and fast. We have not changed or updated
anything on this Nagios server, nor are we seeing any kind of
elevated CPU usage.
Has anyone else experienced anything like this? I'm not sure where to
look to start troubleshooting the problem.
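
For anyone who wants to reproduce the comparison, here is a rough sketch of how the two commands could be timed side by side. The plugin path, the warning/critical thresholds, and the packet count are assumptions for illustration, not values taken from our configuration; adjust them to match your own install.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=60.0):
    """Run argv once and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    proc = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return time.perf_counter() - start, proc.returncode


def main(host):
    # ASSUMPTION: common plugin location for a source install; yours may differ.
    plugin = "/usr/local/nagios/libexec/check_ping"
    commands = {
        "ping": ["ping", "-c", "5", host],
        # ASSUMPTION: illustrative thresholds (rta in ms, packet loss in %).
        "check_ping": [plugin, "-H", host, "-w", "100.0,20%",
                       "-c", "500.0,60%", "-p", "5"],
    }
    for name, argv in commands.items():
        try:
            elapsed, code = time_command(argv)
            print(f"{name}: {elapsed:.2f} sec (exit code {code})")
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Missing binary or a run longer than the timeout.
            print(f"{name}: could not be timed ({exc})")


# Only runs when a hostname is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it with a hostname as the only argument, ideally as the same user Nagios runs as, so both commands see the same environment. A large gap between the two timings would at least confirm that the delay is in the plugin invocation rather than in the network path.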