[Nagios-users] change service check timeout in Nagios
Michael W. Lucas
2009-06-29 19:46:53 UTC
Hi,

It appears that numerous people have had this issue, but most of them
have it because they're doing something else wrong or have other
issues. I have a check that legitimately takes longer than 10 seconds
to complete, and none of the documentation or archives appear
relevant.

I run check_vrrp against hardware on the other side of the world.
(I've suggested that I install a monitor closer to the remote
equipment, but that's just not feasible right now.) Running
check_vrrp by hand gets me an answer in 14-17 seconds.

# /usr/bin/time -h /usr/local/libexec/nagios/check_snmp_vrrp.pl -H hostname -C community -s backup -t 20
8 vrid backup :OK
14.58s real 0.43s user 0.17s sys
#

When I use this plugin in Nagios, however, it errors out with:

CRITICAL - Plugin timed out after 10 seconds

Yes, I know it times out in 10 seconds. The plugin takes 14-17
seconds to run. The timeout appears to be within Nagios itself, as
the script doesn't time itself out.

Surely there's a built-in way to tell Nagios that either a particular
check can take longer, or increase the 10-second timeout within Nagios
itself? The service_check_timeout value kills runaway processes,
which this isn't. I know increasing the timeout would impact my
performance for my other apps, but I have a separate Nagios instance
for these devices.

If I can't change the timeout I'll write an external script that runs
the program and forwards the results to Nagios as a passive check, but
that seems unnecessarily cumbersome.
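
A minimal sketch of that passive-check fallback, in case it is ever needed. It assumes external commands are enabled (check_external_commands=1) and that the service accepts passive checks. The command-file path, the host name remote-router, the service description VRRP, and both function names are hypothetical placeholders, not values from this thread.

```shell
#!/bin/sh
# Sketch only: run a slow plugin outside Nagios and submit its result as a
# passive service check. CMD_FILE is a hypothetical path; use the value of
# command_file from your own nagios.cfg.
CMD_FILE="${CMD_FILE:-/var/spool/nagios/rw/nagios.cmd}"

# Format one external command line:
# [epoch] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;plugin_output
format_passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4"
}

# Usage: submit_check HOST SERVICE plugin [plugin args...]
submit_check() {
    host=$1
    service=$2
    shift 2
    # Run the plugin, capture its output, and keep its exit status
    # (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
    output=$("$@" 2>&1) && rc=0 || rc=$?
    # Only the first line of output is forwarded.
    first_line=$(printf '%s\n' "$output" | head -n 1)
    format_passive_result "$host" "$service" "$rc" "$first_line" >> "$CMD_FILE"
}
```

Run from cron or a similar scheduler, a call might look like:
submit_check remote-router VRRP /usr/local/libexec/nagios/check_snmp_vrrp.pl -H hostname -C community -s backup -t 20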

Any suggestions? Am I missing something in the documentation?

Any help appreciated,

==ml
--
Michael W. Lucas ***@BlackHelicopters.org, ***@FreeBSD.org
http://www.BlackHelicopters.org/~mwlucas/
Latest book: Cisco Routers for the Desperate, 2nd Edition
http://www.CiscoRoutersForTheDesperate.com/
Marc Powell
2009-06-29 20:44:01 UTC
Post by Michael W. Lucas
Surely there's a built-in way to tell Nagios that either a particular
check can take longer, or increase the 10-second timeout within Nagios
itself? The service_check_timeout value kills runaway processes,
which this isn't.
But that's what you need to change. Think of it as the maximum amount
of time a plugin is allowed to run if the plugin doesn't terminate
itself within a reasonable amount of time. Misbehaving plugins are
just the most common reason to hit this timeout. It should be set
higher than the longest expected running time for any of your plugins.
Post by Michael W. Lucas
I know increasing the timeout would impact my
performance for my other apps, but I have a separate Nagios instance
for these devices.
Not necessarily. Only if your other plugins are badly behaved and
don't terminate themselves in a timely manner. Mine is typically set
to 60 seconds as I have some long running checks as well.
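
One way to sanity-check that advice from the shell. This is a sketch that assumes the main config file uses plain key=value lines, as nagios.cfg does. The function names are made up for illustration, and the file path in the usage line is an assumption.

```shell
#!/bin/sh
# Sketch only: compare service_check_timeout with the longest expected
# plugin run time. Function names are illustrative, not part of Nagios.

# Print the value of service_check_timeout from the given config file.
get_service_check_timeout() {
    awk -F= '$1 == "service_check_timeout" { print $2; exit }' "$1"
}

# Succeed if the configured timeout is greater than the given run time.
# $1 = config file, $2 = longest expected plugin run time in seconds
timeout_covers() {
    limit=$(get_service_check_timeout "$1")
    [ -n "$limit" ] && [ "$limit" -gt "$2" ]
}
```

For a check that takes up to 17 seconds, something like this would do:
timeout_covers /usr/local/etc/nagios/nagios.cfg 17 || echo "service_check_timeout is too low"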

--
Marc
Michael W. Lucas
2009-06-30 14:10:40 UTC
Post by Marc Powell
Post by Michael W. Lucas
Surely there's a built-in way to tell Nagios that either a particular
check can take longer, or increase the 10-second timeout within Nagios
itself? The service_check_timeout value kills runaway processes,
which this isn't.
But that's what you need to change. Think of it as the maximum amount
of time a plugin is allowed to run if the plugin doesn't terminate
itself within a reasonable amount of time. Misbehaving plugins are
just the most common reason to hit this timeout. It should be set
higher than the longest expected running time for any of your plugins.
Thanks for your response. The following settings have been in my
Nagios config for months now:

service_check_timeout=180
host_check_timeout=60

I would expect that a service check plugin would be allowed to run for
3 minutes. But the Web interface and log still show:

CRITICAL - Plugin timed out after 10 seconds

Obviously I'm missing something somewhere. Do any other config
settings interact with this somehow?

This is with 3.0.6 on FreeBSD 7.2 i386, BTW.

Thanks,
==ml
Michael W. Lucas
2009-06-30 14:16:07 UTC
<Lucas smacks his own forehead and says bad things about his own
upbringing>

Posted so that an answer appears in the archives.

I'm using "negate -u CRITICAL" in front of my check. negate has its
own timeout value (the -t option), which must be raised to extend the
life of a check!

Not a Nagios issue at all.
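
For the archives, a sketch of what the corrected check command might look like. The install path for negate and the 30-second value are assumptions. -t is negate's timeout option, and it has to exceed both the wrapped plugin's own -t and its real run time. The helper function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only. PLUGIN_DIR is an assumed install path; hostname and community
# are the placeholders used earlier in this thread.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/libexec/nagios}"
PLUGIN_TIMEOUT=20     # the plugin's own -t, as in the manual run
NEGATE_TIMEOUT=30     # negate's -t; assumed value, must exceed PLUGIN_TIMEOUT

# Print the full check command with negate's timeout raised above the
# wrapped plugin's, so the two values are kept consistent in one place.
negated_check_command() {
    printf '%s\n' "$PLUGIN_DIR/negate -t $NEGATE_TIMEOUT -u CRITICAL $PLUGIN_DIR/check_snmp_vrrp.pl -H hostname -C community -s backup -t $PLUGIN_TIMEOUT"
}

negated_check_command
```

The printed line is what would go into the command definition for this service, with negate's options placed before the wrapped plugin and its arguments.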

==ml