Discussion:
[Nagios-users] NRPE: Unable to read output; but works when run under strace ...
Florian Ernst
2012-10-08 18:31:21 UTC
Hello all,

given a fairly well-running monitoring setup with about 18k services I
thought I had understood the basics. However, the following leaves me
clueless, and I hope I'm merely missing something obvious here:

On an up-to-date Debian Squeeze (i386) OpenVZ guest I have established
that my monitoring user can execute a given command:

***@vserv08:/# sudo -u monitor -i /usr/lib/nagios/plugins/check_dummy 0 success; echo Exitcode: $?
OK: success
Exitcode: 0

So far, so good. Now enter NRPE, with a stripped-down config to
illustrate the point:

***@vserv08:/# grep -v -e '^$' -e '^#' /etc/nagios/nrpe.cfg
debug=1
nrpe_user=monitor
nrpe_group=monitor
allowed_hosts=127.0.0.1
command[dummy]=/usr/lib/nagios/plugins/check_dummy 0 success

***@vserv08:/# ps auxww | grep '[/]usr/sbin/nrpe'
monitor 7215 0.0 0.1 3704 892 ? Ss 15:20 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d

The process startup logged as follows:

Oct 8 15:20:22 vserv08 nrpe[7214]: Added command[dummy]=/usr/lib/nagios/plugins/check_dummy 0 success
Oct 8 15:20:22 vserv08 nrpe[7214]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Oct 8 15:20:22 vserv08 nrpe[7215]: Starting up daemon
Oct 8 15:20:22 vserv08 nrpe[7215]: Listening for connections on port 5666
Oct 8 15:20:22 vserv08 nrpe[7215]: Allowing connections from: 127.0.0.1

However, executing the dummy command won't work:

***@vserv08:/# /usr/lib/nagios/plugins/check_nrpe -H 127.0.0.1 -c dummy
NRPE: Unable to read output

This has been logged as:

Oct 8 15:21:36 vserv08 nrpe[7234]: Connection from 127.0.0.1 port 48791
Oct 8 15:21:36 vserv08 nrpe[7234]: Host address is in allowed_hosts
Oct 8 15:21:36 vserv08 nrpe[7234]: Handling the connection...
Oct 8 15:21:36 vserv08 nrpe[7234]: Host is asking for command 'dummy' to be run...
Oct 8 15:21:36 vserv08 nrpe[7234]: Running command: /usr/lib/nagios/plugins/check_dummy 0 success
Oct 8 15:21:36 vserv08 nrpe[7234]: Command completed with return code 2 and output:
Oct 8 15:21:36 vserv08 nrpe[7234]: Return Code: 2, Output: NRPE: Unable to read output
Oct 8 15:21:36 vserv08 nrpe[7234]: Connection from 127.0.0.1 closed.

This strikes me as weird: nrpe tries to execute the defined command, but
somehow no output shows up. I know of the peculiarities that might arise
once sudo joins the team or when permissions aren't set appropriately,
but this doesn't apply here.

Playing around with the dummy command (substituting a shell script,
sprinkling '| tee -a logfile' into the code, ...) revealed that the
desired text output is indeed generated but somehow gets discarded.
Perhaps the monitoring user or even the whole system is subtly broken,
but given that there are ~400 similarly set-up systems (all using the
same workflow and automation for deploying the monitoring
infrastructure) I was starting to wonder how that might have happened ...
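
To illustrate what "generated but discarded" could look like from the
daemon's side, here is a minimal Ruby sketch. It is not NRPE's source
code: run_check and PLACEHOLDER are made-up names, and the assumption is
only that a daemon which reads a plugin's stdout through a pipe
substitutes a fixed message when that read yields nothing.

```ruby
require "open3"

# Message text as it appears in the logs quoted in this thread.
PLACEHOLDER = "NRPE: Unable to read output"

# Hypothetical helper: run a command, capture its stdout through a pipe,
# and return [exit_code, first_output_line_or_placeholder].
def run_check(*argv)
  stdout, status = Open3.capture2(*argv)
  line = stdout.lines.first.to_s.chomp
  # An empty read is indistinguishable from a plugin that printed nothing.
  line = PLACEHOLDER if line.empty?
  [status.exitstatus, line]
end
```

Calling run_check("/usr/lib/nagios/plugins/check_dummy", "0", "success")
on a healthy system should return the plugin's own exit code and first
output line. If the pipe read comes back empty, the caller sees only
the placeholder, even though the plugin itself did write its text.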

However, it got weirder: if I strace the nrpe process, everything works
as desired:

***@vserv08:/# strace -f -o /root/log -p 7215

And then in another terminal:

***@vserv08:/# /usr/lib/nagios/plugins/check_nrpe -H 127.0.0.1 -c dummy
OK: success

Logged as follows:

Oct 8 15:21:57 vserv08 nrpe[7240]: Connection from 127.0.0.1 port 37275
Oct 8 15:21:57 vserv08 nrpe[7240]: Host address is in allowed_hosts
Oct 8 15:21:57 vserv08 nrpe[7240]: Handling the connection...
Oct 8 15:21:57 vserv08 nrpe[7240]: Host is asking for command 'dummy' to be run...
Oct 8 15:21:57 vserv08 nrpe[7240]: Running command: /usr/lib/nagios/plugins/check_dummy 0 success
Oct 8 15:21:57 vserv08 nrpe[7240]: Command completed with return code 0 and output: OK: success
Oct 8 15:21:57 vserv08 nrpe[7240]: Return Code: 0, Output: OK: success
Oct 8 15:21:57 vserv08 nrpe[7240]: Connection from 127.0.0.1 closed.

I found no further hints in the strace log, but this led me to assume
that there is some NRPE weirdness involved, and thus I'm writing here
instead of further digging through the system.

Any ideas?

Cheers,
Flo
Peter Kaagman
2012-10-08 21:54:21 UTC
Lol... this had me going for a while too. For me the answer was in the sudo config (visudo): it required a user on a tty. I believe the directive was "requiretty". Disable that line and you'll be fine, I think.

Peter

_______________________________________________
Nagios-users mailing list
Nagios-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Florian Ernst
2012-10-09 18:14:09 UTC
Hello Peter,

thanks for your reply.

However, as previously written, I know of the peculiarities that might
arise once sudo joins the team. In the issue at hand, sudo is used for
illustration purposes only; the issue itself doesn't even remotely
touch sudo.
Furthermore, I never found it necessary to deal with !requiretty on
Debian, though I did have to use this sudo option on RHEL. Still,
no sudo involved in this case, sorry ...

Cheers,
Flo
Peter Kaagman
2012-10-10 04:00:16 UTC
Sorry that did not help you. Guess I should have read your post more closely...

Peter
Tech Support
2012-10-10 19:10:13 UTC
Whenever I've had a problem with a plugin and was trying to figure out
what was going on, I've had 100% success with a little Perl script called
capture-plugin.pl. You can find it here:
http://www.waggy.at/nagios/capture_plugin.htm. You should check it out. Its
genius is in its simplicity.
Regards;
John


Florian Ernst
2012-10-15 09:07:56 UTC
Hello John / Tech Support,

thanks for your reply, and sorry for the delay in answering.
Post by Tech Support
Whenever I've had a problem with a plugin, and was trying to figure out
what was going on, I've had 100% success using a little PERL script called
http://www.waggy.at/nagios/capture_plugin.htm. You should check it out. It's
genius is in its simplicity.

Well, it helps with debugging plugin weirdness, but my problem lies with
NRPE: a check doesn't work when, to the best of my knowledge, it should,
and it actually works as desired when I strace the NRPE process.
At the moment I see no plugin to debug, but thanks anyway.

Cheers,
Flo
Florian Ernst
2013-01-08 13:53:03 UTC
Hello all,

following up to myself ...
Post by Florian Ernst
[...]
However, it got weirder: if I strace the nrpe process, everything works
[...]
I found no further hints in the strace log, but this led me to assume
that there is some NRPE weirdness involved, and thus I'm writing here
instead of further digging through the system.
Any ideas?

Lacking any further clue as to what peculiarity might be responsible for
this weird behavior, I decided to simply let the nrpe process run
permanently under strace, like this:

strace -f -q -e none -o /dev/null ...

The performance impact is minimal, and I finally get check results. Not
really a solution, but it'll do.

Cheers,
Flo
Eliezer Croitoru
2013-01-14 15:41:28 UTC
I would try running a basic script that logs the environment NRPE
provides into a file.
That will show what is available to the plugin and what is not.

Test whether:
there are environment variables,
STDIN/STDOUT/STDERR are available, etc.

#!/usr/bin/env ruby
# Append a snapshot of the environment NRPE gives its plugins to a log file.
File.open("/tmp/nrpetest.log", "a") do |f|
  f.sync = true  # flush every line immediately

  f.puts "ENVIRONMENT VARS\n==========="
  ENV.each { |k, v| f.puts "#{k} = #{v}" }

  f.puts "STD DATA\n=========="
  f.puts "stdout exists?: #{$stdout}"
  if $stdout
    f.puts "stdout is tty?: #{$stdout.isatty}"
    begin
      # Raises if stdout is closed, broken, or would block.
      $stdout.write_nonblock "0"
      f.puts "Wrote 0 to stdout"
    rescue => e
      f.puts "Could not write to stdout"
      f.puts e.message
      f.puts e.backtrace.inspect
    end
  end
end

Modify the above to narrow down the cause of this problem.
You can also try accessing stdout/stderr.
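
As one possible way to "access" the standard streams, the following
Ruby sketch reports what each of them is connected to. The helper name
describe_std_streams is made up for this illustration, and the sketch
is only a diagnostic aid, not a known fix for the problem in this
thread.

```ruby
# Hypothetical helper: return one line per standard stream stating its
# file descriptor number, its file type, and whether it is a terminal.
def describe_std_streams
  { "stdin" => $stdin, "stdout" => $stdout, "stderr" => $stderr }.map do |name, io|
    begin
      # ftype is e.g. "fifo" for a pipe or "characterSpecial" for a tty.
      "#{name}: fd=#{io.fileno} type=#{io.stat.ftype} tty=#{io.tty?}"
    rescue SystemCallError, IOError => e
      # A closed or invalid descriptor ends up here.
      "#{name}: not usable (#{e.class})"
    end
  end
end

# Possible use inside a test plugin (same log file as the script above):
#   File.open("/tmp/nrpetest.log", "a") { |f| f.puts describe_std_streams }
```

Comparing the logged lines from a run with and without strace attached
might show whether the plugin's stdout differs between the two cases.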

Regards,
Eliezer
Florian Ernst
2013-01-14 19:10:48 UTC
Hello Eliezer,

thanks for your reply.
Post by Eliezer Croitoru
I would try to run a basic script which logs the nrpe environment into a
file.
it will make sure what is available to you and what not.
[...]
modify the above to make sure what is the cause of this problem.
you can also try to access stdout\err

As mentioned in the original post¹ "indeed the desired text output is
generated [by the plugin] but somehow gets discarded [by NRPE]".
However, "if I strace the nrpe process, everything works as desired".

The latter is what irks me. The plugins themselves are running fine.

Cheers,
Flo


¹ http://sourceforge.net/mailarchive/message.php?msg_id=29939293