[Nagios-users] check_dns problems
Robert Nelson
2004-03-22 19:40:04 UTC
Hello,

I've used the check_dns plugin for quite a while now at another job, and it worked flawlessly. When I got to my current job, I set up Nagios and, lo and behold, the first two services I put in to showcase the system were DNS, and they are showing red nearly all the time. Whoops!

I'm using check_dns against two FreeBSD servers. check_dns is reporting an effective downtime of 99% (the few times I get a recovery notice, I also get a failure notice within 5-10 minutes). At the same time, our entire office uses these two DNS servers exclusively, as do all our clients, and no one reports any DNS problems.

This has been going on for nearly a month now. I can't explain it at all. Running check_dns from the command line rapidly (up arrow, enter, up arrow, enter, etc.) gives me *maybe* a 2 in 10 failure rate. Doing nslookups against the same servers gives possibly a 1 in 100 failure rate, although some queries take more than the 10 seconds that the plugin times out in. At the same time, constant pings between the Nagios and DNS servers work 100%. I have this problem with BOTH DNS servers.
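
The manual up-arrow/enter test could be scripted so that the failure rate and the slowest reply are measured rather than estimated. What follows is only a rough sketch, not how check_dns itself works: it sends a bare UDP A-record query using the Python standard library, and the names build_query, time_query and summarize are made up for illustration. The 10-second timeout mirrors the plugin timeout mentioned above.

```python
import socket
import struct
import time


def build_query(name, query_id=0x1234, recursion_desired=True):
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    flags = 0x0100 if recursion_desired else 0x0000  # RD bit only
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def time_query(server, name, timeout=10.0):
    """Return elapsed seconds for one UDP query, or None on timeout/error."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(packet, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes socket timeouts
            return None
        elapsed = time.monotonic() - start
    # Count the query as answered only if the reply echoes our query ID
    if reply[:2] != packet[:2]:
        return None
    return elapsed


def summarize(results):
    """Summarize a list of timings where None marks a failed query."""
    ok = [r for r in results if r is not None]
    failed = len(results) - len(ok)
    return {
        "total": len(results),
        "failed": failed,
        "failure_rate": failed / len(results) if results else 0.0,
        "slowest": max(ok) if ok else None,
    }
```

Calling summarize([time_query(server, "www.yahoo.com") for _ in range(100)]) with server set to the address of one of the DNS boxes would give a failure count and worst-case latency to compare against what Nagios reports.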

Setup is...

Nagios:
Nagios v1.2
check_dns (nagios-plugins 1.3.1) 1.8.2.3
Red Hat v8.0
Apache 2.0.4

DNS Servers:
FreeBSD 4.7-RELEASE-p3
named 8.3.3-REL

Thanks!


Rob Nelson
Network Engineer
Windchannel Communications
M: 919-538-6326
Subhendu Ghosh
2004-03-22 21:04:07 UTC
Do you have query logging enabled on the DNS servers and can see the
relative timing of the requests?

Are the check_dns checks for authoritative or recursive records?

Any problems with load on nagios or the dns systems?
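
One way to see whether a given answer is authoritative or came through recursion is to look at the flag bits in the reply header. The sketch below is only an illustration under the assumption that the raw reply bytes are already in hand (for example from a packet capture); describe_reply is a hypothetical helper name, and the bit positions are those defined in RFC 1035.

```python
import struct


def describe_reply(reply: bytes) -> dict:
    """Decode the fixed 12-byte DNS header of a raw reply packet."""
    if len(reply) < 12:
        raise ValueError("DNS header is 12 bytes; reply is too short")
    _query_id, flags, _qd, answers, _ns, _ar = struct.unpack(
        "!HHHHHH", reply[:12]
    )
    return {
        "is_response": bool(flags & 0x8000),          # QR bit
        "authoritative": bool(flags & 0x0400),        # AA bit
        "truncated": bool(flags & 0x0200),            # TC bit
        "recursion_available": bool(flags & 0x0080),  # RA bit
        "rcode": flags & 0x000F,                      # 0 = no error, 2 = SERVFAIL
        "answers": answers,                           # ANCOUNT
    }
```

A reply with the AA bit set came from the server's own zone data, while one with AA clear and answers present was resolved recursively or served from cache.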
--
-sg
Robert Nelson
2004-03-22 22:03:05 UTC
Post by Subhendu Ghosh
Do you have query logging enabled on the DNS servers and can see the
relative timing of the requests?
I don't know. I'll check on that tomorrow.
Post by Subhendu Ghosh
Are the check_dns checks for authoritative or recursive records?
Either/or. check_dns by default is www.yahoo.com, but I've tried the manual check_dns and nslookup tests I described with both types of records, with the same results every time.
Post by Subhendu Ghosh
Any problems with load on nagios or the dns systems?
Nope, neither load nor waiting on IO seems high. I've also checked to make sure the addresses do proper reverse lookups, which they do for both machines.

Rob Nelson
***@windchannel.com
Robert Nelson
2004-03-23 17:50:11 UTC
Post by Subhendu Ghosh
Do you have query logging enabled on the DNS servers and can see the
relative timing of the requests?
Do not know, I'll check on that tomorrow
I've done further research, and the general consensus around the office is that FreeBSD sucks and no one knows how to configure it except the guy who left ;) We will probably be redoing the servers soon, given a spare hour or so sometime.

In the meantime, I've set up a "test" host that's another DNS server I can rely on, to see if it has similar problems. If I didn't mention it earlier, btw, there's another DNS server on our network and it works fine - running Red Hat Linux, of course. It's not as heavily loaded as the others, though, which is why I set up the "test" host. We'll see how it works.

I'm just trying to figure out what setting in FreeBSD could cause this or needs to be tweaked for quicker response times. The servers are P3 something-or-others and are barely used after 5pm, since all our clients are 8-5 businesses.

Rob Nelson
Network Engineer
Windchannel Communications
M: 919-538-6326