[Nagios-users] Service checks for redundant hosts
Ben Prew
2013-07-30 18:05:28 UTC
Hey,

I'm looking for suggestions on implementing a service check for a group of
redundant hosts that access a shared resource.

Here's our setup:

We have N hosts that process a shared job queue (MySQL/Redis) via
delayed_job. We have several checks that are host-specific (e.g. # of workers
on that host), but we also have several checks that examine the shared job
queue (e.g. # of unprocessed jobs).
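
To make the shared-queue check concrete, here is a minimal sketch of what such a check could look like. It only shows the threshold logic and the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The names evaluate_queue and main, the thresholds, and passing the count as an argument are all hypothetical placeholders, not our actual check, which would query the queue itself.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_queue(unprocessed, warn=100, crit=500):
    """Map an unprocessed-job count to (exit_code, status_line).

    warn and crit are hypothetical thresholds. None means the count
    could not be read (e.g. the queue backend was unreachable).
    """
    if unprocessed is None:
        return UNKNOWN, "QUEUE UNKNOWN - could not read job count"
    if unprocessed >= crit:
        code = CRITICAL
    elif unprocessed >= warn:
        code = WARNING
    else:
        code = OK
    # Text after '|' is performance data: label=value;warn;crit
    line = "QUEUE %s - %d unprocessed jobs | unprocessed=%d;%d;%d" % (
        LABELS[code], unprocessed, unprocessed, warn, crit)
    return code, line


def main(argv):
    """Hypothetical entry point: the job count is the first argument.

    A real check would query the shared queue instead of taking the
    count from the command line.
    """
    try:
        count = int(argv[0])
    except (IndexError, ValueError):
        count = None
    code, line = evaluate_queue(count)
    print(line)
    return code

# When installed as a plugin, finish with: sys.exit(main(sys.argv[1:]))
```

Because the result depends only on the shared queue and not on the host running it, every host that runs this check reports the same state.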

I have several possible implementations:

============
1. Shared Job Queue check on single processing host (current setup)
Pros:
* We only get notified once when the shared queue is high

Cons:
* If the single host goes down, we lose the shared queue check

============
2. Shared Job Queue check on all processing hosts

Pros:
* If a single processing host goes down, the shared queue check still
functions

Cons:
* Multiple emails from hosts when the shared check fails

============
3. Shared Job Queue check on job queue host (i.e. the DB box)

Pros:
* If the DB goes down, you can't reach the queue anyway
* Single email on failure

Cons:
* The check requires app knowledge, which requires having the app deployed
on the job queue host

How are others implementing a check like this? Go with #2 and just bite the
bullet on multiple emails?

Thanks
