Nagios, the monitoring tool that cried wolf

rene — Wed, 11/04/2009 - 09:49

I'm finding the Nagios check_load check slightly annoying and noisy during short bursts of high load on servers, such as at 4am when the /etc/cron.daily jobs usually run. There are also other checks I run where I don't care if they go into a WARNING state. Sending too many alerts can be detrimental to a server monitoring system, as the poor person who gets the notifications will eventually decide that Nagios is crying wolf.

I need to know when servers hit a high load, though I only want to be notified if it's a sustained period of high load. I also need to know when something is CRITICAL, though I don't really care if it's in a WARNING state. If I did care about a WARNING state, I would probably configure the check to report CRITICAL at that threshold instead.

Anyhow, my Nagios configs are below.

For a simple check_load service check, I set max_check_attempts to a higher value than usual. This directive determines how many times Nagios will check the service once it returns an error. If the number of attempts reaches max_check_attempts and the service is still in an error state, Nagios moves the service to a HARD state, typically logging a CRITICAL alert. The default value of max_check_attempts is 5. It's also important to note that the interval between these rechecks is determined by retry_check_interval, which defaults to 1 (minute).

In this example, when the load average hits a sustained value of 8, Nagios will recheck the service once a minute until it has failed 20 checks in a row. If the load is still at 8 or higher at that point, a CRITICAL alert will be sent.

define service {
        use                             generic-service
        hostgroup_name                  webservers
        service_description             check load
        check_command                   check_nrpe!check_load!5,5,5 8,8,8
        max_check_attempts              20
        notification_interval           0
}
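
To make the timing concrete, here is a rough Python sketch of the retry logic described above. It is a simplified model, not Nagios code: the function names are made up for illustration, and it assumes the first failed check counts as attempt 1 and that every recheck runs exactly retry_check_interval minutes after the previous one.

```python
# Simplified model of Nagios soft/hard service states. Not Nagios code;
# both function names are hypothetical and exist only for illustration.

def minutes_until_hard_state(max_check_attempts, retry_interval_minutes=1):
    """Minutes between the first failed check and the HARD state,
    assuming every recheck also fails."""
    # The first failed check counts as attempt 1, so only
    # (max_check_attempts - 1) rechecks are scheduled after it.
    retries = max_check_attempts - 1
    return retries * retry_interval_minutes


def first_hard_failure(results, max_check_attempts):
    """Return the index of the check at which the service goes HARD,
    or None if it never fails max_check_attempts times in a row.

    `results` is a list of booleans, True meaning the check passed.
    """
    attempt = 0
    for index, ok in enumerate(results):
        if ok:
            attempt = 0  # a passing check resets the attempt counter
            continue
        attempt += 1
        if attempt >= max_check_attempts:
            return index
    return None
```

Under those assumptions, minutes_until_hard_state(20) gives 19, so the load has to stay high for roughly 19 minutes before anyone is notified. A short burst that clears before the 20th failed check never reaches the HARD state, so first_hard_failure returns None for it.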

To have only CRITICAL alerts sent to a contact, I use the following contact configuration. The only real deviation from a standard configuration is service_notification_options, which determines which states (warning, critical, unknown and recovery) the contact is notified about. I've removed w (warning) and u (unknown) as I only care about c (critical) and r (recovery) alerts.

define contact{
        contact_name                     rene
        alias                            Rene Cunningham
        service_notification_period      24x7
        host_notification_period         24x7
        service_notification_options     c,r
        host_notification_options        d,r
        service_notification_commands    notify-service-by-email
        host_notification_commands       notify-host-by-email
        email                            rene@rene.bz
        }
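
As a rough illustration of how the c,r setting filters alerts, the Python sketch below maps each service state to its option letter and checks it against the contact's list. The helper and the mapping are hypothetical, written only for this example, and cover just the four options mentioned above.

```python
# Hypothetical helper, not part of Nagios: maps a service state to the
# letter used in service_notification_options.
STATE_TO_OPTION = {
    "WARNING": "w",
    "UNKNOWN": "u",
    "CRITICAL": "c",
    "RECOVERY": "r",
}


def should_notify(state, service_notification_options):
    """Return True if a contact with these options would be notified
    about a service entering the given state."""
    enabled = {opt.strip() for opt in service_notification_options.split(",")}
    return STATE_TO_OPTION[state] in enabled
```

With the contact above, should_notify("WARNING", "c,r") is False while should_notify("CRITICAL", "c,r") is True, which is the behaviour I'm after.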
