As we have reached part 3 or 4 in our series on monitoring your servers for free, I would like to take a moment to highlight a few gotcha with Nagios before moving into our final section where we will setup graphical monitoring with Observium. At this point we’ve gone through building our a Nagios server, setting up contacts, monitoring printers, as well as monitoring both Windows and Linux servers. There are certainly a few issues you are bound to encounter on your journey with Nagios monitoring. These issues cost me hours of struggle, digging, testing, and ultimately coming upon a resolution. I would like to pay forward some of this effort in the hope that I can help save some poor soul out there the time, effort, untold amounts of silent cursing and coffee drinking.
Monitoring SQL Express
Chances are if you have been a Windows administrator for very long and have deployed and application server or two, you have undoubtedly had to setup at least one SQL Express database. The challenge of monitoring SQL Express with Nagios is that our friends at Microsoft have created a service name that contains a dollar sign (MSSQL$SQLEXPRESS). The challenge is that the $ character must be escaped using $ in order for Nagios to correctly read the service name. Use the following service definition below as reference for making SQL Express play nice with Nagios.
service_description Service: SQL Server – SQLEXPRESS
check_command check_nt!SERVICESTATE!-d SHOWALL -l MSSQL$$SQLEXPRESS
Nagios localhost Warnings for HTTP
On your Nagios server you may encounter a yellow warning message regarding the HTTP service. The reason for this is that the NRPE client running on Linux servers is checking to ensure that apache is running and that it can locate and index file within the default docroot for apache. Resolving this issue on your Nagios server can be as simple as creating a blank file called index.html in the /var/www/html directory. A simple way to accomplish this is to use the touch command (ex touch /var/www/html/index.html
Once you create this dummy file you will need to restart apache (service httpd restart on RHEL or service apache2 restart on Debian variants) as well as the nagios service (service nagios restart).
Nagios Error When Rescheduling a Service Check (Error: Could not open command file ‘usr/local/nagios/var/rw/nagios.cmd’ for update!)
This and the SQL Express issue may very well be the most aggravating Nagios issues I encountered in my first deployment. However through being highly caffeinated and stubborn I did eventually find a solution. The root of the problem is that there is a permission setting that is getting flipped each time the Nagios service restarts. I have seen a few different fixes for this issue that both work. The first method I have seen corrects the problem at its root by correcting the broken permissions. However if this doesn’t resolve the issue for you, another alternative is to use a script that resets the permission each time the Nagios service runs. I recommend attempting the first method first as it is the more preferable fix, however if this doesn’t resolve your issue, try method two. Please note, terminal commands are in italics
#usermod -G nagios apache
#grep nagios /etc/group (ensure that the result shows that nagios is part of the apache group)
#service httpd restart (substitute apache2 instead of httpd on Debian based variants)
Alex Nogard’s blog lays out the methodology for creating a script that fixes the permission each time Nagios starts by adding the script into init.d/nagios. Please follow the link below for his instructions:
Automating NRPE Agent Deployment Through Puppet
Although outside the scope of this particular discussion, one way to automate deployment of NRPE throughout multiple web servers is to use puppet to facilitate this. I will perhaps visit this topic in later posts regarding Puppet Labs, however I will sum this point up quickly in a nutshell before we wrap up. If you have an existing Puppet infrastructure, you can simply have Puppet add the EPEL repo to each LAMP server, create an ensure installed statement to ensure that NRPE is installed, and finally push out a preconfigured nrpe.cfg that contains the correct server information for your Nagios server (located in /etc/nagios).
That brings me to the end of part 3 in our series on monitoring your servers for free. In the next installment, we’ll take a look at deploying Observium to give us graphical output for our servers. Til next time, may the coffee be endless and the uptime in your favor!