I've got Nagios and Munin up and running for my infrastructure. Now I need to figure out how best to use those tools so that I'm not wasting system resources on the monitoring itself!
I searched on "nagios" and "munin" to see if there were any posts that already addressed this question. I didn't find any that answered my specific questions. But, for the sake of starting out by adding some value with this post, here is the list of things that are at least relevant:
I've got a basic Nagios monitoring setup in place. See the attached screenshot. I check the number of logged-in users, because I've got my servers set up with OpenVPN and there really should not be more than one user (me!) logged in at any point in time. I've got it set to send a Warning at 3 logged-in users and a Critical at 5. [I'm not really sure of the value of checking the number of running processes, but this was one of the plugins enabled by default.]
My real question is how best to use Munin (given that Nagios is already in use).
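To make the threshold logic concrete, here is a minimal Python sketch of how I understand the Warning/Critical levels to work. The function name is hypothetical, and I am assuming a threshold trips when the count is greater than or equal to it, which may not match exactly how the real plugin compares. The numeric codes follow the usual Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
# Status codes following the standard Nagios plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_user_count(count, warn=3, crit=5):
    """Return a status code for a logged-in user count.

    warn=3 and crit=5 mirror my thresholds. Assumption: a threshold
    trips when count >= threshold (the real plugin may differ).
    """
    if count >= crit:
        return CRITICAL
    if count >= warn:
        return WARNING
    return OK


if __name__ == "__main__":
    # Show the status for a few sample user counts.
    for n in (1, 3, 5):
        print(n, classify_user_count(n))
```

With only me logged in, this returns OK, which is the state I expect to see nearly all the time.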
- Out of the box, a ton of graphs are being created, and I don't know how to interpret most of them. I feel like I'm using up a bunch of resources to generate those graphs without getting anything in return. What is the point of the various graphs? Which ones do you use? How do I pick and choose which graphs get generated? (I have been reading through the Munin documentation I can find online and still have not come across answers to these questions. It looks like picking and choosing is a matter of adding and removing symlinks from
- What additional benefit does Munin provide beyond Nagios? There is definitely some overlap, since from what I understand Munin can also send alerts. However, I also understand that it is common to run both Nagios and Munin. I am thinking that Munin provides more opportunity to look for trends in the graphs, but I have not yet generated any of the reports that Nagios provides, such as the Trends report. Am I correct in thinking that Nagios provides on-the-spot monitoring, while Munin does more to keep track of historical data?
At the moment, I feel that Munin is basically duplicating what I'm already doing in Nagios, and I'm wondering whether I should be using Munin at all.
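On the question of picking which graphs get generated: if it really is just a matter of adding and removing symlinks, I picture it working roughly like the Python sketch below. This is only my understanding, not Munin's actual tooling. The function names and the two directory arguments (`available_dir`, `enabled_dir`) are hypothetical placeholders, not Munin's real paths.

```python
import os


def enable_plugin(available_dir, enabled_dir, name):
    """Symlink a plugin from available_dir into enabled_dir.

    Both directories are hypothetical placeholders. Does nothing if
    an entry with that name already exists in enabled_dir.
    """
    link = os.path.join(enabled_dir, name)
    if not os.path.lexists(link):
        os.symlink(os.path.join(available_dir, name), link)


def disable_plugin(enabled_dir, name):
    """Remove the plugin's symlink; the original plugin file is untouched."""
    link = os.path.join(enabled_dir, name)
    if os.path.islink(link):
        os.remove(link)


def enabled_plugins(enabled_dir):
    """Return the sorted names of all symlinked (enabled) plugins."""
    return sorted(
        n for n in os.listdir(enabled_dir)
        if os.path.islink(os.path.join(enabled_dir, n))
    )
```

If that is right, trimming the graphs down would just mean removing the links for plugins I don't care about and leaving the plugin files themselves in place.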
Finally, what types of things do you feel are truly important to monitor (whether via Nagios or Munin)? Off the top of my head, I would suggest CPU Load, Disk Utilization and whether the server is even up.