UPDATE: cross-posting this into SOC 2008 for any feedback.
First, thanks for the great BOF talk at DrupalCon, Adrian and Boris. I still have to install this and get it running, but stepping through some of the code I'm wondering whether it would be a good framework for building out a monitoring system based in Drupal (something else on my mind). Perhaps a few monitoring metrics would not work well, but on the other hand you could keep a very good eye on a lot of metrics about the servers themselves, since they have the hostmaster2 back end installed on them. Since I have not set up hostmaster2 myself yet, forgive me if I'm going down the wrong track here.
So the idea is this: since the hostmaster2 framework has the groundwork laid out to connect a front end with multiple back ends (multiple servers), and there's the hosting_stat hook which is invoked by the provisioning_stat module on each agent, could a suite of modules be developed to, at the general server level:
- monitor disk usage
- monitor memory usage
- apache benchmark each Drupal instance and aggregate stats
- monitor disk i/o
- raise flags for more application-specific cases like:
-- cron not successfully completed on a single Drupal site for over a day
-- # of new user accounts created in a 24 hour period exceeds 100
-- no new taxonomy terms created on the site in the past 24 hours
(The idea w/ these is that they're application-specific indicators of problems, and not necessarily a problem on every Drupal site.)
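To make the application-specific flags concrete, here is a minimal sketch in Python. It is not hostmaster2 code: the `SiteStats` record, its field names, and the flag names are all hypothetical, and it assumes the numbers have already been collected from each site by some other means (for example, a stats hook).

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds in a day


@dataclass
class SiteStats:
    """Hypothetical per-site record; the field names are assumptions."""
    name: str
    last_cron: int        # Unix timestamp of the last successful cron run
    new_users_24h: int    # accounts created in the past 24 hours
    new_terms_24h: int    # taxonomy terms created in the past 24 hours


def site_flags(stats, now, max_new_users=100, expect_new_terms=False):
    """Return a list of flag names raised for one site."""
    flags = []
    if now - stats.last_cron > DAY:
        flags.append("cron_stale")
    if stats.new_users_24h > max_new_users:
        flags.append("user_signup_spike")
    # Only meaningful on sites where new terms are expected daily,
    # so this check is opt-in per site.
    if expect_new_terms and stats.new_terms_24h == 0:
        flags.append("no_new_terms")
    return flags
```

Making the taxonomy check opt-in reflects the point above: these are indicators on some sites, not problems on every site.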
So, you could do some server-level monitoring, as well as what you've already started doing w/ the hosting_stat hook (application-level monitoring). Is the server-level monitoring something others would find valuable? Are people using other tools for this, or do they not have the need at all?
Next, I've never used Nagios personally, but I've heard it's a bear to set up. For monitoring and alerts on simple things like excessive memory usage or low disk space, it seems like it could be faster to write simple scripts than to get going on that application (though of course there may be definite reasons to use it over something like this...).
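As an illustration of how small such a script could be, here is a rough Python sketch of a disk usage check. The function names and the 90% limit are assumptions chosen for the example, and a real setup would still need scheduling and some way to deliver the alert.

```python
import shutil


def disk_usage_fraction(path="/"):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # avoid dividing by zero on unusual filesystems
    return usage.used / usage.total


def check_threshold(name, value, limit):
    """Return an alert string if value exceeds limit, otherwise None."""
    if value > limit:
        return f"ALERT {name}: {value:.0%} exceeds {limit:.0%}"
    return None


if __name__ == "__main__":
    alert = check_threshold("disk /", disk_usage_fraction("/"), 0.90)
    if alert:
        print(alert)
```

The same `check_threshold` helper could be reused for memory or disk i/o figures once something collects them.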
Finally, w/ Google Summer of Code about to begin, is there anything in the hostmaster2 realm that would be great to get a student involved w/, like the BIND integration, or this server-level monitoring if it seems useful and practical to do w/ the hostmaster2 framework?
Thanks for any thoughts,
Ian

Comments
Definitely.
It provides a framework for adding additional information into the system. While it would probably not happen on the stats call, as stats are for the site records, I would definitely like a 'monitor' module which queries the servers (the results could then be compared against the sites associated with each server).
I'm busy implementing a dispatcher for the hosting back end, so that you can add additional, configurable queues.
Basically, by implementing a drush command and a hook_hosting_$type_queue, you let the dispatch command manage the running of additional queues.
For instance, you would specify how frequently you would like all sites to be cronned, such as once an hour. The dispatcher then makes sure that it batches off the cron queue processor with the right number of items to be processed on each queue call, and keeps track of which processes are running and how long they have been running.
Monitoring would be an additional queue that gets processed, one that iterates through the servers you have defined.
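The batching arithmetic described here can be sketched as follows. This is only an illustration in Python, not the dispatcher's actual code: the function names, the fixed dispatch interval, and the "oldest first" ordering are all assumptions.

```python
import math


def batch_size(total_items, frequency_seconds, dispatch_interval_seconds):
    """Items to process per dispatcher run so that every item is
    handled once per frequency period."""
    runs_per_period = max(1, frequency_seconds // dispatch_interval_seconds)
    return math.ceil(total_items / runs_per_period)


def next_batch(last_run_by_site, frequency_seconds, dispatch_interval_seconds):
    """Pick the sites whose last run is oldest, up to the batch size.

    last_run_by_site maps a site name to the Unix timestamp of its last run.
    """
    size = batch_size(len(last_run_by_site), frequency_seconds,
                      dispatch_interval_seconds)
    oldest_first = sorted(last_run_by_site, key=last_run_by_site.get)
    return oldest_first[:size]
```

For example, with 120 sites, an hourly cron frequency, and a dispatcher that runs every five minutes, `batch_size(120, 3600, 300)` gives 10 sites per run.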
--
The future is so Bryght, I have to wear shades.