One of the big goals of the drupal.org redesign is to make it easier for end users to find the right modules for the site they're trying to build. With over 5,000 contributed modules, many of them providing similar functionality, it can be extremely difficult to choose. One method is to assess the "health" of a module: how actively it's maintained, used, supported, and so on.
One approach to gauging health is described in the post "Project ratings and reviews for drupal.org redesign". While that proposal deals with subjective factors, this post addresses objective facts about a module that can be computed and displayed for all projects hosted on drupal.org. No single metric can tell you which module to use, and some knowledge will be required to make the best use of this data, but it's important to make these statistics more readily available on drupal.org to empower users to make better decisions.
The details of the project activity chart weren't specified at a technical level during the design phase, but the spirit of the design is that the designers wanted more ways to visualize the health of a project. As we implement the redesign, we've been empowered to provide whichever charts and data we think will best help end users make sense of what's going on with a project. Read on for our specific proposal, including which metrics to compute and some ideas on how they'll be visualized on the new drupal.org.
There are literally dozens and dozens of metrics that could be captured and displayed. We've already got support for the usage of any given project and we're working on support for download statistics. Beyond those, we believe the following are important to assess the overall health of a project at a glance:
- Issue activity: for each issue category (bug, feature request, etc.), the number open vs. the number fixed/closed
- Number of issue reporters (unique users filing new issues in a defined time period)
- Number of issue participants (unique users filing issues or comments in defined time period)
- Total number of issue comments posted over a week (ideally with a separate total for the number of comments by any of the project maintainers, both of which can be graphed on the same sparkline).
- Release activity: Number of releases in each time period
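To make the issue metrics above more concrete, here's a rough Python sketch of how one week's worth of issue activity might be boiled down to numbers. The real code would live in project_issue and query its tables; the `IssueEvent` record, the function names, and the rule for deciding which statuses count as fixed/closed are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IssueEvent:
    # Hypothetical flattened record of one piece of issue-queue activity.
    kind: str  # "issue" for a newly filed issue, "comment" for a follow-up
    user: str


def weekly_participation(events, maintainers):
    """Summarize one week's events into the per-week participation metrics."""
    reporters = {e.user for e in events if e.kind == "issue"}
    participants = {e.user for e in events}  # anyone filing issues or comments
    comments = [e for e in events if e.kind == "comment"]
    return {
        "issue_reporters": len(reporters),
        "issue_participants": len(participants),
        "comments": len(comments),
        "maintainer_comments": sum(1 for e in comments if e.user in maintainers),
    }


def open_vs_closed(issues):
    """issues: (category, status) pairs -> {category: (open_count, closed_count)}.

    Assumption: a status of "fixed" or anything starting with "closed" counts
    as fixed/closed; every other status counts as open.
    """
    counts = {}
    for category, status in issues:
        open_n, closed_n = counts.get(category, (0, 0))
        if status == "fixed" or status.startswith("closed"):
            closed_n += 1
        else:
            open_n += 1
        counts[category] = (open_n, closed_n)
    return counts
```

Because the total comment count and the maintainer comment count come from the same week and project, they line up naturally on a single sparkline.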
We're proposing to normalize all metrics to a weekly granularity. This would both simplify the storage (so we're not trying to store daily metrics) and the UI (since it'd be best if all the charts used the same granularity to make it easy to compare them).
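Normalizing to weeks mostly comes down to mapping every timestamp onto the start of its week before counting. A minimal Python sketch, assuming for illustration that weeks start on Monday at 00:00 UTC (the actual week boundary isn't specified in this proposal):

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def week_start(ts: int) -> int:
    """Map a Unix timestamp to 00:00 UTC on the Monday of its week."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    monday = dt - timedelta(days=dt.weekday())  # weekday(): Monday == 0
    monday = monday.replace(hour=0, minute=0, second=0, microsecond=0)
    return int(monday.timestamp())


def bucket_by_week(timestamps):
    """Count events per week, keyed by each week's starting timestamp."""
    return dict(Counter(week_start(ts) for ts in timestamps))
```

Storing one row per project, metric, and week keeps the stored data compact and means every chart can share the same horizontal axis.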
Additionally, the following metrics could be useful, but we might not have time to implement them for the initial launch of the redesign:
- Commit activity (this would be great, but it's not worth our time to add it for CVS with the Git migration imminent):
  - Number of lines added/removed
  - Number of commits
- Total number of tests and percentage of tests that pass
- Total lines of code vs. lines of comment
- Average length of time that issues are open
- Number of unique users:
  - submitting patches
  - reviewing patches
We're trying to strike a balance between what's relatively easy and sane to compute, implement and display visually, and things that will help users find the best project for their particular needs. Given that, if you have suggestions for other metrics we should be considering, please comment below!
Enter the project_metrics module
We looked at a lot of Drupal charting modules (see below), but basically none of them handle the storage for you. So, no matter how we end up displaying this data, we need somewhere to compute and store it. Enter the project_metrics module.
This would be a new sub-module included directly in the Project project. While the project_usage module is really only relevant to sites that use Project to manage releases of Drupal code and want to track update_status usage, the project_metrics module could be useful for just about anyone running the Project suite. It would be responsible for computing the metrics and storing them.
To compute the metrics, the idea is to write a series of drush commands that would run periodically and do all the heavy lifting of computing the right statistics for a given week. These commands would insert records into the project_metrics module's DB tables. Then, project_metrics would provide various ways to access and display the data (see below).
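As a rough illustration of the storage side, here's a Python sketch using SQLite in place of the drupal.org MySQL database. The `project_metrics_data` table, its columns, and `store_week()` are hypothetical. The point is that keying rows on project, metric, and week lets a weekly run be repeated safely: re-running a week replaces its rows instead of duplicating them.

```python
import sqlite3

# Hypothetical schema: one row per project, metric, and week.
SCHEMA = """
CREATE TABLE IF NOT EXISTS project_metrics_data (
    nid INTEGER NOT NULL,          -- project node ID
    metric TEXT NOT NULL,          -- e.g. 'issue_participants'
    week_start INTEGER NOT NULL,   -- Unix timestamp of the week's first day
    value INTEGER NOT NULL,
    PRIMARY KEY (nid, metric, week_start)
)
"""


def store_week(conn, week_start, results):
    """Store one week's results, given as {nid: {metric_name: value}}."""
    rows = [
        (nid, metric, week_start, value)
        for nid, metrics in results.items()
        for metric, value in metrics.items()
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO project_metrics_data VALUES (?, ?, ?, ?)",
            rows,
        )
```

A periodic job would call `store_week()` once per computed week; running it twice for the same week simply overwrites the earlier values.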
The basic architecture of the module is that it would invoke a hook to allow other modules to advertise what metrics they want to provide. The project_metrics module would then be responsible for invoking the appropriate functions in the other modules at the right frequency and storing the results. So, project_metrics itself wouldn't know how to query the issue database tables looking for statistics. That'd still be the responsibility of the project_issue module. However, project_issue wouldn't have to worry about invoking itself via cron, wouldn't have to manage its own tables to store the historical data, etc.
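Here's a language-neutral sketch of that division of labor, written in Python. `MetricInfo` stands in for whatever a module would advertise through the proposed hook, and `MetricsRegistry.run_due()` plays the role of project_metrics deciding which metrics are due and storing the results. All of these names are hypothetical, and a plain list stands in for the database tables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

WEEK = 7 * 24 * 3600  # seconds


@dataclass
class MetricInfo:
    name: str
    owner: str  # module that knows how to compute the metric, e.g. project_issue
    compute: Callable[[int, int], Dict[int, int]]  # (start, end) -> {project_id: value}
    period: int = WEEK


class MetricsRegistry:
    def __init__(self) -> None:
        self._metrics: Dict[str, MetricInfo] = {}
        self._last_end: Dict[str, int] = {}  # end of the last computed period
        self.history: List[Tuple[str, int, int, int]] = []  # (metric, start, project_id, value)

    def register(self, info: MetricInfo) -> None:
        """What a module advertising a metric via the hook would amount to."""
        self._metrics[info.name] = info

    def run_due(self, now: int) -> List[str]:
        """Invoke each metric whose period has fully elapsed and store its results."""
        ran = []
        for name, info in self._metrics.items():
            start = self._last_end.get(name, now - info.period)
            if now - start < info.period:
                continue  # not due yet
            end = start + info.period
            for project_id, value in info.compute(start, end).items():
                self.history.append((name, start, project_id, value))
            self._last_end[name] = end
            ran.append(name)
        return ran
```

The providing module only supplies `compute`; scheduling and history live entirely in the registry, which mirrors the split described above.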
So how would all this data be visible on drupal.org? The key metrics would be exposed via sparklines on the project page itself. Depending on how many metrics we end up with, we might need to add a tab off project pages (or use JS to show/hide the full list of metrics) so that it's possible to drill down and find every statistic we provide without overwhelming the user with all of that data on the default project pages. My vision is that there's an easy way to see 5-10 sparklines, each with data points at 1-week granularity, all vertically stacked so the weeks line up. That way, you can see how the different metrics correlate. So long as the scale of the horizontal axis is the same on all graphs (so they line up and are easy to compare), we can use a different scale for the vertical axis of each sparkline so that each one makes the most visual sense (e.g. the number of releases in a week is probably going to be 0-4 most of the time, whereas the number of issue comments or lines of code added/removed could be in the hundreds or thousands). With the charts stacked so the weeks line up, you could easily see, for example, that one week the "number of lines of code added/removed" sparkline goes nuts, and the "number of open bugs" sparkline starts climbing soon thereafter. ;)
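The key idea, one shared horizontal axis with an independent vertical scale per metric, can be shown with a small Python sketch that renders text sparklines. This only illustrates the scaling rule, not how drupal.org would actually draw its charts; `sparkline()` and `stacked()` are hypothetical helpers.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """One character per week, scaled to this series' own min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BARS[0] * len(values)  # flat series: nothing to scale
    steps = len(BARS) - 1
    return "".join(BARS[round((v - lo) / (hi - lo) * steps)] for v in values)


def stacked(series):
    """series: {label: weekly values}; every series must cover the same weeks."""
    if len({len(values) for values in series.values()}) != 1:
        raise ValueError("all series must cover the same weeks")
    width = max(len(label) for label in series)
    return "\n".join(
        f"{label:>{width}} {sparkline(values)}" for label, values in series.items()
    )


print(stacked({"releases": [0, 1, 0, 4], "comments": [12, 340, 90, 1000]}))
```

Because each series is scaled independently, a week with 4 releases and a week with 1,000 comments both reach the top of their own rows, while one character per week keeps the columns aligned.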
Some metrics and statistics about the issue queues are already on drupal.org; they're just mostly hidden. For example, you can view statistics about the Drupal core issue queue. This page will probably get some much-needed attention (it hasn't been touched in years). Although none of the UI parts of this proposal are set in stone, it's likely that we'll update these per-project issue statistics pages to include more of the issue-related metrics discussed above. The idea is that we'd put the current week's raw data in the tables near the top of the page, and then provide sparklines below to show how those values have changed over the weeks of the last year.
Additionally, we're going to expose some of these metrics to Solr to make it possible to filter and sort projects by various metrics. We've already done this with the project usage data (for example, this is the default sort order when you browse module projects on drupal.org). So, in addition to being able to sort by "Most installed" (and hopefully soon, "Most downloaded"), you might also be able to sort by "Most active issue queue", "Smallest % of open bugs", "Most commit activity", etc.
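In the Solr index the sort value would be precomputed per project and stored as a field; as a plain-Python sketch of what a "Smallest % of open bugs" ordering might compute (the function name and the decision to push projects with no bugs to the end are assumptions):

```python
def rank_by_open_bug_share(projects):
    """projects: {name: (open_bugs, closed_bugs)} -> names, smallest open share first.

    Projects with no bugs at all have no meaningful share, so they sort last.
    """
    def sort_key(item):
        name, (open_bugs, closed_bugs) = item
        total = open_bugs + closed_bugs
        share = open_bugs / total if total else 0.0
        return (total == 0, share, name)

    return [name for name, _ in sorted(projects.items(), key=sort_key)]
```

Without the `total == 0` tie-breaker, a brand-new project with no bug reports would appear to be the healthiest project on the site.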
When it comes to visualizing the data that the project_metrics module would provide, we've investigated a few possible ways to generate the necessary charts:
Sparkline-aware Views display plugin
We could expose all of the project_metrics data to views, create views for whatever we care about, and write a Views display plugin that knows how to render our results as a sparkline. This could potentially be done as part of the Views charts project, or as its own new "Views sparkline" contribution. Either way, we'd hope to make use of the Sparkline module. The Views display plugin would simply be glue to take the data from the results of the query that Views ran and format that data in a way that the Sparkline module expects to be able to generate the sparkline itself.
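Most of that glue is reshaping query rows into the flat series a sparkline renderer wants. One detail worth handling: weeks with no activity may produce no row at all, yet they still need a zero so that stacked charts stay aligned. A Python sketch, assuming rows arrive as `(week_start, value)` pairs with UTC week starts spaced exactly one week apart (`rows_to_series()` is hypothetical; the real plugin would be PHP):

```python
WEEK = 7 * 24 * 3600  # seconds; exact because week starts are in UTC


def rows_to_series(rows, first_week, last_week):
    """Turn (week_start, value) rows into one value per week, filling gaps with 0."""
    by_week = dict(rows)
    return [by_week.get(week, 0) for week in range(first_week, last_week + 1, WEEK)]
```

Passing the same `first_week` and `last_week` for every metric gives each sparkline the same number of points, which is what keeps the weeks lined up.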
Charts API
We could potentially use the Charts API to handle our charting needs. We'd still probably drive the queries via Views and have a display plugin to render the results via the Charts API. So, this is more an alternative to the Sparkline module itself -- either way we'd probably be exporting the project_metrics data to Views and writing a display plugin.
Quant module
We looked at the Quant module, but it doesn't really seem to get us very far. It can run really complicated queries to try to figure out the historical data for you, but that would probably kill the d.o database server. It doesn't do any storage for you, either. And we'd have to write some code to expose our data to Quant. At that point, we might as well just expose it to Views, which seems a lot more flexible and powerful.
From now until around August 20th, we're just going to gather feedback on this proposal. To prevent the discussion from getting fragmented, please add comments directly on this post.
Starting around August 23rd, we're going to begin implementing the project_metrics module, and any changes to the rest of the Project suite, to make it possible to compute all these statistics. We expect all the backend work to take approximately two weeks.
Starting around September 7th, we're going to evaluate the front-end options and pick one to roll out on drupal.org. We're aiming to get these metrics visible on project pages and in the ApacheSolr index on the live drupal.org independent of the launch of the redesign theme (which is called "bluecheese"), unless it involves significant work in the existing drupal.org theme ("bluebeach").
Interested readers can check out the following threads for more on this juicy topic: