Project quality metrics system


Andrew Morton
http://drupal.org/user/34869

Synopsis

I propose to enhance Drupal's Project module with a system that computes a
number of quality metrics to assist in rating projects. These metrics will
help users determine which of the many contributed modules and themes are
actively developed and maintained.

The Problem

Perhaps one of the most frustrating parts of building a Drupal site is
evaluating the many contributed modules to determine which meet the site's
needs. For non-technical users who cannot fix their own bugs, finding a module
or theme that does what they need is not enough; they also need one that is
actively maintained and supported.

Recent changes to the Project module have made it far easier to find modules
and themes by category and by Drupal version compatibility, but there is
little to help determine which modules are well supported. Looking at a theme
or module's project page, it is not readily apparent whether there are
developers and users who can help answer questions, fix bugs, and keep the
module up-to-date and working.

It has been suggested that download counts would provide a solution. I think a
simple ranking by number of downloads would only encourage developers to "game
the system", initiating downloads to artificially inflate the popularity of
their modules. A more comprehensive approach is needed: one that highlights
the well-maintained but unpopular module (perhaps hampered by a badly chosen
name) and deemphasizes the popular but ill-maintained one.

My Solution

The first, and most important, goal of my project will be to build a flexible,
efficient structure for computing quality metrics. Once that is in place, the
task shifts to determining what data will best indicate the activity around a
theme or module. Making sure that data is captured and available for analysis
is the natural next step. The final piece, mostly specific to Drupal.org, is
to set weights for the metrics so they match users' subjective impressions of
a module's activity.

I envision building a sub-module for the Project module named project_metrics.
The module would:

  • Compute and cache the metrics periodically from within a
    hook_cron() implementation.
  • Use a plug-in architecture that groups metrics into .inc files. Each .inc
    file will have a dependency function to avoid loading the plug-in when its
    prerequisites are not in place. For instance, a user quality rating metric
    would depend on having a voting module enabled.
  • Display a block on each project page with a textual summary of the metrics.
  • Add "browse by" pages for users to view modules and themes sorted by their
    quality metrics.
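The compute-and-cache flow above can be sketched as follows. This is an
illustrative sketch in Python for clarity (the real module would of course be
PHP, running inside a hook_cron() implementation); the plug-in structure,
names, and values are my own placeholders, not actual Project module APIs.

```python
def cron_compute_metrics(plugins, projects, cache):
    """Recompute and cache every available metric for every project.

    Plug-ins whose prerequisites are not met are skipped entirely,
    mirroring the per-.inc dependency function described above.
    """
    available = [p for p in plugins if p["dependencies_met"]()]
    for project in projects:
        for plugin in available:
            # Each value is normalized to the -1..1 scale described below.
            cache[(project, plugin["name"])] = plugin["compute"](project)
    return cache

# Hypothetical plug-in: a user rating metric that depends on a voting
# module being enabled (in PHP this check would be module_exists()).
votes_plugin = {
    "name": "user_rating",
    "dependencies_met": lambda: True,   # stub for the dependency check
    "compute": lambda project: 0.5,     # stub: would aggregate real votes
}

cache = cron_compute_metrics([votes_plugin], ["views", "cck"], {})
```

Because the results are cached on cron, the project pages and "browse by"
listings only ever read precomputed values, which keeps page loads cheap.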

Each metric will:

  • Present itself as a value from -1 to 1 representing "very bad" to "very
    good"
  • Present itself as a string understandable by an end user, e.g. "This is one
    of the most used modules on Drupal sites."
  • Provide an additional "help" string explaining what it means, e.g. "This
    module is used by more than 80% of the sites that have enabled the Drupal
    module and allow it to report usage statistics."
  • Provide a configuration form giving the administrator control over its
    "magic numbers", e.g. between X and Y commits in N months is good, but more
    than Y indicates extremely active development and a likely unstable module.
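As a concrete illustration of those four requirements, here is a hypothetical
commit-activity metric, again sketched in Python (the real plug-in would live
in a PHP .inc file). The threshold numbers are invented placeholders standing
in for the administrator-configurable "magic numbers".

```python
def commit_metric(commits_last_3_months, low=2, high=30):
    """Map a commit count onto the -1..1 scale using admin-set thresholds."""
    if commits_last_3_months < low:
        value = -1.0
        summary = "This module has seen little recent development."
    elif commits_last_3_months > high:
        # Very high churn may indicate an unstable, rapidly changing module.
        value = 0.0
        summary = "This module is under extremely active development."
    else:
        value = 1.0
        summary = "This module is actively maintained."
    help_text = ("Based on the number of commits in the last three months; "
                 "the thresholds are configurable by the site administrator.")
    return {"value": value, "summary": summary, "help": help_text}
```

The returned dictionary corresponds to the three user-facing pieces listed
above; the keyword arguments stand in for the configuration form's settings.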

The metrics will be grouped into categories, each reflecting a different usage
axis:

  • Developer activity - how often are new releases offered, how recent are
    commits, the number of feature requests opened vs. the number of closed, the
    number of uncommitted patches, etc.
  • User activity - the number of downloads, the number of sites using it based
    on Drupal.module callback data.
  • Support activity - the number of support requests opened vs. the number
    closed, the number of unique people commenting on issues.

The metrics will be averaged to determine a rating for each category. The
administrator can alter the weight of each metric within its category to make
one more important, or set the weight to zero to disable the metric entirely.
For example, the number of sites using a module might count twice as much as
the number of downloads, or downloads might not be factored in at all.
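The weighting scheme described here amounts to a weighted mean over the -1..1
metric values. A minimal sketch in Python (the production code would be PHP),
using made-up metric names for the "user activity" category:

```python
def category_rating(metric_values, weights):
    """Weighted mean of metric values (each in -1..1) for one category.

    A weight of zero disables a metric: it contributes nothing to the
    numerator and is excluded from the normalizing total.
    """
    total = sum(weights[name] for name in metric_values if weights[name] > 0)
    if total == 0:
        return 0.0  # every metric in the category is disabled
    return sum(value * weights[name]
               for name, value in metric_values.items()) / total

# "Sites using it" counts twice as much as downloads, per the example above.
user_activity = {"sites_using": 0.8, "downloads": 0.4}
weights = {"sites_using": 2.0, "downloads": 1.0}
print(category_rating(user_activity, weights))  # (0.8*2 + 0.4*1) / 3 = 2/3
```

Setting a metric's weight to 0.0 drops it from the calculation, which is how
"don't factor the number of downloads in at all" would be expressed.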

Time allowing, an additional aspect to be considered is the role releases
should play in computing the metrics. Specifically, how should differences
between releases be represented (or weighted) if the current version works
perfectly but the newest release is buggy? If the 4.7 version of a module
worked perfectly and was widely used but the 5.0 version is a buggy,
incomplete upgrade with many open support requests, how should that be
reflected? Issues and bug reports are now linked to specific release nodes, so
it should be possible to generate metrics on a per-release basis. Research
would be needed to determine whether and how this information would be most
useful.

Profit for Drupal

This project will provide an immediate benefit for every Drupal user trying to
decide which modules and themes to use on a site. They will be able to skip
over unsupported and buggy modules and themes and spend their time evaluating
higher-quality ones. By highlighting the best Drupal has to offer, new Drupal
users will get a much better first impression.

Roadmap

The following is a guide to what I hope to have finished at points through the
summer.

June 11th

  • Set up a test site with a database dump of realistic project information
    that can be used during development to test the performance and scalability
    of the metrics.
  • Choose two or three simple metrics based on data the Project module
    currently collects, e.g. number of downloads, number of open support
    requests, number of maintainers, time since the last release was built.

July 1st

  • Create a project_metrics module skeleton and implement the basics of the
    metric plug-in infrastructure.

July 16th

  • Complete several metrics for data already collected in the project module.
  • Start developing the user interface for displaying the metrics and browsing
    for modules by quality.

August 1st

  • Create additional, more complicated metrics.
  • Enable the module on scratch.drupal.org.
  • Begin to determine appropriate weights for Drupal.org.

August 20th

  • Upload my final code to Google's site.
  • Enable the module on Drupal.org.

All my work will be done in coordination with Derek Wright, the Project module
maintainer, to ensure my contributions meet with his approval and can be
committed to the module at the end of the summer and put into use on
Drupal.org.

Biography

I am in my senior year studying Computer Science at Portland State University
in Portland, Oregon. I have been working with open source PHP projects for
over four years. My first effort was Phlickr, a Flickr API kit written for
PHP 5. With Rob Kunkle, I co-authored Building Flickr Applications with PHP,
published by Apress. The book covers using Phlickr as a tool for building
websites and scripts to manage photos on Flickr.

I became involved with the Drupal project a year and a half ago while working
as the web administrator at KPSU (Portland State University's college radio
station). I have had several patches accepted for core (#34031, #33808,
#36029, #40847, #41703, #47853, #50234, #80079) and a few that are still
pending (#110981, #113385) but the majority of my work has been developing and
maintaining contrib modules. I wrote the Station module which is
used by many college, community and commercial radio stations to display their
programs, playlists, and weekly schedule. I rewrote and now maintain the
Audio module, which is used at KPSU to provide an archive of all shows
broadcast over our web stream. I wrote the Flickr module and recently have
become a maintainer of the Image module.

I am an excellent candidate to undertake this project. My prior work with
Drupal demonstrates my familiarity with both the code base and developer
community, and my ability to complete complex modules.

Comments

/me hugs drewish

dww's picture

+10,000 ;) yay!!! this is absolutely fantastic. a dream come true. frankly, this is the most important (and will be the highest-visibility) SoC project this year.

a few minor details:

  1. project doesn't currently collect download numbers directly. see http://drupal.org/node/102422
  2. usage data will be far more accurate and useful from the XML-RPC requests generated by the update_status.module than the drupal.module updates. see http://drupal.org/node/128827
  3. there are a few minor typos in the above proposal. i hope you clean that up before the final submission to google (i'd hate for them to deny it based on something silly like that).
  4. i know there's a mid-term deadline for committing some initial code into google's repository. so, you should consider that in your timeline. i wish i knew the exact date, but you should double-check it and plan to have something written in time for that. perhaps you'll want to swap the order of the basic metrics plugin skeleton and the test site and initial metric implementations, for example.

otherwise, looks perfect. i'm so happy... ;)

thanks!!!
-derek

Mid-term deadline

jpetso's picture

i know there's a mid-term deadline for committing some initial code into google's repository.
so, you should consider that in your timeline. i wish i knew the exact date, but you should
double-check it and plan to have something written in time for that.

The exact date of the mid-term deadline is July 9th.

Yeah, and, great proposal. I guess you've made yourself a safe ticket to this year's SoC with this :D

a somewhat related proposal

johan a's picture

I have made a suggestion at http://drupal.org/node/139291

Hopefully you would find merit in some of the ideas there too, and incorporate them to the overall solution.

Basically, I suggested Peer Reviews for projects so that, instead of actually just giving absolute ratings for the modules, we allow other users to review the modules. The problem today is that the module descriptions are woefully inadequate, and most of the time we have to install the module to see what it does. This poses a barrier of entry because it adds a level of effort and apprehension.

A review would be a more complete description of what a module does, where the settings are, and what functions or pages are made available to the user. Screenshots are essential to explain modules because it's difficult to visualize many functions using plain words.

To differentiate Reviews from Comments, I suggested two methods. The first is by putting up a disclaimer of what a review is, imposing a minimum word count, and requiring at least one screenshot. The second, more complicated method, is where users can rate the reviews (not the modules itself), something like Amazon's "User Reviews".

The benefit of this review system is that users can now participate in documenting a module, which helps free up the developer's time to concentrate on development. Reviews will lower the barrier of entry for new users, who are bewildered by the huge array of modules with vague descriptions. Reviews will encourage users to try modules and to better compare them with similar modules first.

Please have a look at the link and give this some thought. Allow users like me to participate in module development and advancing drupal even if they don't really have the skills in coding.

Security?

Bèr Kessels's picture

I would like to see one more addition. Far too often I evaluate modules for clients and projects that have more or less critical security issues.
Quality of code, as a metric, is not as important; the ones interested in good-quality code are coders themselves.

But security should be a concern for each and everyone. Security is a concern for a lot of people already, luckily.

We should expose the information we could generate with http://drupal.org/project/coder and provide a useful security summary on project pages.

Yet the only real way to expose security metrics to users is to show how many issues were found vs. how many were fixed. This, however, brings us the problem that a module which is never evaluated may appear just as secure as one which was evaluated but had no issues. It also brings us the problem that unsolved security issues may be kept secret (this is a choice left to the maintainer), and as such are unfixed, but not exposed to potential downloaders either.

To solve both problems, we could build a small disclosure system into the Project module: a bug marked as "security" will be made public and posted on the project's front page if no activity is recorded on that issue, or if it is not set to solved.
