Project Quality Metrics on Drupal.org (meta document)

calebgilbert's picture

According to a survey of over 1,000 Drupal users, the most requested feature for Drupal.org is a recommendation system for the modules section. With the exponential proliferation of Drupal modules it's certainly easy to imagine why.

Because the discussion about this subject has been splintered into so many places, it was suggested (see here and here) to create a thread within the D.O. Redesign group in order to centralize the brainstorming, mockups, proposals, etc for this initiative.

What this thread is intended for (comments from dww):

  • Meta-discussion on the proposals: which ones are clearly a good idea now, which ones need more thought, which ones aren't going to work at all, etc.
  • Volunteers to work on the "low hanging fruit" -- the proposals that we can right now agree would be worth doing, but which won't take a huge amount of code to accomplish.
  • Further discussion on the specific proposals that need to be fine-tuned or otherwise better thought-out. I consider this particular proposal for end-user ratings and reviews (issue #50605) in this category. There are a lot of potential problems this will create, so we need to have good answers for those if we want to continue exploring this.

Here's an initial list of all known efforts in this space:

There are probably others scattered around, but this is at least a start... ;)

Before posting, please consider reviewing some of the links included above, if you haven't already. There has already been lots of discussion and some activity on this topic; the goal now is to reach a point of broader consensus and action.

Comments

Something to keep in mind...

webchick's picture

Joomla!'s experience with allowing user-generated ratings/reviews of their modules:
http://www.joomla.org/component/option,com_jd-wp/Itemid,105/p,344/

Any efforts I spend on this issue will be spent on #79550. I trust objective measurements a lot more than humans. :)

I must say

calebgilbert's picture

...that joomla link is quite educational. Honestly, any proposal should have to address most/all of the concerns raised there (namely the clone accounts) before being accepted.

Along those lines, one method I have seen implemented is a "trusted user status", which is required to be able to vote on certain things. Presumably, this is a role/permission that gets activated or deactivated at a given threshold, based on an algorithm 'deemed to be meaningful'.

My experience...

eaton's picture

...Is that any web site that includes any kind of voting mechanism will be gamed. If the voting mechanism is used to rank anything, in any way, it'll be gamed hard. It's tough.

True, that.

dww's picture

Take it from Mr. VotingAPI... he speaks the truth. I'm with webchick: I'd rather see effort spent on the automated metrics that are harder (that's right, I said "harder", not "impossible") to game and will very frequently be more accurate and useful than user votes or reviews.

Ratings vs Reviews

alpritt's picture

The only way I have found star ratings to be useful is to find a mixture of positive and negative reviews so that I can read a cross section of opinions. Other than that I find them pretty meaningless. I think this is particularly true of Drupal modules, because modules are useful or not depending more on the project you are working on than some overall quality. It's not like we are voting on movies. So I completely agree that objective measurements are the way to go.

Reviews are a little different, I think. If I read a review I can pretty easily tell whether it is well thought out, honest, unbiased and so forth. They are useful in ways that simple ratings are not. However, they can also easily lose out on their signal-to-noise ratio, so I think if you are going to implement reviews you also need a system like on Amazon where the best reviews get more prominence -- based on a 'was this review useful' vote. Not sure if that would cause gaming problems as well though.

But either way, user ratings and reviews should be considered as separate entities with different merits and issues.

Adding

Amazon's picture

Adding http://drupal.org/node/188993 which recommends we have a custom module to parse the AWSTATS data files to get download statistics.
I am also adding this mostly off-topic thread about measuring CVS statistics: http://drupal.org/node/187019

To seek, to strive, to find, and not to yield

New Drupal career! Drupal profile builders.
Try pre-configured and updatable profiles on CivicSpaceOnDemand

Amazon reviews

Amazon's picture

I think there is a more authoritative source on Amazon review cheating.

http://weblogs.java.net/blog/monsonhaefel/archive/2003/11/amazoncom_revi...

Gaming always can happen, but smartmobs persist

laura s's picture

Votes alone won't win the day, but votes + comments can make a big difference. The smart/savvy admin/webmistress will look at the votes, yes, but also look at the text behind the votes. When I look at CNet ratings, for example, I look at the reviews -- the discussions -- to see what's behind the numbers. This isn't new. You never can trust just numbers in an opt-in "poll" of any sort.

Is this a community that is so likely to game the system? You never can tell, when the community is truly open, such as this one. But I have to ask: When it comes to quality ratings of modules, aren't the ad hoc case studies -- the anecdotes -- going to balance out the voting systems in a way that can yield informative data for the attentive prospective downloader?


Laura
pingVision, LLC

Laura Scott
PINGV | Strategy • Design • Drupal Development

don't be paralyzed by imperfection

moshe weitzman's picture

sure, gaming happens in most of these systems. so what? does amazon shy away from reviews because of obvious gaming? does google quit working on search because of SEO sneakiness? My point is that user reviews are hugely valuable despite known imperfections. I think that engineers tend to discount imperfect solutions. To do so in this case is to deny the community a very valuable resource.

Paralysis vs. caution

dww's picture

If I was being "paralyzed by imperfection" this thread wouldn't exist. I'm just SO BURNED OUT on doing d.o support for important things with incredibly minor problems that people get all bent out of shape over. This topic is a huge can of worms, with lots of potential pitfalls, so I'm moving forward carefully, not rushing into anything half-baked.

buddylist as a trust web

moshe weitzman's picture

On the "gaming" problem, I wrote some code that I need to contribute for votingapi. It is an integration between voting and buddylist. So, in the extreme only the votes of your buddies count toward the scores that you see. The formula can be tweaked so that buddy votes are x% of the score and overall is the other 100-x%. This basically gets rid of the gaming problem, since unknown voters won't affect the scores. Further, it answers the "what module should i use" question in precisely the way i want it answered. Namely, I want to use the modules that Earl, Eaton, Dries, Angie, dww, etc. are using. Those are my trusted sources. I'm sure others have theirs.
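A minimal sketch of the blend described above, not the actual votingapi/buddylist integration code (which is not shown in this thread). The function name `blended_score`, the 0-100 rating scale, and the fallback to the overall average when no buddy has voted are all assumptions made for illustration.

```python
def blended_score(votes, buddies, buddy_weight=0.7):
    """Blend the average rating from a user's buddies with the overall average.

    votes        -- dict mapping voter name to rating (assumed 0-100 scale)
    buddies      -- set of voter names the viewing user trusts
    buddy_weight -- the "x%" from the comment, expressed as a fraction 0.0-1.0
    """
    if not votes:
        return None  # nothing to score yet

    overall = sum(votes.values()) / len(votes)

    buddy_ratings = [rating for voter, rating in votes.items() if voter in buddies]
    if not buddy_ratings:
        # Assumption: with no buddy votes, fall back to the overall average.
        return overall

    buddy_avg = sum(buddy_ratings) / len(buddy_ratings)
    return buddy_weight * buddy_avg + (1 - buddy_weight) * overall
```

With `buddy_weight=1.0` this is the "extreme" case where only buddy votes count, so votes from unknown accounts have no effect on the score that user sees.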

One small issue with this is that new drupal people don't have a trusted network. I think that's easily remedied by showing which users have been befriended often. Or just list the members of the security team, the committers, and the Board.

I was thinking of running this ratings system on my own domain for a while until drupal.org gets its act together. But I haven't dedicated enough time to the site.

Anyway, thoughts on this approach are welcome.

Agree it's useful enough to try

Amazon's picture

I think user reviews are good, here are some suggestions, pick and choose.
1) They should link to the site that is using the module. This way end users can evaluate the recommendation as part of an evaluation of the site. If they chose not to use the module because it didn't meet their criteria then I am not sure a review is appropriate.
2) They should indicate what version they are using.
3) The criticism should be constructive, acknowledge that the reviewer is using a volunteer's work, and avoid personal criticism.
4) I think we might want to consider whether project maintainers can choose to have ratings. There are projects you want to share with other developers in case they find them useful, or because you want bug reports or welcome patches. However, you didn't necessarily sign up to have public reviews of your work, and you don't want putting your reputation on the line to be a prerequisite to sharing code. It's one thing to have criticism from your peers; it's another to have potential clients read a scathing review of a module you have agreed to maintain but that isn't a showcase of your work.
5) Apple has something called staff picks. It's what I personally use, and I'd be interested in seeing a rollout of project reviews start off with a closed, trusted set of reviewers. It could allow for a certain tone to be set, and I think having reviews from trusted sources is better to start with. We can then slowly grant reviewer rights to more people and ultimately consider opening it up.
6) I used to use Yelp, a lot. Initially having a few reviews with a rating was good and I found it quite useful. However, now when there are over a hundred reviews I find the ratings are basically worthless. The reviews are bland and generic and it's almost easier to just go to the restaurant than make sense of which review is reflective of the experience I am likely to have. In general, there's a lot of value being created by rating and doing reviews, but it seems like a lot of work is being created to manage that value creation.

We need some volunteers to manage reviews if they go forward. I think it's going to be a focused job and having some new volunteers babysit reviews is a pre-requisite to deployment. Analytical assessments might not be better than human reviews, but once implemented they are probably a lot more cost effective.

Cool idea, but I fear the performance implications

dww's picture

This is slick, and would be a great enhancement to ratings in general, and d.o project ratings in particular.

However, a few concerns:

A) I'm wondering how impractical a d.o buddylist implementation would become, and what kind of resources (both hardware and admin support) it would require. I've never used or looked at buddylist, so I have no direct experience.

B) I just thought of a very scary problem. :( One of the main features of a rating system would be the ability to sort modules based on ratings. The project browsing pages are already some of the most expensive queries on d.o, and some of the most complicated code. I shudder to think of how much more complicated the code and queries would have to get, and how many additional JOINs we'd be doing to say:

Show me all of the modules compatible with 5.x, from the "Images" category, that have official releases, ordered by the total rating of all votes from all the people I think are a buddy.

Holy cow is that going to kill d.o's DB, especially since you can't cache any of this because it's different for every single user. :( This would be multiple full table scans since the ORDER BY clause would be a computed value and there'd be no indexes to help us.

If those problems can be solved, this would be a great solution to the gaming problem.

Partial solution for performance issue

calebgilbert's picture

In addition to whatever greater-minds-than-I come up with, asynchronous loading for these kinds of searches/sorts would lighten the load a good bit, wouldn't it? (e.g., at least we wouldn't have to load a full page request every time)

"asynchronous loading" == caching == not possible

dww's picture

The only way to minimize the heavy lifting by doing the big terrible queries at some other time is to be able to cache the results of the queries and save them for later. However, as I pointed out above, this is basically impossible since all of the queries are specific for each user. So, we could cache the results for each user, for each project category, for each page in the pager. :( I suppose that'd have some minor benefit for users that frequently browse and go back and forth between different pages a lot in a short period of time. But, basically, it'd be uncached, since the cache results would only be valid for an incredibly small % of the requests we'd have to serve.

In the interest of not letting the thread die...

calebgilbert's picture

I guess caching would definitely be out of the question for such queries, for authenticated users at least. I'm not sure what percentage of D.O.'s browsing population is authenticated, but I'm wondering if asynchronous loading (so that full page loads could be skipped) would lighten the load enough that the infrastructure could weather it. I'm thinking it could, based on my experience with playing around with such nuances.

Webchick, Amazon, et al -- is there a budget for helping fund this kind of work? Undoubtedly, dww and hunmonk and possibly others would be asked to devote many hours to helping implement this. Can the D.A. help facilitate something that the community-at-large is crying out for?

Possibly the right way to do

Chris Johnson@drupal.org's picture

Possibly the right way to do this is to extract the objective information from the d.o database once a day and put it into another database. Then users interested in filtering and sorting modules by various means would hit that second database, instead of d.o's.

This very much parallels classic business duplication of data from their live, transactional databases (OLTP) to decision-support databases (DS) for precisely this same reason. Transactions have to remain fast; reports and metrics can take a while. OLTP is usually a mix of read/write while DS is usually mostly read. The DS database can also be optimized for the kind of queries it will see.
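A rough sketch of that daily extract, using SQLite purely for illustration. The `issues` and `project_metrics` tables, their columns, and the function name `refresh_reporting_db` are hypothetical; d.o's real schema is not described in this thread.

```python
import sqlite3


def refresh_reporting_db(live, reporting):
    """Rebuild per-project aggregates in the reporting DB from the live DB.

    Intended to run once a day (for example from cron), so that sorting and
    filtering queries hit the reporting database instead of the live one.
    """
    # One aggregate pass over the live (transactional) database.
    rows = live.execute(
        "SELECT project, COUNT(*), SUM(status = 'fixed') "
        "FROM issues GROUP BY project"
    ).fetchall()

    reporting.execute(
        "CREATE TABLE IF NOT EXISTS project_metrics ("
        "project TEXT PRIMARY KEY, total_issues INTEGER, fixed_issues INTEGER)"
    )
    # Index chosen for the sort the reporting side is expected to serve.
    reporting.execute(
        "CREATE INDEX IF NOT EXISTS idx_metrics_fixed "
        "ON project_metrics (fixed_issues)"
    )
    with reporting:  # replace the old snapshot in a single transaction
        reporting.execute("DELETE FROM project_metrics")
        reporting.executemany(
            "INSERT INTO project_metrics VALUES (?, ?, ?)", rows
        )
```

Because the values are precomputed and identical for every visitor, the reporting table can be indexed on the sort columns, which is the optimization the comment above alludes to.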

Ratings vs. reviews vs. automated metrics

dww's picture

[Ratings]: It's not clear what you're rating. As many have pointed out, modules are specific to the problem you're trying to solve or the feature you're trying to add. What works great for one site/scenario might be totally wrong for another. So, what exactly are you voting on? Would the people who believe strongly in ratings (votes) please start posting a list of example questions? Ratings are basically survey research, and a huge part of that is asking the right question(s).

[Reviews]: IMHO, reviews could be more useful than ratings. However, in addition to the potential pitfalls that Amazon pointed out, I'd like to add this one: people will embed support requests, bug reports, and feature requests in their reviews, making more work for the maintainers. I believe a restricted set of trusted "staff picks" is a great solution to most of these problems, but that sounds like the dreaded "golden contrib" debate again. ;) Who maintains the "project reviewers" role on d.o? What are the criteria for membership? Who reviews the reviews? It's a lot of (valuable) work, but it's work nonetheless, and I doubt the association is planning to hire anyone to coordinate this as a part-time job.

[Automated metrics]: I never said that I'm totally opposed to human opinions regarding quality factoring into this. I just said that I'd rather see the initial effort going into the automated metrics, since I believe that in many cases, they'll tell you more about a project than a human's vote on a set of (at this point, undefined) questions, or even reviews. They're harder to game. They'll require much less on-going support/babysitting. One of the downsides is that you have to know how to interpret the numbers. So it's not going to be as immediately helpful for brand new Drupal users as a simple "5 golden stars" next to the project name whenever it's listed on the site. But, at some level, anything is better than what we have now, and I'd be more interested in starting with the automated metrics and moving out to the human-driven metrics once we have a) made some progress and b) gained some experience. Other than a minor question on exactly where/how to display the metrics (see http://groups.drupal.org/node/6186) in the d.o UI, none of the automated metrics are really blocked on anything beyond someone standing up and producing the code. The human metrics are blocked on many social and technical problems. That doesn't mean they can't be solved, but it'll be more work to get any human metrics in place.

Multiple rating axes?

dman's picture

RE point one, I think that anyone who's done a rating system (or planned one as I have) should be aware that there's a difference between:
"It's no good"
"I don't like it"
"It doesn't solve my particular problem today"
"It's written badly or is hard to use"
"it's badly documented/supported"
"It's not very popular or widely used"

So, like decent software reviews do, a fuller picture would be derived from voting on (possibly) several criteria.
This of course is exponentially more difficult to code, and somewhat trickier to browse and weight, so I don't seriously suggest this be attempted in the first round, but it's an aspect to consider...

Added another link

dww's picture

FYI: In case you're not looking at the revisions tab closely, I just added another link to the big list: http://drupal.org/node/63491

Votes + Comments + Editors Pick

theborg's picture

Agree with Laura (comment-21052). I always look at the comments after the rating. CNet also has the "editors rating", but maybe this means more work for the group of savvy people Moshe mentioned. They also have "the good" and "the bad", which are difficult to evaluate without being influenced.

Software sites like snapfiles do something like that with:

  • Our ratings
  • Popularity
  • User opinions

An issue I see with the whole ratings thing is that the popular modules will continue being popular and the new ones will slowly die, crowded out by the higher-rated ones. I've tried nearly every taxo module and found the one I was looking for only after some time.

Ratings/reviews/metrics are important, but an accurate explanation of what the module does, a demo site, and a description of how it interacts with the whole system and other modules are needed as well.

Rating with review

jayjenxi's picture

I suggest a system where the user gets to vote but would have to give a compulsory review along with the rating. I believe this would give the readers a better idea of the rationale behind such votes. I've seen such a system being used on the Firefox Add-ons site. I think the system is feasible for Drupal.org.

https://addons.mozilla.org/en-US/firefox/

It also has a recommendation section, where they have a list of add-ons that are popular and recommended. I'm not sure how exactly they come up with the recommendations. However, this can be implemented on Drupal.org by having the staff list down certain modules that users should try out to improve on the site functionality. This means that, unlike the review system where the staff would have to go through each and every module, they would only need to review modules that are popular and highly-rated by users.

In this system that I suggest, there should be a field for the reviewer to post link(s) to sites where they implemented the modules. This would allow readers to have a better idea of how the module helped the particular reviewer. This field can also be used to count the number of sites using the module. Although it might not be a complete picture of the usage of the module, readers can get an idea of the popularity of the module. I got this idea from vbulletin.org, where they have a "Mark as installed" option for users of the mods to indicate that they are using the mod.

I believe that readers who see a high rating accompanied with unconvincing review would not be swayed by the vote.

Modules would not be ranked according to rating. This rating system would not have an average rating. The users would have to judge the rating from the individual ratings given by the reviewers.

I hope I managed to put my idea across well enough. Please do give feedback with your thoughts and concerns.

Usage statistic created by module update_status?

Thomas_Zahreddin's picture

I think the update_status module sends requests to check for new versions, so there should be a statistic about the usage of a certain module across Drupal sites - am I right?

I can't find this statistic now, though I think I have seen the page.

Can someone post the link?

I almost went off with

Chris Johnson@drupal.org's picture

I almost went off with greggles, Michelle and MattKelly and implemented a module review system. In fact, I think Matt has already done some basics on his website.

I think we need to rope everyone who had such thoughts together here and make some progress.

I'm off to see if all the ideas the 4 of us put together in a Google doc have been expressed in this group. Matt is already active here, I think.

See also

Chris Johnson's picture

See also http://groups.drupal.org/module-metrics-and-ranking, in particular http://groups.drupal.org/node/7462.

Some key objective rating points from that document include:

  • install file exists?
  • official release exists?
  • documentation exists?
  • has recent commits?
  • documentation page exists?
  • hook_install exists?
  • hook_uninstall exists?
  • passes coder.module standard checks?
  • passes coder.module security checks?
  • has simpletests?
  • has complete simpletests?
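A small sketch of how a few of the file-based checks in the list above could be automated against a checked-out module directory. It relies on the Drupal conventions that install hooks live in `<module>.install` as `<module>_install()` and `<module>_uninstall()`; treating any `*.test` file as evidence of simpletests is an assumption, and the function name `module_checklist` is made up for this example. Checks such as coder.module results or recent commits would need other data sources and are left out.

```python
import re
from pathlib import Path


def module_checklist(module_dir, name):
    """Run a few yes/no checks against a module's files on disk."""
    root = Path(module_dir)
    install_file = root / f"{name}.install"
    source = install_file.read_text() if install_file.is_file() else ""

    def defines(function_name):
        # Look for "function <name>(" in the .install file.
        pattern = rf"\bfunction\s+{re.escape(function_name)}\s*\("
        return re.search(pattern, source) is not None

    return {
        "install file exists": install_file.is_file(),
        "hook_install exists": defines(f"{name}_install"),
        "hook_uninstall exists": defines(f"{name}_uninstall"),
        # Assumption: test cases are shipped in files ending in .test
        "has simpletests": any(root.rglob("*.test")),
    }
```

Returning a dict of named booleans rather than a single score keeps the interpretation (weighting, display) separate from the data collection.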

Pivots getting ready for deployment

Amazon's picture

This is a quick reminder that Pivots and Double pivots are being deployed on http://scratch.drupal.org, on the way to being deployed on D.O. Modules are being recommended based on content and a series of algorithms that are tuned to make those recommendations. Double pivots will recommend modules that are known to be used together based on context, and also the update status information.

Chris, looks like a good objective set of criteria for evaluating. You might want to also consider issue queue and commit activity.

Cheers,
Kieran

Dependencies

KarenS's picture

Another metric to use to assess the quality/importance of a project is the number of other modules that depend on it. Is there any way to get metrics on that using the dependencies data in the .info files?

That'd require d.o to parse all the .info files

dww's picture

We could in theory have a cron job that runs on d.o to parse all the .info files and keep a DB table populated with dependency info. Actually, we could put that in the packaging script when it's considering all the dev snapshots, since it already has to check out everything from CVS, anyway (to compare timestamps and see if anything changed and needs to be repackaged). It even already finds all the .info files, so it can add its extra attributes automatically (datestamp, project, version, etc). So, it wouldn't be that hard to have it always parse all the .info files it finds for each project, record the dependencies and update the DB. Creating an issue about this, and starting a patch, is left as an exercise for the interested reader. ;)
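A sketch of the parsing step only, not a patch for the packaging script. It assumes the two `.info` styles I'm aware of: `dependencies[] = views` (one per line) and the older space-separated `dependencies = views cck`. Version constraints and anything else in the file are ignored, and the function names and the in-memory `info_files` dict are hypothetical stand-ins for the files the script already finds.

```python
import re
from collections import Counter

# Matches "dependencies = a b" and "dependencies[] = a".
DEP_LINE = re.compile(r"^\s*dependencies(\[\])?\s*=\s*(.+?)\s*$")


def parse_dependencies(info_text):
    """Return the module names listed as dependencies in one .info file."""
    deps = []
    for line in info_text.splitlines():
        if line.lstrip().startswith(";"):
            continue  # .info comment line
        match = DEP_LINE.match(line)
        if not match:
            continue
        value = match.group(2).strip("\"'")
        if match.group(1):
            deps.append(value)          # array style: one module per line
        else:
            deps.extend(value.split())  # older style: space-separated list
    return deps


def count_dependents(info_files):
    """Count, for each module, how many projects declare a dependency on it.

    info_files -- dict mapping project name to the text of its .info file
    """
    counts = Counter()
    for project, text in info_files.items():
        for dep in set(parse_dependencies(text)):
            counts[dep] += 1
    return counts
```

The resulting counts are what would be written to the dependency table, giving the "how many other modules depend on this" metric KarenS asked about.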

I really want to help out,

Steven Jones's picture

I really want to help out, but I'm not sure what to do. I've explored a few of the issues listed but they either seem very, very old or people just aren't too interested. I'm going to have a lot of free time in the next week or two and would really like to get stuck into something, just not sure what!

Guidance please!

old != bad

dww's picture

just because an issue is old doesn't make it a bad idea. it just means i don't personally have time to work on it, and no one else did, yet, either. my advice would be to find something you're personally interested in and get it closer to done.

Project metrics and comparison

sun's picture

I've just posted my findings about Project module's hidden project metrics in a new article in this group. Too much to summarize in a comment. I hope I didn't duplicate existing efforts.

Daniel F. Kudwien
unleashed mind

quality metrics

earnie's picture

For me, quality metrics would be the considerations I would use to decide a module's worth. I do not give too much credit to user-supplied rankings on any piece of software. Metrics such as the number of downloads, the date of the last release, the number of issues resolved, the date of the oldest issue, the date of the youngest issue and the date of the last commit are all good factors for considering a module's value. I would also include the number of project page views to gather a sense of interest in a module. Modules with good interest but poor maintainership are good candidates for improvement or takeover, and this information can benefit the community. IMO modules with poor interest and poor maintainership should be removed from the list of modules since they get in the way of the ones that do have good maintainership or good interest.
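To make the interest/maintainership split above concrete, here is a toy triage sketch. The `ProjectStats` fields, the thresholds (1000 combined downloads and page views, 365 days since the last commit) and the labels are all invented for illustration; real cut-offs would have to be agreed on by the community.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ProjectStats:
    name: str
    downloads: int
    page_views: int
    last_commit: date


def triage(project, today, min_interest=1000, max_idle_days=365):
    """Classify a project by interest and maintainership (hypothetical rules)."""
    interested = project.downloads + project.page_views >= min_interest
    maintained = (today - project.last_commit).days <= max_idle_days

    if maintained:
        return "keep"
    # Unmaintained: good interest suggests takeover, poor interest suggests removal.
    return "takeover candidate" if interested else "removal candidate"
```

For example, a project with thousands of downloads whose last commit is several years old would come back as a "takeover candidate", while an equally stale project nobody views would be a "removal candidate".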