How Many Contributors Were There for Drupal6

greggles - Mon, 2008-01-28 16:50

I've tabulated some data based on the commit messages for Drupal6. Note that this is not a good measure of contributors. A better measure would be people who have "commented on an issue and changed the status", "uploaded an attachment", or even "posted a comment", but this is one way of gathering the data.

I wanted to capture my method and share the output data. The data is available here: http://spreadsheets.google.com/pub?key=pVuTBhrgLH93h7VQNyyynXw

cvs -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal co drupal
wget http://cvs.drupal.org/viewvc.py/drupal/contributions/tricks/cvs-release-...
mv cvs-release-notes.php\?revision\=1.6 drupal/cvs-release-notes.php
cd drupal
chmod +x cvs-release-notes.php

./cvs-release-notes.php DRUPAL-5-0 HEAD > CVS_CHANGELOG.TXT

Copy and paste the resulting data out of CVS_CHANGELOG.TXT into spreadsheet column H (column H so that this formula will work):
=MID(H2;FIND("by";H2)+3;FIND(":";H2)-FIND("by";H2)-3)

Copy/paste that all the way down the spreadsheet. This finds the 'by *** :' section and gets just the ***
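
For anyone who would rather script this step than use a spreadsheet, here is a minimal Python sketch of what the formula does. The function name extract_credit is made up for illustration, and the sample message is only a placeholder, not a real commit.

```python
def extract_credit(message):
    """Return the text between 'by ' and the first ':' in a commit message.

    Mirrors =MID(H2;FIND("by";H2)+3;FIND(":";H2)-FIND("by";H2)-3).
    Like FIND, str.find is case-sensitive and matches the first "by"
    anywhere in the text, even inside another word.
    """
    start = message.find("by")
    end = message.find(":")
    if start == -1 or end == -1 or end < start + 3:
        # The spreadsheet formula would show an error in these cases.
        return None
    return message[start + 3:end]


# Placeholder message in the usual "#issue by names: description" shape.
print(extract_credit("#123 by someone, another, and another: something"))
```

This only pulls out the raw 'by *** :' text; the names still need to be split and cleaned up afterwards.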

copy that column
paste into a text document
copy back into OpenOffice.org calc (spreadsheet program) - this fires the text import wizard
split on spaces and commas
find and replace all commas
find and replace junk - find "and" (match "Entire cells") and replace with blank space - ditto for "et" and "al" "by" etc.
manually fix certain names (Jose A Reyero, Gurpartap Singh, David Strauss, Moshe Weitzman, Rok Zlender)
manually fix lots more stuff
manually replace "myself" with Goba (at least for this release that's the case)
put everyone into one column, run a Pivot Table on it, Poof
publish - get more people to review, manually fix more, repeat
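
The split, clean-up and pivot table steps above could be roughly sketched in Python like this. It is an approximation under assumptions, not what I actually ran: the function names, the junk list and the sample credit strings (alice, bob, carol) are invented for illustration, and real data would still need the manual fixes.

```python
import re
from collections import Counter

# Filler words removed in the "find and replace junk" step.
JUNK = {"and", "et", "al", "by", ""}
# Names containing spaces, which a plain split on spaces would break apart.
MULTI_WORD = ["Jose A Reyero", "Gurpartap Singh", "David Strauss",
              "Moshe Weitzman", "Rok Zlender"]
# "myself" means Goba, at least for this release.
ALIASES = {"myself": "Goba"}


def names_in(credit):
    """Split one credit string into a list of cleaned-up names."""
    found = []
    for full in MULTI_WORD:
        if full in credit:
            found.append(full)
            credit = credit.replace(full, " ")
    for token in re.split(r"[ ,]+", credit):
        if token in JUNK:
            continue
        found.append(ALIASES.get(token, token))
    return found


def count_mentions(credits):
    """Count how often each name is mentioned (the pivot table step)."""
    counts = Counter()
    for credit in credits:
        counts.update(names_in(credit))
    return counts


# Hypothetical credit strings, only to show the shape of the data.
sample = ["alice, bob, and carol", "myself", "Moshe Weitzman and alice"]
for name, mentions in count_mentions(sample).most_common():
    print(name, mentions)
```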


thanks for documenting

Gábor Hojtsy - Mon, 2008-01-28 17:30

Thanks for documenting, so we can do this in later releases and compare. For the record, that spreadsheet lists 272 contributors, the top ten being:

Name         Mentions  Rank
chx          141       1
pwolanin     87        2
Goba         67        3
dvessel      60        4
keith.smith  55        5
webernet     48        6
webchick     47        7
JirkaRybka   41        8
catch        38        9
bjaspan      36        10

Some people expected a number beating the 492 in the Drupal 5.0 release announcement, though. Since the number of issues closed/fixed has grown to more than 1540, the contributor number is expected to grow as well, once we find out how the 492 was counted. Repeating the "number of people who commented on issues" count on the 1540 issues we had should lead us to a comparable number, I hope. Anyone interested in taking that up? (That definitely requires access to the drupal.org database.)


Here's Drumm's slides with

catch - Wed, 2008-01-30 15:56

Here's Drumm's slides with similar stats for Drupal 5 for reference [1]. I guess the queries run for those will be different with IFAC, but maybe some of them could be reused if they're still around?

  1. http://www.slideshare.net/drumm/maintaining-your-own-branch-of-drupal-co...

hm, no queries in the slides

Gábor Hojtsy - Wed, 2008-01-30 20:11

Hm, we need to gather these queries from him, hm.


FIND(":";H2) needs to

drumm@drupal.org - Wed, 2008-01-30 20:34

FIND(":";H2) needs to include comma and period, both are used.


handled later, right?

greggles - Thu, 2008-01-31 17:48

I think that was handled later:

> split on spaces and commas

That and the manual editing handles messages like "#123 by someone, another, and another: something".

If there was a commit message like "#1235: by someone" then that would break my parsing steps, but I didn't encounter any of those.
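
If someone wants to check a changelog for messages like that before trusting the numbers, a small Python sketch along these lines might do. The function name credit_is_parseable is hypothetical, and the check only looks at where the first "by" and the first ":" fall.

```python
def credit_is_parseable(message):
    """True when the first ':' comes after the first 'by ' in the message.

    That is the only shape the 'by *** :' extraction can handle.
    """
    by_pos = message.find("by")
    colon_pos = message.find(":")
    return by_pos != -1 and colon_pos > by_pos + 2


# The two message shapes discussed above.
for message in ["#123 by someone, another, and another: something",
                "#1235: by someone"]:
    print(credit_is_parseable(message), message)
```

Anything this flags as not parseable would need to be fixed by hand.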

This isn't meant to be perfect (GIGO and all that), just close enough to be useful without taking me 10 hours.

--
Open Prediction Markets | Drupal Dashboard


That's a neat bit o' data,

keith.smith - Thu, 2008-01-31 14:48

That's a neat bit o' data, speaking as someone who ranked #5, #74 and #180 (#180 is probably a duplicate of a commit already counted in #74).

Note though that your rank column is really a "row" column, as rank should be incremented only when the number of commits changes, to properly account for banding (i.e., everyone with 1 patch should have the same rank).


fixups

greggles - Thu, 2008-01-31 17:45

Yes, I was hoping that by publishing this I'd get feedback on names that were mixed up. JirkaRybka vs JirkaRbyka was another one.

I also did a slightly better little ranking formula that relies on a column for the row number (=IF(B1598=B1597;C1597;D1598)). This now counts ties in mentions as the same rank and jumps to the row number at the next level after the "tie". That seems to be how I've usually seen it done.

I updated the spreadsheet based on these - same location - http://spreadsheets.google.com/pub?key=pVuTBhrgLH93h7VQNyyynXw



Thanks for merging those

keith.smith - Thu, 2008-01-31 19:32

Thanks for merging those entries, and refining this further. This is getting close. I notice that the Rank jumps some numbers entirely, though, like "12", "14", "18", "22", "23", "70"-"78", etc.


yes, on purpose

greggles - Thu, 2008-01-31 22:59

The rank skips 12 on purpose. For example, Crell and merlinofchaos are tied at 30 mentions each, so they are both rank #11. The next person after is ranked not 12th, but based upon their row number since they aren't the 12th in line but in fact the 13th. At least this is the way I've always seen it done for rankings with ties.
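
As a rough Python sketch of that ranking rule (ties share a rank, and the next rank is the row number), assuming the mention counts are already sorted from highest to lowest. The function name and the sample counts are made up for illustration.

```python
def competition_ranks(counts):
    """Rank counts sorted in descending order; ties share the first row number."""
    ranks = []
    for row, count in enumerate(counts, start=1):
        if row > 1 and count == counts[row - 2]:
            # Same count as the row above: reuse that row's rank.
            ranks.append(ranks[-1])
        else:
            # New count: the rank is simply the row number.
            ranks.append(row)
    return ranks


# Hypothetical counts with one tie at 30 mentions.
print(competition_ranks([36, 30, 30, 29]))  # [1, 2, 2, 4]
```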



Oh. Ok, yes, I understand

keith.smith - Fri, 2008-02-01 02:34

Oh. Ok, yes, I understand what you're doing now. I've usually seen it simpler than that, without the skips, as rank does not necessarily imply a relationship to row order. But, now that I understand it, I see that what you have here is a perfectly reasonable way of doing it. Thanks for explaining (and thanks for the table!).
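
For comparison, the simpler variant without the skips could be sketched in Python like this, again assuming the counts are sorted from highest to lowest. The function name and sample counts are only illustrative.

```python
def dense_ranks(counts):
    """Rank counts sorted in descending order; rank goes up by 1 per distinct count."""
    ranks = []
    for i, count in enumerate(counts):
        if i == 0:
            ranks.append(1)
        elif count == counts[i - 1]:
            # Tie with the row above: same rank.
            ranks.append(ranks[-1])
        else:
            # New count: next rank, with no gap after a tie.
            ranks.append(ranks[-1] + 1)
    return ranks


# Hypothetical counts with one tie at 30 mentions.
print(dense_ranks([36, 30, 30, 29]))  # [1, 2, 2, 3]
```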