Data Mining
This group seeks to analyze Drupal-related data in order to better understand how Drupal evolves across different dimensions and over time. This includes both automated and manual analysis, chart and graphic generation, and may eventually result in various modules for drupal.org and its subdomains.
Recommender Bundles: Update
I've finished these things so far:
Announcing CiviCluster and CiviConference
I've created initial snapshots for CiviCluster and CiviConference for
users of Drupal 5. It may take up to 24 hours for them to appear in the
Drupal.org release system, but they are already in the DRUPAL-5 CVS branch.
CiviCluster will rapidly identify duplicate contacts and walk users
through merging them. CiviCluster supports CiviCRM 1.6 and 1.7 (except
for CiviEvent). CiviCluster will be updated shortly to support CiviEvent
schema changes.
http://drupal.org/project/civicluster
CiviConference allows online conference management and ticket sales by
Convenient SQL transactions with PressFlow Transaction
I've released an in-development version of PressFlow Transaction for developers interested in convenient encapsulation of SQL transactions. The key features are intelligent use of scope for COMMITs and ROLLBACKs as well as safe, intelligent nesting of transactions to get exception-like semantics.
Usage details are on the project page. Requires PHP 5.
(I posted this to the High-Performance group because encapsulating updates in transactions can dramatically improve performance.)
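For readers who want to see the general idea of scope-driven COMMITs and ROLLBACKs with safe nesting, here is a minimal sketch in Python with SQLite. It is not PressFlow Transaction's API (that module is PHP 5, and its usage is documented on the project page). The `Transactions` class, the `scope()` method, and the `node` table are hypothetical names made up for illustration, and the use of SAVEPOINTs for nested scopes is an assumption about one reasonable way to get exception-like semantics.

```python
import sqlite3
from contextlib import contextmanager


class Transactions:
    """Hypothetical helper: the outermost scope owns BEGIN/COMMIT/ROLLBACK,
    nested scopes are mapped onto SAVEPOINTs (an assumption, see lead-in)."""

    def __init__(self, conn):
        self.conn = conn
        self.depth = 0  # number of scopes currently open

    @contextmanager
    def scope(self):
        outermost = self.depth == 0
        name = f"sp_{self.depth}"
        self.conn.execute("BEGIN" if outermost else f"SAVEPOINT {name}")
        self.depth += 1
        try:
            yield
        except Exception:
            # Leaving the scope through an exception undoes only this scope's work.
            if outermost:
                self.conn.execute("ROLLBACK")
            else:
                self.conn.execute(f"ROLLBACK TO SAVEPOINT {name}")
                self.conn.execute(f"RELEASE SAVEPOINT {name}")
            raise
        else:
            # Leaving the scope normally commits (or releases the savepoint).
            self.conn.execute("COMMIT" if outermost else f"RELEASE SAVEPOINT {name}")
        finally:
            self.depth -= 1


# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE node (title TEXT)")
tx = Transactions(conn)

with tx.scope():
    conn.execute("INSERT INTO node VALUES ('kept')")
    try:
        with tx.scope():
            conn.execute("INSERT INTO node VALUES ('discarded')")
            raise ValueError("inner failure")
    except ValueError:
        pass  # the inner scope was rolled back; the outer scope continues

titles = [row[0] for row in conn.execute("SELECT title FROM node")]
print(titles)
```

Wrapping a batch of updates in one outer scope also means the database commits once rather than once per statement, which is where the performance benefit mentioned above tends to come from.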
October Download Statistics
On November 15th Gerhard released the download statistics for all packages on Drupal.org (with formatting by Earl). Here are two charts that summarize the data, along with the accompanying Excel file. Suggestions are welcome on how to improve them or on other ways to analyze and display the data.
1. Top 30 Packages (click thumbnail to enlarge)

These top packages comprise 3 versions of Drupal, 20 modules, 5 themes, and 2 videos.
2. Overall Distribution (click thumbnail to enlarge)

When looking at the distribution of downloads we see noticeable breaking points at 16, 36, and about 590, which segment packages into four classes: Tier 1 (critical), Tier 2 (very popular), Tier 3 (moderately popular), and Tier 4 (unpopular).
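The segmentation above can be sketched as a small lookup. This assumes the breaking points 16, 36, and 590 are download ranks (rank 1 being the most downloaded package) and that a package sitting exactly on a breaking point belongs to the higher tier; the post does not spell out either detail, and 590 is only approximate. The function name `tier_for_rank` is made up for this example.

```python
from bisect import bisect_left

# Breaking points from the post, interpreted as ranks (an assumption).
BREAKPOINTS = [16, 36, 590]
TIERS = [
    "Tier 1 (critical)",
    "Tier 2 (very popular)",
    "Tier 3 (moderately popular)",
    "Tier 4 (unpopular)",
]


def tier_for_rank(rank):
    """Map a package's download rank (1 = most downloaded) to its tier."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    # bisect_left counts how many breaking points lie strictly below the rank.
    return TIERS[bisect_left(BREAKPOINTS, rank)]


for rank in (1, 16, 17, 36, 37, 590, 591):
    print(rank, tier_for_rank(rank))
```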
Group activity data analysis
There has been some recent work analyzing the growth on drupal.org, and I think we should do something similar for groups.drupal.org.
I would be interested in charts/histograms showing the distribution of:
- Groups by number of subscribers
- Groups by posts/week in the past three months
- Users by number of subscriptions (without identifying information)
- Users by number of posts (without identifying information)
- Total posts over time
- Total posts per week over time
- Total groups over time
- New groups per week over time
- Median subscriptions per user over time
Did I forget anything, or should some of these be removed or tweaked? I am willing to generate the charts if someone can run the queries, and I can work out the exact SQL queries if needed.
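As a starting point for the first and third items, here is a rough sketch of the kind of query I have in mind, run against an in-memory SQLite database. The `subscriptions(group_id, user_id)` table is a hypothetical stand-in, since I have not looked at the real groups.drupal.org schema, and the sample rows are invented purely to exercise the query. Only counts come out, so no identifying information is exposed.

```python
import sqlite3

# Hypothetical schema: one row per (group, user) subscription.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (group_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (3, 2)],  # invented sample data
)


def distribution(conn, key_column):
    """Return (size, count) pairs: how many keys have each number of rows.

    key_column='group_id' gives groups by number of subscribers;
    key_column='user_id' gives users by number of subscriptions.
    """
    if key_column not in ("group_id", "user_id"):
        raise ValueError("unexpected column name")
    query = f"""
        SELECT size, COUNT(*) AS how_many
        FROM (
            SELECT {key_column}, COUNT(*) AS size
            FROM subscriptions
            GROUP BY {key_column}
        ) AS per_key
        GROUP BY size
        ORDER BY size
    """
    return conn.execute(query).fetchall()


groups_by_subscribers = distribution(conn, "group_id")
users_by_subscriptions = distribution(conn, "user_id")
print(groups_by_subscribers)
print(users_by_subscriptions)
```

Note that groups with no subscribers at all never appear in such a table, so they would have to be counted separately if we want a zero bucket in the histogram.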
Growth Graphs
In preparation for starting the 5.0 drumbeat, I was able to get killes (thx Gerhard!) to run some analysis on drupal.org. I think if we can keep up this kind of growth (and with the spiking numbers of developers, projects, and activity on the site, I think we can), 2007 could be a sort of tipping point for Drupal!
UPDATE: here's my blog post on the subject.


The source XLS file is also attached for your own viewing pleasure.




