data mining
Keeping content fresh: what's your strategy?
Hi, I'm currently in the planning stage of putting a student guide to the university and the city of Ghent online. There's one question that keeps popping up: how can you create a guide that is as accurate as possible, and how can you most easily spot what needs an update, a fact check or a rewrite?
I'm exploring a few routes, and I'd like some input and/or hear about your experiences. (This isn't strictly newspaper-related content, but since local newspapers often also try their best to be a guide to the city they cover, I thought I'd post it over here anyway.) Here goes:
Political Website Developer | NeonGecko.com Inc. - VotersLeagueAlliance.org
NeonGecko is looking for an experienced Drupal developer to enhance, extend, and support custom modules for the aggregation, management and presentation of political data. We are close to launch on a new network of political websites with the goal of informing and enabling voters as well as building communities around political issues.
October Download Statistics
On November 15th Gerhard released the download statistics for all packages on Drupal.org (with formatting by Earl). Here are two charts that summarize the data, along with the accompanying Excel file. Suggestions are welcome on how to improve them or on other ways to analyze and display the data.
1. Top 30 Packages

These top packages comprise 3 versions of Drupal, 20 modules, 5 themes, and 2 videos.
2. Overall Distribution

When looking at the distribution of downloads we see noticeable breaking points at 16, 36, and about 590, which segment packages into four classes: Tier 1 (critical), Tier 2 (very popular), Tier 3 (moderately popular), and Tier 4 (unpopular).
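The tiering described above can be sketched in a few lines. This is only an illustration under two assumptions that the post does not confirm: that the breaking points (16, 36, 590) are rank positions with 1 being the most downloaded package, and that each breaking point is the last rank of its tier. The function name `tier_for_rank` is made up for this example.

```python
from bisect import bisect_left

# Assumption: breaking points are ranks (1 = most downloaded) and each one
# is the last rank that still belongs to its tier. "About 590" is taken as 590.
TIER_BOUNDARIES = [16, 36, 590]
TIER_NAMES = [
    "Tier 1 (critical)",
    "Tier 2 (very popular)",
    "Tier 3 (moderately popular)",
    "Tier 4 (unpopular)",
]


def tier_for_rank(rank: int) -> str:
    """Return the tier name for a package at the given download rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    # bisect_left counts how many boundaries are strictly below the rank,
    # which is exactly the index of the tier the rank falls into.
    return TIER_NAMES[bisect_left(TIER_BOUNDARIES, rank)]


if __name__ == "__main__":
    for rank in (1, 16, 17, 36, 37, 590, 591):
        print(rank, tier_for_rank(rank))
```

If the breaking points turn out to be download counts rather than ranks, the same lookup works with the boundaries and comparison direction adjusted.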
Group activity data analysis
There has been some recent work analyzing the growth on drupal.org, and I think we should do something similar for groups.drupal.org.
I would be interested in charts/histograms showing the distribution of:
- Groups by number of subscribers
- Groups by posts/week in the past three months
- Users by number of subscriptions (without identifying information)
- Users by number of posts (without identifying information)
- Total posts over time
- Total posts per week over time
- Total groups over time
- New groups per week over time
- Median subscriptions per user over time
Did I forget anything, or should some of these be removed or tweaked? I am willing to generate the charts if someone can run the queries, and I can figure out the exact SQL queries if needed.
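To make the kind of distribution listed above concrete, here is a rough sketch of the "groups by number of subscribers" case. The sample data and the function names `distribution` and `text_histogram` are invented for illustration; real numbers would have to come from queries against groups.drupal.org. Only the counts are used, so the same function works for the per-user distributions without exposing identifying information.

```python
from collections import Counter
from typing import Dict, List


def distribution(counts_by_id: Dict[str, int]) -> Dict[int, int]:
    """Map each count value to the number of ids that have that value.

    The ids themselves are discarded, so no identifying information
    ends up in the result.
    """
    tally = Counter(counts_by_id.values())
    return dict(sorted(tally.items()))


def text_histogram(dist: Dict[int, int]) -> List[str]:
    """Render a distribution as simple text bars, one line per value."""
    return [f"{value:>4} | {'#' * n}" for value, n in dist.items()]


if __name__ == "__main__":
    # Made-up sample: subscribers per group.
    subscribers_per_group = {"group-a": 3, "group-b": 1, "group-c": 3, "group-d": 12}
    for line in text_histogram(distribution(subscribers_per_group)):
        print(line)
```

For real data the values would probably need to be binned into ranges first, since subscriber counts are likely to be spread thinly over many distinct values.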
Next generation of Drupal data collection and analysis tools, featuring extensibility
I recently began sketching out a module that will ultimately allow quick generation of custom reports and graphs based on arbitrary tables (Drupal or other). It's called Reports.
After beginning work on this module, I was contacted by several others who are working on related modules.
There are currently several modules available that augment Drupal's internal reports/statistics. However, none of them, AFAIK, were designed to be extensible.
I believe Drupal would benefit greatly from an extensible set of tools for collecting data and creating reports and graphs related to a specific Drupal website. These tools would be flexible and should be able to use arbitrary tables, and perhaps new tables that exist solely for reporting (i.e. "cooked" data such as user sessions).
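As a rough illustration of what "extensible" could mean here, the sketch below lets other code register named reports that run against arbitrary tables. It is not the design of the Reports module, which the post does not describe in detail. The registry, the `posts` table, its `uid` column, and all function names are hypothetical, and Python with SQLite stands in for Drupal's PHP and database layer.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

# A report is any function that takes a connection and returns rows.
Report = Callable[[sqlite3.Connection], List[Tuple]]
REPORTS: Dict[str, Report] = {}


def register_report(name: str) -> Callable[[Report], Report]:
    """Decorator that adds a report function to the registry under a name."""
    def decorator(func: Report) -> Report:
        REPORTS[name] = func
        return func
    return decorator


@register_report("posts_per_user")
def posts_per_user(conn: sqlite3.Connection) -> List[Tuple]:
    # Hypothetical table "posts" with a "uid" column.
    query = "SELECT uid, COUNT(*) FROM posts GROUP BY uid ORDER BY uid"
    return conn.execute(query).fetchall()


def run_report(name: str, conn: sqlite3.Connection) -> List[Tuple]:
    """Look up a registered report by name and run it."""
    return REPORTS[name](conn)


def demo() -> List[Tuple]:
    """Build a throwaway in-memory table and run the sample report."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (uid INTEGER, title TEXT)")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [(1, "first"), (1, "second"), (2, "third")],
    )
    try:
        return run_report("posts_per_user", conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

The point of the registry is that a new report only needs to register itself; the code that runs and displays reports does not have to change.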
Summer of Code proposal: User experience analysis with implicit meta-data
Please review this proposal and comment. It's not too late to get students to write up a proposal.



