Data Mining

This group analyzes Drupal-related data to better understand how Drupal evolves across different dimensions and over time. The work includes both automated and manual analysis as well as chart and graphic generation, and may eventually lead to new modules for drupal.org and its subdomains.

danithaca's picture

Recommender Bundles: Update

I've finished these things so far:


  • Updated Recommender API to v2.0beta: improved performance; added BatchAPI, Drush, and SimpleTest support; migrated to the PHP5 OO paradigm for extensibility.
  • Released Browsing History Recommender, which provides two blocks: "Users who browsed this node also browsed" and a personalized "Recommended for you".
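To illustrate the idea behind a "Users who browsed this node also browsed" block, here is a minimal sketch based on plain co-occurrence counting. It is not taken from Recommender API or the Browsing History Recommender module, whose actual algorithm may differ; the `history` structure and the `also_browsed` function are hypothetical names used only for this example.

```python
from collections import Counter


def also_browsed(history, node, top_n=3):
    """Return up to top_n nodes most often browsed together with `node`.

    `history` is a hypothetical mapping of user id -> set of browsed node ids.
    Ties are broken by node id so the result is deterministic.
    """
    counts = Counter()
    for nodes in history.values():
        if node in nodes:
            # Count every other node this user browsed alongside `node`.
            counts.update(n for n in nodes if n != node)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [n for n, _ in ranked[:top_n]]


# Example browsing history (made-up data for illustration only).
history = {
    "u1": {1, 2, 3},
    "u2": {1, 2},
    "u3": {2, 4},
    "u4": {1, 3},
}
```

Calling `also_browsed(history, 2)` ranks node 1 first because two users browsed it together with node 2. A personalized "Recommended for you" block would need a per-user score on top of this, which the sketch does not attempt.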
    David Strauss's picture

    Announcing CiviCluster and CiviConference

    I've created initial snapshots for CiviCluster and CiviConference for
    users of Drupal 5. It may take up to 24 hours for them to appear in the
    Drupal.org release system, but they are already in the DRUPAL-5 CVS branch.

    CiviCluster will rapidly identify duplicate contacts and walk users
    through merging them. CiviCluster supports CiviCRM 1.6 and 1.7 (except
    for CiviEvent). CiviCluster will be updated shortly to support CiviEvent
    schema changes.

    http://drupal.org/project/civicluster

    CiviConference allows online conference management and ticket sales by

    David Strauss's picture

    Convenient SQL transactions with PressFlow Transaction

    I've released an in-development version of PressFlow Transaction for developers interested in convenient encapsulation of SQL transactions. The key features are intelligent use of scope for COMMITs and ROLLBACKs as well as safe, intelligent nesting of transactions to get exception-like semantics.

    Usage details are on the project page. Requires PHP 5.

    (I posted this to the High-Performance group because encapsulating updates in transactions can dramatically improve performance.)
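The scope-based behaviour described above can be sketched roughly as follows. This is an illustration in Python with SQLite, not PressFlow Transaction's PHP API: the `ScopedTransaction` class and its `scope()` method are hypothetical, and the nesting rule shown (all nested scopes share one real transaction, and a failure in any scope rolls back the whole thing) is an assumption about what "exception-like semantics" means here.

```python
import sqlite3
from contextlib import contextmanager


class ScopedTransaction:
    """Hypothetical helper: nested scopes share a single real transaction."""

    def __init__(self, conn):
        self.conn = conn
        self.depth = 0       # how many scopes are currently open
        self.failed = False  # set when any scope exits with an exception

    @contextmanager
    def scope(self):
        if self.depth == 0:
            # Only the outermost scope starts the real transaction.
            self.conn.execute("BEGIN")
            self.failed = False
        self.depth += 1
        try:
            yield self
        except Exception:
            self.failed = True
            raise
        finally:
            self.depth -= 1
            if self.depth == 0:
                # Leaving the outermost scope decides COMMIT vs. ROLLBACK.
                self.conn.execute("ROLLBACK" if self.failed else "COMMIT")


def demo():
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN/COMMIT/ROLLBACK statements above are the only transaction control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE log (msg TEXT)")
    tx = ScopedTransaction(conn)

    with tx.scope():
        conn.execute("INSERT INTO log VALUES ('kept')")

    try:
        with tx.scope():
            conn.execute("INSERT INTO log VALUES ('outer')")
            with tx.scope():
                conn.execute("INSERT INTO log VALUES ('inner')")
                raise ValueError("inner failure")
    except ValueError:
        pass

    return [row[0] for row in conn.execute("SELECT msg FROM log")]
```

In `demo()`, the first scope commits on exit, while the failure inside the nested scope discards both the inner and the outer insert. Grouping many writes into one commit is also where the performance benefit mentioned above comes from.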

    ChrisKennedy's picture

    October Download Statistics

    On November 15th Gerhard released the download statistics for all packages on Drupal.org (with formatting by Earl). Here are two charts that summarize the data, along with the accompanying Excel file. Suggestions are welcome on how to improve them or on other ways to analyze and display the data.

    1. Top 30 Packages

    These top packages comprise 3 versions of Drupal, 20 modules, 5 themes, and 2 videos.

    2. Overall Distribution

    When looking at the distribution of downloads we see noticeable breaking points at 16, 36, and about 590, which segment packages into four classes: Tier 1 (critical), Tier 2 (very popular), Tier 3 (moderately popular), and Tier 4 (unpopular).
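As a rough illustration of the tiering, the sketch below assigns a tier from a package's position in the download ranking. It assumes the breaking points 16, 36, and 590 are rank positions and that each breaking point belongs to the higher tier; neither assumption is confirmed by the post, and 590 is only approximate. The names `BREAKS`, `TIERS`, and `tier_for_rank` are made up for this example.

```python
from bisect import bisect_left

# Assumed rank cut-offs taken from the breaking points quoted above.
BREAKS = (16, 36, 590)
TIERS = (
    "Tier 1 (critical)",
    "Tier 2 (very popular)",
    "Tier 3 (moderately popular)",
    "Tier 4 (unpopular)",
)


def tier_for_rank(rank):
    """Map a 1-based download rank to a tier label."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    # bisect_left counts how many cut-offs lie strictly below this rank,
    # so a rank equal to a cut-off stays in the higher tier.
    return TIERS[bisect_left(BREAKS, rank)]
```

Under these assumptions, rank 16 is still Tier 1 while rank 17 is the first Tier 2 package.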

    ChrisKennedy's picture

    Group activity data analysis

    There has been some recent work analyzing the growth on drupal.org, and I think we should do something similar for groups.drupal.org.

    I would be interested in charts/histograms showing the distribution of:

    1. Groups by number of subscribers
    2. Groups by posts/week in the past three months
    3. Users by number of subscriptions (without identifying information)
    4. Users by number of posts (without identifying information)
    5. Total posts over time
    6. Total posts per week over time
    7. Total groups over time
    8. New groups per week over time
    9. Median subscriptions per user over time

    Did I forget anything, or should some of these be removed or tweaked? I am willing to generate the charts if someone can run the queries, and I can figure out the exact SQL queries if needed.
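To make a few of the requested distributions concrete, here is a small sketch that computes them from exported subscription data. It assumes the data arrives as (user id, group id) pairs, which is a hypothetical export format and not the actual groups.drupal.org schema; `subscription_stats` is likewise an illustrative name. The output contains only counts, so no identifying information leaves the function.

```python
from collections import Counter
from statistics import median


def subscription_stats(subscriptions):
    """Summarize (user_id, group_id) pairs into anonymous distributions."""
    pairs = set(subscriptions)  # ignore duplicate rows
    per_group = Counter(group for _, group in pairs)
    per_user = Counter(user for user, _ in pairs)
    return {
        # subscriber count -> number of groups with that many subscribers
        "groups_by_subscribers": dict(sorted(Counter(per_group.values()).items())),
        # subscription count -> number of users with that many subscriptions
        "users_by_subscriptions": dict(sorted(Counter(per_user.values()).items())),
        "median_subscriptions_per_user": median(per_user.values()),
    }


# Made-up sample data for illustration only.
sample = [(1, "a"), (2, "a"), (3, "a"), (1, "b"), (2, "b"), (1, "c")]
stats = subscription_stats(sample)
```

The two histogram dictionaries can be fed straight into a charting tool. The over-time series in the list would additionally need a timestamp per row, which this sketch leaves out.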

    joshk's picture

    Growth Graphs

    In preparation for starting the 5.0 drumbeat, I was able to get killes (thx Gerhard!) to run some analysis on drupal.org. I think if we can keep up this kind of growth (and with the spiking numbers of developers, projects, and activity on the site I think we can), 2007 could be a sort of tipping point for Drupal!

    UPDATE: here's my blog post on the subject.

    The source XLS file is also attached for your own viewing pleasure.
