[GSoC Proposal] Version Control Activity logging, Activity Streams and Development Statistics

Short description

The Version Control API allows development-specific modules to interface with the server side of version control systems. We currently still lack the ability to really see what’s going on with a repository.

An activity stream is an overview of all actions in a system that are interesting to a user from his or her perspective.

This project adds a complementary module to the Version Control API that logs every change to every repository, displays those changes and computes statistics about them.

The Problem

The Version Control API allows development-specific modules to interface with the server side of version control systems. The Project module is a great example of these modules, but we still lack the ability to really see what’s going on with a repository.

An activity stream is an overview of all actions in a system that are interesting to a user from his or her perspective. We’ve seen this on Twitter, as well as on many other social networking sites (Facebook, last.fm, …).

GitHub took the activity stream, threw away all the social awkwardness and filled it up with pure development-awesomeness.

My Solution

My proposal is to extend the Version Control API with a complementary module that stores every action made for each available repository for each enabled version control system backend.

Developing is more than just repositories and branches: communication is key. Next to logging and processing data concerning repository changes, I would also like to document the social side of collaboration. Some examples:

  • People creating new projects
  • People creating issues
  • Issues getting assigned
  • People replying to issues
  • ...

These additions will require retrieving and logging information from the Project and Issues modules.
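
As an illustration of the idea above, here is a minimal sketch of a backend-neutral activity log. It is written in Python purely for brevity; the actual module would be a Drupal module in PHP. Every name in it (Activity, ActivityLog, stream_for, the verb strings) is a hypothetical placeholder and not part of the Version Control API or the Activity module.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set


@dataclass(frozen=True)
class Activity:
    """One logged action. All field names are hypothetical."""
    actor: str               # user who performed the action
    verb: str                # e.g. "commit", "create_issue", "reply"
    target: str              # repository or project the action touched
    backend: Optional[str]   # "git", "svn", ... or None for non-VCS events
    timestamp: datetime


class ActivityLog:
    """In-memory stand-in for the table the module would write to."""

    def __init__(self) -> None:
        self._items: List[Activity] = []

    def record(self, activity: Activity) -> None:
        self._items.append(activity)

    def stream_for(self, followed: Set[str], limit: int = 10) -> List[Activity]:
        """Return the newest activities on the targets a user follows."""
        relevant = [a for a in self._items if a.target in followed]
        relevant.sort(key=lambda a: a.timestamp, reverse=True)
        return relevant[:limit]


if __name__ == "__main__":
    log = ActivityLog()
    log.record(Activity("alice", "commit", "views", "git", datetime(2011, 5, 1)))
    log.record(Activity("bob", "create_issue", "views", None, datetime(2011, 5, 3)))
    for item in log.stream_for({"views"}):
        print(item.timestamp.date(), item.actor, item.verb, item.target)
```

Keeping the backend as an optional field lets repository events and social events (issues, replies) share one stream, which is the point of logging both sides of collaboration.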

Implementation

  1. Implement hooks to catch and organize data from the Version Control API module, the Project module and the Issues module.
  2. Represent the data using already existing modules and by providing new Blocks and possibly extending Views.
    The Activity Stream could be implemented through the Activity module (http://drupal.org/project/activity). It provides an API for building & providing activity streams. I will build on its API if possible; if not, I will write my own implementation (including the RSS feeds).
  3. Use the APIs of existing solutions to provide powerful statistics about Projects:
    1. Sampler (http://drupal.org/project/sampler) allows modules to easily collect and store calculated pieces of data.
    2. Charts (http://drupal.org/project/charts) allows visualization of the gathered data.
    3. Quant (http://drupal.org/project/quant) provides an engine for producing quantitative, time-based analytics for virtually any Drupal component.
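
To make the statistics step more concrete, the following is a rough sketch of the kind of time-based aggregation meant here: counting commits per day from logged events. It is plain Python for illustration only and does not use the Sampler, Charts or Quant APIs; the function name commits_per_day and the (verb, timestamp) event shape are assumptions.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def commits_per_day(events: Iterable[Tuple[str, datetime]]) -> Dict[str, int]:
    """Count "commit" events per calendar day, keyed by ISO date.

    `events` is a hypothetical stream of (verb, timestamp) pairs taken
    from the activity log; verbs other than "commit" are ignored.
    """
    counts = Counter(
        ts.date().isoformat() for verb, ts in events if verb == "commit"
    )
    # Sort by date so the result can be fed straight into a chart.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [
        ("commit", datetime(2011, 6, 2, 9, 30)),
        ("reply", datetime(2011, 6, 2, 10, 0)),
        ("commit", datetime(2011, 6, 1, 18, 15)),
        ("commit", datetime(2011, 6, 2, 23, 59)),
    ]
    for day, total in commits_per_day(sample).items():
        print(day, total)
```

A module such as Sampler could store these per-day totals, and a charting module could then plot them without touching the raw log.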

Planning

Please take into account that I have examinations from May 30 until June 22. I will both study and develop during that time, but I will contribute less than in the other weeks.

  • Before May 23
    Get familiar with the Version Control API codebase and refresh my knowledge of the Drupal 6 API, or of Drupal 7 if work on a port has started by then.
  • May 23 to June 10
    Implementing all the data fetching and information organization.
  • June 10 to June 22
    Solving the problem of the differences between the version control systems, whilst keeping the abstract implementation of the Version Control API.
  • June 22 to July 11
    Integration with the Activity module and extending/rewriting it where necessary.
  • July 11
    Midterm Submission
  • July 11 to July 25
    Finalizing work on the Activity module.
    Writing my own data display facilities (Blocks, Views)
  • July 25 to August 15
    Finalizing work on custom Views.
    Implementation of the Chart, Sampler and Quant API.
  • August 15 to August 22
    Investigate the possibility of using Big Data Store implementations and see if it’s possible to combine this with my Bachelor thesis.
    Documentation and Final Report submission.

Future possibilities

The purpose of this project is to keep a reliable history of a project’s lifetime. The amount of captured data will therefore get pretty huge after a while. Big Data Store solutions like Apache Cassandra or HBase are definite possible extensions to this project.

Drupal.org

This proposal is mainly written with the purpose of being rolled out on Drupal.org, as it’s functionality that it clearly lacks and clearly needs.

The amount of user-submitted projects will keep rising and projects will keep growing. We need to see relevant data and history fast if we want to keep our advantage. We need to be notified of changes that happen to projects we contribute to: activity streams are the solution to those problems (and not a mass amount of e-mail in my inbox).

About the author

My name’s Christophe Van Gysel and I’m a Computer Science student at the University of Antwerp (http://ua.ac.be). I’m interested in web technologies and applications, cloud programming and distributed systems.

I’m currently in the process of developing a social network site for a database-oriented class of mine. As an extra I’m adding tons of data mining algorithms to the network, including some work with activity feeds.

I’ve done extensive work in C++, PHP and JavaScript. I’ve got a solid background in web design as a hired freelancer and have launched some “powered by Drupal” sites in the past.

You can find me on IRC in the #drupal-contribute and #drupal-vcs channels or contact me through my account on Drupal.org.

Puzzle

The answers are "", 0, 0.0, "0", FALSE and array().

My sandbox project can be found at http://drupal.org/sandbox/stophr/1119702.

Google Summer of Code 2011