Version control integration ideas for GSoC 2009

jpetso's picture

Overview: Make integration of version control systems with Drupal rock, and at the same time help out a module family that's going to be running on drupal.org soon.

Description:

drupal.org has great integration with CVS, including stuff like a commit messages page (which can also be viewed separately for each project), CVS account management, commit restrictions provided dynamically by the website, or automatic packaging of release tarballs just by specifying the CVS tag when creating the release node. All of that makes drupal.org an immensely powerful and easy to use platform for managing Drupal projects in CVS. As a downside though, all of that works only for CVS, and the fact that it doesn't cope with other version control systems is one major reason that drupal.org is still sticking with CVS all the way.

Version Control API was born as a Summer of Code project (the logs!) in 2007, and aims to provide version control integration for Drupal while not depending on any specific version control system (VCS), and being flexible enough for usage on sites other than drupal.org. It also aims to provide all of cvs.module's functionality so that the latter can be replaced on drupal.org, and I'm convinced that drupal.org will be running Version Control API by summer, regardless of this year's Summer of Code. Last year's GSoC project for improving Version Control API was an epic fail, partly because it was in a less-than-functional state at that moment and partly because the student was, in hindsight, an egocentric nut who didn't give a damn about his project or the Drupal community.

This year though, the foundations are much more solid, Version Control API has been pimped up and ported to Drupal 6, and there's no shortage of interesting features that could be implemented. Here's a list of stuff that would be well suited for Summer of Code, in no particular order:

  • Improve the user authentication system so that backends for SVN, Git, Mercurial and Bazaar can share the code to export VCS accounts from the Drupal site to enable "vcs+ssh://" access, and implement that export functionality.
  • Get the Version Control API based Repoview module out of its "experimental" state, and make it a kick-ass repository viewer that shows file history and diffs, switches between branches and revisions, integrates syntax highlighting, etc.
  • Better support for more version control systems. The Git and Mercurial backends are in need of a serious amount of love (both are still on Drupal 5 and haven't yet been adapted to recent API changes). There is not yet a VCS backend for Bazaar, which would rock just as much. The SVN backend's log parser works, but is slow as hell. All of the above could be improved with a set of hook scripts for recording commits instantly and restricting commit access as prescribed by the site. (In case you wonder, the CVS backend is perfect. Don't even think about improving that one.)
  • Version Control API goes OOP: convert backends and array structures to classes in order to make use of the advantages of object orientation - lazy loading, modularization, better extensibility, better maintainability, and what not. Version Control API really screams for OOP in many places, but currently it's plain functions all around.
  • Refactor the project node integration module so that it's able to manage the most central aspect of distributed version control systems: private branches in cloned repositories. Launchpad for example can keep track of lots of branches per project while Version Control API's project node integration still sticks to the outdated CVS workflow of a single centralized location for a project's source code. As a related task, developers could be provided a simple way to clone an existing project directly from the project page.
  • Implement general improvements to version control integration: automatically closing bugs/issues when they're mentioned in the commit message, providing per-user subscriptions of commit notification mails filtered by project/path/author/etc., adding AJAX goodness to the commit log in order to inspect file diffs there on the fly, more actions/events/conditions for Version Control API's integration with Rules, a page that shows detailed commit statistics with nice diagrams... or whatever cool stuff you can think of.
  • More fine grained permissions for view/checkout access of directories within a given repository, which is less crucial for open source projects but often desired by companies on the higher end of the paranoia scale.
  • Work with the community to get a distributed version control system deployed on drupal.org for Drupal core (it seems contrib isn't going to be switched anytime soon), and implement whatever is required and makes sense for such a move. Likely involves a combination of some of the tasks above.
  • A commit digest module that enables awesomeness like this.
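
To make the "automatically closing issues" item in the list above a bit more concrete, here is a minimal PHP sketch of the parsing step. It is only an illustration: the function name example_issues_to_close() and the keyword convention ("fixes", "closes" or "resolves" followed by "#<number>") are assumptions, not something Version Control API defines.

```php
<?php
/**
 * Find the issue numbers that a commit message asks to close.
 *
 * Assumed convention (hypothetical, not part of Version Control API):
 * a closing keyword such as "fixes", "closes" or "resolves", followed
 * by whitespace and "#<issue number>". A bare "#123" is only a mention.
 *
 * @return int[] Unique issue numbers, in order of first appearance.
 */
function example_issues_to_close(string $message): array {
  $pattern = '/\b(?:fix(?:es|ed)?|close[sd]?|resolve[sd]?)\s+#(\d+)/i';
  preg_match_all($pattern, $message, $matches);

  // $matches[1] holds the captured digits of every match.
  $issues = array_map('intval', $matches[1]);

  // Drop duplicates and reindex so callers get a plain list.
  return array_values(array_unique($issues));
}

// Usage: "#99" is mentioned without a closing keyword, so it is skipped.
$message = "Fixes #1234: repository URL validation.\n"
  . "See #99 for background, closes #1300 and fixes #1234 again.";
echo implode(', ', example_issues_to_close($message)), "\n";
```

A real implementation would still have to decide who is allowed to close an issue from a commit and which issue status to set, which is where the site's permissions come in.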

Of course that's way more functionality than will ever be implemented in a single Summer of Code project. Some of the above items will show some overlap, and some will probably require more effort than others. If you think some of those are cool, just pick your favorite one(s) and expand on them a bit so you can make an application for them. A bit of passion for version control systems is a necessary precondition to apply for one of these, and the level of abstraction probably requires a bit more skill than hacking on a typical Drupal module. So if you're just searching for a quick buck instead of wicked-cool technology and upstream involvement, other topics might be better suited - if, however, you'd like to have a real impact on Drupal project management sites (including drupal.org) and love to enable other people to get stuff done, then this is for you.

I probably again won't be able to do the mentor thing full speed for all of summer, as I'm going to Canada from mid July in order to work full time on non-Drupal stuff. So, interested mentors are just as welcome as interested students. Let's make Drupal take a lead role in interfacing with version control systems, and show Trac & Co. how to rock!

Mentors:

  • jpetso - Primary mentor at least until mid July.
  • In Soviet Russia, Version Control API mentors YOU!!

Difficulty: depending on the chosen project, medium to hard

Comments

interesting

marvil07's picture

I'm interested in VCSs and I really like git, and I dream of committing to drupal with git :D (I don't want to start a flame war here, just a comment :p)

New API changes really surprise me :D, thanks for the effort.

Well, at the moment some git love.

cool!

jpetso's picture

Great to see work being done on the Git backend - before GSoC has even started :D
kudos! applause! whoo.

Moving to Official Ideas list

alex ua's picture

I think it would be great to get another version control project for this year's SoC!

Alex Urevick-Ackelsberg
ZivTech: Illuminating Technology


Comments on todolist for version control

marvil07's picture

Following jpetso's suggestion, I reorganized some of the points I'm interested in contributing to.

Phase 1

  • Using OOP in Version Control API sounds good, but as you can see, we would have to refactor every module that depends on it, so this should come first
  • Sync the CVS, Subversion and Git backends with the API changes (hopefully not huge changes :D)

Phase 2

  • Refactor the project node integration module so that it's able to manage the most central aspect of distributed version control systems: private branches in cloned repositories.
  • Improving user authentication is really relevant. After a quick review of the associated links, there are two methods to implement (or finish): .htaccess/.htpasswd and SSH keys.
  • Release a non-experimental Repoview
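
For the SSH key method mentioned above, the export step could look roughly like the following PHP sketch. It relies on the standard OpenSSH authorized_keys format, where each line can carry a forced command="..." plus restriction options in front of the key. Everything else is an assumption for illustration: the function name, the shape of the $accounts array, and the wrapper script path /usr/local/bin/vcs-wrapper are hypothetical.

```php
<?php
/**
 * Build the contents of an authorized_keys file from exported accounts.
 *
 * @param array $accounts
 *   Hypothetical structure: VCS username => public key string.
 * @param string $wrapper
 *   Hypothetical wrapper script that receives the username and decides
 *   which VCS commands that user may run.
 */
function example_authorized_keys(array $accounts, string $wrapper = '/usr/local/bin/vcs-wrapper'): string {
  // Standard OpenSSH options that limit the key to running the command.
  $options = 'no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty';
  $lines = [];

  foreach ($accounts as $username => $publicKey) {
    // The username ends up inside a quoted command, so accept only a
    // conservative character set and skip everything else.
    if (!preg_match('/^[a-z0-9_][a-z0-9_.-]*$/i', (string) $username)) {
      continue;
    }
    // Collapse the key onto one line; a newline would start a new entry.
    $key = trim(preg_replace('/\s+/', ' ', $publicKey));
    if ($key === '') {
      continue;
    }
    $lines[] = sprintf('command="%s %s",%s %s', $wrapper, $username, $options, $key);
  }

  return $lines ? implode("\n", $lines) . "\n" : '';
}

// Usage with a made-up, truncated key.
echo example_authorized_keys([
  'alice' => "ssh-ed25519 AAAAC3Nza alice@example\n",
]);
```

Because every key is forced through the same wrapper, all backends could share this part and differ only in what the wrapper executes for SVN, Git, Mercurial or Bazaar.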

Phase 3

  • Work with the community to get a distributed version control system deployed on drupal.org for Drupal core (it seems contrib isn't going to be switched anytime soon), and implement whatever is required and makes sense for such a move

It seems rather huge, but like I said

I dream of committing to drupal with git :D

Any suggestion about which tasks are more relevant, or maybe about the scope would be really appreciated.

BTW, @jpetso, at first you didn't appear as a mentor, but then that changed, thanks for it! (knowing it's only until mid July)

yeah baby

jpetso's picture

That's a great start, and looks like a good plan overall. It's also an ambitious one, although doable if you're putting in a lot of work.

Any suggestion about which tasks are more relevant, or maybe about the scope would be really appreciated.

Mmkay, let me try:

  • OOP for Version Control API, if done properly, might be a bit of a larger chunk.
    • My preferred solution for specifying VCS backends and delegating responsibility to them would be to use Larry Garfield's Handler module, but the contrib version (even the 6.x-2.x branch that doesn't show up on the project page) hasn't been updated since December. I believe Larry is still pushing to get Handlers into Drupal core, so it might make sense to defer using Handlers until the Drupal 7 port (which is of course out of scope for a Summer of Code project.)
    • Version Control API already is roughly structured into parts that correspond to the various "classes" - for example, functions named versioncontrol_item_*() or similar would go into a VersionControlItem class. Using PHP's ArrayAccess interface, it should be possible to create an initial compatibility layer so that e.g. accessing $item['path'] would refer to $item->path() or $item->setPath(). When all occurrences of the array access method have been replaced with the object ones, that compatibility layer can be removed.
    • Part of making the OOP transition will be to figure out how VCS specific properties and extensions should be handled, so that for example the CVS backend can still provide additional properties like a list of CVS modules and make it editable on the repository edit page.
    • I think that reworking the API itself might be relatively straightforward, but porting stuff to the objectified API as well as making sure that it still works is going to be some effort.
    • My co-maintainer Sam Boyer might have further input on that part, as he's pretty into OOP and a constant source of bright ideas in general. In fact, I hope that he'll chime in as mentor too :P
  • Refactoring the project node integration module to cope with multiple branches is probably the most difficult task of the ones you picked (but also a very important one to me). Difficult not only because it requires rethinking the current concepts and coming up with new definitions for project "ownership", "maintainers" and the "project" itself, but also because it should still be approximately as usable as the current interface even for centralized version control systems. At least that's necessary if we want it to be used on drupal.org.
    • Will probably also involve a new API function for creating new repositories, plus the difficulties of creating repositories somewhere else in the file system where the Drupal server user (on Debian: www-data) has no write permissions.
    • All in all, I would schedule at least a month for the project related changes, if not more (depending on your Drupal coding skills & how quick you get into the modules).
  • The user authentication stuff is easier, and might be done in as little as a week if all goes well. It's just as important as the project branch stuff, maybe even more, so I'd suggest tackling it before taking on the project branch related changes.
  • Repoview is nice, but not critical for a DVCS switch in any way. Which means if you manage to pull off all of the above stuff, a pimped Repoview would be the cherry on top. If you want to focus solely on deploying Git for the drupal.org (core) repository though, Repoview will not be necessary for that, and you could instead focus on preparing the VCS conversion, perhaps getting involved with the infrastructure team directly.
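
The ArrayAccess compatibility layer described in the OOP item above could be sketched like this. It is a minimal illustration written for current PHP (8.0 or later, because of the type declarations), not the actual Version Control API code: VersionControlItem, path() and setPath() come from the description above, while the revision property and the LEGACY_KEYS map are assumptions added for the example.

```php
<?php
/**
 * Item object that still answers to the old array syntax.
 */
class VersionControlItem implements ArrayAccess {
  // Maps each legacy array key to its [getter, setter] method names.
  private const LEGACY_KEYS = [
    'path' => ['path', 'setPath'],
    'revision' => ['revision', 'setRevision'],
  ];

  public function __construct(private string $path, private string $revision) {}

  public function path(): string { return $this->path; }
  public function setPath(string $path): void { $this->path = $path; }
  public function revision(): string { return $this->revision; }
  public function setRevision(string $revision): void { $this->revision = $revision; }

  // isset($item['path']) ends up here.
  public function offsetExists(mixed $offset): bool {
    return is_string($offset) && isset(self::LEGACY_KEYS[$offset]);
  }

  // $item['path'] is served by the matching getter.
  public function offsetGet(mixed $offset): mixed {
    $this->assertKnownKey($offset);
    [$getter] = self::LEGACY_KEYS[$offset];
    return $this->$getter();
  }

  // $item['path'] = ... is routed to the matching setter.
  public function offsetSet(mixed $offset, mixed $value): void {
    $this->assertKnownKey($offset);
    [, $setter] = self::LEGACY_KEYS[$offset];
    $this->$setter($value);
  }

  public function offsetUnset(mixed $offset): void {
    throw new LogicException('Item properties cannot be unset.');
  }

  private function assertKnownKey(mixed $offset): void {
    if (!$this->offsetExists($offset)) {
      throw new OutOfBoundsException('Unknown item key: ' . var_export($offset, true));
    }
  }
}

// Usage: old array-style code and new object-style code see the same data.
$item = new VersionControlItem('/trunk/README.txt', '42');
echo $item['path'], "\n";
$item['path'] = '/trunk/INSTALL.txt';
echo $item->path(), "\n";
```

Once no caller uses the array syntax any more, dropping "implements ArrayAccess" and the four offset methods removes the layer without touching the getters and setters.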

Hope that helps! I can clearly see the potential of you doing this SoC project :-]

BTW, @jpetso, at first you didn't appear as a mentor, but then that changed, thanks for it! (knowing it's only until mid July)

Yeah, I figured that even with full-time work and moving to a new country, I won't be totally gone for a project that matters to me. So while the level of my involvement may decrease, I'd be disappointed if I couldn't at least finish my mentor role. Awesome students deserve proper mentoring!

Really good feedback!

marvil07's picture

My proposal is now posted in this group, and I'll submit it to the GSoC app as emmajane suggested.

Sounds interesting

haxney's picture

As a Git aficionado myself, I too look forward to committing to Drupal with Git (in fact, I use Git to manage a Drupal site I run). I'm a student interested in GSoC.

I'm a little lost on where to begin, however. Should I create a proposal simply by copying this description into my own words? Do I have to come up with my own idea, separate from this one? It looks like I would basically like to do what is already described here, do I need to create something more than this?

Thanks for helping out a clueless newbie :)

Less the "what", but the "how" is important

jpetso's picture

As I wrote to marvil07 when reviewing his proposal, it makes little sense to copy the task descriptions because that doesn't say a lot about you (other than "I'd like to work on this", which might not be enough for getting an application approved). By submitting an application, you want to convince the mentors that you're both motivated enough to pull it off, and that you're skilled enough to come up with usable results.

How you manage to do this is up to you. The method that I recommend is to have a look at the code and try to understand how it works (and consequently, why it works the way it does), which also provides you with a feel of whether you are up for working on Version Control API code. When you've understood what's actually wrong with one of the items that need to be improved, you'll be able to put that into your own words, which is already a large chunk of demonstrating that you're the right person for this task. If you also add a few thoughts on how to fix these issues without simply rephrasing my own bullet points, that makes you a clear winner.

marvil07 has taken the jump-into-the-cold-water approach and tries to make the Git backend work for himself by submitting a few patches - not a huge amount of work yet, but a clear sign that he's able to contribute and productively participate in issue queues, the way that open source projects normally do.

Anyways, it's all about the "how" and "why", less about the "what".

So the challenge is as follows:

  • Come up with a set of tasks that you are interested in.
  • Have a look at what's necessary in order to achieve the goals, and put those requirements into words.
  • Try to estimate how long they'll take to complete, and come up with a set of deliverables/milestones for every two or three weeks during the SoC coding time. Bonus points for detailed plans (even if they might work out differently in the end), because those show that you have done some research on the task at hand. If the schedule doesn't fit the 3-month SoC agenda, add or remove tasks until it does.
  • Make sure to state why you are personally interested in completing the project, what makes you able to pull it off, and perhaps why you're confident you won't quit half-way.

Note that receiving two (or maybe more?) applications for the same set of tasks makes selecting those a bit difficult for mentors. In principle it's possible to have two students working independently on the same tasks, although Google makes SoC collaboration on the same issues a bit hard with their requirement to have the students list their concrete achievements. I'm sure we'd be able to make stuff work if you and marvil07 allocate some of the same tasks, assuming both of your applications are accepted. It would also be unfair to deny you listing the tasks that he chose, because those are arguably the more important ones of the whole list. If there is less overlap though, I'd be happy as well :)

In any case, I'd prefer to see a few tasks being solved properly rather than a large number of half-baked hacks, and I'd rather see one SoC project pulled off really well than two mediocre ones. Stuff like that is hard to tell in advance though, so both of you, try your best to make our (= mentors) estimates accurate. All we want is to be reasonably sure that the stuff that's listed in the application is going to get done in the end.

p.s.

jpetso's picture

I'll try to hang around on IRC (#drupal, #drupal-project) more so that prospective SoC applicants can talk to me live. Also, see my own application - or the Google cache, until the domain works again - for a nice example of how to do it really well. (Over-confident? Hell yeah! Only in specific areas, though.)

Good stuff! I'd like to see

stodge's picture

Good stuff! I'd like to see hooks (eg svn hooks) create a commit record in the db that is linked to a project issue.
