Version control wrap-up, part 1: modules

jpetso's picture

It's over! Google Summer of Code 2007 had its official "pencils down" on Monday, 19:00 UTC - which was 21:00 at my place, perfect for a last sprint... but I digress. According to Google, mentors are supposed to evaluate projects based on the state that they were in at that time. Which means it's time for me to wrap it up and explain what I achieved in those two or three months. In addition to these writings, I set up a test site at http://www.petsovits.at/versioncontrol/ where you can try out most of these things in action.

The wrap-up is becoming too long to conveniently fit into one single blog entry, so I'll split it up into two halves. This one basically covers the hard facts: which modules I wrote during the Summer of Code, what they do, and how they work. (Part 2 is now also available.)

Gimme the heat

Essentially, the task was to rework cvs.module in a way that its higher-level functionality (commit logs, repository/account management, project node / project release node integration) can be used without specifically depending on CVS. We came to the conclusion that it would make sense to have a pure API module (going by the name of Version Control API, as collectively decided with the Drupal community) and re-implement cvs.module's user-visible functionality on top of this API. Here's a short rundown of each module.

Version Control API

Obviously, this is the core of the project. It contains the API itself and the admin screens, where the latter includes repository management, account management and general settings. (Link to the project page.)

Repository management is very pluggable for both VCS backends and external modules, so every module that needs to add some information to repositories can do so. When not being extended by other modules, repository settings include the name and root path/URL of the repository, as well as a repository-specific account registration message and URLs of external repository viewers and/or issue trackers. Account management, in its vanilla form, is very bare-bones and only shows the list that maps Drupal users to VCS accounts, with a link to the edit account form on the user page. General settings include the VCS admin email address, the authentication method (vanilla only provides 'none', so every user with 'use version control systems' permission can create new accounts) and messages shown on the account creation page.

The API is a mixture of retrieving stuff from the database and delegating to backend modules. It's extensively (some would say ridiculously) documented with apidox, in order to make it very straightforward for higher-level modules to use the API and for backends to implement it. For that matter, I have also been maintaining an example backend implementation called "FakeVCS backend" which demonstrates what functions and their result data might look like. This is a free-for-all, especially for backend module authors, who can simply copy-and-paste apidox and function signatures into their own backends and then use the demo code as a template for their implementations.

In essence, the API is built around various types of arrays (think of them as objects) which can be retrieved and passed around:

  • Repositories: contain fundamental information about the repository, like its name, root path/URL and the backend that powers this repository.
  • Items: Files or directories inside a specific repository, including information about each item's path, type (file or directory) and (file-level) revision, if applicable.
  • Accounts: Not represented as an array but rather as a combination of Drupal uid, VCS username and repository.
  • Operations (commits, branch operations, tag operations): Provide information about the author, repository, containing directory, and date/time of the operation. Commits additionally include the repository-wide revision (if applicable) and the commit message.
  • Commit actions: A set of modifications that happened during a commit, including information about the type of the action (added, modified, moved, copied, merged or deleted), the new/current version of the affected item, and its predecessor(s).
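
To make the list above more concrete, here is a rough sketch of how such structures might look. It is written in Python rather than Drupal's PHP, and every key name and value is an assumption chosen for illustration, not the module's actual array layout.

```python
# Illustrative only: key names and values are assumptions, not the real API's.
repository = {
    "name": "Example repository",
    "root": "/var/cvs/example",     # root path or URL
    "backend": "fakevcs",           # the backend that powers this repository
}

item = {
    "path": "/modules/example/example.module",
    "type": "file",                 # "file" or "directory"
    "revision": "1.4",              # file-level revision, if applicable
}

# An account is not a single array but a combination of three values:
# (Drupal uid, VCS username, repository).
account = (42, "jdoe", repository)

commit = {
    "author": "jdoe",
    "repository": repository,
    "directory": "/modules/example",   # containing directory
    "date": 1187636400,                # Unix timestamp
    "revision": None,                  # repository-wide revision, if applicable
    "message": "Fix a typo in the help text.",
}

commit_actions = [
    {
        # One of: added, modified, moved, copied, merged, deleted.
        "action": "modified",
        "current_item": item,
        # Predecessor(s) of the affected item: same file, previous revision.
        "source_items": [dict(item, revision="1.3")],
    }
]
```

Note that the commit carries `None` as its repository-wide revision here, which is what one would expect for a VCS with file-level revisions only.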

Repositories, accounts and operations are managed by the API module itself, and can be amended by backends and other modules by implementing the appropriate hooks (hook_versioncontrol_*(), where * is one of 'repository', 'account', 'commit', 'branch_operation' and 'tag_operation'). Items and commit actions are managed by the backends. Each of those objects may contain additional VCS-specific information - for example, the CVS backend adds passwords to accounts, modules and log retrieval information to repositories, and the commit branch to commits. All in all, that makes for pretty good flexibility.
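
The amending mechanism can be pictured roughly like this. This is a Python sketch of the general idea only: the registry, the function names and the "branch" key are all hypothetical, and the real modules use Drupal's PHP hook system instead.

```python
from typing import Callable, Dict, List

# Hypothetical hook registry: hook name -> list of implementing callbacks.
hooks: Dict[str, List[Callable[[dict], dict]]] = {}


def register(hook: str, callback: Callable[[dict], dict]) -> None:
    """Register one implementation of a hook."""
    hooks.setdefault(hook, []).append(callback)


def invoke_all(hook: str, obj: dict) -> dict:
    """Return a copy of obj, amended by every implementation of the hook."""
    result = dict(obj)
    for callback in hooks.get(hook, []):
        result.update(callback(obj))
    return result


def cvs_commit_hook(commit: dict) -> dict:
    # Stand-in for a backend that adds the commit branch to commits.
    return {"branch": "HEAD"}


register("commit", cvs_commit_hook)

base_commit = {"author": "jdoe", "message": "Initial import"}
full_commit = invoke_all("commit", base_commit)
```

The base object stays untouched, and a hook without any implementations simply hands back an unmodified copy.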

A backend does not need to implement all functions that the Version Control API defines. The idea is that functionality for retrieving fundamental log information (items and actions that correspond to commits and branch/tag operations) is mandatory and likely to be stored in the database, whereas more advanced functionality like item history, directory/file contents, file annotations, and listing all branches and tags of an item, is optional for backends. That's because the advanced functionality is likely to interface directly with the VCS instead of querying the database, which makes it both harder to implement and potentially slower than the log retrieval functions. If a module makes use of an optional function, it has to check for its availability before calling it.
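
The "check before calling" pattern might look something like the following Python sketch. The class and method names are invented for this example and do not correspond to actual API functions.

```python
class LogOnlyBackend:
    """Hypothetical backend that implements only mandatory log retrieval."""

    def get_commit_actions(self, commit_id):
        return [{"action": "added", "path": "/README.txt"}]


class FullBackend(LogOnlyBackend):
    """Hypothetical backend that also provides an optional function."""

    def get_file_annotation(self, path):
        # A real backend would ask the VCS itself here, which can be slow.
        return [("1.1", "jdoe", "first line")]


def is_available(backend, function_name):
    """True if the backend implements the given (optional) function."""
    return callable(getattr(backend, function_name, None))


def annotate(backend, path):
    """Use the optional function only after checking that it exists."""
    if not is_available(backend, "get_file_annotation"):
        return None  # The caller has to cope with the missing feature.
    return backend.get_file_annotation(path)
```

A higher-level module would then hide its annotation view whenever `annotate()` returns `None`, instead of failing on backends that only do log retrieval.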

So, that's essentially the way that the API works. More details can of course be found in the module itself. (I think I should copy this explanation to a HACKING.txt file, makes for a great overview imho.) I hope we're not running out of space? The other modules still need to be covered as well... but it's a wrap-up, so I'll take the freedom to write a lot.

Oh, and external modules are, in general, not supposed to access the API module's tables directly. That's because you'd miss out on all the post-processing stuff that happens inside the API's retrieval functions - adding VCS specific information, filtering out unauthorized users, et cetera. I understand that may have performance implications in some cases, but if so we need to make the API powerful enough to get enough filtering information so that it can do the queries by itself. Sounds like a query builder... maybe something like this will be included sometime.

As the Version Control API is a pure API module (with a user interface for configuration, but not for end-user functionality), I split off a few extensions into separate modules. Three of them (Commit Log, Commit Restrictions, Version Control Account Status) implement very general functionality and will probably be used in almost all cases, so they have been put into subdirectories of the Version Control API. That way, they don't clutter the (already very extensive) API module while still being easily deployed - no hassles with fetching 6 different modules from their separate project pages.

Commit Log

This module has already been covered quite well in another status report, so I don't think it's necessary to reiterate all of this once more. Basically, it uses the versioncontrol_get_commits() API function in order to retrieve an optionally filtered set of commits which is then displayed on a page. Very straightforward. You can see it in action on my demo site, although the data set there is not all that extensive.

Commit Restrictions

Did I mention that the Version Control API also provides access hooks that enable any module to grant or deny access for commits, branches and tags? Previously, those restrictions were only given in the xcvs-config.php file, that is, they were private to the CVS hook scripts. By providing the appropriate API function, hook scripts can now rely on everything being done for them, and they just need to act on TRUE or FALSE. And because this functionality is so modular, it doesn't hurt to move even the generally applicable commit restrictions out of the main API module into a separate module, consequently named Commit Restrictions.

What it does is extend the settings form for each repository with a set of possible restrictions (regular expressions for allowed/forbidden paths as well as for valid branches and tags), and implement the access hooks. You can't see it on the demo site though, because neither are you granted admin permissions (unless I already provided you with an admin account) nor is there a real CVS repository that is hooked up to my site. However, it works, and you can try it at home.
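
As a rough illustration of that kind of check, here is a Python sketch. The function names, parameters and example patterns are assumptions made up for this example; the module's real settings and hook signatures may differ.

```python
import re


def commit_allowed(paths, allowed_pattern=None, forbidden_pattern=None):
    """Return True only if every committed path passes both restrictions."""
    for path in paths:
        # If an "allowed" pattern is set, every path has to match it.
        if allowed_pattern and not re.search(allowed_pattern, path):
            return False
        # If a "forbidden" pattern is set, no path may match it.
        if forbidden_pattern and re.search(forbidden_pattern, path):
            return False
    return True


def branch_allowed(name, valid_pattern=None):
    """Return True if no pattern is set or the whole branch name matches."""
    return valid_pattern is None or re.fullmatch(valid_pattern, name) is not None
```

A hook script would pass in the paths of the incoming commit and simply act on the boolean result, for example `commit_allowed(paths, allowed_pattern=r"^/contributions/")`.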

Version Control Account Status

I mentioned the challenges for account approval in an earlier post, and only implemented the solution in the last few days. As mentioned further above, Version Control API's native account management only includes very bare-bone do-it-yourself account creation that doesn't need any approval by the VCS admin. Which may be cool for small sites with trusted users, but doesn't work for larger ones like drupal.org.

For that purpose, cvs.module provides a CVS application form (you probably won't get to see it anymore) and an additional tab on your user account edit form. The latter is being extended for the admins so that they can choose your account status (queued, pending, approved, declined, disabled) after you have applied for CVS access. All of that, including the conditions that you need to mark with "Yes", is hard-coded and as far as I could see, suitable for one repository only.

Version Control API works a bit differently. For one, account creation and account editing are done by the same form. Then, users can create accounts in different repositories. And third, I wanted to provide the opportunity to replace the current method with something else if needed. That's the main reason that Version Control Account Status is a separate module. ...oh right, the feature set.

This module provides two additional authorization methods that can be selected in the admin settings: site-wide approval, and per-repository approval. They were similar enough to put them in one module. The difference between those two methods is that site-wide approval is done once and afterwards the user can create accounts for all repositories known to the site, while per-repository approval requires the admin to approve VCS accounts for each repository that the user applies to.

In consequence, the application form for site-wide approval doesn't include registering a specific account, and thus needs to happen before the account is registered at all, which is a bit different from how drupal.org currently works. Per-repository approval on the other hand is more similar to the current workings of drupal.org, and can look exactly the same when only one repository exists on the site. (If there are more of them, the applicant needs to select the repository before getting to the application form.)
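
The difference between the two approval methods can be sketched as follows. This is Python with invented names and data shapes, and it leaves out the Drupal permission check that applies in all cases.

```python
def may_create_account(method, uid, repo_id, approved_users, approved_pairs):
    """Decide whether a user may create a VCS account in a repository.

    approved_users: uids approved once for the whole site.
    approved_pairs: (uid, repo_id) tuples approved one by one.
    Both containers are assumptions made for this sketch.
    """
    if method == "none":
        # Vanilla behaviour: no approval step at all.
        return True
    if method == "site-wide":
        # Approved once, valid for every repository known to the site.
        return uid in approved_users
    if method == "per-repository":
        # Needs a separate approval for each repository applied to.
        return (uid, repo_id) in approved_pairs
    raise ValueError(f"unknown authorization method: {method}")
```

With per-repository approval, an approval for one repository says nothing about any other repository, which is why the applicant has to pick a repository first when several exist.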

CVS backend

Just an implementation of CVS in order to provide the mandatory backend hooks. The module part is mainly database storage and retrieval, and the adapted xcvs-* trigger scripts process command line input and call the appropriate API functions. The challenge here was mainly to get the database tables right (and those of the API module) while the actual port was not too difficult. Compared to the original CVS facilities in cvs.module, this one still lacks account import/export and the capability to fill the database with cronjobs by parsing the log file. I'm still going to add these, though. (Link to the project page.)

Version Control / Project Node Integration

This module accomplishes the association of nodes of any content type (including the Project content type that is provided by the Project module) with maintainers. Somebody with node editing permissions can assign a repository and a directory inside this repository to the project node. Doing so not only provides a project specific commit log page at node/$nid/commitlog, but also an additional tab on the project page, called "Commit access". Like in the original cvs.module (where it's called "CVS access", of course), node owners can assign co-maintainers. All of that is put to use by the commit/branch/tag access hooks being implemented, and any VCS user may only commit to the projects that he maintains. (Link to the project page.)
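
The resulting access rule could be pictured like this. It is a Python sketch with made-up project data and field names, and the decision to deny commits to paths outside of any project is an assumption of the sketch, not something the module is known to do.

```python
# Hypothetical project nodes, each tied to a repository and a directory.
projects = [
    {"nid": 10, "repository": "contrib", "directory": "/modules/foo",
     "owner": "alice", "co_maintainers": {"bob"}},
    {"nid": 11, "repository": "contrib", "directory": "/modules/foobar",
     "owner": "carol", "co_maintainers": set()},
]


def project_for_path(repository, path):
    """Find the project whose directory contains the given path."""
    for project in projects:
        directory = project["directory"]
        # Match the directory itself or anything below it, but make sure
        # that "/modules/foo" does not match "/modules/foobar".
        if project["repository"] == repository and (
            path == directory or path.startswith(directory + "/")
        ):
            return project
    return None


def may_commit(username, repository, paths):
    """Grant access only if the user maintains every affected project."""
    for path in paths:
        project = project_for_path(repository, path)
        if project is None:
            return False  # Assumption: paths outside any project are denied.
        if username != project["owner"] and username not in project["co_maintainers"]:
            return False
    return True
```

In this sketch, co-maintainer "bob" may commit below /modules/foo but not below /modules/foobar, which belongs to a different project.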

This module does not cover the Project Release nodes, and that's a good amount of work that is still in the open and needs to be done before Version Control API & friends can take over drupal.org.

"Scrap that, we want a real wrap-up!"

With personal statements, metrics, and plans for the future? Hang on, I'm working on it.

Comments

nice work

moshe weitzman's picture

nice writeup. thanks.

do you know of anyone working on a subversion backend? seems like a logical next step.

Subversion backend

jpetso's picture

There sure seems to be some demand for such a thing... you're already the second one asking that question, not including my employer who would also like to see a Subversion backend. Easy answer:

  • No, I don't know of anyone currently writing a backend. Guess it would be beneficial to have a packaged release of the API module in order to stimulate development here, but I still want to add some minor stuff before doing so. Anyways, the backend API is quite stable (it hasn't changed in the last weeks, not counting the one or the other addition) and people can start writing backends now. I'm going to stress this in part 2 of my wrap-up anyways, but it doesn't hurt to mention it once more. For Subversion, aclight's work for sure is a good starting point for writing a backend.
  • I'm currently looking into a university practical where I would implement backends for Subversion and one other VCS, but I have to talk this over with the university staff. In case I get to do it, expect to hear about it.
  • I'm wondering if aclight has any ambitions here, or if he's satisfied with the current quick-and-dirty solution that he's already got running. Adam?
  • Have I mentioned that writing new backends is easy and well-documented?

Subversion back end

aclight's picture

As jpetso said, I've already ported the cvslog module and xcvs scripts to work with Subversion and they seem to be working quite well, though they haven't seen heavy use by the users of the site I'm developing yet so there may still be some bugs.

I haven't looked in depth at jpetso's work yet, but my feeling was that eventually I would write a Subversion back end (unless someone beats me to it) assuming that these new version control modules have a bright future. I don't want to spend much time writing a Subversion back end if the version control modules aren't going to continue to be developed actively, but I suspect that Drupal will move to them eventually, so that shouldn't be a problem.

Right now the site I'm building is sort of in a beta release state. I'm working on fixing immediate bugs with that. After it's stable, I'm probably going to first work on adding some features to some of the modules the site uses, and eventually I would get around to writing a Subversion backend.

If anyone is anxious to get this started now let me know and I'll send you the latest version of my Subversion integration code. If others are interested in a Subversion backend, maybe it would be a good idea to create that project now so there's a place to put code and issues.

Bright future

dww's picture

I have no intention of porting the existing cvslog.module to D6. My hope is that versioncontrol_cvs.module and friends will be all we need for CVS integration on d.o for D6 and beyond. Myself, AjK, and hunmonk are already officially co-maintainers of all of these modules (though I haven't actually gone through all the code in audit-level detail yet). Unless there's an unforeseen disaster, I fully expect these modules to be the future for version control + project integration.

Sadly, that means getting all the release node + VCS integration working, since that's absolutely essential for the Drupal Release System(tm) (I can't really call it "new" after existing for almost a year). So, that's going to be a lot of work, but perhaps doing it via D6 FAPI will be easier, we'll have to see.