Project tracking and releases using Subversion

Posted by aclight on May 28, 2007 at 10:44pm

Hi Everyone

For the past few months, I've been working on a fork of the cvslog and project* modules so that I can use them with subversion instead of cvs. After consulting with dww, maintainer of these modules, I learned that someone (jpetso) had applied for, and has since received, a Google SoC position to create an RCS abstraction layer. Eventually, such a layer would presumably make it a little easier to use a version control system other than CVS.

Since I needed to use subversion sooner rather than later, we decided that I would do an initial fork and try to get everything working with subversion. The changes I had to make to get subversion working would help jpetso know what needed to be changed and what could stay the same. My goal was to change as little as possible, so I didn't change any function names or t() strings. I am using the locale module to take care of the "translation" of CVS-->SVN and various other string changes. I also didn't need to change any of the database tables, though the content stored in them is slightly different. I don't think subversion reports the number of lines added or removed, so those values aren't included. Instead, the # lines added field now stores the previous revision number. I think that's the only major database change. For everything else, the value stored in each field is the same as, or very similar to, what cvs.module stores.
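
A rough Python sketch of the field reuse described above, for anyone porting the idea elsewhere. It assumes (my interpretation, not confirmed by the scripts) that "previous revision" means the last revision in which the same path changed, rather than simply revision minus one; `CommitFileRow` and `build_rows` are hypothetical names, not part of cvs.module.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class CommitFileRow:
    """One row per changed file, loosely modeled on a cvs.module file record."""
    revision: int
    path: str
    # Reused "lines added" column (assumption): the previous revision in which
    # this path changed, or None if the path first appears in this commit.
    lines_added_field: Optional[int]


def build_rows(commits: Iterable[Tuple[int, List[str]]]) -> List[CommitFileRow]:
    """commits must be (revision, changed paths) pairs in ascending revision order."""
    last_seen: Dict[str, int] = {}
    rows: List[CommitFileRow] = []
    for revision, paths in commits:
        for path in paths:
            rows.append(CommitFileRow(revision, path, last_seen.get(path)))
            last_seen[path] = revision
    return rows
```

Tracking the last revision per path matters because subversion revision numbers are repository-wide, so a file's previous change is often many revisions back.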

I've attached all of the scripts and files I needed to change to get this working (I think). Change the extension to .zip after you download the file. Here's a summary of what needed to be changed:

Surprisingly, the cvs.module itself didn't need much modification. The cvs_fetch_repository() function has been changed to call the svn binary properly. Note: I couldn't get cvs_use_file == 1 to work, but I'm not sure whether that's because my web host has security settings that block it. The other major change in cvs.module was to the cvs_process_log() function (now cvs_process_svn_log()). Its subversion log XML parsing was taken from the subversion.module file created by halkeye. I also added a few ancillary functions as needed, such as cvs_get_svn_tag().
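
A minimal Python sketch of the same idea, parsing the XML that `svn log --xml -v` prints. This is not halkeye's code; `fetch_svn_log` and `parse_svn_log` are hypothetical names, and the dictionary keys are my own choice.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Dict, List


def fetch_svn_log(repo_url: str, start_rev: int) -> bytes:
    """Run the svn client and return its XML log (with changed paths) from start_rev to HEAD."""
    result = subprocess.run(
        ["svn", "log", "--xml", "-v", "-r", f"{start_rev}:HEAD", repo_url],
        capture_output=True,
        check=True,
    )
    return result.stdout


def parse_svn_log(xml_data) -> List[Dict]:
    """Turn each <logentry> element into a plain dict, one per revision."""
    root = ET.fromstring(xml_data)
    entries = []
    for entry in root.findall("logentry"):
        entries.append({
            "revision": int(entry.get("revision")),
            # findtext defaults cover commits without an author or message
            "author": entry.findtext("author", default=""),
            "date": entry.findtext("date", default=""),
            "message": entry.findtext("msg", default=""),
            # action is A (added), M (modified), D (deleted) or R (replaced)
            "paths": [(p.get("action"), p.text) for p in entry.findall("paths/path")],
        })
    return entries
```

Typical use would be `parse_svn_log(fetch_svn_log(url, last_processed + 1))`, where `last_processed` is whatever revision the cron run stored last time (again an assumption about how the bookkeeping is done).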
My web host doesn't support subversion over webdav, so I have to use svnserve. Because of that, I have to store subversion user passwords in plain text in my passwd configuration file. Since cvs.module passes passwords from the CVS application form through crypt(), I wouldn't have a way to give the user access to my repository. So, I now pass the passwords through str_rot13() to at least obfuscate the password. The clear text password can be seen in the CVS accounts page by clicking the Export tab.
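
For reference, svnserve reads plain-text credentials from the file named by `password-db` in svnserve.conf, as a `[users]` section of `username = password` lines. The Python sketch below regenerates such a file; `svnserve_passwd` is a hypothetical helper, and it assumes the stored values are rot13-obfuscated as described above. Note that recovering the clear text needs no key at all.

```python
import codecs
from typing import Mapping


def svnserve_passwd(accounts: Mapping[str, str]) -> str:
    """Build an svnserve passwd file from username -> rot13-obfuscated password."""
    lines = ["[users]"]
    for user in sorted(accounts):
        # rot13 is its own inverse, so "decoding" needs no secret whatsoever.
        clear = codecs.decode(accounts[user], "rot13")
        lines.append(f"{user} = {clear}")
    return "\n".join(lines) + "\n"
```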
The xcvs scripts were the hardest part of the conversion. Subversion and CVS are fundamentally different in several ways, so shoehorning the way subversion works into the database structure set up for cvs was a little tricky. Subversion uses repository-wide revisions and doesn't track modifications to individual files with per-file version numbers. Furthermore, subversion handles branches and trunks in a somewhat more informal way than cvs does, so there aren't any commands (AFAIK) for extracting the branch or tag of a given transaction. The subversion book recommends keeping /trunk, /branches, and /tags directories under the main directory of each project. Branches and tags go in their own directories under that hierarchy (e.g. /branches/DRUPAL-5--1 or /tags/DRUPAL-5--1-6-beta). My scripts require such a hierarchy to get branches and tags right.
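
Because there is no direct "which branch is this?" query, the branch or tag has to be inferred from the path. A minimal Python sketch, assuming each project lives in its own top-level directory with the trunk/branches/tags layout just described; `classify_path` is a hypothetical name, not the function used in the xcvs scripts.

```python
import re
from typing import Optional, Tuple

# <project>/trunk/... or <project>/branches/<name>/... or <project>/tags/<name>/...
_LAYOUT = re.compile(
    r"^/?(?P<project>[^/]+)/"
    r"(?:(?P<trunk>trunk)|(?P<kind>branches|tags)/(?P<name>[^/]+))"
    r"(?:/|$)"
)


def classify_path(path: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (project, 'trunk' | 'branch' | 'tag', name), or None if the path is outside the layout."""
    m = _LAYOUT.match(path)
    if m is None:
        return None
    if m.group("trunk"):
        return (m.group("project"), "trunk", None)
    kind = "branch" if m.group("kind") == "branches" else "tag"
    return (m.group("project"), kind, m.group("name"))
```

Paths that don't fit the layout return None, which a hook could treat as "reject" or "ignore" depending on policy.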
I'm not sure if this is true of cvs, but svn allows you to use pretty much any type of script as a hook script. Since I'm most familiar with PHP, that's what I used for my subversion pre-commit and post-commit scripts. Those scripts should go in the hooks directory within your subversion repository. Make sure to make them executable.
Another nice benefit of subversion is that all commits are atomic. That allowed me to move some of the processing of a commit into the pre-commit script while keeping most of the logic and permission checks in the xcvs scripts. If any xcvs script called by the pre-commit script exits with a non-zero status, the commit won't happen. Subversion also has a hook that runs even earlier than pre-commit (start-commit), which could in principle be used to decide whether the user may commit at all; but without knowing which files were changed (and therefore which project node the commit affects), that script wouldn't know whether to accept or deny. So the slight downside is that a commit with a lot of changes might be denied, but only after all of the changes have been uploaded to the repository. Since svn is more efficient with commits anyway, I don't see this as a big issue.
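
The pre-commit flow can be sketched in Python. Subversion runs the hook with the repository path and transaction name as arguments, `svnlook ... -t TXN` inspects the not-yet-committed transaction, and a non-zero exit (with a message on stderr, which svn relays to the client) rejects the commit. `load_permissions` is a hypothetical stub standing in for the xcvs scripts' database checks, and the project is assumed to be the first path component.

```python
import subprocess
import sys
from typing import Dict, List, Set


def changed_paths(svnlook_output: str) -> List[str]:
    """Parse `svnlook changed`: status columns, then the path starting at column 4."""
    return [line[4:] for line in svnlook_output.splitlines() if len(line) > 4]


def projects_touched(paths: List[str]) -> List[str]:
    """Assume the first path component is the project directory."""
    return sorted({p.split("/", 1)[0] for p in paths if p})


def denied_projects(author: str, projects: List[str], allowed: Dict[str, Set[str]]) -> List[str]:
    """Return the projects the author may not commit to."""
    return [p for p in projects if author not in allowed.get(p, set())]


def load_permissions() -> Dict[str, Set[str]]:
    """Hypothetical stub: a real hook would query the Drupal database here."""
    return {}


def main(repos: str, txn: str) -> int:
    def svnlook(subcommand: str) -> str:
        return subprocess.run(
            ["svnlook", subcommand, "-t", txn, repos],
            capture_output=True, text=True, check=True,
        ).stdout

    author = svnlook("author").strip()
    projects = projects_touched(changed_paths(svnlook("changed")))
    denied = denied_projects(author, projects, load_permissions())
    if denied:
        sys.stderr.write(f"{author} may not commit to: {', '.join(denied)}\n")
        return 1  # non-zero exit makes svn abort the commit
    return 0


if __name__ == "__main__" and len(sys.argv) >= 3:
    # svn invokes pre-commit as: pre-commit REPOS-PATH TXN-NAME
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

With the stub as written, every commit touching a project is rejected, which is a safe default until real permissions are wired in.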
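You'll notice that I included bootstrapping of the Drupal database in the xcvs scripts. This is handy because my database tables all have prefixes, and bootstrapping lets the scripts use Drupal's own database settings (including the table prefix) instead of hard-coding table names.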
I modified the package-release-nodes.php file to package from a subversion repository instead of cvs. I wanted files saved as .zip files on my site, so I've made that modification. I'm also not keeping Drupal code, so I removed some of the code that takes care of that. I also have to run the script as a real web process (via a browser or wget) instead of from the command line, because my host's command-line configuration of php5 doesn't work right.
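
A minimal Python sketch of exporting a tag and packaging it as a .zip instead of a tarball. `export_release` and `zip_directory` are hypothetical names; `svn export` produces a clean tree without .svn directories, and rooting the archive at a directory named after the project is my assumption about the desired layout.

```python
import os
import subprocess
import zipfile


def export_release(tag_url: str, dest_dir: str) -> None:
    """Export a tag (e.g. .../myproject/tags/DRUPAL-5--1-6-beta); dest_dir should not exist yet."""
    subprocess.run(["svn", "export", "--quiet", tag_url, dest_dir], check=True)


def zip_directory(src_dir: str, zip_path: str, root_name: str) -> None:
    """Write every file under src_dir into zip_path beneath root_name/."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(src_dir):
            dirnames.sort()  # deterministic archive order
            for name in sorted(filenames):
                full_path = os.path.join(dirpath, name)
                rel_path = os.path.relpath(full_path, src_dir)
                zf.write(full_path, os.path.join(root_name, rel_path))
```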
Because of the above restriction, I changed the shebang lines to match the path of php5 on my host. You may need to change those back to /usr/bin/php or something similar.

Note: Though I have run some tests with this code, I wouldn't say it's close to being stable yet. Furthermore, though I've tried to code in a security conscious manner, I don't have a lot of knowledge about security issues WRT PHP, etc., so I can't promise the scripts are as secure as they should be.

I'm not planning on making this into a module of my own, because:

I don't have the time to fully maintain it
It's so close to the official cvslog module that it doesn't really make sense to have two separate modules that do almost the same thing
As I mentioned before, jpetso is going to work on the RCS abstraction layer this summer, and so hopefully his work will make maintaining two separate RCS modules unnecessary

With that being said, I'm happy to answer questions people have about why I did something one way versus another, and generally how things work. Outside of the xcvs and hook scripts, I'm really only familiar with the code that I modified, so don't ask me about how cvs.module or the project* modules work outside of where they interact with svn.

On the same note, I'm glad to hear comments/criticisms of the changes, especially related to security problems. I will be using this code on a production site in the somewhat near future (at least until the RCS abstraction layer is completed), and so if there are problems I'd like to know about them.

I hope these changes are useful to some of you out there.

Adam

Attachment	Size
project_svn_acl_0.zip_.doc	53.99 KB
package-release-nodes.php_.txt	26.96 KB

Comments

Thanks, man

Posted by jpetso on May 29, 2007 at 6:36pm

This is great. Kudos for your work, I'm sure I can put it to good use!

However: I love pesto, but they don't quite spell me like that. ;D

Sorry

Posted by aclight on May 29, 2007 at 6:58pm

Hm....I guess not. Sorry about that.
AC

Hey, no problem

Posted by jpetso on May 29, 2007 at 7:05pm

You don't need to apologize, really. I mean, every time I only think of tasty noodles with pesto, it makes me feel good instantly. In fact, I even had those for lunch today :)

updated package-release-nodes.php

Posted by aclight on June 3, 2007 at 9:45pm

I realized today that the package-release-nodes.php script attached above was a version I had partially edited, not the actual working script. I've attached the working script to the original post. This version also has all of the cvslog patches applied to it (by hand), so it is up to date.

This version of the script also has an optional feature that checks that the host running the script and the host that called it (via http) are the same host. This prevents anyone else from calling the script manually.
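
The same-host check can be sketched in Python. In a PHP script the two addresses would typically come from `$_SERVER['REMOTE_ADDR']` and `$_SERVER['SERVER_ADDR']`; here `request_is_local` is a hypothetical helper that takes them as plain strings, and accepting loopback callers is my assumption for wget run on the same machine. A check like this trusts REMOTE_ADDR, so it won't behave as intended behind a reverse proxy.

```python
import ipaddress
from typing import Iterable


def request_is_local(remote_addr: str, server_addrs: Iterable[str]) -> bool:
    """True if the caller's address is loopback or one of this host's own addresses."""
    try:
        caller = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # malformed address: refuse rather than guess
    own = {ipaddress.ip_address(a) for a in server_addrs}
    return caller in own or caller.is_loopback
```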

To call the script, I use wget and point it to a url like:
http://www.example.com/packaging/package-release-nodes.php?argv1=tag for tags and
http://www.example.com/packaging/package-release-nodes.php?argv1=branch for branches.
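
The same request can also be made with Python's standard library. A minimal sketch: the example.com URL is the placeholder from above, `packaging_url` and `trigger_packaging` are hypothetical names, and the no-cache headers are meant to approximate wget's `--no-cache` option.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://www.example.com/packaging/package-release-nodes.php"


def packaging_url(kind: str) -> str:
    """Build the trigger URL; kind is 'tag' or 'branch', passed as argv1."""
    if kind not in ("tag", "branch"):
        raise ValueError(f"unexpected kind: {kind!r}")
    return f"{BASE_URL}?{urlencode({'argv1': kind})}"


def trigger_packaging(kind: str, timeout: float = 600.0):
    """Request the packaging script, asking intermediaries not to serve a cached copy."""
    req = Request(packaging_url(kind), headers={"Cache-Control": "no-cache", "Pragma": "no-cache"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", "replace")
```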

clarification on package-release-nodes.php

Posted by dww on June 5, 2007 at 5:49pm

note: package-release-nodes.php was never intended to be invoked via the web. on d.o, it's run as a CLI cron job, and the script lives completely out of the webroot in some dark, hidden place no one but those of us with shell access can find it. ;) however, your post here brings up the good point that we should do a better job of securing these scripts for the default installation of project.module and friends. i'd definitely commit a patch that added your $restrict_to_server functionality (enabled by default). it might also be nice to have something in .htaccess, too, but perhaps the restrict_to_server is all we need.

also, i hate to say it, but str_rot13() is hardly obfuscation. ;) i think str_rot13 is only provided as a joke from the php maintainers, no one should seriously consider that any better than plain text...

FYI re: the xcvs-* scripts, CVS just invokes an executable file pointed to by the configuration files. these can be shell scripts, php, perl, whatever.

haven't looked closely at your code here, but it's certainly a great start for the RCS abstraction stuff. thanks much!

what, rot13 isn't encryption? :)

Posted by aclight on June 5, 2007 at 6:55pm

@dww

I realize that package-release-nodes.php wasn't meant to be run from the web. The problem is that my web hosting provider offers both php4 and php5, and the default installation is php4. For some reason, I can't get several of the xcvs scripts, or package-release-nodes.php, to run from the command line under php4: global variables aren't truly global. I assume it's some kind of security setting the host is using, but I haven't been able to figure it out. The other option is php5. They run php5 as FastCGI, but they also provide a way to use the FastCGI binary from the command line. Whenever I call session_start() from a script run on the command line, I get the "headers already sent" warning. When the exact same script is run via a web process (still using FastCGI), I don't get this problem. Again, I haven't been able to figure out why this is happening. That's why I'm running package-release-nodes.php through wget instead of from php on the command line.

Incidentally, I've noticed that I sometimes get Apache 503 errors when I call my version of the package-release-nodes.php script using wget, but not when I load it in a web browser directly. I'm using the --no-cache option with wget. If I clear the Drupal cache using the link in the devel module, the problem goes away for a while (I'm not sure exactly how long), but then it reappears. Any idea what might be going on?

I agree that rot13 is hardly obfuscation. Do you have a better suggestion? Since svnserve requires clear text passwords (I can't believe this is the case, but as far as I can tell it is true), I have to be able to recover the user's password when they enter it into the CVS application form. So, short of getting pretty complicated and using some form of public/private key pair, I can't think of a better way. I guess I could write a reversible algorithm that is a bit more obscure than rot13, but I don't see that as a big improvement. But if you're just making this point so that someone else doesn't think this is actual encryption, point taken!!

I can roll a patch for the $restrict_to_server and submit that. That's pretty simple.

see http://drupal.org/node/149575

Posted by dww on June 5, 2007 at 9:32pm

re: $restrict_to_server -- actually, i just added a .htaccess to this directory by default, to prevent the .php files from being visible. so the patch for $restrict_to_server isn't really needed... as for your specific troubles with CLI, i'd rather not go too far out of my way to support web hosting accounts like this. maybe i'm wrong, but i can't imagine many people trying to use project* and the RCS stuff on shared hosting with no control over their servers...

DRUPAL_BOOTSTRAP_FULL

Posted by jpetso on July 23, 2007 at 1:20pm

Btw, the reason you're getting "headers already sent" warnings with DRUPAL_BOOTSTRAP_FULL is that the line
#!/usr/local/php5-fcgi/bin/php
is also present in xcvs-config.php, where it doesn't belong: that file never runs by itself, it is only included, so it doesn't need the shebang. Removing that line should make full bootstrapping possible without warnings; at least it worked in my version of that file.
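
A quick way to find other files with the same problem: when PHP includes a file, a leading `#!` line is not skipped as a shebang but sent as ordinary output, which is what triggers the warning described above. The Python sketch below (hypothetical helpers `files_with_shebang` and `strip_shebang`) flags such files and removes the first line; deciding which files are include-only, like xcvs-config.php, is left to you, since standalone scripts still need their shebang.

```python
from pathlib import Path
from typing import Iterable, List


def files_with_shebang(paths: Iterable[str]) -> List[str]:
    """Return the paths whose first two bytes are '#!'."""
    flagged = []
    for path in paths:
        with open(path, "rb") as fh:
            if fh.read(2) == b"#!":
                flagged.append(path)
    return flagged


def strip_shebang(path: str) -> bool:
    """Remove a leading '#!' line in place; return True if one was removed."""
    data = Path(path).read_bytes()
    if not data.startswith(b"#!"):
        return False
    newline = data.find(b"\n")
    Path(path).write_bytes(b"" if newline == -1 else data[newline + 1:])
    return True
```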