Commit restrictions in distributed version control

Posted by jpetso

I just spent some time in #git further investigating how our CVS access control scripts would translate to distributed version control systems, to help determine the right direction for our new GHOP-powered Git and Mercurial backends (currently in the works; more on those later). Short answer: stay out of it altogether; it's not DVCS style to restrict project maintainers like that. Read on for a more detailed analysis.

For those who are not totally familiar with the CVS access control scripts: they currently do two things, and could in principle do a third:
1. They only allow a commit if its location belongs to a project for which the user is registered as a maintainer. In addition, sandbox commits are allowed for every CVS account holder.
2. They forbid non-standard tag and branch names.
3. (Not implemented currently, but possible.) Checks on the code itself, say, to make sure that it adheres to the coding standards or passes all unit tests.
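Point 1 boils down to a lookup from a repository path to the project's maintainer list. A minimal sketch of that logic, assuming a layout of contributions/<type>/<project>/... for projects and contributions/sandbox/<user>/... for sandboxes, and assuming each account may only commit below its own sandbox directory; the function name and data shapes are hypothetical, not the actual CVS scripts:

```python
def commit_allowed(user, path, maintainers):
    """Decide whether `user` may commit to `path` in one shared repository.

    Assumed layout (illustrative): contributions/<type>/<project>/... for
    projects, contributions/sandbox/<user>/... for personal sandboxes.
    maintainers: project name -> set of usernames registered as maintainers.
    """
    parts = path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "contributions":
        return False
    area, name = parts[1], parts[2]
    if area == "sandbox":
        # Any account holder may commit, but only below their own sandbox directory.
        return name == user
    # Everywhere else the user must maintain the project the path belongs to.
    return user in maintainers.get(name, set())
```

With one repository per project, the path parsing disappears and the check reduces to a single membership test on that project's maintainers.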

Is it necessary?

Now, repository layouts work very differently in distributed version control: instead of one huge repository that everybody commits to, there are usually lots of small repositories, one per project. That largely makes point 1 obsolete: users don't have commit access to other projects anyway, because those live in entirely separate repositories. Users would probably also keep their sandbox as a separate repository, or, even more likely, have a separate repository for each of their sandbox projects.

As for point 2, branch and tag names need to be controlled in CVS because branches and tags exist repository-wide, so giving users more freedom there would cause chaos all over the repository. In distributed version control, however, branches and tags are always local to their project and can easily be deleted, too (so chaos would only arise from deliberate malice on the maintainer's part). The main reason for restricting branch and tag names therefore doesn't apply here.

Of course, branch and tag names are also used on drupal.org to determine how release tarballs should be named, and restricting names means one can rely on every branch or tag being transformable into a release name (and subsequently released). The folks on #git didn't consider that alone a good reason to disallow differently named branches and tags, and indeed, looser name control could open up more possibilities for developers, most prominently feature branches that can be developed in the open before being merged back into the master or stable branch. This wouldn't hurt other projects or developers; the only precondition is that the release scripts don't try to package everything, but only consider tags and branches with "release compatible" names.
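That precondition amounts to filtering rather than rejecting: anything that doesn't look like a release name is simply ignored by packaging. A minimal sketch, assuming CVS-era contrib patterns (DRUPAL-5 or DRUPAL-6--1 for branches, DRUPAL-6--1-2 with an optional ALPHA/BETA/RC/UNSTABLE suffix for tags); the suffix list and the function name packageable are illustrative assumptions, not the real release scripts:

```python
import re

# Assumed contrib conventions (illustrative, not exhaustive):
#   release branches: DRUPAL-5, DRUPAL-6--1
#   release tags:     DRUPAL-6--1-2, DRUPAL-6--2-0-BETA1
RELEASE_BRANCH = re.compile(r"^DRUPAL-\d+(?:--\d+)?$")
RELEASE_TAG = re.compile(r"^DRUPAL-\d+--\d+-\d+(?:-(?:ALPHA|BETA|RC|UNSTABLE)\d+)?$")


def packageable(branches, tags):
    """Return only the branches and tags the release scripts should consider.

    Feature branches and personal tags are not rejected, just skipped.
    """
    return (
        [b for b in branches if RELEASE_BRANCH.match(b)],
        [t for t in tags if RELEASE_TAG.match(t)],
    )
```

For example, a project with a feature-ajax-forms branch next to DRUPAL-6--1 would still only get DRUPAL-6--1 considered for development snapshots.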

Which leaves us with point 3 as the only remaining "valid" justification for access control, and the people on #git countered doing it for this purpose alone with a couple of points:

  • Stuff like style checks can be done locally in a commit hook instead of on the server when it receives the commits. Server-side checking would be a problem anyway, as a single "push" can contain many commits, and if, say, the first one is "invalid", then all the remaining locally committed changes have to be reverted and redone. Not good.
  • If people intentionally remove the commit hooks from their local repository copies, they don't deserve project maintainership, and their commit access should be revoked anyway.
  • Running unit or compilation tests on each commit would likely put a heavy load on the server and is better suited to a cron job (together with a "dashboard" that reports errors or other issues).
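The first bullet point can be sketched as a Git pre-commit hook, which runs locally before each commit and aborts it on a non-zero exit status. This is a minimal sketch: the two checks (no tabs, no trailing whitespace) are a crude stand-in for the Drupal coding standards, and the list of file extensions is an assumption. The git commands used (diff --cached --name-only --diff-filter=ACM, and show :<path> for the staged copy) are standard.

```python
import subprocess
import sys

# Assumed set of Drupal PHP file extensions to check.
CHECKED_EXTENSIONS = (".php", ".module", ".inc", ".install")


def style_problems(name, text):
    """Return crude coding-standard violations: tabs and trailing whitespace."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append(f"{name}:{number}: tab character (indent with two spaces)")
        if line != line.rstrip():
            problems.append(f"{name}:{number}: trailing whitespace")
    return problems


def git_output(*args):
    """Run a git command and return its standard output as text."""
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout


def run_hook():
    """Check the staged copy of each added/copied/modified file; 1 aborts the commit."""
    problems = []
    staged = git_output("diff", "--cached", "--name-only", "--diff-filter=ACM")
    for name in staged.splitlines():
        if name.endswith(CHECKED_EXTENSIONS):
            # ":<path>" names the version in the index, i.e. what will be committed.
            problems.extend(style_problems(name, git_output("show", f":{name}")))
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

To use it, save it as .git/hooks/pre-commit with a "#!/usr/bin/env python3" first line, make it executable, and end the file with "sys.exit(run_hook())". Note that "git commit --no-verify" skips this hook, which is exactly the case the second bullet point addresses.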

Conclusion

So, overall, the design and workflow of distributed VCS undercut pretty much all of the arguments for why those repositories would need access control scripts. drupal.org is already one of the most restricted repositories I know of, and if we can get by without these measures, anybody can. (Also keep in mind that admins and organizations who disagree could, of course, still do all of these access control checks; they just wouldn't be integrated with Drupal.)

Comments

Very interesting!

Posted by ezyang

This was a very interesting read for me. I have a few comments:

I don't know about Git, but Mercurial has "changegroup" and "prechangegroup" hooks, which handle all of the incoming commits as a whole. While this doesn't completely resolve the concerns in the first bullet point, it does make some things a little easier (for example, we can prevent floods of commit emails by bundling them per changegroup).
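Mercurial's changegroup hook runs once per push, pull or unbundle, with $HG_NODE set to the first new changeset, so a single notification can cover the whole group. A minimal sketch of just the bundling step, assuming the new changesets have already been read from the repository into a hypothetical Changeset record (the record and the function name are illustrative, not Mercurial API):

```python
from dataclasses import dataclass


@dataclass
class Changeset:
    node: str         # short changeset hash
    author: str
    description: str  # full commit message


def changegroup_email(project, changesets):
    """Build one (subject, body) notification for a whole changegroup,
    instead of one email per changeset."""
    if not changesets:
        raise ValueError("empty changegroup")
    count = len(changesets)
    subject = f"[{project}] {count} new changeset{'s' if count != 1 else ''}"
    lines = []
    for cs in changesets:
        # Only the summary line of each commit message goes into the digest.
        summary = cs.description.splitlines()[0] if cs.description else ""
        lines.append(f"{cs.node} {cs.author}: {summary}")
    return subject, "\n".join(lines)
```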

In the end, I think implementing hooks, but for local repositories, would still be a good idea. For instance, Drupal has very specific branching and tagging conventions, and it'd be nice if a module maintainer could get instant feedback if they did something wrong.

I don't buy it for branch/tag names ;)

Posted by dww

It's not just for the sake of the packaging scripts or release node integration (although that's a big part of it). Another major concern here is preventing maintainer error; yet another is predictability. End users have to have some way to know what they're looking for and where to find it. If we allowed total freedom for branch and tag names, the ensuing chaos would turn the Drupal contributions landscape into a swamp of unusable mayhem. Look at the hell that broke loose when I forgot about the bizarro backdoor way to create branches and tags:

http://drupal.org/node/152832
http://drupal.org/node/198278

Furthermore, being able to have a clear 1:1 relationship between tags and versions is essential for functionality like the CVS deploy module, the drush package manager (think "apt-get" for Drupal), and more.
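A 1:1 relationship means the mapping can be inverted: every release tag yields exactly one version string, and that version leads back to exactly one tag. A minimal sketch, assuming the contrib conventions DRUPAL-6--1-2 to 6.x-1.2 and DRUPAL-6--2-0-BETA1 to 6.x-2.0-beta1; the allowed suffixes and function names are illustrative assumptions:

```python
import re

# Assumed conventions: tag DRUPAL-6--1-2 <-> version 6.x-1.2,
# tag DRUPAL-6--2-0-BETA1 <-> version 6.x-2.0-beta1.
TAG_RE = re.compile(r"^DRUPAL-(\d+)--(\d+)-(\d+)(?:-(ALPHA|BETA|RC|UNSTABLE)(\d+))?$")
VERSION_RE = re.compile(r"^(\d+)\.x-(\d+)\.(\d+)(?:-(alpha|beta|rc|unstable)(\d+))?$")


def tag_to_version(tag):
    """Turn a release tag into the version string its release would carry."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not a release tag: {tag!r}")
    core, major, patch, extra, extra_num = m.groups()
    version = f"{core}.x-{major}.{patch}"
    if extra:
        version += f"-{extra.lower()}{extra_num}"
    return version


def version_to_tag(version):
    """Inverse of tag_to_version: find the tag a given version was built from."""
    m = VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"not a release version: {version!r}")
    core, major, patch, extra, extra_num = m.groups()
    tag = f"DRUPAL-{core}--{major}-{patch}"
    if extra:
        tag += f"-{extra.upper()}{extra_num}"
    return tag
```

A tool that knows an installed version can then locate the exact tag in the repository without guessing, which breaks down as soon as tag names are free-form.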

Plus, given how little experience people have with version control and release management in general, some strict guidelines to remind people "no, that's a tag, you can't call your branch that" (and vice versa) are quite helpful.
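Such a reminder could be a small validator that recognizes a tag-shaped name used for a branch (or the reverse) and says so, rather than just failing. A minimal sketch, assuming CVS-era contrib patterns (DRUPAL-6--1 for branches, DRUPAL-6--1-0 with an optional ALPHA/BETA/RC/UNSTABLE suffix for tags); the function name and message wording are hypothetical:

```python
import re
from typing import Optional

BRANCH = re.compile(r"^DRUPAL-\d+(?:--\d+)?$")
TAG = re.compile(r"^DRUPAL-\d+--\d+-\d+(?:-(?:ALPHA|BETA|RC|UNSTABLE)\d+)?$")


def naming_error(kind: str, name: str) -> Optional[str]:
    """Return a helpful error for a misnamed branch or tag, or None if it is fine."""
    if kind == "branch":
        if BRANCH.match(name):
            return None
        if TAG.match(name):
            return f"{name} is a release tag name; branches look like DRUPAL-6--1"
        return f"{name} is not a valid branch name; branches look like DRUPAL-6--1"
    if kind == "tag":
        if TAG.match(name):
            return None
        if BRANCH.match(name):
            return f"{name} is a branch name; release tags look like DRUPAL-6--1-0"
        return f"{name} is not a valid release tag; release tags look like DRUPAL-6--1-0"
    raise ValueError(f"unknown ref kind: {kind!r}")
```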

So, overall, especially in the Drupal contrib world (since that seems to be what you're talking about), I'm not convinced we can do without, at the very least, validation of branch and tag names.

That said, it would be nice to allow more advanced users the ability to add other kinds of tags besides release tags, and it would be slick to allow other branches besides the "official" release series branches. But, at this point, there's so much confusion and trouble created even in our relatively simple and fixed world, I don't know how we'd allow that power more widely without it being abused (unintentionally) to make a huge mess.

The access control itself still makes sense; it just becomes a different problem. If you have co-maintainers on a given project, you'd like to be able to specify that in the project node admin UI like you can now, and have an automated system add the SSH keys or whatever to the repository's configuration. We're not going to want to manage that stuff by hand.
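Such an automated system might regenerate the server's authorized_keys file whenever a project's maintainer list changes. A minimal sketch, assuming a setup in the spirit of tools like gitosis, where every key is locked to a forced command that receives the username and enforces per-repository write access. The command path /usr/local/bin/vcs-serve and the function name are hypothetical; command=, no-port-forwarding, no-X11-forwarding, no-agent-forwarding and no-pty are standard OpenSSH authorized_keys options.

```python
# Hypothetical forced command that serves VCS requests and enforces the access map.
SERVE_COMMAND = "/usr/local/bin/vcs-serve"
KEY_OPTIONS = "no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty"


def build_access(projects, ssh_keys):
    """Derive SSH access from project maintainer lists.

    projects: project name -> list of maintainer usernames (as on the project node)
    ssh_keys: username -> public key line, e.g. "ssh-ed25519 AAAA... alice@laptop"
    Returns (authorized_keys text, username -> set of writable repositories).
    """
    access = {}
    for project, maintainers in projects.items():
        for user in maintainers:
            access.setdefault(user, set()).add(project)
    lines = []
    for user in sorted(access):
        key = ssh_keys.get(user)
        if key is None:
            continue  # this maintainer has not uploaded a key yet
        lines.append(f'command="{SERVE_COMMAND} {user}",{KEY_OPTIONS} {key.strip()}')
    return "".join(line + "\n" for line in lines), access
```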

I agree it'd be a bad idea to run automatic checks on code style or unit tests at commit time, but we don't do that now, so it's sort of a red herring for your comparison.

Anyway, thanks for the write-up... definitely food for thought.

hm, yah

Posted by jpetso

Ok, so we still need branch and tag checks. At least, those are not too hard on the user as branches and tags can be easily renamed into the right thing.

For long-term suitability to distributed VCS, we might want to enable multiple repositories per project, with one of them holding the stable branch and the others being the development branches of different developers. The former could keep the tight grip on tag and branch names, while the latter would be less visible but freely modifiable by their developer. (Launchpad.net has the "multiple repositories per project" feature today, and it's certainly useful for having all developers and code in one place.)
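The split could be expressed as a per-repository role that decides whether naming rules apply at all. A minimal sketch, assuming two hypothetical roles and CVS-era naming patterns; RepoRole and ref_allowed are illustrative names, and ref names use Git's refs/heads/ and refs/tags/ prefixes:

```python
import re
from enum import Enum


class RepoRole(Enum):
    OFFICIAL = "official"    # stable repository that releases are packaged from
    DEVELOPER = "developer"  # a developer's own working repository for the project


RELEASE_BRANCH = re.compile(r"^DRUPAL-\d+(?:--\d+)?$")
RELEASE_TAG = re.compile(r"^DRUPAL-\d+--\d+-\d+(?:-(?:ALPHA|BETA|RC|UNSTABLE)\d+)?$")


def ref_allowed(role, ref):
    """Strict names in the official repository, anything goes in developer ones."""
    if role is RepoRole.DEVELOPER:
        return True
    if ref.startswith("refs/heads/"):
        return bool(RELEASE_BRANCH.match(ref[len("refs/heads/"):]))
    if ref.startswith("refs/tags/"):
        return bool(RELEASE_TAG.match(ref[len("refs/tags/"):]))
    return False
```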

This is a fun discussion

Posted by mikey_p

So today, while on another totally different whim, I happened to install Git, Mercurial and Bazaar and started reading http://hgbook.red-bean.com/hgbook.html and http://betterexplained.com/articles/intro-to-distributed-version-control..., and I watched http://www.youtube.com/watch?v=4XpnKHJAok8 as well.

Anyway, based on my initial assessments today, I came to see the concept of commit restrictions as a rather silly idea. It's MY repository, so why would you want to restrict my commits, tags, or branches? After all, you aren't going to be building releases from my repository, and I can't force anyone to pull an oddly named branch either.

The main concern I see is keeping whichever repository our releases are built from free of extraneous branches and tags, but even then, most DVCSs make it easy to branch to a new, appropriately named branch and then delete the original without losing file history. So if we allow contributors to push to a central repo on hg.d.o or git.d.o, then we need access checks there to prevent bad tags and branches, so that a bad tag can't be used to publish a release.

Seems to me like there are a lot of ways we could try to set something up with a distributed system, and a lot of more pressing questions than restricting branches and tags in users' repos. Maybe this is just a semantic difference I'm picking up on, and we're really thinking the same thing. It's getting late here; I'll have another look at this tomorrow and hit IRC.

-Mike

P.S. One significant concern I have with Git, which I looked at first today before moving on to Bazaar and Mercurial, is the lack of a good Windows client (at least that's my understanding; I neither use, nor have a use for, any form of Windows). However, for my purposes, and for Drupal as well, that would be a rather large problem. On the other hand, TortoiseHg exists.

all your branch belong to us, harharhar

Posted by jpetso

The thing is, it's not only YOUR repository, but it's also supposed to be used by hundreds of other developers in the Drupal world. Maybe not as project maintainer, maybe also not as patch provider, but at least they are going to download and test your project and maybe roll it out on their site from the version in the repository. dww's point is that while we don't want to artificially restrict project maintainers, we also want to reduce errors caused by inexperienced or lazy maintainers, plus the existence of such checks can make the whole ecosystem very predictable - say, you're an admin, and you can rely on the fact that each version that is compatible with Drupal 6.x will be named DRUPAL-6--*, and that the official repositories are not stuffed with working branches that are differently named in each project. More consistency, so to say.

So, in your private repository you can have any branches and tags that you like, we'll just be restricting the ones that are being pushed to the server, like, say, drupal.org. Optional setting for the admin, too. It's not quite DVCS style to handle it like that, and yes, there are more pressing concerns than this, but if it were to be rolled out on drupal.org then we'd need those restrictions. I don't like it either, but dww certainly demonstrated the need for having those.
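Restricting only what gets pushed to the server maps naturally onto Git's pre-receive hook on the server repository: Git feeds it one "<old> <new> <ref>" line per updated ref on standard input and rejects the entire push if the hook exits non-zero. A minimal sketch, assuming CVS-era naming patterns and assuming "master" stands in for CVS HEAD; deletions are allowed so misnamed refs can be cleaned up. The helper names are hypothetical.

```python
import re
import sys

BRANCH_RE = re.compile(r"^DRUPAL-\d+(?:--\d+)?$")
TAG_RE = re.compile(r"^DRUPAL-\d+--\d+-\d+(?:-(?:ALPHA|BETA|RC|UNSTABLE)\d+)?$")
# Assumption: "master" plays the role of CVS HEAD in a Git-based setup.
EXTRA_BRANCHES = {"master"}


def is_zero(oid):
    """Git uses an all-zero object name for 'ref does not exist' (create/delete)."""
    return set(oid) == {"0"}


def check_update(old, new, ref):
    """Return an error message for a disallowed ref update, or None if it is fine.

    `old` is unused here but kept to mirror the hook's input format.
    """
    if is_zero(new):
        return None  # deleting a ref (e.g. a misnamed one) is allowed in this sketch
    if ref.startswith("refs/heads/"):
        name = ref[len("refs/heads/"):]
        if name in EXTRA_BRANCHES or BRANCH_RE.match(name):
            return None
        return f"branch {name!r} does not look like DRUPAL-6--1"
    if ref.startswith("refs/tags/"):
        name = ref[len("refs/tags/"):]
        if TAG_RE.match(name):
            return None
        return f"tag {name!r} does not look like DRUPAL-6--1-0"
    return f"{ref!r} is not a branch or tag"


def check_push(lines):
    """Check every '<old> <new> <ref>' line that pre-receive gets on stdin."""
    errors = []
    for line in lines:
        if line.strip():
            old, new, ref = line.split()
            error = check_update(old, new, ref)
            if error:
                errors.append(error)
    return errors


def run_hook():
    errors = check_push(sys.stdin)
    for error in errors:
        print(f"rejected: {error}", file=sys.stderr)
    return 1 if errors else 0
```

Installed as hooks/pre-receive in the server-side repository (with a "#!/usr/bin/env python3" line and a final "sys.exit(run_hook())"), it leaves local repositories completely unrestricted, and the stderr output shows up on the pushing developer's terminal. Making it an optional admin setting would just mean not installing it.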

I don't like it either, either. ;)

Posted by dww

Note: in an ideal world, I wouldn't want this either. If we could trust every d.o code contributor to read the docs, understand the conventions, and get this right, I'd be thrilled. Life says otherwise.

I think we're in agreement

Posted by mikey_p

I think we're on the same page here:

"So, in your private repository you can have any branches and tags that you like, we'll just be restricting the ones that are being pushed to the server, like, say, drupal.org."

As I went to bed last night, I got a pretty good idea of how this could work. Each user has their own repo for each of their projects, and d.o has a repo for each project. The repo on d.o could maybe even have the branches already set up for each project (a DRUPAL-6--1 or DRUPAL-5 branch) and either disallow the creation of other branches or run appropriate checks when new branches are created. It would then be up to each developer to push from their stupid-content-tracker branch to d.o into, say, the DRUPAL-6--1 branch.
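With Git, that push would be something like "git push origin stupid-content-tracker:DRUPAL-6--1" (assuming origin points at the d.o repository), which updates the existing server branch from the local one. On the server side, the "pre-created branches only" idea becomes a check on ref creation: updates to existing branches pass, while creating a branch requires an admin-approved name. A minimal sketch; check_branch_update and the approved-names set are hypothetical:

```python
def is_zero(oid):
    """All-zero object names mean the ref did not exist before (or won't after)."""
    return set(oid) == {"0"}


def check_branch_update(old, new, ref, approved_new_branches):
    """Allow pushes into existing branches; gate the creation of new ones.

    approved_new_branches: hypothetical admin-approved names, e.g. {"DRUPAL-7--1"}.
    Returns an error message, or None if the update is acceptable.
    """
    if not ref.startswith("refs/heads/"):
        return None  # tags are left to a separate check in this sketch
    name = ref[len("refs/heads/"):]
    creating = is_zero(old) and not is_zero(new)
    if creating and name not in approved_new_branches:
        return (f"creating branch {name!r} is not allowed; "
                "push into an existing branch such as DRUPAL-6--1")
    return None
```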

Believe me, I have the utmost respect for our current repo maintainers and don't want to create more work for them, but I do believe that fixing errors such as the erroneous branch names that popped up will be easier with DVCS.