User authentication in non-CVS repositories

Posted by jpetso on January 29, 2008 at 4:57pm

How to grant access to repositories at all is an important issue, and it potentially comes with a slight regression compared to the current workflow for managing CVS accounts. One advantage of CVS is that it's easy to administer: in terms of user accounts, all it takes is a simple "passwd" file that contains all usernames that are allowed to commit. Dead easy to generate, and at least possible to keep in sync even in an automated way. However, more recent version control systems are less nice to handle, partly because they keep a better eye on security concerns. An overview.

Options for the administrator

Compared to the straightforward approach of CVS, pretty much every other version control system has a multitude of ways to access the repository. Subversion and all relevant distributed version control systems offer SSH and HTTP(S)/WebDAV as the proposed solutions, and maybe also a custom protocol (svn://, git://) for quick private projects. Depending on the access method, that requires one or more of the following configuration options:

For the custom protocols, it's relatively easy. The git:// protocol doesn't provide any authentication at all (so it should only be used read-only), and Subversion has a simple text file that specifies users and their passwords. However, this requires passwords to be stored as plaintext (baaad!), so it's highly unsafe and not recommended either. Might be the only choice for servers with many restrictions, though.
HTTP(S)/WebDAV requires configuration of Apache, which in turn can involve any authentication mechanism that Apache supports. The most widely used of these mechanisms is definitely an .htpasswd file... so that too would involve only one file to be maintained for user account management. This approach is also the preferred solution for Subversion, which additionally supports a more detailed access control list that is stored in a separate file and controls which directories may be accessed by which user, and how.
SSH (the preferred way of authentication for all the distributed VCS) is arguably the most difficult authentication method to administer in an automated fashion, as it normally involves creating real system users on the server and assigning them the right privileges. When taking this approach, one would also need a unique group for each repository so that different project maintainers can write to the same repository. I imagine this would be insanely hard to manage and to integrate with Drupal.
As a further alternative when using SSH, it's also possible to collect users' SSH public keys instead of creating new system accounts and granting them shell access. There's a bit of effort involved in setting this up ("Shared SSH" docs for Mercurial, Gitosis daemon for Git) but once it's up and running, this should be easier on the administrator than the real system accounts.
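
To give an idea of how little is involved in the .htpasswd option above, here is a minimal sketch in Python. It assumes that the plaintext password is available at the moment the file is written (for example when the user sets it), and it uses Apache's {SHA} scheme only because that can be produced with the standard library. {SHA} is unsalted SHA-1 and weak, so a real setup would rather pick a stronger scheme. The function names are made up for illustration and are not part of the Version Control API.

```python
import base64
import hashlib
from typing import Mapping


def htpasswd_line(username: str, password: str) -> str:
    """Return one .htpasswd entry in Apache's {SHA} format (unsalted SHA-1)."""
    # A colon separates the fields, so it must not appear in the username.
    if not username or ":" in username or "\n" in username:
        raise ValueError("invalid username for .htpasswd: %r" % username)
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return "%s:{SHA}%s" % (username, encoded)


def render_htpasswd(accounts: Mapping[str, str]) -> str:
    """Render the whole file, one sorted entry per line."""
    return "".join(
        htpasswd_line(name, password) + "\n"
        for name, password in sorted(accounts.items())
    )
```

A cron-called script could then write the result of render_htpasswd() to wherever Apache's AuthUserFile directive points.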

Options for the Version Control API

As you can see, that's loads of possible ways to set up user authentication. I'm asking myself whether we should
a) try to support them all, and have a switch in each repository configuration that tells us which one to use (the most work-intensive of all options), or
b) just try to support the ones that involve one or two files at most, which can be generated automatically by a cronjob-called script, or
c) write up a howto that exactly explains how the admin has to set everything up, and depend on this, or
d) leave all of this crap to the site administrator and don't interfere with authentication at all - just provide the means to associate accounts with committer names and be done with it (current state of the non-CVS backends for Version Control API).

One more thing that might need further thought is the current per-repository view on accounts: with the SSH approach involving system users, it's rather one user account on the whole server that enables access to any number of repositories. So instead of applying to each repository separately, there would be just one "SSH access" tab in user/*/edit instead of a "[repository] access" tab for each repo. And for distributed VCS, the "Commit access" tab on a project would not let one more directory through the access check, but rather add the user's system account to the user group of the repository, or remove it from that group.

The discomforting thing is that there are so many possible combinations, with some behaviour depending on how they're combined, and that many of the differences are not even consistent across different setups of the same VCS (or backend, for that matter). I can't tell the answer to all of that, I just wanted to have everything written up in one place so that people with good ideas have a grasp of the scope of this issue. As always, input is highly appreciated.

Comments

Apache

Posted by aclight on January 29, 2008 at 5:43pm

My recommendation would be to go with Apache and .htaccess/.htpasswd files. Since that's a supported method for all relevant RCS systems, and it's probably the easiest to get set up and the most likely method to be supported by shared web hosts (though even then it's probably not that widely supported), I think that's the way to go. It's also pretty simple.

I'm curious to know why SSH is the preferred method for the distributed RCS systems. Is that because there is not necessarily a need for a web server on the same machine as the RCS system?

Thanks for this in depth comparison. This will be very useful as we go forward on this.

Preferred method

Posted by jpetso on January 29, 2008 at 6:38pm

"I'm curious to know why SSH is the preferred method for the distributed RCS systems. Is that because there is not necessarily a need for a web server on the same machine as the RCS system?"

I guess it's a mixture of personal preference of the (hard-core Unix command line) developers, and reducing dependencies to a minimum. Another good reason might be the workflow itself: public repositories of a distributed VCS are often hosted on a server that belongs to the developer, and there's likely an SSH account available already (while setting up Apache would require memory and more work).

I also don't think that people operating on shared web hosts should be a major influence on decisions regarding the Version Control API - yeah, you're on a shared host that lets you run Subversion, but I would assume the number of people who are able to run version control systems (especially distributed ones) and yet not able to administer the whole system is very small. "Pretty simple" and "easy to set up" is a good argument... we might just distinguish between .htpasswd and a "Don't interfere with user authentication" option.

Shared hosts

Posted by aclight on January 29, 2008 at 7:58pm

"I also don't think that people operating on shared web hosts should be a major influence on decisions regarding the Version Control API - yeah, you're on a shared host that lets you run Subversion, but I would assume the number of people who are able to run version control systems (especially distributed ones) and yet not able to administer the whole system is very small. "Pretty simple" and "easy to set up" is a good argument... we might just distinguish between .htpasswd and a "Don't interfere with user authentication" option."

Actually, the shared host I use allows subversion and svnserve, but is not yet running Apache 2.0, so it's not possible for me to use SVN via Apache. But I think Dreamhost might allow this (they allow Subversion, at least), as well as some others.

But your point is certainly valid--we probably don't need to concern ourselves with whether shared hosting accounts will be able to work with the system we're creating, but it doesn't hurt.

As for our GHOP tasks I don't think we should require anything more fancy than supporting .htpasswd syncing with a password stored in the Drupal DB.

About SSH

Posted by ezyang on January 30, 2008 at 3:04am

This is speaking from my experience managing a Subversion repository with SSH + svnserve. This is similar to what is detailed in "As a further alternative when using SSH". The workflow would boil down to this:

When a user creates an account, they need to submit an SSH public key to authenticate their account with.
Drupal must have write access to authorized_keys for some user with SSH access.
Drupal adds the authorized key to this file, prepending it with the command parameter that invokes svnserve and assigns that user's username to the key. It looks like this:
command="/usr/bin/svnserve -t --tunnel-user=username -r /home/drupal/svnroot" ssh-rsa AAAAB3Nz... [rest of key]
The user now uses svn+ssh:// to access the repository; they, however, do NOT have shell access.
Only one SSH account for Subversion users is necessary, as it is possible to assign a different username for every SSH key. For convenience, this would be the same account the web server is running as; ideally it would be its own, empty account, which runs a cron job to update its authorized_keys file.
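
The authorized_keys step above could be sketched roughly like this in Python. The svnserve path and repository root are taken from the example line; the function name, the validation rules and the extra no-* restrictions (standard authorized_keys options that switch off forwarding and terminal allocation) are my own assumptions, not something Drupal provides.

```python
import re

SVNSERVE = "/usr/bin/svnserve"      # path from the example above
SVN_ROOT = "/home/drupal/svnroot"   # repository root from the example above

# Conservative patterns so nothing can break out of the quoted command.
USERNAME_RE = re.compile(r"^[A-Za-z0-9._-]+$")
KEY_TYPE_RE = re.compile(r"^[a-z0-9@.-]+$")
KEY_DATA_RE = re.compile(r"^[A-Za-z0-9+/]+=*$")

RESTRICTIONS = "no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty"


def authorized_keys_line(username: str, public_key: str) -> str:
    """Build one authorized_keys entry that forces svnserve for this key."""
    if not USERNAME_RE.match(username):
        raise ValueError("unsafe username: %r" % username)
    parts = public_key.split()
    if len(parts) < 2:
        raise ValueError("public key must have a type and key data")
    key_type, key_data = parts[0], parts[1]
    if not KEY_TYPE_RE.match(key_type) or not KEY_DATA_RE.match(key_data):
        raise ValueError("malformed public key")
    command = "%s -t --tunnel-user=%s -r %s" % (SVNSERVE, username, SVN_ROOT)
    # The user-supplied key comment is dropped and replaced by the username.
    return 'command="%s",%s %s %s %s' % (
        command, RESTRICTIONS, key_type, key_data, username)
```

Dropping the submitted key comment and validating every field keeps user input from injecting additional options or lines into the file.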

We'd have to train users how to use Pageant (PuTTY's SSH agent) in order to use their SSH key. Also, the server has to support SSH and arbitrary users (not easy if you're on shared hosting). But the benefits accorded by SSH, especially in terms of security and account management, are very significant. Also, much of this could easily be automated.

...getting nicer

Posted by jpetso on January 30, 2008 at 8:51am

So you can do the same thing with SVN as well... nice. That means we've got the same two methods (.htaccess/.htpasswd, and SSH keys) working the exact same way for three different version control systems, and both work by just writing files to some fixed location. Suddenly, it doesn't look so impossible to implement anymore :D

Seems like we really want to concentrate on those two authentication methods. Now the question is just how this can be abstracted out so every backend can easily make use of it.
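
One possible shape for that abstraction, as a rough Python sketch: since both methods boil down to writing a file to a fixed location, a backend would only have to supply a function that renders one line per account, and a shared helper would replace the target file atomically so that Apache or sshd never reads a half-written file. All names here are hypothetical, nothing like this exists in the Version Control API.

```python
import os
import tempfile
from typing import Callable, Iterable


def write_atomically(path: str, content: str, mode: int = 0o600) -> None:
    """Write content to a temp file in the same directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        os.chmod(tmp_path, mode)
        # os.replace swaps the file in one step on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def export_accounts(path: str, accounts: Iterable[str],
                    render_line: Callable[[str], str]) -> None:
    """Render one line per account with a backend-specific function."""
    content = "".join(render_line(account) + "\n" for account in accounts)
    write_atomically(path, content)
```

A backend that uses .htpasswd and one that uses authorized_keys would then differ only in the render_line function they pass in.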

subversion + apache/dav + authz_svn is a good option

Posted by jandd on August 6, 2008 at 7:54pm

We have used the combination of Subversion, Apache dav_svn and authz_svn files for about four years now. We authenticate against a company-wide LDAP user directory and grant read and/or write access to projects, or even only to modules of these projects, to individual users or groups of users via authz_svn. This combination works very reliably and needs minimal administrative effort.

The authz_svn file could easily be generated. If another user authentication method is required (e.g. a passwd-like file), all available Apache mechanisms can be used. I'm willing to help if there is a need for svn.
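
As a rough illustration of generating such a file, the following Python sketch renders the authz format (a [groups] section followed by one section per path, with r, rw or an empty value as access rights). The function name and the shape of the input data are assumptions made for this example.

```python
from typing import Mapping, Sequence


def render_authz(groups: Mapping[str, Sequence[str]],
                 rules: Mapping[str, Mapping[str, str]]) -> str:
    """Render an authz file from group membership and per-path rules.

    `rules` maps a section name such as "project:/trunk" to a mapping of
    principal (user name, "@group" or "*") to "r", "rw" or "" (no access).
    """
    lines = ["[groups]"]
    for group, members in sorted(groups.items()):
        lines.append("%s = %s" % (group, ", ".join(members)))
    for section, access in sorted(rules.items()):
        lines.append("")
        lines.append("[%s]" % section)
        for principal, rights in access.items():
            if rights not in ("", "r", "rw"):
                raise ValueError("invalid access rights: %r" % rights)
            lines.append("%s = %s" % (principal, rights))
    return "\n".join(lines) + "\n"
```

For example, render_authz({"maintainers": ["alice", "bob"]}, {"project:/trunk": {"@maintainers": "rw", "*": "r"}}) gives the maintainers group write access to /trunk of the repository named "project" and everyone else read access.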

Jan
