Personal sandboxes/repos/branches for issues in Git


We all agree that we want some sort of github-like fork/clone to participate in a project with git.
In other words, for a given project and a given issue, we want every user who has submitted an SSH key to be able to contribute.

What we haven't decided is how to do that; there are several ways.

One repository per project, per user

Each user gets a clone of the main repo hosted on d.o where she can do whatever she wants to.

When posting to an issue in the project, she can decide whether she wants to just comment, attach a patch (as usual), or get a patch from her personal repo. For the latter, she would select the branch which contains the commits that should turn into a patch. The posted comment then offers a download of the patch and a view of the selected branch, and it must record the commit ID.

To merge the changes of the patch, the project maintainers could click on a link called 'merge', which would in turn merge all the commits from the personal branch into the corresponding branch of the project. We would need to take care here: at the time of the post, the branch points at one commit, but by the time of the merge it may point at newer commits that we don't want because they haven't been reviewed. The merge should therefore use the commit ID recorded in the post, not the branch name. We also need a display showing whether the personal branch merges cleanly into the project's branch.
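The pinning concern can be sketched with a deliberately simplified model in which a linear branch is just its list of commit IDs, oldest first. The function name and the list representation are hypothetical. A real implementation would ask Git for the history instead.

```python
from typing import List


def commits_to_merge(branch_log: List[str], base: str, pinned: str) -> List[str]:
    """Commits after `base` up to and including `pinned`, oldest first.

    `branch_log` is a toy, linear model of the personal branch. Anything the
    user committed after `pinned` (the commit ID recorded in the issue post)
    is left out, because nobody has reviewed it yet.
    """
    if base not in branch_log or pinned not in branch_log:
        raise ValueError("base and pinned commit must both be on the branch")
    start = branch_log.index(base) + 1
    end = branch_log.index(pinned) + 1
    if end < start:
        raise ValueError("pinned commit is older than the merge base")
    return branch_log[start:end]


# The issue was posted when the branch ended at "c3"; "d4" came later.
log = ["a1", "b2", "c3", "d4"]
print(commits_to_merge(log, "a1", "c3"))  # ['b2', 'c3']
```

Merging by the recorded ID, not the moving branch name, is what keeps "d4" out.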

A simpler alternative is simply to have the user submit a range from one tag to another as part of the post in the issue queue. The range of tags should then promptly be turned into a patch file, which would be the definitive representation of the patch and which would be tested by the buildbot, commented on or downloaded by other coders (particularly the ones who are not Git users) and eventually applied and committed by the project maintainer upon final acceptance.
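Below is a rough sketch of the "turn the range into a patch file" step, using Python's standard `difflib`. It assumes the bot already has the file contents at both tags. The `old`/`new` snapshots are hypothetical inputs, and in practice Git itself (for example `git diff TAG1 TAG2`) would produce the patch directly. Paths carry no `a/` and `b/` prefixes, matching the `patch -p0` convention used in the issue queue.

```python
import difflib
from typing import Dict

# path -> file contents at one tag (hypothetical input; assumes every file ends with "\n")
Snapshot = Dict[str, str]


def patch_between(old: Snapshot, new: Snapshot) -> str:
    """Return a unified diff that turns the `old` snapshot into the `new` one."""
    chunks = []
    for path in sorted(set(old) | set(new)):
        before = old.get(path, "").splitlines(keepends=True)
        after = new.get(path, "").splitlines(keepends=True)
        if before != after:
            # Same name on both sides, so the result applies with `patch -p0`.
            chunks.extend(difflib.unified_diff(before, after, fromfile=path, tofile=path))
    return "".join(chunks)
```

The generated text is what the buildbot would test and what reviewers without Git would download.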

Pro:

Lines of responsibility are 100% clear. You own your repo. Other people own their repos. Official repos are owned by maintainers. You can do whatever you want in your repo without fear of stepping on someone else's toes.

Likewise, permissions are clear. You have permission to push to your repo; nobody else does. Collaboration is via pulls or posted patches, not pushes or commits.

Ease of implementation: Because a public Git repo can live behind any URL on the net, and because there is no need to give multiple people write access to shared repos, this system can be set up with a minimum of work. We needn't even give users official Drupal-hosted repos -- users can submit URLs and branch/tag names for any public repo they have, hosted wherever they like. The first version of this can go live tomorrow -- just invite users to submit URLs and branch names together with the typical patches that they post to the issue queue. Then, at a later stage, the testbot can be augmented with the ability to fetch commits from a remote repo and generate patches from them.

Con:

When there are a lot of users collaborating, they don't see the work done by others, and they don't have one central place to work in, so work could be easily done twice. (Though this is going to be a problem anyway, unless we assume a 100% Git adoption rate among Drupal developers.)

It may be unclear from whom the project maintainers should merge. (This can presumably be resolved, as usual, in the issue queue thread. --mikebooth)

(A solution would be to allow other users to commit to an issue-specific branch in a personal repo, but that doesn't feel nice to me -- CorniI).
(I agree: giving users personal branches and then letting other users commit to those branches is overcomplicated, and will create security and ownership issues. --mikebooth)

One repository per project, per issue

Each issue gets a repository in which all collaborating users can commit. They either can work in personal branches (personal-[user]) or in a main branch. When they comment, they can, as above, choose a branch for which a patch should be attached.

Pro:

Everyone can work together and it's clear where to look for patches. (NOTE: This is also true for branch-per-issue, described below.)

When there are two users fighting about an approach, they easily can use a personal branch in that repo to develop 'their' version of a patch instead of collaborating with the others, if they choose so. (NOTE: This is also true for "one repo per project per user", described above.)

Con:

Disk space? As far as I (CorniI) remember, disk space is not an issue, since another hard drive can be bought if necessary (this refers to an IRC discussion held maybe half a year ago). (mikebooth: Redundant use of disk space can perhaps be avoided, though not without some careful work. See how Github saves space.)

Unclear lines of responsibility. Should you make a commit on the end of the branch where someone else is working? Or would that be rude? Maybe it is always better to work on your own branch. But that creates additional complexity.

Complexity. If you thought reading a very long issue in the queue was a lot of work, just wait until that long linear issue becomes a tree.

(Of course, in practice, nobody will ever look through every branch of the issue repository. Instead, attention will be focused via the issue queue. Only a handful of patches will actually be debated in the queue, and eventually only one patch will be committed. But, given that this is true, do we really need such a complex infrastructure to support many parallel branches?)

Security vs Access: Who gets permission to commit to the repo? If we give it to everyone, spammers and griefers can trash our work. If we don't give it to everyone, we create a caste system: Those who have Git knowledge and commit privileges will have more influence than those who haven't applied for permission or who choose to use a different VCS. No matter what option we choose, adjudicating the resulting system will take up a lot of someone's time.

Consistency. Each per-issue repo will presumably be a clone of the parent project's official repo, made at some point in time. After that, every time official commits are made to the official repos, all the child repositories need to be updated with git pull. So there will need to be a lot of cron jobs and complexity... or perhaps users will update the repos by hand from time to time, as needed.

One repository per project, one branch per issue

Each issue gets a branch in the main project repo, to which all collaborating users have access.

Pro:

One centralized place for collaborating for a given issue.

Takes up less space than the above solutions.

Con:

If there are two approaches to solving a problem, or commit fights, this isn't a nice solution, because we force two users with competing approaches into one branch.
The online repository viewer will also show a very long list of branches; imagine a branch for every Drupal-related issue ;)

Both branch-per-issue and repo-per-issue have the problem that a malicious user can chime in, cause trouble and hinder collaboration. We need either a process that determines whether a given user is allowed to commit to such issue-specific branches/repos, or a fast way to ban such users.

(Many of the other cons of the repo-per-issue approach also apply to this one.)

Please feel free to add your own arguments, make this prettier, fix my mistakes, etc!

Comments

Per user / Per project

I think the "Per-User-per-project repositories" makes the most sense. We can provide a "branch from this point" to ease collaboration between several people working on the same patch.

Damien Tournoud

+ 10000000

EclipseGc

I'm basically on board for this particular use case. The downside is we'll need quite a bit of disk space; the upside is we have ultimate flexibility.

Thoughts:

Each user needs the ability to have a clean repo to work on any given issue. Thus we're probably looking at a repo structure like this:

users/[user-name]/[project-name]/[issue-nid].git
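The proposed layout can be made concrete with a small path builder. This is a sketch. The allowed character set is an assumption chosen so that no component can escape the `users/` directory. Real drupal.org user names may contain spaces and would need some mapping, for example to numeric user IDs.

```python
import re

# Conservative, hypothetical rule for path components (no "/", no spaces).
_SAFE_COMPONENT = re.compile(r"[A-Za-z0-9_.-]+")


def issue_repo_path(user: str, project: str, issue_nid: int) -> str:
    """Return users/[user-name]/[project-name]/[issue-nid].git for one issue."""
    for part in (user, project):
        if part in (".", "..") or not _SAFE_COMPONENT.fullmatch(part):
            raise ValueError(f"unsafe path component: {part!r}")
    if issue_nid <= 0:
        raise ValueError("issue node IDs are positive integers")
    return f"users/{user}/{project}/{issue_nid}.git"


# 12345 is a made-up issue node ID.
print(issue_repo_path("eclipsegc", "views", 12345))  # users/eclipsegc/views/12345.git
```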

The user will, from that repo have the ability to add additional collaborators, delete old repos, spawn new (non-issue related) repos, etc. (I will try to provide a mockup today)

Additionally, the project maintainer(s) will have a screen from which they can see the pull requests for their project (i.e. contributors asking the maintainer to merge code from their repo back into the main one). Once a maintainer has pulled code from a repo, the issue would be marked fixed (or something), and we could schedule an automated cleanup of all repos/tarballs created for the issue (scheduled for later, so that contributors can grab their code if they're so inclined).

I spent a bit of time on a mockup for the actual "issue" node yesterday. This is obviously very rough, but it's based on the per-issue-per-user approach.

http://skitch.com/eclipsegc/ni8nk/gitmockup

Hope it helps.

Eclipse

mike booth

What you're sketching out here is per-user, per-project, per-issue repositories. That is overkill, in a big way. Checking out the contents of a repo is not necessarily a quick process; you won't want to do it more than once per project.

If you want to work on a project you clone (or fork, if you're on Github) the repository for that project. (We can include a handy "git clone"-able link to the project at the top of every issue queue page.) You only need one local clone of the core repo, and one local clone of each project repo that you're interested in.

Then, if you want to start a new branch to work on an issue, you just go to the appropriate repo and do

git checkout -b ISSUENID   # creates the branch and switches to it

If you want to evaluate someone else's patch (say, one written by "mikebooth") and you know the URL of their repo and the tag/SHA1 of their proposed patch:

git remote add mikebooth [URL]
git fetch mikebooth
git branch mikebooth_ISSUENID [tag or SHA1]   # a tag fetched from mikebooth, or the commit's SHA1

This will create a local branch named mikebooth_ISSUENID with the end of that branch pointing to the proposed commit.

CorniI

I think that you're missing that a) the d.o repos will be cloned locally on the server, which is fast, and b) the issue repo you work in can simply be added as a remote to your local repo. Still, I would like to get the user out of the equation and have per-project, per-issue repos with some handy copy-and-paste git remote and git fetch commands.

mike booth

Hm, you're right, it's not as bad as I thought.

per-project per-issue repos

fago

I don't think this is a good idea, as it's a centralized approach. What if several users are working on diverse versions? Or a user works on it and then another one comes up with a completely new approach, then the first user continues to work on his version?
Having only one branch per issue would result in a completely illogical and confusing history for that branch; it wouldn't show anything meaningful. Having a separate branch for each user creates clean histories, allows each user to control the code he is working on, and still enables integrating improvements from others. But yes, we have to make it easy to do so.

Update: Oh, I see I got that wrong; I was actually talking about per-project per-issue branches.

Well, for d.o. it probably makes no difference whether the repositories are per-project per-issue or per-project per-user. However, for the user the latter is easier to work with, as you automatically get one repo for each project on d.o. that contains everything you are working on.

voxpelli

I agree - this makes most sense.

One question is whether you should be able to authorize others to commit to your personal project, or whether everyone needs to fork the project and push and pull from each other. I'm leaning towards the latter, for simplicity and maintainability.

Another question is whether it makes most sense to be able to post only specific commits to issues, or whether it might be reasonable to link an entire branch to an issue and have it update the "patch" in some sensible way.

Also, non-issue branches should be allowed in the personal forks; users should choose for themselves when to link something to an issue.

Lastly, does it make sense to have both a module repo and a personal copy for the maintainer? Would it make more sense if the maintainer's copy was the module repo? I'm unsure myself; there are upsides and downsides to both solutions. Having the maintainer's repo be the module repo would be simpler, but would also pollute it with feature branches etc. that might never make it into master. In connection with this, we have to think about whether the concept of co-maintainers should be scrapped: the maintainer can easily pull commits from the people he trusts, so there's no need for any formal permissions. Instead, making it easy to switch which repo is the current module repo could fulfill that role. Perhaps even repos other than the maintainer's own could be the module repo without having to switch maintainership?

fago

Agreed! However I think we really need the "branch from this point" functionality.

  • Per user / per project is much better than just having issue branches. Imagine the number of branches one would have for Drupal core! :D Having a "branch from this point" feature would make sure that we get a proper patch history, which makes it easy to review changes to a patch one has already reviewed.

    • To be able to work with "branch from this point" efficiently, this needs to integrate with my single clone of the project. For that we could automatically add the branch to the users' clone on d.o. and show him the right command to check it out locally.
      Having multiple clones for different issues is surely way too cumbersome to work with.

    • Once I have started working on an issue and have a branch for it, the system should assist me in incorporating other people's changes into my branch. Providing the proper command would be a first step. A way to cherry-pick non-conflicting commits directly on d.o. into my own branch would be a nice feature that might make sense to add later on.

    • Once we have deployed that workflow and users are used to it, we should disallow posting patches. Only then will we be able to fully leverage the advantages, since we can rely on having Git branches everywhere. Thus the testing bot can rely on Git, we get patch creation histories, and testers can easily check out the code to test and don't have to know how to apply a patch.
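To make "branch from this point" more concrete, here is a toy model of what the server might do when the button is clicked. It copies the branch tip, plus any history the destination lacks, from one user's d.o repo into another user's repo under a new branch name. The `Repo` class and `branch_from` function are hypothetical. Each commit is given a single parent to keep the walk simple. On a real server, an ordinary Git fetch between the two repositories would do this work. Afterwards the user only needs to fetch from their own d.o clone to see the new branch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Repo:
    """Toy hosted repository: commit ID -> parent ID (None for a root), plus branch refs."""
    commits: Dict[str, Optional[str]] = field(default_factory=dict)
    refs: Dict[str, str] = field(default_factory=dict)


def branch_from(src: Repo, src_ref: str, dst: Repo, new_ref: str) -> None:
    """Hypothetical 'branch from this point': copy src_ref and its history into dst."""
    if new_ref in dst.refs:
        raise ValueError(f"{new_ref} already exists in the destination repo")
    sha: Optional[str] = src.refs[src_ref]
    dst.refs[new_ref] = src.refs[src_ref]
    # Walk back through the parents, copying only commits the destination lacks.
    while sha is not None and sha not in dst.commits:
        dst.commits[sha] = src.commits[sha]
        sha = src.commits[sha]
```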

What's the learning curve like?

webchick

Won't this force all patch authors and reviewers to learn all the advanced Git commands like cherry-pick and rebase and...? (Note: This is not necessarily a show-stopper, but it does vastly increase the training/documentation requirement.)

Can you put your workflow into command line format, so we can visualize how it would work for patch creators and reviewers?

fago

Well, I assumed that the Git repositories posted to issues live on d.o. That way we make sure the Git history is preserved, and others can still push their code from any outside repository to d.o. Furthermore, this enables us to make use of our own web-based Git viewer in the workflow.

Also, we should assist the user as much as possible and automate things for him. So here is my attempt to figure out what the workflow could look like in detail:

Create a patch

To create a patch, I use a button in the d.o UI to clone project X, so I get a personal copy on d.o. The system provides me with the commands to check it out:

git clone URL project   # "project" is just a placeholder directory name
cd project
git checkout --track -b issue-foo origin/7--1

I have everything I need and can start hacking and committing. Once everything is ready I do

git push origin issue-foo

to send the changes to my d.o. repository. Now I can go and post a reference to my work in an issue.

Review a patch

The reviewer sees my code and is able to go through its history using the web-based GUI. We should also provide a link to view/download a patch, which can be generated by the Git web GUI. To allow the reviewer to give it a test, we also provide the commands for him:

  • For people who already have a clone / commit access:
    git checkout branch
    git pull --squash URL branch   # stages the changes in the working tree but does not commit them

    To commit the work:
    git commit -m "#issue by foo fixed"
    To discard the changes again:
    git reset --hard HEAD

  • For people who don't have a clone yet, they can still easily test it just by doing:
    mkdir project
    cd project
    git init
    git pull URL branch

Note that no manual patch creation or application is required anymore. Newbies don't have to know how to create a patch and which options to use so that it looks right, and testers can try out a change without knowing how to apply a patch.

Improving a patch

Finally, someone else looks at the patch and wants to fix it. To make that easier, I'd suggest providing the "branch from this point" feature suggested by Damien above. I click that button, my repository gets the new branch including the code and its history, and I can check it out:

git fetch origin
git checkout --track origin/issue-#42

# Start hacking and commit
git push

That's it.

@rebasing:
I think that if we merge in the changes from each issue as a single commit, we don't need it. We can still use it in cases where we want to preserve other people's individual commits.
However, single squashed commits make it impossible to let Git track authorship, as Git records only one author per commit. So we would have to stay with the current convention for giving credit. If we took over the users' commits directly, authorship would be retained, but I think this would massively bloat the commit history of big projects like core.
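If commits are squashed, credit has to live in the commit message. Tooling could still trace a line of code back to its issue and its contributors, as the sketch below shows. The parser assumes a first line of the form `#NID by user1, user2: summary`, loosely modelled on the `#issue by foo` message used above. That exact format and the `parse_credit` name are assumptions, not an agreed convention.

```python
import re
from typing import List, NamedTuple, Optional


class Credit(NamedTuple):
    issue_nid: int
    users: List[str]
    summary: str


# Assumed message convention: "#NID by user1, user2: summary".
_CREDIT = re.compile(r"#(?P<nid>\d+) by (?P<users>[^:]+): (?P<summary>.+)")


def parse_credit(message: str) -> Optional[Credit]:
    """Extract the issue ID and credited users from a commit message's first line."""
    first_line = message.splitlines()[0].strip() if message else ""
    match = _CREDIT.fullmatch(first_line)
    if match is None:
        return None
    users = [u.strip() for u in match.group("users").split(",") if u.strip()]
    return Credit(int(match.group("nid")), users, match.group("summary"))
```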

@per-user-repository:
We have to think about how the private repositories should be kept up to date. One option is to leave it up to the user. Another would be to forbid committing to "official branches", so we can update them automatically. We could also not clone them at all, but then the user has to work with two remote repositories, which is a bit more complicated.
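Forbidding commits to official branches makes automatic updates safe, because every update is then a fast-forward. The check can be modelled on a toy commit graph, a dict from commit ID to parent IDs (the representation and helper names are hypothetical). On a real repository, `git merge-base --is-ancestor OLD NEW` performs the same test.

```python
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], sha: str) -> Set[str]:
    """All commits reachable from `sha`, including `sha` itself."""
    seen: Set[str] = set()
    stack = [sha]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def can_fast_forward(parents: Dict[str, List[str]], mirror_head: str, official_head: str) -> bool:
    """True if the personal copy of an official branch can simply be moved to
    official_head, i.e. nobody committed anything of their own onto it."""
    return mirror_head in ancestors(parents, official_head)
```

An updater script would skip, and perhaps flag, any personal copy where this returns False.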

First things first: Why reinvent this wheel?

mike booth

From above:

We all agree that we want some sort of github-like fork/clone to participate in a project with git.

And I do agree. But my question is: Why isn't this Github-like entity called "Github"? Why on earth are we proposing to clone it? Is there a conspicuous surplus of extremely talented developers with spare time?

Here's how to provide Github-like forks and clones of the official Drupal git repos: Mirror them to Github. Now you're done.

The whole point of a distributed VCS is that it is distributed. Developers no longer need some sort of hard-to-configure, hard-to-administer remote "sandbox" in order to collaborate. They can commit to their local machine and upload patches to collaborate. If they want something better, they can push changes to $10-per-month Dreamhost accounts, or use gitosis to set up a server, or use Github or Gitorious. Then they just publish their repo URL and the tags/SHAs of their commits to other people, to the issue queue, and/or to the testbot.

(Now, I can see the logic of providing some kind of sandbox infrastructure for, say, designers who want to maintain a Drupal theme without knowing anything about version control at all. It would be ruthlessly designed to have a minimum of features and be strictly optimized for a Drupal.org commit workflow. But that's a different topic. The basic infrastructure for that -- Drupal.org running off a git repo -- isn't there yet.)

Github is a polished piece of work, built by hard-working, very knowledgeable full-time employees who have a headstart measured in years, and who nonetheless have to work hard to keep the thing running. Don't fool yourself into thinking that reinventing and administering Github will be effortless.

I don't see why reinventing Github's core Git-hosting features is a better use of time than, say, improving Drupal.org and its issue queue. Is it because Github has a scary monopoly? They have competitors. Is it because Gitorious' key technology is not free software? No.

If the fear is that developer discussion is also migrating to Github along with the code... that's even more reason to focus on reinventing the issue queue, rather than reinventing the DVCS hosting service.

voxpelli

I agree with this. Before replicating far too advanced repository support in Drupal.org, we should think about whether Drupal.org needs to provide advanced code hosting, or whether it's enough to just hook into the repos to find out the data necessary to support the issue queues and packaging systems.

The most important part of Drupal.org is the issue queue and project pages - not the vcs.

One set of tools

webchick

My biggest -1 against this is that I want our contributors to have to learn one (and only one) set of tools to contribute to any project in all of Drupal. When the more traditional patch workflow is followed, the same set of commands (cvs diff -up > patch.patch, patch -p0 < patch.patch) works on absolutely any bug, no matter if it's in core or in a theme or module. You can hit the "bug bingo" and go to town. This is huge, and we as a community benefit from this kind of cross-over every single day.

Due to the limitations of CVS, something we've seen happen in the core queue this release cycle is a lot of teams moving to off-site repos to do their work, as you describe. Fields in Core and the initial Views 7 port were done in a Bazaar repository. D7UX was in a Subversion repository. The Drupal 7 core theme initiatives are happening on GitHub.

When this happens, and people say "eff these crappy tools, let's go collaborate at $x external place" (be that a $10/month DreamHost account or GitHub or what have you), it blocks out 99.9% of the community (including core maintainers) from participating. Because they can't use the same tools they're used to, and they don't have time to learn new ones unless they're really determined. So we end up getting back a massive patch at the end that's totally impossible to review, with only the 3-4 people who actually worked on it having any real clue what it's doing.

So people moving their collaboration to GitHub, $10/month Dreamhost accounts, etc. is exactly what the move to a DVCS is intended to stop. We need our contributors to learn one set of commands, one set of tools, and be able to use the same pattern to work on any project or patch in all of Drupal.org, as we've traditionally had over the entire history of our project, save the past year or so. I don't see any way around this other than bringing at least some of GitHub's functionality under our own roof.

Pisco

I see what webchick is trying to say, but I suspect that this isn't really a problem. The D in DVCS stands for Distributed, after all. I think the important thing is to agree on and promote a certain workflow. The good thing about d.o is that it is a central place where people gather to discuss issues, i.e. the issue queue of a project; that's the value of d.o. We now have the patch workflow, which will continue to work even with Git. I strongly advise against trying to implement what others already have, because, as mike booth has said, these people worked hard to make their services work, and it sure wasn't trivial. The good thing about Git (or a DVCS in general) is that we are now able to harness the power of these services for use on d.o. With Git, people can clone project repos and work where and how they want; then they can come back to d.o and submit their changes, either as a patch or as a merge request. See the section called "Ease of implementation" further up on this wiki page. If we can extend our infrastructure/software to allow such merge requests, we would have a truly powerful and effective environment. Let me try to explain how I see the use case:

  • I find a bug in a module.
  • I create a ticket in the issue queue of that module.
  • I clone the Git repo from git://git.drupal.org/contributions/[project uri].git
  • I work on a patch
  • I push my clone to GitHub
  • I go to the issue queue and create a merge request giving the branch name and the URL of my repo (it could be anywhere, not necessarily on GitHub). Alternatively, I could post my patch directly, as is done now.
  • The project* infrastructure can then automagically fetch my branch and create a patch for review by others, or others can fetch my branch directly for review.
  • Someone else reviews my patch, enhances it and creates a new merge request (or posts the patch).
  • Finally a project maintainer fetches the changes (or applies the patch) he wants.

With this workflow the collaboration happens on d.o, but it allows us to harness the possibilities of the DVCS.

I think most people will work with patches, but those who are comfortable with a DVCS should be able to make use of its advantages, and that's exactly what people will do, because that's what DVCSs are made for.

Can you clarify the *reviewer* workflow?

webchick

Right now, to review any change against any project in all of Drupal.org, here's my workflow:

wget http://drupal.org/.../patch.patch
patch -p0 < patch.patch

Done. Doesn't matter if it's a core patch, a patch against Views module, a patch for Zen theme... any improvement across any project I can grab a copy of, play around with it, and post my results.

What you describe here sounds like I would need to know on a per-author basis what their preferred workflow was (GitHub, $10/month DreamHost account, etc.), how they structure their branches so I arrive at the proper code, and then change my steps as a reviewer accordingly. That sounds like an absolute nightmare.

mike booth

First of all: As I suggest elsewhere in this thread, I wouldn't be so quick to think of "switching to Git" as synonymous with "abandoning patches". Patches are excellent things. They sit there on one page (though, admittedly, that page may be very long). They don't have external dependencies. They don't change over time. And by this point webchick's keyboard can probably apply them all by itself.

So I think that Git links should be posted to the issue queue along with a patch... or that a bot should take every Git link and immediately generate a patch from it, according to a protocol that you have to conform to if you want your patch to be reviewed. The patch is canonical. The Git links are only there to make it easier to tinker with the patch, to see the big series of commits that generated it, and for when the final commit is made. (So that the commit can be made as a series of separate little patches, with individual contributor credit and individual commit messages, rather than as one big patch.)

In other words, I vote to preserve the following option for webchick:

wget http://drupal.org/.../patch.patch
patch -p0 < patch.patch

Now, suppose that nobody listens to me. Or suppose that webchick comes across a big honking patch that simply demands a review using Git.

Among other things, Git is a protocol. You don't have to know where git://github.com/jacobSingh/drupal.git is hosted. You just need to know the URL. Fretting about whether or not that URL is from Github or from some DreamHost account is like fretting about whether or not Google's server is running Ubuntu or Fedora, or whether or not it's in Denver or in Queens.

The question of "which branch has the patch, and what commits on that branch constitute the patch" is only a little trickier. We do need to establish a convention. The simplest is: everything between the start of the branch (which lies on the series of commits that lead up to HEAD) and the end of the branch is the patch. And, by no coincidence, Git gives you a one-liner for that. To view the patch:

cd local_drupal_checkout
git checkout master   # or whatever branch is the main one
git pull   # or whatever it takes to make sure that your local master is up to date
git remote add REMOTE_NAME REPO_URL_TO_REVIEW   # REMOTE_NAME is any short label, e.g. the author's username
git fetch REMOTE_NAME
git diff HEAD...REMOTE_NAME/BRANCH_TO_REVIEW

If you like what you see, try out the patch.

git checkout -b reviewing_patch_XYZ REMOTE_NAME/BRANCH_TO_REVIEW

Now the code that the submitter tested is right there on your machine. The whole docroot. You can test it if you like. Or you can use git log to look at the list of commits that constitute the patch. You can review individual commits with git show. Et cetera and so forth.
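What `git diff HEAD...BRANCH` and `git log HEAD..BRANCH` select can be modelled on a toy commit graph, a dict from commit ID to parent IDs (the helper names are hypothetical). The merge base is the fork point that the three-dot diff starts from. The patch commits are the ones reachable from the branch but not from your HEAD.

```python
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], sha: str) -> Set[str]:
    """All commits reachable from `sha`, including itself."""
    seen: Set[str] = set()
    stack = [sha]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def merge_bases(parents: Dict[str, List[str]], a: str, b: str) -> Set[str]:
    """Best common ancestors: common ancestors not reachable from another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    }


def patch_commits(parents: Dict[str, List[str]], head: str, branch: str) -> Set[str]:
    """Commits on `branch` that `head` lacks (what `git log head..branch` lists)."""
    return ancestors(parents, branch) - ancestors(parents, head)
```

With master at `m3` and a branch `p1 -> p2` forked from `m2`, the merge base is `m2` and the patch is `{p1, p2}`. Upstream commit `m3` does not leak into the review.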

If you like the results, you might want to know whether the patch applies to the current HEAD. The branch you're testing may be based on an earlier version of HEAD. I'm pretty sure that attempting a reroll is a one-liner:

git rebase master   # or whatever is the name of the main branch

If that succeeds, you can test the result. If it vomits up conflict messages, you can try to resolve the conflicts in typical ways. Or you can give up and tell the submitter to do the reroll themselves. ;)
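Conceptually, `git rebase master` replays the branch's commits one at a time on top of the new master, and it stops at the first one that no longer applies. The toy model below works on whole files rather than hunks. This is a deliberate simplification, since Git compares at a much finer grain. The names `rebase`, `Snapshot` and `Change` are hypothetical.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]               # path -> file contents
Change = Dict[str, Tuple[str, str]]     # one commit: path -> (content before, content after)


def rebase(new_base: Snapshot, commits: List[Change]) -> Snapshot:
    """Replay each commit onto `new_base` in order, like a (very coarse) `git rebase`.

    Raises RuntimeError at the first commit whose expected "before" content no
    longer matches, which is this model's equivalent of a rebase conflict.
    """
    tree = dict(new_base)
    for number, change in enumerate(commits, start=1):
        for path, (before, after) in change.items():
            if tree.get(path, "") != before:
                raise RuntimeError(f"conflict in {path} while replaying commit {number}")
            tree[path] = after
    return tree
```

A conflict here corresponds to the case where the reviewer either resolves it or hands the reroll back to the submitter.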

Awesome! Thanks for this!

webchick

This really helps visualize how the collaboration process could work.

And yes, for at least "phase 1" we wanted to stick with sharing code changes via patches. One humongous learning curve at a time. ;) And in terms of raw numbers, we have far fewer module/theme/core committers than we do patch reviewers, so I think this makes sense. We might stay with patches for reviewers long-term, who knows. I think the discussion here, though, is more focused on facilitating changes worked on by lots of different people at once.

But on the checkout side, I guess I was concerned that the access permissions for different public-facing repositories might be different. For example, to check out from Drupal.org cvs, it's anonymous:anonymous. On other projects' repositories, I've seen it be anonymous, no password. Still others, I have to register an account in order to check out code (dumb, but true).

So does git not have this same sort of access control around cloning branches? Is it automatically a free-for-all? Once you get the URL to the repository you have absolutely everything you need? Or is it still possible that on GitHub I do it one way, on Gitorious I do it another, and on git.microsoft.com still another? I'm quite happy to have this concern be rendered moot, and just be FUD from my ignorance of Git. :)

Pisco

On all Git hosting services I have come across, you get a URL and you're able to clone the whole thing, no password, no username.

git clone git://github.com/jquery/jquery.git

or

git clone http://github.com/jquery/jquery.git

Almost all services also offer the possibility to clone over HTTP. These are public, read-only URLs; the owner of the repo usually pushes to it over SSH using his SSH key.

As mike booth pointed out, with the workflow I proposed you can go on working with your patch workflow just as you do now: in the early phase because people submit patches, and in a later phase because patches are fetched automagically into the issue queue. (I'm starting to believe that this is an important feature, because the public repos may disappear over time, and if we have a local copy of the proposed changes on d.o, one can easily go back and look at how things evolved, be it for historic interest or whatever.)

Yeah, that could work for patch reviewers

webchick

If a patch was auto-generated and automatically appended to the issue whenever a URL was entered, the workflow of patch reviewers would remain the same, regardless of changes on the back-end or where the repo was located. And storing copies of those are indeed important historical artifacts. I hadn't even considered the fact that we might lose a huge chunk of our project's history if we went the fully distributed way. EEEK! Totally not acceptable. So yes, move that feature up from a "nice to have" to "absolutely fricking critical." :)
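The "auto-generate and keep a copy" idea could look roughly like the sketch below. `PatchArchive`, `Issue` and the injected `generate` callable are hypothetical. `generate` stands in for whatever fetches the repository and produces the diff. The archive is keyed by repository URL and commit ID, so a given submission is converted only once. The stored patch survives even if the external repository later disappears.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Issue:
    nid: int
    attachments: List[str] = field(default_factory=list)


@dataclass
class PatchArchive:
    """Hypothetical store on d.o: patch text keyed by (repo URL, commit ID)."""
    generate: Callable[[str, str], str]  # (repo_url, sha) -> patch text
    patches: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def attach(self, issue: Issue, repo_url: str, sha: str) -> str:
        """Generate the patch once, keep a local copy, and attach it to the issue."""
        key = (repo_url, sha)
        if key not in self.patches:
            self.patches[key] = self.generate(repo_url, sha)
        filename = f"{issue.nid}-{sha[:8]}.patch"
        if filename not in issue.attachments:
            issue.attachments.append(filename)
        return filename
```

Reviewers then keep using `wget` and `patch -p0` on the attachment, whatever the submitter's hosting.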

However, my concerns remain for the workflow for patch contributors. Unless I'm mistaken, these folks do indeed need push access to repositories in order to collaborate with others on fixes, which in turn does actually mean a login for every possible Git host site. Or else they work in their own sandbox somewhere else where they have commit access already, but then the main "maintainer" of that improvement has no fricking clue that there's work happening elsewhere until they get a big-ass merge request with no context. Exact same issue we currently have with the core themes atm, only instead of that happening in an isolated issue, now it'd be happening everywhere.

Again, more than happy to be proven wrong if I'm missing a clue-bat here.

A question too about history, since that's so incredibly important. Right now, any line of code in all of Drupal, any module, any theme, etc. can be resolved back via cvs annotate to the issue ID where it was discussed, the drupal.org usernames of the people who participated in it, the drupal.org username of the person who committed the change, etc. If work is being aggregated from this repo and that repo and the other repo, don't we lose that history?

re: history

Posted by adrinux

As I understand it, it's optional whether the commit history is pushed or not. I assume the same is true of pull and merge. The history can travel with the code. So no, we don't necessarily lose the history.
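Here is a small sketch of that, with local repositories standing in for the official repo and a contributor's clone (all names are illustrative). After a maintainer pulls a contributor's branch, `git blame`, Git's counterpart to `cvs annotate`, still attributes each line to the person who wrote it. The issue ID is only recoverable if it is written into the commit message, which is an assumed convention here.

```shell
#!/bin/sh
set -eu
WORK=$(mktemp -d)

# Run a command as a given person, so authorship is visible (names are illustrative).
as_user() {
  who=$1; shift
  GIT_AUTHOR_NAME=$who GIT_AUTHOR_EMAIL="$who@example.com" \
  GIT_COMMITTER_NAME=$who GIT_COMMITTER_EMAIL="$who@example.com" "$@"
}

# Official repository, maintained by Linda.
git init -q "$WORK/drupal"
git -C "$WORK/drupal" symbolic-ref HEAD refs/heads/main
printf 'line one\n' > "$WORK/drupal/core.txt"
git -C "$WORK/drupal" add core.txt
as_user Linda git -C "$WORK/drupal" commit -q -m "Initial import"

# Alice works in her own clone and mentions the issue in her commit message.
git clone -q "$WORK/drupal" "$WORK/alice"
git -C "$WORK/alice" checkout -q -b issue-123
printf 'line two\n' >> "$WORK/alice/core.txt"
as_user Alice git -C "$WORK/alice" commit -q -am "Issue #123 by Alice: add line two"

# Linda pulls the branch; a fast-forward brings Alice's commit over unchanged.
as_user Linda git -C "$WORK/drupal" pull -q --ff-only "$WORK/alice" issue-123

# Per-line attribution, Git's counterpart to cvs annotate.
git -C "$WORK/drupal" blame core.txt
git -C "$WORK/drupal" log --format='%h %an: %s'
```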

ps @webchick Isn't it about time you downloaded git and had a play ;)

re: patch workflow for contributors

Posted by adrinux

Well, maintainer and contributor can still pull from each other's public repos and keep their issue branches in sync; the problem is one of communication. I do think it's much better to have this all going on on d.org repos, where we can provide a web based overview of activity and interact with the issue queue too.

But at the end of the day we have no way to stop people using other services if they find that workflow suits them better. The only way forward really is integrating external services while encouraging the use of d.org, making the experience smooth so people want to use it by preference.

Yep, agreed...

Posted by webchick

My goal is not to stop people from doing things outside of drupal.org (i.e. I highly doubt that Development Seed is going to drop their GitHub repo), but to set up an in-house "default" workflow that 95% of our community can use, and those 5% more advanced users can also push their changes to when they're done doing their stuff in whatever other workflow they're most comfortable with (including Bazaar, etc.).

And no, I can't try Git yet. :) I need to remain ignorant for as long as possible so I can keep the newbie perspective. :)

Newbie perspective. Ahhh!

Posted by adrinux

Newbie perspective. Ahhh! That makes sense :) As does all the rest (apart from speaking for dev seed).

I suspect I don't quite

Posted by Pisco

I suspect I don't quite understand what exactly you want to achieve and how it was until now. The picture I have in mind is the following:

  1. Alice finds an issue with $module
  2. Alice opens a ticket at http://drupal.org/node/add/project-issue/$module
  3. Alice checks out the $module
  4. Alice starts working on a patch while checking the status of the ticket periodically
  5. Alice submits her patch to the ticket
  6. Bob downloads the patch, reviews and changes it a bit, then re-submits it to the ticket
  7. Peter and Alice review the latest version of the patch and agree that it is ready for production
  8. Linda, the project owner, applies the patch upstream

I suppose this is how it's working at the moment (please correct me if I'm wrong!). Between 2. and 5. no one other than Alice may know that she is working on a patch. Between 6. and 7. no one may know that Bob is working on reviewing that patch. This is what I observe on d.o. With Git we can preserve this exact workflow, which I think is what we should do in a first step.

The same with Git (basic user):

  1. Alice finds an issue with $module
  2. Alice opens a ticket at http://drupal.org/node/add/project-issue/$module
  3. Alice clones $module
  4. Alice starts working on a patch (hopefully in a Git branch) while checking the status of the ticket periodically
  5. Alice submits her patch to the ticket
  6. Bob downloads the patch, reviews and changes it a bit, then re-submits it to the ticket
  7. Peter and Alice review the latest version of the patch and agree that it is ready for production
  8. Linda, the project owner, applies the patch upstream

Not much of a difference, is it? The same for more advanced users:

  1. Alice finds an issue with $module
  2. Alice opens a ticket at http://drupal.org/node/add/project-issue/$module
  3. Alice clones $module and puts a copy on GitHub for backup
  4. Alice starts working on a patch in a Git branch for that issue while checking the status of the ticket periodically
  5. Alice pushes her branch to GitHub
  6. Alice places a merge request on that ticket, giving the URL to her repo and the branch name. The ticket infrastructure fetches the branch, that means every single commit object that Alice made to her issue branch, and stores them on d.o. (have a look at git format-patch)
  7. Bob downloads the patch(es), reviews and changes them a bit, pushes his version to his public repo, and makes another merge request.
  8. Peter and Alice review the latest version of the modification and agree that it is ready for production
  9. Linda, the project owner, applies the patch(es) upstream, either by downloading the patch(es) from d.o., or by pulling them from either Bob's or Alice's public repo.
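The sketch below walks through steps 3 to 9 with local repositories in place of GitHub and d.o (all names are illustrative). For step 9 it takes the download route: the commits are exported with `git format-patch` and applied with `git am`. That keeps Alice as the author and records Linda as the committer.

```shell
#!/bin/sh
set -eu
WORK=$(mktemp -d)

# Run a command as a given person (names are illustrative).
as_user() {
  who=$1; shift
  GIT_AUTHOR_NAME=$who GIT_AUTHOR_EMAIL="$who@example.com" \
  GIT_COMMITTER_NAME=$who GIT_COMMITTER_EMAIL="$who@example.com" "$@"
}

# Linda's official $module repository.
git init -q "$WORK/module"
git -C "$WORK/module" symbolic-ref HEAD refs/heads/main
printf 'function foo() {}\n' > "$WORK/module/module.inc"
git -C "$WORK/module" add module.inc
as_user Linda git -C "$WORK/module" commit -q -m "Initial import"

# Steps 3-5: Alice works on an issue branch in her own (public) clone.
git clone -q "$WORK/module" "$WORK/alice"
git -C "$WORK/alice" checkout -q -b issue-123
printf 'function bar() {}\n' >> "$WORK/alice/module.inc"
as_user Alice git -C "$WORK/alice" commit -q -am "Issue #123: add bar()"

# Step 6: one patch file per commit on the branch, as d.o might store them.
mkdir "$WORK/patches"
git -C "$WORK/alice" format-patch -q -o "$WORK/patches" main..issue-123

# Step 9: Linda applies the stored patches; Alice remains the author.
as_user Linda git -C "$WORK/module" am -q "$WORK/patches"/*.patch
git -C "$WORK/module" log -1 --format='%an authored, %cn committed: %s'
```

The pull route (`git pull URL BRANCH`) would instead bring Alice's original commit objects over unchanged.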

I've seen complaints about

big-ass chunks of code which in the end are too big to review

I think merge requests would mitigate this problem and if an issue needs a lot of work, Linda, the project owner, can create an issue branch on the main repo where others can track the current status. But the module owner has to remain the dictator or integration manager for the main project repo.

I don't think it's either realistic or necessary to want to build an infrastructure like GitHub's, at least not in a first or second phase. We should try to find a way to harness these services and ensure that d.o. remains the central place where people collaborate and exchange. For someone looking for a module, it has to be self-evident to look for it on d.o and on d.o exclusively.

Does this make any sense?

Mostly right

Posted by webchick

You are mostly right. The way we do patches now and the "Phase 1" Git conversion will work exactly like you describe. People work wherever they're most comfortable and upload the results of their changes as a patch, which will then be applied and reviewed by a patch reviewer. And for most simple patches, it doesn't really matter where the work happens, because they are only 1-25 lines of changes, done by a single author, and reviewed by a single patch reviewer and the core committer.

My workflow concerns are on the big patches. Fields in core. New database abstraction layer. D7UX. New core theme. These aren't 1-25 lines of changes done by a single author, these are 500-1000+ line changes done by 5-6 authors. These are the changes where CVS and our patch workflow have completely failed us, and result in people moving off-site to places like Launchpad, GitHub, etc. to collaborate.

When this happens, the community loses transparency into what's going on, and misses out on a huge learning opportunity to see the changes and discussions as they evolve. Patch reviewers need to learn new tools in order to review, so they don't, and end up getting the patch in a more-or-less finished state (since rolling patches is such a pain in the ass) when it is too big to review. People who want to use the same core workflow they use for the 95% of their other patches (described above) need to learn new tools, and get logins on new websites in order to help out with the efforts. So as a result, they don't, and so a huge portion of our contributor base is blocked from helping. And now only that original team of 5-6 developers has any fricking clue what that code does, and is on the hook for maintaining it for all eternity.

People flocking to external tools for collaboration has already had these kinds of disastrous consequences for Drupal 7 core. If we encourage this same workflow for all of our changes, we are effectively out-sourcing the Drupal community's entire mindshare and project history, and segmenting our developers into tiny little silos off in the ether some place instead of in a central place where they can help each other. That will absolutely destroy our community.

What we need to do instead is make it easy to collaborate on all changes, both large and small, here on the mothership of Drupal.org.

Okay, is that the consensus

Posted by Pisco

Okay, is that the consensus for the everyday work, the small patches that happen on d.o?

As for the big patches, could you ask the "Fields in Core" people to explain why they chose GitHub and what their workflow looked like? I think this would help determine what the important features are, and how we could implement them on d.o.

It could well be that the fact of having Git on d.o is enough to make people move back to it. After all GitHub doesn't offer that many features that are interesting or even useful for such situations. In fact the only thing I see is the "merge request". I imagine they still needed an issue queue with steps and actions that had to be taken, tasks that had to be done.

What Gitorious and GitHub do

Posted by Pisco

Today I've had a look at what Gitorious and GitHub do:

Please read carefully to understand the difference between the two. I think the Gitorious way suits our needs better and we should be able to adapt it for d.o., I think it would be a brilliant solution. Let me go through the steps:

  1. $project is hosted at http://git.drupal.org/project/$module
  2. Alice finds an issue and opens ticket
  3. Alice clones $module and starts working on a fix
  4. Alice places a merge request thereby specifying the URL and branch of her publicly available clone
  5. the d.o software fetches that source branch and creates a corresponding branch in the main repo of $module at git://git.drupal.org/node/add/project-issue/$module refs/merge-requests/$issue_number-$merge_request_number
  6. Bob can easily fetch that branch from the $project repo for review, comment on it, or place his own merge request
  7. Alice can, if she revises her patch, push her new version to git://git.drupal.org/node/add/project-issue/$module refs/merge-requests/$issue_number-$merge_request_number thereby creating a new version of the merge request.
  8. When the fix is ready, Lisa, the project owner, can merge the merge-request branch into the upstream branch.
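The sketch below covers the merge-request refs from steps 5 and 6, with a bare local repository in place of git.drupal.org. The refs/merge-requests/&lt;issue&gt;-&lt;n&gt; layout follows the proposal above and is not something Git itself defines. To keep the sketch short, Alice pushes the ref herself. In the proposal, the d.o software would create it in step 5 after fetching her branch.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
WORK=$(mktemp -d)

# Bare repository standing in for the project repository on d.o.
git init -q "$WORK/seed"
git -C "$WORK/seed" symbolic-ref HEAD refs/heads/main
git -C "$WORK/seed" commit -q --allow-empty -m "Initial import"
git clone -q --bare "$WORK/seed" "$WORK/project.git"

# Alice's fix for issue 123 in her own clone.
git clone -q "$WORK/project.git" "$WORK/alice"
git -C "$WORK/alice" checkout -q -b fix-123
git -C "$WORK/alice" commit -q --allow-empty -m "Fix issue 123"

# Merge request 1 on issue 123 lives outside refs/heads, so a plain
# clone of the project does not download it as an ordinary branch.
git -C "$WORK/alice" push -q origin fix-123:refs/merge-requests/123-1

# Step 6: Bob fetches the merge request into a local branch for review.
git clone -q "$WORK/project.git" "$WORK/bob"
git -C "$WORK/bob" fetch -q origin refs/merge-requests/123-1:mr-123-1
git -C "$WORK/bob" log --oneline main..mr-123-1
```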

An important point is that only Alice can push into the merge-request branch, thereby creating a new version of the merge-request. This ensures that the merge-request doesn't get messed up.

What's cool about this?

  • The complete history of an issue and its patches lives on d.o.
  • The complete history of an issue and its patches is in the Git repo
  • Merge requests are versioned, and the different versions can be viewed on d.o
  • ... I'm sure I'm forgetting some cool aspects ...

What are the downsides?

Everyone who wants to issue a merge request must have a publicly readable Git repo; this can be anywhere ... GitHub, Gitorious ... but it is a requirement.

How can we resolve that downside?

At some point in the near future we can start hosting clones on d.o! With that we gain the ability to provide easy clone functions on d.o: "clone this on d.o" with one click. What are the consequences for the d.o infrastructure? We need to implement the functions and provide the infrastructure. The computing power requirement shouldn't be too big, because Git is very clever when the repository to clone from is on a local machine:

The files under .git/objects/ directory are hardlinked to save space when possible. ( see the man page)

Even if we do not use that hardlinking feature because we want to be able to move repos around easily, it should not have a big impact, because such clones are not accessed often. Hosting clones on d.o doesn't need to be high priority, because we don't really need them for the first phases.
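The sketch below shows both variants in throwaway temporary directories. `--no-hardlinks` is the `git clone` option that forces real copies, which is what would keep a clone safe to move to another disk later.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
WORK=$(mktemp -d)

git init -q "$WORK/project"
echo data > "$WORK/project/file.txt"
git -C "$WORK/project" add file.txt
git -C "$WORK/project" commit -q -m "Add file"

# Cloning from a local path hardlinks files under .git/objects/ when possible.
git clone -q "$WORK/project" "$WORK/linked"
# --no-hardlinks copies them instead, so the clone can later be moved to another disk.
git clone -q --no-hardlinks "$WORK/project" "$WORK/copied"

# Count object files that share their inode with some other path.
linked=$(find "$WORK/linked/.git/objects" -type f -links +1 | wc -l | tr -d ' ')
copied=$(find "$WORK/copied/.git/objects" -type f -links +1 | wc -l | tr -d ' ')
echo "object files with more than one link: linked=$linked copied=$copied"
```

Whether the first count is nonzero depends on the filesystem supporting hardlinks, which is why the man page says "when possible".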

I think this would be a brilliant solution that should satisfy most peoples wishes. And it should even work for big rewrites like the ones webchick mentioned earlier.

I think this solution would take quite a bit of work, but we could draw some inspiration from the Gitorious project.

Okay, I'm beginning to

Posted by mike booth

Okay, I'm beginning to understand the problem, but I'm still not sure what the stuff being discussed on this page has to do with the solution.

Two aspects of the problem stand out from webchick's description. One is that we need to have a common toolset, a lingua franca, which all Drupal developers can learn, after which they can collaborate on any Drupal development project. The other is that giant changes to Drupal Core are arriving as "a massive patch at the end that's totally impossible to review".

But didn't we just solve these problems? ;)

Let me outline my understanding of the proposed solution, after which webchick can correct me:

  • On Days 0 through N, we create an official Drupal Core git repo, hosted at Drupal.org, and give push ("commit") access to webchick and dries. (For contrib, we create official Drupal contrib git repos for each module or theme and give the committer(s) access.) Then we adjust the d.o core infrastructure to use these things, so that the testbot tests against the official Git repos and the release system makes releases based on tags and branches in the official Git repos.

  • On Day N+1, the core committers stand up and announce: "From now on, we will continue to accept very small, coherent patches in the issue queue. But if the patch is beyond a certain size or scope we will insist that it be (a) composed of a series of small, logical sub-patches; (b) based on a series of commits from a branch of a publicly-accessible Git repository that is derived from the official Drupal Core repository; and (c) accompanied by a URL, a branch name, a starting tag, and an ending tag in that repository so that reviewers can access the repository with Git and pull down the individual commits for review."

    (I note for the record that I'm pretty sure that's redundant -- to specify a patch from a Git repo should require only the repo URL and a pair of tags that bracket the patch. And even the first tag may be optional, if the branch is rooted on the main line of development and the software can deduce that. But these are implementation details.)

    (Optionally, we can write a bot that can take the URL of a public repo, a starting tag, and an ending tag and compose a patch out of it, rather than requiring submitters to generate and post the patch. But that's an optional feature, not especially important. I think it is important that every submission to the issue queue be rendered immediately into a patch. For one thing, it immediately establishes a canonical copy of the proposed patch, one that won't have external dependencies. For another, we want people who aren't Git users to be able to read or test patches -- besotted as I sometimes am with Git, I know how much harder it is to create a Git remote and a tracking branch than it is to type git diff trunk > patch.patch and patch -p1 < patch.patch.)
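The sketch below turns a pair of tags into patches, using a throwaway repository and the hypothetical tag names patch-start and patch-end. It also shows the reviewer side without Git. Because `git diff` prefixes paths with a/ and b/, the plain `patch` tool needs `-p1`.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
WORK=$(mktemp -d)

# A contributor's repository with two commits between two (hypothetical) tags.
git init -q "$WORK/repo"
printf 'one\n' > "$WORK/repo/file.txt"
git -C "$WORK/repo" add file.txt
git -C "$WORK/repo" commit -q -m "Base"
git -C "$WORK/repo" tag patch-start
printf 'two\n' >> "$WORK/repo/file.txt"
git -C "$WORK/repo" commit -q -am "Step 1"
printf 'three\n' >> "$WORK/repo/file.txt"
git -C "$WORK/repo" commit -q -am "Step 2"
git -C "$WORK/repo" tag patch-end

# The small, logical sub-patches: one file per commit between the tags.
mkdir "$WORK/series"
git -C "$WORK/repo" format-patch -q -o "$WORK/series" patch-start..patch-end
# The single canonical patch for the issue queue.
git -C "$WORK/repo" diff patch-start patch-end > "$WORK/combined.patch"

# A reviewer without Git applies it to a plain copy of the starting tree.
mkdir "$WORK/plain"
git -C "$WORK/repo" archive patch-start | tar -xf - -C "$WORK/plain"
# git diff prefixes paths with a/ and b/, hence -p1.
(cd "$WORK/plain" && patch -p1 < "$WORK/combined.patch")
cat "$WORK/plain/file.txt"
```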

And... isn't that it?

There is no urgent need to build a d.o infrastructure for hosting people's personal Git repositories in public. Contributors of small patches don't need it. Reviewers of small patches don't need it. Reviewers of big patches only need it if they want to personally rework the patches and put up the results in Git repo form instead of uploading a patch for a more experienced Git user to apply to their repo. And, for anyone who does need it, Git hosting exists, several times over, and free of charge for open-source projects like Drupal. There are multiple free-as-in-beer services that provide Git repo hosting; if you're allergic to those there is free-as-in-freedom software for putting up your own server.

There is no need for every contributor's public Git repo to live on drupal.org. There is no need to write an elaborate set of bots that automatically create working directories on some central server for each developer, or each issue, or whatever. Git is a distributed protocol. This is not CVS anymore! You don't need a hall monitor to police who gets to make a commit! Everyone gets to make a commit -- to their own server, wherever it may be, after which they send out a URL and a branch name for review, and every reviewer runs approximately four commands:

cd my_local_drupal_7_checkout
git remote add repo_with_nifty_patch URL
git fetch repo_with_nifty_patch
git checkout -b patch_for_issue_foo repo_with_nifty_patch/BRANCH_NAME

Yes, that's a little bit nasty for beginners, but by next week the Drush folks will have it down to one command: drush fetch-latest-patch ISSUE_NID. That's not an infrastructure problem. It's a client usability problem.
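Until such a Drush command exists (`drush fetch-latest-patch` above is hypothetical), a shell function can already hide the four commands. `review_branch` is a hypothetical name, and the demo uses local repositories in place of real URLs.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

# Hypothetical helper: review_branch NAME URL BRANCH
# Run it inside an existing checkout; it wraps the four commands above.
review_branch() {
  name=$1 url=$2 branch=$3
  # Add the remote, or repoint it if it already exists.
  git remote add "$name" "$url" 2>/dev/null || git remote set-url "$name" "$url"
  git fetch -q "$name"
  git checkout -q -b "review-$name-$branch" "$name/$branch"
}

# Demo with local repositories standing in for real URLs.
WORK=$(mktemp -d)
git init -q "$WORK/drupal"
git -C "$WORK/drupal" symbolic-ref HEAD refs/heads/main
git -C "$WORK/drupal" commit -q --allow-empty -m "Initial import"
git clone -q "$WORK/drupal" "$WORK/contributor"
git -C "$WORK/contributor" checkout -q -b nifty
git -C "$WORK/contributor" commit -q --allow-empty -m "Nifty change"

git clone -q "$WORK/drupal" "$WORK/my_local_drupal_7_checkout"
cd "$WORK/my_local_drupal_7_checkout"
review_branch repo_with_nifty_patch "$WORK/contributor" nifty
git log --oneline -1
```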

Hey, did I just say that you

Posted by mike booth

Hey, did I just say that you need to specify four things to specify a commit? That's wrong. Make that two things: The URL and the end of a branch. Git has a one-liner to compute the diff between the start and the end of a branch, given the name of the branch and the name of the main-line branch.
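The post doesn't name that one-liner, but it is most likely the three-dot form of `git diff`, which diffs from the merge base of the two branches to the tip of the second. The sketch below uses a throwaway repository in which main has moved on after the feature branch was created.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
WORK=$(mktemp -d)

git init -q "$WORK/repo"
git -C "$WORK/repo" symbolic-ref HEAD refs/heads/main
echo base > "$WORK/repo/base.txt"
git -C "$WORK/repo" add base.txt
git -C "$WORK/repo" commit -q -m "Base"

# A branch for the patch...
git -C "$WORK/repo" checkout -q -b feature
echo feature > "$WORK/repo/feature.txt"
git -C "$WORK/repo" add feature.txt
git -C "$WORK/repo" commit -q -m "Add feature"

# ...while main moves on.
git -C "$WORK/repo" checkout -q main
echo later > "$WORK/repo/later.txt"
git -C "$WORK/repo" add later.txt
git -C "$WORK/repo" commit -q -m "Unrelated change on main"

# Three dots: from the point where feature branched off main to the tip of feature.
git -C "$WORK/repo" diff --stat main...feature
# Two dots with log: the commits on feature that are not on main.
git -C "$WORK/repo" log --oneline main..feature
```

A plain two-argument `git diff main feature` would also show later.txt as removed, because it compares the two tips directly. The three-dot form leaves that unrelated change out.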

Sorry, I'm learning this as I go along. ;) It is so great to finally learn how to use Git outside of the Git-SVN sandbox. To say nothing of the Git-CVS sandbox.

Ok, starting to understand...

Posted by webchick

Thanks. Starting to understand this now.

For context, here is my Ultimate AwesomeTastic Dream Of What Drupal.org Collaboration Could Be (tm) in Phase N+11:

Iterative progress

This would immediately solve some of my (and our community's) current biggest problems:

  1. We would gain transparency into what's happening between issue replies, and who's involved. No longer do we get some big-ass chunk of code in the end too big to review; instead we can see its evolution, and we can jump in at any time ("No, no! NOT THE SNARF BLATZ! ANYTHING BUT THAT!")
  2. Helping with development or testing at any point during this process is just one click away! (I have no idea what's on the other side of those clone/test links, btw. Could be as simple a pop-up auto-populated with some commands to copy/paste.)
  3. And then ideally (not pictured), something deemed "good enough" for wide-scale testing could be pushed as what we currently term a "patch". This would show up as a reply and ping everyone's "My issues" queue.

If this stuff is going on in every possible git place in the entire planet, my concerns are as follows:

  1. We lose the transparency into the decision-making process, without having logins to every possible Git hosting service on the planet, and familiarity with how they display each one's commit messages.
  2. We lose the ability for folks to collaborate together on important initiatives, without having logins to every possible Git hosting service on the planet, and familiarity with how to clone their code in each one's particular way.
  3. We lose the ability for patch reviewers, our single most treasured resource, to collaborate with developers until it's too late, without having logins to every possible Git hosting service on the planet, and familiarity with how to test code in each one's particular way.

If we lose this centralization of our community collaboration, we also lose all the mentorship and learning that goes along with it, which is our community's single biggest strength. And we make it far, far more difficult for patch reviewers, our single most treasured resource, to get involved. That is my biggest fear with this move, and something I want to stop at all costs.

So. Can we have fully decentralized development without losing the very centralized parts of our community that make it so great? I don't really see how it's possible, but am open to ideas.

Paranoia origins

Posted by webchick

Incidentally, the origin for this paranoia is based on actual experience going on in the D7 issue queue right now.

I just tonight had to go and ping all three D7 core theme issues because there've been no updates to the issues in terms of patches to review for several days now on any of them. It's possible, perhaps even probable, that all of them are making progress at a fantastic rate, but I, as both a core committer and first and foremost a patch reviewer, have zero visibility into this process. As a result it sinks off my radar, along with the other reviewers', while all the while growing bigger and bigger (and thus more and more un-reviewable).

If this is what the future of d.o / git integration looks like, we're all doomed.

exactly

Posted by catch

The number one workflow which git disrupts is that of the generalist core patch reviewer. Which is me, webchick, and several other people who check the core issue queue at least once a day, and review issues just for the sake of maintaining the core issue queue (as opposed to specific subsystems we look after, although most of the same people are doing that too).

That workflow is:

  1. Check 'my issues' - see if there's any updates.

  2. Check the 'patch queue' - see if there's any updates.

  3. Check some other queue like pending bugs, or critical, to see things which have fallen through the cracks from the first two.

For any of these, if I click on an issue with new comments and see a new patch in there, chances are I'll have a quick run over the patch, usually in dreditor too. My process is usually to scan through the patch very quickly and look for obvious errors (or none), then post back with an update or status change if there's anything noteworthy.

If I have to click on a link to an external site, then it's likely the diff will be presented differently from plain text or dreditor, that already makes my patch review workflow about 70% less efficient. Trying to read cvs diffs on drupalcode.org is never as easy as reading a patch file for me, because I do it less often.

If there's no patch posted at all, then I assume nothing has happened and let it rot, unless I really, really care about the specific issue, which frankly isn't a very high proportion of core issues.

Anything which involves reviewing commit logs on an external site, clicking some buttons to view a diff myself, or something else which is different to "ooh, there's an update, click, post some feedback" means your patches don't get reviewed by webchick, me and probably 5-10 other people who are responsible for numerically the majority of code reviews to core. And then core development will get stalled just as bad as having to re-roll patches 500 times.

And yes, this is just as much an issue for big patches too, because as soon as people can keep track of merges in their own repos, we'll see more people posting big unreviewable patches just because it's easier to do than maintain 5 interdependent patches (I know git can do a stacked patch workflow, but I bet fewer than 5 people will use that in the first three months after we switch to it).

Well said, catch. I 100%

Posted by sun

Well said, catch. I 100% agree.

This likely means that, if there is going to be any further deep integration of branches and issues, we need the system to expose (possibly auto-generated) patch files that we can review.

All of this can probably be understood as a more detailed explanation of what I mean by "Patch reviewers and developers need proper targets".

I won't approve a patch that I cannot review properly. And it's unlikely that I'll be working on a changeset without knowing whether the base I'm working off is the most current/agreed on. These are targets.

Daniel F. Kudwien
unleashed mind

Well, this nifty picture will

Posted by mike booth

Well, this nifty picture will require more than one response, no doubt. But before I go to bed:

You don't need "a login to every possible Git hosting service on the planet" to monitor the contents of a public Git repository. Not if you define "a public Git repository" as "a repository that exposes a read-only, cloneable git:// URL that doesn't require a login". Which is how the ones I know of work. I've never had to log in to a Git repo, except to my own, which I can push to (and for those I use SSH keys).

Once you have such a URL the steps are all standardized. Here's Mikl Hogh's legendary Drupal mirror on Github:

git://github.com/mikl/drupal.git

Here's how you clone his repository:

git clone git://github.com/mikl/drupal.git

Here's how you run a test on a particular branch in his repository:

git clone git://github.com/mikl/drupal.git
cd drupal
git checkout -b branch_to_test origin/BRANCHNAME
[run tests]

Here's how you see what commits he's made lately. (The ones that he's bothered to push to his public repo, that is. This is Git; you can make all sorts of local changes and commits on your laptop that will never be seen by anyone):

git clone git://github.com/mikl/drupal.git /repos/mikl_drupal
cd /repos/mikl_drupal
git log origin/BRANCHNAME

And periodically thereafter...

cd /repos/mikl_drupal
git fetch
git log origin/BRANCHNAME
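To see only the commits that arrived since the last check, one option is to record where origin/BRANCHNAME pointed before fetching. The sketch below uses local stand-ins for the public repository and the mirror, with main playing the role of BRANCHNAME.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
WORK=$(mktemp -d)

# Stand-in for a public repository such as git://github.com/mikl/drupal.git.
git init -q "$WORK/mikl"
git -C "$WORK/mikl" symbolic-ref HEAD refs/heads/main
git -C "$WORK/mikl" commit -q --allow-empty -m "First"
git clone -q "$WORK/mikl" "$WORK/mikl_drupal"

# Later, upstream gains a commit.
git -C "$WORK/mikl" commit -q --allow-empty -m "Second"

# Remember the old position, fetch, then list only what the fetch brought in.
before=$(git -C "$WORK/mikl_drupal" rev-parse origin/main)
git -C "$WORK/mikl_drupal" fetch -q
git -C "$WORK/mikl_drupal" log --oneline "$before..origin/main"
```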

Obviously, these things could be done in fewer words by higher-level Drush scripts, or wrapped in a gorgeous web GUI with a cache (which is, of course, what Github has done).

Unified Web GUI for reviewing logs is an absolute must

Posted by webchick

The vast majority of our community are not command-line junkies. If the answer to the "anyone can just paste in a URL to any git repo on the planet!" is "and now our patch reviewers need to read the output of the git log command," then we're sunk. :( That would raise the barrier of entry high enough to stop basically all of our non-technical patch reviewers (UX team, design team, "I have IE6" team) from contributing.

And I realize that individual IDEs/GUI clients might have slightly prettier output, but then this creates a totally inconsistent experience for every single one of our patch reviewers, who can no longer help one another through the basics because I'm on Eclipse and I go here, you're in SmartGit and you click there, etc. Again, a single set of tools that all of our contributors can use to work anywhere is the key to our collaboration.

So given the choice presented here, I'd much rather put work in on the implementation side to make drupal.org able to output this stuff itself. Reinventing the wheel, or not. I don't see what other choice we have.

Indeed, a Web GUI for reviewing logs is a must

Posted by mike booth

... but, fortunately, once we get the official Drupal repos into Git several things will happen.

  • Five minutes later, someone will mirror the official Drupal repos into Github. And then all the logs will be reviewable through a relatively polished, publicly-available, state-of-the-art web GUI, where you can also download any branch or tag as a .tar or .gzip file, browse the source tree with one click, and look at awesome colored diffs of every patch.

    (For Drupal core, as you can see, my "five minutes" estimate is vastly pessimistic, because this already happened two years ago. ;)

    (No, there is nothing we can do to stop this even if we wanted to. To get users to leave (e.g.) Github one must lure them.)

  • Drupal.org will put up its own Web GUI for reviewing the logs of the official Drupal project repos. This is important to do, no question: There needs to be a set of logs that are easily recognizable as the official ones.

    However, most likely the official page will be usable but not quite as nice as Github's. It will, after all, be designed by otherwise busy people who do not run a startup company that lives and dies by the quality and usability of its Git GUI. It will probably strongly resemble the default output of an open-source tool like Gitweb, as used by the official Git repo. So folks who are in the know will probably tend to gravitate to Github's GUI instead, unless they need to check on some odd discrepancy with the Github mirror.

  • Some day in the future, someone will invent a Web UI for viewing public Git repositories that is better than Github's. This "someone" might be Github itself, or it might be the Gitorious folks, or it might be the official Git project, or the makers of GitX, or Drupal.org. Whoever. But on the very day that this UI is unveiled, Drupal's public repos will probably be visible through it. Because they are public, viewable and mirrorable by anyone, and because if there's anything a DVCS is good at, it's mirroring. So everyone who likes a better UI will probably migrate to the latest and greatest Web UI as soon as it is available.

Meanwhile: I also agree emphatically that "Git for non-technical patch reviewers" is a problem worthy of attack. The problem has been neglected, the need is urgent, and the potential payoff is huge. Which is why I'd encourage people to attack this problem by making maximum use of the tools and the leverage that the existing ecosystem is providing.

This is a great diagram,

Posted by sdboyer

This is a great diagram, webchick, but I think it's not quite the ultimate ideal for us to aspire to. Two reasons:

1) I'm scared of the performance implications of searching out all the updates that have been made and interleaving them with comments. Now, that is thinking about it from a sandboxes-only, no-repos-for-issues perspective, but my first impression is that it could get a little nuts.

2) I think, especially for complex patches, your signal-to-noise ratio would get shot to hell. Are those commits all on the same branch - that is, are they compatible with one another? Are they sequential? Are we showing just branch tips, or every commit that gets made by someone in the interim between comments?

I'll put it in broader terms: consider how that diagram would change the nature of the issue queue. Right now, every bit of info (comment) on an issue is a bit of information that has been "curated" either by a person, or the testbot. Each comment, therefore, contains information that is unquestionably relevant to the discourse in the issue itself. If I'm being a good contributor, though, and making small incremental commits, then every commit I make isn't necessarily itself relevant to the discourse. If people need a commit log, they can check that out.

I think the better approach is to extend the comment form such that people can flag a particular commit as being relevant to the comment being made. With a commit specified, patches can be auto-generated, tests can be run, cloning information can be offered - and the person can still use the comment itself to write out a description of what they think is relevant to the 'patch' they've just posted. Just like we do now. And it means our issue queues won't suddenly start looking like RT.

There is no urgent need to

Posted by sdboyer

There is no urgent need to build a d.o infrastructure for hosting people's personal Git repositories in public.

I just want to pick this line out to highlight it: YES! While having sandboxes is crucial to later phases, we do not need them right away. We can, for the moment, safely focus on just converting everything over, and the issue queues can still rely on patches. As we add the additional features later (sandboxes, issue queue integration, etc.), they can integrate in seamlessly.

Ease of implementation:

Posted by Pisco

Ease of implementation: Because a public Git repo can live behind any URL on the net, and because there is no need to give multiple people write access to shared repos, this system can be set up with a minimum of work. We needn't even give users official Drupal-hosted repos -- users can submit URLs and branch/tag names for any public repo they have, hosted wherever they like. The first version of this can go live tomorrow -- just invite users to submit URLs and branch names together with the typical patches that they post to the issue queue. Then, at a later stage, the testbot can be augmented with the ability to fetch commits from a remote repo and generate patches from them.

I agree 99% with this approach. I think this is how a DVCS is intended to be used, and my opinion is that d.o doesn't need to become a fancy repository host with fancy "clone me on d.o", "watch me" and similar features. By the way, if we really wanted "watch me" ability, we could implement an RSS feed for the project repo, e.g. as an extension to git-daemon. As for cloning, you need nothing special to clone a repo, and if I want to publish my clone I can do it wherever I like. I think it is important to understand the possibilities of a DVCS and actually use them.

The 1% where I don't quite agree is that I wouldn't encourage people to ask others to pull from their clones by posting the URLs of their repos; I think that'll end up in a mess. I'd encourage them to post patches until we have implemented an automated way to make merge requests. Such a merge request should immediately fetch the commit objects from the given ref and give the requester instant feedback, e.g. was the given URL available, does the given branch exist, and so on.
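
The instant feedback described above needs nothing exotic: `git ls-remote` answers both "is the URL available?" and "does the branch exist?" without fetching any objects. A sketch, assuming the check runs on a d.o server that can reach the submitted URL; the function name and messages are hypothetical.

```shell
# Hypothetical pre-flight check for a submitted merge request.
#   $1 = repository URL, $2 = branch name
# Prints a message for the requester and returns ls-remote's exit status.
check_merge_request() {
    url=$1 branch=$2
    # With --exit-code, ls-remote exits with status 2 when no ref matches.
    git ls-remote --exit-code --heads "$url" "refs/heads/$branch" >/dev/null 2>&1
    status=$?
    case $status in
        0) echo "OK: $url has branch $branch" ;;
        2) echo "ERROR: $url is reachable but has no branch $branch" ;;
        *) echo "ERROR: could not read a Git repository at $url" ;;
    esac
    return $status
}
```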

I think we should keep things simple for a start: give project maintainers a central project repo hosted on d.o, with only them having write access, and preserve the patch workflow as we have it now. Once this is done, we can start implementing the new features incrementally. I think there will be enough work adapting the existing infrastructure/code to play nice with Git.

mike booth

I agree 100% with your 99% agreement. It is indeed "likely to end up a mess". External repo URLs aren't reliable. But I think they're as reliable as they need to be... when they're accompanied by a patch. Especially when the alternative is to spend the time to build our very own unreliable infrastructure.

This discussion has gone out of control

Damien Tournoud

The debate should not be about the "why", but about the "how".

One of the strengths of Drupal is that we are a strong community that promotes mentoring and collaboration. We fear (for good reasons) that if too many people host code outside drupal.org, the community will fragment and lose its momentum.

The equivalent of "don't duplicate an existing module functionality" in DVCS language is "host your personal branch on drupal.org, and do your best to merge it back into the mainline as soon as possible, by collaborating with the current maintainer".

We don't want 100s of crappy forks of Services or Location floating around the net. We want people helping the main module get better.

We are committed to deploying the tools necessary to keep both the development and the discussion on drupal.org. This is what this page is about.

Damien Tournoud

Good as it is?

cha0s

I think the question is already largely how; how will we encourage people to contribute? Patches? Hosted branches? Can we make the experience seamless and consistent across our wide array of contributors?

We can continue having people contribute patches, and lose out on a lot of the good of DVCS, but still gain from not using CVS. Or, we can use the advanced tools these systems put in our hands, like branches and merging and forking each other (all night?).

I mean, I like the idea of being able to have parallel branches along a main trunk, that's a key to serious growth, imo. I hope it isn't too hard for people to pick up because I had an extremely hard time trying to get Ryan and Lyle to embrace branches whilst working for Ubercart; they basically considered it too complex. Patches are fine for a lot of things.

Maybe there is a middle ground we can find after all. :) What if we could have the option of hosted repositories, but have a way to specify the exact version a patch is targeting, and drupal.org could (optionally) create a branch and apply the patch to it, fully automating the basic check of whether a patch applies, while of course still being able to run any tests against the project's test repository. The issue post would have an attached patch as now, but also a link to the repo created for the patch.

I don't think a patch workflow is the best that can ever be done. It's fast, and it can be very easy and quick between pros. Of course, people are raising the question: what about larger changes (the Ubercart price API)? There's no doubt these would benefit from parallel maintained branches. Sure, provide a really dead-simple graphical (web) diff viewer. We have the Diff module, but I'm assuming (and I have no knowledge here, so sorry if I'm wrong) the Diff module can't handle things like diffs generated from a VCS.

I think too that in the same vein as 'attach a patch and a repo spring up around it automagically' we should also do the reverse... have a submitted branch automatically submit patches, somehow triggered by the branch maintainer (web interface, again? This raises questions about collaboration on parallel branches and who gets permission, etc.)

So, just some thoughts.

alexanderpas

Each issue gets a branch in the main project repo, to which all collaborating users have access. In addition, further branches can be made off this branch for alternate solutions.

Pro:
- One centralized place for collaborating for a solution for a given issue. (as each branch still has a single common branching point from the repo.)
- No need for fighting when two different solutions are available.

Cons:
- Loads of branches (although this is more of a viewer problem.)

The way I see this, our main entry point for visually seeing the different branches is actually the issue queue itself.

andb

Webchick:

> Can we have the best of both worlds? Anyone can branch to their sandbox, but the project page is aware of this and says "Other branches you might be interested in..."

Fago is probably right that this level of talk belongs in the groups thread, so I've moved here from the access thread (which depends on the outcome of this discussion) at http://drupal.org/node/714034.

Yours is a great question, and the answer is yes. What you are talking about is the GUI for the implementation. The file listing, the interface to the "sandbox", could be on a user's page. I also suggest that it not be the module maintainer who makes the sandbox, but the user who wants to contribute to the code. I advocate storing it in the same repository, in the same file location as the main module code, which GREATLY helps people who want to use external tools. Once it's separated, it's work to combine and to track. But if it's together in a single repo, its various branches can be displayed in numerous ways, including exactly what you suggest here. Gitolite already supports what's required as far as personal branches and branch-level access are concerned, so the concept is defined; it's not reinventing the wheel.

I fully support an "individual repo per project, with issues and sandboxes as branches" approach. If you are stuck in CVS it might seem wrong or difficult, but after you use a DVCS for even a short while, it will be clear it is the best solution for ease of finding all work on a project, speed of operations, and server resources required. The downside? I can't see a single one that can't be solved with good UI design (for example, "where are all MY branches?").

The "con" listed in the top-level post for "One repository per project, one branch per issue" is really a non-issue. The module maintainer or the issue founder could make an issue branch. The maintainer could grant write access to the main or issue branches as he sees fit. Do you disagree with the dev path? You make your personal branch and prove what you want with code - just like you would with a patch! Only now, as you develop your patch, I don't have to search for a new download to patch my site with; I just git checkout to switch to my local copy of your branch and git pull to make sure I have your latest work. What could be easier?

I have to strongly agree that, because of the nature of DVCS, we will always use the tools we like best to interact with the repo - whether Github, SmartGit or the CLI, it doesn't matter. The purpose of the d.o repo solution shouldn't be to make the best browser in the world, but to keep development together. Thoughts are kept in the issue queue, and code from anyone who wants to help in branches of the same repo. Personal branches can all be named with specific terms, so it's always clear which are the module maintainer's "official" branches.

Damien Tournoud

No, we will not put random branches of a project inside the same repository. That would make things a lot more complex to manage, and more fragile, but most importantly, it is completely unnecessary.

I don't know any project that does that. Everywhere in the git world, you have separate repositories for all the branches. For example, the "forks" you do in Github are separate (cloned) repositories. Of course, it is necessary to put in place some tools to track which repository is the fork/branch of another. But that's the easy part.

Damien Tournoud

sdboyer

Damien is 100% right, here. There is no reason, at all, to put unrelated branches inside of the same repository. It's swimming upstream against how git works.

Agree that this sounds terribly confusing...

webchick

If I check out the "Views" project from Git, I want to see merlinofchaos-sanctioned code. I do not want to see random futzing around by joe schmoe playing around with a Views port to Python. This seems like it would only cause terrible confusion on the part of our users, and a horrifying increase in support requests to the Views module issue queue ("So I tried the views-port-to-python branch cos that looked interesting, but I'm getting a compile error on line 124...")

andb

I've always hosted my own repos, so never had to deal with a truly staggering number of branches. Is this how github deals with it? Then I'd defer to their combined wisdom and support cloning their solution. In fact, has anyone spoken to github about making partnership to, in essence, "white label host" everything that is needed for d.o?

sdboyer

The approach is just wrong, period - Github agrees, but we don't need their rubber stamp to prove it's a bad idea. It runs counter to everything that modern DVCSes try to do - and especially Git, given how it multiplexes the file structure.

Please see the REAMS of discussion in the original VCS thread about why external hosting, or even an external system run on our own servers, is really not an option.

More explanation of Git

mike booth

I see a lot of suggestions that look like this:

"It's much better to have this all going on on d.org repos, where we can provide a web based overview of activity..."

I feel that this is based on a misunderstanding of what a Git workflow is like.

When I want to work on code in Git, I check it out to my local machine. Then I change it. Then I make a commit. Then I change it some more and make another commit. All these commits are on my local machine. Nobody else will know about them until I want them to, at which point I will consciously and deliberately push them up to a public repository where I have write access. (In my case, a personal repo on a service like Github.)

Git does not work like Etherpad or Google Docs. You do not necessarily get a steady stream of activity messages as a programmer tries one thing, then tries another. Unless the programmer decides to push up changes to the public every time they sneeze.

Moreover: You may think you want a notification every time a programmer sneezes, but this is not really what you want. Because either the result is an unreadable, jumbled, kitten-killing mess, or the programmer isn't working very fast, because (s)he is paralyzed with constant self-analysis -- before pressing any given key, (s)he is asking "Does this change logically belong as part of the commit that I'm about to make?"

Git largely solves this dilemma, but it does so in a particular way:

  • Private history in your local Git repo can be rearranged at will. You can have all kinds of messy fun. Commit one bad idea after another. Kill kittens left and right: Fix a bug over here, fix a bug over there, write a major update to node_load() for issue A, fix a spelling error, then work some more on issue A, then do some work for issue B. Try to make a commit in between each of these things -- but even if you forget to do so, you can fix it later.

    Then, three days from now, you can rearrange all these commits into a sensible order, merge the ones for issue A together, then split the one from issue B into two parts with separate, clearer messages. Maybe move the spelling fixes into a different local branch to be dealt with later. Then you push up the edited version of your branch, where every commit is clear and logical. You will look like some kind of organizational genius, and webchick will bless your name.

    This is why Git is awesome.

  • However. With great power comes great responsibility. Once you push a series of commits in a certain order to a public branch, the history of those commits is public. You can no longer rewrite that history. (Unfortunately, because Git was written by Linux kernel hackers, who tend to enjoy working without a net, the tool will not prevent you from trying. But the results will be broken, and they may well be broken in mind-bending ways.)
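
For readers who haven't tried it, here is a minimal sketch of the "rearrange later" step using stock Git features: `git commit --fixup` marks a commit as a correction to an earlier one, and `git rebase -i --autosquash` folds the two together. It assumes a local topic branch that has not been pushed anywhere yet (per the bullet above, don't rewrite anything public); the helper name and commit messages are made up.

```shell
# Hypothetical helper: tidy a private branch before publishing it.
#   $1 = the commit or branch the topic branch started from
tidy_local_branch() {
    # --autosquash moves each "fixup! <subject>" commit directly after the
    # commit it fixes and melds it in. GIT_SEQUENCE_EDITOR=: accepts the
    # generated todo list as-is, so no editor opens.
    GIT_SEQUENCE_EDITOR=: git rebase -i --autosquash "$1"
}

# A typical messy session:
#   git commit -am "Rewrite node_load() for issue A"
#   git commit -am "Fix spelling error"
#   ...more work on node_load()...
#   git commit -a --fixup=HEAD~1      # belongs with the node_load() commit
#   tidy_local_branch origin/DRUPAL-7--1
```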

What am I getting at here?

  • Publishing your work is still a conscious act in Git.
  • The hard part of doing so is making your work worthy of publication. Git can't magically make your code sensible for you. You still have to do the writing yourself.
  • Once you've spent time making your code worthy of attention by a reviewer, the labor of pasting the commit's public URL into a form on the issue queue page and pressing "Submit" is the least of your problems.
    (Unless you aren't actually reading the issue queue. Do we really need to spend a lot of time building infrastructure to make it easier for coders to avoid visiting the issue queue? Do we want to encourage people to just make commits to their repos and trust that some invisible automated mechanism is telling the world about the results? Isn't that the problem we are trying to avoid?)

Here is what I would suggest:

  • Tomorrow -- literally tomorrow, though in fact one might wait until the day after the official Git Drupal repos are in place -- one could make a statement like this on any issue in the queue:

    "I see that this issue is getting complicated. Would the people who are working on this please post their Git repo URLs and branch names so that we can all follow along?"

    And people who want to share their work will post links to their public repos.

  • Now, if you want to see things that a programmer has been quietly pushing up to their issue branch... you can follow the URL and look, either using a Web browser or a Git client. (At least one of those will work.)

  • If a programmer decides to push up their work and wants to make sure that the folks following along on the issue queue notice the change, the programmer reposts a new URL pointing directly to the changed branch, with optional additional explanation. This is just how the queue works today, except with repo URLs and branch names as well as patches.
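
On the Git-client side, following one of those posted URLs doesn't even require adding a remote. A sketch, run from inside your own clone of the project; the helper name is hypothetical, and the base branch is whatever the issue targets.

```shell
# Hypothetical helper: inspect a branch someone posted in the issue queue.
#   $1 = posted repo URL, $2 = posted branch, $3 = your local base branch
review_posted_branch() {
    url=$1 branch=$2 base=$3
    # A one-off fetch by URL stores the fetched tip in FETCH_HEAD and does
    # not create any remote or branch in your clone.
    git fetch -q "$url" "$branch" || return 1
    echo "== Commits on $branch not in $base:"
    git log --oneline "$base..FETCH_HEAD"
    echo "== Combined change as a patch:"
    git diff "$base...FETCH_HEAD"
}
```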

This is the simplest thing that can possibly work. And that might be enough to get us up to the really vital step: Giving the developers a few weeks or months to figure out how Git works before we embark on something more ambitious.

Here are some more relatively simple ideas for the future:

  • We may find that we want to formalize the process of signing up to work on an issue with Git by providing a tiny web form on each issue page that asks for your Git URL, issue-specific branch, and an optional GUI link to that branch. Then put a dashboard on each issue to display a list of the users who have signed up and where each user's issue-specific Git repo is.

  • We can make the dashboard code auto-check those Git URLs to make sure they resolve to something that looks like Drupal and a branch with a patch on it.

  • We can make the dashboard code auto-generate the patch from the issue-specific Git branch that a developer has registered and insert that patch into the stream of comments in the queue. It can do so whenever that developer (or, perhaps, anyone else) pokes a button on the issue page.

  • We may want a Drush script that somehow reads that issue queue dashboard, retrieves the links to people's Git repos, and assembles a local Git repo containing Core plus individual branches that are tracking each contributor's individual repo. At first glance I see no reason why this could not be a one-liner.

  • We may want to give programmers the option to use a Git hook to call a little d.o web API that registers the existence of a new push and automatically updates the issue queue with a little post containing the commit message. (Did you know that a repo can run a script every time it receives a push? That's Git's post-receive hook.) That would be one step closer to webchick's dream interface.
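
A sketch of the receiving end of that idea, as a server-side `post-receive` hook. Git runs the hook after a push and feeds it one `<old> <new> <ref>` line per updated branch; this version only prints the subject line of each newly pushed commit and leaves the call to d.o as a comment, because that web API is hypothetical. The function names are made up; the hook file itself would define them and end with a call to `announce_pushes`.

```shell
# True if $1 is an all-zero object name, which Git uses for "no commit"
# (the old side of a new branch, or the new side of a deleted one).
is_zero() {
    case $1 in
        *[!0]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Reads "<old> <new> <ref>" lines on stdin, as post-receive receives them,
# and prints "<ref> <short-hash> <subject>" for each newly pushed commit.
announce_pushes() {
    while read -r old new ref; do
        if is_zero "$new"; then
            continue                                        # branch deleted
        elif is_zero "$old"; then
            git log -1 --format="$ref %h %s" "$new"         # new branch: tip only
        else
            git log --reverse --format="$ref %h %s" "$old..$new"
        fi
        # A real hook would send these lines to the (hypothetical) d.o API,
        # e.g. with curl, instead of printing them.
    done
}
```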

Frando

> Now, if you want to see things that a programmer has been quietly pushing up to their issue branch... you can follow the URL and look, either using a Web browser or a Git client. (At least one of those will work.)
>
> If a programmer decides to push up their work and wants to make sure that the folks following along on the issue queue notice the change, the programmer reposts a new URL pointing directly to the changed branch, with optional additional explanation. This is just how the queue works today, except with repo URLs and branch names as well as patches.

Well, this is pretty much exactly what we have now with all these off-site repos on Github. The problem is exactly your first point above. "You can follow the URL and look, either using a Web browser or a Git client" leads to a complicated and inconsistent process. We have to explain to people how to find the recent patches, one has to "get" several different UIs, you have to leave the issue page and do a few more clicks to find what you want, etc. You also never know exactly where the latest development happens, because in one issue there are a few Git URLs floating around. And that harms collaboration.

See below for my proposal on how to keep it all together, instead.

My take: One repo per issue

Frando

I think the simplest and best version is to have one repo per project per issue.

IMO, this has the following advantages over doing it with one repo per project per user (e.g. as fago explained it):

  • A) It's really simple to get a listing of all work done on the issue page. Just list all commits done in the repo with the newest on top.
  • B) The "Work from here" link next to each commit listed really has to do - nothing. It just shows the command to clone the repo, and the one to add it as a remote to an existing clone. No need to copy branches around on the server side or let users pull single branches out of another user's repo.
  • C) With one repo per project per user, IMO it would be far more complicated for our active contributors. You'd always have a *huge* number of branches in your clone; if someone else does work on the issue, you'd have to add their branch as well and then remember that they belong together, etc. It just gets messy IMO. If we want to improve collaboration, IMO it's better to have one repo for all work related to the issue.
  • D) For simple issues, all patches could be posted to one master branch, making it really simple and similar to our current workflow. For more complicated features, the active developers can work in their own branches, merging their work together as it progresses.
  • E) It's also *much* simpler for GUI users, as they can just clone one repo and be done (and don't have to hunt around for branches. Even if they're all listed, it's much more work for them to pull them all than to just clone a single repo)

Frando

Three more notes to address some of the disadvantages raised in the OP:

1) Disk space is not much of a problem in either scenario, as for local clones identical object files are hardlinked, not copied. The overhead for a new repo is 200KB or something, I think. Should be manageable.

2) I don't think at all that it's a problem if the issue repos allow pushes by everyone who has added an SSH key to their account. This is not much different from how everyone can currently post patches. And the general agreement can just be to only push to the issue repo's master branch if you really know what you're doing, and otherwise push to a new branch. Easy.

3) To make it easy to track the official repo and merge in changes ("rerolling"), the easiest approach would be to automatically update the official branches (DRUPAL-7--1 etc.) in the issue repos on the server side whenever an official project repo receives commits. For this to work reliably (fast-forward only, ever), we just disallow pushes into the DRUPAL-* branches in the issue repos. Both the automatic fetching and disallowing the pushes can easily be done with git hooks.
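
Notes 1) and 3) translate fairly directly into stock Git on the server. A minimal sketch under assumed paths: creating an issue repo is a bare clone of the official repo from a local path (for local paths, `git clone` hardlinks the object files where the filesystem allows), an `update` hook refuses pushes into `DRUPAL-*`, and a plain fetch keeps those mirrored branches fast-forwarded. The function names are hypothetical, and wiring `sync_issue_repo` to run whenever the official repo receives commits is left out.

```shell
# Hypothetical: create an issue repo from the official one.
#   $1 = official bare repo, $2 = path for the new issue repo
create_issue_repo() {
    official=$1 issue=$2
    git clone -q --bare "$official" "$issue" || return 1
    # The update hook runs once per pushed ref as: update <ref> <old> <new>.
    # Exiting non-zero rejects that ref.
    cat > "$issue/hooks/update" <<'EOF'
#!/bin/sh
case "$1" in
    refs/heads/DRUPAL-*)
        echo "DRUPAL-* branches mirror the official repo; push another branch." >&2
        exit 1 ;;
esac
exit 0
EOF
    chmod +x "$issue/hooks/update"
}

# Hypothetical: bring the issue repo's DRUPAL-* branches up to date.
# Without a leading "+", the refspec only allows fast-forward updates, and
# fetching does not run the update hook above.
sync_issue_repo() {
    official=$1 issue=$2
    git --git-dir="$issue" fetch -q "$official" 'refs/heads/DRUPAL-*:refs/heads/DRUPAL-*'
}
```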

I have also written up some user stories to demonstrate what I mean:

Patch creator

Patcher Paul has a clone of drupal.git. He notices a bug, so he goes into his repo, does
git checkout --track -b ugly_little_bug origin/DRUPAL-7--1
and starts hacking away.

Now he wants to share his work, so he goes to drupal.org and creates an issue. Once the issue is created, there's a button "Create repo for this issue". Paul clicks there.

Afterwards, he gets a message

The repository has been created. To start working in it, do
git clone git@drupal.org:git/issues/1234567.git
or to add this issue's repository to another clone of drupal.git, do
git remote add 1234567 git@drupal.org:git/issues/1234567.git
git fetch 1234567

As he has already been working in a clone, he uses the second method and does
git remote add 1234567 git@drupal.org:git/issues/1234567.git
git fetch 1234567

Then, to share his work, it's simply a
git push 1234567 ugly_little_bug:master

The commits he made are then shown at the top of the issue.

General developer

Developer Dan notices the issue. A few days ago, he did some work on the same feature, which he now wants to integrate into the issue. He has not shared his code so far, it's lying in the feature_x branch of his personal clone of drupal.git.
On the issue page, he clicks on the "Work from here" link next to the most recent commit. He gets the following message:

To start working from here, do
git clone git@drupal.org:git/issues/1234567.git
cd 1234567
git checkout master
Or, to add this issue's repository to another clone of drupal.git, do
git remote add 1234567 git@drupal.org:git/issues/1234567.git
git pull 1234567 master

So he goes into his working copy and does
git remote add 1234567 git@drupal.org:git/issues/1234567.git
git fetch 1234567
git checkout feature_x
git merge 1234567/master
git push 1234567 feature_x:dandeveloper.myversion

His commits, again, show up directly on the issue page. Like this, development goes on. Everyone knows immediately what's happening, because all commits are listed chronologically on the issue.

Reviewer

Rachel Reviewer stops by. She clicks on the "Work from here" link next to the most recent commit (which Developer Dan pushed to the dandeveloper.myversion branch).
She gets the following message:

To start working from here, do
git clone git@drupal.org:git/issues/1234567.git
cd 1234567
git checkout dandeveloper.myversion
Or, to add this issue's repository to another clone of drupal.git, do
git remote add 1234567 git@drupal.org:git/issues/1234567.git
git pull 1234567 dandeveloper.myversion
Diff between this commit and DRUPAL-7--1: 24fc371-18af0a4.patch
Diff between this commit and the repo's master branch: 24fc371-cd89ab8.patch

So she just does git clone and git checkout and can start reviewing the patch. To get the same diffs as above locally, she does:
git diff origin/DRUPAL-7--1 dandeveloper.myversion
and to review just the parts Dan added:
git diff origin/master dandeveloper.myversion

She gives some feedback. Paul picks it up and fixes it. First, though, he merges Dan's work into his master branch, which also holds some commits he made locally in the meantime.
git fetch origin
git checkout master
git merge origin/dandeveloper.myversion
[resolve potential conflicts, commit, hack some more, ...]
git push origin master

Development continues like this. If new commits land in Drupal, one can merge them in simply with
git checkout master
git pull origin DRUPAL-7--1

Committer

A few weeks later, once the feature is fleshed out, Dries comes by. To commit it to Drupal's master repo, he does this in his clone:
git checkout DRUPAL-7--1
git pull git@drupal.org:git/issues/1234567.git master
git push origin DRUPAL-7--1
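
The stories above can be rehearsed offline before anything like this exists on drupal.org. This sketch swaps the `git@drupal.org:git/issues/1234567.git` URLs for throwaway local bare repositories (an assumption made purely so it runs anywhere), keeps Paul on his `ugly_little_bug` branch with the `1234567` remote from his first story, and always fetches before merging `remote/branch` names. It rehearses the proposal; it does not describe anything d.o provides.

```shell
# Offline rehearsal of the issue-repo workflow. Local bare repos under
# $root stand in for drupal.org; nothing touches the network.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
root=$(mktemp -d)
cd "$root"

# The official repo, with one commit on DRUPAL-7--1.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/DRUPAL-7--1
echo core > core.txt && git add core.txt && git commit -qm "Initial core"
cd "$root"
git clone -q --bare seed drupal.git

# Paul: hacks in his own clone, then "Create repo for this issue".
git clone -q drupal.git paul
cd paul
git checkout -q --track -b ugly_little_bug origin/DRUPAL-7--1
echo fix > bug.txt && git add bug.txt && git commit -qm "Fix ugly little bug"
git clone -q --bare "$root/drupal.git" "$root/1234567.git"
git remote add 1234567 "$root/1234567.git"
git fetch -q 1234567
git push -q 1234567 ugly_little_bug:master

# Dan: merges the issue's master into feature_x and publishes his branch.
cd "$root"
git clone -q drupal.git dan
cd dan
git checkout -q -b feature_x
echo feature > feature.txt && git add feature.txt && git commit -qm "Feature X"
git remote add 1234567 "$root/1234567.git"
git fetch -q 1234567
git merge -q --no-edit 1234567/master
git push -q 1234567 feature_x:dandeveloper.myversion

# Paul: folds Dan's work into his branch and updates the issue's master.
cd "$root/paul"
git fetch -q 1234567
git merge -q --no-edit 1234567/dandeveloper.myversion
git push -q 1234567 ugly_little_bug:master

# Dries: pulls the issue's master into the official DRUPAL-7--1.
cd "$root"
git clone -q drupal.git dries
cd dries
git pull -q --no-rebase --no-edit "$root/1234567.git" master
git push -q origin DRUPAL-7--1
```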

OMG! We have a winner??

webchick

I think I need to read this 7 or 8 more times to totally understand it ;), but at a glance this seems perfect to me, and it addresses my concerns about community fragmentation, inconsistency for contributors, the difficulty of collaborating on large patches on the mothership, and huge barriers for non-technical contributors using GUIs.

I like it

mike booth

This is pretty good.

  • Every change publishes a patch, so folks without the Git-fu can have a prayer of following along.

  • There is very little auto-created stuff. All you auto-create is one repo for each issue. That's good. Robot-created stuff is just noise; the signal is the pattern of what the humans have created.

  • There is no privileged branch: Each developer can push to either an existing branch or a new one, each push gets added to the end of the issue queue messages with its own "Work from Here" link; testers and reviewers who arrive on the scene will tend to pick the last "Work from here" link to look at, just as reviewers today tend to work with the last patch in the queue except on special occasions.

    This is great, because the issue queue discussion determines the priority of the branches. If visitors can't figure out quickly which branches are most recently active and which are stale, they'll end up lost.

    The only fly in the ointment is the "master" branch. It sounds important, so people will tend to assume it's where the action is, which may not always be true. It doesn't clearly belong to anyone, which will create uncertainty about who is supposed to touch it. Or it might turn into the branch where the traffic jam tends to break out. ;) My suggestion is that we amend your handy rule:

    only push to the issue repo's master branch if you really know what you're doing

    to read:

    Don't push to the issue repo's master branch, unless you're a core committer.

    And that should make it pretty clear. We should teach people to make branches based on topic and/or username, instead, kind of like your dandeveloper.myversion branch.

I do wonder: With every developer having push access to every issue repo, how long is it going to be before some well-meaning but inexperienced user trashes a repo by accidentally deleting a branch that they shouldn't have? Or trying to use git rebase on someone else's already-pushed branch and then pushing up the result? My understanding is that most such mistakes are recoverable, but that the recovery process is not for everyone and not something that you want to do every day. However, the good news is that all those auto-generated patch files will make perfect backups, the repo will presumably be backed up now and then as well, and the developers will tend to have local backups of their own, which should cover the situation pretty well.

fago

Hm, I find handling a lot of remotes complicated and tried to avoid it in the workflow I proposed above. Refspecs like ugly_little_bug:master are not easy to grasp, and I guess they're something only more experienced git users understand. I wouldn't make users deal with that.

In comparison, with per-project per-user repos you can easily push/pull things as they are in the repository, so you don't need to fiddle around with refspecs.

ad 3)
Agreed, that makes sense. I suggested doing the same for per-user repos above.

> His commits, again, show up directly on the issue page. Like this, development goes on. Everyone knows immediately what's happening, because all commits are listed chronologically on the issue.

I'm not sure whether it's a good idea to let commits automatically pop up on the issue, as it might spam it unnecessarily and significantly changes the way issue discussions work.
Still, with per-user per-project repos we could do the same. Once I've posted a ref to my branch, d.o could generate the update comments whenever I push new commits.

Frando

Well, ugly_little_bug:master really isn't that complicated IMO. Also, you can just work right in the master branch; then it would just be git push 1234567 master. Or you just do git push 1234567 ugly_little_bug; that would work equally well, as it doesn't really matter whether the main code is in the master branch or in another branch in the issue repo. The "Work from here" links always show the right branch to pick up; "master" would be purely a convention. Maybe we don't even need that convention.
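
To make the refspec question concrete: the `local:remote` mapping only has to be typed once if Git is told to remember it. A sketch, assuming the issue remote is named `1234567` as in the stories above. `git push -u` records the upstream, and `push.default=upstream` makes a bare `git push` follow it (with the default `simple` setting, Git refuses because the branch names differ). The helper name is hypothetical.

```shell
# Hypothetical helper: publish a local branch under another name once, then
# let a plain "git push" repeat that mapping.
#   $1 = remote, $2 = local branch, $3 = branch name in the issue repo
publish_to_issue() {
    remote=$1 local_branch=$2 remote_branch=$3
    # -u stores $remote/$remote_branch as the upstream of $local_branch.
    git push -q -u "$remote" "$local_branch:$remote_branch" || return 1
    # Per-repository setting: push the current branch to its upstream.
    git config push.default upstream
}

# Usage:
#   publish_to_issue 1234567 ugly_little_bug master   # once
#   git push                                          # from then on
```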

re ad 3):
Well, the commits don't have to show up as issue replies. I more imagined something like webchick's mockup above.
A nice phase 3 feature can then be to check in a post-update hook whether e.g. a commit message starts with "!issuereply" or something and if so add it as a comment on the issue.

Sure, it would also be possible with per-user repos. We would somehow have to find out which issue a branch or commit belongs to. We could enforce naming the branches by nid, but then we'd have repos full of branches called "1234482", "891182", etc. Not sure...

sdboyer

Remember - as long as there's a standardized way that interacting with the issue queue works, we can include aliases/scripts in the standard "drupal git plugin package" that can significantly ease potential confusion.

Frando

yep, but we have to support GUI users equally well. Not sure whether existing GUIs make it easy to include aliases or scripts.

sdboyer

Not in any of the ones I'm aware of...but I haven't really gone digging, either. Besides, GUI users get screencasts!

(how bout THAT for an answer that dodges the thrust of your point :P)

CorniI

Thanks for your explanation!
Nice to see at least one person likes the per-issue repo approach as I do ;)
And yeah, wrt 3), we'll have to see how that works out, especially in the edge case of a non-fast-forward push in the main repo of a project.

no way.

fago

We really have to dis-allow non-fast-forward pushes in main repo-branches. What's in there, is in there.

Frando

Agreed. No non-fast-forward pushes. If you mess it up, you have to do an additional commit to fix it if you pushed your messed up commits already. Everything else kills our clones.
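
Git can enforce this on the server without custom hooks. A sketch, assuming bare repos on d.o that we administer: `receive.denyNonFastForwards` rejects history rewrites even when the pusher uses `--force`, and `receive.denyDeletes` also covers the accidental branch deletion mike booth worried about above. The repair described here is an ordinary `git revert` followed by a normal push. The helper names are hypothetical.

```shell
# Hypothetical: harden a server-side bare repository.
lock_down_repo() {
    git --git-dir="$1" config receive.denyNonFastForwards true &&
    git --git-dir="$1" config receive.denyDeletes true
}

# Hypothetical: undo an already-pushed commit the allowed way, by adding a
# new commit that reverses it and pushing that as a fast-forward. The
# reverted content stays in history; only the branch tip loses it.
#   $1 = remote, $2 = branch, $3 = commit to undo
undo_commit_and_push() {
    git revert --no-edit "$3" >/dev/null && git push -q "$1" "$2"
}
```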

CorniI

Yeah, that usually works, but not if someone deliberately damages a repo, e.g. by committing porn, warez, etc.
And even if not: I managed to commit an entire Drupal install to the versioncontrol_git repo, which I want to remove there, and that needs a non-fast-forward push as well.

sdboyer

Sorry, I'm not following - why would that require a non-fast-forward push? You just make another commit that reverts the inadvertent adding of the whole drupal instance.

it puts ~1-2MB in your repo

CorniI

it puts ~1-2 MB that you don't need into your repo forever, and it doesn't solve the problem of illegal/unwanted files.

Space is cheap. If it's

sdboyer

Space is cheap. If it's really that much of an issue, the branch with the botched commit can be deleted (and partially preserved, as needed).

And now I see what you mean about illegal files - I think the more common case, though, will be the one where there are licensing issues. That'll be more or less the same as it is now, though - file an issue with infra, and they can come in with super cow powers to clean it up. Same probably goes for the botched commit, actually.

ad B) Indeed, Work from here

fago

ad B) Indeed, "Work from here" gets easier. But instead we have to set up the per-issue branches. What's the difference?

ad C)
Hm, when I'm an active contributor I have a huge number of branches, yes. But I prefer a huge number of branches to a huge number of remotes, each with multiple branches!

ad D)
So I still have to configure a new remote for each simple issue? Phew.

ad E)
Hm, I have to clone one repo per issue I work on. With my solution I'd just need my clone of the project, which holds everything I work on. Isn't that convenient?
Still you can easily integrate work of others by pasting a single pull command provided by d.o.

The problem here is the case

CorniI

The problem here is the case where you've got 5 people working together on one issue (like dbtng). You'd have to keep track of the work done by the 4 others: you'd need 4 remotes instead of one, and you'd have to pull from all 4 and resolve the merge conflicts. As there is no central place for the 5 of you to work together, this will get very hard, if not impossible.
And because we keep the patch workflow, the reason for the git migration was that working together on one issue at d.o isn't possible at the moment. With the git migration, it's imho top priority that we have a good workflow for these issues, because else we don't gain much from migrating to git.

Yep, but that's a rare case.

fago

Yep, but that's a rare case. While this should work out too, we should not make contributing simple fixes more complicated because of that.

For big changes like db-tng, there would be the possibility for a lieutenant to step in and pull good stuff from others, e.g. in the case of db-tng, crell could pull others' changes into his personal repo. That repo can then serve as the reference for others and finally gets merged into the project mainline.
I think this is workable, as for such big changes there is usually a person who makes sure stuff is right before dries or webchick commit it.

But if we still want a common, centralized repo people can push to, we could still
* open a branch in the project's repository that is public / allow people to mark branches as public, or
* make it possible to create publicly accessible clones. (E.g. we could internally associate them with the anonymous user, so that shouldn't be hard.)

And because we keep the patch workflow, the reason for the git migration was that working together on one issue at d.o isn't possible at the moment.

I see. But just as you can merge in the changes of a different issue branch, you can also easily pull in the changes from another person's repository. That's simple, as it's the usual way developers work together with a DVCS...

re ad B) We don't have to set

Frando

re ad B)
We don't have to set up per issue branches. We set up per issue repos, which is really simple. Say we have the following file structure on d.o:

/home/git/projects/<project>.git
/home/git/issues/<project>/<nid>.git

The link "Create repo for this issue" on the issue pages leads to a

  mkdir -p "/home/git/issues/$PROJECT"
  cd "/home/git/issues/$PROJECT"
  git clone --bare --template=/home/git/issuetemplate "/home/git/projects/$PROJECT.git" "$NID.git"

That's fast, since for a local clone git just creates hardlinks to the object files, so not much of a problem. No need to fiddle with existing repos. /home/git/issuetemplate contains a pre-receive hook that checks if a branch name starts with "DRUPAL-" and, if so, aborts the push.
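A minimal sketch of that pre-receive hook, assuming it lives at the hypothetical path /home/git/issuetemplate/hooks/pre-receive (git copies the template directory's files, including hooks/, into each repo cloned with `--template`). Git feeds the hook one `<old-sha> <new-sha> <refname>` line per updated ref on stdin, and a non-zero exit rejects the whole push.

```shell
#!/bin/sh
# Hypothetical hooks/pre-receive for per-issue repositories.
# Rejects any push that creates or updates a branch named DRUPAL-*,
# since those names are reserved for the official project repository.

is_reserved_ref() {
    case "$1" in
        refs/heads/DRUPAL-*) return 0 ;;
        *) return 1 ;;
    esac
}

pre_receive() {
    status=0
    # Each stdin line: <old-sha> <new-sha> <refname>
    while read -r old new ref; do
        if is_reserved_ref "$ref"; then
            echo "rejected $ref: DRUPAL-* branches are reserved for the project repository" >&2
            status=1
        fi
    done
    return "$status"
}

# Only run when git invokes this file as the hook, so it can also be sourced for testing.
if [ "${0##*/}" = "pre-receive" ]; then
    pre_receive
    exit $?
fi
```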

re ad C)
Well, of course we're discussing personal preferences here, so I won't argue much. But for me, remotes have quite a few advantages. Once an issue is settled, I can just remove the remote from my clone; that's one command. With remotes, I have all the branches related to an issue grouped together.

re ad D)
What's the problem in "configuring a new remote for each simple issue"? That is just one command (that can also be copy pasted from a "Work from here" link):

git remote add 1234567 git@drupal.org:issues/1234567
git fetch 1234567

And off you go hacking. Like this, when someone creates a new branch in the issue repo with his work, you automatically pick it up the next time you do git pull 1234567.

In contrast, with one repo per project per user, especially for core, it would be quite a lot of work to keep your clone more or less clean. Imagine how many issues you'd submit a patch to during D8 development. You wouldn't be allowed to delete branches, because others might have branched from you, and things would get really messy if branches were deleted. So it would all pile up. With repos per issue, though, once the issue is settled you just remove the remote and you're done.

re ad E)
You don't have to clone one repo per issue. You can just add it as a remote and pull it into your existing working clone. Most issue repos will just contain one branch or maybe two. So it'll be fast.
My point E) was meant for the less experienced developer who just stumbles upon an issue. With one repo per user, he'd first have to get a personal clone if he doesn't have one yet, clone it locally, pull the branch where the latest work on the issue happened from another user's branch, then push it to his own clone and then somehow make the issue aware of it.
With one repo per issue, it's just one clone and you can push right to the repo you just cloned. Sounds easier to me. But not too much of a difference, either.

ad B) yep, of course per

fago

ad B) yep, of course per-issue repos. That was what I meant, sorry. Still, it doesn't make any difference whether we have to set up per-issue repositories or per-user repositories automatically.

ad C)
I agree that in the end it boils down to personal preferences. But what's wrong with deleting branches? If someone has pulled from it, the history would still be in his repository.
However, yes, we'd need a way to make sure the history of committed stuff is preserved once people delete their branches. Perhaps the truth lies in the middle: combine per-issue repos with per-user repos, e.g. each branch a user references is automatically copied over into a per-issue repo to preserve history. That way you could also use that as a single remote to pull in the work of others, as you suggested.

ad E)
With one repo per user, I'd click the "branch from here" button and be shown the only pull/clone command I need, depending on whether I already have a repo for this project. Once I'm finished, I do "git push". So there would be no need to do anything special or care about remotes to get started!

ad D)
Also, I'd have to set up the local branch before I could start hacking. So I have to organize multiple local branches for multiple issues - and to avoid clashing branch names I have to use refspecs, which the user has to understand. Thus to do a small fix I have to add a new remote, fetch from it, create a new local branch, commit, push. Also, I have to create the issue before I can push.

In comparison, with per-user repos I just branch from the mainline, commit, and push. Done. You can also create an issue for it once you are done.

True points. After all, I

Frando

True points. After all, I think git is powerful enough that both scenarios can work equally well. I still think that one repo per issue is simpler and more similar to our current workflow, though.

Thus to do a small fix I have to add a new remote, fetch from it, create a new local branch, commit, push.

Well, for adding the remote, fetching and creating the local branch, we could also provide a single command to copy and paste on the "Work from here" link:
git remote add 1234567 git@drupal.org:issues/1234567.git && git fetch 1234567 && git checkout -b 1234567.frando --track 1234567/master
Then it's just hack, commit, and git push 1234567 1234567.frando.
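The "Work from here" one-liner could also ship as a git alias in the proposed "drupal git plugin package". A sketch using git's `!` shell aliases; the alias names `issue-start` and `issue-done` are hypothetical, and the `git@drupal.org:issues/<nid>.git` URL scheme is taken from the example above, not an agreed interface.

```shell
# Hypothetical alias: git issue-start <nid> <username>
# Adds the issue repo as a remote named after the nid, fetches it, and creates
# a local branch <nid>.<username> that tracks the issue repo's master branch.
git config --global alias.issue-start '!f() { git remote add "$1" "git@drupal.org:issues/$1.git" && git fetch "$1" && git checkout --track -b "$1.$2" "$1/master"; }; f'

# Hypothetical alias: git issue-done <nid>
# Once the issue is settled, removes the remote and its remote-tracking branches.
git config --global alias.issue-done '!f() { git remote rm "$1"; }; f'
```

Usage would then be `git issue-start 1234567 frando`, hack and commit, `git push 1234567 1234567.frando`, and finally `git issue-done 1234567` once the issue is closed.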

The one issue with per-user repos remains the issue queue integration. If we force branches to be named after issue nids, you also have to create the issue prior to pushing. If we do not enforce that, how do we get the "work from here" buttons on the issue page? I mean, how do we know in a post-receive hook which issue the commits received on branch foo_bar in user baz's clone of drupal.git belong to? With per-issue repos, that's easy. And I could still first push the code to my sandboxy personal clone of drupal.git if I really want to share code before creating the issue, and then, once the issue is created, just add the issue's repo as a remote and push it over there.
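If per-user repos did enforce nid-based branch names, the post-receive lookup asked about above becomes a simple string match. A sketch assuming a hypothetical convention where a branch is named `<nid>` or `<nid>-<description>` (for example `1234482-dbtng-fix`); branches that follow no convention simply aren't linked to any issue.

```shell
# Hypothetical helper for a post-receive hook in per-user repositories.
# Prints the issue nid encoded in a ref such as refs/heads/1234482-dbtng-fix,
# or returns 1 if the branch does not follow the <nid>[-description] convention.
issue_nid_for_ref() {
    branch=${1#refs/heads/}
    nid=${branch%%-*}              # text before the first "-"
    case "$nid" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    echo "$nid"
}

# Example post-receive loop: report which issue each pushed branch belongs to.
post_receive() {
    while read -r old new ref; do
        if nid=$(issue_nid_for_ref "$ref"); then
            echo "$ref -> issue $nid"
        fi
    done
}
```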

But as I said, if done properly, both scenarios can work well I think. I fear though that with per user repos it will be more difficult to track ongoing development as it's more scattered around.

After all, I think git is

fago

After all, I think git is powerful enough that both scenarios can work equally well.

Indeed.

If we do not enforce that, how do we get the "work from here" buttons on the issue page?

I think we don't need to. Once I post a reference to the branch in an issue, the system is aware of it. We don't need to have commits automatically pop up.

However, once a reference to a branch is posted, we could
* pull that code over into a per-issue repository

-> Thus we have the code available per issue as well, so the history is preserved and devs can use it to switch easily between different patch versions if desired. We could even allow pushes to the issue repository, so I could explicitly post the updated code to the issue that way.

But having per-user repos still ensures that dealing with multiple remotes is completely optional, and knowing the basic git pull / git push workflow is enough.

You guys have a lot of good

sdboyer

You guys have a lot of good discussion going on here, and I'm having difficulty finding just the right spot to tag on to :)

I've been going back and forth over sandboxes (that is, per-user repos) vs./+ per-issue repos quite a bit this week, and I think I'm coming down mostly in favor of per-issue repos, while still having sandboxes. While there's no question that we COULD make either system work, the deciding factors for me come down to this:

  • History. Y'all just covered this, but if the branches used for issues are stored in sandboxes, then we have the problem where users performing routine cleanups on their sandboxes (and deleting those branches) have the potential to destroy the historical integrity of the issue queue. This is a MAJOR problem, one that basically requires that, if we used sandbox branches directly in issues, we maintain archives of those branches for posterity. Really, that's a nasty data synchronization problem that I don't want to deal with: think of the overhead required to monitor issues to see if sandbox branches have been deleted, then updated all the links to point to the archive branches. Eew. Data duplication is always a headache.

  • The whole push/pull/merge system is complex; new git users often spend a lot of time simply trying to understand what a remote is, and what they're doing when they interact with it. While git can support endless remotes, every additional remote that the user must interact with is not-insignificant mental overhead. Much more problematically, though, while everyone having their own sandbox does not (at all) preclude collaboration, it does change the nature of the commit history that will be generated. Consider the following scenario with sandboxes:

    1. Gorelik creates a new issue, registers a branch in his sandbox to the issue, and pushes a commit (A) to that branch. His log for the issue branch looks like this:
      1252ff6 test commit A
    2. Lhas wants to join in, so he adds Gorelik's issue branch as a remote, pulls it, then makes two commits (B & C), pushes to the branch in his sandbox, and registers that sandbox branch with the issue. His log:
      093b925 test commit C  
      ceb4cfa test commit B
      1252ff6 test commit A
    3. Gorelik makes another commit (D) and pushes it to his sandbox. His log:
      4385417 test commit D
      1252ff6 test commit A
    4. Katrin joins in, which means adding both Gorelik's AND Lhas' sandboxes (in that order). She makes a commit (E) and pushes it to her sandbox. Her log:
      67279cb test commit E
      d6fc190 Merge remote branch 'lhas/issue1' into issue1
      4385417 test commit D
      093b925 test commit C  
      ceb4cfa test commit B
      1252ff6 test commit A
    5. Lhas catches back up on the issue, but before checking the issue to see that Katrin has joined and is most up to date (and thus going to her first), he merges in commit D from Gorelik. His log:
      5cce74a Merge branch 'issue1' of /home/sdboyer/ws/gittesting/gorelik into issue1
      4385417 test commit D
      093b925 test commit C  
      ceb4cfa test commit B
      1252ff6 test commit A
    6. Lhas then adds Katrin as a remote and merges in commit E from her. He then makes commit F and pushes it to his sandbox. His log:
      c6d50ed test commit F
      f4a71a6 Merge remote branch 'katrin/issue1' into issue1
      67279cb test commit E
      5cce74a Merge branch 'issue1' of /home/sdboyer/ws/gittesting/gorelik into issue1
      d6fc190 Merge remote branch 'lhas/issue1' into issue1
      4385417 test commit D
      093b925 test commit C  
      ceb4cfa test commit B
      1252ff6 test commit A
    7. Gorelik finally rolls back around to the issue. He sees that Katrin has joined, and that Lhas has made updates, so he adds Katrin as a remote and then merges from both Katrin and Lhas (using an octopus, i.e. multi-headed, merge. Remember that the additional heads make octopus much dumber than recursive...). He then makes a final commit (G), and the feature is ready to be pulled in by the maintainer. His log:
      ac5c00b test commit G
      dae57b5 Merge remote branches 'katrin/issue1' and 'lhas/issue1' into issue1
      c6d50ed test commit F
      f4a71a6 Merge remote branch 'katrin/issue1' into issue1
      67279cb test commit E
      5cce74a Merge branch 'issue1' of /home/sdboyer/ws/gittesting/gorelik into issue1
      d6fc190 Merge remote branch 'lhas/issue1' into issue1
      4385417 test commit D
      093b925 test commit C
      ceb4cfa test commit B
      1252ff6 test commit A

    Now, it IS worth noting how this fairly complex situation is handled relatively gracefully (yay dvcses!). But this highlights some critical problems:

    1. How long did it take non-experts looking at that log to understand that Gorelik ended up with a tree that completely contains everything Lhas and Katrin did? I suspect the answer is "long enough that it will take a long time for many contributors to trust all the merge algorithms to do it properly - if they ever get to that point at all." And I can't support adopting a system that we can't reasonably expect people will be able to grow to trust.
    2. Whether we use sandboxes or issue repos, each contributor must be cognizant of all other contributors. In the issue model, this "just works," because the branches in that repo are themselves the woven-together collaborative progress on that particular feature. In the above scenario, though, every contributor must make a decision about how that weaving-together should happen - which is significant added mental overhead. And while git can handle all this nicely when there aren't conflicts, if merge conflicts DO arise, then it is suddenly essential that there be one way those conflicts are resolved - not left to the discretion of each individual contributor to resolve the conflict as they'd like.
    3. Rebasing here is really dangerous. The whole above scenario could, I THINK, work by rebasing, but all it would take is two people rebasing at different tips from the same person's tree, and then pushing out commits on top of that (which is an otherwise generally reasonable thing to do) to totally screw the pooch. Given that, there can be no clear rule about what rebasing is safe in this scenario - which for us would basically make it verboten. Personally, I'm not too opposed to that...but I do agree with Linus that there is legitimacy to rewriting history at times.
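Whether Gorelik's final tip really contains everything Lhas and Katrin pushed is something git can answer directly, which might help contributors build that trust. A sketch using `git merge-base --is-ancestor` (available since git 1.8.0); the helper name `contains_all` is hypothetical.

```shell
# Hypothetical helper: contains_all <tip> <commit>...
# Succeeds if every listed commit is reachable from <tip>, i.e. the tip's
# history already includes that work; otherwise names the first missing commit.
contains_all() {
    tip=$1
    shift
    for commit in "$@"; do
        if ! git merge-base --is-ancestor "$commit" "$tip"; then
            echo "missing from $tip: $commit" >&2
            return 1
        fi
    done
    return 0
}
```

In the scenario above, `contains_all ac5c00b 67279cb c6d50ed` would check that Gorelik's commit G includes Katrin's commit E and Lhas' commit F.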

Having a 'branch from this point,' or similar feature would ease some of the initial setup, but it does not resolve the fundamental problem.

HOWEVER, there are some important notes to make:

  • In Frando's detailed followup post, he suggested (in point no. 2) that this is not very different from the current patch workflow. I'll agree that it's similar, but there's a crucial distinction: with the patch workflow, you could pick an arbitrary patch from the queue to work from. While you can still pick an arbitrary commit in the tree to work from, you CAN'T just commit directly from it (however you try it, you'd end up attempting a non-fast-forward push to the issue's bare repo, which we disallow). This is not a show-stopper, but we need to recognize that it's a bit of added difficulty that we should try to help resolve using some aliases/scripts. This also sorta falls under the heading of "ways to ensure disagreements in the issue queue don't explode our vcs." Actually, this is where "branch from this point" would be really useful...

  • Check out that link to Linus' email I posted above again, this time for the discussion of merging best practices. It's important to keep in mind that EVERY branch in these issue repos will be a downstream branch. One of the awesome things about switching is that we shouldn't ever HAVE to make these branches chase the mainline branch - and according to Linus, it may even be a bad idea. Just work away at whatever branch point you want, and the maintainer can & should resolve the merge when it's time to actually pull the feature into the mainline. Of course, if we auto-generate patchfiles, it's easy enough to generate a patch that's against the current mainline tip. The only exception to this would be if there's another major feature that went in which has a lot of overlap.
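Until a "branch from this point" button exists, the non-fast-forward problem from the first note can be sidestepped by publishing work that starts from an older commit as a new branch in the issue repo, rather than pushing over an existing one. A sketch; the helper name and its arguments are hypothetical.

```shell
# Hypothetical helper: branch_from_here <remote> <commit> <new-branch>
# Starts a local branch at an arbitrary commit from the issue's history and
# publishes it under a new name, so no existing branch is rewritten.
branch_from_here() {
    remote=$1 commit=$2 branch=$3
    # Refuse to clobber a branch that already exists on the issue repo.
    if git ls-remote --exit-code --heads "$remote" "refs/heads/$branch" >/dev/null; then
        echo "branch $branch already exists on $remote; pick another name" >&2
        return 1
    fi
    git checkout -q -b "$branch" "$commit" &&
        git push -q "$remote" "$branch"
}
```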

a) You talk about keeping

CorniI

a)
You talk about keeping user sandboxes, but you don't mention what you need them for. Is this intended as a replacement for the current CVS sandbox, which is probably mostly unused and not very important at all? I'd be totally okay with that, but just want to check...
b)
I don't understand why you don't want a patch to chase mainline. In a world in which patches are tested by a bot, they have to apply cleanly to the current mainline, else testing will fail. Also, the idea that the maintainer should then merge in the patches seems crazy to me: do we want webchick & dries to merge patches here and there? I'd guess that they would introduce quite a few bugs in the process of merging. An auto-updated master branch that mirrors the project branch the issue is filed against would be better. Yes, we'd need some merge commits then. And Linus' point:
" But if you want to sync up with major releases, do a

    git pull linus-repo v2.6.29

or similar to synchronize with that kind of non_random point. That
all makes sense. A "Merge v2.6.29 into devel branch" makes complete
sense as a merge message, no? That's not a problem.
"
doesn't apply to Drupal, because we don't release nearly as often as Linux does, nor does the average patch take 6 months or longer to develop.
We have different requirements here: Drupal's codebase is much smaller and not as self-contained as the Linux kernel, so we need our patches to chase HEAD, else they will be stale a week after they've been written.

a) because I don't want to

sdboyer

a) because I don't want to have to create an issue or project in order to have a branch that someone else can look at on d.o. Plain and simple. They might not get used very much, but I'd still rather have them.

b) The patch file can chase the mainline, sure. But it's pointless for the issue branch to chase mainline. It's easy enough to make a system that checks whether a branch can be cleanly merged back into the mainline (which is what the testbot would need to do), and pop a warning about that on the page. But making the branch chase mainline ends up mussing up the feature branch's history with merges that aren't related to the functionality being dealt with in that branch itself. The more tightly focused a branch is on a specific feature, the better - so don't merge from upstream unless you actually have to.
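The clean-merge check described here, and a patch file that applies to the current mainline tip, can both come from a throwaway trial merge that never touches the issue branch's history. A sketch, assuming git 2.17 or later (for `git worktree remove`); the helper name `patch_against_tip` is hypothetical, and whatever else a real testbot would run is out of scope.

```shell
# Hypothetical helper: patch_against_tip <mainline> <issue-branch>
# Merges the issue branch into a detached copy of the mainline tip inside a
# temporary worktree. Clean merge: prints a patch relative to the mainline tip
# and returns 0. Conflicts: prints nothing and returns 1. Setup failure: 2.
patch_against_tip() {
    mainline=$1 topic=$2
    scratch=$(mktemp -d) || return 2
    if ! git worktree add --detach "$scratch" "$mainline" >/dev/null 2>&1; then
        rmdir "$scratch"
        return 2
    fi
    if git -C "$scratch" merge -q --no-commit --no-ff "$topic" >/dev/null 2>&1; then
        # The index now holds the merged tree; diff it against the mainline tip.
        git -C "$scratch" diff --cached
        result=0
    else
        result=1
    fi
    git -C "$scratch" merge --abort >/dev/null 2>&1
    git worktree remove --force "$scratch"
    return "$result"
}
```

A return value of 1 is what would drive the "merge conflicts" warning on the issue page; the branch itself only needs a real merge from mainline at that point.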

I don't see how the release schedule is pertinent, and the fact that our patches can take so long to get in is an even BETTER justification for avoiding chasing the mainline. The degree to which patches overlap is a reason that you'd need to remerge from upstream more frequently than with the kernel, but:

else they will be stale a week after they've been written

really makes me think you're missing the main point here. We aren't dealing (or don't have to deal) with patches anymore, so the question isn't whether or not the patch applies; it's whether or not a) the branch merges cleanly with mainline, and b) there were other changes made on the mainline that interfere with the feature at hand. Actually, the potential for selectively merging in patches from other issues (but not necessarily the full mainline) is one MORE reason to avoid merging from upstream as much as possible: if everybody keeps their issue branches discretely focused on a particular problem, then one issue branch that is affected by another issue that got committed can choose to merge in JUST that other issue, which will go a step further to ensure clean, isolated feature branches.

You can also think about it as a comparison to our current patch workflow: while big patches currently chase HEAD, they don't have to include deltas for every change committed to HEAD whenever they chase it. The patches just include what's relevant to the patch itself. If, in the new system, a commit (or commits) roughly equals a single patchfile, then merging from upstream is like posting a patch with all the deltas to HEAD every time you post a new patchfile.

After a quick discussion in

CorniI

After a quick discussion in IRC:
- We want the following rule for updating patches to HEAD: 'Just merge mainline in when you need to, aka there are merge conflicts.'
This won't occur that often, because a recursive git merge is way smarter than a patch -p0 < issue.patch.
- We want a warning somewhere in the issue when there are merge conflicts. Further details to be worked out (when updated, how updated, maybe via a comment like the testing bot?)
- We don't need any mainline tracking/auto-updated branch in the issue-repository as the normal contributor will clone the project repo on the project page and then add the per-issue repo as remote. He already has the latest mainline as remote, so duplicating it in the per-issue repo doesn't make sense. We'd drop the edge case of the contributor who a) clones the per-issue repo and b) wants to merge mainline in. He'd need to add the project repo as remote for that. I think that's okay, though, and it's just a minor point.

ad 2) I'd assume that we

fago

ad 2)
I'd assume that we still need "manually" written descriptions for the changes we make. Thus I think we already have a proven model here: read people's comments. Then it's up to you which point to start working from, and it doesn't matter whether the code is in the user's repository or in the issue's.

ad 1) Why should I trust user X not to undo my changes when he commits something? Either way, when there is no trust, only a manual audit can help.

ad History)

think of the overhead required to monitor issues to see if sandbox branches have been deleted, then updated all the links to point to the archive branches

Instead I'd just pull the changes into a per-issue repo (e.g. as branch USER--branch) when the user posts it - that's simple. And the workflow is like now: post it to the issue and your changes are preserved for your descendants. ;) As you said, remotes are complex, and I think personal repositories combined with a nice "branch from this point" feature can help keep users from being forced to deal with remotes.
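The "pull it into the per-issue repo when posted" step could be a single fetch run on the server. A sketch; the helper names, the repo paths and the `<user>--<branch>` naming are assumptions based on the comment above. The refspec deliberately has no leading `+`, so git only fast-forwards the archived copy and refuses to overwrite it if the sandbox branch is later rewritten, which is the history-preservation property being asked for.

```shell
# Hypothetical helpers run on the server when a sandbox branch is posted to an issue.

# Refspec copying <branch> to <user>--<branch>. No leading "+": git only
# fast-forwards the archived ref and rejects rewritten history.
archive_refspec() {
    echo "refs/heads/$2:refs/heads/$1--$2"
}

# archive_posted_branch <issue-repo.git> <sandbox-url> <user> <branch>
archive_posted_branch() {
    issue_repo=$1 sandbox=$2 user=$3 branch=$4
    git --git-dir="$issue_repo" fetch -q "$sandbox" "$(archive_refspec "$user" "$branch")"
}
```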

Use cases

andb

Frando did a great job reviewing user use cases. It might be wise to review the use cases for different types of code. Webchick's needs for core (huge codebase, multiple contributors) will be different from Earl's for Views (huge codebase, single contributor) and from Fago's for Rules (small codebase, a couple of contributors). Despite the differences, I still believe there is one "best" workflow to be found.

I see tracking as being one of the biggest problems - what are all the repos out there that deal with problem X.

At first I thought: put everything for a project into a single repo. But now I see how wildly inappropriate that could be if 100 people all decide to make a sandbox (personal branch) of the project, and I see the logic in having separate repos. I'm used to git, so I'm OK tracking a few tens of separate repos and branches, as long as they are at least listed in one place so I know what to track. I wonder, however, whether the casual git user is going to be able to do this.

Wouldn't it be great if we could set up 4 or 5 projects with medium traffic with each of the proposals and see how it work out in real use?

It seems that consensus is moving towards something like:

Main project repository

  • Project owner can create new branches
  • Project owner can give d.o users write access to individual branches

Per issue repository

  • Anyone can make a new branch
  • Branch can only be written to by that specific branch creator and project owner
  • Master can only be written to by the project owner, and will become the official solution

Sandbox repository

  • Clone of the main project repository
  • Any user can create this separate repository
  • Creator can make new branches and write to them
  • Creator cannot delegate write perms to a sandbox repository
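The per-issue repository rules above could be enforced in an `update` hook, which git runs once per ref with the ref name and the old and new object names as arguments. How the hook learns who is pushing and who created a branch depends on the hosting setup, so only the decision logic is sketched here; the helper name and its argument order are hypothetical.

```shell
# Decision logic for the per-issue repository rules:
#   - the project owner may push anywhere;
#   - master is reserved for the project owner;
#   - anyone may create a new branch;
#   - an existing branch may only be updated by its creator.
# may_push <refname> <pusher> <branch-creator-or-empty> <project-owner>
may_push() {
    ref=$1 pusher=$2 creator=$3 owner=$4
    [ "$pusher" = "$owner" ] && return 0
    [ "$ref" = refs/heads/master ] && return 1
    [ -z "$creator" ] && return 0      # branch does not exist yet
    [ "$pusher" = "$creator" ]
}
```

An `update` hook would call this as `may_push "$1" "$pusher" "$creator" "$owner"` and exit non-zero to reject the ref, with the three user names supplied by whatever authentication layer d.o ends up using. Frando's alternative below (no restrictions) would simply be the absence of such a hook.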

Keeping track of related work then can be solved on the Main / Sandbox level by listing on the project page all of the generated sandboxes and their URL (sortable by activity hopefully). Then on the issue level it is handled by all branches being in a single repo.

Have I understood correctly?

This would be the solution

CorniI

This would be the solution I'd (mostly) prefer.
The only exceptions:
You say, that the per-issue-repo-master branch can only be written to by the project owner. I'd just remove that point, because the project owner/ a co-maintainer has to pull in the branch he thinks fixes the issue anyways, so there's imho no point in an extra step here.
And for personal sandbox-repos of projects, I'd not list them anywhere, because really, if you've got per-issue repos out there, you won't do much interesting stuff in your personal sandbox anyways (or how much have you been doing in your current cvs sandbox now?), and cluttering the project page with such a list seems inappropriate to me.

Yes, sounds good. However, I

Frando

Yes, sounds good.
However, I propose to not enforce any access restrictions in the per issue repos. It just happens far too often that issues are stalled and then get picked up by someone else etc. We just have a convention that "if somehow unsure, create a new branch instead of pushing to an existing one" in the issue repos. I think that's enough.

For the project repo, we can have write access per branch, but I think even just write access per repo would be enough. That's how projects are maintained in CVS right now, and it seems to work well (all Co-Maintainers having write access to all branches, even if they just maintain one of them).

Branch permissions

andb

You may be right about the main project branch - blanket access to the maintainer and anyone he adds to the list is probably a good, simple solution. A maintainer won't add someone he doesn't trust.

On the issue repos though, I would enforce per branch access. I imagine that I'm working on a solution and some noobie comes along and pushes something to the branch I'm working on: at the least frustrating, at the most the cause of significant wasted time.

It just happens far too often that issues are stalled and then get picked up by someone else etc

As everyone would be able to make a new issue branch, if I've started and then lost interest, you can clone my branch in the same issue repo and continue to work.

You say, that the per-issue-repo-master branch can only be written to by the project owner. I'd just remove that point, because the project owner/ a co-maintainer has to pull in the branch he thinks fixes the issue anyways, so there's imho no point in an extra step here.

I like the idea of master on the per-issue repos being limited to the maintainer. After all, he will be the one to choose the final solution, so his cherry-picking of commits will be a way of showing the direction he wants the solution to take.

Yes, on smaller issues you are right - it's an extra step. But then, the maintainer would never even need to use the master branch. On longer, larger issues like move fields to core, which may last months, having master as the guiding branch would be very helpful, imo.

The whole idea of per-issue

CorniI

The whole idea of per-issue repos is that you work together, mostly in one branch.
For example, you create an issue and write a patch and commit it into that per-issue repo in a branch called fix.
I, as a reviewer, notice some whitespace issues and fix them. I commit and push these into the branch called fix as well, so you don't have to manage several branches from the per-issue repo, which then wouldn't gain us much over the solution with per-user per-project repositories.
Restricting access there seems to be the wrong way, as we want free collaboration as much as possible. And then there's always git branch and git revert ;)

I don't agree, collaboration

Pisco

I don't agree: collaboration in Git means fetching/pulling from other people's repositories. I always keep total control of what I created (in my clone). I can ask others to fetch my version and I can fetch other people's versions, but I never let someone else mess directly with what I created, my clone.

The per issue repo is the platform where people exchange their solutions for an issue, people can fetch other peoples solution or contribute theirs by pushing them into the issue repo.

What I would add to andb's proposal is that only fast-forward pushes can be made to the per-issue repo and its branches; otherwise things get messy and we lose the history of how a patch came to be. If someone wants to change her solution, she can do so by making more commits to her branch, or by pushing a new branch.

Here's my basic problem with that...

webchick

Let's observe this issue: Bartik theme, which is a "real life" demonstration of exactly what I do not want to happen with this move to Git. It's a long thread, so I'll summarize:

  1. Chief Designer uploads nice looking theme to issue.
  2. Community goes "OMG! I want to help with that!"
  3. Chief Designer moves theme to GitHub for "easier collaboration".
  4. Designer B clones it to their own personal GitHub repo and starts fixing CSS bugs.
  5. Programmer A clones it to their own personal GitHub repo and starts adding Color module integration.
  6. Designer C and Programmers B and C also clone and start adding whatever the hell they're adding.
  7. Chief Designer comes back after a couple of days and goes "OMG. I have no idea what the hell you guys are doing and it's going to take me 4 days to go through it and merge everything back in!"
  8. 4 days go by, and Chief Designer finally gets a working copy back in the "main" repo again. Patch is posted, reviewers can test. Everyone rejoices.
  9. Then Chief Designer gets pulled away by "Real Life" stuff, which happens to everyone. In the meantime, Designers B and C and Programmers A, B and C are working away.
  10. It's now been over two weeks since the issue was updated with code. This has effectively stalemated this issue. When Chief Designer's real life calms down and she is again able to give this focused attention, she will again have to go through the 4-day process of merging in others' changes to her "master" branch, and then rolling a new patch for review. In the meantime, I have absolutely no idea where the hell to find the version that's furthest along, but more importantly neither do any of the patch reviewers. There's absolutely nothing at all I can say to someone who approaches me and says "That theme looks cool. What can I do to help it get in?"

So, no. We are absolutely not doing that ever again. And the idea of having one canonical place (the per-issue repo) that always holds the most up-to-date code, where anyone and everyone can hack on it, seems like a wonderful solution to me. That way even if Designers A, B, C and Programmers A, B, and C get totally overwhelmed by real life stuff, Designer/Programmer D extraordinaire can take it the rest of the way home, and Patch Reviewers A, B, and C can go through the exact same process on this issue as on all other issues to grab the latest copy of the code and test it for bugs.

I agree 100%!

Pisco

I agree 100%!

In fact, my last comment was the worst comment I have read in a very long time :-) I'm sorry for that! What I was actually trying to say is that I agreed with andb when he said:

On the issue repos though, I would enforce per branch access. I imagine that I'm working on a solution and some noobie comes along and pushes something to the branch I'm working on: at the least frustrating, at the most the cause of significant wasted time.

But having re-read the last few comments and your post, I'm starting to like more what Frando and CorniI propose, because of what you just said.

It makes perfect sense to have as much as possible happen on d.o and to promote the highest level of collaboration possible.

Nonetheless, I think it would be very important to only allow fast-forward pushes to the per-issue repos, in order not to cause confusion and not to lose history that others may depend on.

However, how would you handle this situation:
* Alice pushes her (unfinished) solution into a branch on the per-issue repo.
* Bob, a complete newb, then pushes something that makes no sense at all into that same branch.

Given that only fast-forward pushes are allowed, how should Alice handle the situation?
* Should she make a commit that completely undoes Bob's nonsensical contribution?
* Should she commit everything into a new branch?

What should she do?? If you were Alice and had meanwhile continued to work on your solution, only noticing Bob's contribution when you try to push, how would you proceed? The solutions that come to my mind are freakin' complicated ... should forced pushes be allowed???

This use case is not clear to me.

(Sorry again for the confusion I caused earlier!)

Great you're on the board

CorniI

Great you're on the board with the per-repo per-issue solution!
Wrt non-fast-forward pushes, yeah, the only ppl who should be able to do them are some d.o admins or members of the infrastructure team, to resolve legal problems, etc.

In your case Alice would probably first do a git revert, push it, and then comment in the issue on why she did that, maybe kindly asking Bob to put his solution in a separate branch if he wants to pursue it. This would lead to a not-so-nice history, but imho, in this special case, that's ok, and far better than dealing with forced pushes in the usual case.
You could also push a rebased and cleaned-up version as a new branch in the per-issue repo and designate that as the branch in which all the work is done from now on, so when the project maintainer later merges the issue into the mainline, the commit from Bob doesn't show up there. Does this sound good?
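Both options for Alice can be sketched as follows, assuming the issue repo only accepts fast-forward pushes. The repo name issue.git, the branch names fix and fix-2, and the file contents are hypothetical stand-ins for the scenario.

```shell
#!/bin/sh
# Sketch of Alice's two options. Names (issue.git, fix, fix-2) are
# hypothetical; the issue repo only accepts fast-forward pushes.
set -e
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.org
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.org
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare issue.git
git -C issue.git config receive.denyNonFastForwards true

# Alice publishes A and B on "fix".
git init -q alice
cd alice
git remote add origin ../issue.git
echo a > f
git add f
git commit -q -m "commit A"
echo b >> f
git commit -q -am "commit B"
git push -q origin HEAD:refs/heads/fix
cd ..

# Bob pushes nonsense (C') on top; it is a fast-forward, so it is accepted.
git clone -q -b fix issue.git bob
cd bob
echo nonsense >> f
git commit -q -am "commit C' (nonsense)"
git push -q origin fix
cd ../alice

# Option 1: undo C' with a new revert commit; pushing that is a fast-forward.
git fetch -q origin
git checkout -q -B fix origin/fix
git revert --no-edit HEAD >/dev/null
git push -q origin fix

# Option 2: publish a clean branch that never contained C' (B is HEAD~2 now).
git push -q origin HEAD~2:refs/heads/fix-2
```

Option 1 keeps Bob's commit in the history of fix but neutralizes it; option 2 leaves fix alone and gives Alice a starting point that visibly never contained C'.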

Can you clarify a bit more?

Pisco

Can you clarify a bit more? You say there are special ppl allowed to do non fast-forward pushes. What makes Alice special in this case?

From what I see, the only way to make Alice a special person in this situation is to have branch permissions; specifically, Alice would then be a special person because she initially created the branch. Am I misunderstanding something, or just overlooking something else?

As for the rebase or revert, let me pinpoint the situation:

After Alice's first commit:

ceb4cfa commit B
1252ff6 commit A

Bob pushes nonsense:

093b925 commit C' <-- this is what Bob committed
ceb4cfa commit B
1252ff6 commit A

The local state of Alice's branch before she wants to push for the second time:

4385417 commit E
093b925 commit D
093b925 commit C
ceb4cfa commit B
1252ff6 commit A

At this point Alice cannot push because it wouldn't be a fast-forward. What she would have to do to revert/undo Bob's C' commit is

git checkout -b wtf origin/branch-name
git rebase -i HEAD^   # drop C' in the todo list; or: git reset --hard HEAD^
git push --force origin wtf:branch-name

Only then could she push D and E. And as a side effect, everyone who had started to work from Bob's C' would be cut off. But still, what gives Alice the right to push --force?

The easy solution for Alice would be to push her local branch into a new branch in the per-issue repo, but then again, what about those who started their work on top of C'?

And finally, is this a use case we have to consider or am I just doing mental masturbation???

Ah I'm sorry for not

CorniI

Ah, I'm sorry for not expressing myself clearly: Alice would never have forced-push access.

We only want forced pushes to remove files we cannot legally distribute, like warez or stuff not under the GPLv2+. To remove these from the repositories we need forced pushes, and the only ones who are allowed to do forced pushes are the guys from the infrastructure team (they currently handle this work for CVS as well).

And how should Alice handle

Pisco

And how should Alice handle the situation then? Create a new branch?

yep, probably, or alice asks

CorniI

Yep, probably, or Alice kindly asks Bob to create a new branch for his work, or Alice helps Bob figure out why his solution is bad.

Ideally, Bob would actually

sdboyer

Ideally, Bob would actually be the one to create a new branch. Won't always happen - maybe even usually won't - but if Bob is truly taking a new approach, then he'll have realized it and will pop off a new branch for his commit.

If, however, Bob's commit is merely incorrect, then in most circumstances I would say Alice should simply roll back the incorrect portion of Bob's changes in her own chain of commits. She can do that by rebasing off her own commits, rolling back Bob's, then restoring them...or any number of other ways. Splitting off another branch for her would be an option, though. Kinda up to her.
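One of the "other ways" to roll back only part of Bob's work is to restore just the files he got wrong to their state right before his commit, and record that as a new commit, which keeps the branch fast-forward-only. This is a minimal local sketch; the file names and commit messages are hypothetical.

```shell
#!/bin/sh
# Sketch: roll back only the incorrect part of Bob's commit.
# File names and commit messages are hypothetical.
set -e
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.org
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.org
tmp=$(mktemp -d)
cd "$tmp"
git init -q work
cd work

echo "color: blue;" > style.css
echo "v1" > notes.txt
git add style.css notes.txt
git commit -q -m "Alice: initial fix"

# Bob's commit touches two files; only the style.css change is wrong.
echo "color: bleu;" > style.css
echo "v2" > notes.txt
git commit -q -am "Bob: tweaks"
bob=$(git rev-parse HEAD)

# Restore style.css as it was just before Bob's commit; keep his notes.txt change.
git checkout "$bob^" -- style.css
git commit -q -m "Roll back Bob's style.css change"
git log --oneline
```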

From discussion in IRC, it

pwolanin

From discussion in IRC, it sounds like the cloned repo would have all the branches of the target project, and that there would also be the potential to add additional branches as needed for competing versions of a patch. I'm not sure whether one would be able to push to the branches that match the branch in the target you are trying to fix, or whether you would always create at least one new branch for every issue repo.

i think we always want at

CorniI

I think we always want at least one new branch, but the technical justification needs to wait until we have the exact technical solution for how we deal with this (including porting patches, whether or not to auto-update the mainline branches in the per-issue repos, etc).

I say you definitely create

Frando

I say you definitely create at least one new branch for every issue repo. Typically, that can be the "master" branch, as the branches from the master repo will be named 7.x-1 etc.
Otherwise, things get complicated and confusing IMO if at some point you want to merge in changes from the master repo.

This should be needed far less often than "rerolling" a patch is today, thanks to git's amazing merge capabilities, but if a patch stalls for a year or so, you will likely need to merge in the work from the master repo. As we don't allow non-fast-forward pushes (because they destroy history), rebasing is not an option for branches that already have commits pushed to them.
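The merge-instead-of-rebase approach can be illustrated end to end: a stale issue branch picks up newer project work through a merge commit, so the follow-up push is still a fast-forward. The repo names (project.git, issue.git), the branch names (7.x-1.x, fix) and the files are hypothetical.

```shell
#!/bin/sh
# Sketch: catch a stale issue branch up with the project by merging,
# so the push stays a fast-forward. Names (project.git, issue.git,
# 7.x-1.x, fix) are hypothetical.
set -e
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.org
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.org
tmp=$(mktemp -d)
cd "$tmp"

# Project with a 7.x-1.x branch; the issue repo is spawned from it.
git init -q maint
cd maint
echo base > core.txt
git add core.txt
git commit -q -m "project base"
git branch -M 7.x-1.x
git init -q --bare ../project.git
git push -q ../project.git 7.x-1.x
git -C ../project.git symbolic-ref HEAD refs/heads/7.x-1.x
git clone -q --bare ../project.git ../issue.git
git -C ../issue.git config receive.denyNonFastForwards true
cd ..

# A contributor starts a dedicated branch for the issue.
git clone -q -b 7.x-1.x issue.git dev
cd dev
git checkout -q -b fix
echo patch > fix.txt
git add fix.txt
git commit -q -m "issue fix"
git push -q origin fix

# Meanwhile the project moves on.
cd ../maint
echo more >> core.txt
git commit -q -am "project change"
git push -q ../project.git 7.x-1.x

# Much later: merge the project branch into the issue branch and push.
cd ../dev
git fetch -q ../project.git 7.x-1.x
git merge -q --no-edit FETCH_HEAD
git push -q origin fix
```

A rebase of fix onto the new project head would instead produce new commit ids for "issue fix", and the server setting would refuse that push.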

OK, this seems like a wise

sdboyer

OK, this seems like a wise place to step in and point out something about how the issue repos are actually going to work, as there seems to be some confusion over how the master branch will behave.

The issue repos are going to be 'bare' repositories. There's no reason for them to be anything else. Consequently, the presence/absence of a "master" branch is going to depend entirely on the presence/absence of a branch with that name in the project it is based on. To achieve this, the repository will be generated by cloning the project repo (which will also be a bare repo) with the --bare option; this causes all the branches to be copied directly into the issue repo, unlike what you normally see when cloning without --bare, which registers the source as the remote 'origin', lists each of its branches as a remote-tracking branch, and creates a local branch (normally 'master') corresponding to the project's current HEAD. So discussion about what we do with master in an issue repo is moot.
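The effect of --bare described here can be checked directly: after the clone, the project's branches exist as ordinary branches (refs/heads/*) in the issue repo, with no remote-tracking refs. This is a self-contained sketch; project.git, issue-123.git and the branch names are made up for illustration.

```shell
#!/bin/sh
# Sketch: spawn an issue repo with "git clone --bare".
# project.git, issue-123.git and the branch names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.org
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.org
tmp=$(mktemp -d)
cd "$tmp"

# A project repo with two release branches (built from a scratch checkout).
git init -q scratch
cd scratch
echo six > VERSION
git add VERSION
git commit -q -m "6.x work"
git branch -M 6.x-1.x
git checkout -q -b 7.x-1.x
echo seven > VERSION
git commit -q -am "7.x work"
git init -q --bare ../project.git
git push -q ../project.git 6.x-1.x 7.x-1.x
git -C ../project.git symbolic-ref HEAD refs/heads/7.x-1.x
cd ..

# The issue repo: every project branch arrives as a local branch (refs/heads/*);
# there are no refs/remotes/origin/* entries and no extra "master".
git clone -q --bare project.git issue-123.git
git -C issue-123.git for-each-ref --format='%(refname)'
```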

Once the issue repo is created, individual participants in the issue will be free to work from any base branch they want, but those base branches will all be exactly the ones that were in the project repo at the time the issue repo was spawned. In most cases, it will not be strictly necessary to create additional branches off each base branch if a patch is applicable to multiple versions of a project. The contributor would have to do the same thing the maintainer would: cherry-pick the commits off the original branch they worked from, and apply them to the other branches in question. Most of the time, given the possibility of functionality changes, etc., that make that merge non-trivial, I expect the maintainer would want to take care of the cherry-pick.
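Porting a fix to another base branch in this way is a cherry-pick. In the sketch below the branch names and files are hypothetical; the -x option records the original commit id in the new commit's message, which may help a maintainer trace where a backport came from.

```shell
#!/bin/sh
# Sketch: carry a fix made against 7.x-1.x over to 6.x-1.x with cherry-pick.
# Branch and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.org
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.org
tmp=$(mktemp -d)
cd "$tmp"
git init -q work
cd work

# Both release branches share module.txt; they differ elsewhere.
printf 'line1\nbug\n' > module.txt
git add module.txt
git commit -q -m "shared base"
git branch -M 6.x-1.x
git checkout -q -b 7.x-1.x
echo "7.x only" > api.txt
git add api.txt
git commit -q -m "7.x API change"

# The fix is written against 7.x-1.x...
printf 'line1\nfixed\n' > module.txt
git commit -q -am "Fix the bug"
fix=$(git rev-parse HEAD)

# ...and replayed onto 6.x-1.x; only the fix's own diff is applied.
git checkout -q 6.x-1.x
git cherry-pick -x "$fix" >/dev/null
```

When the branches have drifted apart more than in this toy case, the cherry-pick may stop with conflicts, which is the non-trivial merge work the maintainer would likely want to handle.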

Sorry if I'm being pedantic

Pisco

Master can only be written to by the project owner, and will become the official solution

Sorry if I'm being pedantic here, but the official solution will be the one that gets merged into the official repo (perhaps after resolving some merge issues). And that is done by the project owner (or one of the project owners).

Yes Pisco, you're right, and

sdboyer

Yes Pisco, you're right, and no, you're not being pedantic. This is an important distinction: there simply cannot be any "canonical" branch in an issue repo that the maintainer will always know to pull from. "master" will probably not even have that much significance as a branch name; the only circumstance under which people would have a master branch would be the (I hope) uncommon one where they are cloning the issue repo directly, rather than just adding it as an additional remote to their already-existing local copy of the project. Only once a project maintainer decides an issue is resolved and merges a particular issue branch back into mainline does it become in any way "official."
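Adding an issue repo as an extra remote of an existing local copy, rather than cloning it, might look like the sketch below. The remote name issue-123, the local branch review-fix and all paths are hypothetical; the "seed" steps only stand in for the d.o side.

```shell
#!/bin/sh
# Sketch: use an issue repo as an extra remote of an existing local copy.
# Remote name "issue-123", branch "review-fix" and paths are hypothetical.
set -e
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.org
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.org
tmp=$(mktemp -d)
cd "$tmp"

# Stand-ins for the project repo and an issue repo spawned from it.
git init -q seed
cd seed
echo base > core.txt
git add core.txt
git commit -q -m "project base"
git branch -M 7.x-1.x
git init -q --bare ../project.git
git push -q ../project.git 7.x-1.x
git -C ../project.git symbolic-ref HEAD refs/heads/7.x-1.x
git clone -q --bare ../project.git ../issue-123.git
git checkout -q -b fix
echo patch > fix.txt
git add fix.txt
git commit -q -m "issue work"
git push -q ../issue-123.git fix
cd ..

# A reviewer's existing local copy of the project...
git clone -q project.git mine
cd mine
# ...gains the issue repo as a second remote and checks out its branch.
git remote add issue-123 ../issue-123.git
git fetch -q issue-123
git checkout -q -b review-fix issue-123/fix
git branch -a
```

Because the issue repo is just another remote, the reviewer's own branches stay as they were, and no "master" from the issue repo is involved.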

Longer running issues

andb

With the idea of using master as a guiding branch, I was imagining a longer-running issue. Imagine a complex bit of new functionality that a handful of people are contributing to. As the solution solidified, the maintainer would merge or cherry-pick the changes that he felt would be incorporated in the end. In effect, he would use this to guide the people working on the solution.

Of course, much the same can be done with comments :) I thought I should explain how the idea came to be.

Summarizing the issue

andb

Has the mass mind reached a conclusion which can now be incorporated into the main wiki body, perhaps as a closing section "Solution"? I could try to summarize what is above, but feel that would best be left to one of the more significant contributors to the thread.

I'm working on a detailed

CorniI

I'm working on a detailed post on that, but it is already >70 lines, because I want the exact technical specification to implement. I'll post it as a new wiki page, because this one is already pretty long...
I hope to do so on or before Sunday.

you can find it over

CorniI

You can find it over there:
http://groups.drupal.org/node/52953

Late to the party ...

te-brian

I've been following along as best I can, and there is one general concept whose clear translation I haven't seen yet. (For context, I have never used a DVCS and am fluent enough in CVS/SVN to use Tortoise/Eclipse to work on patches, but I rarely use the command line.)

With the current workflow (CVS/patches) there is this one singular object that I can reference (via my browser, thanks to Dreditor) to see what's going on with an issue. I can observe what smart-person-A and anal-code-style-person-B have been doing to an issue by almost physically seeing them pass this object back and forth. Sure, in some cases there is a cross-post, or a new idea comes up. In that case, hopefully some descriptive comment copy can clue me in to which patch is the current "hot potato". What is the Git equivalent of this, in layman's terms? How can I tell if the progress being made is something that I can/should jump in on? What is the current working copy?

Maybe the answer is that we are "distributing" this workflow a little, with the upside that more ideas can be explored, and bigger pieces of functionality can happen here on d.o.

Issue tracking and software releases
