Evaluation discussion for how to move Drupal.org off of CVS

Conclusion

Git has been selected as our new VCS!

http://groups.drupal.org/node/48818?page=2#comment-133893
http://sf2010.drupal.org/conference/sessions/exodus-leading-drupal-out-cvs
http://groups.drupal.org/drupal-org-git-migration-team

Debate archived below:

We actually had quite a productive IRC discussion tonight (no, that is shockingly not an oxymoron!) about the general migration to distributed development, the community fragmentation that it causes, and how the tools on drupal.org might be improved as a way to combat this.

After spirited discussion, we brainstormed this "hit-list" of things that need to happen in order for Drupal.org to ever move to a distributed version control system. These are pretty loose, but quite a few actionable tasks came out of it, which folks interested in scratching this particular itch could start picking off.

If you want to help, please clearly mark one or more of these things with your name (or the name of you and your buddies, make a game of it!), and comment below with progress. Or, help flesh this out a bit more by providing links, things we haven't thought of yet, etc.


Summary of Contenders

We've narrowed the search to Bazaar and Git. For the purposes of our community, the two appear to be basically functionally equivalent (yes?). Here's how they stack up on the other merits:

Bazaar

  • Provides the benefits of a distributed version control system, while also supporting a traditional centralized workflow, which will make transition easier for people. This advantage should not be under-stated.
  • Existing drupal.org infrastructure team already has knowledge of bzr, and has agreed internally to move drupal.org to it.
  • Bazaar is headed up by Canonical, which is likely not going anywhere anytime soon. Good for future-proofing.

Git

  • A much higher percentage of Drupal community members are familiar with it, which will help ease transition headaches. Those may be considerable, since Git has no familiar "anchors" the way Bzr does. This advantage should not be under-stated.
  • Existing work has already started to integrate Project* modules with Git: http://drupal.org/project/versioncontrol_git
  • Metrics show a general trend towards Git being the leader in the distributed VCS space with a huge, thriving community. Good for future-proofing.

Bzr vs. Git Smackdown at Drupalcon SF

The best way to figure out which of these two options would be the best fit for our community is to actually see with our own eyes. Drupalcon SF offers a tremendous opportunity for this, with the added advantage of having all the major players in place to lead sessions and make a final decision. Therefore, what we'd like to organize are Git/Bzr "info sessions" where people knowledgeable in the technology sit down with volunteers and observe and take notes, so we can try and get a sense of

So please help by filling out this stuff:

  • Drupalcon SF attendees who are willing and able to lead these kinds of "info/tutorial" sessions:
    • Bzr: Senpai, matt2000, NAME
    • Git: sdboyer, scor, ceardach, sirkitree, NAME
  • Drupalcon SF attendees who are willing and able to be Guinea pigs, mainly those with knowledge only of CVS (or CVS and Subversion): <-- yes?
    • Core developers: webchick, beeradb, NAME, NAME
    • Module developers: webchick, NAME, NAME, NAME
    • Theme developers: NAME, NAME, NAME
    • Site builders who manage deployments through version control: webchick, beeradb, NAME, NAME
    • Wtf is a patch? What's version control?: Elijah Lynn, NAME, NAME
  • Drupalcon SF attendees who are willing and able to act as "clipboard" folks, gathering data during these sessions:
    • webchick, NAME, NAME, NAME

Then we need to brainstorm stuff like:

  • What criteria are the clipboard folks looking for?
  • What documentation needs to be prepped ahead of time?
  • Who's going to prep it?
  • What infrastructure needs to be prepared to perform this testing?
  • Who's going to prep it?
  • What sort of data are we hoping to gather from this session?
  • What sort of infrastructure do we need to capture it?
  • Who's going to prep it?
  • Other...?

Why are we doing this?

Things about CVS that currently suck:

  • It's an archaic system that no one in their right mind chooses as a version control system today, and thus it's a big barrier to entry because new contributors must learn it "specially" in order to contribute to drupal.org
  • Things that should be really simple (submitting changes that add/remove files, renaming files, etc.) are absolutely horrendous in CVS.
  • It's really difficult to "chase HEAD" since CVS merge tools totally suck, making contributing major changes to Drupal core incredibly painful when there are constant re-rolls due to whitespace changes in other places.
  • It's difficult for patch authors to have discipline to keep changes to one "context", resulting often-times in "mega-patches" that are impossible to review.
  • We lose the incremental commit history that happened on said "mega-patches."
  • It's difficult to share experimental code and encourage others to improve it. <-- this is a d.o issue, not a CVS issue
  • The credit on commits (as in cvs annotate credit) currently goes to the committers of the code, not the people who actually wrote (and reviewed) them.
  • It's impossible with CVS to do much of anything while offline, because even a 'diff' operation needs to "phone home" to the parent server. Lots of us are on planes a lot of the time.
  • In general, the lack of modernization of our contribution tools is causing people to move to other places like Launchpad and GitHub, thus fragmenting both our community and the strength of Drupal.org as the central hub to find any and all Drupal stuff. <-- both a d.o issue and a CVS issue.
  • Security in CVS sucks. pserver's encryption is sub-standard, and only a small handful of people (mainly core maintainers and infra team) use public-key encryption.

What don't we want to lose?

  • Drupal.org being the central collaboration hub for Drupal core and contributed projects: code, reviews, and discussions take place here, where the entire community can participate and learn from them.
  • The "incremental changes posted to an issue queue where they then get peer review" workflow allows people who are non-developers to participate, and for reviewers/maintainers to see incremental progress instead of the whole thing at once.
  • Keeping only one canonical "target" for each project in all of Drupal, for the entire community to buzz around. This makes peer reviews much easier, since the way to test changes is always the same, and keeps everyone collaborating on the same stuff, not a fork of a fork of a fork of a....

Things that we've ruled out

  • Moving to $better_centralised_VCS, such as Subversion. The amount of work required to pull this off is substantial; we need to make sure that we're set for another 10 years, and distributed VCS is clearly the way of the future.
  • Any non-free (as in beer and freedom) VCS.
  • Pretty much anything but Git, Bzr, and Hg at this point, unless someone can make an extremely compelling case for something else.
  • Mercurial. There just aren't enough existing community members who know it, nor do the folks on the infrastructure team have a background there. Sorry, Hg!

Concrete to-dos for moving Drupal.org to $distributed_VCS

Documentation/Training

HEY! You there! Put your name here. :D


Git: yhager, mike booth, sdboyer, mig5, marvil07, lut4rp, scor, ceardach, reglogge, kyle_mathews, bdragon, asimov, hugowetterberg, tobiassjosten, fago, DamienMcKenna, Frando, Will White, Jeff Miccolis, psynaptic, anarcat (willing to help with CVS -> git migration), slantview, EclipseGc, VoxPelli, gordon, schoobidoo, Pisco, Sean Bannister, mikl, jrglasgow, stepmoz, Jeremy, seutje, corni, ben-agaric, sirkitree, chachasikes, NAME
Bzr: David Strauss, Peter Wolanin, Narayan Newton, Josh Koenig, bdragon (again), chx, NeverGone, a_c_m, Nuno Veloso, cha0s, Heine, Garrett Albright, Emma Jane Hogbin, TBarregren, dixon_, frjo, ximo, matt2000, NAME
Hg: Heine, mikey_p, mcrittenden, NAME

  • Compile a list of our current "use cases" around version control: (DONE?)
    • Collaborative patch authoring: Huge changes like form API, field API, core themes, etc. that are more-or-less equally authored by multiple people.
    • Primary patch authors: There's primarily one person who's "in charge" of a patch, but is taking contributions from other authors.
    • Patch contributors: Contribute more minor things like code-standard compliance, spelling corrections, etc. to a primary patch author.
    • Patch reviewers: Eyeball the code, take a change from the issue queue and ensure that it's properly working, post back their results.
    • Contrib module developers: Handle primary development of module, pull in patches from others, maintain branches and releases.
    • Contrib theme developers: Handle primary development of theme, pull in patches from others, maintain branches and releases.
    • Co-maintainers: Help maintain a module/theme, but often checking first with the primary author before changes are made.
    • Users: Run their Drupal site on code directly checked out from d.o repositories, they update to the latest release tag and rebase patches on top of that.
    • ____ (other use cases?)

Comparisons

Compare Git/Bzr/Hg on the following items (see also DVCS-specific pages for Git: http://groups.drupal.org/node/48843, Bazaar: http://groups.drupal.org/node/48848, and Mercurial: http://groups.drupal.org/node/48853 for more details). Comparison charts git-hg-bzr: InfoQ dvcs-guide, May 2008

(Note: Some of these might be "Well all of them do that" and that's cool, then just mention it there; I was trying to pull from further down the wiki.)

  • Community: How big is the pool of developers working on extending and maintaining $vcs? How's their IRC channel for asking questions? Do we have a contact who would be willing to help us with the technical side of the migration?
  • Prominent users of the technology: What are other open source projects or other big players using $vcs?
    • Git: Linux Kernel, X.org, Perl, Ruby on Rails, jQuery, Debian, Gnome, Fedora, QT, KDE, Android, VLC, Wine, Facebook, MongoDB (wikipedia)
    • Bzr: Launchpad, Mozilla, MySQL (and derivatives like MariaDB), emacs, Squid, Mailman, Ubuntu packaging, Drizzle, Debian Apt (wikipedia) (official)
    • Hg: Python, Mozilla, OpenSolaris, Java/OpenJDK, Adium, Google Code (wikipedia)
  • Features: What cool things does $vcs offer that neither of the others do?
    • Git: Interactive rebase: hack on several things at once, commit to a branch on your local machine whenever you want in random order, then reorder and edit the changes into a smaller (or larger) number of logical, sensible patches before sending the results to anybody else. No need to plan for this in advance; no extra plugins required. Interactive add and commit allow committing only chunks of a file. Supports cherry-picking.
    • Bzr: Why switch to bazaar? bzr also supports a rebase functionality, although it requires a plugin. Also, I believe bzr does not support interactive rebase as does git, but perhaps someone can fill that information in?
    • Hg: Mercurial has a large number of available extensions and includes the excellent MQ or Mercurial Queues which excels at handling dependent patches (similar to stacked git, or quilt) as well as an included RebaseExtension with a collapse option similar to common usage of git rebase
  • Use cases: How would $vcs specifically help/hurt/be neutral towards the use-cases defined above that the Drupal community has?
    • Git: Webchick provides a possible scenario that could be used to handle the use cases (for Git, Bazaar and Mercurial). Git also allows for keeping the current workflow intact for a period of time to ease the transition.
    • Bzr: Bazaar allows module maintainers to use the familiar checkout/update/commit workflow popularized by CVS and Subversion while providing the standard distributed capabilities as an option. This allows Drupal to adopt a single VCS tool while lowering barriers to contribution.
    • Hg: Mercurial emphasizes simple commands that map nearly 1-1 with SVN or CVS. No staging area is used, and most commands give helpful feedback about next steps, such as 'hg merge' or 'hg pull'. Most workflows do not require any switches on commands for common tasks.
  • Access control features: How well can we 'lock down' access in each of the solutions? Can we do things like maintainer/co-maintainer relationships? Can we block commits that contain "black-listed" stuff, such as security holes and LICENSE.txt?
    • Git: Basic and global access control can be implemented by using different protocols (ssh, http/https, git) for accessing a central repo. More fine-grained access control (down to controlling access to directories on a per-user basis) is also possible through "hooks" (executables that can be written in any language).
    • Bzr: Out of the box, Bazaar uses POSIX permissions and ACLs on a branch-by-branch basis, which maps well to Drupal's maintainership model. Bazaar supports Python-based pre- and post-commit hooks.
    • Hg: Utilizes basic authentication using the host protocol, i.e. ssh, or http/https auth. Allows custom scripts to be implemented for many hooks, using Python.
  • Migration path from CVS: How do the migration tools fare with each option? Links to tools that we can use to help make this easier?
  • Speed/Size: How fast is it? How much space does it take up relative to CVS?
  • Scriptability: From both our (drupal.org) point of view, and also from a contributor's point of view, how much flexibility is there in making the tool do other things? Links to resources? (For maximum in-PHP scriptability, we're going to want an analogue to svnlib - fortunately, not a very difficult task)
    • Git: Git exposes the entirety of its guts for use in system calls (which is why it comes with some 140+ commands). Many of these commands are designed to interact with each other via stdin/stdout. In other words, it's built to be scripted on. Some people use git to manage deployment of dev-staging-live servers Example1 or "versionize" an Aegir-Drush-make workflow Example2
    • Bzr: Bzr has a robust API (in Python) that can be easily extended with highly-portable, easily installed and distributed plugins.
    • Hg: Mercurial has a robust number of hooks (http://mercurial.selenic.com/wiki/Hook) that support scripting with Python.
  • Credit tracking: How well does $vcs do with tracking who gets credit for a change, where that change came from (issue queue #) etc.

    • Git: Each commit contains the name, email and date of both the author of the patch and who committed it, and it also allows for a 'Signed-off-by' field (see example). Issue numbers should go in the commit message, and can be enforced through commit hooks.
    • Bzr: Bazaar associates every commit with a name and email address-type field. Bazaar treats issue numbers as first-class fields for commits. For the ultimate in attribution, Bazaar supports optional or required digital signatures on a branch-by-branch basis.
  • ____ (Other criteria?)
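
The access-control and credit-tracking items above both lean on commit hooks: blocking "black-listed" files such as LICENSE.txt, and requiring an issue number in the commit message. As a rough, VCS-agnostic sketch of what such a check might look like in Python: the names `BLACKLISTED_NAMES`, `ISSUE_REF` and `check_commit` are made up for illustration, and the "#12345" issue format is an assumption, not an agreed convention. How a real hook obtains the list of changed paths and the commit message differs between Git and Bazaar and is deliberately left out.

```python
import re

# Hypothetical blacklist, taken from the LICENSE.txt example on this page.
BLACKLISTED_NAMES = {"LICENSE.txt"}

# Assumed convention: an issue reference looks like "#707526" in the message.
ISSUE_REF = re.compile(r"#\d+")


def blacklisted_paths(changed_paths):
    """Return the changed paths whose file name is on the blacklist."""
    return [p for p in changed_paths if p.rsplit("/", 1)[-1] in BLACKLISTED_NAMES]


def has_issue_reference(message):
    """Return True if the commit message mentions an issue number like #707526."""
    return ISSUE_REF.search(message) is not None


def check_commit(changed_paths, message):
    """Collect reasons to reject a commit; an empty list means accept."""
    problems = []
    for path in blacklisted_paths(changed_paths):
        problems.append(f"blacklisted file: {path}")
    if not has_issue_reference(message):
        problems.append("commit message has no issue number (expected e.g. #12345)")
    return problems
```

A hook script would call `check_commit()` with whatever the VCS hands it and refuse the commit when the returned list is non-empty.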

Comparison of issue trackers on other Open Source projects

Note: While these statistics are interesting, note that a remotely-hosted service such as GitHub is off the table, because it violates the #1 thing we don't want to lose: keeping drupal.org the central collaboration hub for Drupal.

Project | Repo | Bug tracker | Notes
Linux kernel | gitweb (http://git.kernel.org/) | bugzilla (http://bugzilla.kernel.org/) | mostly patch + email based workflow; http://www.wlug.org.nz/KernelDevelopmentWithGit; http://lwn.net/Articles/160191/; http://kernelnewbies.org/UpstreamMerge
Ruby on Rails | github | lighthouse | patch-based workflow; example ticket with patch; link to related changeset
jQuery | github | trac | merged fork example ticket; merged patch example ticket
KDE | gitorious (migration in progress) | bugzilla | merged fork example ticket; merged patch example ticket; KDE git migration docs; KDE git tutorial
GNOME | cgit | bugzilla |
CakePHP | github | lighthouse | Bakery
SproutCore | github | tasks | tasks source
MySQL | launchpad | bugs.php.net variant | mailing list/patch based; example ticket linking to commits; commits mailing list
Mailman | launchpad | launchpad | both patches and branches; ticket example with patch

Drupal.org integration

(This part still needs major exploration work done, to determine what exactly we want, and what work is actually involved. It's also the thing that's going to hold everything else up, so the sooner someone jumps on this the better.)

"Phase 1" probably just looks like keeping our existing workflows in place (i.e. sharing changes via patches in the issue queue, hosting only the "canonical" repository for each project), and replacing CVS with $vcs. This requires research into the following:

  • Evaluate feasibility of moving off of Project* modules in favour of $vcs issue tracker/source code integration tool.

    • Gitorious (Git)
      • Pros: __________
      • Cons: ____________
    • Launchpad (Bzr)
      • Pros: __________
      • Cons: ____________
    • I posted a comparison that wouldn't fit here in a comment below. --David Strauss
    • _________ Other? Neutral?
      • Pros: __________
      • Cons: ____________
    • Existing infrastructure that would need to be modified to work with $vcs tool
      • Testing bot
      • _____ (lots and lots of things, I'm sure...)
    • Bad stuff
      • We'd end up forcing our community through two significant learning curves at once (new VCS, new project management tools), which should definitely not be under-stated.
      • In addition to the already significant documentation requirements for simply the VCS move, we'd probably sextuple those having to re-write all of our existing documentation that makes reference to how to use the issue queue, project management tools, etc. (otoh, we could probably externally link to a lot of it)
      • _____ (lots and lots of things, I'm sure...)
    • Good stuff
      • Not having to port project* module every time we want to upgrade drupal.org. :P
      • _____ (lots and lots of things, I'm sure...)
  • Identify and document places where in our current scripts, etc. there are hard-coded assumptions about CVS. For each item, identify the piece of code currently responsible for performing this job, determine if there are CVS assumptions, and if so create a set of issues for discussing/tracking these, and link them here.

  • Determine logistics for how $vcs replacing CVS actually looks:
    • What does the new directory structure look like in core/contrib?
    • How do we manage "official" releases (tags) vs. development releases (branches)?
    • How do we facilitate (or don't we, in phase 1) "spooning" of code?

Stuff we need to do /after/ we have chosen a $vcs and have the rest of the stuff above in progress

Documentation/Training

  • An equivalent of http://drupal.org/handbook/cvs/quickstart for $vcs
  • FAQs/Troubleshooting/OMG I BROKE IT HELP!!! docs
  • Screencasts on how to use it.
  • Scheduling of "info sessions" on IRC (or Skype, or whatever) for $vcs brigade to train existing contributors on the new system.

Future feature requests

  • When patches are uploaded to d.o, do automatic generation of tarballs with project + patch applied already, to facilitate reviews by non-technical users. (Feature request posted here: http://drupal.org/node/707526)
  • Crazily pimping out drupal.org to take advantage of more advanced $vcs features, e.g. feed from commit logs on forked branches to the issue queue, create and post patch automatically to issue with a button, etc.
  • Exploring alternative core contribution workflows, e.g. "authorized" branches for specific features like fieldable user profiles in D8 core, with someone "deputized" to accept changes.
  • Browser-based editing to files in repo leading to automatic patch creation, for designers + commits via the issue queue/web UI <-- /really/ not sure of this idea....
    • This is really appealing in my mind, since it blows away the barrier to entry for the vcs, so devs can commit easily. ao5357

(Props to at least Benjamin-Melancon, Lizzard, walkah, yhager, JohnAlbin, dbabbage, catch, jensimmons, hefox, and whoever else helped work on this. :))

Edit: I've removed all +1 comments. Please, no more

Comments

David Strauss wrote:

Um. Seriously?

I've both explicitly and implicitly put myself on the line to be the one doing this since long before this thread began. And I'm not the only one. I don't know how you missed that.

Yes, seriously. By "exists to implement it," I mean the necessary team, not a couple people for a sprint or three (which is what I would expect to need for the other options I mentioned). We don't have a multi-person, full-time development team dedicated to spending years building the functionality, and that's what it took to make GitHub and Launchpad. Either the developers of those platforms are incompetent (and they aren't), or building an effective online tool to manage DVCS workflow is hard. I'm not only wagering on the latter, I've seen the code for Gitorious and Launchpad, and it's not trivial.

Pisco wrote:

In my opinion things like GitHub, Gitorious, Launchpad and the like are out of the question. Do not foolishly jeopardize what we have here at d.o! webchick pointed out very clearly:

"I personally am not willing to trade this awesome community even for the very best issue tracker/project manager/wiki planning tool/translation/security tool in the world"

and I agree with it. Please stop bringing this up over and over again.

It should be possible to adapt the existing infrastructure to the new VCS, be it Bazaar or Git.

David Strauss wrote:

Stop bringing this up over and over again please.

You clearly did not read my post.

sdboyer wrote:

That's a key clarification - and I've thought as much myself. Given that, the reason I'm nevertheless reasonably confident that we can get something together is because I don't have a huge featureset in mind. Just the baseline architecture necessary to replicate our current featureset (which is still nontrivial, but not insane), and from there, potentially the addition of discrete features. Underlying tracking of issues in the VCS would be the first thing on my list. I don't expect that what we have will be able to rival any of those other systems on a feature-by-feature basis - certainly not initially, and possibly not ever. And I'm OK with that - so long as we're preserving that which we've identified as being essential to our community.

Thank you

Pisco wrote:

Thank you for saying what I didn't dare because I'm not one of the 7 or 8 people.

Thoughts...

webchick wrote:

First off, thanks for clarifying the summary. My brain must've been starting to futz out a bit there, I'm not sure why I added that bit, since you're right that it's obviously not true. Sorry about that. :( I'll go and edit my response accordingly.

To respond to you, though...

From any realistic perspective, those features and that level of integration will never happen if we continue using Project*.

Yep, can't argue with that.

How long have we been linking "subscribe" in issue replies to proposals for better "follow" systems?

For a long time. :) http://drupal.org/node/34496 shows a post date of October 2005. According to http://3281d.com/2009/03/27/death-to-subscribe-comments it'd cost us about 5,000 bucks.

How long did Drupal.org wait, with Project* as the final thing holding back the upgrade, to move to Drupal 6?

About a year extra. Though, I do want to point out for accuracy that this was largely due to Views being slow to port, and the maintainers' firm stance that the D6 version of Project* must be ported to the same tools (Views) the community is using in order to become more generally useful and bring on some extra maintainer muscle. This hard work is now done, and for D7 afaik they're planning a straight port, with migration to Field API after for the same reasons. But still, the point stands that this would be completely off of our plate if we moved to something like Launchpad.

It's OK to say, "Launchpad is confusing" or "we're worried about maintaining a Zope system" or "the migration effort would not preserve our data well" or "we like our existing system more," ...

I think it's all of those things. But most importantly...

I'm concerned that people who aren't doing the work are influencing this too much. Very few people maintain the current Project* system or worked on the last upgrade of Project* to Drupal 6. In the spirit of our "do-ocracy," I don't like the idea of people outside that group deciding what's "worth the effort" here any more than equivalent people deciding whether adding transactions to core is "worth the effort" or the Field API conversion is "worth the effort."

BZZZT! No. Wrong. Absolutely, 110%. Were Drupal a "regular" software project, I would totally agree that it would be well within the rights of the infrastructure manager to say "We're using Jira. Suck it." But Drupal is anything but a regular software project, and therein lies its immense strength, and something we must never screw with, above all else.

The Drupal.org issue tracker is the Drupal community's "community plumbing." It's the central tool of communication and collaboration, not only among developers, but also among end users and developers (and designers and usability experts and ...). This is the very essence of what makes our community what it is, and having it look/act/feel seamlessly like the rest of the site is absolutely key to its success. I personally am not willing to trade this awesome community even for the very best issue tracker/project manager/wiki planning tool/translation/security tool in the world, and my very strong feeling is that if you lose the integration of our issue queue into the rest of our site, you've doomed our community to behaving like those of "regular" software projects, where the devs are devs and the end users are end users and only in the case of a highly technical end user who's a bit brave do they ever meet.

Also, when you talk "do-ocracy," please realize that every single name in this thread is part of the much bigger "do-ocracy" that makes our community so immensely fricking awesome. And we do this using the tools that Project* provides, and so we are all well within our rights to voice concerns about initiatives that are going to mess with our day-to-day workflow and make making Drupal more awesome more challenging for us. As frustrating as that might be to watch, knowing that this path would be so much more efficient, the community trumps all.

But by all means, farm out to "third parties" our source code browser, our mailing list manager, our testing framework, our JavaScript framework... I'm 100% with you that "eating our own dog food" has its limits in terms of sanity. But when we start talking about changes to the base fundamentals of our community that will literally affect every single person on Drupal.org, we need to tread extremely cautiously. And personally, I don't think we'll ever get the community-wide buy-in that a move like this would take...

David Strauss wrote:

This is the very essence of what makes our community what it is, and having it look/act/feel seamlessly like the rest of the site is absolutely key to its success. I personally am not willing to trade this awesome community even for the very best issue tracker/project manager/wiki planning tool/translation/security tool in the world, and my very strong feeling is that if you lose the integration of our issue queue into the rest of our site, you've doomed our community to behaving like those of "regular" software projects, where the devs are devs and the end users are end users and only in the case of a highly technical end user who's a bit brave do they ever meet.

That's perfectly reasonable, but I needed to hear that argument specifically. Up to this point, the sense I've gotten is, "we would leave Project for the right capability on another system." If this debate isn't about raw capability, I'm wasting my time trying to debate the feature set and transition methodology for something like Launchpad. If that's how you and others feel, I'm happy to leave this discussion there.

sdboyer wrote:

Not the first time that's been said here.

David Strauss wrote:

Not the first time that's been said here.

Sure, but are you disputing it? I'm as interested as you are in having the full capabilities of DVCS on d.o, but the most I can give every year to Project* and related tools is a few weeks. Based on my casual observations, even that puts me in the extreme of project infrastructure contributors. But I'd rather not even do that every year because there are more interesting things to work on than cloning features from GitHub on d.o.

bias and advantage with our own tools

Gábor Hojtsy wrote:

I'm concerned that people who aren't doing the work are influencing this too much. Very few people maintain the current Project* system or worked on the last upgrade of Project* to Drupal 6. In the spirit of our "do-ocracy," I don't like the idea of people outside that group deciding what's "worth the effort" here any more than equivalent people deciding whether adding transactions to core is "worth the effort" or the Field API conversion is "worth the effort."

While you suggest that the maintainers of tools like Project*, localize.drupal.org, etc. would be best placed to make these decisions, it's hard to deny that these people would be the most biased as well. When I started to develop the tools which form the base under localize.drupal.org, Launchpad was not open source (nor even attractive to begin with), and other tools were even worse. I set out to build a tool much better suited for localization, and I believe it is for the benefit of Drupal to be developed this way. It gets bugs fixed in projects thanks to its unique reporting of localization API parsing errors. Development of the site got bugfixes into organic groups and og_user_roles, and is now getting new activity module features, which will be reused by actual Drupal site builders. The Project* porting got new features into Views, and stretched its boundaries in ways useful for all advanced Drupal users. So these developments do good for our community and ecosystem.

Still, they require more effort to develop than just using an external tool, you are right. I'm committed to improving our localization system, and would obviously not be happy to switch to a different system which in some regards is substandard to what we have now. We are hard at work on the DRUPAL-6--2 branch of l10n_server for example, which already implements this UI: http://localize.drupal.org/node/258 Is Launchpad's user-experience-tested, Canonical-funded experience superior to that? Localize.drupal.org for example also supports a remote interface allowing people translating their Drupal sites to contribute in real time. How do we do that with Launchpad?

Finally, we obviously lack certain important features. For localization, we do not yet have suggestions based on previous strings. However, what we can do is improve constantly (because the software is written not in a language/framework we don't know, but in our beloved framework), so we can roll out new features as the need arises, such as group-level permission control: http://localize.drupal.org/node/616 That we'd need to go to Canonical with feature requests or jump on the learning curve for a new language/framework is not to be underestimated IMHO.

BTW once again as I said above, though you want the opinion of those involved with the existing tools, our opinion will be biased by nature.

That's excellent feedback,

David Strauss's picture

That's excellent feedback, thank you. I hadn't considered how much our Project and localization work had forced other modules to improve, and that's a valid consideration.

I'm honestly not sure how Launchpad's i18n tools compare to the l.d.o ones. I do know Launchpad Translations are on version 3.0 of the code and interface, so it's possibly improved substantially since you last checked it. No interface on LP has evolved as much as the translation one.

If we did use it -- which looks like it won't be the case -- we could integrate translation submission from Drupal using the Launchpad API. Nor would using Launchpad i18n tools be a requirement of using the other parts.

patch workflow FYI

adrian's picture

Currently we expect patches on the issue queue to be generated by CVS, which doesn't prefix the paths with a/, b/ (which is actually how diff works out of the box).

So when creating patches with bzr/git, we have had to specify the --no-prefix option to maintain compatibility with this. Furthermore, because patch expects the default output of diff when applying patches, we have had to use 'patch -p0' to apply the patches created by CVS and by git/bzr with the --no-prefix option.

Once we switch to bzr/git, we will no longer need to specify the --no-prefix option and therefore we will no longer need to use 'patch -p0' either.

Both the creation of patches and the application of them will require fewer command line options by bypassing CVS's kinks.

Bazaar has never put prefixes

David Strauss's picture

Bazaar has never put prefixes on patches by default.

addendum

adrinux's picture

It's worth noting that even if someone forgets --no-prefix you can apply the patch with -p1. The prefix really isn't a huge deal if you even have a slight clue what you're doing – I managed to figure it out after all.
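
To make the -p0 / -p1 point concrete, here is a minimal Python sketch of how the strip level changes the path that patch ends up looking for. The function name strip_components and the sample path are illustrative assumptions; the sketch only mimics the simple case of patch's -pN option (dropping N leading path components) and is not patch's actual implementation.

```python
def strip_components(path: str, level: int) -> str:
    """Drop `level` leading slash-separated components from a diff header path."""
    parts = path.split("/")
    if level >= len(parts):
        raise ValueError(f"cannot strip {level} components from {path!r}")
    return "/".join(parts[level:])


# Header path as written by CVS, or by a diff made with --no-prefix
# (hypothetical sample path).
unprefixed = "modules/node/node.module"
# The same header path carrying an a/ prefix.
prefixed = "a/modules/node/node.module"

# -p0 leaves the unprefixed path alone; -p1 removes the a/ prefix.
assert strip_components(unprefixed, 0) == "modules/node/node.module"
assert strip_components(prefixed, 1) == "modules/node/node.module"
```

Either way the same file is targeted, which is why a forgotten --no-prefix is recoverable on the applying side.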

FWIW with Mercurial I strip

mikey_p's picture

FWIW with Mercurial I strip these out with filterdiff.

I have a final word in

gordon's picture

I have a final word in this.

git bisect - http://linux.die.net/man/1/git-bisect

This helps you track down an introduced bug: you tell git a revision where the bug didn't exist and one where it does, and git then uses a basic bisecting method to do checkouts. For each version it checks out, you test for the bug and tell git whether the problem exists.

Where I am working ATM, we needed to track down, in a couple of minutes, the commit which introduced a bug. We had 256 commits to search, and we were able to find it.

This is a feature I didn't know about but it is invaluable when you need to locate a bug.

Also this feature can be automated so we could use a simpletest test (with a bit of work) to do the bisecting and testing automatically to find the bug.
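
As a rough illustration of the search described above, here is a minimal Python sketch of bisecting between a known-good and a known-bad revision. It is not git's implementation; the function name, the integer revision numbers, and the is_bad callback (standing in for an automated test run) are assumptions made for the example.

```python
from typing import Callable, Tuple


def bisect_first_bad(good: int, bad: int, is_bad: Callable[[int], bool]) -> Tuple[int, int]:
    """Return (first bad revision, number of revisions tested).

    Assumes revision `good` passes, revision `bad` fails, and the bug
    never disappears again once it has been introduced.
    """
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(mid):
            bad = mid   # bug already present: look earlier
        else:
            good = mid  # bug not yet present: look later
    return bad, tested


# Hypothetical history: 256 commits after a known-good revision 0,
# with the bug introduced in revision 100.
first_bad, tested = bisect_first_bad(0, 256, lambda rev: rev >= 100)
print(first_bad, tested)  # prints "100 8"
```

Each test halves the remaining range, so 256 commits need only 8 checks, which is what makes a couple of minutes plausible.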

Gordon.

--
Gordon Heydon

Okay. Bisect has a serious

David Strauss's picture

Okay. Bisect has a serious flaw, though: it assumes, as any binary search does, that the set is sorted. In this case, that the bug doesn't occur before one revision and always occurs after it (or vice versa). If that constraint doesn't hold, then bisect cannot deliver reliable results.

Both have it

yhager's picture

Bisect has a serious flaw, though: it assumes, as any binary search does, that the set is sorted.

That is true. Bisect is just a tool to automate the binary search we are doing when looking for a specific bug. Although it looks like a redundant tool, I found it very powerful and a huge time saver.
If you give bisect a set of revisions in which the bug you are looking for (or its symptoms) comes and goes, bisect will be confused - but so would you, if you did it manually.
Even if this is the case btw, you would still find one commit that introduced the bug, even if it is not the only one.
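
A small Python sketch can show both sides of this point. It assumes a made-up history in which the bug appears, is fixed, and reappears; the list of booleans and the function name are hypothetical, and the search is a plain binary search rather than git's or bzr's actual code.

```python
from typing import List


def bisect_first_bad(is_bad: List[bool]) -> int:
    """Binary search for a good -> bad transition.

    is_bad[i] says whether revision i shows the bug; the first revision
    must be good and the last one bad.
    """
    good, bad = 0, len(is_bad) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad[mid]:
            bad = mid
        else:
            good = mid
    return bad


# Hypothetical history: bug appears at 2, is fixed at 4, reappears at 6.
history = [False, False, True, True, False, False, True, True, True]
found = bisect_first_bad(history)

# The result is a real introducing commit (its parent was good)...
assert history[found] and not history[found - 1]
# ...but the earlier introduction at revision 2 goes unnoticed.
assert found == 6
```

The search always keeps a good revision on one side and a bad one on the other, so it ends on some commit that introduced the bug, just not necessarily the first.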

bisect is just a cool feature. It is not widely used IMO, but when needed, it can save a lot of time.

Anyways both git and bzr have it (git preinstalled, bzr as a plugin) - another commonality :)

But git repositories the

gordon's picture

But in git repositories the commits are always in order, so tracking down errors that occurred between commit x and y will always work.

--
Gordon Heydon

No choice yet?

BartVB's picture

As far as I see it there don't seem to be any really significant differences (in a technical sense) between Git and Bzr, I can't count "git commit -a" vs "bzr commit" a significant difference :) Sure, both have their pros and their cons but most of the people in this discussion seem to be rather entrenched, I guess partially because of the time/energy/money/code they already invested in their VCS of choice.

As I see it the main conclusion is that Git and Bzr are the main contenders with a large enough userbase within the Drupal community.

Now the next step is defining how exactly $vcs is going to be implemented. Is this going to be a two step process? Are we going to use external tools? What needs to be done to get all this going?

After that I guess this will either become an arms race (which 'faction' will complete these tasks first/best) or a decision of 'the brass' (Angie, Dries, ...?). But I don't have the illusion that a choice for $vcs will be the result of this discussion while David keeps coming up with excellent rebuttals for pro-Git arguments :)

It seems like this discussion

voxpelli's picture

It seems like this discussion has reached the conclusion that Drupal should move to either Git or Bazaar - two tools which have more similarities than differences.

Could this discussion then perhaps be summarized, shut down and then resurrected in multiple new discussions with a specific focus and goal?

Right now it takes an effort just to follow this three page threaded comment monster - and even more to actually try and contribute with some reasonable thoughts, research etc.

With my current workload I see no way of being able to contribute as the discussion is now (and I'm sure I'm not the only contributor that's currently on a tight deadline at work) - if the discussion was split up into more manageable parts I could perhaps pick the one I feel I could contribute the most to and follow just that one.

git rebase warning

chx's picture

When you rebase a branch, you are changing its history in a way that will cause problems for anyone who already has a copy of the branch in their repository and tries to pull updates from you.

While git rebase is touted as a great feature in here, it actually can result in this, and I am sure that given the number of developers involved it will happen. Another example of git being too powerful -- yes, that can be and is a problem. Also, it might be that you want to clean up your commit history, but I might actually want to see the struggle, the mental process you went through to get to the shiny results. Ask a scientist whether they burn their notes once they have published their results.

that's why you definitely want

CorniI's picture

that's why you definitely want to disallow non-fast-forward pushes to the d.o repos except in special cases where you can override it, so that for example you can remove the settings.php with your secret passphrase from the repo, or that 10 MB log file your debug script created.
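
For readers unfamiliar with the term: a push is a fast-forward when the branch tip already on the server is an ancestor of the tip being pushed. Git has a server-side setting for refusing anything else (receive.denyNonFastForwards); the Python sketch below is not how git implements that check, only a toy model of the idea. The commit names and the parents mapping are made up for illustration.

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]], ancestor: str, commit: str) -> bool:
    """Walk parent links from `commit` and report whether `ancestor` is reachable."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False


def is_fast_forward(parents: Dict[str, List[str]], old_tip: str, new_tip: str) -> bool:
    """A push is a fast-forward if the old tip is still part of the new tip's history."""
    return is_ancestor(parents, old_tip, new_tip)


# Toy history (hypothetical commit names), mapping each commit to its parents.
parents = {
    "A": [],
    "B": ["A"],                  # mainline moved on after the branch point
    "X": ["A"], "Y": ["X"],      # published feature branch, tip Y
    "Z": ["Y"],                  # one more commit on top of Y
    "X2": ["B"], "Y2": ["X2"],   # the same work rebased onto B: new commits
}

assert is_fast_forward(parents, "Y", "Z")        # plain append: accepted
assert not is_fast_forward(parents, "Y", "Y2")   # rebased branch: rejected
```

The rebased tip Y2 no longer contains Y in its history, which is exactly the situation that causes trouble for anyone who already pulled Y.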

see #130773 and the following

fago's picture

see #130773 and the following for comments already discussing this

The analogy with the

Pisco's picture

The analogy with the scientist is quite good actually! Think about what a scientist does once he has found out something: he thinks carefully about what he just achieved, he orders his notes, writes a paper and publishes it so that everyone else can understand it and benefit from it ... hello Git rebase!

Of course, if you use your tools the wrong way, you mess things up, but that's true for every tool and not a particularity of Git. I use rebase in my day to day work, I work in different teams and I have never had problems with Git rebase.

The analogy with the

David Strauss's picture

The analogy with the scientist is quite good actually! Think about what a scientist does once he has found out something: he thinks carefully about what he just achieved, he orders his notes, writes a paper and publishes it so that everyone else can understand it and benefit from it ... hello Git rebase!

I know you're a rebase fan, but that analogy breaks down pretty fast:

  • Product of the scientist's work: a published paper
  • Product of the developer's work: a carefully segmented set of revisions and commit messages?

More realistically:

  • The scientist
    • During work: keeps a lab notebook (only appended to and archived following the work)
    • Product of work: a published paper
  • The developer
    • During work: commits revisions to a repository (only appended to and archived following the work)
    • Product of work: polished code integrated into the mainline

This is more accurate because a scientist does not write the paper throughout the course of doing the research and merely reorganize it at the end for publication.* For developers, revisions are a digital lab notebook for the development process. Just as it's inappropriate for a scientist to go back and revise her lab notebook to imply a more streamlined and orderly history of performing the research, it's bad for developers to create a contrived history of their development work.

I worry that part of the interest in rebase is ego-driven. Developers want their work to appear super-human, as if they develop their code without missteps or having to discard work and re-factor. I think the open-source development community would be less daunting if the abundant flaws of the "rockstars" were as easy to see as the flaws of the beginners.

*Everybody knows a proper academic paper is gruelingly prepared just before the submission deadline for the journal or conference. The director of a computational RNA sequence alignment lab I worked in for a while would joke that those were the nights you could email any biologist in the world and get a reply within 10 minutes.

You forgot to show where the analogy breaks

Pisco's picture

So where exactly does the analogy break?

  • Product of the developer's work: a carefully segmented set of revisions and commit messages

A carefully segmented set of commit objects, actually, which get even more carefully segmented before pushing them to the public repository. Where exactly does the analogy break?

I really don't understand where you're heading. The Linux kernel is maintained with Git, does it work for you or not? They actually use rebase, that's why they implemented it. The same is true for all of the many other prominent software projects maintained in Git ... the list is very long, and longer than Bazaar's by an order of magnitude. Of those many big open source projects, how many have you heard of having troubles with Git? If I heard you talking about Git without knowing it myself, I would not even think about trying it.

The digital lab notebook of the developer, and we're talking about the notes for one very defined feature, gets cleaned up for publication; after that, the notes are not touched anymore. And publication here means publication in the publicly accessible and promoted repository.

Do you use rebase on a daily basis? Have you ever used it? If so, have you ever had problems with it? If so, did that problem arise because you messed with it or because rebase is truly and inherently broken? Or are you just building castles in the air about how it could be if you used a tool you don't really know?

From what I understand you know Bazaar really well, you know what it's good at and hopefully what it's not so good at. Your domain is Bazaar. Do you really know Git well enough to be able to tell where its strengths and weaknesses are? From what I read here, most people are either Git or Bazaar experts and able to point out the pros and cons of a tool they really know. You seem to be the only one picking on the VCS that happens not to be your preferred one, and furthermore you call every Git feature that was brought up a flaw. I find that neither trustworthy nor professional.

Last year I helped migrate a project, one that had been going for over a year, from Subversion to Git. I still work with Subversion a lot, I also work a lot with Git, I use Git for all my private stuff. I do not know Git as well as others do, but I know it quite a bit. I can't say the same for Bazaar, but then I did not once purport that any of Bazaar's features was flawed, broken or dangerous.

I'm done talking with you.

David Strauss's picture

I'm done talking with you. You don't fully read my posts that you're replying to, and clarifying how you've failed there has become tiresome, especially because you've resorted to branding me as untrustworthy and "unprofessional." You seem only interested in promoting git on the basis of its rebase support (which is irrelevant to our collaborative workflow) and git's popularity, which has already long been (1) acknowledged and (2) discounted as a primary decision-making factor for Drupal.org. Moreover, through all the posts I've read from you on git, I've gained remarkably little knowledge over what I had entering this discussion.

I'm sorry but I have to say

adrinux's picture

I'm sorry but I have to say Pisco has a point, far too many of your posts in the discussion have contained anti-git FUD. You've made it stridently clear from the outset that you know and like Bzr. Focusing on telling us why Bzr is good is educational, but all the anti-git stuff implies that those who've already chosen to work with git are wrong/ignorant/stupid – it rankles – little wonder people start throwing about words like "unprofessional".

As a former molecular microbiologist I also have to say Pisco's interpretation of your lab book analogy makes more sense. Old notebooks may occasionally be historically interesting but they rarely do much to push forward the boundaries of science because they so quickly become obsolescent.

Clearly you personally prefer a warts and all historically interesting commit history. Other people don't. It's a matter of personal preference, not a flawed model.

I'm sorry but I have to say

David Strauss's picture

I'm sorry but I have to say Pisco has a point, far too many of your posts in the discussion have contained anti-git FUD.

For my posts to contain FUD, they would have to literally promote "fear, uncertainty, and doubt" about git. (If your usage of "FUD" means general lobbing of criticism, I'm afraid you're wrong by definition.) While I'm highly critical of certain workflows that are popular among git users -- which are encouraged but not mandatory on git -- I have expressed my criticism in concrete terms that do not have the vagueness implied by calling it "FUD."

You've made it stridently clear from the outset that you know and like Bzr.

I don't see a problem there. It certainly wouldn't benefit anyone for me to feign disinterest. Even then, in response to a post that mentioned rebase on git and Bazaar, I went out of my way to point out that Bazaar lacks interactive rebase support. And I've made the limits of my knowledge clear, like when I was only in a position to explain Bazaar's capability without comparison to git's, as for line-ending normalization.

Focusing on telling us why Bzr is good is educational, but all the anti-git stuff implies that those who've already chosen to work with git are wrong/ignorant/stupid – it rankles – little wonder people start throwing about words like "unprofessional".

I'm sorry if people take my criticism personally, but I don't think it's my problem if reasoned objection to someone's VCS tool or workflow gets taken that way. I know I've had to get used to it: for years, git users have published articles criticizing Bazaar's performance on offensively contrived tests that are (1) highly uncommon operations, (2) known to run fast on git's architecture, and (3) known to run slow on Bazaar, like listing the first 10 revisions in a branch with very long history. Such work is disingenuous, yet the idea that the people behind those articles are calling me "ignorant" or "stupid" never crossed my mind.

As for discussing only the advantages of each tool without criticism of the other, I don't see how that helps us make an informed decision. Accordingly, you don't get points in my book for not posting criticism of Bazaar. Please post it; I don't want this discussion known for lacking it.

As a former molecular microbiologist I also have to say Pisco's interpretation of your lab book analogy makes more sense. Old notebooks may occasionally be historically interesting but they rarely do much to push forward the boundaries of science because they so quickly become obsolescent.

Clearly you personally prefer a warts and all historically interesting commit history. Other people don't. It's a matter of personal preference, not a flawed model.

In my linked blog posts, I've argued that even if you have a goal of delivering a clean, constructed history, rebasing isn't an optimal way to do so. You really want something more like Bazaar Looms or the Darcs model of partially ordered sets, maximizing the ability to organize coherent change groups.

About project

chx's picture

I know how much help dww and hunmonk get. I know this very well because I contribute. Everyone hates crossposting and we (as in Dries himself IRL asked how it goes) really would like custom priority levels. And I am not mentioning subscriptions here, I am mentioning trivial issues that take years and no one ever helps. How can going from here to Launchpad be worse? I am fairly sure (and I will actually try to get confirmation of this) that the Drupal community -- given its sheer size -- will get quite a say in how Launchpad changes, and there are a lot more resources Canonical can throw at problems than we are apparently able to.

Must we throw the baby out with the bathwater?

webchick's picture

I don't think anyone at all would dispute that our key infrastructure doesn't get enough contributions, and that we have long-standing, irritating bugs. But I guess I don't see why these hard lines in our choices are drawn between either sticking with existing tools with annoying bugs or scrapping the entire damn thing in favour of a completely separate, out-sourced system.

Can we not think outside the box a bit here? I bet if we put our heads together we could come up with at least a few alternate ways of addressing this, including better documentation about how incredibly flipping easy it is to start hacking on project* module, or someone running for the current Drupal Association election on the platform of managing our Project* infrastructure, and securing an actual budget to pay folks for this work.

...or running for the current

David Strauss's picture

...or running for the current Drupal Association election on the platform of managing our Project* infrastructure, and securing an actual budget to pay folks for this work.

I've included that in my platform before. In 2008, the first time I ran (unsuccessfully) for the DA, I proposed spending DA money on professional services for infrastructure. In 2009, when I ran (successfully) for the DA, I argued in the budgeting process for spending on professional services for infrastructure. Even though some specific tasks were budgeted (Zip file support, etc.) and developers agreed to implement them, we haven't ended up with much done.

This appears to be a time issue. Even with funding, we're only in a position to fund a bit of extra work from existing contributors, and the people who can improve Project* seem to be equally capable of having their schedules overbooked with other work.

I would lean toward pursuing Google SoC funding, which would allow us to cultivate new talent to work on Project*. I'd be happy to mentor.

But why do you think it

adrinux's picture

But why do you think it didn't happen?
I suspect the developers that were able to do the work were not the ones needing zip files, so was it just not scratching their itch?
Were they just not excited by extending the whole cvs based system further?

What would you do differently if you had to do it again?

In the name of fairness

chx's picture

I was debating with myself whether I should post this link http://www.gitready.com but it's not nice not to. This site makes you understand git. So the 'too complex' argument is down.

Windows support is still experimental, rebase is still a problem and Launchpad would be nice.

Windows & Mac GUI support

webchick's picture

Reading over this thread, there's still one major item for me that feels a bit hand-waved around by both the $vcs brigades. And that's the topic of Windows & Mac GUI support (and to a lesser extent, Linux, although for the most part I'm not worried about Linux users taking care of themselves :)).

So to cut directly to the chase, I think what we need is for folks to write (or reference well-written) step-by-step instructions on:

  • Installing $vcs on Windows
  • Installing $vcs on Mac
  • Installing $best_free_gui on Windows
  • Installing $best_free_gui on Macs
  • How to perform all of the actions listed on http://drupal.org/handbook/cvs/quickstart with $best_free_gui
  • How to create and apply (if applicable) patches with $best_free_gui

These should be added as sub-pages to the Bazaar and Git quickstart guides. If you can't do something in the GUI, then go ahead and mention it there, too.

I realize that this is asking a lot, since we all know that only one team's effort is going to pan out. :( But I can't over-state how much we need this information to make the switch.

For starters, the Bazaar GUIs

David Strauss's picture

For starters, the Bazaar GUIs are bundled with the main installers. On Windows, this includes TortoiseBzr, Bazaar Explorer, and QBzr. The Mac installer comes with Bazaar Explorer and QBzr. Although you didn't ask about Linux, I'll mention that Ubuntu packages Nautilus (GNOME file browser) integration for easy installation.

I'll update the quickstart guide with more details.

And also, GitHub.

webchick's picture

While I think "phase 1" of this migration is pretty well fleshed-out -- just the bare minimum to replace CVS, both for expediency and for learning curve purposes -- it still remains to be seen what "phase 2" looks like. One of the major things that got me interested in spear-heading this discussion is the alarming amount of code that's starting to pop up outside of *.drupal.org infrastructure, primarily at GitHub. And to be honest, I'm quite terrified that if we don't follow-up phase 1 with a very strong phase 2 relatively quickly, switching Drupal.org to Git will only exacerbate this issue.

Is it possible for some of those GitHub lovers out there to elucidate the reasons why you choose to put your code over there rather than on d.o? Is it that we need to promote the presence of http://drupalcode.org/viewvc/drupal/contributions/sandbox/ better? Is it some kind of wicked-awesome features? (and if so, what, and what would it take to replicate them?) etc.

Would appreciate your thoughts.

Development Seed

mcrittenden's picture

It would be nice to get the opinions of some Development Seed guys on this one, since they use GitHub pretty much exclusively for their Drupal contrib stuff. Any of you guys lurking?

I think 'exclusively' is

adrinux's picture

I think 'exclusively' is overstating it, I know Adrian didn't want to trust github with Aegir, that's hosted by koumbit.

Just as with myself and

Garrett Albright's picture

Just as with myself and BitBucket (where I don't currently host any Drupal-related projects, but am planning to in the near future), I suspect most people who use $vcs_host would rather be hosting their stuff on Drupal.org and be closer in with the community and such, if only D.o supported $vcs. It's not that the services of $vcs_host are all that better than what D.o provides, at least not to the extent that distancing our code from the community is worth it; it's just that they'd much rather use $vcs than CVS. I suspect that, if/when D.o supports $vcs anywhere near the extent that it supports CVS, most of the folks currently using $vcs_host will gladly move their projects back to D.o.

GitHub seems "friendly"

reglogge's picture

That's my main impression of the service. I've used it for my customized Drupal distro (mainly for booksellers and quality-wise not at all ready for Drupal.org, but I digress) which I've since moved to a paid hosting service (sourcerepo.com).

I started with GitHub because it was free and very easy to set up, with nice documentation and a good user interface. I moved to sourcerepo.com because they offered Redmine as an issue-tracker which I found much better than the one at GitHub.

What would be really worth replicating from GitHub is the ease and user-friendliness of setting up an account. You just have to give an e-mail-address and that's basically it.

And of course I would LOVE to host my stuff on Drupal.org, only that until now that seemed really intimidating (this page: http://drupal.org/node/59 scared the beejesus out of me!). I had never heard of the sandbox you link to, and looking at it, I could not find any documentation or guides on how to use it.

Done and done.

webchick's picture

http://drupal.org/node/59 recently went on a BIG diet. It's hopefully far less intimidating now.

On the topic of sandboxes, I recently discovered that page was horribly out of date and incorrect, and unpublished it. I updated it now to what I believe are the current rules, and that's now published here: http://drupal.org/node/773

Catch 22

reglogge's picture

It seems as if to get access to a sandbox, I first have to get a CVS account. And for getting CVS access, I need to "submit a finished, working module or theme for review along with your application" (http://drupal.org/node/59).

So no go for somebody like me who has some not really polished code to host it on Drupal.org?

Edit: This is just meant as an observation - not a call to action. Thanks for publishing the sandbox page, webchick :-)

Ah-ha. Right you are.

webchick's picture

Feature request: Proposal: Give a CVS sandbox-only account to any user who wants one

Let's see if we can't do something about that. :)

I'm using github because it's

yhager's picture

I'm using github because it's git.
I have not transferred my Drupal projects there, because I want to keep drupal.org in sync with github - and I still have no idea how to do that automatically (commit hooks anyone?).

If drupal.org is git, I don't need github.
If drupal.org is bzr, I don't need github either (nor lp).

I haven't yet put any code

adrinux's picture

I haven't yet put any code there but I do have an account and follow several OSS projects on Github. So:

It's free to host OSS – something d.org matches.
It runs git – that might sound obvious, but git is popular and Github is easier and cheaper than self-hosting.
It has a well designed UI – the drupal.org redesign should help with meeting that.
It provides pretty graphs of project stats (not sure how useful they really are) and commit history (similar to what can be seen in gui apps like gitk and gitX).
It has good 'social' features:
You can 'follow' a project and its commits etc. will turn up in your own custom Github feed along with all the other projects you're following.
You can click on a username and find out all about that user's commit activity etc.

– Which is not far off what you can do on d.org, it's just we lack the polished UI and flow. Some of that can be addressed, but at the end of the day d.org is less focused than Github, so expecting quite such a simple UI is perhaps unrealistic.

All in all I suspect there are no 'killer' features, it's more that it's good enough, polished and free. The drupal.org redesign coupled with a move to a DVCS should address most of it, and the other features shouldn't be too difficult to achieve with drupal.

why i use git where i can

anarcat's picture

Aegir's case

So Aegir is one of those projects that switched away from CVS to use purely git to manage its releases. This wasn't only due to CVS: one of the things that was really painful for us was to release 4 projects at a time (which meant creating tags and release nodes for all of those at the same time, very time consuming and error-prone). Plus, at the time the profile packaging stuff wasn't yet on drupal.org and even now, we can't package the third party libraries we want automatically. So we merged our modules and themes directly in the profile, which means we can't really host our code on drupal.org again unless the rules get a little less strict regarding what can be part of an install profile.

Feature branches

Git allowed us to make development of new features much easier. Since branches are much more lightweight objects, we can create feature branches for anything we want very easily and it's quite easy to track those branches even amongst multiple repositories. Having such topical branches keeps the head development branch clean and stable, as we merge back those feature branches in the trunk only when they have been well tested. That would be simply a nightmare to do in CVS.

Quick fixes and cherry-picking

The other reason I use outside git repositories is the same reason people use SVN or other VCS to manage their Drupal installs. I very often have quick fixes I need to do on a production server: if the module is in git, I can just hack at it, commit and forget about it. I have a "production" branch on my sites that I use to make those quick fixes. I can also cherry-pick fixes from the trunk or feature branches as I see fit, which is one of the killer features for me.

Basically, even if there's a git.d.o, I will have those ephemeral repositories outside, and I will probably push some stuff to git.koumbit.net, because I can. :) Project maintainers often don't have time to follow up on patches or roll out releases quickly enough for me to rely only on drupal.org code. And I think that's fine, actually: rapid development outside of Drupal.org is fine as long as changes are contributed back to the community in the long run. It even lowers the barriers for communicating those changes around.

One branch per issue

One thing I would really like to do would be to have a branch per issue. I think it would be really crazy and awesome if every issue would be a potential (no need to create a branch if there's no patch yet) branch on which development can occur in parallel to the main trunk. With VCS history tracking, anyone (that has an account on drupal.org) could push (and pull!) patches related to an issue very easily. If someone screws up, you just revert and push a new patch. Instead of having a pile of patches with no clear idea of what changed in between them, we would have a clear view of the incremental changes to a patch, and still keep the big picture of what changed overall (by diffing with the trunk).

This is a workflow I use when I need to send patches to the core or to modules I know will change while the patch is reviewed. It allows me to easily reroll patches when trunk changes and resubmit them. If that would be part of project*, that process could be automated (putting conflicts aside here).

Note that the above probably also applies to bzr.

I'm using github for working

fago's picture

I'm using github for working on some of my drupal modules and as well for stuff that is not published yet. For that I've set up synchronisation with my module's drupal CVS; for more about that see http://more.zites.net/git_drupal_contrib_cvs.

I started using git for doing some core patches and I liked it. I started using github as there were already some drupal folks, including a drupal core mirror, so it was straightforward to start with it. Github nicely assists one during the creation or forking of new repositories and makes it easy to track the work of others - which is nice for collaboration. Click fork and start coding. When I fork a project others can easily see my changes and reintegrate them in their repository - imho that's really handy.

So why should I choose the d.o. sandbox?
* When I want to develop stuff that should be visible to others, I would not use the sandbox. No one looks at it or would note I've committed something there - neither does it provide ways to easily track my work.
* So still the sandbox would be fine to develop stuff no one else needs to see - as it's not really visible for anyone. But for that I don't need a central repository either, a local git repository suffices for that (and of course that way I don't have to use CVS..). Apart from that, if I want to have a nice web based viewer for my code or a central repository, I'd choose github over d.o. as its browser is fancier and easier to use than viewvc.

If we want to get people back to d.o. I think we need
* ways for everyone to easily create a repository without having the hassle of a complicated and long lasting CVS application
* ways to easily track the work of others (-> redesign?)
* assist users to be able to easily hack on something (fork button?).

Phase 2 planning?

webchick's picture

Fago, could we get you to come on-board and help us map out what Phase 2 of Git/Drupal.org integration might look like? (Or anyone else for that matter. :)) I'd like to hear more details behind your suggestions.

A couple of concerns I have with a "fork" button are:

  1. Duplication of code. But as long as we made it clear that there was only one "true" Views module, for example, I think that would be okay.

  2. Lack of collaboration. Forking encourages grabbing someone else's code, going off into a corner someplace, hacking it to bits, and if the maintainer wants your changes, they need to come track you down and find you, and merge them in. This flies directly in the face though of our current collaborative workflow via the issue queue that keeps maintainers and reviewers in the loop every step of the way, which I think is key to our community's health and strength. I'm really concerned with us losing that. :( Do you have suggestions on how we might satisfy both?

Forking has a pretty bad name

BartVB's picture

Forking has a pretty bad name in the Open Source community which is, depending on context, rightfully so.

In this case 'fork' doesn't mean forking a whole project including its community, documentation resources, devteam, etc. It just means that development forks into a 'topic branch' where someone can fix a bug or add a feature. The idea is that this bugfix or feature is contributed back to the originating project. This is also what's happening on GitHub all the time.

Forking a project on Drupal would entail copying its files, creating a new project on d.o and posting the (changed) files there. This is not what a DSCM 'fork' button would do.

New term with new semantic meaning

Elijah Lynn's picture

I agree that the semantic meaning of "forking" varies. Therefore I think it would be wise to choose a newer, fresher term that is void of meaning. This way people would have a fresh start with the term and not associate any preconceived meaning or feeling with it. Plus it would be a lot easier to explain.

fork = branch

yhager's picture

The fork concept in github actually means branch. I am not sure why they chose the word fork, when everybody else uses 'branch'. We can use 'branch', which has a less negative connotation.

Something like: "Branch" on the button, and "Create your own private branch of this project".

The "pull requests" on github could be implemented in a similar fashion (basically a web interface for cherry picking).

I think it's pretty obvious

adrinux's picture

I think it's pretty obvious why they chose 'fork' instead of something more mundane but more accurate like 'branch' when you come across this phrase: "Fork me" or "fork me on github". Because it sounds a bit like f*** me, geddit? Yeah I know, not hilarious. But it does add an air of irreverence, is unique to github and generally good in a marketing type way. Lets not forget github is a business. Making source control seem like fun is probably good for their bottom line.

The phrase has even spawned its own banner project:
http://mattn.github.com/jquery-github-badge/

I agree it doesn't really fit d.org :)

@yhager I'm not sure 'private' is the right word for something the whole world can see :) maybe: "Create your own personal branch of this project"

Because it sounds a bit like

yhager's picture

Because it sounds a bit like f*** me, geddit?

Heh, this never occurred to me - you are probably right on :)

I'm not sure 'private' is the right word for something the whole world can see :) maybe: "Create your own personal branch of this project"

Of course.

Well, when you fork a project

EclipseGc's picture

Well, when you fork a project it pops up a modal window with the text "Hardcore Forking Action" in it, so I'm pretty sure the similarity with f*cking is intentional.

Ha!

adrinux's picture

Never seen that one. I must be a forking virgin.

I think I'd best stop at that.

That does pretty much underline why we should stick with 'branch' on d.org though, definitely not appropriate.

New thread, please

sun's picture

It would be totally awesome if we could move discussion about next steps into a new thread and don't mix those details into the 3 pages here. :)

That said, a "fork" button + "follow" functionality would be nice to have, but should probably be tackled last in this entire integration, i.e. after projects + repos have been properly migrated to git and everyone is able to work with it. By then, we will know how branches can be handled and whether we will have per-issue branches.

And that being said... per-issue branches? sandboxes? So what is http://drupal.org/project/sandbox and http://drupal.org/project/issues/sandbox then? :)

Daniel F. Kudwien
netzstrategen

Agreed!

fago's picture
  • We need a new thread!
  • Fork + Follow is definitely phase2 and shouldn't be tackled before project* and the repos are migrated - maybe as part of the d.o. redesign as the "following feature" was already suggested there.

Yep, forking is probably a

fago's picture

Yep, forking is probably a bad name. However if d.o. doesn't support it people will still clone drupal projects for whatever reason. However assisting users to do so enables us to make those "forks" visible (as now we know who forked what.) That way I think it can really help us to improve collaboration.
But yes we need to think how we best can integrate that with our current workflows. Maybe "forks" or "issue branches" could serve as an alternative for patches in future - that way it would be much easier to track changes between several patch revisions and it could help us to ease testing of the proposed changes.

@webchick: We seem to be

EclipseGc's picture

@webchick: We seem to be settled on git as our choice, but I do want to answer this question for the record.

There are 2 primary reasons I choose to use github over *.d.o for my code repository.

1.) Git is a million times more forgiving with me if I choose to move/rename/delete/break up files. That alone is enough for me to entirely abandon d.o for my code repo. Which switching to git will fix.
2.) So many other people are using github already, that forking/playing with code within their system is dead simple. This expands the amount of code I can consume and grok pretty significantly. That's something that I'm not entirely sure our defined "phase 1" move to git is going to satisfy, and I think your points about phase 2 are important here. I think the objective has to be to give our users the ability to more quickly/easily consume/process/contribute to whatever code base they want.

To me that means:

1.) Docs
2.) Intuitive code "cloning"
3.) Intuitive code "pulling" (for project owners to pull from contributor's code bases)

Anything else is gravy. I've mentioned how we could potentially tweak out the issue queue to provide patches and tarballs a number of times now and as much as I like that idea, it's really an attempt to make review/testing easier for the uninitiated (which I still think is an awesome goal), but even with that, I still think my 3 points above are the real kickers here in giving drupal developers a better sandbox/project host.

As a closing point I'd like to elaborate for just a second on there being lots of code on github already. I think many developers feel the way I do, or have similar reasoning. While the projects that are off d.o might take a while to eventually come back no matter what our environment here looks like, I have to assume that our dev's loyalty to d.o AND d.o's desire to host their code would play a big role in those devs moving back here for primary development... bottom line it's sort of an "If you build it, they will come"(back) scenario.

Eclipse

DISCLAIMER: This is all just my opinion, so... yeah ;-)

Random replies

sun's picture

Can someone, anyone, please, increase the max comments per page setting for g.d.o, pretty please? I #fail to find the comments I'd like to reply to.

  1. git Windows support: TortoiseGit is "usable". It is currently backed by one developer/maintainer only, but he is very responsive to bug reports. As a Windows user, I mostly don't care for the command line of whatever tool, including git. Performing basic operations like clone, checkout, branch, merge, pull, push, log, etc. works fine. Stuff like rebase, stash, submodule, and so forth seem to be supported as well, but I haven't had a chance to try them out yet (why should I?).
  2. I'm pretty sure that I'm partially still applying "old-school" concepts from CVS/SVN to the way I work with git, but at least, I was able to setup a remote (bare) repository, using post-update hooks to manage a production and staging site, including a "vendor branch" to manage third-party updates and own patches, within one day of learning and understanding the concepts of git.
  3. When disregarding all that local branch crap, then working with TortoiseGit is identical to working with TortoiseCVS. Checkout a project, do your changes, create a patch. Revert all changes, work on next patch. This is how I work with CVS currently, and I bet I'm faster than many in this thread. :P ;)
  4. More tricky problems arise when you're used to your awesome visual diff tool, but your diff tool only supports 2-way diffs - which is more than sufficient to resolve merge conflicts with CVS. Merging with git (always) requires a diff tool that supports 3-way merges, i.e. base > local < remote. For me, this means I need to upgrade my diff tool + all the custom scripts and configuration around it, but YMMV.
  5. I actually have no idea what you guys are talking about re: fast-forward, rebase, bisect, wtf. Tracking most replies quietly, it seems like both systems support those functionalities. We'll very likely need docs on those. Otherwise, d.o will throw error messages at me, which I don't understand.
  6. I have to admit that I never tried Bazaar or Mercurial, just because everyone in IRC, Drupal Planet, and whatnot was and still is talking about git. Logically, "talk" means that there are at least plenty of folks who I could ask for basics. My first "contact" with bzr was during Views' port to D7 on Launchpad. I wasn't able to help, because I did not understand Launchpad. If those are the people who invested millions of dollars, then they should stop paying engineers and hire some usability folks instead.
  7. The centralized and integrated nature of the projects and issue queues on drupal.org indeed are the heart of the Drupal project and the Drupal community. I can only second webchick (and someone else) here. After all, it is what pulled me into Drupal development after totally stupid + wasted years of development on Joomla! and other systems. It is the key to our success, the key to innovation, and the key to "human integration". Humans. Not existing developers. :P There is no feature or software on this world I would treat with drupal.org. On that note, the d.o redesign makes me worry, too, because it equally contains concepts of "de-mystifying" things... aka. users here, developers there - #fail.
  8. In the past, the Drupal Association did not directly fund effective implementation work. A couple of replies in this thread proposed to do so, and some were worrying about no one being there to enhance project* for DVCS support. I think there are plenty of developers that would happily help -- if there was a clear roadmap, clear targets, and managed project organization. Instead of paying developers, we want to pay a dedicated + skilled project manager in this area, to setup a project plan and lay out an architecture that leads to a step-by-step integration on drupal.org, using atomic phases + tasks that arbitrary developers can work on. As always, it's only about planning, communication, collaboration, and controlling. We have more than sufficient developers, but only very few people who drive, connect, and push a certain topic (aka. project managers). It is always the same, and we must be blind if we don't recognize that pattern. This entire thread is a wonderful example. Thanks to The Incredible Webchick™ for taking on this challenge.

Daniel F. Kudwien
netzstrategen

Second Sun's post

kyle_mathews's picture

Second what Sun wrote. I'd love to help on the DVCS conversion project but I don't have time (or the skill frankly) to take on a large role. So if I have 4 hours to give, I don't want to spend the first 3 hours just figuring out what's going on before spending the last hour actually helping. If there was a paid project manager who could maintain an active list of tasks, then I could help. Otherwise it'd be too frustrating and time-consuming to be worth my (limited) time.

Kyle Mathews

I concur, doctor

DamienMcKenna's picture

I second what kyle said - I'd love to be able to help a lot more but I'm trying to work two jobs, help with FLDrupalCamp and have a family to take care of too, and I bet there are lots of others out there in a similar boat. I started looking at project* twice before but in both cases there was so much involved I had to back down. If there was a longer list of smaller tasks it'd be much easier to jump in and start chipping away.

I need some help...

webchick's picture

I agree that bite-sized chunks are indeed the best way to go about this, but unfortunately I lack the knowledge on the infrastructure side to be able to properly parcel this out. :( I'm hoping once we get step 0 -- what vcs are we moving to? -- crossed off, that the folks who volunteered to lead implementation, along with the d.o infrastructure team, can help fill in the blanks with actionable tasks that people can assign themselves to.

SmartSVN

DamienMcKenna's picture

For anyone who can afford it, there's also SmartSVN which provides an interface similar to SmartCVS.

Bug Report

Elijah Lynn's picture

Can someone, anyone, please, increase the max comments per page setting for g.d.o, pretty please? I #fail to find the comments I'd like to reply to.

+100 (sorry I couldn't resist, this is broken and is a bug)

killer feature demo: cherry-picking

anarcat's picture

One of the killer features of git for me is the cherry-picking feature. I can backport changes from the dev branch to stable branches with a single command. Say I have a "stable" branch (same as DRUPAL-6 in CVS) and my regular "master" branch (same as HEAD in CVS) where I do all the dev. If I want to backport (say) postgresql support from HEAD, I would do something like this:

$ git checkout DRUPAL-6
Switched to branch 'DRUPAL-6'
$ git log master
commit 635ebba91ad7c69f26c6f0cf8185d04752146b11
Author: anarcat <anarcat@koumbit.org>
Date:   Wed Aug 19 15:57:09 2009 -0400

    #522260 - start fixing decisions for pgsql: don't specify a length for
    quorum_percent, it seems the docs for the schemaapi are misleading

[...]
$ git cherry-pick 635ebba91ad7c69f26c6f0cf8185d04752146b11
Finished one cherry-pick.
[master 2635156] #522260 - start fixing decisions for pgsql: don't specify a length for quorum_percent, it seems the docs for the schemaapi are misleading
1 files changed, 1 insertions(+), 1 deletions(-)

And that's it! I then just need to push the changes to the central repo and I backported a change, with just one command:

$ git cherry-pick <ref>

The neat thing with cherry-picking is that if you merge with that branch later on, git will recognize that patch and will avoid conflicts.
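
For anyone who wants to try the backport-then-merge scenario without touching a real project, here is a small sketch in a throwaway repository. The file contents and commit messages are invented for illustration. It relies on the fact that a later merge goes through cleanly when both branches carry the identical change.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" config user.name "Example Dev"
git -C "$repo" config user.email "dev@example.invalid"

printf 'a\nb\nc\n' > "$repo/schema.inc"
git -C "$repo" add schema.inc
git -C "$repo" commit -q -m "Initial import"
git -C "$repo" branch DRUPAL-6

# The fix lands on master first.
printf 'a\nb fixed\nc\n' > "$repo/schema.inc"
git -C "$repo" commit -q -a -m "#522260 - fix pgsql schema"
fix=$(git -C "$repo" rev-parse HEAD)

# The stable branch has a commit of its own, so the histories diverge.
git -C "$repo" checkout -q DRUPAL-6
echo "6.x-1.1" > "$repo/CHANGELOG.txt"
git -C "$repo" add CHANGELOG.txt
git -C "$repo" commit -q -m "Stable-only changelog entry"

# Backport the fix, then later merge master into the stable branch.
git -C "$repo" cherry-pick "$fix" > /dev/null
git -C "$repo" merge -q -m "Merge master into DRUPAL-6" master
git -C "$repo" log --oneline --graph
```

The merge does not stop on schema.inc, because the cherry-picked change and the original change are the same on both sides.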

killer feature demo: interactive add/commit

anarcat's picture

Another thing I use very often in git is the possibility of committing only chunks of a changeset. How many times did I use diff and patch to revert parts of a patch so that I remove whitespace or unrelated changes... In git, it's as simple as using --interactive. Demonstration:

# this is my full diff. notice the stupid whitespace change at the end.
$ git diff
diff --git a/verify.provision.inc b/verify.provision.inc
index 96c1a94..6d023e7 100644
--- a/verify.provision.inc
+++ b/verify.provision.inc
@@ -30,8 +30,9 @@ function drush_provision_boost_provision_verify($url = null) {
  * Inject the relevant Apache configuration in the site vhost
  */
 function provision_boost_provision_apache_vhost_config($data = null) {
-  if (drush_get_option('site_boost')) {
-    return file_get_contents(__FILE__ . "/boosted1.txt");
+  // the real change here
+  if (($version = drush_get_option('site_boost')) && is_numeric($version)) {
+    return file_get_contents(dirname(__FILE__) . "/boosted1.txt");
   }
 }
 
@@ -54,3 +55,5 @@ function drush_provision_boost_post_provision_verify($url = NULL) {
     drush_set_option('platform_boost', drush_get_option('platform_boost'), 'drupal');
   }
 }
+
+// some stupid whitespace change

# then we go into interactive mode to commit only the change we need
$ git commit --interactive
           staged     unstaged path
  1:    unchanged        +5/-2 verify.provision.inc

*** Commands ***
  1: [s]tatus     2: [u]pdate     3: [r]evert     4: [a]dd untracked
  5: [p]atch      6: [d]iff       7: [q]uit       8: [h]elp
What now> p
           staged     unstaged path
  1:    unchanged        +5/-2 [v]erify.provision.inc
Patch update>> 1 # select the file for patch mode
           staged     unstaged path
* 1:    unchanged        +5/-2 [v]erify.provision.inc
Patch update>> # just hit enter here
diff --git a/verify.provision.inc b/verify.provision.inc
index 96c1a94..6d023e7 100644
--- a/verify.provision.inc
+++ b/verify.provision.inc
@@ -30,8 +30,9 @@ function drush_provision_boost_provision_verify($url = null) {
  * Inject the relevant Apache configuration in the site vhost
  */
 function provision_boost_provision_apache_vhost_config($data = null) {
-  if (drush_get_option('site_boost')) {
-    return file_get_contents(__FILE__ . "/boosted1.txt");
+  // the real change here
+  if (($version = drush_get_option('site_boost')) && is_numeric($version)) {
+    return file_get_contents(dirname(__FILE__) . "/boosted1.txt");
   }
 }
 
Stage this hunk [y/n/a/d/j/J/? ]? y
@@ -54,3 +55,5 @@ function drush_provision_boost_post_provision_verify($url = NULL) {
     drush_set_option('platform_boost', drush_get_option('platform_boost'), 'drupal');
   }
 }
+
+// some stupid whitespace change
Stage this hunk [y/n/a/d/K/? ]? n
           staged     unstaged path
  1:        +3/-2        +2/-0 verify.provision.inc

*** Commands ***
  1: [s]tatus     2: [u]pdate     3: [r]evert     4: [a]dd untracked
  5: [p]atch      6: [d]iff       7: [q]uit       8: [h]elp

What now> q

And then my regular editor pops up for the commit message and booya! no whitespace in my patch.

I hate whitespace in non-whitespace patches. With a passion. git cured me. :)

Conclusion?

webchick's picture

Here's where I'm currently at with this.

Provided that the Windows GUI question turns out okay (which looks fairly promising, given sun's response at http://groups.drupal.org/node/48818?page=2#comment-133748), I am going to propose to Dries and the Drupal.org infrastructure team that we go with Git as our CVS replacement.

Here's why:

(Note, I am coming at this with purely my "evaluator" and "community manager" hats on. Dirty secret: I actually only really know Subversion, and barely enough CVS to get by. So I don't have any particular loyalty to one $vcs or the other. This makes me a part of the prototypical "target audience" who we are concerned about being able to come on board.)

  • Both systems seem to be functionally equivalent. Every time $vcs fan talks about $cool_feature, $counter-vcs fan chimes in and says "Ours does this too! Here's how!" This has actually been really informative, and has helped quell concerns that we might be locked out of certain features with option A or option B.

  • Of the two, not only does Git have the larger community on the Internet at large, but also within our own community. We have more than double the signed-up implementation resources for Git as we do for Bzr. We also have at least some partial code already written for replacing our existing Project* infrastructure. This helps set my mind at ease that even when the transition is rocky, our contributors will be able to get the support they need, and that we will be able to grow our infrastructure team with additional helping hands.

  • Learning curve was a definite concern for me with Git, coming in. Both by reputation and also by trying to follow conversations of people in #drupal who were using Git for code collaboration (bisect? rebase? wtf?) But both Git and Bzr seem to have fantastic off-site documentation, and the differences between http://drupal.org/node/710906 and http://drupal.org/node/711070 are not /that/ much to cause me horrific concern that people won't be able to make the adjustment. Especially with the hand-holding provided by the much bigger numbers of Drupal community members already using Git in their day-to-day lives.

  • One thing that would definitely have pushed Bzr to the forefront is if we were giving serious consideration to Launchpad as a replacement for Project*. But this move, despite several very well-reasoned arguments, has basically zero support within the larger community. And we need massive community will to make major infrastructure changes like this. We have such will for moving off of CVS. We don't for moving off of Project*.

  • And finally, my major motivation in taking up this cause is to stop the flow of developers and code outside of drupal.org. And when it comes right down to it, we are for the most part losing developers and code to GitHub, not to Launchpad, and not to BitBucket. So it makes sense for us to move to equivalent tools to help stop the bleeding.

So although we could definitely drag this out further, and do "real life" usability testing of both options at Drupalcon like was mentioned previously, I really think the choice is pretty clear at this stage, as long as the Windows situation is adequately addressed.

So if someone from the Git brigade can please get on that post-haste, I think we can wrap this up.

I really do want to thank everyone though for your participation in this. It's been extremely informative, and we've managed to keep this thread amazingly constructive, while working through quite a bit of the thorny issues in a very short period of time. Sorry about the 400-replies it took to get us there. ;)

Windows support

DamienMcKenna's picture

For Windows there are several options:

  • Command line option 1 - msysgit - http://code.google.com/p/msysgit/
    Gaining momentum within the greater community and support from the main git project, though it is not as seamless as on other operating systems.
  • Command line option 2 - cygwin - http://www.cygwin.com/
    Provides a complete and mostly compatible UNIX shell environment on Windows. The subsystem believes it is running on a UNIX-like operating system and a large number of command-line tools are available.
  • GUI option 1 - TortoiseGit - http://code.google.com/p/tortoisegit/
    A rewrite of the very popular TortoiseSVN tool built around git. Requires msysgit so the command-line functionality is still there if you need it.
  • GUI option 2 - SmartGit - http://www.syntevo.com/smartgit/
    A new commercial cross-platform GUI tool from the makers of SmartCVS and SmartSVN.

Also, to help get people started:

GitCheetah

chx's picture

Well. If the community casts its lot with git then I will do everything I can to make this easier. http://code.google.com/p/msysgit/wiki/GitCheetah

Edit: SmartGit can be used free of charge for non-commercial purposes.

Recommendation has been sent...

webchick's picture

Thanks for this, both of you.

I've sent off an official recommendation tonight to the Powers That Be and the infrastructure list, so we'll find out what happens from here. :)

well...

alexanderpas's picture

Well, we can't go wrong by following someone who has written about 2% of the linux kernel! (LBT)

Bullshit

chx's picture

This only proves git is powerful and my counterarguments were it is too powerful. What were you trying to say here?

Read Carefully

alexanderpas's picture

Take a close look at the Linux Kernel Workflow.... now, replace email with issue queue... and what do we have?

Cygwin is not an option

jpetso's picture

Telling Windows users to use Cygwin is basically telling them that we have given up on Windows. Not only does Cygwin make its tools believe that actual Unix is running, but also the user is expected to work as if he was on Unix. Plus going the "emulation" way is not the way forward to make Git for Windows feel more native.

In short, msysgit is "the official" Git for Windows and I really would not consider Cygwin an option. Which is fine, because msysgit works just as well :P

The important thing is that

David Strauss's picture

The important thing is that we're moving forward with a transition to a DVCS. We've gotten bogged down in debates between the nuances of Bazaar versus git, possibly losing sight of one fact: git and Bazaar both completely spank CVS and Subversion. That's why I'm still happy to see this. I'd much rather have git declared the choice than debate this for another year with the possibility of picking my favorite.

Sorry about the 400-replies it took to get us there.

An apology would only be necessary if you had dragged us into this discussion without moving the process forward, which is very much not the case this time.

Thank you, David.

webchick's picture

That's really great to hear that despite your valiant efforts to promote Bazaar -- which were immensely helpful, btw. I'm so glad we had such a strong Bazaar advocate to help balance out the numbers on the Git side! -- you are still on board even though we ended up recommending Git. Kudos.

Seconded. Kudos to both David

adrinux's picture

Seconded. Kudos to both David and Chx. True gentlemen.

Awesome news

walkah's picture

This is great Angie, nice to see a little late night IRC conversation balloon into such a community-wide effort with actual conclusions, recommendations and action items!

Great work by everyone involved - can't wait to see the results :-)

One last feature that I

gordon's picture

One last feature that I forgot about, that I would find invaluable as the maintainer of the e-Commerce system.

Patches can be cryptographically signed by the submitters. This would apply to drupal core as well. So when a patch is submitted the lieutenants can sign off a commit to say that they have tested it and think the patch is ok. So in looking at the patch you can see who authored the change, all the lieutenants that tested and passed the patch up the tree to the core, and also who committed the patch to core.

I don't know if bzr has this, but given the legal issues raised by SCO, Linux development wants/needs to be able to track every patch and who tested and authored the patch.

This may or may not be used for core, but I know that with e-Commerce because it is dealing with peoples money and credit cards I would love to be able to track every patch to this level. Makes everyone who submits a patch accountable.
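
As a rough illustration of the sign-off part: git commit -s appends a Signed-off-by line to the commit message. Note that this trailer is plain text and not a cryptographic signature; GPG-signed tags (git tag -s) are a separate mechanism that needs a key and is not shown here. The reviewer name, email and file below are placeholders.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" config user.name "Example Reviewer"
git -C "$repo" config user.email "reviewer@example.invalid"

echo "payment gateway stub" > "$repo/gateway.inc"
git -C "$repo" add gateway.inc

# -s appends "Signed-off-by: <user.name> <user.email>" to the message.
git -C "$repo" commit -q -s -m "Add payment gateway stub"

# Show the full commit message, including the trailer.
git -C "$repo" log -1 --format=%B
```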

I am glad to see that we will most likely be going with git, and I wish I could be at Drupalcon SF, but family issues mean that I can't attend. I know that I could help out so much on the issue of implementing/brainstorming git.

I have been using git and I have started development of a github type system for Drupal, which I think some of that code will be able to be reused to integrate git into Project.

--
Gordon Heydon

A decision has basically been

David Strauss's picture

A decision has basically been made, but I'll still respond as I did earlier in this huge mass of comments (so I excuse you from not seeing it): Bazaar has full support for digital signatures on commits. Branches can be individually configured whether to make them mandatory or optional.

Good to know David

Elijah Lynn's picture

Thanks for keeping at this David. I have enjoyed your comments in this discussion. A decision has been suggested but nothing is final yet, I too am concerned about how easy this will be. Someone said once that this decision should not be for the existing programmers but for the "new" community we want to bring in. I would prefer the easier route and it seems as if Bazaar is easiest. But if there are good docs then I guess it should be OK. Anyways, don't give up your points yet, there is no final decision yet.

Thanks for all the feedback

Dries's picture

Thanks for all the feedback in this thread. Very helpful, very constructive, very collaborative -- the Drupal way. The sheer volume of posts also required us to tweak Mollom for g.d.o. I've read all comments in this issue, but I'm going to re-read them again a couple of times.

I'll also get together with the infrastructure team and webchick to do some more brainstorming, to process the different pros and cons, and to define an action plan on how to move forward. I'd like to have this action plan wrapped up by DrupalCon SF so we can execute at and after DrupalCon SF -- if not earlier.

Tortoise for windows seems quite mature

rfay's picture

I use git on the command line on Linux for my own purposes, but of course other people need to use my repositories, and they're not all as friendly with the command line. I've installed and supported TortoiseGit for a couple of people just for this purpose, and it's basically 100% successful at this point. A very nice quality implementation, actively developed. I think we'll be OK on the Windows GUI side.

Tortoise, BTW, now comes with an outstanding shell of its own, a little slicker than cygwin.

I'm clearly late in my input,

Manuel Garcia's picture

I'm clearly late in my input, but I wanted to chip in anyway.

I maintain the views_accordion module, and the theme darkblue. I'm one of those people that use cvs to basically make commits, have to look up the documentation when they're about to make a new release, and still mess it up. The first project I committed was darkblue, since I am a better themer than a developer, and I remember having someone (can't remember who it was, but thanks!) walk me through the whole process through a couple of hours, just to create the theme's project and initial release. I would commit more themes, if it didn't make my knees shake just the thought of going through the process.

When I started using a vcs for my professional projects as a freelancer, I chose Bazaar. It took me about 5 minutes from knowing nothing to having a working project to make commits and all that. Bazaar to me is a MUCH friendlier vcs for newbies to the whole deal. For example, you can delete a file and then just commit the changes, to me that is both intuitive and idiot proof (though I don't know if you can do the same in git).
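For reference, here is a minimal sketch of the Git counterpart to the delete-then-commit workflow described above. It assumes a throwaway repository in a temporary directory; the file names and the committer identity are made up for illustration. `git commit -a` stages modifications and deletions of already-tracked files before committing, so no separate removal command is needed in this case:

```shell
#!/bin/sh
set -e

# Throwaway repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "themer@example.com"
git config user.name "Example Themer"

# Track two files and record a first commit.
echo "body {}" > style.css
echo "old" > unused.css
git add style.css unused.css
git commit -q -m "Initial theme files"

# Delete a tracked file with plain rm, then commit everything that changed.
rm unused.css
git commit -q -a -m "Remove unused stylesheet"

# Only style.css is still tracked.
git ls-files
```

Note that `-a` only covers files Git already tracks; brand-new files still need an explicit `git add`.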

I've had to use git once, in a futile attempt at helping out with one of the themes being proposed for D7, and well, I had to read the documentation several times just to actually grab the project, and not mess things up. The commands are not intuitive from my point of view, the naming is confusing, and the command line help doesn't help much when you are starting up.

I know I probably sound like an idiot, but well, really I'm not, just very inexperienced with these tools. Drupal has been my first free software project that I am contributing to, and I'm still learning every day.

So what I wanted to get across, although it seems a decision has been made, is that although a lot of people seem to be using git, IMHO bzr has a much lower barrier to entry for us mere mortals who just want to share our "great" theme in 10 minutes and go celebrate. For whatever it counts, there it is! =)

Thanks for your feedback, Manuel.

webchick's picture

I too am nervous about the learning curve here, and your feedback is especially important since you're coming at this from a designer/themer perspective (as is Emma Jane, who is also a Bzr fan). And I definitely don't think you're an idiot. Every time I have to roll a new release of Drupal 7 I have exactly the same "jelly-knee" reaction. :D

What I would love is to be able to call upon you and others to help review (or even help write!) the documentation on this, because I agree that we definitely do not want to shut out new contributors in this move. We can't really do much about Git's innate syntax not being as intuitive, but there might be other tricks, like aliases and so on, that we could document well to help people like you (and heck, me too!) through the process.

I would not be too nervous

gordon's picture

I would not be too nervous about the learning curve of git. As I said earlier, you will generally only use about 7 commands, and they are very simple for doing the usual things.

It is generally only once you start dropping down from the porcelain to the plumbing that the commands get very strange. I know that when I wrote Git Browser I found some strange commands.

Where I am currently working we have themers using git without any issue. Generally they only use 4 commands: pull, push, add and commit, and that is it.
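To make the four-command routine concrete, here is a rough sketch of what such a day-to-day loop could look like. Everything outside the four commands is an assumption for illustration: a local bare repository stands in for the shared server, and the paths, file contents and identity are invented.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# One-time setup (hypothetical): a local bare repository plays the shared server.
git init -q --bare "$work/central.git"
git clone -q "$work/central.git" "$work/theme" 2>/dev/null
cd "$work/theme"
git config user.email "themer@example.com"
git config user.name "Example Themer"
echo "body { color: navy; }" > style.css
git add style.css
git commit -q -m "Add stylesheet"
git push -q -u origin HEAD    # -u records the upstream so plain pull/push work later

# The everyday loop: pull, edit, add, commit, push.
git pull -q
echo "h1 { font-size: 2em; }" >> style.css
git add style.css
git commit -q -m "Style headings"
git push -q
```

After the one-time setup, the last five lines are the whole routine: fetch what others did, change files, stage, commit, publish.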

Gordon.

--
Gordon Heydon

Yes, you are right gordon,

Manuel Garcia's picture

Yes, you are right gordon, I'm sure I'll manage, basic stuff would surely be easier than cvs -- no doubt there -- my point was though that for really really new ppl, bzr is friendlier, not that git is hard ;)

Whichever way we go, it will be an improvement!

Will keep an eye out

Manuel Garcia's picture

If we finally make the move (which I hope we do, either bzr or git), I'll keep an eye out for the documentation issues, and try to work with you guys to make it human-friendly at least.

I know this isn't the place for such discussion, but I find myself re-watching Addison Berry's Create a Module Release short video when the time comes. We should really create such screencasts for the most basic uses when we get down to it. It'd really help newcomers.

Also, thanks for the kind comment!

Ease of use

ximo's picture

Sorry for being late to the party, but I too have some concerns about the learning curve of Git and thought I'd add my reasoning, for whatever it's worth.

I'm a developer who won't mind reading a bit of documentation to understand a new system. And having learned Git, I really appreciate its flexibility and the concept of a staging area. So to me, Git is an excellent choice. But the vast majority of contributors don't need a complex and flexible tool; they just want to check out a branch and commit their changes to the repository as easily as possible. And this is where Bazaar with its bound branches (centralized workflow) shines.

Also, while Git has a lot of support in our community and elsewhere, which means more hands to hold when you need help, Bazaar is so much easier to use that I don't think there would be the same need for assistance.

I think this, the first listed advantage for both systems in the summary, is the most important factor to consider when selecting between the two.

I know the decision has

elberry's picture

I know the decision has basically been made, but there is a small point I'd like to make.

There is a small feature of Bazaar which I believe plays very well with Drupal usage. It was mentioned in one of David's comments, and is often overlooked I think.

Bazaar supports FTP out of the box for both branching and pushing. I think this is a key feature of bzr with regard to its usability with websites.

As an "end-user" of drupal, I could install the Drupal files on my new host by doing something as simple as:
[code]
bzr branch [remote drupal 8 branch] drupal_8
bzr branch drupal_8 my.domain.com
cd my.domain.com
bzr push ftp://[my remote host]/www
[/code]

I don't even need an FTP client. Now I just go through the online instructions. The second branch into my.domain.com isn't really necessary, but will make upgrading easier.

Upgrading:
[code]
bzr branch [remote drupal 8.x branch] drupal_8.x
cd my.domain.com
bzr merge ../drupal_8.x
bzr push ftp://[my remote host]/www
[/code]

This is nice as it will delete files which have been removed in the Drupal 8.x branch, and will only push the differences, making the upgrade quicker and safer.

If I'm the developer of a new website and I am using bzr for my website's other code, then I can upgrade my Drupal website, manage non-drupal parts of my code, and push the changes live all with one tool.

Lastly, in my opinion, the FTP support makes bzr very cheap to use. I don't know of any hosting company (even the really inexpensive ones) that can support Drupal and doesn't offer FTP, which makes bzr the ideal VCS solution for startups. This is one of the reasons why I (myself) chose both Drupal and Bazaar to do the few websites I've developed for people.

Just m2c.
Eric

Cool, but not a show-stopper IMO

webchick's picture

I agree this sounds really cool, and convenient for a website publishing workflow; it basically makes Bzr equivalent to FTP clients with a "synchronize" option, but with version control built-in. I know I've heard Emma talk about using this, too.

However, Drupal.org using Git doesn't preclude the use of Bazaar this way for website management, since obviously you're able to use it right now when Drupal.org is on CVS. :) So while certainly cool, I'm not sure this is really a pro-argument for bzr.drupal.org, per se...

Any Git-heads know if Git can also do a similar workflow thing?

There is an FTP script

chx's picture

http://wiki.github.com/ezyang/git-ftp/ but that does not allow a simple git push via FTP. Given how FTP does not offer any sort of locking capabilities this is not an easy thing to do, anyways.

Git can push things over SSH

voxpelli's picture

Git can push things over SSH which is more secure than FTP - we're using it at Good Old for all pushing of code to servers and as a result practically never use any SFTP or FTP clients anymore - it's all command line.

Here's a blog post my colleague Hugo wrote on it (which I now see just got an interesting comment from Tobias): http://goodold.se/blog/tech/git-workflow-going-live

Showstopper?

Jeff Veit's picture

I would have thought that it was a showstopper that Git does not version empty directories. Instead it discards them. I'm thinking of the files directory, sites/all/modules directory in a new install, and so on. This means that if you use Git to version your site, it may not be able to exactly reproduce the structure of your site at a particular version. This is a common use case for version control; it's not just for software versioning that the vcs will be used.

CVS too

chx's picture

Look, CVS checkout discards empty dirs. Sucks. We worked around that in the past by adding a README.txt in the dir.

Ditto - late to the conversation :-(

Jeff Veit's picture

I was following the conversation until I saw the plan to test both at Drupalcon SF, which seemed completely sensible. Apparently that was a mistake.

I use bzr and there were 3 key reasons for choosing bzr over git:

  1. Bzr was designed from the beginning to be cross-platform. Git's home is Linux.
  2. The bzr documentation seemed better to me and I found it much more organised.
  3. Git, by design, does not version empty directories, meaning that it's not possible to recreate the exact state of a project at a particular version. Bzr does. For me this was a critical issue. In terms of Drupal it's important because things like the files directory start off empty with an empty CMS. I would have thought this was a showstopper for git because empty directories often have meaning and I expect that the same vcs will be used for code management and system versioning.

    (See http://git.wiki.kernel.org/index.php/ContentLimitations. 'Git cannot be directly used as a general versioning solution for arbitrary files, such as home directories or "/etc"'. This quote is misleading in that you can build upon the basic git infrastructure, by tarring before versioning, but this loses the point of versioning.)

Was the last point taken into consideration in making the choice?

Implementation tasks...

webchick's picture

Looks like some infrastructure folks got antsy... ;) The Drupal.org infrastructure queue now has a new "GIT" component, where issues for individual tasks are being created. There's also a "meta" issue to track everything in one place.

Members of the Git brigade who offered to help with implementation, go ahead and head over there. Dries and I also have a call scheduled early this week to discuss this plan further.

Much higher percentage of

matt2000's picture

Much higher percentage of Drupal community members have familiarity with Git.

I keep hearing this, but is there any hard data to support the claim?

I think I heard the same claim for bazaar when it was chosen for the Fields-in-Core code sprint. Besides the informal "Hey You" list here, do we really know how many Drupal CVS users are familiar with one or the other?

In my experience, I tried GIT first, then found happiness with Bazaar. I think Bazaar has the better documentation, and I LOVE the flexibility of bind/unbind for distributed and centralized workflows. I think this will be key for complex projects with their own ecosystems, like Ubercart, for example. (I'm just guessing; I don't know the opinion of the ubercart team, or what SCM tool they use apart from CVS.)

Don't know, don't care. ;)

webchick's picture

I don't have measurements for raw numbers, but don't really think they're relevant for our purposes here. If that sounds preposterous, I'll explain.

I actually deliberately tried to leak information about this post in pieces, using a concrete, calculated strategy in order to arrive at the conclusion of which one has more practical community support (this is the key).

The first place this initiative was announced was IRC. This was important, because IRC is often the first place developers go when they're having trouble with CVS and can't figure it out. Having a strong group of people available to answer questions is our first line of defense against losing contributors to this move. The fact that Git folks swarmed around to help flesh out the initial copy of the wiki page was a great sign, but of course not everything since it really was more a survey of "$vcs users who happened to be awake at X hour." :)

The second place this was announced was here on the "Issue tracking and software releases" group, once the wiki page was created. This was especially important, because this is the group that people who help out with Project* module and a lot of our existing CVS-based infrastructure hang out in, and the decision of what VCS system to use affects them greatly. This move definitely paid off: within a day or so we had participation from key folks on the infrastructure team. This had very interesting results: Bazaar had an edge, because it turns out, we had buy-in from the infrastructure team already to move to Bazaar anyway. But we also had a group of people not already on the infrastructure team who pointed to existing work they had done on Project* and Git integration, migration scripts, etc. And once again, this made Git the winner in my book: it had the potential to grow our infrastructure team with new helping hands, rather than over-burdening our existing 5-6 volunteers who are already responsible for everything else on drupal.org.

The third place this was announced was Twitter, which reaches a very broad user-base of people who are both relatively and not at all engaged in the larger community. Here, the Git users seemed to come in like a swarm. This was a good indication to me that broadly, at least among people who follow key Drupal folks on Twitter, Git was the vastly more popular. Another point in its "community support" column. Even if only 10% of those folks stick around after this discussion to help out, that's still a huge win, and much bigger for us than if 10% of the Bazaar users stick around, just in terms of sheer numbers.

And finally, this was announced on Drupal Planet, which is where most of the relatively "clued-in" people get their Drupal news, if they're not on Twitter. Here, Mercurial of all things made a surprise resurgence when we'd all but written it off. But unfortunately, not enough of one to put it back on as a contender. It was nice to see those sections of the wiki fleshed out though. And speaking of wikis, I was also monitoring how quickly the various "fill in the blank" sections in the wiki were filled out for each project. Git users seemed to be ravenous, finding reference after reference for their stuff. The Bazaar users were very conscientious and thorough in their documentation (and David's counter-points to Bazaar FUD were invaluable), but because they were not as numerous as Git users, their stuff filled in more slowly.

Any one of these data points in isolation would be a silly thing to base a decision like this on. However, all of them together have shown me (as a technology-neutral evaluator acting as community advocate) that active Drupal contributors with the "itch" to move off of CVS, who are relatively clued in to the community, and feel knowledgeable/available enough to help flesh out information and sign up to do real work, and in some cases have even started it... by and large are Git users. Since those are the very people who'll be driving this migration, and staying on to help with the considerable user-training task, it only makes sense to choose the platform they know, so that they can rally behind it and do the actual work to move us over. And the proof is in the pudding, IMO, based on the explosion of activity in the infrastructure queue from new contributors that this decision has created.

The trick is going to be maintaining momentum, now. Still working out a calculated strategy for that. ;)

Well, I'm pretty dismayed to

Garrett Albright's picture

Well, I'm pretty dismayed to see you say so blatantly that pure numbers was the main reason Git was chosen. As I've said before, that's the worst possible reason to choose one over the other.

But not that I really want to belabor the point now. It's great that a choice was made, even if it was made for the wrong reasons. Now back to coding.

(Though it seems to me, both from reading this thread and elsewhere around the internets, that those who have tried only Git like Git, but those who have tried both Git and another DVCS prefer the other one. Is that fair to say?)

That's a broad generalization

kyle_mathews's picture

That's a broad generalization that's probably not true. I can't speak for anyone other than myself but I've used Git and Bazaar both pretty extensively and played around with hg and have settled on Git for the past year or so. But I really have found all of them pretty easy to use. The main learning curve was just understanding how to work in a DVCS environment -- once I understood that, moving between syntactical differences of Git/Bazaar/Hg was trivial.

Kyle Mathews

Hm?

webchick's picture

Did you... even read my post? :\ Numbers had almost nothing to do with this. I in fact specifically said that raw numbers were irrelevant and what was really important here was to gauge our specific community's ability to make this transition, based on a variety of factors, including volunteer support, work already completed, how quickly resources came together, etc. And the Git folks simply aced this, hands down, for all of the reasons discussed.

And please also see the conclusion post above. Community support was just one of several factors that led to this recommendation.

Did you... even read my post?

Garrett Albright's picture

Did you... even read my post? :\

Yes, particularly these parts:

The fact that Git folks swarmed around… And once again, this made Git the winner in my book: it had the potential to grow our infrastructure team with new helping hands, rather than over-burdening our existing 5-6 volunteers who are already responsible for everything else on drupal.org.… Here, the Git users seemed to come in like a swarm.… Git was the vastly more popular.… Even if only 10% of those folks stick around after this discussion to help out, that's still a huge win, and much bigger for us than if 10% of the Bazaar users stick around, just in terms of sheer numbers.… because [Bazaar users] were not as numerous as Git users…

Forgive me if that reads as "We chose Git because it is more popular" to me.

But again, I'm not really trying to be contentious here. If I had my druthers, Git wouldn't have been chosen, but rarely in life do I get to have my druthers and this is a rather minor case in the grand scheme of things. So I'm ready to get over it and get back to coding. I, for one, will welcome our new Git overlord.

Thanks for the detailed and

matt2000's picture

Thanks for the detailed and considerate response.

I wish the announcement venue had included the development mailing list and the front page of drupal.org for such a huge decision, since the decision had already been made by the time I caught up. But, that said, I'm not actually upset about the final result, although I probably would have preferred bazaar, and would have signed up to help at DCSF.

I look forward to giving git a second chance...

The Ubercart crew uses

Garrett Albright's picture

The Ubercart crew uses Bazaar, incidentally… http://ubercart.org/bazaar

Bzr's API

BiosElement's picture

I just wanted to comment here that I've spent quite a few days looking very closely over bzr/git and one of the most striking differences I found was their core design principle.

git was designed to be fast and this is a good thing. It concerns me however that this is its 'primary' goal. 3 seconds, 15 seconds, even 30 seconds isn't that long to wait to commit something in the grand scheme of things. Annoying? Of course, but not the end of the world.

bzr on the other hand was designed to be easy to extend and has an API to back it. Github actually uses a hacked-on version of git just to get access to some of the API features they use because git itself doesn't have them. I think that this in itself is a major issue as I believe that the API is probably one of the single most important features any system can have.

bzr is lacking in speed at times, yes. But it's worth noting that bzr is in python, not C with some bash and perl tossed in. I don't see speed as an issue because it's working with 'just' python. If it was needed, it would be possible to replace some of the more taxing parts with C modules and take it from there.

Note: I make these statements from the position of a project manager/end-user. I have 'not' done more than skim the source of either system but I have spent several months with both and as you can tell, I prefer bazaar.

As someone who's contributed

David Strauss's picture

As someone who's contributed development to Bazaar's core and written BCFG2's interface using the native Bazaar API, I can confirm that Bazaar's architecture is 95% bzrlib, a distributed version-control database API and 5% bzr, the command-line utility that wraps it. The bzr utility never works with the repository directly; everything goes through the API. Bazaar's architecture forces the API to be extremely rich because every feature for the bzr utility must be supportable through the bzrlib API.

I believe Bazaar is quite unique in this approach.

Spam filter

gopherspidey's picture

Well I have tried to edit the page but I have been trapped by the spam filter.

My vote is for git

Also there is a git-to-cvs server that would or could aid in the transition. http://www.kernel.org/pub/software/scm/git/docs/git-cvsserver.html

This is great news

stephen.moz's picture

I think Drupal is going to see a much increased potential in the volume, variety and quality of contributions as a side-benefit of this move towards DVCS (in general) and Git specifically. Hats off to everyone.

Great hg tutorial

AdrianB's picture

Look at this new Mercurial tutorial by Joel Spolsky: Hg Init: a Mercurial tutorial

(Yeah, I know git is chosen, just wanted to share this because it looks great for those wanting to learn Mercurial.)

This made my day

bojanz's picture

I'm really glad to hear that Drupal is moving to a DVCS. My days of cursing CVS are over (Drupal development is the only place I use CVS these days).

In my spare time I maintain the Ubercart Affiliate 2 module.
In my 9-5 time I maintain two very large and vibrant Zend Framework projects, and we've been using Git for about 8 months now.
I must say it has transformed the way we do work.

I saw a couple of points raised here that I would like to address:
1) Git on Windows sucks
This is purely anecdotal, but we've been using msysgit for the whole time, never a single problem.
Download, point&click, it's done.
Also, we're using the "Git Extensions" GUI which is very nice (and integrates with KDiff3, which is included).

2) Git, by design, does not version empty directories
Open Git Bash (comes with msysgit)
cd my_empty_directory/
touch .gitignore
git add .gitignore
git commit -m "Keep empty directory"

Voila! Git is now tracking the empty directory.
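A slightly fuller sketch of the placeholder trick, for anyone who wants to check it end to end. It assumes a throwaway repository in a temporary directory and borrows the `sites/all/modules` path from the earlier comment as an example; the identity is invented. Strictly speaking Git tracks the placeholder file rather than the directory itself, and the file has to be added and committed before a fresh clone will recreate the directory:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Throwaway "site" repository (hypothetical layout).
git init -q "$work/site"
cd "$work/site"
git config user.email "dev@example.com"
git config user.name "Example Dev"

# An otherwise empty directory gets an empty placeholder file.
mkdir -p sites/all/modules
touch sites/all/modules/.gitignore
git add sites/all/modules/.gitignore
git commit -q -m "Keep sites/all/modules via placeholder"

# A fresh clone now recreates the directory.
git clone -q "$work/site" "$work/copy"
ls -a "$work/copy/sites/all/modules"
```

The same approach works with any placeholder file name, such as the README.txt workaround mentioned for CVS above.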

Cheers!

Flippant

Heine's picture

Okay, I don't really have time to revisit this exploratory debate that has already been carried to a conclusion, but your flippant post irked me sufficiently to take the time anyway.

I'm not sure what you do with Git on Windows, but in 1.5 weeks of use, with only a fraction of the available commands used, I already encountered two issues:

1. EOL problems
The msysgit installer asks a number of questions, one of which is whether to keep line endings or autoconvert to the system's default. As almost all tools are able to deal with arbitrary line endings, keeping existing end of line encoding is preferred.

Unfortunately, that didn't go well; on fresh clones of the new-date core repository from drupalfr.org, git reported differences in numerous files. Setting core.autocrlf to true was the only way to prevent the issue. I accept the small chance of binary file corruption (I don't push upstream anyway).
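For anyone wanting to reproduce or adjust this, here is a hedged sketch of the relevant knobs. The setting is the one Git documents as `core.autocrlf`; the `.gitattributes` part is an optional extra that assumes a Git version recent enough to support the `text` attribute, and the file patterns are only examples. Marking binary types explicitly is one way to reduce the corruption risk mentioned above:

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Per-repository setting: CRLF in the working tree, LF in the repository.
git config core.autocrlf true
git config --get core.autocrlf

# Optional, example patterns: let Git detect text files, but never
# convert line endings in files marked as binary.
cat > .gitattributes <<'EOF'
* text=auto
*.png binary
*.gif binary
EOF

# Ask Git how it would treat two (hypothetical) paths.
git check-attr text -- README.txt logo.png
```

Whether `true`, `input` or `false` is the right value for Drupal contributors on Windows is a policy question this sketch does not try to settle.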

2. git clone -l doesn't properly handle relative paths, whether supplied as /relative/example or \relative\example, and manages to do something different for both:

d:\www\sandbox>git clone -l \www\core6 test
Initialized empty Git repository in d:/www/sandbox/test/.git/
fatal: failed to open 'd:/www/sandbox/\www\core6/.git/objects': No such file or directory

d:\www\sandbox>git clone -l /www/core6 test
Initialized empty Git repository in d:/www/sandbox/test/.git/
fatal: 'd:/Program Files (x86)/Git/www/core6/.git' does not appear to be a git repository
fatal: The remote end hung up unexpectedly

The obvious solution is to use an absolute path.

Important questions for a DVCS:

- does the tool handle CRLF/LF issues?
- does the tool support text in UTF-8?
- does the tool rely on filesystem specific features?
- does the tool have a future?
- does the tool have a future on our development OSes?

When addressing, please handwave less. Details are important.

Important issues

Damien Tournoud's picture

Heine,

Please open issues in the infrastructure queue (Git component); we need to figure those out (especially the line-ending issue, which already causes headaches for CVS users on Windows).

Damien

Damien Tournoud

Closing this thread

Damien Tournoud's picture

The exploratory phase is over, let's start the implementation.

Please meet us at http://drupal.org/community-initiatives/git to get involved. Locking this thread.

Damien Tournoud