Building a Drupal project with git is different from building Drupal itself, and requires its own workflow. I've been kicking ideas back and forth with Sam Boyer lately about how to make this process take advantage of all of Git's power while also being newbie-safe and as frictionless as possible. I think what we've come up with is pretty good: there's even code written! The process I am going to explain allows the following:
- Git-based updates for Drupal core and contrib
- The ability to patch/tweak core/contrib without the complexity of vendor-branches
- Portability for local development or Git-based deployment
- Unrestrained custom development: feature branches, tags, multiple repos
- Safe patterns that minimize conflicts and provide a clear resolution process
Pretty cool, eh? Expect a larger manifesto post from Sam in the near future, but for now here's where we are heading.
Using Two Git Remotes: upstream and collab
In a vanilla git situation, cloning an existing repository creates a single remote called origin, allowing you to push and pull code to/from that remote. However, a single origin isn't going to be enough to enable a best-practice Drupal on Git workflow. Why? Well, any good Drupal project should start with a clone of the canonical Drupal core project repository (or a repo that tracks core like Pressflow). This lets you pull updates for security and easily contribute back any innovations you make. Anything else is starting off on the wrong foot.
However, this immediately creates a problem because any project is going to need to add code in addition to core. Unless you're Dries, webchick (or davidstrauss) you're not going to be able to push back to the single origin, meaning you can't work as part of a team or use the power of git in any deployment workflow. No bueno.
Luckily this is something git was explicitly built to handle. The answer is to take a small step beyond the vanilla git workflow, and create two remotes: one for upstream, and one for collab. As you might have guessed, you'll use upstream as a "pull-only" source to get updates and make patches, while collab will hold your custom modules or themes, and allow you to work with a team and implement git-powered workflows.
The actual host for collab can be anything. You could use your own private repository, GitHub (public or private), or even a drupal.org hosted sandbox if you don't mind your work being completely public. Likewise, your upstream could be any valid source. Drupal core from drupal.org is always a good choice, but any repository that starts with the canonical drupal history is also valid. You might want to use Pressflow, or maybe your team maintains its own "drupal-plus" repository (e.g. a distribution or quick-start set) which you use as the upstream for projects.
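Concretely, the two-remote setup is just a clone plus a remote rename and an add. A minimal sketch (the collab URL and local directory name here are placeholders):

```shell
# Clone from the canonical upstream (Drupal core on git.drupal.org).
git clone git://git.drupal.org/project/drupal.git mysite
cd mysite

# The clone's default remote is "origin"; repurpose it as the pull-only upstream.
git remote rename origin upstream

# Add the collab remote your team actually pushes to (placeholder URL).
git remote add collab git@example.com:team/mysite.git

git remote -v   # lists both upstream and collab
```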
Contrib Modules as Git Submodules
A best-practice workflow will follow the same pattern for Drupal contrib as we've described for core: allowing project builders the ability to pull upstream updates and easily contribute back their changes if they want. There's a problem though: git.drupal.org necessarily separates every contrib module into its own repository. If your project started off as a clone of Drupal core, how can you include a separate repository for Views?
The answer is Git submodules, which are designed to handle this specific problem. However, these are an advanced feature, and it's important for us to have a consistent pattern for using them.
Luckily the use-case for contributed modules and themes is consistent, and the commands you'll need to add them as Git submodules — as well as to update them — are the same every time. In the event that you need to apply a patch or make an enhancement ahead of the upstream maintainer, the same process of adding a collab remote works here too.
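As a sketch of that consistent pattern, adding Views as a submodule and later pulling its updates might look like this (paths follow the usual sites/all/modules layout):

```shell
# From the project root: record the contrib as a submodule.
git submodule add git://git.drupal.org/project/views.git sites/all/modules/views
git commit -m "Add Views as a submodule"

# Updating later: pull inside the submodule, then record the new commit.
cd sites/all/modules/views
git pull                       # fetch the latest from drupal.org
cd ../../../..
git add sites/all/modules/views
git commit -m "Update Views"
```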
Custom Development in collab
The custom development for your project happens directly in the primary repository, and is tracked in the collab remote. This lets you work with a team, taking full advantage of feature-branching, local development, and branch/tag-based deployment workflows. With the small change of using collab where you're used to using origin, the git workflow of git checkout, add, commit, pull and push works the same as ever.
This also means you should be able to use your favorite Git GUI or other power tools with no problems.
The only complication here is the case where you have multiple developers who are adding Git submodules as per above. In that case, in addition to pulling code from collab as usual, it is necessary to run the git submodule update command, and potentially rebase your code if you've added the same submodule as someone else and have a tree conflict.
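Assuming your shared branch is master, that catch-up boils down to two commands:

```shell
git pull collab master        # brings in the updated .gitmodules and gitlink entries
git submodule update --init   # clones and checks out any newly added submodules
```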
Visualize It
In this version we have the sitename project residing in /var/www/drupal on a server. Its main upstream is the official git.drupal.org/project/drupal.git and its main collab is on GitHub at sitename/drupal.git.
Additionally, we have added views, wysiwyg and jquery_ui as submodules from drupal.org, and the tinymce library from GitHub. We have created a collab repo for jquery_ui because we needed to update some of its libraries.
The site's custom module(s) and theme are stored in the primary collab repository.
Introducing dog
The workflow described above is safe and solid, but running all those git operations is a practical nightmare:
- Repetitive stress injury is no laughing matter.
- Missing a step or making a typo means you're at risk.
- Git has all the info you need, but quickly assessing the status of a complex repository is a multi-step process.
As I'm fond of saying, human beings are really bad at repetitive rote tasks. It's not what we evolved to do, and we're unhappy and error-prone when subject to those conditions. Computers, on the other hand, love repetition and rote tasks. So let's make the robots do the $&*%'ing work!
dog = a Drush extension for "Drupal on Git"
The Drupal project already has a wonderful robot helper tool in Drush. Since the patterns we are describing are completely regular, this is a perfect use-case. Better yet, code is online here:
Contributions are encouraged. As of right now, here's what Dog is specced to do for you:
dog-init [--upstream] [--branch] [--collab] [<directory>]
Initializes a new local project repository for building Drupal on Git.
- --upstream: defaults to the latest major stable branch (e.g. Drupal 7.x); accepts drush dl style shorthand for drupal.org sources, or a full git url for using non-drupal.org remotes
- --branch: local branch name; defaults to master
- --collab: remote collab repository; defaults to upstream
- <directory>: where to make the repository locally; defaults to the repository name of upstream
Example: drush dog-init --upstream=6.x --collab=git://github.com/joshk/my-drupal-project my-new-drupal6-project
dog-dl [--collab] [<project>] [<destination>]
Downloads a contrib from Drupal, sets up the submodule and updates the main collab repo with the new information.
- --collab: optional collab repo for this contrib; necessary if you don't have write access to the drupal source and intend on making local changes. Can be added later.
- <project>: project from drupal in drush dl style; also accepts a full git uri to support non-drupal remotes
- <destination>: destination for the module/theme; defaults to sites/all/modules or sites/all/themes
Example: drush dog-dl views-6.2.x
dog-collab [<uri>] [<directory>]
Add a new collab remote to a module, theme or main repository if one was not set up initially.
- <uri>: the location of the collab remote
- <directory>: path to the module or theme directory, or the drupal root; defaults to current working directory
Example: drush dog-collab git://github.com/joshk/my-views-patches sites/all/modules/views
dog-catchup
Pulls collab updates and automatically brings new submodules in/up to date.
Example: dog-catchup
dog-upstream-update [<directory>]
Pulls upstream updates and commits them to the collab remote if one exists.
- <directory>: optionally specify a directory to update; defaults to current working directory and works recursively
Example: drush dog-upstream-update /sites/all/modules/views
dog-status
Parses main repository and submodule status and presents an overview of the entire project.
Possible alias: dog-vet
dog-remove [<directory>]
Completely removes contribs added via dog-dl and pushes that change to collab.
- <directory>: directory of contrib to remove
Example: drush dog-remove sites/all/modules/views
Possible alias: dog-gone
Project Manifest
In order to maintain the integrity of a project and ensure portability for local development and deployment, dog maintains a manifest file for the current local project. This allows us the potential to dog-rollup a project into a manifest file and then dog-rollout the same project elsewhere, in a similar fashion to drush_make.
However, the dog manifest is entirely git-centric and must include the upstream and collab information. It will likely also be stored in JSON format.
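The manifest format is still being specced, so purely as a hypothetical illustration (every field name here is invented for the sketch), a JSON manifest recording the upstream and collab information might look something like:

```json
{
  "upstream": "git://git.drupal.org/project/drupal.git",
  "collab": "git@example.com:team/sitename.git",
  "branch": "master",
  "submodules": {
    "sites/all/modules/views": {
      "upstream": "git://git.drupal.org/project/views.git",
      "collab": null
    }
  }
}
```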
In the longer run we hope to see more convergence between drush_make, the dog manifest file and possibly the site archive format, since these are all different approaches to describing a Drupal project.
Scriptability
As a tool designed to automate the low-level git workflow, dog is itself designed with scriptability in mind. Any commands which allow interaction should include a -y flag to run non-interactively, and they should all support a --backend or --json flag to do their output in script-friendly JSON.
Future Potential
We're hoping to get many Drupal projects "on the dog sled" to help "vet" these patterns and create critical mass around a set of best practices. There are also obvious implications for Drupal distributions, as well as the update manager. The sky is the limit here.
Comments
This is very close to what I've been using for some time, with one exception, 'collab' vs. 'origin'. Why need to learn another Drupalism when anyone who has been using git for some time will be very used to using origin as a push destination, etc. I'd rather not have another non-standard here when typing 'origin' is second nature, and well set into my muscle memory.
@mikey_p
It is explained in the "Using Two Git Remotes: upstream and collab" section - you will work with projects already using/having origin elsewhere (on d.o, GitHub etc) and you can't add another origin, and you can't push to the "original origin" for projects you didn't start; hence, using upstream and collab in this workflow as a new standard makes perfect sense.
I think this is a point worth debating
I'm not sure mikey_p isn't onto something. I'm definitely against introducing new "drupalisms" unnecessarily, and it's true that people are used to "origin", and it still makes semantic sense in that the scope of the repository is the project itself (the site being built) and thus having the repository where custom code is developed called the "origin" seems fitting.
We're talking about projects that we are setting up ourselves, starting with Drupal and then adding our own code. The case in which there's a third source that must be called "origin" seems rare to me. Do you think this will occur often? Maybe I'm missing something...
https://pantheon.io | http://www.chapterthree.com | https://www.outlandishjosh.com
It is not only about possible conflicts with some existing "origin" somewhere.
Even if "collab" == "origin", there will still be an "upstream" added anyway.
I found it confusing many times while working with lots of third party and my own stuff, especially with cloned versions of the same module, so I stopped using "origin" at all and instead I'm using more meaningful names for my remotes, like "github", "gitorious", "drupal-sandbox" and so on. This is why I like ideas like "dog-vet" and "collab" + "upstream" - because we are not machines and we like funny or at least meaningful associations, since it helps us remember where we are pushing/pulling the code :)
Grace ~ http://omega8.cc
Using origin makes sense
Yeah, I'd add a +1 for having collab changed to 'origin'. I think this would help reduce confusion for newbie users who might be prone to manually enter something like git push without specifying a repository. If the collab/origin/working repo is called 'origin' then git will push there automatically rather than throwing an error about not specifying a repo.

I know 'origin' probably isn't as semantically descriptive as 'collab', but I think the benefits of not adding drupalisms and not confusing newbies are probably worth it.
The more I think about it the more I agree
Sticking with the normal behavior is the right idea, and the whole Dog automation system wouldn't really work if you didn't start out with a dog-style (furiously resists making pun) repository. In other words, taking over an existing git project which already had an "origin" would require minor surgery anyway.
Plus it's not that hard to rename remotes. Let's see if we can get Sam to weigh in.
https://pantheon.io | http://www.chapterthree.com | https://www.outlandishjosh.com
Kind of nice to see us building a (small! seriously not complaining!) bikeshed on this topic. I'll take that as an indicator of general interest in this approach on the whole, that we can focus on something like remote naming semantics.
I'm actually just working on the DogSled ...err, manifesting system now, which is the central place from which a decision about the naming of the 'collab'-purposed remote would really happen. And for the most part, everyone here is right - it doesn't matter. And I don't care. It's quite easy to have the name used for both 'upstream' and 'collab' be set on a per Dog-managed-project basis. And it probably is best to have it per whole project, not per-instance - you should be using the same word to refer to the same remote as all your colleagues are.

The place where it really matters, IMO, is the docs/helptext. e.g., in the help for dog-init:

'collab' => 'The URI to use as the collab repository for this instance. If unspecified, the upstream URI is used.'

I don't know how to make that text work if we replace collab with origin, as the assumptions about origin actually put the reader at a deficit; after all, in the dog-init case, we're not cloning from collab at all. Judging from http://drupal.org/node/1122642 , it seems like that confusion could already be bad enough, without even calling it 'origin'.

On a more general note, remote naming (it being a completely local-to-your-repo thing) is one of the safest areas in Git to add drupalisms. People name remotes crazy things; at least these names are pretty descriptive. Hell, see the git.git merge logs - Junio Hamano (Git's maintainer) names all his remotes as two-letter abbreviations of the most frequent contributors.
Agreement
@sdboyer You're totally right about the bikeshed. Everyone seems down with the dual repo, and that's probably the most important thing to grasp about the proposed dog workflow.
In defense of the 'origin' naming proposal, I do agree with @joshk's point that using origin would make it slightly easier to shoehorn dog into existing projects. At the same time, clear documentation is going to make life easier for everyone. Whatever helps people start working smarter faster.
On the subject of helping newcomers, perhaps a diagram that makes the dual repo concept painfully clear would help. Something like the following?
Apologies for the dodgy clouds - the diagram was done in a hurry.
Definitely a case where a picture is worth a thousand words. You're entirely right that the goal is people working smarter, more standard-ly, faster - and while the diagram could maybe use a little work, it definitely captures the basic idea.
Just to clarify, I think the basic idea behind the two remotes, upstream and collab, is very sound, and is what I do now. I suppose my overall takeaway point would be that making the actual remote name used for different purposes configurable via a drushrc or something like that would be an excellent idea. (This of course would also support overriding via a site/project specific drushrc as well ;)

I also want to make it clear that I'm hugely in support of this proposal, as it has the potential to remove the biggest barrier to this workflow currently, which is that it's just plain a lot of work to check the status of each submodule, and handle its upstream vs. collab when needed. Automating that step alone would make a night and day difference in enabling a proper all-git workflow.

FWIW, I've noticed that the kohana folks seem to encourage a somewhat similar workflow for kohana core, and the core kohana modules. (They don't go into much detail on the collab side, but encourage your actual kohana checkout to use submodules for each kohana core module.)
I'd love to see what the implication for moving Drupal core development over to that type of approach would be, but I imagine that our dependencies are too intertwined to support that kind of workflow for core modules and subsystems.
Very cool :-) Going to try!
Manifesto
P.S. After thinking about this, I'm very, very interested in reading this manifesto that joshk mentions. Any chance of a vague guess at when we might be able to see it? Next year? Next month? Next week?
Somewhere between next week and next month. With any luck, next week. The two basic goals of the manifesto, as I'm writing it right now, are to a) step through the process of building a dog-managed site, and b) explore the world of possibilities that the standardized git approach opens up.
subscribing
+1
Learning more about Git and looking for a means to manage drupal projects for the long term. If others have good white papers on this topic would love to see them. Thanks.
This is great!
The Drush maintainer team has been talking about something like this for a while - we also really want to make this kind of sophisticated git workflow easy to use. We have actually been considering making the next version of the "pm" commands highly git-centric (with a much simplified wget function as fallback). From my point of view it looks like what is described here is exactly what we need - I think we would want to include it in some higher level interfaces (project name parsing/validation, dependencies, pm-info, pm-updatecode etc), but this would get us a long way. For reference see http://drupal.org/node/908212, http://drupal.org/node/814174, http://drupal.org/node/759906, http://drupal.org/node/797190.

The only thing I am not sure I understand is the manifest file - doesn't this just replicate the contents/purpose of config and .gitmodules? The URL for the collab repository should already be sufficient to rebuild the exact tree using "git clone --recursive git://host/repo.git", so I am not sure that distributing a file to do the same thing adds much. Of course, it would be really useful and important to be able to list out a manifest to summarize "what is in the site and where does it come from/go to", since collecting that info with submodules is really not that fun to do by hand. This feels more like human readable command output though, not something that needs to live in a file.
There's some background on the manifest file at http://drupal.org/node/914284. Not sure if it will help it make more sense, but you'll get an idea of who was concerned about it, and they might be able to explain more clearly.
I considered writing this as patches to drush core (albeit quietly) for a couple weeks before deciding to roll it out as a separate package. Lemme quickly be clear about what, in my mind, having it as a separate package does/does not mean:
I ultimately arrived at dog as the right approach after considering a number of different levels of drush integration and considering the implications that each level would have on ease-of-use for users, consistency of the overall experience, and the ability for the system to 'right' itself after user interference, etc..
I do not see dog as a challenge to or replacement for the existing pm functions, or even necessarily the git_drupalorg package handler. They work well for the patch-together-your-workflow case, and however cool dog might get, we shouldn't ever take away the swiss army knife. Folks can be luddites if they want :)
Also, I have no problem with eventually rolling dog directly into drush once it's more complete, if we all think that's the best way to go. I think it'd be great, actually - would certainly make provisioning with dog easier. But for now, I'd like to keep it separate until it matures, at least into alpha code.
That's where I'm at on it. Sorry to have been a bit quiet about it and just handwave about having a manifesto (which is, yes, still in progress); I kinda went dark on big, wide discussion when I realized that this needed a foundation. Now that this bit is out at least, I'd really welcome some wider discussions with drush peoples - though waiting for the manifesto could really give the fullest possible context.
Buncha reasons for the manifest. Here's a super-simple one to start: pretty much every dog repository is going to have two remotes - upstream and collab. There's no way to record that directly using submodules - it needs to be kept somewhere else.
There's also the case where you want to be able to make a pseudo-submodule that tracks a particular collab/ in a particular subrepository. You can't really take advantage of submodules very effectively at all in that case (nor will a recursive clone get you anything). That case is not on my immediate critical path with this, but it still shouldn't be impossible.
There's also the case where you might want to use a working directory for a repo that's outside of the webroot (helps address http://drupal.org/node/1119802). No way to set that up natively with submodules, but it's easy if you have a meta-system like dog retaining config settings for core.worktree that it can roll out in every new instance.

That last case highlights where I expect a lot of additions to be useful for the manifest file - custom repo config (as in .git/config) that is impossible to transmit using any built-in git functionality. And not even just config - this system will seriously leave GUI users behind unless we can roll at least some of the functionality into git hooks. And ensuring that the right sets of git hooks are attached to a repo can't be done without local action - managed by something like dog, reading from its manifest.
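For readers unfamiliar with core.worktree: it is a plain git config setting, so a sketch of the outside-the-webroot case looks like this (both paths are hypothetical):

```shell
# The repo lives outside the webroot; the webroot is just its working tree.
cd /srv/sitename-repo
git config core.worktree /var/www/html
git reset --hard    # materializes HEAD into /var/www/html
```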
So while I most definitely agree that part of dog's responsibility is to automate complex-multi-step processes into unified concepts with single commands (because, let's be honest - submodules are really frustrating and annoying), it'll definitely need its own manifest data to do it properly.
I don't see it as an either/or situation. If you need to circumvent pm to implement dog, I see that as a problem. I'd rather use this as an opportunity to fix pm, or detach the VCS stuff from pm like we've discussed before.
Regardless of how widely accepted dog becomes, there will be other approaches, so drush should enable projects like this to be very lightweight additions to the existing apis.
To be clear, this was me saying "yay - you just did exactly what we were thinking", not suggesting that this should have gone into Drush core, or even needed a bunch of prior hand waving/discussion. As I suggested, the current PM commands split the package handler and version control functions into separate plugins/engines (which made a lot of sense in the CVS/svn world) and so are currently architecturally not able to do the kind of things dog does - trying to wedge a dog in here would have resulted in a dead dog! We do want to fix PM to allow this, of course - but we need to get a better handle on what the next-gen API should look like.

I think it totally makes sense to have this as its own codebase, especially at this stage. Once it is a bit more mature we can more clearly identify the interfaces (as in API, not UI) that PM would need (add project, upgrade project etc), and get dog, wget and git_drupalorg to all speak the same language. This will mean that normal pm commands could work in the same way from a UI point of view on both dog and non-dog sites. I don't think we are at a point where it makes sense to attempt this yet, although it might be interesting to look at the existing package handler (e.g. wget) and version control (e.g. svn) interfaces, and see if/how they map onto dog commands/functions.
Thanks for your answer here - this makes a lot of sense.
I guess my only question is the usability implications of users bypassing dog and doing things with git directly - obviously some git operations (commit, push...) should be fine and used frequently, but other operations could get the git metadata out of sync with the dog metadata, and/or break dog functionality. How do users know what is safe/unsafe? Of course you could break dog functionality even without a manifest, but it seems like having the possibility of an out of sync manifest may make this more fragile. I am not sure to what extent this is really a problem, or what the solution is, though. Perhaps some dog-managed git config (auto-alias commands to include warnings if you try something risky, or use hooks to do the same?) could help prevent this?
Incidentally, have you seen http://drupal.org/project/githook?
@msonnabaum & @Owen Barton re: eventual integration & dog/pm - awesome. It sounds like we're all quite on the same page, so I look forward to trying and figuring out lots of shit as dog evolves, all the while being conscious of the lessons & ideas we learn wrt the existing pm system.
I think somebody may have pointed me at githook before, not quite sure. What I do know is that using hooks for the sort of validation/reinforcement of git actions done in githook is crucial to this very valid and important question - how do we keep dog and humans playing nice together? My thinking has two parts: dog-vet is our line of defense. It needs to be able to do really thorough vetting of an instance, identify inconsistencies and suggest solutions, or even automagically fix problems where possible.

IMO, dog's acceptance is basically going to turn on how well we handle this problem. We want all these excellent goodies in the background, but if it all ends up meaning that using dog is as or more onerous than using git directly, then dog will be a failure. And rightly so.
This is great, thanks to everyone who's been working on this. We've been evaluating and planning for a similar workflow. It will be great to have a standard around this which will make collaborating even easier and this approach makes a lot of sense. Snoop Dogg would be proud!
--
Gravitek Labs
This just crossed my mind, but I imagine it would be a concern for quite a few folks: the number of hosted git repos they may need. Supposing they do some patching of contrib modules and make those public repos on github, that wouldn't count against any quota there and could encourage collaboration*. But if a project needs custom modules, then that could be at least one additional collab repo on top of the collab repo for core. This could add up fast depending on how your hosting/github/unfuddle/whatever bills you. This could be quite a big downside.
Possible solution: Would it be okay to keep things that don't muck with core, but don't have an upstream repo (i.e. custom modules and themes) directly in the core repo? I've been using this now, and it doesn't seem to interfere with my ability to merge with upstream repo of drupal core from git.drupal.org.
Yeah, we talked about this a fair bit last week. There's no problem with doing that at all. It means you maybe get a slightly mucky history in your core repo, but oh well, we're all used to that. Best part is, if later on you decide to take that custom module and make it its own real repo that you contribute back, you just run a quick git filter-branch and separate out the commits on the subtree for the theme/module and make it its own separate repo. Ipso facto magico, push it up to d.o.

Also, if repo proliferation is an issue, you can also just use a d.o sandbox for that code (if the code is appropriate to put in public).
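The filter-branch step Sam describes is git's standard subdirectory extraction; a sketch with a hypothetical module path:

```shell
# Work on a throwaway clone, since filter-branch rewrites history.
git clone /path/to/site mymodule-extract
cd mymodule-extract

# Keep only the commits touching the module, re-rooted at that directory.
git filter-branch --subdirectory-filter sites/all/modules/mymodule -- --all
```

From there the module's history stands alone and can be pushed up as its own project repo.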
Deployability
I confess to being a git neophyte but certainly plan to support it in Acquia Cloud soon. My primary input into the discussion, therefore, is to make sure that whatever scheme we come up with is easily, automatically, and reliably deployable. This is critically important for all hosting environments like Dev Cloud and Pantheon that want to automate the deployment process.
In my mind, deployability requires:
Somewhere, accessible to the hosting environment, there exists a single repo that contains 100% of the code that makes up the site's docroot as it should be deployed at any given moment. The repo can have one or many branches, and the owner can do whatever they want with it, but when the hosting customer tells the environment, "please deploy the HEAD of branch ABC, or symbolic tag XYZ", the hosting environment needs to know exactly where to find all of the code. If that requires contacting multiple remote repos to get submodules or whatever, that is likely to cause an unreliable deployment process because at any given moment some of those repos might be unreachable.
Also, if the hosting environment needs to know anything about the methodology the repo owner is using in order to find the code (e.g. "this is a DOG repo"), that is likely to lead to complexity and unreliability, because there will always be edge cases, people who need to do things slightly differently, whatever. So saying, "there are remote repos you need to contact, but you can find them in the file foobar/.manifest", while seemingly implementable, is going to be painful long term.
To understand the need for this, consider a cloud hosting environment. It might want to build new web nodes on the fly as load increases, so it needs to know where to get the code without any real-time intervention from the site owner. Or in a single-server setup, the server instance and disk that the site is currently running on might suddenly fail, so the environment needs to rebuild that server as fast as possible.
Those last examples imply another desirable property, if not a requirement. It should be possible, and in fact the most common setup, to have the primary repo for a site hosted at the hosting environment. When a new server needs to be launched, it is much more likely that the hosting environment can reach a repo on its own servers than that it can reach github or whatever, especially when the site is hosted in Singapore or Australia or on the moon.
DC (and I think Pantheon) are set up to automatically deploy any new commits to the currently deployed branch. This means we need to know when a new commit arrives. If the repo is hosted locally, that's easy. If it is remote, either that repo provider needs to provide a remote callback for new commits (e.g. a URL they invoke), or else we'll have to poll it, which won't win any friends. I guess this is not really relevant to the repo methodology being discussed here, but I just thought I'd mention it. :-)
Both Dev Cloud's and Pantheon's UIs allow the customer to perform various repo write operations (commits, branches, tags, etc.) from the UI. This requires us to have write capability to the repo. Currently, we both operate with a local repo that we control, so write access is easy to arrange. For a remote repo at github or wherever, presumably the user could give us credentials to that repo for us to use. This isn't that complicated. However, it does point out one potential wrinkle. It occurs to me that one response to the need to have a single repo containing all the code will be, "well, the site developers can just push all the changes they want to the hosting environment's repo." In that case, though, if the hosting environment is deploying from a local repo but the primary repo lives elsewhere, then when the user performs an action from the UI (e.g. "create a new tag to deploy right now"), that tag will be created in the local repo---or else the environment will need to create in the remote repo, then pull the changes, and then deploy them, and that might violate the expectations of a dev team that thought, "hey, we're pushing all our changes to the hosting env, so what are you doing writing to our primary repo?"
Sam: I'm looking forward to discussing all of this at our call next week. :-)
Me too :) But in advance of that, let me clarify one thing, re: this point -
I think there's a key misconception here that threads throughout, so let me at least clear that one up. While 'upstream' repos are at some canonical location (e.g., drupal.org), 'collab' repos can be located anywhere. And 'collab' is the only repo from which a new site instance is ever rolled out. So a hosting provider invested in dog can (and should, to smooth the process) provide collab repos for everything - so all the repos are indeed locally available within the hosting provider's network. Which also means writeability. And notification hooks that are readily available. My thought all along has been that hosting providers like Dev Cloud would do all repo hosting locally, and maybe possibly in the future allow hosting from an external provider.
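To make that concrete, here's a minimal sketch of the two-remote (upstream/collab) setup, using throwaway local repositories as stand-ins for drupal.org and a provider-hosted collab repo - all names and paths below are hypothetical:

```shell
set -e
# Stand-in for the canonical core repo (e.g. drupal.org); all names here are hypothetical
git init -q -b master upstream-core
git -C upstream-core -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "core: initial commit"
# Stand-in for the provider-hosted, writable collab repo
git init -q --bare -b master collab.git
# Start the project from a clone of core, then wire up both remotes
git clone -q upstream-core mysite
git -C mysite remote rename origin upstream
git -C mysite remote add collab ../collab.git
git -C mysite push -q -u collab master
git -C mysite remote -v
```

In a real project, upstream would point at the drupal.org core repository and collab at the writable repo your host or team provides.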
One other point to address, though:
The goal of dog is to create a portable Drupal package that is both versioned and deployable, both human & machine read/writable, and easily encapsulated within wrapping deployment workflows, config management, and/or build systems. In the current implementation/spec, it carries a pretty minimal amount of information required to do the internal assembly - and most of that is just git settings to pass around. To that end, I should note that building workflow-specific applications on top of Git is not something we should be circumspect about - because the Git developers aren't. Read the git dev list for a bit - you'll see that the express intention is for people to build systems around Git.
That said, I entirely agree that one of the primary requirements for dog is that it is internally robust - that it is nigh-impossible to get projects managed by it into unrecoverable, or even unpredictable, states.
love the idea of
--
mike stewart { twitter: @MediaDoneRight | IRC nick: mike stewart }
Site Builder Guide
Along these lines, I've posted a first draft of a rewrite of the Site Builder Guide back at http://drupal.org/node/803746.
This thread has been very helpful in a number of ways, so I'd love for you all to take a look and let me know your feedback - file issues against it if you feel so inclined.
I should preface it by reiterating something that I tried to convey in the introduction - it is meant to be an entry-level guide to using Git in this manner, without any additional tools such as Drush or Dog. I definitely think that there will be additional pages outlining those workflows as well, but I felt that the best start was to approach it from a lower level to give people a direction to follow and build upon with the additional tools.
Let me know your thoughts!
Good work
As the person whose work you wiped away - I couldn't be more delighted!
I really like how the instructions are independent of DOG or Drush.
We can add on other pages detailing where DOG and Drush can help automate certain tasks.
Some good diagrams are in order; they could help clarify that there are multiple 'read-only' external repos - for core and for each of the individual contrib modules, themes, etc. - but that only one 'writeable' repo is really needed [unless you patch/hack a module].
Thanks!
First of all - let me just say I breathe a sigh of relief to read this. :) I felt a little bit of angst at totally blowing away your original documentation, but in the end (and with a little bit of convincing) I decided it was better to ask for some forgiveness rather than permission.
Yes, I made a conscious effort to make it a workflow that could be implemented by someone with only Drupal and Git at their disposal. Even after we build documentation around using the additional tooling, I think it's important that there's a resource that explains the process from a lower level. I fully expect that this first-ish draft will change as we get more best practices defined based around what Drush and Dog do.
Also, good thought on diagrams - I agree 100%. One of my partners is going to be presenting at the local Drupalcamp and he's creating diagrams for the presentation that could very well make it into this document once they're done.
Example of sexy git diagrams :)
http://nvie.com/posts/a-successful-git-branching-model/
Liked how this was written; great documentation!
Saw a few things I stumbled on while learning git fairly well explained.
Would really like to see something about subtrees instead of submodules... I remember there was a nice section on those in the Pro Git book, just after the chapter on submodules (and how not to use them).
http://progit.org/book/ch6-7.html
Need to read more of the thread to see if this is discussed down there, but I'd like to see if some thought has been put into using subtree merges instead of submodules in this proposed workflow.
Switching Cores
One workflow situation has occurred to me.
How easy would it be to switch cores in this git scenario?
For example, we start a project with a clone of Drupal core from drupal.org, and add modules, themes, etc. from individual drupal.org repos. Time passes, the site gets popular, and we need to move to Pressflow Drupal. How easy is it to do that while maintaining full history?
Would Pressflow have to start a new repo that forks drupal.org's Drupal core, so that once that was in place we could switch upstream to Pressflow easily? Or could we use the Pressflow git mirror as it currently stands?
As long as that alternative core is a good steward and is itself based on a clone of the original Drupal core repository, then it's not too tough. It'd be more or less the same as a standard update from upstream, actually - except that you change the git URI for your upstream source first.
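Roughly, that switch could look like this - a hedged sketch using throwaway local repos as stand-ins for stock core and a Pressflow-style fork (all names invented):

```shell
set -e
# Stand-ins for drupal.org core and a fork that tracks it (names hypothetical)
git init -q -b master core
git -C core -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "drupal 6.20"
git clone -q core fork
git -C fork -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "pressflow: performance patches"
# A site tracking stock core re-points its upstream at the fork, then updates as usual
git clone -q core site
git -C site remote rename origin upstream
git -C site remote set-url upstream ../fork
git -C site fetch -q upstream
git -C site merge -q upstream/master
git -C site log --oneline
```

Because the fork shares history with core, the merge is a clean fast-forward and the full history is preserved.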
All variants should start w/canonical drupal core
This is correct. Ideally, an "alternative core" like Pressflow would start with the canonical Drupal upstream. That way you will be able to merge them in cleanly. I know this is the plan for Pressflow 7. It is also a fine plan for people who want to make very involved installation profiles or "drupal plus" applications.
In the realm of Drupal 6, you can generally do this with a "rebase". Git is smart enough to realize when files are the same, but it's a lengthy process and doesn't give you quite as nice a version history.
https://pantheon.io | http://www.chapterthree.com | https://www.outlandishjosh.com
I'm new to git, but understand the project layout you're talking about. I like the idea of letting dog handle the messy stuff for me, but I need to start working today, and dog isn't ready yet (or so I understand).
Can you publish the git commands you're using to structure a project using dog, so that I can create the layout now, manually, and let dog take over when it's ready?
Very valuable question, should have laid this one out. Unfortunately, it'll probably be a bit tough for newbies to slog through - as you noted, what's under the hood here is messy. But if you follow these basics:
What you create should be basically compatible with dog, once it reaches maturity.
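As a hedged illustration of the nested-repo layout under discussion (not dog's definitive internals), attaching a contrib project as a submodule inside a site repo looks roughly like this; every repo name below is a local stand-in:

```shell
set -e
# Stand-in for a contrib module's repo (e.g. views on drupal.org)
git init -q -b master views-upstream
git -C views-upstream -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "views 6.x-2.x work"
# A bare-bones site repo
git init -q -b master site2
git -C site2 -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "initial site commit"
# Attach the module under sites/all/modules as a submodule
# (protocol.file.allow is only needed because the "remote" here is a local path)
git -C site2 -c protocol.file.allow=always submodule --quiet add \
    ../views-upstream sites/all/modules/views
git -C site2 -c user.name=dev -c user.email=dev@example.com \
    commit -q -m "Add views as a submodule"
```

With a real contrib project you'd pass the drupal.org clone URL instead of a local path, and no protocol override would be needed.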
Thanks a lot, definitely want to try this.
Just one question.
Separate repositories for processed releases.
What about the stuff added by the packaging script - UPDATE.txt, expanded $Id$ and info file?
If we pull from git, we don't get this stuff, do we?
This is especially relevant for existing projects, that already do contain the added stuff. Switching to the unprocessed git version of modules and core will make a huge diff with a lot of pointless noise.
Switching costs moving from an existing project to this style are unavoidable. The diff really shouldn't be that bad as a one-time thing. Adding the upstream via git surgery will be harder, and I don't imagine that would be an "automatic" feature for quite some time. The near-term answer is going to be to start new projects with dog, or be prepared to spend some time on the transition.
Having the processed core and contrib as "upstream", instead of the unprocessed one, would make the gap to a traditional workflow smaller, which would be a good thing. Or do you think the expanded $Id$ would have any unpleasant effects, such as more noise in changesets?
The expanded $Id$ is a big PITA. That's why we removed it as part of the migration. Look through all the new Git repositories, you'll see they've been removed. And look in any tarballs generated from releases made after the migration was completed - you'll see there are no expanded tags in there, either.
Ah, I actually misunderstood this a bit at first because I didn't read the link. So:
Updating existing projects to use dog is not something I'm looking to deal with right away. It's important, obviously, but the basic functionality needs to be working for new sites before we can think about scooting existing ones into this format.
And to be clear, it's not just "especially" relevant for existing projects. It's ONLY relevant for existing projects. If you're using dog, you should never ever ever EVER have a single tarball from d.o in your system. All git repos, all the time, period. Hybridizing makes things unnecessarily complicated.
The idea was, if the processed stuff (that is, LICENSE.txt included) was provided as a (read-only) repo, then we could start from there.
For the git workflow it would be nice to have a LICENSE.txt added, but we don't need expanded $Id$ (good to know it's gone), and we can discuss if we want the stuff in *.info or rather not.
I personally like the *.info stuff, because it's an easy way to know the version of a module. And probably the "available updates check" also uses this information. Yes we don't drush up anymore, but we might still want to have the warning messages about available security updates.
So, I think it is reasonable to ask in that linked issue, if d.o. could provide a repo with the processed module releases. And once we have that, I imagine we all want to switch to that one, if only for the LICENSE.txt.
We could even think about an intermediate repo that only has the LICENSE.txt, but nothing else.
And if d.o. does not want to, someone could set up the same thing on github or somewhere else.
Individual projects can add a LICENSE.txt. Truth is that core, at the very least, should probably put one in. That's how it gets in the git repo.
So when you say ~"a read-only repo of the processed stuff," there are a few possibilities as to what that could mean:
The first proposal has already been out there for a long time: http://drupal.org/node/806484 . I don't like release repositories, because IMO they solve a problem that doesn't really exist - and in the process obliterate everything useful about git history. If you want to use tarballs, then USE TARBALLS. Don't just wrap their data in a git repository because "hey, we use Git now!" If you want to do that locally, fine - but I don't see a reason to invest infra time and resources in doing it. Beyond that, I see it as an inferior method for sitebuilding, so I'd actually rather we not support it at all, as that'll give it the impression that it's a good idea.
The second method is simply not feasible, period. We'd have to have background workers do nothing but continually rebase a commit on top of tens of thousands of repositories - ALL of which are copies of the real repos, and need their own repo location strategy, management when things go wrong, etc. And all of that so that people have a repo they can clone which does an upstream rebase on every single push. So every single merge from upstream will be painful and nasty.
As for the information in the .info file and managing upstream updates (security or otherwise), I'll say it again: git_deploy takes care of that.
The real question is - what problem are you trying to solve with this?
And..."I like doing it this way" isn't a reason. Dog is about codifying some best practices into real, assumable rules - not about accommodating every possible way to put together a Drupal site. That's what we've already had for ten years.
All of this can be automated.
Maybe it will be resource-expensive - in this case we should probably discard the idea. But maybe it is not.
The release repo would have the original repo as a remote, and it would have its own branches for all published releases.
For every new release, it would check out that version from the original repo, add the LICENSE.txt and *.info stuff, and commit the result into the release branch. That's the minimal thing, which does not require any merge or rebase or whatever.
The benefit is small, but so is the cost - or if it is not, we just say goodbye to this idea.
If we want to be a bit smarter, then we need to somehow make both origin-1.1 and release-1.0 parents of release-1.1. Not sure how exactly we would do that, probably involves merge and/or rebase. But still, it would be automatic.
And, in case of merge conflicts, we can always take the version from origin, then add the usual stuff (LICENSE + info), and declare this to be our merge result.
I need to read a bit more about git, but from what I know so far, this should work.
Expensive or not, only a test can tell.
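For what it's worth, the minimal variant sketched above (check out the release, add the packaging extras, commit to a release branch) is only a few commands; the repo, file, and version names below are invented for illustration:

```shell
set -e
# Stand-in for the module's real (unprocessed) repo, with one tagged release
git init -q -b master views-src
touch views-src/views.info
git -C views-src add views.info
git -C views-src -c user.name=dev -c user.email=dev@example.com \
    commit -q -m "views work"
git -C views-src tag 6.x-2.11
# The "release repo": check out the tag, add LICENSE.txt and the .info stamp, commit
git clone -q views-src views-release
git -C views-release checkout -q -b release-6.x-2.11 6.x-2.11
echo "GPL v2" > views-release/LICENSE.txt
printf 'version = "6.x-2.11"\nproject = "views"\n' >> views-release/views.info
git -C views-release add LICENSE.txt views.info
git -C views-release -c user.name=dev -c user.email=dev@example.com \
    commit -q -m "6.x-2.11 release with packaging additions"
```

Whether running this for every release across every project is affordable is exactly the open cost question raised above.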
This discussion is now officially completely off topic for dog. Let's please move it to the issue you opened.
git_deploy is a real dog
So dog will depend on a working version of git_deploy, whose project description says, in total:
Placeholder for an analogue to the CVS deploy module. This needs to become a real, finished and tested module for phase 2 of the migration to be finished.
See #1013302: Versioned dependencies fail with dev versions and git clones.
Bravo!!
+1 (subscribing), and:
after learning to use Git and drush over the last year, I have been longing for (and making my own private - and rudimentary by comparison to this discussion - forays into) developing a process similar to the one(s) described above. I just want to thank all of you for bringing this discussion out in the open and for the work you're doing. The biggest "hole" in working with Drupal for me has always been about process, and I'm frankly happy to be alive to see this all happen - such exciting work!
Equally exciting for me is seeing the birth of Pantheon (for the same reasons stated above), and when these processes mature - Pantheon + DOG - life as a Drupalist(a) will be sweet indeed.
Thank you all very VERY much!
--Kelly Bell
Gotham City Drupal
twitter: @kbell | @gothamdrupal
http://drupal.org/user/293443
DOG or Aegir+Drush Make?
Hello,
I can't seem to understand why we should use your workflow instead of the one explained here:
http://greenbeedigital.com.au/content/drupal-deployments-workflows-versi...
This is a real question and not a remark. I am really trying to make a comparison between both workflows. So, what are the pros and cons from your experience?
Best regards.
Not either/or
DOG is similar to, but somewhat more ambitious than, Drush Make. You should review what we're talking about in terms of using git submodules.
The Aegir + Drush Make approach creates "one big repo" for each platform. That's fine, but you're stuck doing all upgrades by hand. A big part of the value of the DOG process is having the ability to pull updates/upgrades to Core, Contrib and your own Custom code from their actual sources. This lets you really leverage git much more.
Whether or not you then use Aegir to deploy is a totally other question.
I fail to see how building drupal sites as a profile + make file creates "one big repo". I've found them to be incredibly slim, since all you track in version control is the .profile file, the .make file, custom modules and themes, and usually a shell script to easily rebuild the codebase using drush make.
Granted, upgrading core/contrib is easier using DOG, but not by a wide stretch. With drush make you just type in a new version number to use.
"One big repo" is in contrast to a cluster of repositories, which is the approach dog takes.
I disagree. With drush make you just type in a new version to use, IF:
1) You have no patches to that module that need to be applied.
2) If you do have patches to be applied, the patches still apply cleanly after the upstream changes.
If you've been a good steward and gotten your patch contributed back upstream, you get penalized - the patch will fail, and you have extra work to do to clean it up. Not a lot of work - but enough that it makes it not a safe task for machines to run.
And that's a lot of the point here - some of the differences between drush make and dog certainly are minor (I'll reiterate that the original plan WAS to just extend drush make for this). But they're enough to make the difference between a system that's machine read/writable throughout the entire lifecycle of the project, vs. a system that can only automate certain initial/setup-type tasks.
We have skipped over the "why" a bit in this post, and I'm probably going to continue to skip the more in-depth discussion here as I'm working on explaining that elsewhere.
Summary version, though: drush make was the best we could do in the days where CVS was the upstream. And if you're unwilling to deeply embrace Git, it's still pretty much the best out there. But now that we've migrated to Git, there's a whole new world of possibilities. Problem is that while I know there's amazing, robust stuff you can do with Git+Drupal, it takes a fair amount (and in some cases, a very high amount) of knowledge to unlock that. Dog is really an attempt to bring that powerful workflow potential to bear in a way that anyone can use it.
Some bulleted benefits I see of dog vs. drush_make, in particular v. the link you shared:
so, that's off the top of my head :)
What I really like about drush make is that you can look in one place (the .make file) to see a full definition of a site's structure: versions of all contrib, patches (there aren't any patch files scattered around; they are all defined in a single place), libraries, etc. I worry that with a system like DOG I would lose that.
That's what dog-vet - which reports the overall status of a given instance - is for. It has the advantage of being able to differentiate between what should be there (based on manifests) vs. what is there.
Any example? I don't see what you mean by "custom versioning & heuristics"
Same here. What do you mean? Are you talking about patching drupal core?
Is it a fact? I mean, how could we ever know?
I thought Drush Make was able to get the patches by itself?
This is indeed one important point. You could use DOG from the very beginning of a project. With Drush Make, you need to wait until the end, and then it is less interesting.
Thanks for your answers
Best regards.
Core yes, but much more likely, downloaded contrib.
Not a quantifiable one. But consider the difference between inheriting a site built using whatever random methodology the devs decided to use, vs. inheriting a site built with a structured methodology like dog. Or even just coming back to maintain a site you built a year ago - do you really remember all the tweaky little things you did? Dog would.
It is. Question is, where do you put those patches in the first place for drush make to grab? It's out-of-band data that you have to come up with a storage strategy for yourself.
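For reference, this is roughly what that out-of-band data looks like in a makefile; the project, version, and patch URL below are made up:

```ini
; Hypothetical .make fragment: the patch file itself must be hosted somewhere
projects[views][subdir] = contrib
projects[views][version] = "2.11"
projects[views][patch][] = "http://drupal.org/files/issues/123456-views-fix.patch"
```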
Like I said, there's a bigger article coming on this - I'm sorry to dangle that, but writing these things up takes me a lot of time, and I want to do it properly at the right time, once the picture has cohered a little more.
"Like I said, there's a bigger article coming on this - I'm sorry to dangle that, but writing these things up takes me a lot of time, and I want to do it properly at the right time, once the picture has cohered a little more."
Please buy a good coffee machine and release it asap ;) If you can put some Aegir integration in there too, it would be great ;)
Anyway, thanks for your answers, I believe I am starting to understand what you are trying to achieve.
Best regards.
I get the feeling I'm using drush_make differently than others. I use it from the very beginning of a project by forking Build Kit, and then building up the .profile a little bit and the .make file a lot as I add to the project. Why would you only begin to use drush_make at the end of a build?
From a quick glance at Build Kit, it looks like it's achieving a similar thing that dog does. My guess would be the difference is that dog is structurally built around the idea of an arbitrary large number of "Build Kit"-like baselines, which can act as anything from full-blown install profiles to just quick helpers. Dog's role, then, is quickly and easily manipulating such kits.
Same principle, though. The fact that Build Kit requires git is what makes the big difference between it and plain drush_make's capabilities.
And actually, I realized I didn't comment on the Aegir part - really, Aegir has minimal feature overlap with dog. Dog could (and we hope, will) be paired with Aegir and act as the engine for managing sites in the same way drush_make does. drush_make is a less end-to-end capable engine. Or will be, rather.
Also, just to be clear again, I'm not slagging drush_make here. It's an excellent tool - but it allows any sort of sources & doesn't care about the underlying vcs your site is stored in. Allowing that flexibility means it can never tackle cohesive, complete workflows in the way Dog can. That from a guy who's worked on the cross-vcs platform that runs d.o's git infra :)
As others explained above already, DOG could work as a perfect tool to manage your code, while Aegir and Drush Make are designed to work on different level, since they help to manage your sites and their environment (the platform) but they don't help in managing the code at the low level at all.
It helps to think that in Aegir context the app is not a site, it is an install profile with corresponding makefile, used by Drush Make to create an environment (the platform) where the app lives, and the site is just a deployed and managed (with the help of Aegir) instance of the app.
But that instance (the site) and its environment (the platform, in Aegir terms) still have code you may want to maintain, both at the platform level and at the site level (for site-specific stuff). You don't want to maintain separate platforms for each and every site, and Aegir will not help at all with the code in the site-specific space (it just moves it between platforms as-is, without any comparison checks etc.), so you still need something to manage and track changes at the code level. This is where DOG can be a perfect match for Aegir and Drush Make - not a replacement for anything, imo.
Yeah, absolutely agree. Describes the separation of responsibilities quite well.
I'm not quite in harmony on this point, though, at least not within our initial plans. Because...
The initial plans are not really targeted towards being able to make a dog instance out of just anything. They need to be crafted in a specific way. Which means dog will need to do the initial setup as well as ongoing code management - it won't be able to deal with something drush_make has built.
Of course, that's a problem I'd love to see solved, and frankly it shouldn't be that hard to solve it. There are two basic approaches, both of which are worth pursuing - teaching dog to read drush makefiles, or teaching dog to convert an existing site tree into something dog-compatible. But from the perspective of completing dog's basic featureset, that's out of scope: the bulk of what makes dog dog are all the things it can do in a working setup. Until those things get built, dog is vaporware, so there's no point in investing effort to allow multiple routes to creating a working setup.
git-subtree
Submodules are great but definitely a complicated concept, hard to follow and deploy. You fracture the project into pieces, but when you want to move all those pieces to your dev/staging/prod server or to a co-worker it quickly becomes a mess (or perhaps I'm missing something). With one repository you can just do a "git push test-server" and use some magic git hook to checkout the code where you want.
So for Drupal project I'm looking more in https://github.com/apenwarr/git-subtree approach than having 50 submodules.
Yep, you've missed the point of dog - take that fractured repository and weave it back together, automatically and transparently.
The histories created by the subtree approach are a mess, unfortunately.
I see the dog-rollup and dog-rollout. But how do you push regularly to a staging server? If you push to a remote (bare repository) and check out a branch outside the repository, can you still build submodules?
I add my servers as remotes, and when I want to update the code I just push to those remotes (like http://toroid.org/ams/git-website-howto). So the version on the test server is almost always the current state of the code, without the need to really deploy anything.
Dog's goal is not to be a transport mechanism between multiple instances. It's based around a hub-and-spokes model, with the central collab repositories at the center, and every functioning instance (be it for dev, staging, prod, whatever) acting as other independent spokes. So you don't perform a deployment from your local dev instance - at least, not with dog's built-in command set. That's a separate layer of responsibility - one that dog is very much interested in working with and providing useful data to, but not directly responsible for. Maybe you wire up your post-receive hooks to trigger a deployment on a particular server whenever a push comes in; maybe you trigger a build system like phing or ant; maybe you run it all via Jenkins. The idea behind dog is to expose information that makes it trivially easy for you to construct your own deployment events like that, but not to be directly responsible for it. Like I said, its concern is with the communication between hub and an individual spoke, not spoke-to-spoke. There are simple enough tricks that you can employ to make that happen, but it's not our core focus. At least not for dog 1.0.
If you really wanted to set up a test server that you directly pushed to per your example link, it could be done - but it would require a post-receive hook on that repository that triggered up some dog logic to ensure everything gets put in the right place. Can't do it (reliably) just with core.worktree or GIT_WORK_TREE, unfortunately.
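For a plain single-repo tree, the push-to-deploy pattern from that howto is just a post-receive hook plus GIT_WORK_TREE, sketched below with throwaway names. Per the caveat above, a dog-managed layout with submodules would need dog logic in the hook instead of this bare checkout:

```shell
set -e
mkdir -p deploy-target
# Bare "test server" repo whose hook checks out whatever lands on master
git init -q --bare -b master hub.git
cat > hub.git/hooks/post-receive <<'EOF'
#!/bin/sh
# Hooks in a bare repo run with cwd = the repo; deploy into the sibling worktree
GIT_WORK_TREE=../deploy-target git checkout -f master
EOF
chmod +x hub.git/hooks/post-receive
# A dev clone pushes, and the hook deploys the pushed state
git init -q -b master work
echo "hello" > work/index.html
git -C work add index.html
git -C work -c user.name=dev -c user.email=dev@example.com \
    commit -q -m "first deploy"
git -C work push -q ../hub.git master
```

After the push, deploy-target holds the checked-out files; on a real server, hub.git would live outside the webroot and deploy-target would be the docroot.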
Thanks for this interesting and complete answer. I'm impatient to try it out on a real project, but I will wait for a 1.0 and first adapt my deployment process.
noob question
So, should I use --upstream=7.x or leave the default. Would there be any difference?
never mind
I see the default is the latest stable branch. Stupid question anyway...
aegir
Is it possible to use dog-init --upstream= with aegir repository? If so, which one? provision? hostmaster? ??
Got the reply from sdboyer at #drush
Nice approach … but
The approach dog takes is very similar to what we do at my company, but it lacks one feature that is important, at least for us, which I'll talk about in a second.
Our goal is to stay as close to the upstream Drupal as possible, but when we write patches we need them as soon as possible in our testing/staging/production environments. We therefore have our own Drupal repo, the one dog likes to call contrib. (On a side note: I think it's a terrible idea to call that remote contrib; Git best practice is to have an origin and an upstream remote!) Those patches are very general and not at all project-specific; this means that when we build a new Drupal project we like to reuse that same Drupal repo for new projects and share it across all our projects.
The solution we came up with we like to call deployment-platforms, and it's basically a very small Git repo with a lot of submodules. It looks like this:
.git/
.gitmodules/
drupal/ <-- a Git submodule
libraries/ <-- contains various git submodules
modules/ <-- contains various git submodules
themes/ <-- contains various git submodules
For this to work we had to convert:
sites/all/modules/
sites/all/libraries/
sites/all/themes/
to relative symbolic links:
sites/all/libraries@ -> ../../../libraries
sites/all/modules@ -> ../../../modules
sites/all/themes@ -> ../../../themes
This allows us to pull in our very own shared Drupal repo at very specific versions for each project/deployment-platform. We can use the deployment-platforms to install code on test/staging/production environments using Git:
git clone <url>
git submodule update --init
Two very simple commands.
Now, Git is NOT a software deployment tool! So, what we're trying to do, is to have Jenkins/Hudson use the deployment-platforms to run tests and build Debian packages for Drupal, modules, themes, libraries. Those packages are automatically injected to our package repositories, which later are used by APT and Puppet for deployment to the target systems.
Working with the deployment-platforms during development is fairly easy and involves only a few extra but simple Git commands like:
git submodule add
git submodule sync
git submodule update --init
Coming back to the important missing feature I mentioned earlier. With the approach dog takes, it is not possible to reuse the contrib Drupal repo across multiple projects. You could probably have different branches for different projects, each branch having different submodules at different versions. But that's a mess when switching branches, because Git at the moment is not able to remove unused submodules when switching branches. Then, depending on your workflow, you might have different branches per project (master/staging/production), so that you would have 3*(number of projects) branches, plus all the branches coming from upstream, plus the feature branches you use for development. This is worse than ugly!
I like the idea and approach of dog and I hope our approach gives you some ideas for dog.
Lot of great points here, thanks for the thoughtful response. Took me a while to respond because I was thinking a lot about it :)
First and quickly, on the annoyances of submodules (such as them sticking around across branch switches): yep, and if you read over some of the other comments in this thread, you may see that while I was initially inclined towards always using submodules everywhere, problems such as that, as well as the spammy history it can result in, led me to only use them in certain cases. Where submodules are used and a branch switch occurs, the post-checkout hook is where we do the cleanup. Getting the right hook into place in an automated fashion is one of those things dog is good at doing.
Feasibility of sharing a single contrib repo (e.g., Views) across multiple site projects: when I was first thinking about dog, the case where a shop would want a base install to start with was actually near the front of my mind. And yes, if you use a remote configuration that's unmodified from the base set up by git clone, e.g., this:
[branch "master"]
    remote = collab
    merge = refs/heads/master
then you'll have a proliferating mess of branches. But do just a little namespacing magic, say for a project called "projectfuntime":
[branch "master"]
    remote = collab
    merge = refs/heads/projectfuntime/master
And your local master branch is linked to a namespaced branch in the collab repo called "projectfuntime/master". (There are a number of other tricks, but that's a start). It is true that there's no git-native way to just fetch a namespaced subset of branches with a glob (fetching a single branch is quite easy, but we also need to be able to discover feature/hotfix/etc. branches), which means the remote listings will get cluttered - unless you do some cleanup in some custom porcelain. Like, say, dog.
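To make that concrete, here's a hedged sketch of wiring up a namespaced branch against a throwaway local bare repo standing in for collab ("projectfuntime" and all paths are examples, not dog's actual porcelain):

```shell
#!/bin/sh
# Sketch of the branch-namespacing setup above, against a local bare repo
# standing in for the shared collab remote. All names are examples.
set -e
base=$(mktemp -d)
git init -q --bare "$base/collab.git"

git init -q -b master "$base/work"
cd "$base/work"
git config user.email dev@example.com
git config user.name dev
echo "site" > README
git add README
git commit -qm "Initial commit"
git remote add collab "$base/collab.git"

# Link local master to the namespaced collab branch, as in the
# [branch "master"] config block quoted above.
git config branch.master.remote collab
git config branch.master.merge refs/heads/projectfuntime/master

# Optionally narrow the fetch refspec so only this project's namespace
# lands in the local remote-tracking refs.
git config remote.collab.fetch \
    '+refs/heads/projectfuntime/*:refs/remotes/collab/projectfuntime/*'

# Push master into its namespaced slot; `git pull` on master now tracks it.
git push -q collab master:projectfuntime/master
git ls-remote --heads collab
```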
So, I strongly disagree that it's impossible to reuse the collab repos on multiple projects. In fact, I don't actually see how, looking at your description, your pattern makes anything more reusable than the above outline for dog does - it's still just submodules pegged to a specific ref. Dog can do that, or it can attach to the repository more loosely. Where the subrepo is ultimately placed within a particular dog-managed project is irrelevant to reusability - you still have all the same basic problems of many-branches-to-contend-with, unless you make the assumption that all your projects will be in lockstep on their shared repos.
Also, a quick note - I'm hoping to roll Phing in to dog somewhere, also probably pretty transparently to the top-level interface, as the tool simply won't be that useful unless you have a way of managing target-specific variations between instances (e.g., differing mysql connection strings on your local dev vs. staging boxes). Build targets ftw.
It's collab, not contrib - contrib wouldn't make sense at all. I disagree that it is a "best practice" to use origin, especially in a case (like this one) where you have two very important remotes that need to be interacted with regularly. Using "origin" works best when there is just one remote repository you interact with regularly; I came up with this convention in the first place specifically because I wanted to avoid those connotations, and the names reflect the purpose.
I considered a repository layout like this, with an outer super-repo that contains the core clone. Truth is that there are advantages to having the base dog repo != the webroot - a lot of potential advantages, and it's my preferred solution for the long term. Many sites, especially bigger sites, want to put stuff into the repository that really shouldn't be under the webroot, for example. I initially rejected it, though, because it hadn't occurred to me that we could place all repos in the thin, outer super-repo, thus avoiding the hellish commit noise of submodule commits from a module echoing first up into the core repo, then again up into the super-parent dog repo.
However, having reflected for a few days on this layout you've suggested, I'm now about 60% sure that I'm going to take a note from it. We'd adopt a strategy where we have a base repo that contains ALL other repos, including core (there's a nice symmetry to that). The biggest drawback is an ease-of-use consideration - before, you only needed to set up one collab repo per project. Making this switch would mean at least two are necessary - the dog super-repo, and the core repo. I worry that that could be just enough of a hurdle right at the outset to keep people from adopting dog.
We're pretty well in agreement there. Neither Git nor even dog is a deployment system in and of itself. I think people get confused because they're mistaking necessity for sufficiency: you need good versioning & collaboration tools to build your project in the first place. Add in the fact that git can do all kinds of transport, and it's easy to make the mistake of thinking that deployment is just a hopskipjump away from your normal dev process. But it isn't - or if it is, you probably lucked out. Dog is really about building that portable package that a true deployment-focused tool can easily roll out elsewhere. The set of tools you've described is a shining example of how such package-builders slot nicely into tools that really do deployment and provisioning. Actually, I'm very interested by what you guys have set up and would love a tour, if that's possible? :)
... Now, Git is NOT a
I disagree with you both here. There's no reason that Git cannot be used as a deployment mechanism. It's very handy to be able to do
git push www1 master
and the post-receive hook on that remote runs git checkout -f into the worktree (expanded guide here: http://caiustheory.com/automatically-deploying-website-from-remote-git-r...). Something like that scales for multiple web heads, and makes for nice easy deployments, IMO. There's the added benefit of "I already use this", so I don't have to learn yet another tool to do a deployment.
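A runnable sketch of that pattern, with a local bare repo standing in for the www1 web head (all paths and names are examples):

```shell
#!/bin/sh
# Runnable sketch of the push-to-deploy pattern described above. A local
# bare repo stands in for the "www1" web head; all paths are examples.
set -e
base=$(mktemp -d)

# The web head's repo, with a post-receive hook that deploys to the webroot.
git init -q --bare "$base/www1.git"
mkdir "$base/webroot"
cat > "$base/www1.git/hooks/post-receive" <<EOF
#!/bin/sh
GIT_WORK_TREE=$base/webroot git checkout -f master
EOF
chmod +x "$base/www1.git/hooks/post-receive"

# The developer's repo: commit and push, which triggers the deployment.
git init -q -b master "$base/dev"
cd "$base/dev"
git config user.email dev@example.com
git config user.name dev
echo '<?php // front controller' > index.php
git add index.php
git commit -qm "Initial commit"
git remote add www1 "$base/www1.git"
git push -q www1 master
```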
--
Cameron Eagans
http://cweagans.net
Using the wrong tools for the wrong tasks
This is going a bit off-topic, I don't mean to hijack this thread, but I've got to respond to this.
Excuse me for being blunt, @cweagans, but what you say is naive and plain wrong.
Git is a DVCS, full stop. It's a brilliant piece of software, that's why it can be (ab)used in ways no one would have thought of. Of course it can be abused for software deployment, I use it for that too (unfortunately). But not because it's my tool of choice for that job, it's not even suitable for it! I just didn't find the time to assemble the proper tools and develop the missing pieces for the targeted workflow.
When you set up web-servers to run Drupal on them, do you install Apache, Postfix, Nginx, Varnish, APC, PHP, … with Git? I hope not! Do you download the source code, configure, build and install it manually each time? I hope not! Why do you think that should be the way to do it with Drupal?
Git lacks basic features of a package management system. But that's not a problem, because it was not designed as a package management tool. Deploying and installing software is not the purpose of Git.
Of course one can use a chisel instead of a screwdriver, but those tools serve completely different purposes. If you're doing your job well, you're not going to use one for the other. And you shouldn't be telling others to do so.
I'd love to see Drupal people spending their time building reasonable and useful tools, embracing what's already there (package management: apt, yum, macports, …; configuration management: puppet, chef; continuous integration: jenkins, hudson, …) and respecting best practices, instead of wasting their time thinking of ways to abuse existing tools and letting others believe that this is the way to go.
The guys at Debian (and Ubuntu) do a proper job at packaging Drupal and modules for those platforms, and then there's the wonderful dh-make-drupal. But, for good reasons, their package repositories do not hold recent enough packages, at least not recent enough for us; that's why I'd like to build a workflow that involves Git, Jenkins, dh-make-drupal and reprepro.
I'd love to see d.o providing package repositories (deb and rpm) for Drupal itself and its modules. I think this is the only way to go if Drupal wants to be a grown up, industry grade project. An initiative for Drupal 9 or later? I hope so!
Giving a tour
Hi Sam
Sorry for letting you wait so long for a reply and thank you for reading and replying so thoroughly to my post!
I'd be happy to give you a tour of our setup. Either this August in London, or earlier through any means of communication, if you wish.
DrupalCon London BOF notes
Hi there, dog gang!
Here are my rough notes from today's BOF.
My own use of git is basic - which is why I think dog is such a cool project - but it also means the lines below will need some elaboration (and/or correction) by others.
So, what can people work on for dog?
At the meeting some folk volunteered for some of these but I didn't catch which ...
Use Gitolite instead of Gitosis
During the BoF in London I forgot to mention a recommendation I wanted to share with you. During the presentation Sam mentioned gitosis for hosting your own Git repositories. Rather than using gitosis, I'd recommend Gitolite. Unlike gitosis, Gitolite is actively maintained and much more feature-rich. You will find a how-to in the Pro Git book by Scott Chacon: http://progit.org/book/ch4-8.html. There's an easily installable package for Debian and Ubuntu!
Ah yes - I should have
Ah yes - I should have mentioned both. The key feature is on-demand creation of new repositories, which gitolite apparently supports as well. I'd only known it to be a feature of gitosis, but gitolite does it now, too.
I've been using Gitosis for a
I've been using Gitosis for a while now, but would also recommend Gitolite over it just for the feature set and control granularity.
I really like the idea of
I really like the idea of defining a set of common methodology to manage a drupal (pick one) site/app workflow, but I wonder why I don't see many more opinions on using submodules vs. using subtree merges (except from the gentleman here http://groups.drupal.org/node/140949#comment-501244 ) for managing contrib modules in the proposed workflow.
I do not have a complete proposition, but I'll paste an example of how I've managed contrib modules (including drupal core) in a test project repo, trying out a workflow to manage contrib modules in a Drupal project.
git init
# Create a staging branch - could follow any environment/branching naming convention your team uses
git checkout -b staging
# An empty initial commit, so the staging ref actually exists to return to
git commit --allow-empty -m "Initial commit"
# Add Drupal core as a remote
git remote add drupal_core http://git.drupal.org/project/drupal.git
git fetch drupal_core
git checkout -b drupal_core drupal_core/7.x
git checkout staging
git read-tree --prefix=htdocs -u drupal_core
git commit -m "Added Drupal core"
# Now add Views
git remote add modules/views http://git.drupal.org/project/views.git
git fetch modules/views
git checkout -b views modules/views/7.x-3.x
git checkout staging
git read-tree --prefix=htdocs/sites/all/modules/views -u views
git commit -m "Added Views"
I hope this is clear enough to give a basic idea of how this could easily be automated and simplified using drush (which I considered at one point but haven't done yet). "Dog" could be highly opinionated about many (default) decisions: how we handle subtree merges, naming conventions for remotes (modules/module-name, modules/core, and whatnot), and where read-tree (the --prefix target) should put a module, theme, library, core and so on.
I've already mentioned this a bit up there in the thread, but there's a good read in the Pro git book on how to use subtree merges as an alternative to sub-modules. http://progit.org/book/ch6-7.html
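Here's a self-contained variant of the recipe above, with local throwaway repos in place of the drupal.org remotes (all names are examples). It adds the conventional `-s ours` linking merge, which the recipe above skips; that step gives later subtree merges a merge base, so upstream updates can be pulled into the subtree cleanly:

```shell
#!/bin/sh
# Self-contained variant of the subtree-merge recipe above. A local repo
# stands in for git.drupal.org/project/views; all names are examples.
set -e
base=$(mktemp -d)

# Stand-in for the Views repository.
git init -q -b 7.x-3.x "$base/views"
cd "$base/views"
git config user.email dev@example.com
git config user.name dev
echo "name = Views" > views.info
git add views.info
git commit -qm "Views initial commit"

# The project repo, grafting Views under htdocs/sites/all/modules/views.
git init -q -b staging "$base/project"
cd "$base/project"
git config user.email dev@example.com
git config user.name dev
echo "project" > README
git add README
git commit -qm "Initial commit"
git remote add modules/views "$base/views"
git fetch -q modules/views
git checkout -q -b views modules/views/7.x-3.x
git checkout -q staging
# A "-s ours" merge first records the relationship, so that later
# subtree merges have a merge base to work from.
git merge -q -s ours --no-commit --allow-unrelated-histories views
git read-tree --prefix=htdocs/sites/all/modules/views -u views
git commit -qm "Added Views"

# Later: pull an upstream Views update into the subtree.
(cd "$base/views" && echo "description = Filtered lists" >> views.info \
    && git commit -qam "Views update")
git fetch -q modules/views
git checkout -q views
git merge -q --ff-only modules/views/7.x-3.x
git checkout -q staging
git merge -q -X subtree=htdocs/sites/all/modules/views -m "Update Views" views
```

The explicit `-X subtree=<path>` option pins the prefix instead of relying on git's tree-shift guessing, which is safer with a deeply nested module path.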
Again, if I'm missing something please guide me to the light :)
Thanks for all this by the way, I'll be keeping an eye on how things progress.
So this made me at least
So this made me at least revisit the idea of subtrees, as I realized I'd dismissed them too summarily before. And there are some key aspects of it that really are pretty nice... but also some significant drawbacks. Here's my basic pro/con chart:
Pros
Cons
git fetch -n or remote branch trimming. But the bottom line is that there's some pollution of the tag namespace that will occur - Views alone has 63 tags, as of this writing. That would need to be carefully managed.
I feel like I've missed a couple things there, but it gives a sense of the shape of it. Ideally I WOULD like to be able to incorporate subtrees for at least some stuff, but those cons are enough for me to not want to incorporate them right now.
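For the tag pollution specifically, one possible mitigation is marking a contrib remote --no-tags, so its release tags never enter the project repo. A sketch with local repos standing in for drupal.org (all names are examples):

```shell
#!/bin/sh
# Sketch of containing contrib tag pollution: a --no-tags remote keeps a
# module's release tags out of the project repo. Local throwaway repos
# stand in for git.drupal.org; all names are examples.
set -e
base=$(mktemp -d)

# Stand-in for a contrib repo with a release tag.
git init -q -b 7.x-3.x "$base/views"
cd "$base/views"
git config user.email dev@example.com
git config user.name dev
echo "name = Views" > views.info
git add views.info
git commit -qm "Views initial commit"
git tag 7.x-3.0-alpha1   # one of the many release tags

# The project repo: fetch branches, but never auto-follow tags.
git init -q -b master "$base/project"
cd "$base/project"
git remote add --no-tags modules/views "$base/views"
git fetch -q modules/views   # one-off equivalent: git fetch -n modules/views
git tag                      # prints nothing - no tags were imported
```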
Thanks for taking the time to
Thanks for taking the time to give some thought to this and for the comprehensive reply.
I still need to digest some of the points you're mentioning in the "Cons" since I feel I do not grasp all of it yet...
Anyhow, I'd still like to point you to a few things I've found in my research that I feel might be interesting:
There's also an interesting short post here: http://posterous.timocracy.com/git-sub-tree-merging-back-to-the-subtree-for
And a plugin that handles subtree merges and splits: https://github.com/apenwarr/git-subtree
Haven't tested that yet though.