Are you storing your Drupal sites or configs in a version control system? If so, which?

Chris Charlton
48% (112 votes): SVN (Subversion)
6% (14 votes): CVS
8% (18 votes): Other (please comment which)
30% (70 votes): No, but I'd like to!
8% (19 votes): No, I don't store my drupal sites in a version control system of any kind.
Total votes: 233

Comments

Database

nathanpalmer

How does everyone handle the database in source control? With a team of people working from different locations, I find it difficult to keep everyone's database settings up to date.

Nathan

Search no more

boaz_r

Backing up the DB is a challenge I also faced recently.
We use Subversion.
The problem is that Subversion and CVS both use textual diffs and store them. I ran a quick test on a simple site:

  • a mysqldump of it weighed ~1.4 MB
  • I added another user
  • I took another mysqldump of the DB after adding the user
  • the diff between the two dumps was ~450 KB...

As the test shows, simply running mysqldump on the DB and storing the output in Subversion/CVS doesn't work well: the delta is enormous.

I searched the web a bit and found this one-stop page: http://www.petersblog.org/node/959. It explains things well, and some reading of 'man mysqldump' helped too. I took what I needed from it, customized it and ran a quick check: it works like a charm. Now it's just waiting to be implemented... :-)
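Here is a likely cause of that huge delta. By default, mysqldump (through --opt, which enables --extended-insert) writes each table's rows as a few very long multi-row INSERT lines, so adding one user rewrites whole lines of the dump. Below is a minimal sketch in PHP of a more diff-friendly command. It is not necessarily what the linked page recommends. It puts one row on each line, keeps the row order stable and drops the comments that carry dates and versions. The function name and the example credentials are hypothetical.

```php
<?php
// Hypothetical helper (not from the original post): builds a mysqldump
// command whose output diffs well in a line-based VCS such as Subversion.
function vcs_friendly_dump_command($host, $user, $pass, $dbName, $outFile) {
  $parts = array(
    'mysqldump',
    '-h', escapeshellarg($host),
    '-u', escapeshellarg($user),
    '--password=' . escapeshellarg($pass),
    '--skip-extended-insert', // one INSERT statement (one line) per row
    '--order-by-primary',     // same row order on every dump
    '--skip-comments',        // no header/footer comments with dates or versions
    escapeshellarg($dbName),
  );
  return implode(' ', $parts) . ' > ' . escapeshellarg($outFile);
}

// Example: print the command instead of running it.
echo vcs_friendly_dump_command('localhost', 'drupal', 's3cret', 'mysite', '/tmp/mysite.sql'), "\n";
```

Cache and session tables will still change on almost every dump. mysqldump's --ignore-table=database.table option, repeated once per table, can leave them out.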

Boaz.

PHP therapist
Linnovate


Look here for my reply

boaz_r

Oops, wrong place. Just see my previous post...


Good Question!

Thanks!

nathanpalmer

Thanks,

I never really thought about having a custom module for those types of changes. The only difficult part of that would be duplicating administrative settings that were made through the web interface. We would have to find out what changes they are making, or run some type of SQL diff utility after changing settings, in order to generate the correct update scripts.
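For settings stored in the variable table, you can approximate the "SQL diff" step by comparing two snapshots of the table. Take one before and one after changing settings through the web interface, then turn the difference into PHP for an update function in the custom module. The sketch below assumes each snapshot has already been loaded as name => unserialized value. The helper name is hypothetical. Settings that a module keeps in its own tables are not covered. variable_set() and variable_del() are Drupal's own functions for writing and deleting these variables.

```php
<?php
// Hypothetical helper: compares two snapshots of the variable table
// (name => unserialized value) and returns PHP statements replaying the changes.
function variable_changes_to_php(array $before, array $after) {
  $lines = array();
  foreach ($after as $name => $value) {
    // New or changed variables become variable_set() calls.
    if (!array_key_exists($name, $before) || $before[$name] !== $value) {
      $lines[] = 'variable_set(' . var_export($name, true) . ', '
        . var_export($value, true) . ');';
    }
  }
  foreach (array_keys(array_diff_key($before, $after)) as $name) {
    // Variables that disappeared become variable_del() calls.
    $lines[] = 'variable_del(' . var_export($name, true) . ');';
  }
  return $lines;
}

$before = array('site_name' => 'Dev site', 'cache' => 0, 'old_flag' => 1);
$after  = array('site_name' => 'Dev site', 'cache' => 1, 'theme_default' => 'garland');
print implode("\n", variable_changes_to_php($before, $after)) . "\n";
// variable_set('cache', 1);
// variable_set('theme_default', 'garland');
// variable_del('old_flag');
```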

Nathan

Config settings in the database

patricio.keilty

We usually struggle to replicate config settings from the DEV environment to STG; we have to do it manually. Without knowing a module's internals, it is usually not clear which tables hold its settings.
An idea came to mind. It may be quite a simplistic/naive approach, but it's worth a try. What if we found a simple way to keep the config data apart from the content data? Then we would just need to copy the appropriate data between environments. How about requiring every Drupal module to keep its settings in a separate set of tables with a well-known name prefix, e.g. {settings_module}? Then we would just need to replicate the settings_* tables from DEV to STG, and that's all.
Do you think it is possible?

--p

If a module uses drupal

nathanpalmer

If a module uses Drupal variables, it stores them in the settings table, so that's close to what you are talking about. But of course, when building a website there are also times when you need to create actual content that should persist between developers. Certain static pages that are actually nodes, and that always need to exist, should also be copied between dev environments.

Nathan

not all modules

greggles

Some modules can store their data in the variable table (not "settings"), but many modules have data structures that cannot or should not be stored there (e.g. for performance reasons).

I like the idea of a [modulename]_settings[_typeofsettings] naming scheme, where the [_typeofsettings] part is optional: a module that needs multiple tables could use it. Of course, getting modules to standardize on this will require quite a lot of patches to modules to get the

--
Knaddisons Denver Life | mmm Chipotle Log | The Big Spanish Tour
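If modules followed the proposed [modulename]_settings[_typeofsettings] convention, picking out the tables to copy from DEV to STG would reduce to a name match. The sketch below assumes that convention, which is only a proposal in this thread, not a Drupal standard. The helper name and sample table names are hypothetical. It checks each table against the list of installed modules, so a table that merely contains "_settings" somewhere in its name is not picked up by accident.

```php
<?php
// Hypothetical helper: returns the tables that follow the proposed
// [modulename]_settings[_typeofsettings] naming convention.
function settings_tables(array $tables, array $modules) {
  $found = array();
  foreach ($tables as $table) {
    foreach ($modules as $module) {
      // e.g. "views_settings" or "panels_settings_layouts"
      $pattern = '/^' . preg_quote($module, '/') . '_settings(_[a-z0-9_]+)?$/';
      if (preg_match($pattern, $table)) {
        $found[] = $table;
        break;
      }
    }
  }
  return $found;
}

$tables = array('node', 'views_settings', 'panels_settings_layouts', 'users', 'mymod_settingsx');
print implode(' ', settings_tables($tables, array('views', 'panels', 'mymod'))) . "\n";
// views_settings panels_settings_layouts
```

The resulting list could then be handed to mysqldump to copy just those tables between environments.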

config vs content

ryan_courtnage

I've recently started using a little script that does the following:

  • exports the structure & data of "config" tables
  • exports the structure (no data) of "content" tables
  • deals with the sequences table appropriately

It gives me the ability to take a snapshot of my drupal application on the Dev server, and use this snapshot to create a new instance of the application without moving over the Dev content. Currently I'm just using it for destructive builds, but I wrote it with non-destructive builds in mind.

Two considerations:
- static content (i.e. aboutus, faq, etc.) lives in code. I assume that all 'node' data is user content.
- if you install any new modules, you need to look at their .install files to see whether they create a new table for settings. If so, it must be added to the $appTables array in my function. This is usually easy to determine, as "content" tables typically have a 'nid', 'vid' or 'uid' field. It's on my to-do list to semi-automate the identification of new settings tables by relying on this (with some exceptions).

<?php
function smartdump() {
  // Tables identified as containing application data.
  // If new modules are added to Drupal, their tables may need to be added.
  $appTables = array(
    'access',
    'blocks',
    'blocks_roles',
    'boxes',
    'contact',
    'filters',
    'filter_formats',
    'menu',
    'node_type',
    'panels_area',
    'panels_info',
    'permission',
    'profile_fields',
    'role',
    'system',
    'term_data',
    'term_hierarchy',
    'term_relation',
    'term_synonym',
    'url_alias',
    'variable',
    'view_argument',
    'view_exposed_filter',
    'view_filter',
    'view_sort',
    'view_tablefield',
    'view_view',
    'vocabulary',
    'vocabulary_node_types',
  );

  // Determine the connection details from $db_url (mysql://user:pass@host/dbname).
  // parse_url() splits it the same way Drupal's own db_connect() does, which is
  // more reliable than the previous substr()/strpos() arithmetic.
  global $db_url;
  $url = parse_url($db_url);
  $db_user = urldecode($url['user']);
  $db_pass = isset($url['pass']) ? urldecode($url['pass']) : '';
  $db_host = urldecode($url['host']);
  $db_name = substr(urldecode($url['path']), 1);
  // Shared, shell-escaped connection arguments for mysqldump.
  $conn = '-h ' . escapeshellarg($db_host)
        . (isset($url['port']) ? ' -P ' . (int) $url['port'] : '')
        . ' -u ' . escapeshellarg($db_user)
        . ' --password=' . escapeshellarg($db_pass);

  // Create an array of all available tables.
  $tables = array();
  $qobj = db_query('SHOW TABLES');
  $key = "Tables_in_{$db_name}";
  while ($result = db_fetch_object($qobj)) {
    if ($result->$key != 'sequences') { // we'll deal with sequences later
      $tables[] = $result->$key;
    }
  }

  // Only dump app tables that actually exist (e.g. panels may not be
  // installed); mysqldump aborts when asked for a missing table.
  $appTables = array_intersect($appTables, $tables);

  // Diff against app tables: what's left is user tables.
  $userTables = array_diff($tables, $appTables);

  // Create dump of application tables: structure & data.
  $dumpFile = "/tmp/{$db_name}.app.sql";
  $strAppTables = implode(' ', array_map('escapeshellarg', $appTables));
  system("mysqldump $conn " . escapeshellarg($db_name) . " $strAppTables > " . escapeshellarg($dumpFile));

  // Create dump of user tables: structure only.
  $dumpFile = "/tmp/{$db_name}.user.sql";
  $strUserTables = implode(' ', array_map('escapeshellarg', $userTables));
  system("mysqldump $conn --no-data " . escapeshellarg($db_name) . " $strUserTables > " . escapeshellarg($dumpFile));

  // Deal with sequences.
  // Be careful NOT to drop sequences, as existing seq on userTables must be maintained.
  $seqSql = "CREATE TABLE IF NOT EXISTS `sequences` (
  `name` varchar(255) NOT NULL default '',
  `id` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;\n\n";

  // Find out what sequences we currently have.
  $qobj = db_query('SELECT * FROM sequences');
  while ($result = db_fetch_object($qobj)) {
    // Figure out the target table name from the sequence name
    // (we rely on sequence names having the format tablename_keycolumn).
    $tablename = substr($result->name, 0, strrpos($result->name, '_'));
    // For each appTable, look for an existing sequence. If it exists, include delete/insert in the dump.
    if (in_array($tablename, $appTables)) {
      $seqSql .= "DELETE FROM sequences WHERE name = '{$result->name}';\n";
      $seqSql .= "INSERT INTO sequences VALUES ('{$result->name}', {$result->id});\n\n";
    }
    // For each userTable, ignore the sequence.
  }

  // Write out the sequences SQL dump.
  $seqFile = "/tmp/{$db_name}.seq.sql";
  $fh = fopen($seqFile, 'w') or die("can't open $seqFile for writing");
  fwrite($fh, $seqSql);
  fclose($fh);

  // TODO: if not a destructive build, clear out cache_* tables.
  // TODO: if not a destructive build, on IMPORT alert admin if a userTable
  //   structure diff exists, or if the table counts differ.
  //   Idea: do the same mysqldump on staging (above), and diff it against the dump on dev.

  echo "Finished. Dumps written to /tmp/";
}
?>
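Ryan plans to semi-automate spotting new settings tables by looking for nid, vid or uid columns. That could look roughly like the sketch below. It assumes the column names for each table have already been collected (for example with MySQL's SHOW COLUMNS). The helper name and sample data are hypothetical. The exception list matters: in Drupal's taxonomy tables (term_data, vocabulary, vocabulary_node_types), vid is the vocabulary ID, not a node revision ID. The heuristic alone would therefore misfile those tables as content.

```php
<?php
// Hypothetical helper: splits tables into "content" and "config" using the
// heuristic that content tables carry an nid, vid or uid column. Tables listed
// in $forceConfig are treated as config regardless of their columns.
function classify_tables(array $tableColumns, array $forceConfig) {
  $contentKeys = array('nid', 'vid', 'uid');
  $result = array('config' => array(), 'content' => array());
  foreach ($tableColumns as $table => $columns) {
    $looksLikeContent = count(array_intersect($contentKeys, $columns)) > 0;
    if ($looksLikeContent && !in_array($table, $forceConfig)) {
      $result['content'][] = $table;
    }
    else {
      $result['config'][] = $table;
    }
  }
  return $result;
}

$tables = array(
  'node'      => array('nid', 'vid', 'type', 'title', 'uid'),
  'comments'  => array('cid', 'pid', 'nid', 'uid', 'comment'),
  'variable'  => array('name', 'value'),
  'term_data' => array('tid', 'vid', 'name', 'description', 'weight'),
);
$split = classify_tables($tables, array('term_data', 'vocabulary', 'vocabulary_node_types'));
print 'config: ' . implode(', ', $split['config']) . "\n";   // config: variable, term_data
print 'content: ' . implode(', ', $split['content']) . "\n"; // content: node, comments
```

The config list is what would feed $appTables; anything the heuristic gets wrong goes into the exception list.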

Git... but definitely not

Shiny

Git... but definitely not using VCS for deployment - that'd be crazy.

How so?

Chris Charlton

Please share your thoughts, Shiny, thanks. :)

Chris Charlton, Author & Drupal Community Leader, Enterprise Level Consultant

I teach you how to build Drupal Themes http://tinyurl.com/theme-drupal and provide add-on software at http://xtnd.us

reason one: http request to

Shiny

reason one: an HTTP request for the VCS folder (CVS, .svn, .git, _darcs) can retrieve lists of files on the file system. Imagine someone getting a listing of everything in your "files" folder by requesting the correct file under .git/.

more reason one: with a distributed VCS (git, darcs, etc.), this can be every version you've ever had, including commit comments. Some of those comments aren't exactly polite; coders don't realize the public can read their commit comments.

reason two: conflicts during a merge result in temporarily broken files (they won't parse as PHP), and a site that stays broken until you fix it.

reason three: it only does half the deployment; you still have to set up cron jobs, run database patches, etc.

reason four: that's not what VCSs are built for. Most (every?) distro has a packaging system (debs, rpms) that was built for deploying web apps, and in standard ways. Use the strengths of your chosen distro, and the standardisation that's already been thought out by other really smart people who know how to maintain packages.

more reason four: putting a deb up on an apt repo means I know the same version of Drupal is being deployed to all my Drupal installations (eventually), on the next apt-get upgrade.

I'm sure there's more.

That said, there probably are some situations where VCS deployment makes sense, so I retract my "crazy" adjective.

reason 1: google reveals the

samhassell

reason 1: Google reveals the following .htaccess rule to prevent requests to .svn directories:

SetEnvIfNoCase Request_URI "\.svn" ban
Deny from env=ban

more reason 1: Don't most VCSs have authentication mechanisms to control this? I never let anyone have anon access to my private SVN server.

reason 2: You probably shouldn't be merging code onto production, you should just be updating it.

reason 3 & 4: valid. I guess reason 4 gives a way around this: the package management system can control the whole thing. I haven't really got into the guts of packaging for apt yet; it's something I've been wanting to do for a while. Not a bad idea if it can be done in a reasonable amount of time. Does updating a package on production require all the files to be reinstalled, as opposed to just the ones that changed?
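The .htaccess rule blocks requests. A separate check can confirm before going live that no VCS metadata directories were deployed in the first place. This addresses Shiny's reason one, and it makes "one more thing to manage" a single command. The sketch below covers the four directory names from reason one. The function name is hypothetical, and it only reports what it finds; it does not delete anything.

```php
<?php
// Hypothetical pre-deployment check: lists VCS metadata directories
// (CVS, .svn, .git, _darcs) found anywhere under $root.
function find_vcs_dirs($root) {
  $names = array('CVS', '.svn', '.git', '_darcs');
  $found = array();
  $it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::SELF_FIRST // yield directories, not just files
  );
  foreach ($it as $path => $info) {
    if ($info->isDir() && in_array($info->getFilename(), $names, true)) {
      $found[] = $path;
    }
  }
  sort($found);
  return $found;
}

// Usage from the command line: php check_vcs.php /var/www/mysite
if (PHP_SAPI === 'cli' && isset($argv[1])) {
  foreach (find_vcs_dirs($argv[1]) as $dir) {
    echo "VCS metadata found: $dir\n";
  }
}
```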

Does updating a package on

Shiny

Does updating a package on production require all the files to be reinstalled as opposed to just the ones that change?

not sure - but if they're the same it doesn't matter.

Naaa...

marquardt

Reason one: There's no need to roll out the .svn directories (or other directories related to the VCS); just export the version you want. That avoids the VCS-specific directories altogether (or use the .htaccess rule).

More reason one: Not every VCS stores commit messages in the sources / version-controlled items (there's no way to do this with Subversion, for example, and for good reason), and I'm sure you can at least disable this feature for all the other ones. With CVS, you must consciously put in a dedicated keyword ($Log$) to get them. So just don't.

Reason two: That's a policy issue. One solution is to use a version-controlled source tree for development only, always tag properly tested versions, and only then roll out (e.g. export) those tagged versions on production systems. In this scenario, files will always be overwritten, so there are never any merge conflicts. The drawback is that you need to manually remove files that disappeared from the development version, or remove everything from the production server and only then roll out from the repository.
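The "export only tagged versions" policy can be scripted so that each rollout is one command, and a rollback is the same command with the previous tag. svn export writes an unversioned copy of the tree, so no .svn directories end up on the server. Exporting each tag into its own fresh directory also avoids the leftover-file drawback, because nothing from the previous version is carried over. The sketch below assumes the conventional Subversion trunk/tags/branches layout. The repository URL, directory names and function name are hypothetical.

```php
<?php
// Hypothetical helper: builds the command that exports a tagged release into
// its own directory. Assumes the usual trunk/tags/branches repository layout.
function export_tag_command($repoUrl, $tag, $releasesDir) {
  // Only accept simple tag names such as "1.2" or "release-2008-01-15";
  // this also rejects things like "../x".
  if (!preg_match('/^[A-Za-z0-9][A-Za-z0-9._-]*$/', $tag)) {
    throw new InvalidArgumentException("Unexpected tag name: $tag");
  }
  $source = rtrim($repoUrl, '/') . '/tags/' . $tag;
  $target = rtrim($releasesDir, '/') . '/' . $tag;
  return 'svn export ' . escapeshellarg($source) . ' ' . escapeshellarg($target);
}

echo export_tag_command('http://svn.example.com/mysite', '1.2', '/var/www/releases'), "\n";
// svn export 'http://svn.example.com/mysite/tags/1.2' '/var/www/releases/1.2'
```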

If you do use updates instead of exports, there also shouldn't be any merge conflicts; after all, the production server shouldn't be touched at all, right? So the update will always go smoothly. If you do get conflicts, that tells you something very important: someone messed around with the production server without telling you. By running the equivalent of 'svn status' and 'svn diff' before updating, you can even find out what that someone did and react accordingly, whether that means rolling back those particular changes or incorporating them into the development version if they make sense. In effect, the VCS gives you a unique way to monitor what's going on with the installed version of your site, something a package-based solution cannot.

Reason three: Yeah, but the same goes for a package-based solution. If you build rpms or debs, you'll include scripts that do that for you. Put the same scripts in the version-controlled source tree, and handle the export through a script that also runs the required support scripts. Or do the latter manually, which is a bit more convenient anyway: if you are just updating the source code of some modules but already have cron jobs set up, there's no need to install another cron job, and the same goes for database structure and content. In a package-based deployment scenario, you would have to change the relevant scripts / setup files (like the .spec in the case of rpms) for each new rollout. That's hardly the intended use of a package system. It also means you have to do additional testing each time to make sure that the automated things the package installer does don't break anything. Imagine you cleared the entire database last time and forgot to remove that functionality this time...

There's no way to avoid downtime if something goes wrong; in a VCS-based scenario, you can at least try to roll out the previous version immediately, and if database changes don't happen automatically, you might be lucky and this works. Even better: never upgrade without having made a complete backup first, and have some practical experience in actually using that backup.

Reason four: Packaging systems were invented to install mostly binary software applications - not necessarily web (only) apps. They help in compiling and setting up those. They also run standard tasks before (removing old stuff, saving old configuration files) and after the actual install (reloading the dynamic linker information, setting up new configuration options). Moreover, they take care of dependencies, making sure that other required packages are indeed installed on the target system.

Little of that is needed for a drupal installation, I would think; all you need is that a defined set of files is installed in a defined directory. If automated tasks need to be run, the shell scripts are the same for both package and VCS based scenarios.

Using debs or rpms for the rollout does indeed make sure that you have the same version everywhere (if you roll out the same rpm / deb). Tagging the version of the Drupal site you want to roll out in a meaningful way and following the policy of rolling out only tagged versions does the same. An advantage of a VCS-based system is that you can go back to previous versions without rebuilding any packages, just by referring to the correct tag on the command line; so it's probably easier and quicker in most cases.

You can also work with branches, which are probably useful when you have similar setups, say for different clients: you just create a branch from your standard version and make client-specific modifications on that branch. At least in principle, you can still merge improvements from your development (or main, or default) setup into the clients' versions, etc. In a package-based scenario, you'll have to set up a completely new package. Rolling out from a particular branch is much easier (a command-line switch only) than setting up a completely new package.

And finally, I'm not sure if I want automated upgrades of a web site... In most cases, I'd prefer to be in control when this happens. Imagine your package is broken, but your client runs an automated apt-get upgrade each night, this time on a Friday night...

Don't get me wrong: package-based installs are great, especially when rolling out the same thing on many computers. But then, how often do you install the same web site on more than just a few machines? If you don't need the compilation / build support that rpm or deb offer, a VCS with tagging policies applied will give you everything you need as well, with the benefit that you don't have to learn how to use yet another complex tool. If there are tasks you have to run before or after installation that change slightly with every rollout, you'll have to write (and test!) scripts for them anyway; there's really no difference between a VCS and a package-based scenario. You can also automate the rollout and execution of the relevant scripts with a shell script; in the end, that's precisely what the package system does for you. So just write that little shell script as well; at least you don't have to learn another syntax.

There are more things that cause additional work with package managers. For example, if you add or delete files in your software, you'll have to specify that in the (say) rpm spec as well; you basically maintain an independent list of which files belong to your package. If you forget to track one change, your package is broken. There's also the issue of tracking which version of the package build scripts corresponds to which version of your website. What do you do? Use a VCS for development, and in addition maintain packages? Why do the same work twice if you don't get any benefit from it?

Just my thoughts,

Christian.

Edit: Fixed a few typos...

good thoughts.. i was mostly

Shiny

good thoughts..

I was mostly referring to folks deploying with the equivalent of "cvs checkout" and later upgrading via "cvs update". I've seen it go wrong so many times. I've seen passwords and sensitive info extracted thanks to hints in VCS files. Sure, you can deny access to these, but it becomes one more thing for already busy people to manage.

If you're managing 1, 2 or even 20 sites, maybe, but at some stage you need a tool to handle it all for you. We're only human.

We have debs create the whole virtual host, and we have perl/bash scripts generate the deb.

Branching happens anyway; in fact, it's the same script that builds the deb.

You want to swat 50 all at once

earnie

That is nightmarish for those of us who have to handhold the client while development resolves the issues. You must have preproduction testing for each scenario before attempting a push to production or you are doomed to have catastrophic issues. As for deploying from CVS into production, you would only do that from a marked tested release tag and never from a branch tag.

You must have preproduction

Shiny

You must have preproduction testing for each scenario before attempting a push to production or you are doomed to have catastrophic issues.

I'm not sure how that relates to the discussion. Deployment via deb doesn't mean you skip deploying to a staging server first.

Re: You must have preproduction

earnie

I work for a large company with several hundred applications and a few thousand production servers. It is my job to ensure that applications and servers are up and running 99.98% of the time. Deployments from VCS really make my hair stand on end. I prefer the "tarball the end-to-end testing server" method over a VCS method of deployment, as I have seen fewer problems with these deploys.

Forgive my ignorance, but what do you mean by "deb"? Searching the acronym databases, I can't find anything of value.

I've also used the tar ball

Shiny

I've also used the tar ball method.

debs are Debian packages,
much like Red Hat has rpms and Solaris has Solaris packages.

For a Drupal site deployment, a package (such as a deb or rpm) does the same job as a tarball deployment, but it also sets up cron jobs and creates users, databases and virtual hosts. It's smart enough to know the difference between the first deployment and subsequent updates.

A package based install

Shiny

A package based install allows rollback to the previous version if you need.

I also automate backups (meaning a single tarball of the document root before any change is made), just for peace of mind.
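A pre-change backup of the document root, as described above, can be reduced to one generated command. The sketch below builds a timestamped tar command. The paths and the function name are hypothetical, and the timestamp is passed in so the result is predictable. Running the command (and checking that it succeeded before deploying) is left to the caller.

```php
<?php
// Hypothetical helper: builds a tar command that archives a document root into
// a timestamped tarball before a deployment.
function backup_command($docroot, $backupDir, $timestamp) {
  $docroot = rtrim($docroot, '/');
  $archive = rtrim($backupDir, '/') . '/' . basename($docroot) . '-'
    . gmdate('Ymd-His', $timestamp) . '.tar.gz';
  // -C switches to the parent directory so the archive holds relative paths.
  return 'tar -czf ' . escapeshellarg($archive)
    . ' -C ' . escapeshellarg(dirname($docroot))
    . ' ' . escapeshellarg(basename($docroot));
}

echo backup_command('/var/www/mysite', '/var/backups', time()), "\n";
```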

SVN & Module Settings Import / Export.

samhassell

I'm a sole developer most of the time, which simplifies the process. I use SVN for development code, then check out and set up the code on the server. From this point I develop new modules on a basic install of Drupal 5, then sync the production server with testing and make sure the new module doesn't screw anything up.

Once it's tested, I check out the new module from SVN on the live site, then enable, configure and test it. It's more or less the process eli describes here: http://groups.drupal.org/node/5792#comment-16920.

I tend to keep a backup of the database in SVN too, but haven't gotten around to automating that yet. It's been on the to-do list for ages, though.

@patricio: what about adding module_export() and module_import() functions to module.install? Then the module developer could define what data needs to be dumped, and we could provide an interface for the admin to upload the settings (SQL) file. From there it's a bit easier to create a sync process.
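The export/import proposal could work like other Drupal hooks: core calls a function named after each module, if that function exists. The sketch below shows the export side only, and its names are assumptions. The hook is called hook_settings_export() here rather than module_export(). The example module names and the JSON output format are hypothetical. Inside Drupal, module_implements() and module_invoke() would replace the function_exists() loop.

```php
<?php
// Two example modules implementing the hypothetical settings export hook.
function examplepoll_settings_export() {
  return array('examplepoll_block_count' => 5);
}
function exampleblog_settings_export() {
  return array('exampleblog_per_page' => 10, 'exampleblog_rss' => true);
}

// Hypothetical collector: gathers settings from every module that implements
// the export hook and returns them keyed by module name.
function collect_settings(array $modules) {
  $export = array();
  foreach ($modules as $module) {
    $function = $module . '_settings_export';
    if (function_exists($function)) {
      $export[$module] = $function();
    }
  }
  return $export;
}

// "system" has no implementation here, so it is skipped.
echo json_encode(collect_settings(array('examplepoll', 'system', 'exampleblog'))), "\n";
```

An import side would mirror this by decoding the uploaded file and calling a matching, equally hypothetical {module}_settings_import($data) for each module key.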

Following up my Other vote

earnie

Currently I am too small to bother with storing Drupal site configuration in a VCS. I am managing via a tarball backup of the site web folders and downloading that to an off site system.

Bazaar

HorsePunchKid

I'm using Bazaar for new projects, though I have in the past used Subversion. What's nice about bzr is that I don't feel obligated to plan out the whole repository layout before I get started. I just go to where I'm interested in tracking changes and do bzr init, and I'm ready to start versioning. Low barrier to use means higher likelihood I'll use it. :)

-- Steven N. Severinghaus sns@severinghaus.org


I also use Bazaar. I liked

axel

I also use Bazaar. I like its decentralized model and easy branching. Some time ago I used darcs, but Bazaar seems more actively developed. I use VCS for documentation and code, not for the database. For the database I take snapshots at some critical points: just full SQL dumps (not the best way, I know).

Russian Drupal community www.drupal.ru

Good CVS with Drupal is painful

elagorn

I am using Mercurial, which is similar to Bazaar, but I am quickly discovering that on its own it is not really a good versioning solution, because Drupal uses the DATABASE to handle its configuration. I am currently investigating SQLDiff technology. Being able to relate MODULES to DATABASE TABLES, and MODULE settings to the relevant configuration TABLES at certain points in time, is something I really want.

especially since practically everything is in the DB

kanani

I've also been working solo on most of my projects, but on this latest project we have 3 devs, and it's getting to be a real pain to make sure all the mysqldump files are in sync. I'm learning the ins and outs of SVN, though.

Perhaps the dba module

earnie

I install the dba module on my systems (especially development and testing). It allows for paging through the data and taking backups via cron.php. There are two, maybe even three, types of tables: 1) the administrative and setup tables, 2) the user data tables (and I don't mean the user module tables), and 3) the system data tables, e.g. cache. It would be nice if the dba module (or even the devel module) knew which tables belonged to the administrative set and let us dump those for import as a set. Then a cron event could dump the tables, and you could sync between the different databases or with your VCS.

git

marvil07

I don't have as many reasons as Shiny :D, but I like the distributed idea.

I use many git repositories, not just one big Drupal repository. I mean, I only use a repository if I'm making custom changes to a module or (especially) if I'm writing a module from scratch and I'm not working alone.

Obviously, I only track the code. I know a lot of the information is in the database, but I haven't found a good method to make it part of the repository (also related to my "many repositories instead of one").

dgb

scor

I've been using Drupal Git Backup (dgb) for a couple of years now, and it's working pretty well for me. It makes the DB dump more VCS-friendly, and with the power and speed of git, I find it's a good combo...
