Are you storing your Drupal sites or configs in a version control system? If so, which?

Posted by Chris Charlton on September 17, 2007 at 9:12pm

SVN (Subversion)

48% (112 votes)

CVS

6% (14 votes)

Other (please comment which)

8% (18 votes)

No, but I'd like to!

30% (70 votes)

No, I don't store my drupal sites in a version control system of any kind.

9% (20 votes)

Total votes: 234

Comments

Database

Posted by nathanpalmer on September 17, 2007 at 9:43pm

How does everyone handle the database in source control? I find it difficult to have a team of people working out of different locations and keep everyone's database settings up to date.

Nathan

Search no more

Posted by boaz_r on September 19, 2007 at 12:23pm

Backing up the DB is a challenge I also faced recently.
We use subversion.
Now, the problem is that subversion and CVS both compute and store textual diffs. I made a quick test on a simple site:

mysqldump of it weighed ~1.4MB
added another user
mysqldump of the DB, after the user addition above
diff between the two DBs is ~450KB...

As seen above, simple usage of mysqldump on the DB and storing it in subversion/cvs is not good - the delta is enormous.

I searched the web a bit and found a one-stop page: http://www.petersblog.org/node/959 (it is well explained there, and some 'man mysqldump' reading helped as well). I took what I needed from it, customized it, and ran a quick check: works like a charm. Now it waits for implementation... :-)
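For readers who cannot reach the linked page, here is a minimal sketch of one common way to shrink the delta. It is an assumption on my part, not necessarily what that page recommends. By default mysqldump writes all rows of a table into one long multi-row INSERT line, so a single new row changes the whole line. The function name and its parameters are hypothetical; the flags are standard mysqldump options.

```php
<?php
// Sketch: build a mysqldump command whose output diffs well under SVN/CVS.
// example_diff_friendly_dump_command() is a hypothetical helper name.
function example_diff_friendly_dump_command($db_name, $db_user, $db_host, $out_file) {
  $options = array(
    '--skip-extended-insert', // one INSERT statement per row: one changed row = one changed line
    '--order-by-primary',     // dump rows in primary-key order for a stable line order
    '--skip-comments',        // omit informational comments such as server version and host
  );
  return 'mysqldump ' . implode(' ', $options)
    . ' -h ' . escapeshellarg($db_host)
    . ' -u ' . escapeshellarg($db_user)
    . ' -p ' // a bare -p makes mysqldump prompt for the password
    . escapeshellarg($db_name)
    . ' > ' . escapeshellarg($out_file);
}

// Print the command instead of running it, so it can be inspected first.
echo example_diff_friendly_dump_command('drupal', 'dbuser', 'localhost', '/tmp/drupal.sql'), "\n";
```

The trade-off is a larger dump file and a slower import, in exchange for diffs that grow roughly with the number of changed rows.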

Boaz.

PHP therapist
Linnovate

Boaz
PHP therapist

Look here for my reply

Posted by boaz_r on September 19, 2007 at 12:25pm

oops. wrong place. just see my previous post...

Boaz
PHP therapist

Good Question!

Posted by ryan_courtnage on September 17, 2007 at 9:55pm

http://groups.drupal.org/node/5792

Thanks!

Posted by nathanpalmer on September 17, 2007 at 10:49pm

Thanks,

I never really thought about having a custom module for those types of changes. The only difficult part to that would be duplicating administrative settings that were made through the web interface. We would have to find out what changes they are making or run some type of SQL diff utility after changing settings in order to generate the correct update scripts.

Nathan

Config settings in the database

Posted by patricio.keilty on September 18, 2007 at 12:24am

We usually struggle to replicate config settings from the DEV environment to STG; we have to do that manually. Without knowing a module's internals, it is usually not clear which tables are involved with its settings.
An idea came to my mind; it may be quite a simplistic/naive approach, but it's worth a try. What if we find a simple way to keep the config data apart from the content data? Then we would just need to copy the appropriate data between environments. How about requiring that every Drupal module provide a separate set of settings tables using some well-known prefix for their names, e.g. {settings_module}? Then we would just need to replicate the settings_* tables from DEV to STG, and that's all.
Do you think it is possible?

--p

If a module uses drupal

Posted by nathanpalmer on September 18, 2007 at 2:44am

If a module uses drupal variables it stores them in the settings table, so that's close to what you are talking about. But of course there are also times when you are creating a website that you also need to create actual content that should persist between developers. Certain static pages that are actually nodes that need to always exist should also be copied between dev environments.

Nathan

not all modules

Posted by greggles on September 19, 2007 at 1:41pm

Some modules can store their data in the variable table (not "settings"), but many modules have data structures that cannot or should not be stored there (e.g. for performance reasons).

I like the solution of a [modulename]_settings[_typeofsettings] table, where the [_typeofsettings] item is optional: if a module needs multiple tables, it could use that. Of course, getting modules to standardize on this will require quite a lot of patches to modules to get the

--
Knaddisons Denver Life | mmm Chipotle Log | The Big Spanish Tour

knaddison blog | Morris Animal Foundation

config vs content

Posted by ryan_courtnage on October 29, 2007 at 11:50pm

I've recently started using a little script that does the following:

exports the structure & data of "config" tables
exports the structure (no data) of "content" tables
deals with the sequences table appropriately

It gives me the ability to take a snapshot of my drupal application on the Dev server, and use this snapshot to create a new instance of the application without moving over the Dev content. Currently I'm just using it for destructive builds, but I wrote it with non-destructive builds in mind.

2 considerations:
- static content (ie: aboutus, faq, etc) lives in code. I assume that all 'node' data is user content.
- if installing any new modules, you need to look at their .install files to see if they create a new table for settings. If so, it must be added to the $appTables array in my function. It's usually pretty easy to determine this, as "content" tables typically have a 'nid' or 'vid' or 'uid' field. It's on my todo list to semi-automate the identification of new settings tables by relying on this (with some exceptions).

<?php
function smartdump() {
  // Tables identified as containing application data.  
  // If new modules are added to Drupal, their tables may need to be added
  $appTables = array(
    'access',
    'blocks',
    'blocks_roles',
    'boxes',
    'contact',
    'filters',
    'filter_formats',
    'menu',
    'node_type',
    'panels_area',
    'panels_info',
    'permission',
    'profile_fields',
    'role',
    'system',
    'term_data',
    'term_hierarchy',
    'term_relation',
    'term_synonym',
    'url_alias',
    'variable',
    'view_argument',
    'view_exposed_filter',
    'view_filter',
    'view_sort',
    'view_tablefield',
    'view_view',
    'vocabulary',
    'vocabulary_node_types'
  );

  // determine db credentials by parsing $db_url, e.g. mysql://user:pass@host/dbname
  // (parse_url() replaces the hand-rolled substr() offsets, which break for most URLs)
  global $db_url;
  $url = parse_url(is_array($db_url) ? $db_url['default'] : $db_url);
  $db_user = urldecode($url['user']);
  $db_pass = isset($url['pass']) ? urldecode($url['pass']) : '';
  $db_host = urldecode($url['host']);
  $db_name = substr(urldecode($url['path']), 1); // strip the leading "/"
  //print "$db_user $db_pass $db_name $db_host";

  // create an array of all available tables
  $tables = array();
  $qobj = db_query('SHOW TABLES');
  $key = "Tables_in_{$db_name}";
  while ($result = db_fetch_object($qobj)) {
    // we'll deal with sequences later
    if ($result->$key != 'sequences') {
      $tables[] = $result->$key;
    }
  }

  // diff against app tables - what's left is user tables
  $userTables = array_diff($tables, $appTables);
  
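  // Create dump of application tables - structure & data
  // (shell arguments are escaped in case the credentials contain special characters)
  $dumpFile = "/tmp/" . $db_name . ".app.sql";
  $strAppTables = implode(" ", $appTables);
  $command = "mysqldump -h " . escapeshellarg($db_host) . " -u " . escapeshellarg($db_user) . " --password=" . escapeshellarg($db_pass) . " " . escapeshellarg($db_name) . " $strAppTables > " . escapeshellarg($dumpFile);
  system($command);
  
  // Create dump of user tables - structure only
  $dumpFile = "/tmp/" . $db_name . ".user.sql";
  $strUserTables = implode(" ", $userTables);
  $command = "mysqldump -h " . escapeshellarg($db_host) . " -u " . escapeshellarg($db_user) . " --password=" . escapeshellarg($db_pass) . " --no-data " . escapeshellarg($db_name) . " $strUserTables > " . escapeshellarg($dumpFile);
  system($command);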
    
  // Deal with sequences
  // Be careful NOT to drop sequences, as existing seq on userTables must be maintained.
  $seqSql = "CREATE TABLE IF NOT EXISTS `sequences` (
  `name` varchar(255) NOT NULL default '',
  `id` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;\n\n";
  
  // find out what sequences we currently have
  $qobj = db_query('SELECT * FROM sequences');
  while($result = db_fetch_object($qobj)) {
      // figure out the target table name from the sequence name
      // (we are relying on the fact that sequence names are in the format tablename_keycolumn)
      $tablename = substr($result->name,0, strrpos($result->name,'_'));
      // For each appTable, look for an existing sequence.  If it exists include delete/insert in dump
      if(in_array($tablename, $appTables)) {
        $seqSql .= "DELETE FROM sequences WHERE name = '{$result->name}';\n";
        $seqSql .= "INSERT INTO sequences VALUES ('{$result->name}', {$result->id});\n\n";
      } // For each userTable, ignore the sequence
  }

  // write out sequences sql dump
  $fh = fopen("/tmp/{$db_name}.seq.sql", 'w') or die("can't open /tmp/{$db_name}.seq.sql for writing");
  fwrite($fh, $seqSql);
  fclose($fh);

  // TODO: if not a destructive build, clear out cache_ tables
  
  // TODO: if not a destructive build, on IMPORT alert admin if userTable structure diff exists, or if table count exists 
  // idea: do the same mysqldump on staging (above), and diff it against dump on dev
  
  echo "Finished.  Dumps written to /tmp/";
}
?>
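
The to-do item mentioned above (spotting "content" tables by their 'nid', 'vid' or 'uid' columns, with some exceptions) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the input format and the exception list are hypothetical and not part of the script above.

```php
<?php
// Sketch of the heuristic: a table with a nid, vid or uid column is treated
// as content; everything else is treated as config.
// example_classify_tables() is a hypothetical helper name.
function example_classify_tables($tableColumns, $exceptions = array()) {
  $contentKeys = array('nid', 'vid', 'uid');
  $result = array('config' => array(), 'content' => array());
  foreach ($tableColumns as $table => $columns) {
    if (isset($exceptions[$table])) {
      // Manual override for tables where the heuristic guesses wrong.
      $kind = $exceptions[$table];
    }
    else {
      $kind = array_intersect($contentKeys, $columns) ? 'content' : 'config';
    }
    $result[$kind][] = $table;
  }
  return $result;
}

// Column lists are typed in by hand here; on a live site they could be read
// with a SHOW COLUMNS query per table.
$tables = array(
  'node' => array('nid', 'vid', 'type', 'title'),
  'variable' => array('name', 'value'),
  'vocabulary' => array('vid', 'name'),
);
// In the vocabulary table 'vid' is the vocabulary id, so it needs an exception.
$classified = example_classify_tables($tables, array('vocabulary' => 'config'));
print_r($classified);
```

The exception list is the important part: a column name alone cannot tell a vocabulary id from a node revision id.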

Git... but definitely not

Posted by Shiny on September 17, 2007 at 10:55pm

Git... but definitely not using VCS for deployment - that'd be crazy.

How so?

Posted by Chris Charlton on September 18, 2007 at 12:30am

Please point out your thoughts Shiny, thanks. :)

Chris Charlton, Author & Drupal Community Leader, Enterprise Level Consultant

I teach you how to build Drupal Themes http://tinyurl.com/theme-drupal and provide add-on software at http://xtnd.us

reason one: http request to

Posted by Shiny on September 18, 2007 at 4:42am

reason one: an http request for the VCS folder (CVS, .svn, .git, _darcs) can retrieve lists of files on the file system. Imagine someone getting a listing of everything in your "files" folder by requesting the correct file under .git/.

more reason one: with a distributed VCS (git, darcs etc) this can be every version you've ever had, including commit comments. some of those comments aren't exactly polite, coders don't know the public can read their commit comment.

reason two: conflicts during merge result in temporarily broken files (won't parse as php) - and a site that is broken until you fix it.

reason three: it only does half the deployment, you still gotta set up cron job, run database patches etc.

reason four: that's not what VCS are built for. Most (every?) distro has a packaging system (debs, rpms) that was built for deploying web apps, and in standard ways. Use the strengths of your chosen distro, and the standardisation that's already been thought out by other really smart people who know how to maintain packages.

more reason four: putting a deb up on an apt repo means i know that the same version of drupal is being rolled out to all my drupal installations (eventually), in the next apt-get upgrade.

i'm sure there's more.

that said, there probably are some situations where VCS deployment makes sense, so i retract my "crazy" adjective.

reason 1: google reveals the

Posted by samhassell on September 18, 2007 at 5:02am

reason 1: google reveals the following for .htaccess to prevent requests to .svn directories.

SetEnvIfNoCase Request_URI ".svn" ban
Deny from env=ban

more reason 1: Don't most VCS contain authentication mechanisms to control this? I never let anyone have anon access to my private SVN server.

reason 2: You probably shouldn't be merging code onto production, you should just be updating it.

reason 3 & 4: valid, i guess reason 4 gives a way around this. the package management system can control the whole thing. I haven't really got into the guts of packaging for apt yet; it's been something I've been wanting to do for a while. Not a bad idea if it can be done in a reasonable amount of time. Does updating a package on production require all the files to be reinstalled as opposed to just the ones that change?

Does updating a package on

Posted by Shiny on September 19, 2007 at 4:37am

Does updating a package on production require all the files to be reinstalled as opposed to just the ones that change?

not sure - but if they're the same it doesn't matter.

Naaa...

Posted by marquardt on September 18, 2007 at 10:21am

Reason one: There's no need to roll out the .svn (or other directories related to the VCS) - just export the version you want to. That avoids the VCS specific directories altogether (or use the .htaccess).

More reason one: Not every VCS stores commit messages in the sources / version controlled items (no way to do this with subversion, for example, and for good reason), and I'm sure you can at least disable this feature for all other ones. With cvs, you must consciously put in dedicated keywords ($Log$) to get them. So just don't.

Reason two: That's a policy issue. One solution is to use version controlled source code tree for development only, always tag properly tested version, and only then roll out (e.g. export) those tagged versions on production systems. In this scenario, files will always be overwritten, so there are never any merge conflicts. The drawback is that you need to manually remove files that disappeared from the development version - or you remove everything from the production server and only then roll out from the repository.
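
As a minimal sketch of the "roll out tagged versions only" policy: the helper below just builds an 'svn export' command for one tag. The function name, the repository URL, the conventional /tags/ layout and the target paths are all assumptions made for illustration.

```php
<?php
// Sketch: build the command that exports one tagged release into a directory.
// example_export_tag_command() is a hypothetical helper name.
function example_export_tag_command($repo_url, $tag, $target_dir) {
  // Assumes the conventional trunk/branches/tags repository layout.
  $source = rtrim($repo_url, '/') . '/tags/' . $tag;
  return 'svn export ' . escapeshellarg($source) . ' ' . escapeshellarg($target_dir);
}

$tag = 'site-1.4';
// Print the command instead of running it, so it can be reviewed first.
echo example_export_tag_command('http://svn.example.com/mysite', $tag, '/var/www/releases/' . $tag), "\n";
```

Exporting each tag into its own fresh directory is one way around the drawback mentioned above: files deleted in development never reach the new directory, so nothing has to be removed by hand.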

If you do use updates instead of exports, there also shouldn't be any merge conflicts - after all, the production server shouldn't be touched at all, right? So the update will always go smoothly. If you do get conflicts, this gives you a very important piece of information: someone messed around with the production server without telling you about it. By running the equivalent of 'svn status' and 'svn diff' before updating you can even find out what that someone did, and react accordingly - be that rolling back those particular changes, or incorporating them in the development version if they make sense. In effect, the VCS gives you a unique possibility to monitor what's going on with the installed version of your site - something a package-based solution cannot.

Reason three: Yeah, the same with a package-based solution. If you build rpms or debs, you'll include scripts that do that for you. Put the same scripts in the version controlled source code tree, and handle the export through a script that also runs the required support scripts. Or do the latter manually. That's a bit more convenient anyway - if you are just updating the source code of some modules, but have cron jobs already set up, there's no need to install another cron job; same with database structure and content. In a package-based deployment scenario, you would have to change the relevant scripts / setup files (like the .spec in case of rpms) for each new rollout. That's hardly the intended use of a package system. It also means you have to do additional testing each time to make sure that the automated things the package installer does don't break anything. Imagine you cleared the entire database last time and forget to remove that functionality this time...

There's no way to avoid downtimes if something goes wrong; in a VCS based scenario, you can at least try to roll out the previous version immediately - and if database changes don't happen automatically, you might be lucky in that this works. Even better: never work without having made a complete backup before upgrading - and have some practical experience on how to actually use that backup.

Reason four: Packaging systems were invented to install mostly binary software applications - not necessarily web (only) apps. They help in compiling and setting up those. They also run standard tasks before (removing old stuff, saving old configuration files) and after the actual install (reloading the dynamic linker information, setting up new configuration options). Moreover, they take care of dependencies, making sure that other required packages are indeed installed on the target system.

Little of that is needed for a drupal installation, I would think; all you need is that a defined set of files is installed in a defined directory. If automated tasks need to be run, the shell scripts are the same for both package and VCS based scenarios.

Using debs or rpms for the rollout does indeed make sure that you have the same version everywhere (if you roll out the same rpm / deb). Tagging the version of the drupal site you want to roll out in a meaningful way and following the policy of rolling out only tagged versions does the same. An advantage of a VCS based system is that you can go back to previous versions without having to rebuild the packages by just referring to the correct tag on the command line; so it's probably easier and quicker in most cases.

You can also work with branches, which are probably useful when having similar setups, say for different clients - you just split a branch from your standard version and make client specific modifications on that branch. At least in principle, you can still merge improvements from your development (or main, or default) setup to the client's versions etc. In a package-based scenario, you'll have to set up a completely new package. Rolling out from a particular branch is much easier (a command line switch only) than setting up a completely new package.

And finally, I'm not sure if I want automated upgrades of a web site... In most cases, I'd prefer to be in control when this happens. Imagine your package is broken, but your client runs an automated apt-get upgrade each night, this time on a Friday night...

Don't get me wrong - package based installs are great, especially when rolling out the same thing on many computers. But then, how often do you install the same web site on more than just a few machines? If you don't need the compilation / build support rpm or deb offer, a VCS with tagging policies applied will give you all that's needed as well, with the benefit that you don't have to learn how to use yet another complex tool. If there are tasks you have to run before or after installation and which change slightly with every rollout, you'll have to write (and test!) scripts to do that anyway - there's really no difference between a VCS and a package based scenario. You can also automate the rollout and execution of relevant scripts with a shell script - in the end, that's precisely what the package system does for you. So just write that little shell script as well; at least you don't have to learn another syntax.

There are more things that cause additional work with package managers. For example, if you add or delete files in your software, you'll have to specify that in the (say) rpm-spec as well - you basically maintain an independent list of which files belong to your package. If you forget to track one change, your package is broken. There's also an issue with tracking which version of the package build scripts corresponds to which version of your website. What do you do? Use a VCS for development? And in addition maintain packages? Why do the same work twice if you don't get any benefit from it?

Just my thoughts,

Christian.

Edit: Fixed a few typos...

good thoughts.. i was mostly

Posted by Shiny on September 19, 2007 at 4:42am

good thoughts..

i was mostly referring to folks deploying by doing an equiv of "cvs checkout" and later doing an upgrade via a "cvs update". I've seen it go wrong so many times. Seen passwords and sensitive info ripped outta hints from VCS files. Sure you can deny these, but it becomes one more thing to manage by already busy people.

If you're managing 1 or 2 or 20 sites maybe, but at some stage you need a tool to handle it all for you. We're only human.

We have debs create the whole virtual host, and we have perl/bash scripts generate the deb.

branching, that happens anyways -- in fact it's the same script that builds the deb.

You want to swat 50 all at once

Posted by earnie@drupal.org (not verified) on September 19, 2007 at 11:46am

That is nightmarish for those of us who have to handhold the client while development resolves the issues. You must have preproduction testing for each scenario before attempting a push to production or you are doomed to have catastrophic issues. As for deploying from CVS into production, you would only do that from a marked tested release tag and never from a branch tag.

You must have preproduction

Posted by Shiny on September 21, 2007 at 3:15am

You must have preproduction testing for each scenario before attempting a push to production or you are doomed to have catastrophic issues.

I'm not sure how that relates to the discussion --- ?? deployment via deb doesn't mean you don't get on a staging server first.

Re: You must have preproduction

Posted by earnie@drupal.org (not verified) on September 21, 2007 at 12:08pm

I work for a large company with several hundreds of applications and a few thousand production servers. It is my job to ensure that applications and servers are up and running 99.98% of the time. Deployments from VCS really make my hair stand on end. I prefer the ``tarball the end to end testing server'' method over a VCS method of deployment as I have seen fewer problems with these deploys.

Forgive my ignorance but what do you mean by ``deb''? Searching the acronym databases I can't find anything of value.

I've also used the tar ball

Posted by Shiny on October 1, 2007 at 5:23am

I've also used the tar ball method.

debs are debian packages.
much like red hat has rpms, and solaris has solaris packages.

For a drupal site deployment, a package (such as deb or rpm) will do the same job as tar ball deployment, but also set up cron jobs, create users, create databases, virtual hosts. It's smart enough to know the difference between first deployment and subsequent updates.

A package based install

Posted by Shiny on October 1, 2007 at 5:24am

A package based install allows rollback to the previous version if you need.

i also automate backups (meaning, a single tarball of the document root before any change is made) just for peace of mind.

SVN & Module Settings Import / Export.

Posted by samhassell on September 18, 2007 at 3:09am

I'm a sole developer most of the time, which simplifies the process. I'm using SVN for development code, then check out and set up the code on the server. From this point I develop new modules on a basic install of Drupal5, then sync the production server with testing and make sure the new module doesn't screw up anything.

Once tested, I check out the new module from svn on the live site, enable, configure and test. More or less the process eli describes here: http://groups.drupal.org/node/5792#comment-16920.

I tend to keep a backup of the database in svn also, but haven't got around to automating that yet. been on the todo list for ages though.

@patricio: what about adding module_export() and module_import() functions to module.install? then the module developer could define what data needs to be dumped. then provide an interface for the admin to upload the settings (sql) file. From there its a bit easier to create a sync process.
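
A rough sketch of what such an export/import pair might look like. To be clear, these are NOT existing Drupal hooks: the function names, the "module prefix" convention and the one-variable-per-line file format are all assumptions made for illustration.

```php
<?php
// Sketch of the proposed export/import pair for a module's settings.
// Both function names are hypothetical.
function example_settings_export($variables, $prefix) {
  $lines = array();
  ksort($variables); // a stable order keeps diffs between two exports small
  foreach ($variables as $name => $value) {
    if (strpos($name, $prefix) === 0) {
      // json_encode() escapes tabs and newlines, so one setting stays on one line.
      $lines[] = $name . "\t" . json_encode($value);
    }
  }
  return implode("\n", $lines) . "\n";
}

function example_settings_import($dump) {
  $variables = array();
  foreach (explode("\n", trim($dump)) as $line) {
    if ($line === '') {
      continue;
    }
    list($name, $encoded) = explode("\t", $line, 2);
    $variables[$name] = json_decode($encoded, true);
  }
  return $variables;
}

// Hand-typed sample data standing in for a site's variables.
$all = array(
  'site_name' => 'Example',
  'mymodule_limit' => 10,
  'mymodule_types' => array('page', 'story'),
);
$dump = example_settings_export($all, 'mymodule_');
echo $dump;
```

On a real site the input would come from the variable table (or from whatever tables the module owns) and the import side would write the values back, for instance with variable_set(). A text file with one setting per line also versions nicely in svn.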

Following up my Other vote

Posted by earnie@drupal.org (not verified) on September 19, 2007 at 11:51am

Currently I am too small to bother with storing Drupal site configuration in a VCS. I am managing via a tarball backup of the site web folders and downloading that to an off site system.

Bazaar

Posted by HorsePunchKid on September 30, 2007 at 9:07pm

I'm using Bazaar for new projects, though I have in the past used Subversion. What's nice about bzr is that I don't feel obligated to plan out the whole repository layout before I get started. I just go to where I'm interested in tracking changes and do bzr init, and I'm ready to start versioning. Low barrier to use means higher likelihood I'll use it. :)

-- Steven N. Severinghaus sns@severinghaus.org


I also use Bazaar. I liked

Posted by axel on October 8, 2007 at 9:53pm

I also use Bazaar. I like its decentralized model and easy branching. Some time ago I used darcs, but Bazaar seems to be more actively developed. I use a VCS for documentation and code, not for the database. For the database I take snapshots at some critical points - just full SQL dumps (not the best way, I know).

Russian Drupal community www.drupal.ru

Good CVS with drupal is painful

Posted by elagorn (not verified) on October 17, 2007 at 3:51am

I am using Mercurial which is similar to Bazaar - but I am quickly discovering that this is not really a good versioning system because Drupal uses the DATABASE to handle its configuration and I am currently investigating SQLDiff technology. Being able to relate MODULES to DATABASE TABLES and MODULE settings to relevant configuration TABLES at certain points in time is something I really want.

especially since practically everything is in the DB

Posted by kanani on October 17, 2007 at 4:39am

i've been working solo also on most of my projects, but on this latest project we have 3 devs and it's getting to be a real pain to make sure all the mysqldump files are synced. Learning the ins/outs of SVN though.

Perhaps the dba module

Posted by earnie@drupal.org (not verified) on October 17, 2007 at 2:18pm

I install the dba module on my systems (especially development and testing). It allows for paging the data and taking backups via cron.php. There are two maybe even three types of tables. 1) The administrative and setup tables, 2) the user data tables (and I don't mean the user module tables) and 3) there are the system data tables, e.g. cache. It would be nice if the dba module (or even the devel module) knew which tables belonged to the administrative set and we were able to dump those for import as a set. Then a cron event could dump the tables and you could sync between the differing sets of DB or syncing with your VCS.

git

Posted by marvil07 on March 23, 2008 at 4:37pm

I do not have as many reasons as Shiny :D, but I like the distributed idea.

I use many git repositories, not just one big drupal repository. I mean I only use a repository if I'm making custom changes to a module or (especially) if I'm writing a module from scratch and I'm not working alone.

Obviously, I only track the code. I know a lot of the information is in the database, but I did not find a good method to make it part of the repository (also related to my "many repositories instead of one").

dgb

Posted by scor on May 27, 2011 at 4:03pm

I've been using Drupal Git Backup (dgb) for a couple of years now and it's working pretty well for me. It makes the db dump more vcs friendly, and with the power and speed of git, it's a good combo I find...
