Deploy-module-related explorations and development

Posted by katbailey on March 18, 2010 at 6:26pm

Kicking off a discussion here surrounding the whole area of content deployment à la Deploy module. I'm starting work on a comprehensive deployment system for a large client and we'll be contributing back our code. I want to try to minimise duplication of effort by staying in touch with others who are already working on extensions or enhancements to Deploy and related modules, so that I can build on existing code and not re-invent the wheel. Heyrocker has told me that he's done a nodequeue deploy module, for example, which is one of the main things we'll need. There's also dixon's Auto Deployment module (http://drupal.org/project/autodeploy). One of the main things I'll be focusing on is trying to automatically generate deployment plans based on changes since the last deploy. Would love to hear from anyone else who's working in this area and intending to contribute back.

Comments

So some comments here. First

Posted by gdd on March 19, 2010 at 4:08am

So some comments here. First off I have uploaded my Nodequeue Deploy submodule to my sandbox for anyone who wants to take a look at it.

http://drupalcode.org/viewvc/drupal/contributions/sandbox/heyrocker/node...

It is pretty rough. The comments say that it will not deploy the nodequeue settings themselves, only the contents of existing queues; however, there is code there to manage the settings, so I'm not sure that's still true. The details in my head are hazy: this module is over a year old and I haven't done much with it. It is likely incompatible with newer versions of nodequeue.

As far as automated deployment goes, Deploy currently relies on a remote login in order to function, and for automation that login has to be stored somewhere in plain text so that Deploy can log in to the remote server. I have been reluctant to implement this, although Dixon is going after it with his module. The best choice, in my opinion, is to encrypt the password with reversible encryption and keep the key outside the webroot (this is what Ubercart does as well). Dixon has also written a pluggable authentication patch for Deploy which would allow a variety of login mechanisms, including oAuth, which is the best long-term solution IMO. Last I looked at this patch it was in pretty damn good shape, but I haven't reviewed it thoroughly.

Automatically generating a list of things that have changed since the last time you deployed is going to be tough. You need one of two things:

1) A timestamp for the object
2) A way to create and store your own timestamp for the object

There are many places where this will not be easy to do. System settings of various stripes are one of the most obvious examples: with no way to hook into their changes, there's no way to catch them as they're saved and record when they were changed. I hack my way into system_settings_form(), which gets a good chunk of them, but there are a lot of holes. Blocks are a prime example. There are also many contrib modules that could use system_settings_form() but don't. That is going to be a really tough nut to crack. Roles are another place with no hooks of any sort to act on. Ken Rickard backported the Permissions API from D7 in a contrib module, which will help with some things. Anything using CTools Exportables may potentially be able to be rolled into a single Deploy API.

Finally, there are some changes for a client that I have yet to commit. I have a branch which doesn't require the per-module integrations to be submodules; instead they are lazy-loaded plugins. I also have a reasonably decent solution for deploying circular references, which I'm pondering whether to commit. Let me know if either of those things would help.

Future thoughts: I'm looking into Feeds and/or PubSubHubbub as potential backend mechanisms rather than drupal_execute() or node_save(). The whole 'what's the best way to save nodes' thing makes me want to vomit.

So there's some stuff to chew on :)

Go Go Deploy

Posted by boris mann on March 26, 2010 at 12:29am

For a variety of legal reasons, requiring someone to be in front of the screen entering a username/password is actually a good thing. It's like having to turn two keys in the lock on a nuclear safety :P

I know that Karim (who also happens to be the author of Views Bulk Operations, which I think would be relevant) just wrote a Feeds OAuth module, which is probably of some use if one wanted to do it that way. Basically, have an "allow for auto deploy" permission; users with it would visit a Deploy > Autodeploy Authorize screen, do the OAuth dance, and also be able to revoke permissions from there.

Looking at Backup & Migrate's scheduling and profiles features would also be interesting.

And yes, +1 to Feeds. It actually feels like Feeds could potentially handle content staging/deployment, and Features could handle settings and similar stuff.

I was talking about Deploy last night. I bet more interest in supporting this could be gained with a nice write-up of a general use case. I'm thinking various news/enterprise organizations would wet themselves at the thought of a two-server setup with content authoring/editing/research/collaboration on the "backend", and a production site that has all of that turned off and focuses on user interaction.

I agree with your nuclear

Posted by gdd on March 26, 2010 at 12:56am

I agree with your nuclear lock analogy, although one problem with the current security implementation in Deploy is that the user it runs as has to have almost admin-level permissions, so you are handing someone a password that many clients aren't really pleased to give out. The next version of Services (currently cooking) will ship with oAuth support, and when that happens your oAuth dance implementation is exactly how I picture it. Deploy does currently support VBO for nodes, and even comes with a default query as an example.

As for the rest, I just haven't had time to dig into it with any seriousness. I'm really focused on getting Services into shape for D7 at this point. I welcome anyone who wants to dig in and learn more! Find me at the Palantir booth at DrupalCon and I will talk your ear off.

I know we did a pretty large blitz of Deploy when Foreign Affairs launched, and that is really the prime use case - a frequent content publisher staging and previewing lots of content on the dev site then pushing it all forward. At the time I really assumed those writeups would open the floodgates of those types of sites hitting me up for additional work and it just never happened.

Just a note

Posted by dixon_ on March 19, 2010 at 8:17am

My Auto Deploy module only stores encrypted login credentials, and the encryption key used to decrypt them is stored outside of the web root. The whole thing relies on the Encrypt module. So I'd say it's pretty safe.

Talking about the future, we are planning to have some kind of BoF in San Francisco where we will discuss how we can approach all these things even better. Heyrocker mentioned Feeds and PubSubHubbub, and I know he has also been thinking about building Deploy on top of the CTools API, which I think is a really good idea. That would open the door to a very powerful and pluggable architecture, and to exportable configurations (servers, deployment plans, etc.).

Machine names v. UUIDs and an updated nodequeue_deploy module...

Posted by katbailey on April 6, 2010 at 6:56pm

Just putting some thoughts and questions down here, following my digging into the deploy code and getting a version of nodequeue deploy working with my set-up. One of the main things I've been thinking about is how the deploy side of things might fit in with the features/exportables side of things. It's quite clear that they serve different but complementary purposes, and it would be great if that could be nailed down explicitly somehow. To my mind, the features/exportables framework is for stuff that should be defined in code, whereas the deployment framework is for stuff that belongs in the db, i.e. content. So features/exportables should not waste effort on trying to export nodes, and deploy shouldn't try to export data structure definitions.

Exportables & machine names

For things that ultimately should be defined in code rather than in the db the trend seems to be towards the ctools exportables / features paradigm, where everything has a machine name. Nodequeues would seem to fit into this category and it seems there's already some work being done on that - see here: http://drupal.org/node/373174 and here: http://drupal.org/project/features_extra. Hopefully the nodequeue_queue table itself will have a machine_name column eventually - in the meantime, which module's job is it to provide the look-up table between qids and machine names? Currently fe_nodequeue module, part of the Features Extra package, does this - and so this is what I've used for nodequeue_deploy module when it's looking up the queue on the remote site. This obviously isn't ideal though: it's only being used for the machine name thing, not for anything to do with features itself.

Deployables & UUIDs

For things that ultimately do belong in the db, it makes sense to use UUIDs: nodes, comments, users, taxonomy terms (but not vocabularies? I've seen discussion of machine names for vocabularies; see here: http://drupal.org/node/521630#comment-2108548).
Would it not make sense to use the UUID project instead of deploy's own uuid solution? If for no other reason than having less code to maintain ;-)
I notice that in both the uuid solution in deploy itself and the UUID module, a separate uuid look-up table exists for each type of object (node_uuid, user_uuid, etc.). I must be missing something, because I can't understand why these aren't all kept in one table with an extra column for type, so you'd have uuid, type and id columns, where the id would be a nid if the type is node, or a uid if the type is user.
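For comparison, here's a minimal sketch of the single-table layout described above, with SQLite standing in for the real database. The `uuid_map` table and the helper names are hypothetical, not taken from deploy_uuid or the UUID module. Note that every lookup has to filter on type, and the single integer `id` column only fits objects with a one-column key; heyrocker raises both of those points further down the thread.

```python
import sqlite3
import uuid

# Hypothetical single lookup table covering every object type.
SCHEMA = """
CREATE TABLE uuid_map (
    uuid TEXT PRIMARY KEY,
    type TEXT NOT NULL,    -- 'node', 'user', 'term', ...
    id   INTEGER NOT NULL, -- nid, uid, tid depending on type
    UNIQUE (type, id)
)
"""


def lookup_uuid(conn, obj_type, obj_id):
    """UUID for a local object, or None if it has not been assigned one."""
    row = conn.execute(
        "SELECT uuid FROM uuid_map WHERE type = ? AND id = ?", (obj_type, obj_id)
    ).fetchone()
    return row[0] if row else None


def lookup_id(conn, obj_type, value):
    """Local id for a UUID, but only if it belongs to the requested type."""
    row = conn.execute(
        "SELECT id FROM uuid_map WHERE uuid = ? AND type = ?", (value, obj_type)
    ).fetchone()
    return row[0] if row else None


def assign_uuid(conn, obj_type, obj_id):
    """Return the existing UUID for an object, generating one if needed."""
    existing = lookup_uuid(conn, obj_type, obj_id)
    if existing is not None:
        return existing
    new = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO uuid_map (uuid, type, id) VALUES (?, ?, ?)",
        (new, obj_type, obj_id),
    )
    return new
```

Node 5 and user 5 get different UUIDs in the same table, and asking for user 5's UUID as a node returns nothing.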

When it comes to getting existing contrib modules to play ball with either an exportable or a deployable framework, the same basic questions come up: for a D6 module, the chances of getting it to implement a hook itself (e.g. hook_exportable or hook_deployable) along with the requisite changes to its db schema, e.g. to add a machine name, are pretty slim. So far, the deploy package has added everything itself, for all deployable entities. So, deploy_uuid module adds a uuid table for each of the entities it knows about. But which module should be responsible for the uuid mapping of other entities, added by other modules? The dilemma is summed up by heyrocker in his comments in deploy_uuid.module:

   * I am incredibly torn on how I have architected this. 
   * There is obviously a great deal of code in here that may or may not be useful
   * depending on what modules you have installed or enabled (comments, filefields,
   * etc.) From that standpoint, it makes sense to move all this code into the
   * separate modules (comment_deploy, fielfield_deploy, etc.) On the other hand if
   * I do that, then the remote server (most likely the production site) has to actually
   * enable all those modules, increasing its code weight quite a bit. Whereas this way
   * the production server can simply enable deploy_uuid and the services and be done with 
   * it. How this is managed is one of the stickiest problems I have, and the only solution
   * I can come upt with is creating node_deploy_uuid, filefield_deploy_uuid, etc. Which
   * blows but maybe that's where this has to go.

A similar question arises on the services side of things. The services package provides an implementation of hook_service() for all of the relevant core modules (this hook defines the methods that can be called remotely, e.g. for saving nodes, taxonomy terms, etc. on the target site), but should we just keep adding to that package when we have new entities that need a service for saving them? Nodequeue already ships the nodequeue_service module, which implements hook_service(). This currently just implements a method for getting all available nodequeues and one for getting all the subqueues of a given queue. In getting nodequeue deployment going, I decided to add the nodequeue.save method there rather than adding it to the services package.

Anyhoo, my updated nodequeue_deploy module is at http://drupalcode.org/viewvc/drupal/contributions/sandbox/katbailey/node... and it assumes that the nodequeue.save method is provided by a service somewhere. I've added my version of nodequeue_service module to my sandbox as well (this module comes as an add-on to nodequeue module): http://drupalcode.org/viewvc/drupal/contributions/sandbox/katbailey/node...

uuids

Posted by sun on April 6, 2010 at 7:25pm

I'm not too familiar with all of the solutions, but after talking to heyrocker and many others a couple of times, I believe that uuids are, overall, the wrong tool for the job at hand. What we're really talking about is properly implementing 'foreign keys' in database schema definitions, so as to be able to determine whether a certain value corresponds to something else. This foreign key data needs to be used to track and ensure data mappings, i.e. wherever a nid 345 is used, and that value maps to {node}.nid, it must be the same nid 345 from our mapping.
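To make the foreign-key idea concrete, here's a minimal Python sketch of what declared foreign keys would let a deployment tool do: know which columns hold ids that must be translated for the target site. The `FOREIGN_KEYS` metadata, `remap_record()` and `id_maps` are all hypothetical illustrations, not an existing Drupal API. It still needs per-table id maps to do the translating, which is the mapping layer mundanity points to in reply.

```python
# Hypothetical foreign-key metadata: for each table, which columns point at
# which other table's primary key.
FOREIGN_KEYS = {
    "node": {"uid": "users"},
    "comments": {"nid": "node", "uid": "users"},
}


def remap_record(table, record, id_maps):
    """Return a copy of `record` with every foreign-key value translated.

    id_maps[target_table] maps a source-site id to the matching target-site id.
    A missing mapping raises KeyError instead of silently keeping the old id.
    Only declared foreign-key columns are touched.
    """
    out = dict(record)
    for column, target in FOREIGN_KEYS.get(table, {}).items():
        if column in out:
            out[column] = id_maps[target][out[column]]
    return out
```

With `id_maps = {"node": {345: 12}, "users": {7: 3}}`, a comment pointing at nid 345 and uid 7 comes out pointing at nid 12 and uid 3.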

We want to use uuids if we want to identify and map data from independent networks. E.g. if I want to globally map data from d.o's user "sun" anywhere on the net, then I need a uuid, so as to ensure that a different user called "sun" somewhere else is not mistaken for the d.o user "sun".

Daniel F. Kudwien
netzstrategen

I'd have to disagree, as I

Posted by mundanity on April 6, 2010 at 8:18pm

I'd have to disagree, as I don't believe foreign keys provide enough to deal with this issue. We would still have to provide a mapping layer for all sorts of ids, which, beyond being unpleasant to implement, also gets more complex the more stages a deployment has to go through. UUIDs greatly simplify this aspect and allow us to move on to more pressing deployment issues.

That being said, I definitely think foreign keys need to be implemented, especially with D7 defaulting to InnoDB. Being able to introspect these relationships will allow for a more generalized approach that contrib modules can theoretically hook into. I'd love to get away from the current mess of contrib modules having to implement their own structure for everything (i.e., install profiles, services, deploy, etc.).

In regards to managing settings changes, katbailey noticed the Settings Audit Log module (http://drupal.org/project/settings_audit_log). At first glance this looks like a pretty elegant way of getting the metadata we need about variable changes.

The problem at hand, and the

Posted by gdd on April 7, 2010 at 5:16am

The problem at hand, and the one Deploy aims to fix with UUIDs, is wanting to stage content on one site, push it to another, and then push updates. In these cases the node IDs will not be the same, and thus UUIDs are in fact needed. I do agree that we also need foreign keys though, that would be a huge boost for a lot of related problems.
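A toy sketch of why the UUID carries the update: each site allocates its own nids, so the receiving side resolves the incoming UUID to whatever local nid it has (or creates one). The `Site` class and the `stage()`/`receive()` helpers are hypothetical in-memory stand-ins, not Deploy's actual services.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Site:
    """Toy stand-in for a Drupal site: nodes keyed by local nid plus a uuid map."""
    nodes: dict = field(default_factory=dict)        # nid -> title
    uuid_to_nid: dict = field(default_factory=dict)  # uuid -> local nid
    next_nid: int = 1

    def save_local(self, title):
        # Every site hands out its own auto-increment ids.
        nid = self.next_nid
        self.next_nid += 1
        self.nodes[nid] = title
        return nid


def stage(site, title):
    """Create a node on the staging site and give it a UUID."""
    nid = site.save_local(title)
    node_uuid = str(uuid.uuid4())
    site.uuid_to_nid[node_uuid] = nid
    return node_uuid


def receive(site, node_uuid, title):
    """Create or update a node on the target, identified only by its UUID."""
    nid = site.uuid_to_nid.get(node_uuid)
    if nid is None:
        nid = site.save_local(title)   # first push: the target picks its own nid
        site.uuid_to_nid[node_uuid] = nid
    else:
        site.nodes[nid] = title        # later push: update the same node
    return nid
```

If production already has a node of its own, a node staged as nid 1 lands there as nid 2, and a second push of the same UUID updates nid 2 instead of creating a duplicate.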

Thank you for some detailed

Posted by gdd on April 7, 2010 at 5:15am

Thank you for some detailed and thoughtful comments. I'll try and gather some thoughts I have on these issues.

I think that attempting to draw a solid line between 'what is content' and 'what is configuration' can only bring grief. CCK/Fields API is the classic example where the two merge inseparably, but there are others. There is no clean 'this belongs in the db, and this belongs in code' split; everything should be able to live in either place without impacting the other.

Conceptually, everything having a machine name would be helpful, but I'm not sure how realistic it is unless some standard way of auto-generating them emerges (as D7 does when it gives new content types a default machine name). There are lots of things the user should not have to name themselves. The second problem is, as you point out, how to manage foreign keys in this scenario. Laying machine names next to real IDs seems silly; we may as well just use UUIDs and be done with it. Features Extra isn't doing anything different from what I'm already doing with Deploy in that respect.

My main reason for making a separate table for each type of object was that at some point I knew I would encounter an object with a composite key, and then I was going to be screwed. The current architecture allows mapping to any type of key safely without impacting the other objects. It also makes joining somewhat easier because you don't have to constantly be remembering to add 'where type = user' for fear of getting the wrong type of content.

I would very much like to rearchitect Deploy such that it uses UUID, and add my changes back into that project, but I am a little worried about how much extra pain this will cause. There are cases where Deploy needs to do something, then a UUID needs to be generated, then Deploy needs to do something else, in that order. Right now this is easy but when another module comes into play, you start needing to be concerned with module weights, and that can get messy quickly. When I first wrote the system I worked around it by just doing my own thing (the UUID module was just starting at the time as well.) It is intricate and flimsy but also necessary due to a variety of Drupal 'features'.

New Services should either become their own contrib modules, or be integrated into the contrib modules they are working with. This has happened in some cases already like Flag, which includes Services integration with its 2.0 version. For Nodequeue I would expand the existing Services definitions and submit them as a patch. The current policy of the Services module itself is that Services supports core functionality only, and everything else belongs in contrib somewhere.

Phew! So, uh, I guess I don't have any conclusions on this discussion, just more of my thoughts.

More on the separate tables

Posted by gdd on April 7, 2010 at 1:05pm

Another thing I forgot to mention about the separate tables is that they give you the freedom to add extra data you need for one type of object without impacting the others. For instance, in node_uuid I save the last time each node was deployed. This allows me to check against node->updated to see whether the node needs to be re-deployed or whether I can just ignore it.

subscribe : note to self

Posted by chriscalip on April 7, 2010 at 2:23pm

subscribe : note to self dev->staging->production

Incremental Deploy module

Posted by katbailey on May 27, 2010 at 11:43pm

Hey all,
just wanted to update this thread regarding the work I've been doing. I've added the Incremental Deploy module to my sandbox (http://drupalcode.org/viewvc/drupal/contributions/sandbox/katbailey/incr...), which allows for a "deploy everything that's changed since I last deployed" type of workflow.

Now to qualify what I mean by "everything"... ;-)

First of all, on its own, this only deploys entities that come from Drupal core (I've built add-on modules for other entities such as nodewords settings). And even within Drupal core there had to be two approaches to the question of how we know something has changed: one for stuff that can be hooked into on change, and one for stuff that can't. For the former, e.g. nodes, users and taxonomy, it just implements the necessary hook so that it can add the new or modified entity to the active deployment plan. In the case of language and translation changes, it adds a submit handler to the relevant forms using hook_form_alter(). But for variables and blocks there's no way to hook in. So this module depends on the Settings Audit Log module, which adds insert and update triggers on the variables table and then keeps a log of what's changed and when. I replicated that functionality for the blocks table.
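To illustrate the trigger approach for stuff with no hooks, here's a minimal sketch using SQLite triggers in place of the MySQL ones. The `variable_log` table, trigger names and helper functions are hypothetical, not the Settings Audit Log module's actual schema. One design choice worth noting: the deploy side stores the highest log id it has already handled as a watermark, which avoids ambiguity when several changes land in the same second.

```python
import sqlite3

# Hypothetical schema: a {variable}-style table plus an audit log that the
# triggers fill in, so changes are caught even though no PHP hook fires.
SCHEMA = """
CREATE TABLE variable (name TEXT PRIMARY KEY, value TEXT);

CREATE TABLE variable_log (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    name    TEXT NOT NULL,
    op      TEXT NOT NULL,
    changed INTEGER NOT NULL
);

CREATE TRIGGER variable_log_insert AFTER INSERT ON variable
BEGIN
    INSERT INTO variable_log (name, op, changed)
    VALUES (NEW.name, 'insert', CAST(strftime('%s', 'now') AS INTEGER));
END;

CREATE TRIGGER variable_log_update AFTER UPDATE ON variable
BEGIN
    INSERT INTO variable_log (name, op, changed)
    VALUES (NEW.name, 'update', CAST(strftime('%s', 'now') AS INTEGER));
END;
"""


def latest_log_id(conn):
    """Highest log entry so far. Store this at deploy time as a watermark."""
    return conn.execute("SELECT COALESCE(MAX(id), 0) FROM variable_log").fetchone()[0]


def changed_variables_since(conn, watermark):
    """Names of variables inserted or updated after the given watermark."""
    rows = conn.execute(
        "SELECT DISTINCT name FROM variable_log WHERE id > ? ORDER BY name",
        (watermark,),
    )
    return [row[0] for row in rows]
```

After taking a watermark, updating `site_name` makes it the only variable reported as changed, even though no application code touched the log.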

There's a sub-module called incdep_service which looks after the service end of deployment of languages and blocks.

We're still in the testing phase with this stuff and I'm just trying to gauge interest in it before I move it from my sandbox into contrib-land. I'll post some screen shots when I get a chance.