This project would build a system for managing the publication life-cycle and staging of content on a Drupal site. This system would equate to a “time machine” for Drupal allowing you to see the state of your site at any given time in the past or future. This would be accomplished via publication “states” that are date-tagged and can be moved through a publication life-cycle or workflow. In essence, a content editor would be able to package a series of content revisions for publication and view that content in the context of the overall site at any given point in the past or future based upon the selected state.
A system like this would be a series of contributed modules. The first would be a module to manage publication states for entities in the system. The second would institute some sort of unique identifier system to manage revisions for all of the entities in the Drupal system (not just nodes, but blocks, menus, taxonomy terms, etc). The third would be a module that allowed for a site “preview” along any timescale (past or future) with any publication state.
This module would allow authorized users to view the site in any state, past or present, and be able to view their content in the context of a publication state and time. This would serve the following purposes:
Provide a clear system for staging content and being able to view that staged content in the context of a site.
Management of entities that don’t have a current revision system.
Enable state-based publication workflows and the capability to define publication cycles.
Comments
What do you need from the community?
What contributions do you need from the community on these initiatives? Bug-testing and patches? Documentation? Help writing the modules in the first place? What do you anticipate being the breakdown of tasks between the large organizations and the community? Do you have a timetable or roadmap in mind, yet?
I am definitely interested in helping with this initiative in some way, so let us know what you need!
Blog post coming
I've been working with Dries to get a blog post up about this group. I expect that will happen in the next week or so and it should clarify a lot of the purpose stuff. I'll follow up with the info I mentioned about how the group functions.
Would love to be involved
Look forward to more details. Would love to be involved.
Rik
maintainer of http://drupal.org/project/revisioning
Blog Post
Where can I find the blog where you will be featured?
Post Up!
In case you didn't see it, Dries made the official announcement on his blog just after DrupalCon. You can read it here:
http://buytaert.net/announcing-acquia-large-scale-drupal-program
Still working some stuff out
Dan,
I'm going to be posting a bunch of information about the LSD group and what we can expect to see from them. The short of it is that we'll be having contributions in the form of development time or financial resources.
So far, we just have the conceptual plan of "we want to do this cool thing... and it might look a little bit like this". However, there isn't anything deeper than that right now.
The only way I can conceptually see this happening is doing the following:
Obviously, all of these things are really non-trivial... states is probably the starting place. If we can conceptually define what the state system would be, and start to build it, I think that it is a great starting place that could easily exist as its own contributed module. The rest is dependent on the success of that module, so it's a natural starting place.
Is this for Drupal 7 or 8? If
Is this for Drupal 7 or 8? If it's for 7, you should talk to Dick Olsson (dixon_) who has been doing an enormous amount of work with the Deploy module as part of his time at Al Jazeera. I'm sure he has a lot of insight that could be crucial to getting something like this working.
For 8 obviously I already have this getting started as part of CMI.
Mostly a Drupal 7 focus
This solution is geared toward D7, but I want to be thinking about what's coming in D8 so it's not too difficult to get it to work on that version. It would be great if you could check in from time to time so I'm not doing some crazy stuff that is going to conflict with the D8 stuff you're doing.
A few possibilities...
Thanks for the breakdown, Chris. A few thoughts:
So, presumably states would need to keep track of at least (A) a unique identifier (an auto-incremented integer? a UUID? a machine-readable name?), (B) a start date when the state goes into effect, and (C) an end date when the state is no longer published (should be optional). Does this sound right to you? What else would a state need - maybe a human-readable label and a detailed description field?
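To make the question concrete, here is a minimal sketch of such a state record, written in Python rather than Drupal code just to keep it short. Everything in it is an assumption for discussion: the names `PublicationState`, `state_id`, `machine_name` and `is_active` are made up, the identifier is a UUID only as one of the options listed above, and treating the end date as exclusive is a guess at what "no longer published" should mean.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional
from uuid import uuid4


@dataclass
class PublicationState:
    """Hypothetical publication state record (illustration only, not a Drupal API)."""
    machine_name: str                 # (A) machine-readable name
    label: str                        # human-readable label
    start: date                       # (B) date the state goes into effect
    end: Optional[date] = None        # (C) optional date the state stops applying
    description: str = ""             # optional detailed description
    # (A) alternative identifier: a UUID generated per record
    state_id: str = field(default_factory=lambda: str(uuid4()))

    def is_active(self, on: date) -> bool:
        """True if the state applies on the given day (end date is exclusive)."""
        if on < self.start:
            return False
        return self.end is None or on < self.end
```

A state with no end date simply stays active from its start date onward, which is why (C) can be optional.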
Curious to know everyone's thoughts on the direction to go with this.
Looking into other modules
I'm having webchick and moshe help me look into a few other modules that we could have as dependencies or build from. Unfortunately, my own knowledge is limited on how this all fits together. I'm going to be focusing on defining requirements, user stories, etc and have others jump in on the technical implementation. I get to paint the grand picture and then the people that know more about Drupal development can help me take a more pragmatic approach.
Deployment module suite
there is also the Deployment module suite, which revolves around a better UUID implementation
http://drupal.org/project/deploy
http://drupal.org/project/incremental_deploy
personally, I am not so impressed by Workbench.. bit disappointing really
interestingly, Total Control has now a dev release for D7, which is good. It's far more intuitive imho
Deploy requires a separate server and code base
I like the idea behind deploy, but it requires a pretty significant setup. You have to have a separate Drupal site that mirrors your files, database, and code. If you've got a big enterprise publication site that can be a challenging proposition. Further, we're trying to enhance the experience of authors and editors with this system. I suppose that using a deploy-based system you would just give authors access to the staging server for content creation and editing, but I still don't think it quite scratches the itch I'm driving at.
Deploy + Big Enterprise Publication Site = Challenge?
I thought that one of the main target markets for deploy was "big enterprise publication site"s. Even though many smaller sites could benefit from deploy, I think that most of them don't have the resources to manage deploy. The bigger end of town generally has heavy iron and deeper pockets which is what you need for setting up a "big enterprise publication site".
It's About Author Experience
AFAIK Drupal tends to look at content staging from a more technical standpoint. A la "I want content to go from point A to point B to point C, and I want to be able to do some cool stuff to control how that content moves between those points." That is great! However, it misses a few things:
Deploy obviously works well for some folks. I'm not advocating they replace that, or even that this solution would replace deploy. I think they work in separate spaces.
User Stories and Methodology
Here is a more thought-out list of user stories for this idea (please feel free to make edits):
https://docs.google.com/spreadsheet/ccc?key=0AiblPnnnv-nvdGZwOVRleDFxR2F...
This includes basically 4 different things:
Some further explanation:
States exist for the dual purpose of managing a publication workflow and allowing the staging of content based on some future event. Dates exist to determine when that publication is considered active and ready to be published.
The essential functionality is the ability to view content relative to a point in time. This would include both historical points (to review how the site looked previously) and also future points (to see how it would hypothetically look). We'd want to offer full site features and capabilities regardless of past or future viewing so you could truly see the context of both a node and the entire site. I'm unsure of the technical implications for managing effective dates. I'd like to tie dates to a state, to allow for multiple publication dates based on a combination of state and date, but that may be too complex. Further, it's important to figure out if having a "beginning date" and an "ending date" is important, or if you just want to assign a single date value to a state and have the system automatically change the state when that value is reached. The former is easy to understand. Here is an example of the latter:
An editor reviews a piece of content that was created by another person on Jan 2nd. The article is supposed to be published on the 10th and taken down on the 17th. The editor opens the article that was assigned as "ready for final review" (the current state) on Jan 2. He views the piece using time machine to look at the site on publication day (Jan 10). Seeing that everything is good he changes the state to "scheduled for publication" and assigns today as the date. He also sets a second state of "published" to Jan 10, and a third state of "archived" for Jan 17th. When each of these dates is hit the system would automatically alter the current state of the node. Seems complicated, but it's a super-flexible system.
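The single-date model in that example can be sketched roughly as follows, in Python for brevity. This is only an illustration under stated assumptions: `state_on` and the list-of-tuples schedule are hypothetical names, the year 2012 is assumed, and "the state in effect" is taken to be the most recent transition whose date has been reached.

```python
from bisect import bisect_right
from datetime import date


def state_on(schedule, when):
    """Return the state in effect on `when`, or None if nothing has started yet.

    `schedule` is a list of (effective_date, state_name) tuples. The state in
    effect is the last transition whose date is on or before `when`.
    """
    # Sort by date only, so transitions sharing a date keep their entry order.
    ordered = sorted(schedule, key=lambda entry: entry[0])
    dates = [effective for effective, _ in ordered]
    index = bisect_right(dates, when)
    return ordered[index - 1][1] if index else None


# The editor's schedule from the example (year assumed).
schedule = [
    (date(2012, 1, 2), "scheduled for publication"),
    (date(2012, 1, 10), "published"),
    (date(2012, 1, 17), "archived"),
]
```

Viewing the site "as of" a date then amounts to calling `state_on(schedule, that_date)` for each piece of content, with no separate end dates to maintain.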
When an organization wants to publish content, there are typically several people involved in that publication. This includes people like the original authors, editors, layout people, and finally someone that signs off on the final product. Being able to take content from one state to another, see the evolution of that publication, and push it through a workflow is a powerful tool. However, while really nice to have, I don't consider the workflow component to be a part of the minimum viable solution.
Many organizations create multiple versions of content that are "ready for publication" based on the outcome of a future event. The easiest example of this is looking at a sporting event. Two teams (Team A and Team B) play, and only one team can win. So multiple pieces of content are written (one from the perspective of Team A winning and the other from the perspective of Team B winning). The content staging system should support being able to stage and review both pieces of content in the context of the whole site. Again, this isn't essential functionality, but it's a huge added value to the system.
Finally, revisions are an important part of this as well. We would want to be able to track revisions of things independent of states. A single state may have many revisions of a particular bit of content. I think a coupling of revisions relative to a state is important. In this way, one person could revise content relative to a state and another could revise it relative to a different state. This would help changes to live content be made (e.g. typos) in a "published" state while at the same time revising more major things (like adding more relevant information) on a second state called "revised". Separating the revision information would keep it a lot cleaner and more useful. This is probably the least important part of this system, but it's valuable to think about.
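As a rough illustration of revisions coupled to states, here is a small Python sketch. It is purely hypothetical (`RevisionLog`, `save` and `latest` are invented names, and revision ids are a simple counter); the point is only that each state keeps its own revision history, so a typo fix under "published" does not disturb work in progress under "revised".

```python
from collections import defaultdict


class RevisionLog:
    """Hypothetical per-state revision history for a single piece of content."""

    def __init__(self):
        self._by_state = defaultdict(list)  # state name -> [(vid, body), ...]
        self._next_vid = 1

    def save(self, state, body):
        """Record a new revision under `state` and return its revision id."""
        vid = self._next_vid
        self._next_vid += 1
        self._by_state[state].append((vid, body))
        return vid

    def latest(self, state):
        """Return the newest (vid, body) saved under `state`, or None."""
        revisions = self._by_state.get(state)
        return revisions[-1] if revisions else None
```

Two people can then call `save("published", ...)` and `save("revised", ...)` independently, and `latest()` for each state returns only that state's newest revision.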
well, that's a pretty
well, that's a pretty interesting workflow-publishing-state paradigm
Deploy hasn't really taken off, documentation? installation? and for my tests on D6 I lost interest when it became obvious to my feeble brain that once you'd Deployed content from one site to the next you then had two copies that could get out of sync (I think)
are you looking at a single site paradigm ?
personally, I think you need to consider the political issue that in publishing "sites" need to be owned by a business unit and won't be too happy about being just a section on a mother-ship type site
the project I am researching for has dozens of satellite sites - many out of orbit really - and a bunch of central sites
in one multinational media company I worked on years ago, editors/executives were very serious about having each major masthead/brand own its own website... from a business perspective it was important too, from a divesting perspective. .. the value of a media brand is diminished if it can't be packaged and sold as a discrete unit
A thought-out example of how this might work
I am talking about a single server, but you'd still manage separate domains differently. I'm not really looking to solve the problem of how sites share content (there are other things going on to help with that). I'm more looking at how to manage the viewing / editing of sites in a workflow.
What we're not trying to do
Take the example of two sites each with a different brand. We'll call them "bagel" and "cream cheese", as that's what's on my desk right now. Theoretically, with this system you could have a single site called "breakfast" that served as a central place for content and then you could cut states for that content relative to bagel and cream cheese. The URL for viewing the "bagel" site as of today would look something like this:
http://breakfast.com/somepage?state=bagel,date=20120109
You could probably do some rewrite magic to make that appear as http://bagel.com, but that's really not what it's intended to do.
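For what it's worth, pulling the state and date out of a URL shaped like the one above could look something like this. It's a Python sketch, not Drupal code; `parse_preview_url` is a made-up name, and the comma-separated `state=...,date=YYYYMMDD` format is taken literally from the example URL rather than from any agreed design.

```python
from datetime import datetime
from urllib.parse import urlsplit


def parse_preview_url(url):
    """Return (state, date) from a query like '?state=bagel,date=20120109'.

    Either value is None when it is absent from the URL.
    """
    query = urlsplit(url).query
    # The example URL separates its parameters with commas rather than '&'.
    params = dict(pair.split("=", 1) for pair in query.split(",") if "=" in pair)
    raw_date = params.get("date")
    when = datetime.strptime(raw_date, "%Y%m%d").date() if raw_date else None
    return params.get("state"), when
```

A URL with no query string comes back as `(None, None)`, which a preview system could treat as "show the live site as of now".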
What we are trying to do
You work for a company with a site called "breakfast", and the site has a "breakfast of the day" page. You're asked by your editor to write up two potential featured breakfasts, one about different bagel varieties, the other about different varieties of cream cheese.
What this illustrated, and the challenges
So, that's a complex example about how states, revisions, and dates all play together in a system like this. I'm assuming a lot right now, and that's also a very "pie in the sky" vision. Obviously, problems emerge under scrutiny (e.g. what if you want to manage complex relationships between states like ("bagel" OR "ready for publication") AND a date of "Jan 10, 2012"). I also didn't really illustrate how the system would go "back in time" or the changes to blocks, menus, etc that might occur. Regardless, this is the best example to date of what this might look like.
Secondly, this problem shows one of the major design decisions I'm struggling with. States can represent a context (e.g. bagel) and a workflow (e.g. ready for publication). This duality might be served better by a combination of two controls instead of one highly abstract state.
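One way to picture the "two controls instead of one" idea is to tag each revision with a context and a workflow state separately, then filter on both plus a date. The Python below is only a thought experiment: `TaggedRevision` and `select` are hypothetical names, and the OR/AND semantics simply mirror the ("bagel" OR "ready for publication") AND date example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TaggedRevision:
    """Hypothetical revision carrying a context and a workflow state separately."""
    vid: int
    context: str      # e.g. "bagel"
    workflow: str     # e.g. "ready for publication"
    effective: date   # date the revision takes effect


def select(revisions, when, contexts=(), workflows=()):
    """Return vids effective on or before `when` that match any of the given
    contexts OR any of the given workflow states."""
    return [
        r.vid
        for r in revisions
        if r.effective <= when
        and (r.context in contexts or r.workflow in workflows)
    ]
```

Keeping the two tags apart means the query stays a plain combination of filters, instead of needing one abstract state that somehow encodes both.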
Most likely, we'll start smaller - building a system that allows for an effective publishing date somewhere in the future, and build in support for states, revisions, and other functions. I really do feel like dates are still at the core of what we're trying to build, with a close second being contextual states.
I've been down this road before...
I designed and led the development on a homegrown CMS that had this kind of staging/publishing service, including revisions, at its core. It was also designed from the ground up to be a deployment system, publishing and expiring content (data and files) across multiple deployment target servers. From reading through these posts, I think I've been down this road before and I might have some experience to share.
One of the challenging aspects of that system was guaranteeing that content dependencies were identified and that an item could not be published or expired if there were dependencies on that item, thus ensuring that other content could not become broken or orphaned. Dependencies included related or associated content, "views", taxonomies, taxonomy classifications, permissions, etc.
To solve this problem, I created a sort of transactional or relational CMS, where a form of referential integrity was maintained between elements. It even had some handling for cascading references wherein publishing or expiring a "parent" item would automatically publish or expire associated "child" content with it, ensuring that content "packages" were always published/expired together. Required dependencies were checked on publish and expiration and if referential integrity was lost the whole publish/expire package would be rolled back (hence the semi-transactional part). This all happened at a layer of abstraction higher than the DB, so it wasn't dependent on the database's implementation for referential integrity. That allowed any type of content, whether stored in the database or file system, to participate in the transactions.
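To give a feel for the idea (this is not the original code, which wasn't PHP or Python and isn't shown here), a stripped-down sketch of cascading publish with a dependency check might look like the following. All names (`publish_package`, `IntegrityError`, the `children` and `requires` maps) are invented for illustration, and "rollback" is approximated by staging changes on a copy and only committing if every check passes.

```python
class IntegrityError(Exception):
    """Raised when publishing would leave a required dependency unpublished."""


def publish_package(item, children, requires, published):
    """Publish `item` plus its child items, or change nothing at all.

    children:  dict mapping a parent item to the child items published with it
    requires:  dict mapping an item to the items that must already be published
    published: set of published items, updated in place only on success
    Returns the set of newly published items.
    """
    staged = set(published)  # work on a copy so a failure leaves `published` intact
    stack = [item]
    while stack:
        current = stack.pop()
        if current in staged:
            continue
        staged.add(current)
        stack.extend(children.get(current, []))  # cascade to child content

    newly_published = staged - published
    for entry in newly_published:
        missing = [dep for dep in requires.get(entry, []) if dep not in staged]
        if missing:
            # Abort before committing: the whole package is dropped.
            raise IntegrityError(f"{entry} requires unpublished items: {missing}")

    published.update(newly_published)  # commit the whole package at once
    return newly_published
```

Because the checks run against the staged set, a package can satisfy its own internal dependencies (a parent that requires its child, say) while still failing on anything outside the package that isn't published yet.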
Time-stamps, unique identifiers and states are necessary elements of this kind of system, no doubt. However, I think that relying on just the "time-stamp" to ensure that the right stuff was published/expired at the right moment will not be sufficient to ensure that everything that should be published is published. I think the time-machine would show you what was published at any given moment in time, but it would also show you what was broken at any moment in time, wouldn't it? The time-stamp wouldn't be a guarantor of consistency and I think that is important in this kind of system.
The workflow I would agree is not a necessary condition for a minimum viable state. The procedures that an organization might need to follow to get the content into a state where it is ready for publishing or deployment are not separate from the work that needs to be done to actually enact the workflow.
The CMS I created is now defunct except for one soon-to-be-migrated-to-Drupal implementation. I believe I have documentation of the architecture of the staging/publishing system and I think it could be a useful reference. I have considered what it would take to build a similar solution in Drupal and came to the conclusion that it would take a deeper understanding of Drupal than I have, but people on this thread obviously do have that understanding. The code, while not in PHP, and the database structures could also be useful.
If there is interest in the documentation or code base or perhaps in seeing how it actually functions in the last living instance of this CMS, please let me know.
Thanks and good luck with this!
Fantastic!
I'd love to see a real-world example of how this was created in another system. Also, point taken on referential integrity. I've just been thinking about it in terms of some set of bulk operations changes to nodes with specific states, but it sounds like that might not be enough.
I would love to spend the time with maybe webchick or moshe taking a look at what you have. How would we go about setting up a time?
Sounds good. I'll send you a
Sounds good. I'll send you a PM with my direct contact information and we'll set something up.
Initial research
I went diving into the deep end of the pool today to try and figure out some things around this proposal. Thanks to davereid, agentrickard, merlinofchaos, stevector, recidive, and others I'm sure I'm forgetting in #drupal-contribute who bounced around ideas. There's definitely a lot of synergy in this space happening right now! :)
Here are the main players in terms of already-existing solutions, plus some pondering on some of the other requirements:
Workbench Moderation module
http://drupal.org/project/workbench_moderation handles custom publishing states, keeps a revision log of what was changed when and by whom, allows you to set up transitions between states and control who can change what to where, and has a nice UI. When coupled with http://drupal.org/project/revision_scheduler you can schedule transitions of workflows to happen at certain times.
The downside to Workbench Moderation is it's very hooked into nodes. So it wouldn't be possible atm to use it with other entities such as files and taxonomy terms.
Here's a video: http://www.youtube.com/watch?v=Rd0AWNKtgLw
Entity Revision Scheduling module
http://drupal.org/project/ers is another approach to what Workbench Moderation and Revision Scheduler module do. This one is more abstracted, so could be used for other entities aside from nodes (and currently has support for Panel Panes, as one example). The downside is it currently only knows two states: Draft and Published.
Here's a video: http://yfrog.com/ng44160214z
State Machine module
http://drupal.org/project/state_machine is an API-driven workflow management module which looks very sophisticated. (Haven't had a chance to try it out yet.) It provides exportable workflow states and transitions and a robust OO API. Reports seem to indicate that it requires some elbow grease in terms of UI, though.
Workflow
The classic http://drupal.org/project/workflow module has recently been revived for D7 (yay!). Didn't try it out yet, but the primary benefit this has over Workbench Moderation from my reading is that it allows custom workflows per content type, vs. Workflow Moderation is global for all content on the site.
So many choices!
Indeed. But note that right now, there are efforts underway to merge Workflow Moderation, Workflow, and State Machine, which are happening in the 2.x branch of Workflow Moderation: http://groups.drupal.org/node/198188
Revisioning module
http://drupal.org/project/revisioning is a fork of my old, dilapidated http://drupal.org/project/revision_moderation module (sniff ;)) that adds some nice features to revisions, such as the ability to edit a revision in place, or submit/schedule edits to an existing revision for review while the existing one stays active and public on the site.
I ran into some initial troubles in testing this out (could have been environment-related) so I didn't get too far in the evaluation. Seems to have integration with oodles of other modules. Not sure how nicely it plays with Workbench et al yet though.
"Time Machine" functionality
There were two modules posted just today (I love the Drupal community! :D) which might be able to help with this. I have not had a chance to play with either of them yet:
When I talked this requirement over with Earl, he warned that trying to support historical dates is going to be a performance drain, because you have to carry a bunch of baggage around with you. We should consider carefully if we really need/want this, at least for the initial crack at it.
"Entity/field-ification of things" modules
A project like this gets a heck of a lot easier if everything you're trying to manage and preview is consistent, and in Drupal 7 that means entities. Unfortunately, Drupal 7 core only takes you so far with entity-fication (nodes, users, taxonomy terms, comments), so here are some add-on modules that can help:
I still need to dig in here a bit more to find out exactly where the gaps are between the use cases and what the existing solutions do. But it looks like there's plenty of prior work to build from!
Any comments/corrections on this welcome!
Background to force_timestamp
I've been tinkering with force_timestamp for a little while; I added ERS support today and pushed it out. The intention isn't so much to use it for winding back the clock, but more for seeing what something will look like in X (mins|hrs|days|weeks). We are also using it with another module we released earlier today, Context Date.
We have some interesting requirements from the client for this project. They're in financial services and have some strict regulatory requirements to comply with. Their internal governance requirements also require that they have multi level sign off before any changes go live.
Challenges
The two biggest issues we've encountered during this project are Drupal's lack of versioning of configuration and the arbitrary distinction Drupal makes between content and configuration.
Versioning Config
Initially the client wanted the standard enterprise environments (Prod, UAT, Stage) and we'd almost sold them on the "Drupal way" of managing content changes on production. Then someone asked, "So how does Drupal control changing the text on a tile (aka a block)?" The answer was that it goes live immediately. That was a deal breaker and now we're back to Prod, UAT and Stage.
Content and Config
The other issue with Drupal is how it distinguishes between content and configuration. Some months ago I used @webchick's diagram to demonstrate the difference between content and config.
The client immediately asked for us to make Drupal work more like how a normal person thinks.
Other Issues
During our investigation of how to make this all work we discovered that Drupal will delete entries from files/ if the asset is no longer referenced. Schema changes which result in a column or table being dropped can't be undone. A block (or in our case a box) is lost forever once it is deleted.
The Solution?
We're still building things out for the client, but we think we've hit on a pretty good middle ground. We've adopted the following approach:
The first 2 concepts should be pretty familiar to experienced Drupal devs. In case we find something which is "sorta config" but tightly bound to an entity and it doesn't have Features support we're implementing new modules to handle this. An example of this is the URL Alias module. Changing URLs based on a date is out of scope - at least for phase 1.
Before anything is deployed to production a full snapshot of the environment (db, assets) is taken and stored - the code is all in git. These will be retained for a period of 7 years to comply with regulatory requirements. If someone says "I need to know what the site looked like at 12:34 on 7 July 2011" we can recreate the site at that point in time pretty quickly. This isn't done on the production site and it requires some manual intervention but it can be done.
There are other components to our solution which we will be releasing as they are ready. I also plan to write up a detailed case study once the site is deployed.
I now realise I should have called force_timestamp "delorean" instead ...
PS Sorry if this has become somewhat of an off topic braindump.
PPS Should we call this "Project Delorean" or "The Delorean Initiative"?
Super helpful
I'm down with Delorean... I've been calling it "time machine" internally, but I think yours is more fun (and less likely to get sued by Apple).
I want the ability to manage everything in production, but still provide for the assurance to preview content in the context of the overall site prior to production. I think a single site has a sweet spot within the user base that would be really advantageous for Drupal to take advantage of.
The issue you brought up is exactly why we need to have some sort of UUID system to help manage "all the stuff that isn't an entity" - mostly blocks and menus. Further, there needs to be a way to make it extensible to things like Panels and URL Alias.
You keep writing "Workflow
You keep writing "Workflow Moderation" when I think you mean "Workbench Moderation" :)
and a very nice workflow
and a very nice workflow diagramming ui in maestro
The Maestro module is a workflow engine/solution that will facilitate simple and complex business process automation.
Maestro has a number of components that include the workflow engine and the visual workflow editor.
http://www.youtube.com/watch?v=4DkyEYdFcSY
http://drupal.org/project/maestro
Holy moley
That reminds me of creating functional flow diagrams and maintenance plans for MSSQL. Crazy crazy!
This is well beyond the basic workflow controls that I was envisioning for this system. However, it's really really cool. Maybe we could find a way to extend the time machine system to support more feature-rich systems like this.
I think it's important that we have a hook system to allow people to extend the system beyond the basic functionality.
We are implementing revision
We are implementing revision tags in a current client site. Here is a sandbox of the current module.
http://drupal.org/sandbox/indytechcook/1397210
It also has an implementation of state_machine.
State Machine contains a module called "state flow". This module adds a UI and node implementation of statemachine.
If you install this sandbox, (make sure you have state machine on the site also), you can see it in action.
There are install instructions on the page.
[Added:]
Here is the main hack:
<?php
class ExampleNodeController extends NodeController {

  public function load($ids = array(), $conditions = array()) {
    if ($site_tag = example_workflow_get_tag()) {
      // Force the loading of the revision tied to the active site tag.
      $conditions[$this->revisionKey] = example_workflow_get_vid($site_tag, reset($ids));
    }
    return parent::load($ids, $conditions);
  }

}
?>
Edit:
I created http://drupal.org/project/template_field to work with content revisions. So the site builder builds out a template, stipulating the content structure of the template. Then the editors add the content to the templates. Our client wants very very very very custom layout per node, not per content type, per node. It's kind of an extreme case.
Neil Hastings
http://twitter.com/indytechcook
We actually have several
We actually have several promising solutions here. That ExampleNodeController that indytechcook just posted looks pretty clever to me. Nice job, folks.
I built a time machine in 2007 for NY Observer newspaper. I wrote up the editions concept at http://drupal.org/nyobserver. Basically, landing pages (nodes) were built with an effective date field and the menu callback would just pick the node with most current effective date. The modern solutions are both more complex and more general purpose than that.
not sure if NewsML is
not sure if NewsML is relevant here, but I think it's used, others more current in big commercial media will know
http://www.iptc.org/cms/site/single.html?channel=CH0087&document=CMS1206...
Collaboration among existing modules
I've started to pull together some maintainers of existing modules in this space over at http://groups.drupal.org/node/198188
Webchick mentioned this in http://groups.drupal.org/node/198223#comment-662173
Over the weekend I started to squeeze together State Machine, workflows defined as CTools exportables and merlinofchaos' new Entity Revision Scheduler http://drupal.org/project/ers http://drupal.org/node/1398994 This work is still in a very rough state.
As skwashd mentioned in http://groups.drupal.org/node/198223#comment-662243, ERS can be used as a basis for the time machine concept.
Network and Software Engineer here to help
I am a University trained network (TCP/IP) expert with some decent fundamental knowledge of PHP and SQL. I am looking to get involved at the earliest chance with this project. I have strong written skills and a lot of time and energy. If anyone has anything I can help with, please let me know! I will be looking at this group more tomorrow to see what I can help with, what needs to be done etc. I like what I have seen of Acquia, that is most of my reason for signing up for this group. You guys are awesome!
-Ryan
Seattle PST
Ryan Michael Hell
Business Information Systems Developer
Seattle, WA 98003
ryan.h@nwtronix.com
A fun and exciting challenge: access control! :)
Ken Rickard pinged me today about https://drupal.org/node/1284492.
Access control is currently only done by entity ID, not entity revision ID. Meaning that being able to "flash-forward" revisions will inherit the access control of the parent node, unless we start monkeying with query tags. Paths, similarly, are not revision-able currently.
One idea is to go the route of SimpleTest-esque "spin up a clone of the existing site as a prefixed environment." But there are more than a couple of challenges with that approach:
a) How do you synchronize any changes you make in that environment back to the "host" site (I guess we could just leave it read-only).
b) To this day we are still finding new and exciting ways that SimpleTest "infects" the "host" site with this approach.
c) This approach seems like it would be fairly untenable in a situation where a site had > 100K nodes and users or so.
Node Revision Operations
Here is a patch to state flow that adds revision based content admin that uses state machine as the node status. It adds hook_node_revision_operations, etc to provide bulk changes. http://drupal.org/node/1399994
http://drupal.org/sandbox/indytechcook/1397210 also integrates with it allowing a filter on the content revision.
Neil Hastings
http://twitter.com/indytechcook
Integration with Open Publish
Neil,
Thanks for the detailed postings. At a different level, I wonder whether anyone is interested in integrating all this with Open Publish at http://drupal.org/project/openpublish
Aruna Kulatunga
CTO
Comunicamos.EU