rewriting workflow?

fago's picture

I'm considering rewriting the workflow module.
You may ask: But why?

  • to make programmatically supplying a workflow easier
  • to make it more modular and so more extensible
  • to allow multiple workflows per content type
  • to make it more efficient and to clean it up

So how do I imagine the new improved workflow module?

I am thinking of dividing it into three modules that can be bundled in one archive/Drupal project.

  • workflow module (the engine)
  • workflow UI
  • workflow permissions

The workflow module would be an API-only module. It allows other modules to define workflows and to execute their transitions. It doesn't provide any access checks, as these are rarely needed for programmatically supplied workflows. So this module's task is to make programmatically supplied workflows possible.

The workflow UI module provides the UI (user interface). With this module activated, you get functionality similar to the current workflow module, so you can also create new workflows or edit programmatically supplied workflows. The workflow module allows other modules to define whether their workflows are editable through the UI or not.

The workflow permissions module adds the per-role permission system as it exists currently. There is a wrapper function workflow_access() in the workflow module, which invokes this module if it is activated. If not, everybody would have access. Perhaps this module can be incorporated into the workflow UI module.
However, building it as its own module would allow programmatically supplied workflows with configured permissions.
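
The workflow_access() wrapper described above could look roughly like the following. This is a minimal sketch in plain PHP without Drupal: only the name workflow_access() comes from the post, while workflow_access_callbacks(), workflow_permissions_check() and the argument list are assumptions made for illustration.

```php
<?php
// Registry of access callbacks. This stands in for "is the permissions
// module enabled?"; the function name is hypothetical.
function workflow_access_callbacks($add = NULL) {
  static $callbacks = array();
  if ($add !== NULL) {
    $callbacks[] = $add;
  }
  return $callbacks;
}

// Engine-side wrapper: everybody has access unless a registered
// permissions callback denies the transition.
function workflow_access($workflow, $from, $to, $roles) {
  foreach (workflow_access_callbacks() as $callback) {
    if (!call_user_func($callback, $workflow, $from, $to, $roles)) {
      return FALSE;
    }
  }
  return TRUE;
}

// Hypothetical permissions module: only editors may publish.
function workflow_permissions_check($workflow, $from, $to, $roles) {
  if ($workflow == 'publishing' && $to == 'published') {
    return in_array('editor', $roles);
  }
  return TRUE;
}

// Without the permissions module, an author may publish.
var_dump(workflow_access('publishing', 'draft', 'published', array('author')));
// "Enable" the permissions module; now the same request is denied.
workflow_access_callbacks('workflow_permissions_check');
var_dump(workflow_access('publishing', 'draft', 'published', array('author')));
```

The engine never needs to know how permissions are configured; it only asks whoever registered a callback, which is what would let programmatically supplied workflows run with or without the permissions module.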

Schedule

I'm planning to write the API workflow module quite soon, as I need it for a project. This work will be 4.7 compatible.
The workflow UI and permissions modules will follow later; perhaps they will initially be written for 5.0.

Ideas

Provide a CCK field for changing workflow states.

Technical changes

I would prefer using strings as identifiers for workflows and their states, as it makes programmatically supplied workflows easier to read. Then I would cache the workflow/content-type map with Drupal's variable system to speed up retrieval of this frequently used information.
The new module will use the old interface to the actions module, so that currently existing actions can still be used.
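
To illustrate string identifiers and the cached map, here is a small sketch in plain PHP. The array layout and the example_* function names are assumptions, not an existing format; in Drupal the computed map could be stored with variable_set() and read back with variable_get().

```php
<?php
// Hypothetical workflow definitions as a module might supply them.
// Workflows and states are identified by readable strings.
function example_workflows() {
  return array(
    'publishing' => array(
      'types' => array('story', 'page'),
      'states' => array('draft', 'review', 'published'),
    ),
    'translation' => array(
      'types' => array('story'),
      'states' => array('untranslated', 'translated'),
    ),
  );
}

// Build the content type => workflow names map once, so that later
// lookups do not have to walk all workflow definitions again.
function example_build_type_map($workflows) {
  $map = array();
  foreach ($workflows as $name => $workflow) {
    foreach ($workflow['types'] as $type) {
      $map[$type][] = $name;
    }
  }
  return $map;
}

$map = example_build_type_map(example_workflows());
// Lists both workflows attached to the 'story' content type.
print_r($map['story']);
```

Note that the map holds a list per content type, which also covers the goal of multiple workflows per content type.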

I'm looking forward to your feedback, suggestions and ideas. Of course any help is appreciated. :)

Comments

Already started

jvandyk's picture

I have already started on this. I've split it into workflow.inc and workflow.module similar to menu.inc and menu.module.

Awesome. I am looking

anthonyoliver's picture

Awesome. I am looking forward to it. Do you think it will be ready for 5.0?


http://xamox.NET

core

moshe weitzman's picture

dries and others have been wanting better workflow in core. please consider a minimal implementation as a patch for core so that people can swap out the simple published/unpublished workflow that core uses now. this would be analogous to the simple node types feature in 5.0

speaking of core....

Anonymous's picture

Am I crazy, or do these new 5.0 core functions look like a perfect scaffolding for the new workflow?
1. http://api.drupal.org/api/HEAD/function/hook_node_operations
2. http://api.drupal.org/api/HEAD/function/hook_node_access_records
3. http://api.drupal.org/api/HEAD/function/hook_node_grants

Seriously, I might be nuts... I like the looks of these hooks though.

More detail

jvandyk's picture

Here's the direction I'm going, now that my laptop is working again.

workflow.inc will reimplement the current ad-hoc workflows that we have (published/nonpublished, promote/unpromote, sticky/unsticky) in a compact state machine model. These will be read-only workflows, as they are now; the difference is the ability to fire actions at transitions which we currently lack.

The user-defined workflow component will be in workflow.module.

This is one of those multilayered problems that we're all so familiar with. Should taxonomy tables be used? (We've had a bit of that discussion already.) If so, should taxonomy be rewritten first to integrate taxonomy_access or tac_lite? The security implications of workflow access are pretty important.

At this point I'm leaning toward (re)-using the status column in the node table for the workflow state. That means that node reads are no more expensive than they are now, and eliminates the workflow_node table.

For internal workflow representation, I think a nested array in a variable_set/get should replace the SQL-heavy implementation we have now; some benchmarking may be required in order to verify this.
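
As a rough illustration of the nested-array idea, one of the read-only core workflows might be represented and executed like this. It is a sketch in plain PHP under stated assumptions: the array keys, the example_* functions and the action name are all hypothetical, and real actions would come from the actions module.

```php
<?php
// Hypothetical nested-array representation of the published/unpublished
// workflow: for each state, the allowed target states and the actions
// to fire on that transition.
function example_core_workflow() {
  return array(
    'states' => array('unpublished', 'published'),
    'transitions' => array(
      'unpublished' => array('published' => array('example_notify_author')),
      'published' => array('unpublished' => array()),
    ),
  );
}

// Stand-in action that just reports what it would do.
function example_notify_author($node, $from, $to) {
  return 'notify author of node ' . $node['nid'] . ': ' . $from . ' -> ' . $to;
}

// Perform a transition if the workflow allows it and fire its actions.
// Returns the action results, or FALSE for a disallowed transition.
function example_transition($workflow, &$node, $to) {
  $from = $node['state'];
  if (!isset($workflow['transitions'][$from][$to])) {
    return FALSE;
  }
  $node['state'] = $to;
  $results = array();
  foreach ($workflow['transitions'][$from][$to] as $action) {
    $results[] = call_user_func($action, $node, $from, $to);
  }
  return $results;
}

$node = array('nid' => 1, 'state' => 'unpublished');
// Allowed transition: the node is published and one action fires.
print_r(example_transition(example_core_workflow(), $node, 'published'));
// Not allowed: there is no published -> published transition.
var_dump(example_transition(example_core_workflow(), $node, 'published'));
```

A structure like this is a single array lookup per transition check, which is the kind of thing the suggested benchmarking against the SQL implementation would measure.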

I think we have to be very conscious of speed and footprint, so I support the approach of a minimal but functional workflow.inc and adding more code only when needed with conditional includes.

I think the UI is the hardest part, especially the permissions aspect. So fago, I think you're right on in what needs to be done.

interesting

fago's picture

thanks, very interesting.. Imo your workflow.inc equals the workflow.module I described. Do you already have some code? I would love to have a look at it.

combining workflow with taxonomy is also an interesting idea. I already thought a bit about this and I agree with you that it is a bad idea to reuse taxonomy for storing workflow states. Imho you would win nothing but confused users ;) I think the access problems should be solved by having the na_arbitrator concept in core. Besides that, it sounds more logical to me to base access restrictions on workflow states and not on a special type of classification.

Could you give me some more details on how you imagine the compact state machine model of workflow.inc?
What would be different to the currently used one by the workflow module?

I would go for the same model as now, except for allowing more than one workflow per node, which is obviously already required for replacing the current ad-hoc workflows. But what do you think about reusing the current status/promote/.. columns of the node table? Of course this would work for a core workflow.inc, but where would other workflows be stored? Imo a separate storage table would make more sense, so that workflow.inc could also store the states of workflows contributed by other modules there. So I would like to have workflow.inc handle the ad-hoc core workflows as well as additional workflows contributed by other modules or the workflow module.

As I already mentioned, I need the workflow state machine (used programmatically and for multiple content types) for a 4.7 project quite soon. So I will do at least a 4.7 implementation of the state machine as workflow module or so. If you have already done some work on workflow.inc, I'd like to make use of it as much as possible. So we could combine further efforts on this topic.
Or otherwise I would start writing a module implementing the state machine, which I previously called (new) workflow module. Perhaps we can use this for benchmarking the variable_set/get approach vs the current sql or base the later core patch on this work.

tagging api

mfredrickson's picture

As the author of the ill advised workflow-on-taxonomy system, let me spend some time talking about the good ideas in the code (I freely admit that there are many bad ideas). While using taxonomy's UI is a "bad idea", I still stand behind the idea of a common "tagging" API. In the workflow-on-taxonomy module, I used taxonomy as that tagging API. I think the better solution is to abstract out the tagging component of taxonomy into another system.

At the core, workflow and taxonomy (and numerous other modules) have the same need: labeling nodes with tags. For workflow, these are the states. For taxonomy, these are the terms. But they are really the same thing.

Therefore, I propose a tagging.inc. It should probably be data agnostic (i.e. not directly tied to nodes). Basically a simple system of tying data to nodes. This would accommodate multiple workflows, a common access system based on tags, and a statemachine.inc that provides most of the internals for workflow. States are tags, and statemachine would just provide the functions for moving between tags in a valid way.

benefits?

fago's picture

First of all, thanks for your input :)
Indeed it sounds like a good idea to abstract this to a tagging.inc.

But for what? What would be the benefits of this approach?
(Except for more work with rewriting taxonomy..)

Imo access control is no benefit. Access control on workflow states and on terms is each already possible now. Once the workflow system is in place with a state-based access control system, in conjunction with a "content-type creator" like the one in 5.0, what would people need access control on tags for?
Tags are for tagging, i.e. classifying content; building access control on top of them might not be a common thing to do.

Furthermore then you have to make a distinction between term tags and state tags. How would this be done?

Taxonomy seriously needs a

mfredrickson's picture

Taxonomy seriously needs a rewrite. Though perhaps I am the only one who thinks that. Not that I've written a lot of code for that purpose.

Benefits:

  1. Access control: Yes, there are methods to get node access modules to work together now, but to quote Earl Miles "The biggest benefit to these is that access control modules can co-exist, though they still need to be a little bit careful." Why not take away that uncertainty by unifying a lot of node access based on tags - be they taxonomy terms or workflow states? Sure, we could string together a bunch of separate modules with baling wire and a prayer, but I would prefer a more robust solution.

  2. DRY - if taxonomy needs to tag nodes, and workflow needs to tag nodes, why not extract out their needs to a common code base? Fewer bugs, less code to maintain, working together - not at cross purposes.

  3. APIs == good - Drupal is a framework as much as a web app. The more APIs we can cram into it the better (to a reasonable degree). Clearly, lots of things need tagging, why not provide it? If there is desire to rewrite workflow, why not make that the basis for meeting this need?

As to differentiating state terms and tags, that's easy - just provide the module name along with the tags (or some other identifier). Rather like categories encapsulate terms right now.
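
A rough sketch of how such namespaced tags and a state machine on top of them might fit together, in plain PHP with an in-memory array instead of a database table. All function names, the 'node:12' item key and the 'workflow'/'taxonomy' owner names are assumptions made for illustration only.

```php
<?php
// Hypothetical tagging store: item key => owner (module name) => tags.
// Item keys are plain strings, so the store is not tied to nodes.
function example_tag_add(&$store, $item, $owner, $tag) {
  if (!isset($store[$item][$owner]) || !in_array($tag, $store[$item][$owner])) {
    $store[$item][$owner][] = $tag;
  }
}

// All tags one owner has put on an item.
function example_tags($store, $item, $owner) {
  return isset($store[$item][$owner]) ? $store[$item][$owner] : array();
}

// State machine on top of tags: the 'workflow' owner keeps exactly one
// tag per item and may only replace it along an allowed transition.
function example_state_move(&$store, $item, $allowed, $to) {
  $current = example_tags($store, $item, 'workflow');
  $from = $current ? $current[0] : NULL;
  $targets = ($from !== NULL && isset($allowed[$from])) ? $allowed[$from] : array();
  if ($from !== NULL && !in_array($to, $targets)) {
    return FALSE;
  }
  $store[$item]['workflow'] = array($to);
  return TRUE;
}

$store = array();
$allowed = array('draft' => array('review'), 'review' => array('published'));
example_tag_add($store, 'node:12', 'taxonomy', 'drupal');
example_state_move($store, 'node:12', $allowed, 'draft');   // initial state
example_state_move($store, 'node:12', $allowed, 'review');  // allowed move
print_r($store['node:12']);
```

Because every tag carries its owner, term tags and state tags live side by side on the same item without being confused with each other.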

-Mark

more ideas...

fago's picture

I agree with you that taxonomy could be improved also. E.g. currently it's hard to programmatically set terms: http://drupal.org/node/56670
So it would be cool if it were possible to reuse a general "tagging" API for storing states like you describe. But taxonomy isn't there now.

I thought a bit more about firing actions. In an ideal world it should be possible to fire actions depending on state changes of the whole node (including its data). To go in this direction I would propose to make at most one transition for each node update.

E.g. suppose the flags "published, moderated, sticky, frontpage" are each implemented as a separate workflow. Wouldn't it be cool to fire an action only if a moderated node gets unpublished?

To achieve this I would build the state of a node out of the states of each workflow assigned to it:

State(node) = <State(WF publish), State(WF moderated), State(WF sticky), State(WF frontpage)>

We could represent this as associative array:

$state = array(
  'Workflow publish' => 'published',
  'Workflow moderated' => 'unmoderated',
  // ..
);

Then we could allow flexible actions to be defined on these state changes. E.g. WF Publish has to change from Published to Unpublished, and WF Moderate has to keep the value "Moderated"; Sticky and Frontpage don't matter.
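
The example rule just described could be checked along these lines. This is a minimal sketch in plain PHP; the rule format (an old/new pair per workflow, with unmentioned workflows ignored) and the function name are assumptions, not part of any existing module.

```php
<?php
// Check whether a change of the combined node state matches a rule.
// A rule lists, per workflow, the required array(old, new) pair;
// workflows not mentioned in the rule do not matter.
function example_rule_matches($rule, $old_state, $new_state) {
  foreach ($rule as $workflow => $pair) {
    list($from, $to) = $pair;
    if ($old_state[$workflow] != $from || $new_state[$workflow] != $to) {
      return FALSE;
    }
  }
  return TRUE;
}

// "Publish goes from published to unpublished while moderated stays
// moderated; sticky and frontpage don't matter."
$rule = array(
  'publish' => array('published', 'unpublished'),
  'moderated' => array('moderated', 'moderated'),
);
$old = array('publish' => 'published', 'moderated' => 'moderated',
  'sticky' => 'sticky', 'frontpage' => 'promoted');
$new = array('publish' => 'unpublished', 'moderated' => 'moderated',
  'sticky' => 'unsticky', 'frontpage' => 'promoted');

var_dump(example_rule_matches($rule, $old, $new));  // bool(true)
```

Evaluating the rule once per node update against the old and new combined state fits the proposal of at most one transition per update.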

Furthermore, another, more powerful approach came to my mind. Remember: in an ideal world it should be possible to fire actions depending on state changes of the whole node (including its data). This approach would achieve that!
Each time a node is saved it is considered to be a new state. But how to handle firing actions on transitions?

Modules can define conditions. A simple condition for a state called "Moderated" would look like this: array('status' => 0, 'moderated' => 1). Workflow checks if the condition is TRUE, and if it is the current state of the node would be called "Moderated". Then actions could be defined between these declared states.

Note, this approach would be different in some points:
* each node could be in more than one state at the same time
* this would work without rewriting the current node flags
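
A small sketch of the condition idea in plain PHP: states are declared as conditions on node fields, and the states entered by a save are found by comparing the matches before and after. Only the array('status' => 0, 'moderated' => 1) condition comes from the post; the other conditions and the example_* names are assumptions.

```php
<?php
// Named states declared as conditions on node fields.
function example_state_conditions() {
  return array(
    'Moderated' => array('status' => 0, 'moderated' => 1),
    'Published' => array('status' => 1),
    'Sticky' => array('sticky' => 1),
  );
}

// Return every declared state whose condition holds for the node.
// A node can be in several states at once, or in none.
function example_node_states($node, $conditions) {
  $states = array();
  foreach ($conditions as $name => $condition) {
    $matches = TRUE;
    foreach ($condition as $field => $value) {
      if (!array_key_exists($field, $node) || $node[$field] != $value) {
        $matches = FALSE;
        break;
      }
    }
    if ($matches) {
      $states[] = $name;
    }
  }
  return $states;
}

// States the node is in after the save but was not in before;
// actions could be attached to entering (or leaving) a state.
function example_states_entered($before, $after, $conditions) {
  return array_values(array_diff(
    example_node_states($after, $conditions),
    example_node_states($before, $conditions)
  ));
}

$before = array('status' => 0, 'moderated' => 1, 'sticky' => 0);
$after = array('status' => 1, 'moderated' => 0, 'sticky' => 1);
print_r(example_states_entered($before, $after, example_state_conditions()));
```

Since the conditions only read existing node fields, this works on top of the current node flags without rewriting them.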

In this case we would still need the possibility for modules to create new workflows, for which the workflow module saves the state of this workflow.
So this approach would be really powerful, but building a UI for defining the conditions might be a lot of work. However, we could go with this approach and expose its full power only to modules, while the admin interface only permits defining simple actions on some basic variables like the status flags and, of course, the workflows handled by the workflow module. E.g. this approach would enable you to react to an update of a published node and set the node back to moderated.

Any opinions on these ideas? :)

splendid

moshe weitzman's picture

this is quite exciting. the assumptions you've detailed make sense to me ... promoting to home page.

Don't depend on nodes

gordon's picture

This sounds great. I know that I have not looked at the internals of the current workflow module, but what I would like to see is that it is not dependent on nodes. So that it can be applied to any type of data.

I know that I want better workflow in the E-Commerce module to allow store owners to customise the way that they process orders.

Thanks.

--
Gordon Heydon

multistep

moshe weitzman's picture

you really ought to look into the multistep feature of fapi in 5.0. modules can form_alter themselves into the checkout flow as they wish.

We have already done this,

gordon's picture

We have already done this, but the forms are still built the same. We need more flexible control over workflow, and need to make sure that everything happens in the right order.

The use of the workflow module is not exactly for the checkout process, but the processing of orders once they have been received.

--
Gordon Heydon