Notes from DrupalCon Chicago 2011

Posted by gdd on March 15, 2011 at 9:52pm

At DrupalCon Chicago there was a ton of energy around Drupal's staging and deployment problem. In his keynote, Dries identified this as one of the major initiatives for the Drupal 8 cycle, and following the keynote three Core Conversations were given on the topic by David Strauss, Howard Tyson, and me. Those talks sparked conversations which continued through the week and generated a lot of really cool ideas. This post is intended to document the ideas that were discussed (at least the ones I heard) and to use those ideas to foster further discussion.

I personally think it is important to air and think about a lot of different ideas, even if they seem unworkable at the outset, or are too much of a reach for a core release. Shoot for the stars and adjust as needed! Lots of these ideas seem to be gaining traction but I would love to keep talking and throwing things out there.

Much of my core conversation was centered around the fact that the line between what is content and what is configuration has become so blurry that it no longer makes any sense. A couple of good examples of this are a nodequeue that has five hand-selected nodes in it, or a Panel that uses a node as context. Users don't know or care that the line is blurred here; they just want to push their site live. I did not provide a solution to this larger problem. I mostly wanted to reset the conversation from being about 'configuration management' to being about 'everything management'. I did provide three pieces of low-hanging fruit we can fix to make the staging problem (in particular content staging) more manageable: make sure everything in core has a reliably unique identifier, make everything exportable, and fix up our save/load APIs so that they don't need to rely on forms or other systems in order to operate properly. If we can reliably export and save everything in Drupal, then implementations of how to deploy this data can be left to contrib. You can see my slides at http://www.slideshare.net/heyrocker/core-conv.

Howard Tyson presented his ideas for creating a Settings API for Drupal. This system is inspired by CTools in that it allows exportable settings to be defined in module code and overridden by the database or settings.php. One of the pieces of functionality Howard has added is 'Relations', i.e. dependencies. Settings can be associated with each other and with other things in Drupal, like modules. You can depend on modules, or you can create your own groups to bundle settings together. The Settings API also provides a log that records all settings changes and allows changes to be synced through update.php. You can see Howard's slides at http://www.slideshare.net/htyson/system-settings.
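
To make the layering concrete, here is a minimal sketch in Python (not Drupal code, and not Howard's implementation). The setting names are made up, and the precedence order (settings file beats database, which beats module code) is my assumption, since the talk summary above only says that code defaults can be overridden by either.

```python
from collections import ChainMap

# Hypothetical layers; every name and value here is illustrative only.
module_defaults = {"site_name": "Example", "cache_lifetime": 300}
database_overrides = {"cache_lifetime": 600}
settings_file_overrides = {"site_name": "Example (staging)"}


def resolve_settings(code_defaults, db_overrides, file_overrides):
    """Return a lookup where file overrides beat the database, which beats code."""
    # ChainMap searches its maps left to right and returns the first hit.
    return ChainMap(file_overrides, db_overrides, code_defaults)


def overridden_keys(code_defaults, *override_layers):
    """List settings whose effective value differs from the code default.

    Override layers are given highest priority first.
    """
    effective = ChainMap(*override_layers, code_defaults)
    return sorted(k for k in code_defaults if effective[k] != code_defaults[k])


settings = resolve_settings(
    module_defaults, database_overrides, settings_file_overrides
)
```

A list like the one returned by overridden_keys() is roughly what an "overridden" indicator in an exportables UI would need.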

David Strauss presented his ideas for replacing our existing configuration system with on-disk storage in JSON. The main reason to use JSON is convenience compared with other storage formats like XML, even though those formats are more human-readable. This data would exist only on disk: read when a page is loaded and written when a page is saved. It would never be stored in the database at all and the files would be canonical (thus the term 'exportable' would no longer apply here). A new top-level writeable directory would be created to store these files (much like the files directory now). One big advantage of this system is that it becomes much easier to deploy code that depends on specific settings, something that is pretty difficult to do now outside of update functions and the like. There was a lot of discussion around this idea through the week.
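
As a rough illustration of what "files are canonical" could look like, here is a small Python sketch of a file-backed store. The class name, the one-file-per-name layout, and the write-then-rename step are all my assumptions, not part of David's proposal.

```python
import json
import os
import tempfile
from pathlib import Path


class JsonConfigStore:
    """Hypothetical file-backed config store: one JSON file per config name."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, name):
        return self.directory / f"{name}.json"

    def read(self, name, default=None):
        """Load a config object from disk, or return default if it is absent."""
        path = self._path(name)
        if not path.exists():
            return default
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)

    def write(self, name, data):
        """Save a config object; the file on disk is the only copy."""
        # Write to a temporary file and rename it into place, so a
        # concurrent reader never sees a half-written file.
        fd, tmp_name = tempfile.mkstemp(dir=str(self.directory), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, sort_keys=True)
            handle.write("\n")
        os.replace(tmp_name, self._path(name))
```

There is no export step in this model: whatever write() puts on disk is what gets deployed.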

chx has actually already posted a patch which starts to address the storage on disk of variables that need to be used in the bootstrap process. His patch can probably be seen as a starting point towards the kind of system that David described, and it also integrates some of the same ideas as Howard's. See http://drupal.org/node/1059972.

During my presentation and in a blog post afterwards, fago proposed using entities as a foundational unified API for all Drupal objects. This would be available to provide the APIs needed by all objects. Not all APIs would necessarily need to be on all objects, but we should provide them such that any entity could opt in if it wants. A centralized entity API would also be an enabler for web services, which is another one of Dries' core initiatives for Drupal 8. His blog post can be found at http://wolfgangziegler.net/Drupal-8%3A-Approaching-Content-and-Configura....

On Friday after the code sprint, several other people (I think it was crell, eaton, eclipsegc, fiasco and sdboyer?) and I talked more about the entity ideas. Some random thoughts from that discussion:
- Entities as currently implemented are too heavy.
- It would help if entities could have properties as well as fields.
- Entities should be real classes with an interface. This would make it possible to provide, for instance, $entity->unique_identifier(), and depending on your use case you could return a machine name, a UUID, or even a remote ID if you were based on remote data.
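
A quick sketch of that last point, written in Python rather than PHP just to keep it short. The class names are hypothetical and nothing here reflects an existing Drupal API; it only shows one interface with three different identifier strategies behind it.

```python
import uuid
from abc import ABC, abstractmethod


class Identifiable(ABC):
    """Hypothetical interface: every object can report a stable identifier."""

    @abstractmethod
    def unique_identifier(self) -> str:
        ...


class ContentItem(Identifiable):
    """Content-like object: identified by a UUID generated on creation."""

    def __init__(self, title, item_uuid=None):
        self.title = title
        self.uuid = item_uuid or str(uuid.uuid4())

    def unique_identifier(self) -> str:
        return self.uuid


class ConfigItem(Identifiable):
    """Configuration-like object: identified by a human-entered machine name."""

    def __init__(self, machine_name):
        self.machine_name = machine_name

    def unique_identifier(self) -> str:
        return self.machine_name


class RemoteItem(Identifiable):
    """Object backed by remote data: identified by source plus remote ID."""

    def __init__(self, source, remote_id):
        self.source = source
        self.remote_id = remote_id

    def unique_identifier(self) -> str:
        return f"{self.source}:{self.remote_id}"
```

Code that exports or deploys things would only ever call unique_identifier() and would not need to know which strategy is in use.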

The interesting thing for me is how all these ideas collide and mesh in many places. For instance, David's canonical disk-based configuration is excellent, but without UUIDs the data within it can be corrupted and difficult to push. Something like Howard's Settings API or a more simplified base class for settings could help act as an abstraction layer. And so on. All these are great thoughts, but we still lack an overarching vision, which I personally think is pretty important.

Let's keep the ideas and discussions going. I'm sure there was more talk that I didn't even get to be a part of last week. Let's hear it!

Comments

Entities for data only

Posted by Crell on March 15, 2011 at 10:00pm

I will reiterate, as I've said elsewhere, that while Greg is correct that we do conflate content and configuration (and a few other things) right now, that's not a good thing. We should try to separate them to the extent possible.

In a way, a new, robust settings API would make that easier. "Configuration" is information that has a machine name and is stored primarily on disk, despite having a cached version in the DB. "Content" is information that has a UUID and probably a local serial (eg, nodes and nids), and lives primarily in the database despite having a standard "serialized" format (in JSON, XML, etc.). And then we can punt for now on the odd cases where they do have to touch.

The astute reader will realize that goes against what Fago is suggesting. That is correct. I am in fact 100% against using Entity API for configuration objects, as it is a fundamentally flawed approach from the start. (It puts configuration primarily into the database, which is bass-ackwards.) I would urge all module developers to not do that.

"One class to rule them all" is not in any way shape or form good architecture, and we should not be encouraging that anti-pattern.

The problem with 'punt for

Posted by gdd on March 15, 2011 at 10:08pm

The problem with 'punt for now on the odd cases where they touch' is that historically this means 'punt forever' and what we end up with is a system that works 75% of the time and leaves the rest in the cold. Not coincidentally, the place it will work best is where the developers writing the code need it most. I don't deny that an overarching framework may be biting off more than we can chew, but I still have a hard time swallowing 'punt it'. We can do better than that and our users deserve it.

Can't solve now

Posted by Crell on March 15, 2011 at 10:25pm

I don't mean "don't bother solving it", necessarily. I mean that we should not get into analysis paralysis and do nothing because we don't have a 100% solution. We have an 80% solution with some "not sure yet about this part" pieces. Let's go with that and see where it takes us.

We may well find once we're deep into it that a solution for the other 20% (or some part of it) emerges on its own. Or we may find that half of that 20% is best solved by rewriting other code to simply not use those crazy edge cases once we have a more robust alternative in place. But we won't know either until we have that more robust alternative in place.

Basically, I'm acknowledging that the case of nodequeue is a complicated one, but we should for now focus on the less crazy stuff and cross the nodequeue bridge when we get to it. That may well make it a smaller bridge than we think.

Ah!

Posted by gdd on March 15, 2011 at 10:30pm

Now this is something I can get behind.

Dangers...

Posted by eaton on March 15, 2011 at 10:46pm

I don't think that Heyrocker is specifically suggesting we should target Nodequeue; rather, it's used as a quick shorthand for "the kind of problem that Drupal is currently bad at." Panel variants that trigger on a specific node ID, taxonomy vocabularies used for special purposes, and the canonical 'page node used as a terms of service page, that must be deployed as part of a rollout rather than edited on the live site' are all examples of the same issue.

The challenge is that we've already solved 80% of the problem -- we have since Features came out. The problem is that the remaining 20% is really, REALLY hard without solving the issue of unique identifiers, and cleaning up the format of our content entities. The problem you pointed out in Heyrocker's session -- load/save being polluted by promiscuous addition of runtime data rather than actual persistent data -- is a big part of that.

I've been running into this with a number of clients who've desperately needed real content staging. At present the only solution is to bolt UUIDs on top of core, rewrite all of the *reference CCK fields, formatters, widgets, views integration, references, and so on -- and try to start building on top of that. While that's doable, it means that the bulk of the work ends up being heavily custom code that, purely because of time and resource constraints, almost always stays custom.

The issue isn't trying to solve all of the crazy edge cases in core, it's looking at the crazy edge cases we already encounter on an ongoing basis with core, and ensuring that the new system we put in place doesn't just rearrange the chairs and leave us with the same problem. If we don't pay attention to the fuzzy grey area that's being pointed out, that is what we'll end up with.

Bundles should flag whether they are content or configuration

Posted by mlncn on March 17, 2011 at 12:23am

While i agree that in general entities should not be used for configuration, in practice i think it is necessary that entities be able to identify what should be content or configuration on the bundle level. While the Entity API makes it unlikely that we will have configuration nodes any more, vocabularies, menus (which aren't yet entities), and undoubtedly other first-level Drupal concepts will never be unarguably all content or configuration. In application-specific settings, however, there's a pretty good chance that this division can be made on a per-bundle level.

Admittedly i'm looking at a very Features-esque workflow -- content creators, editors, and admins doing their thing on live, and developers deploying changes with everything in code -- but i think that's a pretty common situation.

^{benjamin, agaric}

Agreed. While we had some

Posted by catch on March 28, 2011 at 1:44pm

Agreed. While we had some very good discussion about where the Entity API should go at DrupalCon (which encompassed reworking the field API to be able to deal with things like entity properties and a lot of other things), the idea of storing all configuration as entities fills me with dread to be honest.

For a start, even if we shift things like field formatters and widgets down a level to something more like an element, the entity API is still quite a high-level API - i.e. I need to be able to define where my entities are stored, what fields they have etc., and I really, really don't want that configuration to be held in an entity. I'm already regretting making vocabularies an entity in Drupal 7 when they're simultaneously used as bundles - it was good for code consistency (since vocabularies were definitely entity-like in D6, at least as much as anything else was), but very confusing otherwise.

I'm already regretting making

Posted by fago on April 2, 2011 at 1:54pm

I'm already regretting making vocabularies an entity in Drupal 7 when they're simultaneously used as bundles - it was good for code consistency (since vocabularies were definitely entity-like in D6, at least as much as anything else was), but very confusing otherwise.

The point is that vocabularies are persistable data objects, so why should we use a different interface/API for dealing with them than the one we are used to? The actual implementation could still be completely different and built to suit configuration. Still, there might be things we could benefit from, e.g. implementing a caching controller suited to configuration (cache all) and re-using it across all configuration entities.

Entity API -> Data API

Posted by fago on March 16, 2011 at 9:21am

"Configuration" is information that has a machine name and is stored primarily on disk, despite having a cached version in the DB.

Having an Entity API that fulfills the role of a data API doesn't mean we cannot separate configuration and content. That's something we can perfectly do by building further content and configuration specific parts on top of entities. Still, having configuration built on top of the entity API allows us to use machine-names and to store it primarily on disk, it just makes sure we can make use of the same data-related API (=interface) for it. Then, having the common data API for both parts makes it easier for us to deal with the grey area - where the line between content and configuration becomes blurry.

"Content" is information that has a UUID and probably a local serial (eg, nodes and nids), and lives primarily in the database despite having a standard "serialized" format (in JSON, XML, etc.).

As heyrocker pointed out in his talk, I think having a standard "serialized" format for all of our objects helps us for content staging / configuration management. In particular, it enables us to solve the gray-area problem by e.g. writing some glue code that takes some specific content items and handles them as configuration.

(It puts configuration primarily into the database, which is bass-ackwards.)

Well, Drupal has worked that way for a long time already. That said, I'm unsure whether an on-disk-only approach flies. Having stuff only on disk makes it difficult to properly implement the usual CRUD-related hooks (insert/update/delete), and basically forces us to re-implement in PHP functions that a DB already provides, like sorting and applying conditions. Syncing configuration into the DB, however, allows us to re-use the existing tools.

"usual CRUD hooks"

Posted by Crell on March 16, 2011 at 3:11pm

That's exactly my point. Configuration doesn't have the "usual CRUD" that content has, and it is wrong for us to force them into the same mold. Content really should have a CRAP (Create, Read, Archive, Purge) model, as discussed in a later Core Conversation, whereas that approach doesn't make sense at all for configuration.

Most meaningful configuration doesn't map nicely to a relational model now to begin with. Instead we have serialized data and lots of inter-related tables that are hard to update. That's already a problem. We need to separate configuration from the constraints of content, not tie it further to it.

The Search API module is basically being rejected as the direct basis for D8 search, mostly on the grounds that using entities for all of its configuration objects is fundamentally the wrong approach. There are better approaches. Let's use them.

Configuration Storage

Posted by eclipsegc on March 17, 2011 at 4:05am

To hop in on the side of entities NOT as configuration...

Most of our configuration storage to date is variable-based in nature. True, that doesn't fly for views, page_manager, etc. (search_api, rules), but all of these are attempting on-disk storage options of some sort. As I pointed out numerous times in Chicago, ctools export_ui provides a CRUD system for EXACTLY this purpose already. In fact it's well formed enough that views will be switching to it with the new UI release (whenever that happens, but there's a branch on the views project already). The point is that our configuration system is essentially a "CRUDE" system for most of the popular modules (Create Read Update Delete Export) and that model has served us well. David's suggestions essentially move us to a "RE" format where we export (overwriting) the existing configuration (or creating it for the first time... same mechanism) or we read it. This is fundamentally different than what entities is attempting to do and one more example of why we SHOULD NOT be using entities as configuration.

On the topic of Search API for a moment... having used it a TON, it's sad to me that its entity dependence is having this effect. It's a really good tool.

Eclipse

This is fundamentally

Posted by fago on March 17, 2011 at 9:34am

This is fundamentally different than what entities is attempting to do and one more example of why we SHOULD NOT be using entities as configuration.

I think that by implementing exportables in the entity API module, and with modules using it successfully, I've shown that there is no problem with implementing exportables based upon entities. Also, I see no difference in "RE", as it just means that when you update your configuration object you'll have to export it and write it back to disk. So we'd still need the functionality to save/update the configuration object, which means developers still have a CRUD-like interface to it regardless of what it does behind the scenes.

That's exactly my point.

Posted by fago on March 17, 2011 at 9:36am

That's exactly my point. Configuration doesn't have the "usual CRUD" that content has, and it is wrong for us to force them into the same mold. Content really should have a CRAP (Create, Read, Archive, Purge) model, as discussed in a later Core Conversation, whereas that approach doesn't make sense at all for configuration.

CRUD is the way developers think. In the entity API BoF we discussed that we want to keep the usual CRUD API functions, while internally the system can do CRAP. While we could opt out of that for configuration, I think having an archive of configuration (changes) would be a valuable feature we'd get for free.

Most meaningful configuration doesn't map nicely to a relational model now to begin with. Instead we have serialized data and lots of inter-related tables that are hard to update. That's already a problem. We need to separate configuration from the constraints of content, not tie it further to it.

If you don't care, you can just store it serialized in the DB as well. Still, entity storage is pluggable anyway.

The Search API module is basically being rejected as the direct basis for D8 search, mostly on the grounds that using entities for all of its configuration objects is fundamentally the wrong approach.

Interesting, this was not mentioned at all in the d8 search talk.

CRUD

Posted by fago on March 17, 2011 at 6:58pm

In addition to that, we need the CRUD related hooks (or at least similar functionality) for the field API integration. Consider configuration that serves as field api bundle, e.g. node types. We need CRUD related hooks for that configuration in order to be able to call the field API bundle attachers (field_attach_*_bundle()).

Also, having the bundles implemented as entities would help us clean up the field-api <-> entity api interface by marking an entity type as a bundle of another type. That way we could drop the bundle-related information from hook_entity_info() (bundles have a separate entity info now) and just rely on the usual CRUD hooks instead of having to invent additional field-api centric ones which are called by the attachers.

Field API would probably

Posted by pounard on May 4, 2011 at 12:41pm

Field API would probably benefit from being part of entity system. It then won't need hooks anymore, and probably would be more readable for common developers.

Pierre.

Hoorah!

Posted by eaton on March 15, 2011 at 10:16pm

Really digging the writeup, here. The config management space feels like it's been percolating for a couple of years and it's exciting to see the different discussions converging. Although I've been on board the Features bandwagon for a year or so now, and am one of the strong proponents of separating configuration from content for sanity's sake, I think your presentation made a very very strong case.

Ultimately, as long as Drupal provides a GUI for configuration we're going to deal with the fuzzy line between config and content. You hinted at it in the writeup above when you mentioned the problem of config data being 'corrupted' by content data, but it's really worth repeating and trumpeting.

Even if we manage to draw a hard and fast line between configuration data (like the layout of a panel, or the cache settings for a site, or the path of the front page), we have to deal with the problem of references to content inside the configuration, like the ID of a node that the panel controls the layout for, the taxonomy id that controls which stories appear on that front page, and so on. This comes to bite us in dozens and dozens of places on any major site, and I think that if we attempt to solve any of these problems in isolation it will continue to bite us.

Listening in on the conversation between yourself, EclipseGC, David Strauss, Crell, sdboyer, and a few others really helped pull the pieces together for me:

- The 'Butler' initiative Larry Garfield is spearheading has the potential to help reduce Drupal's overhead for things like serving JSON and web services requests dramatically. It needs a good tool for managing config settings, though, that doesn't bootstrap all of Drupal.
- David Strauss' proposed config management API is very promising for storing hard-coded settings -- essentially replacing the variables table, and maturing to replace many other kinds of code and config driven settings as well. However, it runs the danger of being corrupted by the kinds of content references discussed above, and dirtying the vision of clean deployments that make it so attractive.
- And that's where the UUIDs and machine names for entities and shareable objects come into play, ensuring that stored/transmitted data doesn't get corrupted with server-specific collision-prone serial IDs. Solving the content deployment problem on top of that core functionality, however, would really depend on an efficient way to transmit stuff from server to server -- like web services, but without the overhead of full Drupal bootstraps, which brings us back to Butler...
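
To show the kind of corruption I mean, and how UUIDs sidestep it, here is a minimal Python sketch. The config keys, the lookup tables, and the UUID value are all made up for illustration. The idea is only that serial IDs get swapped for UUIDs on the way out of one server and swapped back for whatever the local serial ID is on the way in.

```python
def export_config(config, reference_keys, nid_to_uuid):
    """Return a copy of config with local node IDs replaced by UUIDs."""
    exported = dict(config)
    for key in reference_keys:
        if key in exported:
            exported[key] = nid_to_uuid[exported[key]]
    return exported


def import_config(exported, reference_keys, uuid_to_nid):
    """Return a copy of exported config with UUIDs resolved to local node IDs."""
    imported = dict(exported)
    for key in reference_keys:
        if key in imported:
            imported[key] = uuid_to_nid[imported[key]]
    return imported


# The same terms-of-service page has a different serial ID on each server.
TOS_UUID = "0b9f1c52-7d3e-4a55-9c1e-2f6a8d4e7b10"  # made-up example value
staging_nid_to_uuid = {12: TOS_UUID}
production_uuid_to_nid = {TOS_UUID: 57}

panel_on_staging = {"layout": "two_column", "context_node": 12}
portable = export_config(panel_on_staging, ["context_node"], staging_nid_to_uuid)
panel_on_production = import_config(
    portable, ["context_node"], production_uuid_to_nid
)
```

Pushing the raw nid 12 to production would have pointed the panel at whatever node happens to be number 12 there, which is exactly the collision problem.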

One of the exciting realizations for me is that these three complementary initiatives don't have to depend on each other. They can dovetail together as they mature, but none is explicitly a dependency of the others, which leads to much much easier development paths for smaller teams. That's a big concern for me, as a lot of core initiatives we've taken on in the past were essentially "blocking," requiring everything to track with them until they were complete.

A final point is the issue of entities and the idea of everything in Drupal becoming an entity. The idea that was kicked around in Chicago, of a shared 'Persistable' or 'Thingie' class that is a parent of the 'Entity' class, strikes me as promising. Separating relatively simple 'properties' from the heavy overhead of FieldAPI would be a useful way to abstract things like token generation as well, for stuff that absolutely doesn't need the heft of fields, widgets, formatters, revisioning, and fully normalized storage.

Entity API -> Data API

Posted by fago on March 17, 2011 at 7:00pm

The idea that was kicked around in Chicago, of a shared 'Persistable' or 'Thingie' class that is a parent of the 'Entity' class, strikes me as promising. Separating relatively simple 'properties' from the heavy overhead of FieldAPI would be a useful way to abstract things like token generation as well, for stuff that absolutely doesn't need the heft of fields, widgets, formatters, revisioning, and fully normalized storage.

I don't think we need a layer on top of entities for that, as the entity API can fulfill that if it doesn't already. I know most people think of entities as the stuff they can put fields on, but that's not the case. Already in D7 fields are completely optional - so what stays is the persistable thingie. I'm arguing that the entity API should become exactly that, our data API. And yes, for that it's important that we keep it slim and implement all the other stuff on top of it.

I agree that there seems

Posted by owen barton on March 31, 2011 at 3:50pm

I agree that there seems little point in adding a higher-level layer to entity API - it does not seem heavy in itself, basically a persistence wrapper for "objects + properties" that is extensible by entity type, with the (optional) ability to attach fields.

Great post

Posted by itangalo on March 15, 2011 at 11:42pm

Good stuff to read when you're still jetlagged from the Chicago trip. Good post, good comments.

I was just curious if anybody

Posted by rich.yumul on March 16, 2011 at 5:06am

I was just curious if anybody could point to any systems that have successfully addressed this problem space?

Maybe the answer's 'no', I don't know.

I don't have any to offer up, but I think it would be worthwhile to take a look at whether such solutions exist.

Anybody ever work on a system that elegantly managed pushing changes from dev => staging => production?

Rich Yumul
Sage Tree Solutions
www.sagetree.net

elegant? I guess you didn't see Acquia Dev Cloud or Pantheon?

Posted by redhatmatt on March 16, 2011 at 5:57am

Richie Rich... see me on skype buddy. It doesn't get much more elegant, unless you're talking testing or something!!!

Not for content problems

Posted by gdd on March 16, 2011 at 8:06am

Those products are amazing for developers, but they don't begin to address the problems of content staging or merging (for instance, you build a new section of a site out and want to deploy it, but you can't just push forward database dumps because you have a ton of new user created content on the live site.) We really need to solve that problem.

More thoughts

Posted by gdd on March 16, 2011 at 8:07am

From Owen Barton whose post I didn't see until today

http://civicactions.com/blog/2011/mar/09/thoughts_on_configuration_manag...

I really like the idea

Posted by dixon_ on March 16, 2011 at 8:20am

I really like the idea of trying to make the Entity API more lightweight, i.e. allowing entities to have only properties. While we might end up with "everything being an entity", I'd argue that the implementation of entities can differ enough for it to make sense. This would allow us to start by implementing a unified export API for entities which would bring us a very long way towards the goal. Let's look at these two examples:

- Entity X implements a few properties but relies heavily on Field API with widgets, formatters, revisioning etc. It's possible to render it through a callback (e.g. page with HTML). The primary storage is SQL. The implementation of $entity->unique_identifier() returns a UUID since dealing with a human entered machine name is too bothersome for content-like entities. Read page node or similar.
- Entity Y only implements lightweight properties. It's not possible to "render" the entity through a callback. Primary storage is JSON on disk. The implementation of $entity->unique_identifier() returns a human entered machine name since it will be used in HTML class attributes etc. Read some-sort-of-configuration object.

While the two examples above are very different, they are both entities. So I'd argue that making a generic export/import API for entities would bring us a very long way without treating content and configuration differently. This doesn't require the user to bother too much about content vs configuration, but it still allows us to keep our data structures separated and clean.

Win, win? :)

yep, that's exactly what I've

Posted by fago on March 16, 2011 at 9:28am

yep, that's exactly what I've in mind!

So far, git pops up in my

Posted by rich.yumul on March 16, 2011 at 4:01pm

So far, git pops up in my mind as a tool that handles branching and merging really well.

It sounds like with a unified import/export API, content/configuration items that need to get pushed between environments (dev/stage/prod) would somehow get serialized to files.

Maybe we could leverage git in this process, to manage merges and identify conflicts, which would then have to be manually reconciled.

Thoughts?

Rich Yumul
Sage Tree Solutions
www.sagetree.net

Yep

Posted by Crell on March 16, 2011 at 4:08pm

The ability to use $VCS_of_your_choice for deploying configuration is one of the key arguments for moving the primary living space of configuration to files on disk. Once they're there in a VCS-friendly way, the VCS can handle a LOT of the hard stuff for you because they've already figured this stuff out.
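
As a sketch of what "VCS-friendly" might mean in practice (my assumption, using Python's standard json and difflib modules rather than anything Drupal-specific): serialize with sorted keys and one value per line, so that equal configuration always produces identical text and a single changed setting shows up as a single changed line.

```python
import difflib
import json


def canonical_json(data):
    """Serialize so that equal data always yields byte-identical text."""
    return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def config_diff(old, new, name="config.json"):
    """Return a unified diff (list of lines) between two config objects."""
    return list(
        difflib.unified_diff(
            canonical_json(old).splitlines(),
            canonical_json(new).splitlines(),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
            lineterm="",
        )
    )
```

With output like this, merges and conflict detection can be left to the VCS, since it only ever sees small line-level changes.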

My only concern...

Posted by eaton on March 16, 2011 at 5:43pm

I'm skeptical that the VCS tools handle "a lot of the hard stuff" using a system like the one David Strauss is proposing. The hard stuff isn't figuring out how to put things on disk, it's how to draw the line between the disk-persistable configuration and the malleable data that it nudges up against.

The 'hard stuff' that VCS can solve for us is already being solved for us, except in cases where malleable serial-id-polluted data leaks into the configuration.

I consider the VCS to be the

Posted by gdd on March 19, 2011 at 6:01pm

I consider the VCS to be the 'transport layer' part of the problem and honestly if we get to the point where we can actually use any transport layer meaningfully then I'd say our work here is done already.

Perhaps Orthogonal to Configuration Management, but...

Posted by BMDan on March 17, 2011 at 1:10am

Were you at my BoF about injecting databases (or portions thereof) into an RCS? The Coolest Database->Git Hack Ever (I was being slightly hyperbolic with the title, I admit.) The solution I'm proposing is far less elegant than anything being discussed for D8, but it has the advantage of working on all sites, right now, regardless of what version of Drupal they're running, and of not requiring hand-tweaking of what parameters need to be saved on a per-module basis. Plus, it integrates readily into existing SCCM mindsets and methodologies.

I may be off topic, but there

Posted by noisetteprod on March 17, 2011 at 1:54am

I may be off topic, but there is a point that seems important to consider in the future system configuration.
I discussed this with FGM during the code sprint and it made me realize that my view was not necessarily very clear! I'll try to present it in a different way.
I will not return to the ongoing discussion about which technical solution to choose for managing settings.
I manage sites on different instances (dev > int > prod).
Like everyone else, I need some of Drupal's internal parameters, and possibly some content, to be propagated to the other instances once they have been defined on dev.
My sites communicate heavily with external systems in different ways. This requires a lot of configuration, and these settings can differ from one instance to another (for example, one web service address for dev and another for production, but also different accounts).
I therefore find it necessary to isolate all settings related to the environment (dev, int or prod), so that they are preserved during deployment, or so that their values can be set on each environment by a configuration propagation system such as "cfengine" or "chef".
I think it might be necessary to introduce a notion of "internal Drupal" settings and a notion of "external Drupal" settings and, whatever system is chosen, to allow a clear separation between them.
Introducing this concept directly in the API would let module developers who are not necessarily familiar with industrialized deployments apply a good practice that makes life easier on large systems.
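
To make the internal/external split concrete, here is a minimal sketch in Python (used only as a neutral sketching language). Everything in it is an assumption for illustration: the `Scope` tag, the `Setting` record, the `deploy` function and the example URLs are hypothetical and do not exist in Drupal.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    INTERNAL = "internal"   # site behaviour, meant to be identical on every instance
    EXTERNAL = "external"   # endpoints, accounts, keys: owned by one environment


@dataclass(frozen=True)
class Setting:
    name: str
    value: str
    scope: Scope


def deploy(source: list[Setting], target: list[Setting]) -> dict[str, str]:
    """Push internal settings from source, keep the target's external ones."""
    result = {s.name: s.value for s in target if s.scope is Scope.EXTERNAL}
    result.update({s.name: s.value for s in source if s.scope is Scope.INTERNAL})
    return result


# Hypothetical example data: one internal and one external setting per instance.
dev = [
    Setting("site_slogan", "New slogan", Scope.INTERNAL),
    Setting("billing_ws_url", "https://dev.example.test/ws", Scope.EXTERNAL),
]
prod = [
    Setting("site_slogan", "Old slogan", Scope.INTERNAL),
    Setting("billing_ws_url", "https://prod.example.test/ws", Scope.EXTERNAL),
]

deployed = deploy(dev, prod)
```

After the deployment, prod has the new slogan from dev but still talks to its own web service address. Because the scope is declared once by the module, a deployment tool would not need per-module knowledge of which values are safe to overwrite.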

Feel free to tell me if it is not clear, and my apologies for my English ...

I am starting to like the

Posted by gdd on March 19, 2011 at 6:00pm

I am starting to like the idea of the PersistableThingie base class more and more as I think about this. Fago makes a case for this actually being Entities, but I am skeptical as are many others apparently. Regardless of whether fields are optional, the Entity API as it stands now is pretty heavy, and as someone recently said to me 'Drupal has a poor history of slimming things down.' It seems like a better option would be to make a new PersistableThingie base class and extend Entity off of it. On the other side, we extend a class that primarily writes things to disk for the more configuration-y stuff. If we do this, we would want to make PersistableThingies and Entities and everything else true first class objects that implement interfaces (as pwolanin suggested in his core conversation at DrupalCon.) We could have a common function for requiring unique identifiers (UUID, remote, machine name, etc.)

One of the cooler things about this approach is that we can now have a common interface for referencing between Drupal PersistableThingies. Not only can we have a common way for entities to reference each other (like what Relation module is doing) but the configuration-y things can reference the content-y things and vice versa, with a common API. This can make it much easier to start doing dependency management and putting together some really cool tools for deployment of whatever PersistableThingies you might want.
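
As a rough illustration of what dependency management over such references could look like, the sketch below orders a set of things for deployment so that anything referenced is deployed before whatever references it. The identifiers and the `deployment_order` helper are made up for the example; only `graphlib.TopologicalSorter` is a real Python standard library class.

```python
from graphlib import TopologicalSorter

# Hypothetical reference map: each key is a unique ID, each value is the set
# of IDs that thing references (configuration and content mixed together).
references = {
    "view:front_page": {"nodequeue:featured"},
    "nodequeue:featured": {"node:a1", "node:b2"},
    "node:a1": set(),
    "node:b2": set(),
}


def deployment_order(refs):
    """Return IDs so that everything a thing references comes before it."""
    return list(TopologicalSorter(refs).static_order())


order = deployment_order(references)
```

Here the two nodes come first, then the nodequeue that contains them, then the view that uses the nodequeue. If two things reference each other, `static_order()` raises `graphlib.CycleError`, which is the point where a real tool would have to decide how to break the cycle.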

I am really into how this creates a common interface, and yet allows us to implement many of the topics discussed throughout DrupalCon and above as well.

Please! Keep talking!

Interfaces, interfaces, interfaces

Posted by Crell on March 19, 2011 at 9:11pm

I am not yet convinced that we can have an uber persistable-thingie concept to rule them all, nor that it is even a good idea if we could do so. That creates a dependency across the entirety of Drupal, and I have been pushing very hard for Drupal to be less interdependent, not more.

That said, were we to go that route I encourage everyone to expunge the phrase "base class" from your vocabulary. It would instead be far better implemented as a PersistableInterface, which classes could implement if appropriate. That could be at the Entity level (all entities), the Entity Type level (Nodes or Users), at the Plugin configuration object level, or wherever. Subclasses in that case (and in most cases) should be viewed as a convenience hack, not as a declaration of functionality. It should be possible to do anything without ever extending a class, ever (just perhaps with more code duplication than you'd like).

Then we'd have some sort of Persister class (or rather, classes that implement a PersisterInterface) that can persist anything that is Persistable.
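
A minimal sketch of that split, in Python rather than PHP and with invented names throughout (`persist_id`, `to_record`, `ImageStyle`, `JsonMemoryPersister` are assumptions, not a proposed API). The point it tries to show is that the persistable class extends nothing; it only satisfies an interface.

```python
import json
from typing import Protocol, runtime_checkable


@runtime_checkable
class Persistable(Protocol):
    """Anything that can name itself and describe itself as plain data."""
    def persist_id(self) -> str: ...
    def to_record(self) -> dict: ...


class Persister(Protocol):
    """Anything that can store a Persistable somewhere."""
    def save(self, thing: Persistable) -> None: ...


class ImageStyle:
    """A configuration-like object. Note that it has no base class."""
    def __init__(self, name, width):
        self.name = name
        self.width = width

    def persist_id(self) -> str:
        return f"image_style:{self.name}"

    def to_record(self) -> dict:
        return {"name": self.name, "width": self.width}


class JsonMemoryPersister:
    """Keeps JSON strings in a dict, standing in for files or a database."""
    def __init__(self):
        self.saved = {}

    def save(self, thing: Persistable) -> None:
        self.saved[thing.persist_id()] = json.dumps(thing.to_record(), sort_keys=True)


persister = JsonMemoryPersister()
persister.save(ImageStyle("thumbnail", 100))
```

Swapping `JsonMemoryPersister` for a file-backed or database-backed persister would not require touching `ImageStyle`, which is the kind of decoupling the interface-first approach is after.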

I'm still not convinced that we can have a universal persister system, but building around interfaces will make it much more likely that we can pull it off and/or allow the flexibility for someone else to pull it off in contrib.

Agreed

Posted by fago on March 22, 2011 at 12:49am

We definitely need to build this upon interfaces. Base classes are a useful tool and providing a base class makes for sure sense, but I agree with you that we should not require using it - but require the interface.

This sounds promising to me

Posted by catch on March 28, 2011 at 2:14pm

This sounds promising to me as well, and it's the dependencies issue that concerns me most about 'everything as an entity'.

I don't think the dependency

Posted by fago on April 2, 2011 at 1:42pm

I don't think the dependency issue is a problem. We can solve that (and workaround it like we do now).

Right now, the only weird thing about everything being an entity is that you have to include information about bundles in the entity info. With everything built as an entity, that is just duplicated information, so a pointer or callback to the data type serving as the bundle is enough - and the dependency problem is gone.

I'm still not convinced that

Posted by merlinofchaos on March 28, 2011 at 7:00pm

I'm still not convinced that we can have a universal persister system, but building around interfaces will make it much more likely that we can pull it off and/or allow the flexibility for someone else to pull it off in contrib.

I'm not sure this argument matches reality. Here's what we already have:

Entities are the 'heavy' persistable object with lots and lots of UI goo associated with them. They exist, they work, and we know there are issues with them.

Right now, export.inc sorta resembles what a PersistableInterface would be -- it's not an exact analogy because it has database dependencies that PersistableInterface would skip. What it does have, though, is a consistent set of CRUD functions for handling objects that are not necessarily content. Right now it's used for Views, Page Manager pages, CTools' reusable layouts/content types/access rules, and in contrib LOTS of modules are using it for things which are more about configuration than not, and usually need to be configured through the UI.

I think the base idea is the same, even if in core the implementation would likely be very different. I also don't think there'd be a requirement that every persistable thing ever would use it. But if we have it, it amounts to a 'lightweight entity' where we provide as little as is necessary to do basic CRUD operations on it. Going with David's idea of non-database storage, one of the basic implementations stores directly to files so it can work in a low level environment. Basic storage of objects is well understood, and if we do it right, it handles most use cases we can think of. Where it breaks down, usually, are really complex objects that have lots and lots of data spread out...and those are more likely to be 1) Views, and 2) Entities. Views, obviously, can be handled by the system I'm thinking of, and Entities can pretty much handle themselves using field api.

In any case, I think it's a wonderful place to start:

1) Draw up the basic CRUD interface we need. IMO it's: Create, Read, Update, Delete, Clone, Export. I also have enable/disable as a basic but optional CRUD feature, since it's very commonly needed: when you're providing these objects via code, turning them off without deleting them is valuable.

2) Figure out what the basic drivers look like: Non-database, database-only, database+non-database.

3) ...

4) Profit.

If we design the interface right, we build supporting this interface into the entity spec.
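
To give step 1 and the non-database driver from step 2 some shape, here is a small sketch. It is not the CTools export.inc API; the class name, method names and the one-JSON-file-per-object layout are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


class FileStore:
    """Non-database driver: one JSON file per object, named by machine name."""

    def __init__(self, directory):
        self.directory = Path(directory)

    def _path(self, name):
        return self.directory / f"{name}.json"

    def _write(self, name, data):
        self._path(name).write_text(json.dumps(data, sort_keys=True))

    def create(self, name, data):
        if self._path(name).exists():
            raise KeyError(f"{name} already exists")
        # New objects are enabled unless the data says otherwise.
        self._write(name, {"disabled": False, **data})

    def read(self, name):
        return json.loads(self._path(name).read_text())

    def update(self, name, changes):
        self._write(name, {**self.read(name), **changes})

    def delete(self, name):
        self._path(name).unlink()

    def clone(self, name, new_name):
        self.create(new_name, self.read(name))

    def export(self, name):
        return json.dumps(self.read(name), indent=2, sort_keys=True)

    def set_enabled(self, name, enabled):
        # Optional feature: switch an object off without deleting it.
        self.update(name, {"disabled": not enabled})


with tempfile.TemporaryDirectory() as tmp:
    store = FileStore(tmp)
    store.create("frontpage", {"title": "Front page"})
    store.set_enabled("frontpage", False)
    store.clone("frontpage", "frontpage_copy")
    store.update("frontpage_copy", {"title": "Copy"})
    store.delete("frontpage")
    remaining = sorted(p.stem for p in Path(tmp).glob("*.json"))
    copy = store.read("frontpage_copy")
```

A database-only or database+non-database driver would expose the same seven operations and differ only in where `_write` and `read` put the data.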

interface EntityInterface

Posted by Frando on March 30, 2011 at 12:55am

interface EntityInterface extends PersistableThingieInterface

Yes. I like this a lot.
We can discuss what belongs where, and where we "split" functionality between the two interfaces and default base classes, e.g.
on PersistableThingie: "get some kind of unique id", basic CRUD, export, and optionally caching and render (or build)
on Entity: revisions, archive, purge and support for bundles and fields

Some things we discussed in the entities in D8 bof would then possibly move towards PersistableThingies, but only the really lightweight parts, e.g. maybe a basic accessor to uniformly access properties (or maybe not because it's not needed for PersistableThingies, I don't know yet).

Then, if other code doesn't care whether it deals with a PersistableThingie or an Entity, it can just check for instanceof PersistableThingie for CRUD and be done.
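
The same idea sketched in Python, where `isinstance` plays the role of PHP's `instanceof`. Which methods sit on which interface is exactly what is still up for discussion, so the split below (and the `SiteSetting` and `Article` classes) is only an assumption for illustration.

```python
from abc import ABC, abstractmethod


class PersistableThingieInterface(ABC):
    """The lightweight part: identity and export."""
    @abstractmethod
    def unique_id(self) -> str: ...

    @abstractmethod
    def export(self) -> dict: ...


class EntityInterface(PersistableThingieInterface):
    """The heavier part, layered on top: bundles and revisions."""
    @abstractmethod
    def bundle(self) -> str: ...

    @abstractmethod
    def revision_id(self) -> int: ...


class SiteSetting(PersistableThingieInterface):
    def __init__(self, name, value):
        self.name, self.value = name, value

    def unique_id(self):
        return f"setting:{self.name}"

    def export(self):
        return {"name": self.name, "value": self.value}


class Article(EntityInterface):
    def __init__(self, uuid, title, revision):
        self.uuid, self.title, self.revision = uuid, title, revision

    def unique_id(self):
        return f"node:{self.uuid}"

    def export(self):
        return {"title": self.title, "revision": self.revision}

    def bundle(self):
        return "article"

    def revision_id(self):
        return self.revision


def export_all(things):
    """Generic code that only needs the lightweight interface."""
    return {t.unique_id(): t.export()
            for t in things if isinstance(t, PersistableThingieInterface)}


exported = export_all([SiteSetting("slogan", "Hi"), Article("a1", "Hello", 3), "not persistable"])
```

`export_all` handles the setting and the article the same way and silently skips the plain string, without ever asking whether something is an Entity.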

yep, this

Posted by fago on April 2, 2011 at 1:38pm

yep, this "PersistableThingieInterface" really makes sense and is exactly what I'm advocating for.

As all the other stuff on top of that (like fields, forms, UI, ...) is optional for entities, I think the CRUD stuff is what entities really are right now. But well, in the end these are just names. So having a PersistableThingie API and an Entity API building upon it would definitely be a way to go.

Regardless of whether fields

Posted by fago on March 22, 2011 at 1:00am

Regardless of whether fields are optional, the Entity API as it stands now is pretty heavy, and as someone recently said to me 'Drupal has a poor history of slimming things down.'

I've heard that multiple times, but I still wonder: In what regard is it heavy? What makes the current API heavy? I really fail to see that as pretty much everything except loading and hooks is optional.

It seems like a better option would be to make a new PersistableThingie base class and extend Entity off of it.

I think that is basically what I'm proposing, just with different names, as I'm basically calling the PersistableThingie "Entity". If we go and introduce two separate names, I guess we need a good definition of what the differences between the two concepts are, so developers know which one they should pick.

One of the cooler things about this approach is that we can now have a common interface for referencing between Drupal PersistableThingies. Not only can we have a common way for entities to reference each other (like what Relation module is doing) but the configuration-y things can reference the content-y things and vice versa, with a common API. This can make it much easier to start doing dependency management and putting together some really cool tools for deployment of whatever PersistableThingies you might want.

Exactly!

Perhaps it would be

Posted by owen barton on March 31, 2011 at 4:19pm

Perhaps it would be worthwhile writing up an uber-simple entity example (a couple of properties, no UI etc) for people to examine? There is one being worked on for example project at http://drupal.org/node/893842, but that is a bit more feature complete.

I think most people think of

Posted by fago on April 2, 2011 at 1:46pm

I think most people think of entities as things with fields, forms and UI, as they don't see that all of this is optional and is not what makes an entity an entity. So I agree, such an implementation could really help to show what we are talking about.

Simple config issues

Posted by yktdan on March 28, 2011 at 6:13pm

There are a few really simple config items that really cause trouble if they are in the database. Things that are properties of the machine it is on, not even what version of Drupal it is running. Things like the Google Analytics API key or the Google Maps API key, or the Mollom key. To create a clone (dev or staging) I want to just copy the database and not have to fiddle with it. But the clone won't work if it has the wrong API key in some cases, or pollutes the analytics, etc. Handling these gives some concreteness to the abstract ideas above. The db name, user and password, currently in settings.php, are another simple example.

local settings

Posted by fidot on March 30, 2011 at 4:13pm

For projects that I have led, we have always addressed these types of settings in a local-settings.php (different contents for each environment) which is included from the normal settings.php.

HTH
Terry

One of the main ideas here is

Posted by owen barton on March 31, 2011 at 4:25pm

One of the main ideas here is that these items can optionally be stored on disk (and cached in the database). David proposed a concept of "layering" overrides of the on-disk configuration, so you could perhaps have environment specific configuration override site specific configuration, which would in turn override install profile configuration, which would in turn override the defaults.
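
The layering can be sketched with nothing more than a lookup chain. The layer names follow the description above; the keys and values are invented for the example, and this is not a claim about how David's proposal would actually be implemented.

```python
from collections import ChainMap

# Hypothetical layers, from least to most specific.
defaults = {"cache": "off", "site_name": "Drupal", "analytics_key": ""}
install_profile = {"site_name": "Newsroom"}
site = {"cache": "on"}
environment = {"analytics_key": "DEV-KEY", "cache": "off"}

# ChainMap searches left to right, so the most specific layer goes first.
config = ChainMap(environment, site, install_profile, defaults)
effective = dict(config)
```

`effective` ends up with the site name from the install profile, the analytics key from the environment, and caching off, because the environment layer overrides the site layer. Each layer only has to list what it changes, which keeps the per-environment files small and easy to diff.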

Some random thoughts

Posted by drupal4media on March 28, 2011 at 11:05pm

Definitely an interesting topic being discussed here.

I think it would be useful to work on a sample application as a proof of all these complex concepts and as an example of best practices for people entering the Drupal world. I'm thinking about putting together some fictional "user stories", developing the application and moving it through the different environments (dev, sit, uat, prod, etc) with the right tools (phing, drush make, other/s?). It should involve content, configuration and the most well known things in the middle (nodequeues, panels, etc).

BTW, any change to the Drupal update system (hook_update_N) would be covered within this effort? I'm thinking of some concepts from database refactoring tools like Liquibase that could be really useful (contexts, change modification detection, etc)

My notes and ideas from the sessions

Posted by owen barton on March 31, 2011 at 3:34pm

This is from a blog post I wrote during a jet lag related early waking at Drupalcon (before these notes were posted). The post captures a number of details around the sessions, and how Greg and David's ideas mesh together. I like Fago's entity-ification idea a lot, if we went this route you could replace "data" with "entity" in most of the below.

This is a sketch of how some of the ideas discussed in the core conversations configuration management sessions yesterday might look in practice. Obviously this is just one idea and there are plenty of other valid approaches (overall or for any specific piece) - we don't want to get too specific yet. For example, some of what I describe could probably happily live in contrib. I think my big picture point is that the approaches discussed are not mutually exclusive, and in fact we probably need bits of each of them.

As Greg Dunlap proposed, let's say we added a system of adding reliably unique IDs to "everything" (within reason, but I think content should be included here). These unique IDs should be treated as the canonical IDs. This could look like the machine name field in many cases - sometimes this would be hidden from the user, sometimes there would be other methods of determining this ID (manually or programmatically).

We still create/store the numeric IDs as the data is in the database, but store them alongside the canonical ID. The numeric ID is used for primary keys and most joins in the database (for performance reasons), but the canonical ID is king in the area of data existing in any context outside of the database. There is a layer that carefully maps the numeric IDs and references on CRUD operations.
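
A minimal sketch of such a mapping layer, assuming string canonical IDs and an in-memory table. `IdMap`, `import_record` and the record fields are hypothetical names for illustration only.

```python
import itertools


class IdMap:
    """Maps canonical IDs to the numeric IDs local to one site's database."""

    def __init__(self):
        self._next = itertools.count(1)   # stands in for auto-increment
        self._numeric = {}                # canonical -> numeric
        self._canonical = {}              # numeric -> canonical

    def numeric_for(self, canonical):
        """Return the local numeric ID, allocating one on first sight."""
        if canonical not in self._numeric:
            number = next(self._next)
            self._numeric[canonical] = number
            self._canonical[number] = canonical
        return self._numeric[canonical]

    def canonical_for(self, numeric):
        return self._canonical[numeric]


def import_record(record, id_map):
    """Turn an exported record (canonical IDs) into a local row (numeric IDs)."""
    return {
        "nid": id_map.numeric_for(record["id"]),
        "title": record["title"],
        "author_uid": id_map.numeric_for(record["author"]),
    }


record = {"id": "node:about-us", "title": "About", "author": "user:alice"}

site_a, site_b = IdMap(), IdMap()
site_b.numeric_for("user:bob")        # site B already has other data

row_a = import_record(record, site_a)
row_b = import_record(record, site_b)
```

The same record gets different numeric IDs on the two sites, yet the author reference still points at `user:alice` on both, because only the canonical ID ever leaves the database.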

What does this mean so far:

It allows us to track both the data, and the references to it, across multiple sites or site instances - we no longer run into the auto-increment ID problem.
Data numeric IDs no longer need to appear in the user interface, we can use these identifiers in the default system paths, making the UI more consistent, but also ensuring URLs are consistent and robust across sites (otherwise links to /node/123 URLs would break anytime content is migrated). Of course we would still need some kind of pathauto system that adds more structured aliases reflecting the information architecture, but this would be a layer on top of the system paths.
Data numeric IDs no longer need to appear in our web facing APIs - a good RESTful interface should not concern the client with details of the internal implementation. Right now we are pushing the tracking/mapping of IDs across sites out to the API clients, which makes a good number of feeds/deploy type systems highly complex and fragile any time references are pushed.
With canonical IDs that match the system paths you could do a JSON GET on the path on one site, then a PUT of the same data to the same system path on a different site, and the system would automatically know if this was a create or update operation based on the ID. This means that the deploy or feeds module use cases could feasibly be handled by a system completely outside of Drupal, which I think is a great benefit in some cases - for example systems (which may be non-Drupal based) responding to external events can make CRUD operations on a site but can also do easy push/pulls of data from one site to another.
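
The GET-then-PUT flow from the last point can be sketched without any HTTP at all. The `Site` class below is an in-memory stand-in, the paths are invented, and the 201/200 return values simply borrow the usual HTTP status codes for "created" and "OK".

```python
class Site:
    """In-memory stand-in for a site that stores data by system path."""

    def __init__(self):
        self.data = {}

    def get(self, path):
        return dict(self.data[path])

    def put(self, path, payload):
        """Create or update, depending on whether the canonical path is known."""
        created = path not in self.data
        self.data[path] = dict(payload)
        return 201 if created else 200


staging, live = Site(), Site()

# First push: the path is unknown on live, so this is a create.
staging.put("/node/about-us", {"title": "About us"})
first = live.put("/node/about-us", staging.get("/node/about-us"))

# Second push of the same path: live recognises it, so this is an update.
staging.put("/node/about-us", {"title": "About us (updated)"})
second = live.put("/node/about-us", staging.get("/node/about-us"))
```

The client never has to track which numeric ID the content received on live; the canonical path alone decides between create and update.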

So in addition to this, let's say we have a hierarchy of well structured, diffable JSON (pluggable, etc) files. These can represent a canonical, declarative and optional representation of data in the system. Hopefully it is obvious that the canonical unique IDs could greatly simplify the implementation here, as well as making diffs and tools that work with this representation of the data much simpler.

There are a few things (several of which David Strauss hinted at) that I think are important to consider about how this would most likely work:

The "cache" of this data that is pulled into the database on refresh is only a cache in the sense that it is not the canonical source of this data. It is not implied that these caches need to be in the form of blobs of serialized php in the cache table - in many cases it will be necessary to represent these in more structured tables that can be queried in useful ways.
Also, these caches should never expire individually - it is critical that when the caches are refreshed that this is an atomic operation. Hence the entire set of files needs to be validated to ensure it can be handled by the current code-base, and all the changes need to be made at the same time. If caches are updated in a more ad-hoc way it becomes much harder to manage the ID mapping (even with canonical IDs) in a way that ensures the end result will function - this is especially the case with complex configuration-oriented elements. Refreshing the caches would likely be a user initiated operation (possibly via a deployment system) in most cases, because the refresh will need to be timed with a source code update that references the changes.
While it is a big step, I think it should be possible (at least architecturally) to represent any data in this form, even stuff that is clearly content, such as nodes and users. This does raise some issues of course, such as what happens if content is stored in the files that references content that is not (my guess is that we recursively store referenced content too) but this is an issue that will occur even with storing configuration - e.g. views or even the front-page variable can reference tid and nids etc and even switching to canonical IDs doesn't ensure those elements exist on that site.
Another potential concern is that it would be horrible for large sites to have to store all content in this way, so it does need to be optional - I imagine a system of excludes (and probably most entities would be excluded by default) as well as perhaps an "opt in" for specific items.
This does imply some UI changes which will need careful thought - while I think a features style update/revert pattern will make sense for some sites, for other sites you may want to lock items (read only files, perhaps), and of course you have the requirements for bundles, multiple levels of overrides (code < distro < site etc). Even if much of the UI for this may be in contrib, I am pretty sure there will need to be some changes in core to allow users to intuitively control and understand this process.
I think it may be worth considering adding some level of self-descriptive structure to the CM storage code (JSON, YAML etc). This would make it much easier to build tools that work with these files directly - such as making them easier to edit directly, or providing more contextual diffs.
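
The "validate everything, then switch in one step" requirement from the second point, together with a self-describing `type` field in each file, might look roughly like this. The `ConfigCache` class, the file names and the `id`/`type` fields are assumptions for the sketch, not a proposed format.

```python
import json


class ConfigCache:
    """Holds the active configuration that was loaded from files."""

    def __init__(self):
        self.active = {}

    def refresh(self, files, known_types):
        """Validate every file, then replace the whole cache in one step."""
        staged = {}
        for name, text in files.items():
            record = json.loads(text)   # bad JSON raises a ValueError subclass
            if "id" not in record:
                raise ValueError(f"{name}: missing id")
            if record.get("type") not in known_types:
                raise ValueError(f"{name}: unsupported type {record.get('type')!r}")
            staged[record["id"]] = record
        self.active = staged            # the only mutation, after all checks pass


cache = ConfigCache()
cache.refresh({"view.json": '{"id": "view:front", "type": "view"}'}, {"view"})

try:
    # The second file has a type the current code base cannot handle.
    cache.refresh(
        {
            "view.json": '{"id": "view:front2", "type": "view"}',
            "panel.json": '{"id": "panel:home", "type": "panel"}',
        },
        {"view"},
    )
except ValueError:
    pass  # the failed refresh left the previous cache untouched
```

Because nothing is written until every file has passed validation, a refresh that fails halfway cannot leave the site with a mix of old and new configuration.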

I think we made some real progress around this issue yesterday, more than I have felt in previous Drupalcons - and I am excited to see where this takes us in Drupal 8. Let's keep moving the conversation forward!

I am going to chime in, only

Posted by Bojhan on April 2, 2011 at 12:50pm

I am going to chime in, only for a little bit :). It seems this initiative is going to help our developer audience a great deal, and I am glad to hear all the high level thoughts. It's hard for me to grok what touches the UX, but from my understanding there is one fundamental UX part that it could touch:

Configuration revisions: The ability to "undo" changes that were made, similar to the Gmail undo function. From a pure UX ideal, you want to allow users to easily "fix" mistakes they made and make it not a big deal for them to make mistakes. This puts a lot of faith in the system - that it's there to help them, even if they do something wrong.

It's unlikely the UX team will fully follow this initiative due to its technical nature; however, we would greatly appreciate timely pings when fundamental decisions are made that affect the UX.

In some contexts, database config is preferable

Posted by dustin@pi on April 11, 2011 at 9:41pm

I am currently working for a small NFP, but in my previous "Enterprise" job db config was preferred over file-based config. In that environment we had 4 UI servers, 4 web services servers and an active/passive pair of batch processing servers; this environment was mirrored in our failover data center.

Reasons why db config was preferred:

a db update was automatically visible to all 10 servers in the environment
db config was instantly mirrored to our DR data center (sub-second delay), but file changes would wait for nightly rsync processes
an update to a production file required a two day lead time with the outsourced hosting company that managed our servers, while a db update required a 2 hour lead time with our internal DBAs

We still had to deal with moving settings between dev->test-> pre-production -> authoring and -> production environments, but that was little compared to the hassle of trying to update every file, etc.

I'm not saying that moving config to files isn't the right goal, but having a pluggable storage for configuration has value in some environments.
