Getting off the island - Research into other systems


I have been conducting some research into how other systems are organized and manage their configuration and content deployment. There are plenty of systems that don't manage this well or whose problems are similar to ours, but I've found a couple places that have some interesting approaches that I am going to start adding here. Please note that my experience with these systems is limited to several hours of research each, and that is just barely enough to be scratching the surface, especially since many of these systems are so different architecturally. If anyone with more experience finds inaccuracies or has more to add then I encourage you to do so. Additionally, if anyone has other interesting systems to check out, please let me know or add them yourself!

The one commonality seems to be that these systems have a very hard line drawn between what is content and what is configuration. This obviously reduces their challenges. Many of the systems I have looked at also implement UUIDs for content references and other uses. Neither Plone nor Alfresco keeps its content in an RDBMS (Plone uses the ZODB object database, and Alfresco stores content on the file system, keeping only metadata in a database) so many of the concerns we have about performance don't apply (or are at least different).

For now this is just somewhat of a note dump. I will try to clean it up later, but I wanted to start pushing out what I was learning as I learned it.

Plone

Architecture Diagram
http://plone.org/countries/conosur/articulos/Plone-Infrastructure.EN.svg...

Plone 4 User Manual and a reasonable overview
http://plone.org/documentation/manual/plone-4-user-manual
http://plone.org/documentation/manual/plone-4-user-manual/introduction/c...

Collections are Views. Portlets are blocks.

Usage of UUIDs for content reference
http://plone.org/documentation/manual/plone-community-developer-document...

Developer Manual
http://plone.org/documentation/manual/developer-manual

Data Models
http://plone.org/documentation/manual/plone-community-developer-document...
- Three schemas aka objects - persistent data, form data, config data. All extend the interface class.

Alfresco

Some high level feature overviews
http://ecmarchitect.com/archives/2009/08/31/1038

Community Docs
http://www.alfresco.com/help/34/community/all/

Architecture Overview
http://wiki.alfresco.com/wiki/Alfresco_Repository_Architecture
By default, Alfresco has chosen to store meta-data in a database and content in a file system. Using a database immediately brings in the benefits of databases that have been developed over many years such as transaction support, scaling & administration capabilities. Content is stored in the file system to allow for very large content, random access, streaming and options for different storage devices.

A lot of good info here about how their DM deployment stuff works under the hood
http://wiki.alfresco.com/wiki/Transfer_Service

Specifics about resolution of remote content
http://wiki.alfresco.com/wiki/Transfer_Service#Where_transferred_nodes_a...

UI side of the deployment
http://www.alfresco.com/help/34/community/all/tasks/gs-wcm-publish.html

Content replication screencast
http://blogs.alfresco.com/wp/webcasts/?p=1261

  • Can push content from one source to multiple 'transfer targets'
  • Remote content is read only
  • Can specify batches of data to go to each target, can be different per target
  • All defined via UI
  • Pretty similar to Deploy, can select folders of content to send, scheduling
  • Replication jobs run in the background
  • Replicates deletions too
  • Entire transfer is a single transaction and if any part fails the whole thing rolls back
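
The all-or-nothing behaviour in the last bullet can be sketched in a few lines of plain PHP. This is only an illustration of the semantics, not Alfresco's Transfer Service API: the apply_transfer() function, the 'upsert'/'delete' action names and the UUID-keyed array are all assumptions made up for this example.

```php
<?php
/**
 * Hypothetical all-or-nothing transfer (not Alfresco's actual API).
 * $target maps node UUID => content; $batch is a list of operations.
 * Returns the new target state, or throws and leaves $target untouched.
 */
function apply_transfer(array $target, array $batch): array
{
    // PHP arrays are copied by value, so work on a staged copy.
    $staged = $target;
    foreach ($batch as $op) {
        if (!isset($op['uuid'], $op['action'])) {
            throw new InvalidArgumentException('Malformed operation in transfer batch');
        }
        switch ($op['action']) {
            case 'upsert':
                $staged[$op['uuid']] = $op['content'] ?? '';
                break;
            case 'delete':
                // Deletions are replicated too.
                unset($staged[$op['uuid']]);
                break;
            default:
                throw new InvalidArgumentException("Unknown action '{$op['action']}'");
        }
    }
    // "Commit": the caller swaps in the staged state only if nothing failed.
    return $staged;
}
```

A caller would wrap the call in try/catch and keep the old state when an exception is thrown, which is the rollback.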

Take-aways
- Very strong line between content and config
- Implements three key foundation services on which everything is built (Node (config), Content, Search), all share the same transaction, security and configuration characteristics.
- They are using UUIDs plus some other identifying information, but internally they also appear to be using int IDs for primary and foreign keys

Zend Config

Zend Config is a set of classes provided by the Zend Framework for managing configuration information. It is pretty interesting and provides a lot of concepts we can probably steal and use. These classes are available in both ZF1 and ZF2, and the 2.0 version remains pretty much unchanged to my eyes from the 1.0 version, except for the addition of Writer classes (more discussion below.)

At its most basic level, Zend Config provides an OO interface to associative arrays. The Zend_Config object takes an array in its constructor and stores it internally. The class implements Countable and Iterator. It then provides a set of methods to interact with the data. For instance, you can use the magic __get() and __set() methods for property-style access, but it also implements an explicit get() method that allows more options (for instance, get() lets you specify a default, just like variable_get() does.) If you pass in a nested array, the constructor will recursively turn it into a tree of Zend_Config objects, with one object at every branch. These are chainable so you can easily do things like

$foo = $config->views->advanced->disable_cache;
$config->views->advanced->disable_cache = TRUE;

By default the config objects are read-only, but this can be changed when the object is constructed, and it can be changed as granularly as you want (you can make Views' advanced settings be read-only while leaving the rest of them read-write.)
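
To make the tree-of-objects idea concrete, here is a minimal sketch in plain PHP. ConfigTree is a hypothetical stand-in written for this page, not Zend's actual code; it only imitates the behaviours described above (recursive wrapping of nested arrays, get() with a default, chained access, read-only unless told otherwise).

```php
<?php
// Hypothetical illustration class; not part of Zend Framework.
final class ConfigTree implements Countable, IteratorAggregate
{
    private array $data = [];
    private bool $writable;

    public function __construct(array $values, bool $writable = false)
    {
        $this->writable = $writable;
        foreach ($values as $key => $value) {
            // Every nested array becomes its own ConfigTree branch.
            $this->data[$key] = is_array($value) ? new self($value, $writable) : $value;
        }
    }

    // Explicit getter with a default, in the spirit of variable_get().
    public function get(string $name, $default = null)
    {
        return array_key_exists($name, $this->data) ? $this->data[$name] : $default;
    }

    public function __get(string $name)
    {
        return $this->get($name);
    }

    public function __set(string $name, $value): void
    {
        if (!$this->writable) {
            throw new LogicException("Config is read-only; cannot set '$name'");
        }
        $this->data[$name] = is_array($value) ? new self($value, true) : $value;
    }

    public function __isset(string $name): bool
    {
        return isset($this->data[$name]);
    }

    public function count(): int
    {
        return count($this->data);
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->data);
    }
}

// Usage: writable tree, chained reads and writes as in the example above.
$config = new ConfigTree(['views' => ['advanced' => ['disable_cache' => false]]], true);
$config->views->advanced->disable_cache = true;
$fallback = $config->get('no_such_key', 'some default');
```

In this sketch the writable flag is simply passed down to every branch; a finer-grained scheme would need a per-branch flag.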

In addition to this basic operation, there are classes that extend the Zend_Config base class in order to hook it up to files of different types. For instance there is Zend_Config_Ini, Zend_Config_Json, etc. These are constructed by passing a filename instead of an array, but otherwise they behave the same way, simply adding the parsing logic necessary for the specific formats. Initially there were no classes to write this info back out to disk, but there is now a Zend_Config_Writer class with extended classes of the same types as the readers.

One really interesting aspect of this class is that you can provide granular 'overrides' to specific values. For instance, take this sample INI file (stolen from the Zend wiki at http://framework.zend.com/wiki/display/ZFUSER/Zend_Config+Example):

; Staging environment
[staging]
host = staging.example.com
db.type = pdo_mysql
db.host = localhost
db.username = someuser
db.password = somepass
db.name = somedb

; Production environment inherits values from staging environment and
; overrides values that are specific to the production environment
[production : staging]
host = www.example.com
db.host = db.example.com
db.username = anotheruser
db.password = anotherpass

You can now do the following:

<?php
$config = new Zend_Config_Ini('/path/to/config.ini', 'staging');
echo $config->host; // prints "staging.example.com"
echo $config->db->host; // prints "localhost"
echo $config->db->name; // prints "somedb"

$config = new Zend_Config_Ini('/path/to/config.ini', 'production');
echo $config->host; // prints "www.example.com"
echo $config->db->host; // prints "db.example.com"
echo $config->db->name; // prints "somedb"
?>

These overrides can also nest multiple levels, extending off each other as far as you like.
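
For anyone curious how this kind of section inheritance could work without Zend installed, here is a rough sketch built on PHP's own parse_ini_string(). The resolve_section() helper is hypothetical and written for this page; it is an assumption about one way to do it, not how Zend_Config_Ini is actually implemented. It follows the "child : parent" chain, lets child values override parent values, and expands dotted keys into nested arrays.

```php
<?php
// Hypothetical helper; not part of Zend Framework.
function resolve_section(string $ini, string $wanted): array
{
    // Split "child : parent" headers into a name and an optional parent.
    $sections = [];
    foreach (parse_ini_string($ini, true) as $header => $values) {
        $parts = array_map('trim', explode(':', $header));
        $sections[$parts[0]] = ['parent' => $parts[1] ?? null, 'values' => $values];
    }

    // Walk up the inheritance chain, oldest ancestor first.
    $chain = [];
    $seen = [];
    for ($name = $wanted; $name !== null; $name = $sections[$name]['parent']) {
        if (!isset($sections[$name]) || isset($seen[$name])) {
            throw new InvalidArgumentException("Cannot resolve section '$name'");
        }
        $seen[$name] = true;
        array_unshift($chain, $sections[$name]['values']);
    }
    $flat = array_merge(...$chain); // later (child) values win

    // Expand "db.host" style keys into nested arrays.
    $tree = [];
    foreach ($flat as $key => $value) {
        $ref = &$tree;
        foreach (explode('.', (string) $key) as $segment) {
            if (!isset($ref[$segment]) || !is_array($ref[$segment])) {
                $ref[$segment] = [];
            }
            $ref = &$ref[$segment];
        }
        $ref = $value;
        unset($ref);
    }
    return $tree;
}

$ini = <<<'INI'
[staging]
host = staging.example.com
db.host = localhost
db.name = somedb

[production : staging]
host = www.example.com
db.host = db.example.com
INI;

$production = resolve_section($ini, 'production');
$staging = resolve_section($ini, 'staging');
```

Here $production['db']['name'] is inherited from staging while $production['db']['host'] is overridden, matching the behaviour shown in the example above.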

A final cool feature is that Zend Config provides merge functionality. You can take two Zend_Config objects and merge them. Any items that exist only in the config being merged in will be added, and same-named items will overwrite the existing values. This can provide some really useful functionality in terms of overriding settings between installations while still inheriting most of them.
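
The merge semantics described here can be approximated on plain arrays with PHP's built-in array_replace_recursive(). This is only an analogy for illustration, not Zend's merge() itself, and the setting names below are made up.

```php
<?php
// Settings shared by every installation (hypothetical names).
$shared = [
    'site' => ['name' => 'Example', 'cache' => true],
    'db'   => ['host' => 'localhost'],
];

// Overrides for one particular installation.
$local = [
    'site'  => ['cache' => false], // same-named item: overwrites
    'debug' => true,               // non-existent item: added
];

// Later arrays win; nested arrays are merged key by key.
$merged = array_replace_recursive($shared, $local);
```

After the call, $merged still has the shared site name and database host, with caching switched off and the new debug flag added.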

Links:
Zend Config Documentation
Zend Config Writer Documentation

Some Examples and Tutorials:
http://devzone.zend.com/article/1264
http://zendgeek.blogspot.com/2009/07/zend-framework-zendconfig-examples....

Source
http://framework.zend.com/svn/framework/standard/trunk/library/Zend/Conf...
http://framework.zend.com/svn/framework/standard/trunk/library/Zend/Config/

Comments

A bit of Zend

pounard's picture

You should look at Zend_Config (Zend Framework component): http://framework.zend.com/manual/en/zend.config.html

It's a generalization of a configuration storage/reading API. It has many advantages, such as the fact that whatever the backend is, configuration has a single interface, and any specific implementation using it can manipulate it, store it, and load it from/to anywhere.

The good thing about this is that I had extensive chats about it with a colleague of mine who is a Zend expert: it fits well with PHP's limitations as long as you cache your config objects wisely when needed. By extension, any "exportable" object would only need to be an object able to provide its own data mapping object (a mapping layer based on the config API) that could easily be stored, serialized or whatever (generally speaking, manipulated using one API to rule them all) using any of the existing writer/reader components.

EDIT: I'm not posting it in the wiki page; my description is far from complete, so I'll let you look deeper into this. I'm available for a good chat about this some day if you want; I did a lot of talking with my colleagues who work with different technologies.

Pierre.

Thanks

gdd's picture

I hadn't seen Zend_Config before but it looks pretty interesting and I will definitely look into it more.

I didn't do any research

pounard's picture

I didn't do any research here, but I'd be pretty fond of a system like gconf/dconf on UNIX/GNOME. They are basically a hierarchical configuration registry, based on schemas provided by each piece of software itself (in Drupal, the pieces of software are modules), where each value is a scalar value (or a list of scalar values).

Zend Config manipulates data that is organized this way.

The main benefits of a mapping based only on scalar values are the import/export facilities and the ease of building an advanced UI on top of it: a Windows/dconf/gconf registry-like page (with some bits of AJAX) where all the site configuration would live.

Using a hierarchical model for configuration/schema/profile/structural data storage would provide highly comprehensive browsing through variables and easy advanced site configuration, skipping all the hideous configuration screens (most module developers are really bad at designing UIs). It is also very easy to merge/override at runtime/export/import/etc.

Even exportable objects (which are not always site configuration) could benefit from such API, each object would have a mapping of hierarchical scalar values, easily exportable (and this would fit with butler configuration objects too).

For a D6 project, I did this: it's some kind of registry like site explorer. Not tied to any schema but with a full AJAX UI (quite useful).
I also did this for fun, it's a Zend Config port adapted to Drupal 7 with some bits of code for import/export/exportables managing based on those hierarchical schemas.

Sorry, this is not really research, but that's more like a synthesis of what I could have seen here and there.

Pierre.

This is a good discussion to

catch's picture

This is a good discussion to be having. I've been looking into apc_define_constants, hidef and chdb, not in relation to configuration management, more their original goal of replacing define(), just started working on this yesterday over at http://drupal.org/node/334024

It looks like people are using those extensions for more general configuration storage now, so it'd be good to build something which is compatible with using those as a backend (or explicitly rule them out). Limiting values to scalars would definitely make this more viable.

Limiting values to scalars

gdd's picture

Limiting values to scalars seems like it's going to be a tough sell, especially for more complicated configuration options (think exported Views and the like.) It is possible they may have to work in their own system or something though, and keeping systems like apc_define_constants in mind is definitely worthwhile.

Zend Config

gdd's picture

I just added my Zend Config findings above, but suffice it to say that I think it is a really interesting architecture, and I think there are a lot of pieces of it we could find super useful given our use cases.

Of course it is :) I really

pounard's picture

Of course it is :) I really would like an open discussion with you; I have a lot of great ideas about making a fully featured mapping layer using this kind of helper.
I already did write some thoughts here.

Pierre.

OpenCms

matthewv789's picture

The Java CMS OpenCms (http://www.opencms.org/en/) takes a completely different approach to most others. I don't know that it's a good model (certainly not for Drupal, which is too different to really reconcile with it), but it's an interesting perspective.

First, its database is separated into two sets of tables: offline and online. Each of these can be viewed as its own site from a different subdomain, but assuming all content has been published, they will otherwise appear to be identical. Offline includes versioning of all content (past versions as well as any latest version that's not published yet), including built-in diff. Online contains the currently-published state. Every resource is unpublished when first created and saved or when a revision is saved, and can be "published" through the web interface (which can be scheduled), which copies the latest revision of that piece of content to the online db tables (it doesn't save more than one unpublished version). Built-in roles can restrict "publish" access to certain users, etc.

They (Alkacon) offer a paid replication package which allows connecting two installations, so (as far as I know) content published from one OpenCms's "online" database will be replicated into another's "offline" database (that is, publishing content in one OpenCms will automatically push it to the unpublished state in another), though we've never actually implemented this. Another use is for load-balanced servers.

Also, EVERYTHING is managed through the web interface and stored in the database (including all JSP template files, css/js files, images and other media files, etc.), though there is of course effective caching. This means they're all editable from anywhere through the web interface, and they're all version controlled so you can easily diff or revert. So while this doesn't separate out a dev environment from the place where content is entered, or help store changes in external VCS text files for easy tracking and export/import, it does let you upload and preview updated images without publishing, as well as play with template/css/js changes etc. without worrying about affecting the live site, then publish the changes when ready just like any other content, or revert to an older version if needed. (It also allows you to programmatically generate CSS and JS files using JSP, if you like...)

The admin interface is an entirely separate back-end, which shows everything in the site in a standard hierarchical folder structure that you can expand/collapse or navigate through. The folder/file names are the literal URL paths that will be used to access that content. Adding a new page to a section is a matter of navigating to that folder and creating a new page file (and you can select from any available template while doing so, or change templates for that page later); to create a new section, just create a new folder. (The pages can also have an arbitrary number of regions defined which can be included in the templates, so that CCK-like capability was also built in.) So that is very intuitive to anyone accustomed to standard directory structures.

In some ways having all this built in (though with not much else to recommend it as a CMS) kind of spoiled me when I came to Drupal. I just assumed those were basic features a CMS should have (more or less), but which Drupal almost entirely lacked (though so do WordPress, Joomla, and most others...).

Deployment & Build Systems & Change Management
