Using fields (and maybe entities) for configuration in Drupal 8

Gábor Hojtsy

Judging by the low number of comments (zero AKA nil) on my post on how custom user editable data in Drupal 7 modules are now localized and my commentary and Drupal 8 proposal there, I've decided to reformulate the basics of the ideas explained there with more of a focus on high level architecture questions. Since I have not seen any planning yet on the UI of configuration for Drupal 8 (while the underlying configuration storage and handling discussions happen actively), but the UI portion is very important for the multilingual initiative, I'm cross-posting there too.

There are 14 submodules in the Drupal 7 version of Internationalization. Most of these modules do one or both of two things: (1) implement field-like features for certain Drupal core objects, (2) implement entity-relation-like features for Drupal core objects. There is also a (3), which is providing APIs for other contrib modules to do (1) or (2) for their objects. But the code required there is so big that full modules like i18n_views exist to connect the views API with the i18n APIs. Why is it so big?

Fields for all of configuration

Drupal configuration is now done in all kinds of custom ways. Your blocks, your menu items, your contact module categories, your site name are all stored in different places, edited in different forms, and so on. For those to be translatable though, i18n module needs to re-implement common pieces of field API. We need a way to mark "translatable fields" on each object (site settings, contact forms, menu items, etc). Then we need to be able to display the translatable fields in a form (using their "widgets"). Then we need to validate these fields with validation that makes sense for them (taking into account text formats and any custom validation that might apply). Then we need to store them somewhere. And finally we need to tie in to their "formatters" and use the right language value when display happens. Right, these are all fields concepts, and i18n module needs to implement all these for "object properties" such as site name, views empty text, menu item title or block body to be translatable. What's worse is that not only do these use non-fields for form/validation/storage/display, they all use totally different systems for form/validation/storage/display. So we need pretty sizable code to tie the custom form/validation/storage/display to i18n's reimplementation of parts of the field API. (See for explanation of that API in Drupal 7 and many of its weaknesses).
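To make the widget / validation / storage / formatter chain concrete, here is a minimal sketch of one configuration value described in a field-like way. It is written in Python purely for illustration; `ConfigProperty` and all of its members are hypothetical names, not Drupal or i18n APIs, and the fallback to a default language is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ConfigProperty:
    """Hypothetical description of one configuration value, e.g. the site name."""
    name: str
    translatable: bool
    widget: str                          # e.g. "textfield" or "email"
    validate: Callable[[str], bool]      # returns True when the value is acceptable
    formatter: Callable[[str], str]      # turns the stored value into display output
    values: Dict[str, str] = field(default_factory=dict)  # language code -> value

    def set_value(self, langcode: str, value: str, default_langcode: str = "en") -> None:
        # Non-translatable properties only accept the default language.
        if langcode != default_langcode and not self.translatable:
            raise ValueError(f"{self.name} is not translatable")
        # The same validation runs for the original and for every translation.
        if not self.validate(value):
            raise ValueError(f"invalid value for {self.name}: {value!r}")
        self.values[langcode] = value

    def render(self, langcode: str, default_langcode: str = "en") -> str:
        # Fall back to the default language when no translation exists (assumption).
        raw = self.values.get(langcode, self.values.get(default_langcode, ""))
        return self.formatter(raw)


site_name = ConfigProperty(
    name="site_name",
    translatable=True,
    widget="textfield",
    validate=lambda v: 0 < len(v) <= 128,
    formatter=lambda v: v.strip(),
)
site_name.set_value("en", "My site")
site_name.set_value("hu", "Az én webhelyem")
```

With this shape, a translation form only needs the `widget` and `validate` entries of the property, and display code only needs `render()` with the language it is rendering for.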

All 4 aspects of these field re-implementations (widgets, validation, storage and display) have limitations as a result. i18n can only support text-based field input with no custom widgets; its validation is very limited and sometimes not good for the field type at all; its storage is central (it currently uses the locale backend, which is not at all ideal); and most Drupal modules don't let others hook into the display of these object properties, so i18n has a hard time overriding all display of these "fields".

Now imagine if all of these configuration pieces used fields. We could leverage the core built-in field translation features. Because fields have self-contained widget, display and formatter information, and formatters are actually used when the field is rendered, we could reproduce an input environment for translation equal to that of the original object, validate the value properly and use proper rendering to display it.

Woah, you might say, using fields for configuration is way overkill. Well, i18n module already does it. It needs to work around lots of Drupal limitations but it does need (almost) all the goodness that fields have to offer for these object properties, so if core does not do it, i18n needs to do it anyway. Of course now i18n uses a small subset of fields functionality and uses an entirely different API to do it, but the concepts stand. In fact to solve its bugs, it would need to re-implement much more of fields for these object properties. My gut feel is that now about 60% of i18n module is about this, and it will be a lot more if we need to work around core. Also, a contributed module like i18n can only do this in broken and awkward ways by definition.

Entities for all of configuration?

The other major thing that i18n module does is it supports translation relations of things. Like Drupal core does with translation module for nodes, but i18n module introduces a more generic "translation set" concept, that can relate things to each other as being translations of each other. This works for menu items, taxonomy terms and paths. You can mark 'node/5' a translation of 'taxonomy/term/sunscreens', so when you hit the language switcher on either, it goes to the other one for the appropriate language.
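The translation set idea can be sketched roughly as follows. This is Python for illustration only; `TranslationSet` and `switch_language` are hypothetical names, and the assignment of "en" and "hu" to the two paths is an assumption made for the example.

```python
from typing import Dict, Iterable, Optional


class TranslationSet:
    """Hypothetical group of paths that are translations of each other."""

    def __init__(self) -> None:
        self._by_language: Dict[str, str] = {}   # language code -> path

    def add(self, langcode: str, path: str) -> None:
        self._by_language[langcode] = path

    def contains(self, path: str) -> bool:
        return path in self._by_language.values()

    def path_for(self, langcode: str) -> Optional[str]:
        return self._by_language.get(langcode)


def switch_language(sets: Iterable[TranslationSet], current_path: str,
                    target_langcode: str) -> str:
    """Return the path a language switcher link should point to."""
    for translation_set in sets:
        if translation_set.contains(current_path):
            # Stay on the current path if the set has no member for that language.
            return translation_set.path_for(target_langcode) or current_path
    # Paths that belong to no set are left untouched.
    return current_path


sunscreens = TranslationSet()
sunscreens.add("en", "node/5")
sunscreens.add("hu", "taxonomy/term/sunscreens")
```

Note that the set relates arbitrary paths, so its members do not have to be the same kind of thing, which is the point of the node/term example.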

This feature should obviously be extended to blocks, views, etc. so you can relate different items to each other as being in a translation set (for the cases when you need multiple copies). Now i18n module implements a framework that can relate arbitrary things, so there is no requirement for these to be entities. For configuration in general, @sun explains that we'd not need entities for configuration if we abstract off of Field Attach API, but he proposes that we can skip implementing something like entities for configuration that are equally fieldable if we just use entities.



my 2 cents

fago

We need a common "data API", which defines our data model + provides a common interface to that - then we can build everything on top of it. It should handle storage, but also should be able to validate values, have associated access permissions and provide a way to annotate it with available metadata (data type, description, options list, ..). Well, it is no secret, that I'd like to see the entity API to be our "data model API"...

Then I think we should split field API into three independent parts:
- data model (storage + surroundings like translation, validation and access)
- display
- widgets (edit fields)

The base for everything should be the defined data model, based upon which one can make use of fitting displays and widgets - regardless of where the data stems from. That way, the same components could be used from code as well as from the field UI. That way you could define an "entity property" (= a data property defined to be handled by entity storage) and use the same APIs as for a field... So "entity properties" and "fields" would become equivalent, except that the first is handled by the entity storage and the latter by a field storage system.

I think, to finally solve translations, we need to bake it right into the data API. I.e. mark data properties to be "translatable", but maybe also add a language parameter for entity_load(), cache entities language specific (usually you don't display multiple languages right?), ... Then the storage system (entity or field) would have to care about storing the translatable values properly... That could be a custom db table that is filled from an entity controller, a remote system, a flat file or it could be a storage controller that generically handles storage based on entity property information. It would probably make sense to partly unify the latter with field API storage controllers, such that one can use those generic controllers for both. But well, it's quite a long way to this point given we have not even basic CRUD implemented...
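As a rough sketch of what "translatable" baked into the data model plus a language parameter on load could look like (Python for illustration; `PROPERTY_INFO`, `InMemoryStorage` and the contact category example are all hypothetical, and the fallback language is an assumption):

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical property metadata: which properties of an entity type are translatable.
PROPERTY_INFO = {
    "contact_category": {
        "category": {"type": "text", "translatable": True},
        "recipients": {"type": "text", "translatable": False},
    }
}


class InMemoryStorage:
    """Stand-in storage controller; a real one could be a table, a file or a remote system."""

    def __init__(self) -> None:
        # (entity type, id, property, language code or None) -> value
        self._data: Dict[Tuple[str, str, str, Optional[str]], Any] = {}

    def save(self, entity_type: str, entity_id: str, prop: str, value: Any,
             langcode: Optional[str] = None) -> None:
        info = PROPERTY_INFO[entity_type][prop]
        # Only translatable properties keep a per-language value.
        key_lang = langcode if info["translatable"] else None
        self._data[(entity_type, entity_id, prop, key_lang)] = value

    def load(self, entity_type: str, entity_id: str, langcode: str,
             fallback: str = "en") -> Dict[str, Any]:
        entity = {}
        for prop, info in PROPERTY_INFO[entity_type].items():
            if info["translatable"]:
                value = self._data.get((entity_type, entity_id, prop, langcode))
                if value is None:
                    # No translation stored: use the fallback language.
                    value = self._data.get((entity_type, entity_id, prop, fallback))
            else:
                value = self._data.get((entity_type, entity_id, prop, None))
            entity[prop] = value
        return entity


storage = InMemoryStorage()
storage.save("contact_category", "sales", "category", "Sales", "en")
storage.save("contact_category", "sales", "category", "Ventes", "fr")
storage.save("contact_category", "sales", "recipients", "sales@example.com")
```

The caller only ever sees a plain entity for one language; how and where the translated values are kept stays a storage detail, which is what would allow different storage controllers behind the same interface.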

So using the entity API as storage for storing configuration objects would help us to make it follow the same "data model" interface. Thus translation could work the same way as for other entities (in object translation), while of course the storage back-end would have to figure out the details. On top of that, we could use the same widgets, display components, etc. for configuration objects as elsewhere. So I think there would be quite a big benefit in making configuration objects entities. Still, there would be the variable system for which the same storage layer probably? doesn't make sense, so this would need to be covered separately.

Update: hm, I see the point in making even variables use the same system. Thinking about it more, it sounds like an interesting approach, but still the concept of a single variables storage doesn't really match the entity concept - as there will be only a single one forever, so the very basic concept of having an identifier for something you load doesn't match.
But if field API is split up in multiple parts, we could still go and
* implement a separate variable system that is able to store some variables translated
* annotate variables with metadata structured the same way as entity properties, i.e. being based on the same basic data types
* re-use the existing APIs built around for those data properties, i.e. widgets + display components.

That's basically the approach I'm following already with the entity API property info system, which is not limited to entities at all. E.g. the wsclient module makes arbitrary data available, based on data-types and metadata as defined by the entity API property info system. That way, the APIs built on top of that work seamlessly with it, thus wsclient provided data can be directly used with Rules (and I'd like to see it integrated with Views too).


JeremyFrench

I was a little sceptical at first reading this. But I think it is a fundamentally great idea.

It works very well with a couple of D8 concepts.

Firstly that entities can have custom storage, so the back end storage for system entities would not have to behave just like a node.

Secondly that configuration management and deployment should be easier. If settings pages are entities then there is a standard way to manage content and settings as they are both just entities. Potentially rolling back settings could be easier too.

The downsides are performance and an increase in the difficulty of creating settings admin pages. But both of these could be managed with a decent API.


catch

If you think about the number of variables that core provides, you're looking at dozens if not hundreds of new fields and instances to support all of those variables.

It's not just the loading of the entities and field data themselves that needs to be worried about - in Drupal 7 all the metadata about every field and instance is loaded on every request.

So instead of around 200k-1mb for the variables cache as it is now, you could be looking at 3mb for the metadata about the variables + loading the variables when requested on top of that. This won't fit into the current architecture.

There would be some advantages to decoupling field types, formatters and widgets from the field attach system (and each other) so they can be used independently, this was discussed at the DCC entity BOF.

I can also see the case for certain things being converted to entities in their own right - for example contact forms, but that's a different discussion.


JeremyFrench

It would require an architecture change. But that is what major versions are about, right?

I think the OP is pointing out that for i18n they have to do this anyway so if it is baked right into core everyone will benefit.

Absolutely not

chx

There is approximately nothing (aside from the need for translation) that fields+entities and variables share. Let's see a few differences. The variables are, at the end of the day, using a key-value store. A by-value query is meaningless. You never display a variable as such, not to mention in several display modes. Entering variables is a not-too-important thing, but for fields, widgets are important. You need variables very, very early in the process, which calls for a lightweight system, and that can be done because there is a lot less to do with them, as shown above. I am sure heyrocker + gang have more to say on this but we are a bit busy here.

Storage may not change

JeremyFrench

I was assuming that storage wouldn't change, and for the most part variable display would be redundant. But in terms of getting some meta information about variables and getting a logical grouping of variables into some sort of wrapper (via a custom entity type) I could see some big advantages. The field API is a way we have of helping us define such things; if it could be reused to help with translation and configuration management, this could be helpful.

You are right that the basic variable_get shouldn't change (besides perhaps a language attribute), and the storage should still be very simple and/or fast. Entering variables and managing them is vital for good configuration management.
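A minimal sketch of what "the basic variable_get plus a language attribute" might mean, in Python for illustration. The `langcode` parameter and the separate translation table are assumptions for this example, not how Drupal's variable_get() works today.

```python
# Plain key-value store, as the variable system is today.
_variables = {"site_name": "My site", "site_mail": "admin@example.com"}

# Hypothetical side table holding only the values that were actually translated.
_translations = {("site_name", "fr"): "Mon site"}


def variable_get(name, default=None, langcode=None):
    """Return a variable, preferring a translation when a language is given."""
    if langcode is not None:
        translated = _translations.get((name, langcode))
        if translated is not None:
            return translated
    # Untranslated lookups never touch the translation table, so they stay cheap.
    return _variables.get(name, default)
```

Callers that do not care about language keep the two-argument form, so the early, lightweight use of variables is not affected.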

variables need widgets, validation, rendering

Gábor Hojtsy

Imagine site variables such as your site email address or your date formats. Now, if you need to make these translatable:

  • you should only let people translate them who have the permission (changing your site email address can have big repercussions on a site for example)
  • they have custom widgets (I assume HTML 5 email field for email address, and date formats already have a pretty custom and complex widget)
  • they have custom validation attached, you need to validate email addresses regardless of the HTML 5 field...
  • when they are "rendered", such as when a date is rendered with the format (I admit it's not the format being rendered, but it's being used in rendering operations and therefore should be used only in alterable rendering), you need to change the format appropriately; similarly the site email address is "rendered" in hook_mails() and such, which need to use the appropriate value again

So for variables, we'll need to reproduce custom widgets with appropriate access permissions, appropriate validation, and need to hook into the "rendering" of those values when used. That the email address or the date format "rendered" is just used for rendering a bigger component (an email or a node date for example) is just a detail in the concept. In fact, it's not at all a detail now in Drupal 7/6, because the fact that it's not done with rendering now makes it especially hard to put in the right data. By the time things are rendered, we often work with derivative data.

From the translatability perspective, variables need the same level of detail, metadata about their structure, widgets, validators, formatters, etc. for the above reasons. We can of course implement a parallel system to fields that does all these things for variables, because we cannot use fields for them. And put that parallel system into core. We'll have two very similar systems then.

In reality, I think we should realize that variables are abused for too many things. Most of the variables are regular site settings; they are not needed early and there is nothing critical in them that prevents moving them to regular configuration items. People were just lazy to put that into their own configuration space since variables were an easy fit... If we'd use variables for really only the very basic stuff, we'd have far fewer issues. Still funny pieces like your site maintenance mode page title and body are there and need to be translated. That needs widgets, permissions, formatters, you know :)

Update: site maintenance mode is also a very good example for permissions. Translators probably should not be allowed to change whether the site is in maintenance (per language?), so that you'd only allow translators to translate the body and title field. That needs individual widgets and validation to work; we cannot just reproduce the form as-is. Also, on some sites, I can imagine people want to bring down one specific language version of the site for maintenance (major content reorganization let's say). Now that we have a permission for people to work in maintenance mode while the site is offline, I'd not say this is a far off assumption. So some sites will want to allow translators to push the site to maintenance per language (Drupal 6 + i18n or Drupal 7 + i18n already supports this). See, there's your need for per-field configuration for translation. This is already all done in field translation. And once again, we can reproduce the same feature set with a parallel system that does not use fields but has all these features, and have both systems in core if that sounds like the ideal solution... I'm not sure.
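The per-site, per-property choice described here can be sketched like this (Python for illustration; the property names, the permission string and the function are hypothetical, not existing Drupal or i18n names):

```python
# Properties of the maintenance mode settings object, in form order.
MAINTENANCE_PROPERTIES = ["enabled", "title", "body"]


def translation_form_fields(translatable, user_permissions):
    """Return the properties a translator may edit on the translation form."""
    if "translate configuration" not in user_permissions:
        return []
    # Only the properties this site marked as translatable get a widget.
    return [prop for prop in MAINTENANCE_PROPERTIES if prop in translatable]


# Site A translates only the texts; site B also allows per-language maintenance mode.
site_a_translatable = {"title", "body"}
site_b_translatable = {"enabled", "title", "body"}
```

The same settings object thus produces two different translation forms, which is why the original form cannot simply be reproduced as-is for translators.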

David Strauss

To specifically address Gábor's motivations for using Field API for configuration:

  • The need for widgets and validation is not equivalent to a need for Field API widget/data bindings. Arguing that they're equivalent is like saying you should use Microsoft Access because you need a Windows GUI form with controls and validation. Drupal's Form API is where we should focus on work for widgets and validation because that is our system for building general forms with widgets and validation. If Form API feels inadequate compared to the widget and validation support in Field API, then work should be on making those capabilities available in Form API, not porting more things to Field API. Any i18n requirements for something like a date control are the responsibility of the controls/widgets and not the underlying data store.
  • We should avoid complex widgets for configuration, anyway, because configuration should be available in human-editable text files on disk, where you can't use complex controls to update and validate the data.
  • i18n translation capability is the exception, not the rule, when dealing with configuration. Connection information for the database/Solr, permissions, comment and user signup settings, pathauto rules, field settings, and logging levels are some of the most common configuration items; none of those need translation. Rather, I would say if many of your configuration items require translation, you should be suspicious of whether you're designing things the right way. There are occasional true configuration items requiring i18n, like the site name, but it's not clear why the i18n should happen in the configuration layer (rather than before display to users). Non-i18n configuration and a separate i18n layer for translating items for user display is the standard architecture; I would need to see a very good reason to deviate from that.
  • Formatters should not be generally necessary for configuration. In almost every case, configuration is revealed as a control panel or editing form to administrators and isn't displayed in non-editable form. Configuration isn't like nodes, users, or comments, where a majority of users have a need to view -- but not edit -- the information. When configuration does display information to users, it's often more complex than what a formatter could generically deliver, like information about filters attached to an input filter.

There are a number of problems with using Field API for configuration:

  • Configuration items are generally scalar values or short lists. These data types map poorly to the storage and configuration model of Field API. We would either need an absurd number of entity types or try to shoehorn different types of configuration in a rigid schema. If we did the latter, we would need to do strange things like use different widgets and validators for different entities in the set in order to capture the benefits of Field API's form that you want. We could, of course, build support for singleton entity types into Field API, but it would be quite a departure from the current design. Field API would become "forms with persistence." It's not wrong, but we would have a lot of work to do on Field API before we could even begin building configuration forms on top of it.
  • Field API should be managed as much as possible by our configuration system. If Field API is that configuration system, we enter the awkward territory of starting enough of Field API to configure the rest of Field API.
  • Field API is heavy (both in code and backend requirements), and we want configuration available early and efficiently. Field API just isn't designed for hundreds of loads on every page request. This performance issue is probably my biggest objection to using Field API for configuration. If we could make the Field API core very lightweight (for reading items), this might not be a problem. Making it "lightweight" would also mean not invoking any i18n for most of the items.
  • Content in the Field API is addressed based on auto-generated IDs. These are either auto-incrementing integers (which aren't portable for the configuration management goal of dev->test->live deployments) or could be switched to UUIDs (which wouldn't be a developer-friendly method for addressing configuration). Maybe with a singleton-capable Field API, content could be addressed using only the entity type and field name, but that's just an idea, not something available now in Field API.

what do we need?

Gábor Hojtsy

Ok, first, you've actually provided some great examples for things that need translation in your "none of those need translation" list. Pathauto rules are one prime example. Your /article/$nid aliases will not cut it for the French. You'll need different rules for them. Field settings are yet another great example. In fact, Drupal 7.2 got a huge outcry for removing the inappropriately used t() calls for field labels and descriptions. However, field labels, descriptions, default values, allowed values, field display prefixes and so on need translation (provided by i18n module).

For these things to be translatable, we need

  1. A description of your object structure. For variables, the variable.module does this now which i18n depends on. Basically you need to provide useful names for your variables, you need to provide defaults (because they need to be editable before you hit save on a config form, etc). For fields for example, i18n has a whole description array of what is configurable on fields, so it can provide title, description, etc. for translation.

  2. Then we need a UI for users to pick what they want to make available for translation. From my examples above, some sites want to allow per-language maintenance mode, some will cry out if they hear about the idea. This is a per-site config, and needs to be decided on fine object levels. Think menu items. Some sites will need different menus for different language content, some will need the whole menu translated as-is. So we need a UI for users to configure translatability based on the inventory of the object structures.

  3. Once we have that, we need to provide a translation UI. We need to reproduce the input for those but just those pieces of your object that are translatable, and let you enter stuff for them. We need the exact same widget (maybe it has format controls, maybe it's an HTML5 email widget, maybe it has AJAX autocomplete, you name it). Then we need to reproduce the same validation for the translation as for the original value. And we need to be able to produce this two ways: per object, so you can translate a menu item, a contact form, a site name in place, and in overall listings, so you can translate all your stuff to French or Spanish (and see a progress indicator). Or export all of those for outside companies to translate them. And import their stuff back in full with respective validation. Of course all need to tie the data ("fields") back to their original objects, and their validation, widgets and so on depend on the original object, and maybe its configuration (eg. a French menu will not host English node references).

  4. Ok, we have the data translated, now we need to store it somewhere. We need to be efficiently able to load it when the translation is needed. Unfortunately Drupal can generate content in multiple languages in one request, so it turns out pretty late (right in the final rendering) which language we need that piece in. Whether it be your site name or your contact form subject line. The contact module in Drupal 6/7 can send up to 3 emails in one request in 2 languages, ok? So we need those pieces of the data, some of which come from content and some of which come from configuration, to be rendered for use as late as possible and then take the language into account, so we can use the right language version. This is most naturally done if the object knows which of its properties are translatable and can look up the right value. If the object does not have that "field" metadata/information, well, then someone will need to tack that on from the outside anyway.

So we need (1) a description of the structure of your objects with every possible translatable thing (2) a UI for users to pick which one will be translatable and this can be different per object type, "bundle", or per object instance even (3) we need to reproduce the widgets for those fields and just for those fields for input + we need to manage permissions to that + validate it properly (4) we need to have it rendered as late as possible so it can be rendered in the right language.
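Point (4), rendering as late as possible, can be illustrated with a small sketch. It is Python for illustration only; `Translatable` and `build_mail` are hypothetical names, and the subject strings are made up. It mimics the contact form case of three mails in two languages within one request.

```python
class Translatable:
    """Hypothetical lazy value that picks its language only when rendered."""

    def __init__(self, values, default_langcode="en"):
        self.values = values                      # language code -> text
        self.default_langcode = default_langcode

    def render(self, langcode):
        # Fall back to the default language when a translation is missing.
        return self.values.get(langcode, self.values[self.default_langcode])


def build_mail(subject, recipient_langcode):
    """Resolve the subject at the last moment, per recipient language."""
    return {
        "langcode": recipient_langcode,
        "subject": subject.render(recipient_langcode),
    }


subject = Translatable({"en": "Website feedback", "hu": "Visszajelzés"})

# One request, three mails, two languages.
mails = [build_mail(subject, langcode) for langcode in ("en", "hu", "en")]
```

If the subject had been resolved to a plain string when the configuration was loaded, the second mail would have gone out in the wrong language.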

Now this is not at all form API. Form API just works with forms. We need to track the whole lifecycle of the object at hand, whether it be a contact form setting, a site name or a pathauto pattern. We need to track it from input through translation through display. What maps to this in Drupal now? Well, fields.

I fully understand that fields as of now are overkill, but we need this functionality anyway for translatability. If it's an afterthought like it is right now with Drupal 6+7, then we need to build up a whole parallel universe to do what field API does for tracking the data structure, translatability configuration, input, validation and rendering, and we'll not get it right however hard we try. If all Drupal config objects don't use a standard rendering format at least, we can't make it work well. I18n for Drupal 7 shows this very clearly; it builds its parallel universe. I've had this post with Drupal 7 code examples that got no feedback (because it was long and boring), but yeah, that is the amount of code to make those dead simple contact forms translatable. I hope that can be a good enough reason to deviate from your suggested "standard" practice to just bolt on the i18n support as an afterthought. If it's not done with a system that can do the above, we need to augment it from the outside, and that just does not work. That is definitely not what we aspire to, I think. Definitely not my initiative :)

So how can we make the above things work without using field API then? How can we move down the parts of field API that we need to a level where we can handle all these things and there is something else left in field API that we are not going to use? That is our key question then.


catch

I haven't looked at the D7 i18n module, but the D6 i18n module certainly did not load metadata about all possible variables used on a site into memory just so it could offer translated versions of some of them. Something approaching this may have been necessary to allow them to be translated in the first place, but translating happened in i18nstrings_t() or similar for runtime. That doesn't make it good, but it is less overkill than the current Field API would be if applied to that task.

I would suggest creating a Drupal site with a few entities and a couple of hundred fields and instances, then add a dpm() at the end of _field_info_collate_fields() and see what it looks like in there. Berdir has created partly due to this problem - and this is a problem before even talking about using fields for non-content stuff.

Also take a look at [#438570]. Converting the node body to a field was something like a 30-40% performance hit in PHP for Drupal 7 - one that is still there.

Apart from all this, there's also the issue that the field API is a module, (and the entity one will hopefully be moved to a module soon), so it is only available at a level much higher than the variables system currently is, and depends on a lot of configuration itself.

While I agree that some things the Field API does don't at all belong in FAPI, let's split it out:

  1. Pluggable storage - we can develop a generic pattern for this beyond field storage.

  2. Formatters - at least some of this could be moved to FAPI.

  3. Widgets - could add a new 'elements' module that concentrates these things, this should eventually remove a lot of the one-off hook_form_alter() we have in core as well.

  4. Attaching stuff to entities - it'd be good to be able to use formatters for things like publishing options without having to make those a field.

  5. Configuration of all these - this is one of the largest pain points in Field API already and needs serious work to make it viable for heavy use with the use cases it currently supports. There is the field info memory issue, but also many, many issues around disabled modules providing fields or entity types (see at least two issues in the critical queue).

Also I'd ask:

  • Do you need a display widget for an API key?

  • Do you need a display widget for pathauto settings?

  • Does there really need to be an option to store the user registration e-mail in one storage backend, and the user password reminder e-mail in another storage backend?

  • Does the user registration e-mail need to be attachable to multiple different bundles?

conceptual approach

Gábor Hojtsy

Ok, once again I'm not talking about the current Field API as-is. I've explained in detail above the pieces we'd need from field API for multilingual needs. Yes, an API key does not need a display widget, it might need an input widget (eg. Google analytics module already provides a form field prefix, a description and special validation to ensure your GA key is correct) - I don't believe GA keys should be made translatable, mind you, just pointing out the hidden structure that exists. The site name input widget will not be very special, but it is rendered differently for the website and email and also dependent on language, so it has multiple angles which define how its rendered version will look - even though it looks like a very simple thing, doesn't it? What i18n module does in Drupal 7 is that it singles out settings which need this information and it currently does not even define proper widgets or formatters for them, so it has a very unfriendly input UI for translators. It also provides one-off altering functions for some uses of the values and applies the language changes "manually". I'd like to improve this by pushing language-awareness much lower in the stack with the above explained pieces, which will either come from the field API abstracted out or from a parallel system that re-implements many pieces of fields API.

Configuration management needs to change.

JeremyFrench

I think that this discussion has a lot of relevance to the configuration management initiative.

My feeling is that for that we will have to look at how to manage settings on a higher level. It sounds like a lot of the push back here is envisaging the existing form api replacing the existing variable system. This will not happen. The variables system has to change to allow better configuration management. I don’t know quite how this will happen, but whatever is done should take account of i18n.

In thinking about how this could be implemented, a lot of field API concepts crop up, so it may present a good opportunity to refactor the form API and field API. There seems to be consensus that this needs to be done.

It has been mentioned above that a lot of modules misuse the variables system as a general storage area. Perhaps if we separate true system settings (like db_url), which are required for Drupal to bootstrap and are not likely to need translation, from things like site name, which isn’t needed until much later in the process, it may ease some of the tensions here.

Jose Reyero

The problem is entities are really heavyweight objects. Atm we've got entities and tables with metadata for both but there's nothing in the middle. We need real metadata for stuff that is not an entity but also not only a row in a table. See hook_i18n_object_info()....

So I would propose having some abstract concept of 'object' in the code. This would need to be some lightweight structure (so it can be used for everything) with real metadata that is needed only for editing/translating it, but not for all runtime operations.

Ideally all of these would have the same data type, that is: php objects instead of arrays. And some common known properties, like 'object_type' and 'object_id'. Optionally we could make them 'real PHP objects'. This way we can move these things around and modify them on the fly without too many type specific hooks.

Data that could fit this is almost everything: blocks, menus, menu items, comments, mails, paths, urls, links, long texts, strings?, permissions, roles, actions....

However, the key point is not having too many runtime hooks for them (no hook_load, alter, view etc..), but being able to alter them whenever they are on other data structures (hook_page_alter, edit forms, etc...) or being able to render them with drupal_render().

Not to mention possible 'object_load/update/delete/....' functions that would make most of our modules query-free, and a generic CRUD API and UI for 'objects' that would save half of the forms code in modules.

a word from a tired admin, not a troll.

denjell

DISCLOSURE: I am not a core developer, I am a native English coder and I make websites for people that speak more than one language in communities where this is important.

I don't really care what the solution is, but we need a core solution. I am devving a new project, and the list of modules I need to install to configure multilingual support is enough to make me cringe and write this to you. Here is my current sketch (not counting things like submodules, variable or views).

  • i18n
  • i18n_views
  • i18_page_views
  • localization_update
  • localization_content
  • translation_overview
  • translation_table
  • language_icons
  • entity_translation -> then run update.php

    • but this uses fields and breaks views
    • or make one view for each language!

I am even considering writing a drush .sh script because this is not the first time this year that I have had to do this. In fact, I have just revised an old joke - but just for you so you have something to think about->

What do you call the country where the people speak three languages: Polyglotia
What do you call the country where the people speak two languages: Bilingua
What do you call the country where the people speak one language: The United States of Drupal

Anyway, I think that Jose has the best approach - use an abstraction that metadata (whatever the type) hooks into. Using English as a default is simply lazy and not a reflection of the GLOBAL reach of the Drupal audience. Furthermore, assuming that people only speak one language is a sure sign of latent xenophobic homogeneity. Drupal can be more than that!!!

It's easy once you know what you're doing

samcis

Not to forget, we have three things here (1) CRUD (entity and storage) (2) Data entry (widget) (3) Display (field formatters).

Have you tried the Entity API module?