Internationalization of Drupal 8 configuration

gdd's picture

Last Wednesday I had a Skype conversation with Angie Byron (webchick), Francesco Placella (plach), Gábor Hojtsy and Jose Reyero to discuss how we are going to implement multilingual functionality in the Drupal 8 configuration system. For a little background, the configuration management system will use XML files to store configuration. These files will be loaded into an 'active store' (the database by default, but pluggable), which will act as the source for configuration at all times. Extensive documentation of this is forthcoming, but for the purposes of this summary this should be enough information.

My original thought had been that we would store all the languages in one file. For instance:

<site_information>
  <site_name lang="en">I am awesome!</site_name>
  <site_name lang="se">Jag är grymt!</site_name>
</site_information>

After some discussion it became apparent that this probably isn't going to work. Configuration usually has plenty of language-independent pieces, and the idea has been that if you mess up one of these files, you can restore it from the default configuration provided by the module. Since that default will typically be an English-only file (when shipped with a module or theme), restoring it would lose the translations, and you would have to re-merge everything all over again. Instead it seems to make sense for every file to be language-specific and to contain only the configuration information that has actually been translated. So for instance you could have:

site_information.en.xml (full set of configuration, shipped with Drupal)
site_information.se.xml (partially translated configuration, created at install time)

Now say that your language is Swedish (.se). When files are loaded into the active store, the Swedish file will be read, and for any missing information the English version will be used. My understanding is that this is similar to how t() and the entity/field translation system work today. The result will then be the canonical set of configuration the site runs with. For configuration created by users on the fly (not shipped with modules or Drupal), the original version of the configuration will be the canonical version, and it can be in any language. Fallback will always be to the original version of the configuration, as with entity/field translation.
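The fallback described above can be sketched roughly as follows. This is a minimal illustration in Python rather than Drupal code: the helper names `parse_config` and `load_active`, the flat one-level XML layout, and the `site_slogan` key are all assumptions made for the example. The `se` suffix follows the example above (the ISO 639-1 code for Swedish is actually `sv`).

```python
import xml.etree.ElementTree as ET


def parse_config(xml_text):
    """Turn a flat XML config document into a {key: value} dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}


def load_active(default_xml, translated_xml=None):
    """Start from the full default file, then overlay any translated keys."""
    active = parse_config(default_xml)
    if translated_xml is not None:
        active.update(parse_config(translated_xml))
    return active


# Hypothetical contents of site_information.en.xml (the complete set) ...
EN = """<site_information>
  <site_name>I am awesome!</site_name>
  <site_slogan>Powered by Drupal</site_slogan>
</site_information>"""

# ... and of site_information.se.xml (only the translated key).
SE = """<site_information>
  <site_name>Jag är grymt!</site_name>
</site_information>"""

active = load_active(EN, SE)
```

Here `active` ends up with the Swedish `site_name` and the untranslated English `site_slogan`, which is the merged set the site would run with.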

Another thing that will need to be implemented is the ability to retrieve strings in an arbitrary language. This will be implemented as an additional parameter in the configuration API when you get data from the config. By default it will use the site's current language (as extracted from the context system), but you will also have the option to pass in a specific language. In that case the system will read the configuration out of the appropriate language file or, if the value doesn't exist there, out of the active store.
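To make the lookup order concrete, here is a small sketch of what such a getter might look like. The real configuration API has not been written yet, so the `Config` class, its constructor arguments, and the `get` signature are purely hypothetical, and Python is used only for illustration.

```python
class Config:
    """Hypothetical config reader with an optional language argument."""

    def __init__(self, active_store, language_files, current_language):
        self.active_store = active_store      # merged config the site runs with
        self.language_files = language_files  # {langcode: {key: value}}
        self.current_language = current_language

    def get(self, key, language=None):
        # No language given: use the site's current one, i.e. the active store.
        if language is None or language == self.current_language:
            return self.active_store[key]
        # Specific language: try its file, then fall back to the active store.
        translated = self.language_files.get(language, {})
        return translated.get(key, self.active_store[key])


config = Config(
    active_store={"site_name": "Jag är grymt!"},
    language_files={"en": {"site_name": "I am awesome!"}},
    current_language="se",
)
```

With this setup, `config.get("site_name")` reads from the active store, `config.get("site_name", "en")` reads from the English file, and a language with no file at all falls back to the active store.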

This covers, if not everything we need, at least a good portion, and it is enough to start implementing this in the system.

Over the weekend I also had a discussion with David Strauss about internationalization, and he pointed to

http://java.sun.com/developer/technicalArticles/Intl/ResourceBundles/

as an example of how Java uses a naming convention to get ever-more specific internationalization options from files. At the same time, Michael Favia found

http://developer.android.com/guide/topics/resources/providing-resources....

which describes how Android uses the same system plus potentially additional context information. These could be useful references for building our system.
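The naming convention in both references boils down to trying ever less specific names until one matches. A simplified sketch in Python follows; the real Java lookup also falls back to the default locale, which is left out here, and `candidate_names` and `resolve` are hypothetical helpers, not part of either platform.

```python
def candidate_names(base, *qualifiers):
    """Return names from most specific to least specific."""
    names = []
    for count in range(len(qualifiers), -1, -1):
        names.append("_".join((base,) + qualifiers[:count]))
    return names


def resolve(base, qualifiers, available):
    """Pick the most specific candidate that actually exists, if any."""
    for name in candidate_names(base, *qualifiers):
        if name in available:
            return name
    return None


candidates = candidate_names("Messages", "sv", "SE")
chosen = resolve("Messages", ("sv", "SE"), {"Messages", "Messages_sv"})
```

For a Swedish (Sweden) locale the candidates are `Messages_sv_SE`, `Messages_sv` and `Messages`, and with only the last two available the lookup settles on `Messages_sv`.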

While this does cover websites in a single arbitrary language, it does not deal with the issues surrounding websites in multiple languages that can be switched arbitrarily. This still needs to be sorted out, and I am sure we will be having follow-up discussions in the near future.

Comments

Arbitrary suffixes

agentrickard's picture

From the Domain Access world -- which has been handling this since Drupal 5 -- this provides a good start, but I think the suffix (en / se in the above) needs to be extensible, such that contexts can dictate which file or config settings to load. I also wonder if the {config} table needs a prefix column to account for such things.

Scenario 1:

Under Domain Access, you have three sites, which share most settings but override a few. Thus we might have:

  • site_information.xml
  • site_information.domain1.xml

Scenario 2:

Even more complex, but a fairly common request at this point, is DA in conjunction with i18n, which gives the following.

  • site_information.xml (default)
  • site_information.se.xml (default Swedish settings)
  • site_information.se.domain1.xml (Custom Swedish settings only applied to domain 1)
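One way to picture the layering in these two scenarios is a chain of suffixes, with each more specific file overriding the one before it. The sketch below is in Python and is only an illustration: `layered_files` and `load_layers` are hypothetical names, the `files` dict stands in for parsed files on disk, and the `site_mail` key is made up.

```python
def layered_files(base, suffixes, extension="xml"):
    """List file names from least to most specific."""
    parts = [base]
    names = [f"{base}.{extension}"]
    for suffix in suffixes:
        parts.append(suffix)
        names.append(".".join(parts) + f".{extension}")
    return names


def load_layers(base, suffixes, files):
    """Merge every layer that exists; later (more specific) layers win."""
    settings = {}
    for name in layered_files(base, suffixes):
        settings.update(files.get(name, {}))
    return settings


# Stand-in for the parsed contents of the three files in Scenario 2.
files = {
    "site_information.xml": {
        "site_name": "I am awesome!",
        "site_mail": "admin@example.com",
    },
    "site_information.se.xml": {"site_name": "Jag är grymt!"},
    "site_information.se.domain1.xml": {"site_mail": "domain1@example.com"},
}

settings = load_layers("site_information", ["se", "domain1"], files)
```

The result takes `site_name` from the Swedish layer and `site_mail` from the domain-specific layer, while anything not overridden would come from the default file.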

In versions prior to D8, we afforded these changes by dynamically loading overrides into the $conf global from settings.php.


gdd's picture

We have talked a lot about managing multiple contexts and it really just opens up an enormous can of worms, and thus for now this system is going to be language-specific to keep the scope small. What this means for you, I'm not sure to be honest. You may have to continue to manage this yourself. $conf (or something equivalent) will still exist.


nonsie's picture

(I agree with pretty much everything Ken stated)

This might be completely OT but how do you imagine handling configuration settings that might require different plurals? Not all languages use the same plurals.

Also what would happen if the translation is the same as the original? I've found in the last 5+ years in Drupal that in some languages it's better not to try to translate technical terms and just leave them as is since there is no match in the language.


gdd's picture

Our idea is that the translation files will only contain the changed strings, and when loaded will be merged with the site default language. Thus anything unchanged will remain in place. I'm not sure about the pluralization; I think I'm going to pass the buck to Gábor on that one.

Going for the whole thing.

Jose Reyero's picture

I agree with @agentrickard that it would be interesting to give a try to the concept of having different 'realms' or domains besides language. So my intention is to start working on this and see what it would look like (patches).

Anyway, this may be mostly about storage and then runtime variable overrides. But on top of that we need to have a way to localize default configuration.

(A good example of generic variable 'realms' that can be used for language too is implemented in the Variable module, http://drupal.org/project/variable.) I am working atm on the patches to make languages and domains play nice by using variable_realm for both.

So, about file storage, though we can start just with languages, we should go for a schema that is extensible, like:

config.xml
language/es/config.xml
(and then later maybe domain/example.com/config.xml)

Maybe Drupal core just needs to handle 'languages' for now, but I think it won't be much harder to make the model extendable by having better names. Basically, instead of looking for 'en', we can always look for 'language=en' variables, and then it can take other parameters too ('domain=abc').
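As a rough sketch of that idea, named realm parameters could map directly onto the directory layout above. This is Python for illustration only; `config_paths` is a hypothetical helper, and the order in which the realms would be consulted is an assumption.

```python
from pathlib import PurePosixPath


def config_paths(name, **realms):
    """Build the default path plus one path per realm=value pair."""
    paths = [PurePosixPath(name)]
    for realm, value in realms.items():
        paths.append(PurePosixPath(realm) / value / name)
    return [str(path) for path in paths]


paths = config_paths("config.xml", language="es", domain="example.com")
```

This yields `config.xml`, `language/es/config.xml` and `domain/example.com/config.xml`, and adding another realm later only means passing another named parameter.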