How to make a Drupal 7 module i18n enabled and what does that teach us for Drupal 8?

Gábor Hojtsy's picture

There is great discussion forming on my previous posts on exportables and user provided text as well as the dangers of using t() for user editable data, and I can only hope we can keep that up! To provide better visibility for this post, I'm also cross-posting on my blog, but commenting is only allowed here.

Regular readers could find this boring, but let's reiterate the three working modes that all objects should ideally be able to handle in Drupal to support multilingual site building.

  1. Being able to mark an object as in one language.
  2. Being able to mark an object as in one language and relate it to others as being a translation set. This is useful when you want to use the different language objects in different relations, track their history separately, have different permissions and workflows for them, etc.
  3. Being able to translate pieces of the object that need translation and leave the rest alone. Load the right language variant of the object dynamically as needed. This is very useful for keeping external relations intact and sharing common fields between translations effortlessly.

There are certain things where not all of these make sense. For the site's name, for example, people would probably only use either (1) or (3). For a block, people should be able to use either (2) or (3) based on their needs: (2) is useful to place blocks differently on translated pages, while (3) is good for keeping the placement consistent without effort. This can be different on a per-block basis. The same applies to nodes, menus, taxonomy, views, rules, and so on.

Field translation as a model for translating in-object

For entities, (3) is very well implemented with translatable fields, and that actually gives us a cue as to how it would be best to implement it for other types of objects, or at least what is required. The field system is very well suited for in-object translation because it encapsulates small data components with their input widgets, storage and output formatters. Let's focus on in-object translation in this post, because we have bigger challenges to solve here.

Think of a view or even just a contact form. You'd obviously want to translate some pieces of that configuration, but don't want to replicate all of it in all languages. So you need to be able to tell which pieces of the objects are translatable, be able to tell whether they are translated yet, and provide widgets for translating them and for validating them. Then they need to be stored, ideally revisioned, and their right value should be used at rendering time (via formatters). This would actually cover almost all of the problems that I iterated on in my post on why using t() - designed as a simple string translation mechanism - for user editable objects is a failure.

So ideally, all Drupal content and configuration objects would be built from smaller reusable components (like fields), so we can reproduce their input for translation forms, handle their storage, version their changes and implement their rendering with a formatter using the right language. This sounds like pretty big bloat, but short of each module providing translation support itself one by one, this is how we could approach the problem systematically. Unfortunately, as we know, Drupal objects are far from being built up from standardized components: they use forms with various structures, uniquely set up data storage schemas and data validation, and often have no generic rendering system to hook into on display.

All right then, how can multilingual sites be done now anyway?

Now, all of the above are things the i18n module needs to solve for all kinds of custom objects in Drupal, and it does a reasonable job for core objects (lagging behind not for lack of effort, but for lack of opportunity in most cases). In fact the i18n module needs to provide an API as well, so that all of contrib can tie in to this system, hopefully as easily as possible.

Let's walk through the i18n_contact.module implementation of these concepts. You'll see how the i18n module needs to augment all Drupal objects with very lightweight "field"-like information, and that where it does not go the whole way to fieldify things, it lacks the relevant features. Also, it is important to note that the i18n modules work from "outside of Drupal core" to augment its behavior, so they need to use altering tricks to make multilingual features work. The code required to put these features into the modules themselves would be less, but not significantly so.

This example is not suggested as an API at all for Drupal 8, it is merely targeted to be a discussion starter on how we can make Drupal more capable to fill these missing pieces out of the box. These code samples are from http://drupalcode.org/project/i18n.git/tree/refs/heads/7.x-1.x:/i18n_con... so you can follow along there too.

What do we make translatable on contact forms?

When you turn on the Drupal core contact module, you'll see that it provides very simple features. First it provides per user contact forms, which have no configuration component. Then it provides site-wide contact forms, where you can set up any number of categories with their names and recipients as well as an autoreply email that goes out to the sender. In my assessment of what needs to be translatable here, I decided the category name and the autoreply email text are to be translatable. The i18n_contact.module needs to provide storage for these translations as well as a UI and workflow for them, and should inject the right autoreply text and category names wherever they are used. A pretty high bar for such a simple module, right? We'll see that the module can leverage i18n_string module APIs to do this, but it still needs to do a lot on its own.

Tell the system about our object

There are two hooks in i18n_string and i18n to tell the system about our translatable object:

<?php
/**
 * Implements hook_i18n_string_info().
 */
function i18n_contact_i18n_string_info() {
  $groups['contact'] = array(
    'title' => t('Contact forms'),
    'description' => t('Configurable contact form categories.'),
    'format' => FALSE,
    'list' => TRUE,
  );
  return $groups;
}

/**
 * Implements hook_i18n_object_info().
 */
function i18n_contact_i18n_object_info() {
  $info['contact'] = array(
    'title' => t('Contact category'),
    'key' => 'cid',
    'placeholders' => array(
      '%contact' => 'cid',
    ),
    'edit path' => 'admin/structure/contact/edit/%contact',
    'string translation' => array(
      'textgroup' => 'contact',
      'type' => 'category',
      'properties' => array(
        'category' => t('Category'),
        'reply' => t('Auto-reply'),
      ),
      'translate path' => 'admin/structure/contact/edit/%contact/translate',
    )
  );
  return $info;
}
?>

The hook_i18n_string_info() hook tells the system that an object group named 'contact' exists and it provides a name and description for the group. This is used to display the group on various central translation user interfaces. It also defines the group as "listable" which comes in very handy below.

The i18n_contact_i18n_object_info() function defines the relevant details of objects in this group for translation. Imagine this as the very lightweight field definition hook for this object. In this module, the object group and the object have the same name, but remember they are different things. An object group can have different objects. The taxonomy object group has terms and vocabularies, for example.

So the 'contact' object is defined with its name, key and string translation details. To be able to generate a translation user interface for this object, the system needs to know where it is edited, where the generated translation UI should live, and which object properties are to be translated. This code is enough for the system to generate a translation user interface for the object (if the module itself provides the menu items to hook in there as well, see below).

Naming of pieces of objects

Now that we have the translatable portion of the object defined, it is important to know that i18n_string has a naming convention for pieces of objects. This is to solve the source string change problem (and in part the source language problem) from my list of 9 issues with t(). The string identifiers are in the form $object:$type:$id:$property, where $id is usually a number or machine name. Therefore for contact categories, it would be contact:category:13:category and contact:category:13:reply for the 13th category you set up based on the above object definition.
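To make the convention concrete, here is a minimal standalone sketch. It is not code from i18n_string: the helper names example_string_name() and example_string_name_parse() are hypothetical, and they only show how the four parts combine into an identifier and split back out of it.

```php
<?php
// Hypothetical helpers, for illustration only. i18n_string has its own
// internal handling of string names.

/**
 * Builds an identifier such as "contact:category:13:reply".
 */
function example_string_name($textgroup, $type, $id, $property) {
  return implode(':', array($textgroup, $type, $id, $property));
}

/**
 * Splits an identifier back into its four named parts.
 */
function example_string_name_parse($name) {
  // Limit to 4 parts so that a property containing ':' stays intact.
  list($textgroup, $type, $id, $property) = explode(':', $name, 4);
  return array(
    'textgroup' => $textgroup,
    'type' => $type,
    'id' => $id,
    'property' => $property,
  );
}

$name = example_string_name('contact', 'category', 13, 'reply');
$parts = example_string_name_parse($name);
```

Because the identifier is built from the object's key rather than from its text, editing the category name or the auto-reply does not change the identifier that the translations hang on.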

Now we know how to tell which "fields" are translatable in the object, and how i18n_string will refer to them.

Make our translatables known to the translation system

A huge chunk of code deals with making our translatables known to the translation system. Why? While some people work with translating their site object by object, others work with lists of translatables and want to see their progress / status. Also, if your site happens to work with an outside translation service, being able to export a list of translatable objects is of crucial importance. So whenever an object is created, edited or deleted, we update the translation database with information about our translatable.

<?php
/**
 * Implements hook_form_FORM_ID_alter().
 */
function i18n_contact_form_contact_category_delete_form_alter(&$form, &$form_state) {
  $form['#submit'][] = 'i18n_contact_form_contact_category_delete_form_submit';
}

/**
 * Remove strings for deleted categories.
 */
function i18n_contact_form_contact_category_delete_form_submit($form, &$form_state) {
  $cid = $form['contact']['#value']['cid'];
  i18n_string_remove("contact:category:$cid:category");
  i18n_string_remove("contact:category:$cid:reply");
}

/**
 * Implements hook_form_FORM_ID_alter().
 */
function i18n_contact_form_contact_category_edit_form_alter(&$form, &$form_state) {
  $form['actions']['translate'] = array(
    '#type' => 'submit',
    '#name' => 'save_translate',
    '#value' => t('Save and translate'),
  );
  $form['#submit'][] = 'i18n_contact_form_contact_category_edit_form_submit';
}

/**
 * Update strings for edited/added categories.
 */
function i18n_contact_form_contact_category_edit_form_submit($form, &$form_state) {
  $contact = $form_state['values'];
  i18n_string_update(array('contact', 'category', $contact['cid'], 'category'), $contact['category']);
  // The auto-reply is optional, so drop its string when it is left empty.
  if (!empty($contact['reply'])) {
    i18n_string_update(array('contact', 'category', $contact['cid'], 'reply'), $contact['reply']);
  }
  else {
    i18n_string_remove(array('contact', 'category', $contact['cid'], 'reply'));
  }

  // If the save and translate button was clicked, redirect to the translate
  // tab instead of the contact category overview.
  if ($form_state['triggering_element']['#name'] == 'save_translate') {
    $form_state['redirect'] = 'admin/structure/contact/edit/' . $contact['cid'] . '/translate';
  }
}
?>

Well, I said it was going to be a relatively large amount of code for a simple task. We need to delete the translatables when the object is removed and we need to save them when the object is added or edited. i18n_string does have some helper functions to cut down on this code, but form altering and custom submit functions are essentially required to augment the contact module to make it multilingual. There is a little usability trick here as well for people who want to continue with translation right after creating the category. A Save and translate button is added, so they jump right to translation once their contact form data is saved instead of jumping back to the contact configuration screen, from where it is very hard to get back to translations.

Handling existing sites

Deleting translatables and saving/updating them when the object is updated is great, but if you turn on multilingual features on an established site, you need all the existing data exposed for translation. This is where the translatable listing hook comes in handy.

<?php
/**
 * Implements hook_i18n_string_list().
 */
function i18n_contact_i18n_string_list($group) {
  if ($group == 'contact' || $group == 'all') {
    $strings = array();
    $query = db_select('contact', 'c')->fields('c');
    $result = $query->execute()->fetchAll();
    foreach ($result as $contact) {
      $strings['contact']['category'][$contact->cid]['category'] = $contact->category;
      if (!empty($contact->reply)) {
        $strings['contact']['category'][$contact->cid]['reply'] = $contact->reply;
      }
    }
    return $strings;
  }
}
?>

This basically iterates over all contact categories and provides the category / autoreply text for translation. Now that we have the translatables, we can move on to providing a UI for the translation.
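The nesting of the returned array mirrors the naming convention described earlier. As a rough standalone sketch of that relationship (the helper example_flatten_string_list() and the sample data are hypothetical, and this is not how i18n_string processes the list internally), the array can be flattened into identifier and source text pairs:

```php
<?php
// Hypothetical helper, for illustration only: flattens the nested array
// shape returned by a hook_i18n_string_list() implementation into
// "textgroup:type:id:property" => source string pairs.
function example_flatten_string_list(array $strings) {
  $flat = array();
  foreach ($strings as $textgroup => $types) {
    foreach ($types as $type => $objects) {
      foreach ($objects as $id => $properties) {
        foreach ($properties as $property => $source) {
          $flat["$textgroup:$type:$id:$property"] = $source;
        }
      }
    }
  }
  return $flat;
}

// Sample data standing in for two rows of the {contact} table. The second
// category has no auto-reply, so it contributes only one string.
$strings = array();
$strings['contact']['category'][1]['category'] = 'Sales';
$strings['contact']['category'][1]['reply'] = 'Thanks, we will be in touch.';
$strings['contact']['category'][2]['category'] = 'Support';

$flat = example_flatten_string_list($strings);
```

A flat list like this is the kind of thing a status report or an export to an outside translation service would work from.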

Providing a UI for translation

The i18n_string module reuses the locale module backend and user interface for translations, which has both advantages and disadvantages. In fact it has more disadvantages than advantages at this point, so the module will transition out of that eventually. However, that does not really affect the external API. Previously people said that translators should just use that central interface. As I wrote in my post on translating blocks, that is very tedious, and can hardly be called user friendly. Therefore recent versions of the i18n_string module actually provide helpers for you to generate a much more reasonable in-place translation tool, very similar to the field translation system. We have already implemented the object description hook above, but need to put in the right menu callbacks for the translation UI to come to life.

<?php
/**
 * Implements hook_menu().
 *
 * Add translate tab to contact config.
 */
function i18n_contact_menu() {
  $items['admin/structure/contact/edit/%contact/edit'] = array(
    'title' => 'Edit',
    'type' => MENU_DEFAULT_LOCAL_TASK,
    'weight' => -100,
  );
  $items['admin/structure/contact/edit/%contact/translate'] = array(
    'title' => 'Translate',
    'access callback' => 'i18n_contact_translate_tab_access',
    'page callback' => 'i18n_string_object_translate_page',
    'page arguments' => array('contact', 4),
    'type' => MENU_LOCAL_TASK,
    'weight' => 10,
  );
  $items['admin/structure/contact/edit/%contact/translate/%language'] = array(
    'title' => 'Translate',
    'access callback' => 'i18n_contact_translate_tab_access',
    'page callback' => 'i18n_string_object_translate_page',
    'page arguments' => array('contact', 4, 6),
    'type' => MENU_CALLBACK,
  );
  return $items;
}

/**
 * Menu access callback function.
 *
 * Contact translators required to have both contact and locale admin.
 */
function i18n_contact_translate_tab_access() {
  return user_access('translate interface') && user_access('administer contact forms');
}
?>

The contact categories do not have a visible edit tab on them, so we need to put that in place, then put a translate tab by its side. The translate tab will need per-language sub-paths, so we put those there as well. These pages need appropriate permission checking (again see my post on problems with t() for user editable text). The permissions used here are the generic locale.module permission and the contact admin permission combined. The user interface for these translation pages is actually generated by the i18n_string_object_translate_page() function, which uses the object information from above to generate translation status overviews, link to the proper edit and translation paths, generate translation forms, and validate and save the data.

Ideally this translation tab would also show up in Operations link listings in admin tables and contextual menu items on objects. The i18n_block module has a simple example to put the translation tab on a contextual menu. Injecting links to the Operations link lists of object admin screens is not trivial and mostly not done by i18n submodules.

Generic input widgets and lack of revisioning

Let me point out a very obvious omission here. As you might have noticed, a full field system is not replicated by i18n_string, so the translation input screens will only work with textfields or textfields with unmodifiable formats assigned. There is no information as to what widget should be used to edit certain properties of an object, nor an abstraction of the validation to use for them. The i18n_string module also needs to reimplement some of the format permission checking to be sure that no security holes are opened when editing sensitive formatted content (such as PHP input). Also, most Drupal objects will not support revisioning and translations don't support it either.

Now use those translations

We've covered letting i18n_string know about your object and its internal structure relevant for translation. We've provided the translation system with the translatable data as well as removed it when needed. Then we used the available APIs to generate a very familiar translation UI. All that is left now is to actually use the translations at the right places. And that is not a walk in the park either, as you might guess.

Drupal uses these object values all around its codebase in forms, titles, error messages, and so on. Unfortunately, in most of these places there is no way to jump in and replace the original language text with its translation. I've written about my firm belief that translation is a rendering operation, because we don't know until the last minute what language we need to use. With translatables rendered sooner, it's hard and sometimes impossible to replace them properly with their translations.
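A small standalone sketch may help picture what treating translation as a rendering operation means. Everything here is hypothetical (the in-memory store, the example_translate() helper and the sample texts); it only mimics the general shape of looking up a named string with the source text as a fallback.

```php
<?php
// Hypothetical in-memory translation store, for illustration only.
$example_translations = array(
  'contact:category:13:category' => array('hu' => 'Értékesítés'),
);

/**
 * Returns the translation for a string name, or the source text if none.
 */
function example_translate(array $store, $name, $source, $langcode) {
  return isset($store[$name][$langcode]) ? $store[$name][$langcode] : $source;
}

// The loaded object keeps its source text untouched. The language is only
// chosen at each point of use, so one request can render it several ways.
$category = array('cid' => 13, 'category' => 'Sales');
$name = 'contact:category:' . $category['cid'] . ':category';
$in_hungarian = example_translate($example_translations, $name, $category['category'], 'hu');
$in_german = example_translate($example_translations, $name, $category['category'], 'de');
```

The German lookup falls back to the source text because no translation is stored for it, and the object itself is never modified, so saving it later in the same request would not overwrite the original with a translation.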

The contact module is in fact a very good example of this situation. Imagine the site-wide contact form. If you have an auto-reply message set, and the sender checks the "send me a copy" checkbox on the form, the contact module will send no less than 3 emails in one HTTP request. There is an email sent to the contact recipients with the contact input. This should be sent in the site default language. There will be a copy of that email sent to the submitter and an autoreply sent to the submitter. Both of these should be sent in the submitter's language. Also, the contact module displays the categories in its form, where the right language versions of the names should be displayed based on the page language. So we work with the page language, the site language and the language of the user, all three of which can be different. We also deal with "display" both in a form and in email.

<?php
/**
 * Implements hook_form_FORM_ID_alter().
 */
function i18n_contact_form_contact_site_form_alter(&$form, &$form_state) {
  foreach ($form['cid']['#options'] as $key => $label) {
    $form['cid']['#options'][$key] = i18n_string("contact:category:$key:category", $label);
  }
}

/**
 * Implements hook_mail_alter().
 */
function i18n_contact_mail_alter(&$message) {
  if (in_array($message['id'], array('contact_page_mail', 'contact_page_copy', 'contact_page_autoreply'))) {
    // Alter the first part of the subject of emails going out if they need
    // translation.
    $contact = $message['params']['category'];
    $category = i18n_string('contact:category:' . $contact['cid'] . ':category', $contact['category'], array('langcode' => $message['language']->language));
    $message['subject'] = t(
      '[!category] !subject',
      array('!category' => $category, '!subject' => $message['params']['subject']),
      array('langcode' => $message['language']->language)
    );
  }

  if ($message['id'] == 'contact_page_autoreply') {
    $contact = $message['params']['category'];
    // Overwrite the whole message body. Maybe this is not entirely responsible
    // (it might overwrite other existing items altered in by others),
    // but unfortunately Drupal core contact module does not make its item
    // identifiable easily.
    $message['body'] = array(i18n_string('contact:category:' . $contact['cid'] . ':reply', $contact['reply'], array('langcode' => $message['language']->language)));
  }
}
?>

Displaying the right language version on the contact page drop-down is easy by altering the contact form and pulling in the right language value of the object property. i18n_string will use the page language if not instructed otherwise. Emails are altered with hook_mail_alter(), and we get the language code to use in the message array, which was already pre-computed properly by Drupal. However, we need to replace certain parts of the message based on the user provided translations for contact categories and auto-reply. In these cases, the subject and body of the email are already "somewhat" rendered, so we'll need to overwrite their value with the right language variants.
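The language that arrives pre-computed in each message can be pictured with a small standalone sketch. The helper example_contact_mail_langcode() is hypothetical and uses plain language code strings; it is not how Drupal computes the message language, it only restates the rule described above for the three contact emails.

```php
<?php
// Hypothetical helper, for illustration only. Language codes are plain
// strings here rather than Drupal language objects.
function example_contact_mail_langcode($message_id, $site_default, $sender_language) {
  switch ($message_id) {
    case 'contact_page_mail':
      // Goes to the category recipients.
      return $site_default;

    case 'contact_page_copy':
    case 'contact_page_autoreply':
      // Both go to the person who submitted the form.
      return $sender_language;
  }
  // Anything else falls back to the site default.
  return $site_default;
}

// A Hungarian-speaking user submits the form on an English-default site.
$langcodes = array();
foreach (array('contact_page_mail', 'contact_page_copy', 'contact_page_autoreply') as $id) {
  $langcodes[$id] = example_contact_mail_langcode($id, 'en', 'hu');
}
```

One form submission therefore needs the same category name and auto-reply in two languages within a single request, on top of the page language used for the form itself.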

All in all, the contact support module is in fact very lucky that these values are not used elsewhere. Modules like i18n_field have a pretty impossible job replacing the field name and description in all places. They can do a best-effort job to replace them at the most used places, but given that values are scattered around unmarked in error messages and user interface components, it is never going to be complete with outside augmentation.

Fields for all configuration

As the above state of the i18n API for arbitrary Drupal objects probably shows, the i18n module needs to work around several things that are lacking in Drupal core for multilingual support:

  1. Objects need explanation as to which parts of them are translatable,
  2. Those parts need to have widgets, validation and storage,
  3. As well need to be listable for integration with 3rd party translation services as well as merely to provide a status report of where you are.
  4. There needs to be a friendly user interface for translation of each object,
  5. As well as permission checking possible to limit translatability - avoiding permission escalation.
  6. Finally and even more importantly, all display and use of these object properties should use rendering that allows translated versions to appear in place (with "formatters" for the "fields")

The Drupal 7 version of i18n will surely keep using workarounds and (hopefully continually improving) lightweight approaches solving problems very similar to the field system. As I've written above, the code examples presented here are current practices, and they definitely need improvement (there are multiple issues in the i18n module queue to improve on them). However, they properly represent the problems that need solving, and show the remarkable similarities of the issues to what the field system is aimed at.

It might sound like a dream that basically all translatable Drupal objects would be required to use some field-like solution for those pieces at least, and it sounds especially like overkill for simple things like the site name or slogan, right? Well, let's discuss how we could solve the same problems in other ways and whether we can find common ground in solving these problems with as much code reuse as possible.

Your input welcome!

Comments

What happened to the

Dave Reid's picture

What happened to the 'translatable' tag for DBTNG queries? This is what's currently implemented in contact_site_form():

<?php
$categories = db_select('contact', 'c')
  ->addTag('translatable')
  ->fields('c', array('cid', 'category'))
  ->orderBy('weight')
  ->orderBy('category')
  ->execute()
  ->fetchAllKeyed();
?>

Is this not used at all? Or un-usable? Because it's also used in contact_load() when it is passed to contact_mail(). It seems this example is overly complex, as i18n_contact_mail_alter() would be completely unnecessary.

Senior Drupal Developer for Lullabot | www.davereid.net | Gittip me!

augmenting from the outside does not work

Gábor Hojtsy's picture

There are dozens of problems with this. Itemized:

  • language is not a load operation, it is a rendering operation; the above code examples show that contact module can send up to 3 emails in one HTTP request in up to two languages; so we'll need the contact object data in at least those two languages; the loader is not language aware, so that we'd need to pretend the language context changed from the outside and then load something else again is not really a "solution"

  • this does not tell which properties of the contact settings are translatable; some sites will want to translate the category (subject) and autoreply, some sites will want to send the contact mails to different addresses too so that people who understand the given language can deal with that; we need a config UI to configure which properties are translatable and we need a description/metainfo of the object to present that UI

  • we need permissions associated with the translation operation, and the UIs to respect those permissions;

  • we need a UI for translation, which as seen from above will need to reproduce part of the object input form with part of the object validation; this is probably the second weakest link in the current i18n implementation, as it cannot really reproduce any serious form widget...

  • we need to then save these translations somewhere and use them when needed, tagging select queries do not solve the saving obviously

  • we then need to contextually understand when the object is used in each case whether some kind of language version of the object is needed; on a multilingual site, one of the problems is that these objects can be entered in any language; say a view or a rules rule; so we don't even know what's the source language; but we still need to tie in to each place where the object is actually used for rendering and use the right language version

See, tagging select queries is really just scratching the surface, and it's not even the right way to scratch. All of the above augmentations are still needed, and they are mostly unique to the object in question, so they need per-object glue code. We need descriptions of the objects, widgets, validators and permissions for translations, we need a language associated with the original object too, and we need rendering to tie in to use the right language translation when needed. Language is not a loading operation, it is a rendering operation in terms of object use, and we can only render stuff if we have a way and a workflow to save it properly.

I have a list of these and some more drawbacks to augmenting language unaware loading functions from the outside (like if some code wants to perform a save operation on your object in the request, it will overwrite the original object with the translated one, ouch) at http://groups.drupal.org/node/149984 under the "Translation is a rendering operation" heading. I've also linked out to the only code experiment that I have seen using these tagged queries, which multiplies each translatable Drupal table as many times as many languages you need there.

In short I think this was not well thought out and it looks like a dead end that is not going to get actual use and can be removed in Drupal 8.

translatable

sun's picture

I wouldn't necessarily agree with the conclusion that it's ultimately doomed already.

In fact, no one had the time to actually start working on a real implementation yet.

So, while I definitely agree that there might be some challenges and problems to solve, it's entirely hypothetical as of now. Not all translatable data has all the requirements that you listed. I still think that it can work for simple translatable data.

Lastly, bear in mind that, compared to everything else, this approach has almost no performance impact. That's actually what makes it very attractive and worth investigating further IMHO.

Daniel F. Kudwien
unleashed mind

Facing a particular problem

HnLn's picture

Facing a particular problem but can't find a related issue. Is it possible in d7 to translate field names in form validation errors ? For example the core number module does something like this in number_field_widget_validate: $message = t('Only numbers are allowed in %field.', array('%field' => $instance['#label']));

On the form itself the label translates fine, and the actual message is also translated, but I can't get the %field bit translated. (i18n_field is enabled.)

Some API updates

Jose Reyero's picture

There were some API improvements right before the first stable release of i18n. While all of the above is still true, there are now some shorthand functions for full object translation/update/delete:

(Btw the object type 'contact' has been renamed to 'contact_category')

<?php
// Update all strings for the object
i18n_string_object_update('contact_category', $contact);

// Remove all strings when you delete a contact category
i18n_string_object_remove('contact_category', $contact);

// Get a clone of the object with all strings translated
$translated_contact = i18n_string_object_translate('contact_category', $contact, array('langcode' => 'es'));
?>

See the latest version of i18n_contact module for a full example.
