String translation: why is using t() for user specified text evil?

Gábor Hojtsy's picture

There is strong support for using Drupal's built-in t() function to translate taxonomy terms, vocabulary names, menu items, profile field titles and options, poll titles and options and similar user specified content. This is a very bad practice and should be changed. To be fair, this system gets reused because it is already there and seems to be easy and fitting for the problem. Unfortunately it is neither easy nor fitting. Let me explain why, and where we should look for a solution.

What is t() designed to achieve?

Drupal has a built-in interface translation mechanism. Literal strings marked with t() in module and theme source code are extracted when a new Drupal version is released, and translation template (POT) files are generated. These templates contain all literal strings provided by the Drupal source, ready for translation. If a user turns on the locale module, she can import any of the translated templates (PO files), either merging them under one language (core translation, contrib module translations, theme translation) or adding them to different languages (Italian Drupal core translation, German Drupal core translation, etc). It is possible to edit these translations on the web interface, search for untranslated strings and import/export translations as PO files.

Some fundamental design decisions in this solution make it a poor fit for translating user specified strings (menu item names, taxonomy term and vocabulary names, etc). The t() mechanism is used regardless of these problems in both the i18n module and the localizer module (plus some other contrib modules, which are not inherently i18n related, but intend to be kind to their users with such extra features). Let's use a menu item as an example to simplify the text further on.

t() collects strings dynamically

If you add a menu item, you first need to switch to at least one other interface language, so that t() can collect the menu item string. It is also important that the menu item is actually displayed on the page you view, since that is the only time t() is called on it.

» A possible solution is to inject the menu item string into the locale table when adding the menu item. This approach is not used for some reason by any of the t() using modules.
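
The difference between collecting a string on display and injecting it on save can be sketched roughly as follows. This is a minimal, standalone illustration and not Drupal code: the array $locale_sources stands in for the locale table, and sketch_register_source(), sketch_t() and sketch_menu_item_save() are hypothetical names invented for this example.

```php
<?php
// Hypothetical stand-in for the locale source table: a plain list of strings.
$locale_sources = [];

// Registers a string as translatable if it is not known yet.
function sketch_register_source(string $string, array &$sources): void {
    if (!in_array($string, $sources, true)) {
        $sources[] = $string;
    }
}

// Lazy collection, as with t(): the string is registered only on display.
function sketch_t(string $string, array &$sources): string {
    sketch_register_source($string, $sources);
    return $string; // translation lookup omitted in this sketch
}

// Eager collection: the string is registered when the menu item is saved.
function sketch_menu_item_save(array $item, array &$sources): array {
    sketch_register_source($item['title'], $sources);
    return $item;
}

$item = sketch_menu_item_save(['title' => 'About us'], $locale_sources);
// 'About us' can now be translated before the menu item was ever displayed.
```

With the eager variant, the admin no longer has to switch languages and view the menu once just to make the string show up for translation.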

The locale module interface is not user friendly

The locale module interface is anything but user friendly when it comes to translating strings. Some of you might remember the good old days when it was the only option (apart from editing SQL dumps). At that time, Drupal interface translations were very rare. When, with the active involvement of some experts in this field (most notably Jacobo Tarrio), we introduced PO import functionality and POT generation (see above), the number of translations skyrocketed. Using a desktop tool with much better support for your workflow was a lot easier. The locale module interface has not changed since then; it is still not usable.

Consider the workflow: add that menu item, switch to some other interface language to get the menu item into the locale table, go to the locale module admin page, open the strings tab, search for untranslated strings, find the menu item and translate it to the different languages. Repeat this for more menu items. This isn't what our users are dreaming about. Sure, t() is easy on the programmer, but is it easy on the user? No.

» Sure, we can solve this by injecting more input fields for each language into the menu item addition page (see tr.module screenshots for nice examples). This is how path aliases were worked into the node submission page for example. You need the functionality there? Provide it there!

You mix the pretranslated stuff with user defined strings

One of the bigger problems of using the t() mechanism to translate user specified strings is that there is no way to distinguish them later on from the PO imported strings. When you export your translations, you get either all interface and user defined translations or nothing. This is especially a problem when building a new site. Say you have some Drupal 5.0 beta, for which you grabbed a work in progress translation from somewhere. You add your menu items, then after a few weeks you start to deploy the site. You need the final translations online, but your menu item translations should also be there. There is no way to do this, unless you can live with unused translated strings in the database. Strings that were t()-ed previously but are now unused cannot be detected, since the t() collection is dynamic (see above).

» OK, you say we should identify user specified strings in the database specially. It is certainly possible, since there is already a comment field for strings (which contains file names and line numbers for PO imported strings). We can extend the import/export interface on this, so that you can import and export user specified strings. This is already done in locale module, so this seems to be a perfect fit. Unfortunately it is not. Remember that PO file source strings are English! See below.
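
As a rough illustration of the marking idea only, here is a standalone sketch in which every stored string carries an origin marker, so an export can be limited to user specified strings. The 'origin' key, the sample rows and sketch_export() are assumptions made for this example; they do not describe the actual locale module schema or export code, and the sketch does nothing about the English source problem.

```php
<?php
// Hypothetical rows: each string carries an 'origin' marker.
$strings = [
    ['source' => 'Save',     'translation' => 'Mentés', 'origin' => 'interface'],
    ['source' => 'About us', 'translation' => 'Rólunk', 'origin' => 'user'],
    ['source' => 'Delete',   'translation' => 'Törlés', 'origin' => 'interface'],
];

// Builds simplified PO entries for the strings of one origin only.
function sketch_export(array $strings, string $origin): string {
    $out = '';
    foreach ($strings as $row) {
        if ($row['origin'] !== $origin) {
            continue;
        }
        // Escape double quotes and backslashes inside the PO strings.
        $out .= 'msgid "' . addcslashes($row['source'], "\"\\") . "\"\n";
        $out .= 'msgstr "' . addcslashes($row['translation'], "\"\\") . "\"\n\n";
    }
    return $out;
}

$user_po = sketch_export($strings, 'user');
```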

t() collects English(!) strings

The concept behind t() is that you write your module/theme/.info file source in English, and apply t() to literal English strings. The primary language of Drupal is English. If you add a menu item, you need to add it in English, even if you don't have a publicly visible English interface (because you only provide French and Dutch interfaces, for example). Even if you add a menu item on the French admin interface, you need to provide it in English, so that it can be translated to the other supported languages. Note that the locale module explicitly asks you to provide the language name in English when you add a new language by hand, for exactly this reason. (Drupal has most languages generated into the base translation templates, so when you import a translation, your English language names get translated to their equivalents in the different language interfaces.)

It is popular to abuse this system, if you don't have a public English interface. You can provide the menu item text in any language you wish (eg. French). When you provide translations, retype the text in the French field, so that it gets recorded there, and will not show up as untranslated later. It is not too wise to tell people that they need to provide the menu item text twice to use it. Also it is very confusing if different admins add menu items in different languages.

» How could we solve this? The Drupal design is very strongly built in this direction (ie. you can't have a translation without an English original). We can solve all of the problems above, and we can build a custom interface on top of t() to better support our users, but no interface, however nice, makes a system good at something it was never intended for.

All right, what should we do then?

Good question. You can see now that t() is not the right tool for the job. Some of its problems could be solved by an additional interface (think of the forum setup interface, which is a simplified taxonomy interface). But to be useful for our purposes, we need to rethink its base design, or we need to introduce a similar solution specifically designed for user specified text translation.

The locale storage is designed for quick translation retrieval. It used to be a (source string, translated string) connection table, but it quickly turned out that this way the source string has to be stored again for each language. This is why we have a locales_source table, which stores the source string with an ID (expecting that it is in English), and a locales_target table, which provides translations for the source string identified by that ID.
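
The source/target split can be pictured with a small standalone sketch. The two arrays below are simplified stand-ins for the locales_source and locales_target tables, not the real schema, and sketch_translate() is a hypothetical lookup written for this example.

```php
<?php
// Simplified stand-in for locales_source: ID => source string (English).
$locales_source = [1 => 'Home', 2 => 'Search'];

// Simplified stand-in for locales_target: ID => [language => translation].
$locales_target = [
    1 => ['fr' => 'Accueil', 'de' => 'Startseite'],
    2 => ['fr' => 'Recherche'],
];

// Looks up the translation of a source string in the given language.
function sketch_translate(string $string, string $langcode, array $source, array $target): string {
    // Find the ID of the source string; each source is stored only once.
    $id = array_search($string, $source, true);
    if ($id === false || !isset($target[$id][$langcode])) {
        return $string; // no translation known: fall back to the source
    }
    return $target[$id][$langcode];
}

$home_fr = sketch_translate('Home', 'fr', $locales_source, $locales_target);
```

Note that the source string itself is the lookup key, which is why the design assumes one common source language.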

tr.module does introduce a string translation table, storing menu item translations, etc in that table. It can detect and remove unused translations and can add strings in any language with possible pairings in any other site supported language. That design would fit in here.

We need to think about implementation possibilities. If anyone can add new menu items in any language, we somehow need to store the language used to add each menu item, since the menu table would then be filled with items in different languages. When displaying the menu in a language, we would need to know whether the menu item itself already holds the text in that language. This starts to get messy with taxonomy terms, profile field names and options and similar things.

So what we can say is that (instead of English), one needs to provide menu items, taxonomy terms, etc at least in the site default language (which could be anything, and is user defined). We would store these strings in the menu, taxonomy, etc tables as primary values, and look up translations for these. This solves the last t() problem. Since the others have suggested solutions, maybe t() will not be that evil for user defined string translation.
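
A minimal standalone sketch of this idea follows, assuming a site whose default language is French. The variable names, the shape of the translation store and sketch_menu_title() are all hypothetical and only illustrate the lookup order: primary value in the default language first, translation if one exists, primary value again as the fallback.

```php
<?php
// Assumption for this example: the site default language is French.
$site_default_language = 'fr';

// The menu item stores its primary title in the site default language.
$menu_item = ['mid' => 7, 'title' => 'À propos'];

// Hypothetical translation store: object type => ID => language => text.
$translations = [
    'menu_item' => [7 => ['nl' => 'Over ons']],
];

// Returns the menu item title for the requested language.
function sketch_menu_title(array $item, string $langcode, string $default, array $translations): string {
    if ($langcode === $default) {
        return $item['title']; // primary value, no lookup needed
    }
    // Use the translation if present, otherwise fall back to the primary value.
    return $translations['menu_item'][$item['mid']][$langcode] ?? $item['title'];
}

$title_nl = sketch_menu_title($menu_item, 'nl', $site_default_language, $translations);
```

No English original is needed anywhere in this lookup, which is the point of keying translations on the object rather than on the source string.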

What should we do? What is your opinion? Have a better idea?

Comments

The "dark side" of t()

riccardoR's picture

Wise and realistic analysis!

I almost always forget to enter new strings in English first and translate them afterwards;
to keep the locale database in good shape is not for timid souls actually.

As for menu items, I don’t translate them through the locale module interface.
I prefer to assign different primary/secondary link menus to each language through i18n_variables.
This implies some extra work, but allows for greater flexibility in site hierarchy.

It is not likely that everybody needs or wants to have different menus for each language, but IMHO it is an option to consider for the new i18n implementation.

What do you think about the possibility to disable menu items on a per-language basis?

menu item options

Gábor Hojtsy's picture

As far as I see, you will either have the option to translate the menu via some t()-like mechanism, or provide custom menus for your language. We can mix these of course, so you can have the same menu translated in some languages, and different menus in the languages for which you set up different menus. Language dependent options on the menu item level are not a target IMHO.

I am fine with that

riccardoR's picture

The possibility to mix translated menus and custom menus gives enough flexibility IMHO as well.

some discussion about

Gerhard Killesreiter's picture

some discussion about improvements:

http://groups.drupal.org/node/1967#comment-5181

The latest implementation

Roberto Gerola's picture

http://drupal.org/node/104403 (Comment #4)

Using a method like this :
tobject($object_name, $object_key, $object)
to translate all the fields for a particular object.

Calling example :
$term = tobject('taxonomy_term', $term->tid, $term);

Implementation example (with simple caching) :
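// Implementation of the tobject() method
if (module_exist('localizer')) {
  function tobject($object_name, $object_key, $object) {
    // Per-request cache of translations, keyed by object name and key.
    static $translations = array();

    // Only translate when the current locale differs from the base locale.
    if (localizer_get_locale() != localizer_get_baselocale()) {
      // Check the cache entry itself, so a missing index raises no notice.
      if (!isset($translations[$object_name][$object_key])) {
        $translations[$object_name][$object_key] = localizer_get_currobjtrs($object_name, $object_key);
      }
      $translation = $translations[$object_name][$object_key];

      // Overwrite only the fields that have a non-empty translation.
      foreach ($translation as $key => $value) {
        if (!empty($value)) {
          $object->$key = $value;
        }
      }
    }
    return $object;
  }
}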

--
http://www.speedtech.it

A proposal

Roberto Gerola's picture

Inspired by Rob Ellis's translate module, I am experimenting with a possible solution.
Currently I have this solution working for the menu module, and I am working to make it
work for taxonomy as well.
It is based on a single table that stores user-defined translations.

This is the table :

CREATE TABLE localizertranslation (
  tid int(10) unsigned NOT NULL auto_increment,
  object_key varchar(100) NOT NULL,
  object_name varchar(100) NOT NULL,
  object_field varchar(100) NOT NULL,
  translation text NOT NULL,
  locale varchar(10) NOT NULL,
  PRIMARY KEY (tid),
  UNIQUE KEY localizertranslation_idx1 (object_key,object_name,object_field,locale)
) ENGINE=MyISAM AUTO_INCREMENT=13 DEFAULT CHARSET=utf8;

You can use it to store the translations of different fields (object_field, like 'title', 'description' and so on) of a particular
'object' (object_name, like 'menu', 'menu item', 'taxonomy' and so on).
object_key is a unique key that identifies the object (mid for menu, for example), translation is the translated text
and locale is the language.

object_key is a string because you could use this table to store virtually everything
that has to be translated, for example module help text as well, using the module name
as object_key and 'module' as object_name.

I have also created three simple functions to use as APIs :
function localizer_save_objtrs($object_name, $object_key, $translations)
function localizer_get_objtrs($object_name, $object_key)
function localizer_get_objstrs($object_name, $object_keys)

I can post the code if someone is interested.
I am working actively on it and it is changing very quickly.

The translation UI is the same proposed by Rob Ellis, but a
'translation console' can be implemented easily.

What do you think ?

Roberto

--
http://www.speedtech.it

object key, name and field?

Gábor Hojtsy's picture

Good to see this going on!

What is the reason behind separating these three fields? Why isn't the field present in the API? What would you like to use these three fields for for which some single key would not suffice?

Re : object key, name and field?

Roberto Gerola's picture

Hi Gabor.

What is the reason behind separating these three fields?
Because you need all three of these fields to uniquely identify the translation
of an attribute (object_field) of a particular object (object_key) of a particular
object class (object_name).

object_key cannot be unique in the whole Drupal environment, I mean,
the same id could be used in one table and in another, so you need
object_name to uniquely identify that particular object.

I am using the object notion here to make this concept more abstract,
but you can also think of it as a record (object_key) of a particular table
(object_name).

And object_field represents the field or the attribute of the object
that you want to translate.
So, thinking of an object like menu, you have :
- object_key : mid, the id of menu or menu item
- object_name : menu or menu item
- object_field : title or description

So, for a particular menu item you have two records in this table,
one for title and one for description. Both records have the same
object_key and object_name, but a different object_field.

Why isn't the field present in the API?
Because, most of the time, you need to load or save all the fields / attributes
of a particular object at the same time.
You rarely need only one field of an object.
The APIs can be extended, of course.

What would you like to use these three fields for for which some single key would not suffice?
Probably for everything. (See the previous answer).
There are some cases in which using these three fields is overkill,
for example for variable translation.
In this case you need only object_key (variable name) and object_name ('variable') to uniquely
identify a particular variable, but I don't see any problem with using, for example, 'value' as
object_field in this case.

I think that with this approach it is easy to implement a generic "translation console" where
the user can manage all the translation items of every object.
With a simple filter on the object_name value, we can provide the same interface to translate
every type of content: menu, taxonomy, variable, poll title, and so on.

Roberto

http://www.speedtech.it

storage and API

Gábor Hojtsy's picture

Because, most of the time, you need to load or save all the fields / attributes of a particular object at the same time. You rarely need only one field of an object.

Well, all right. These are pluses on the side of separating these three things from each other and not storing them as one string.

Code

Roberto Gerola's picture

Hi Gabor.
I've just released version 2.0 of my module, which uses this new storage system, on CVS.

http://cvs.drupal.org/viewcvs/drupal/contributions/modules/localizer/?on...

Any feedback is really appreciated.

Localization support to modules

Roberto Gerola's picture

How can we provide localization support to external modules ?
Regardless of the system we adopt, how can we provide support for user
content localization / translation to external modules ?
I think we must provide localization support as an option, not as an obligation.
I mean, core or third party modules should only need to implement some hook,
where applicable, to add translation support.

Rob Ellis also suggested extending the db_rewrite_sql hook to change
the fields returned. This can be an option, but perhaps we have to think
of a more abstract API that is very simple to use, like
the t() function.

I'm trying to figure out how to implement a similar feature in the taxonomy module,
without breaking the core features of the module.
Adding hook calls everywhere doesn't seem to be a viable option.

Probably I'll add to the taxonomy module a function similar to t() that
will call, if present, the localization engine functions.
If that works, we can create a similar, more generalized system that every module
can use, and add this t()-like function to core.

What do you think ?
Other ideas ?

follow up there

Gábor Hojtsy's picture

Posted a follow up on the issue.

Keyword-based translations.

donquixote's picture

A possible solution (or an inspiration, maybe) is being proposed here:
http://drupal.org/node/630432#comment-2769014
