Drupal's multilingual problem - why t() is the wrong answer

Posted by Gábor Hojtsy on May 19, 2011 at 8:01am

Drupal is a great system to run foreign language websites on. The core itself is written in English and modules and themes are expected to follow suit. For developers, very simple wrapper functions are available to mark your translatable strings and let Drupal translate them to whatever language needed. These are the famous t(), the less famous format_plural() and a whole family of other functions. See my cheat sheet (PDF) and the drupal.org documentation for more on this.

Then there is "the other side", whatever does not come from code. Drupal works pretty well and very consistently if you want all of that to be in a foreign language (i.e. not English), but not in multiple languages (any of which can be English at that point). Drupal only has direct multilingual support in nodes (+ fields of entities) and for path aliases. But life with Drupal means you work with all kinds of other objects like blocks, views, rules, content types, etc., which are not "language-aware".

Unfortunately for building multilingual Drupal sites, this is the biggest problem that needs to be worked around. The contributed Internationalization module attempts to fill in the gaps, provide language associations and different workflows for translating these language-unaware objects. This works to some degree, but is really not easy without much help from the modules implementing these objects.

Go the easy way with t() - wreck your ship

Module developers are well aware that if they call t() with a string, it should give back the translated string. It is so easy, and tempting to use for all text translation. So some module developers do use t() that way. There are even some examples of this in Drupal core, which we are working to remove. The field system for example lets users specify the field label, help/description, allowed values and the default value for fields. Now to translate all these, one could think t() is a nice and easy solution. In fact, Drupal core was using t() for some cases of label display and some cases of description display (but not for allowed values and default values). This was recently removed from field constructs in Drupal 7, so code that was using the t() system for translation of field properties will not work like that in Drupal 7.1.

Using t() for user provided data is a very bad idea, and it comes from the simplicity of it. It merely relates one source string to other translation strings. Let's imagine the field system would use t() for all the four properties mentioned of a field. What would happen?

Timing problem: t() collects translatable strings for storage in the database when it is invoked. Therefore the values are only available for translation once the form is displayed with all the help text, label, default value, allowed values, and so on. For fields, this requires the user to first navigate to the entity form so the source text is saved. For other types of objects where some strings are conditionally displayed (think views empty help text), this can be pretty hard to achieve.
Source string change problem: t() will store the translations related to the source string. If the source string changes (eg. you fix a typo in your long help text), all your translations for that string are lost, and you need to redo them all over again.
Source language assumption problem: t() assumes your source text is in English, and it will not let you translate it to English. Now if your site is not primarily built in English, you'll not be able to translate your custom objects to English. You'd need to set up a fake secondary English language to be able to.
Overall UX problem: When you pass a string through t(), it is saved at a very generic place which just relates strings to others. It does not know that your string was a field label or help text, or that a string was the field label for a field for which another string was the help text. You cannot translate the four field properties at once, because there is no meta information involved to relate them together. (You can optionally specify a context for your strings, which somewhat mitigates this problem, but your strings are still translated at an entirely different place to where you edit the originals and filtering by context is still a major stumbling block for users, just try it.)
Individual UX problem: Many strings in objects have widgets associated with them, like a format selector, a WYSIWYG editor, a dropdown, etc. There are no such widgets on the translation user interface. There are (almost) no allowed value limits. Some source strings (such as the default value for a body field) have a format assigned to them. Even if the default value can be full HTML, the t() backend will not let translators submit translations in that format, and will not provide the right widget to translate it.
Permission escalation via formats: It is not just a UX problem, but also a permission problem. If the default value has a PHP format, and you can edit its translation, you could theoretically inject PHP code into the site merely with 'translate interface' permissions. Well, the t() backend will filter the text for some XSS input, but it is entirely unaware of formats and the permissions associated with them.
Cross-object permission escalation: t() just stores generic strings; if you use it to translate properties of all your objects, translators can manipulate your objects without permission to create or edit them. With the field example, if the allowed values are translatable via t(), translators can edit the allowed values of a field (in a language context), add new ones, and remove some without having the permission to actually tinker with fields. This might or might not be the desired behavior depending on your site.
Workflow problem: translations either exist or they don't. There are no unpublished, in review, etc. translations with t(). You cannot preview what it would look like with your translations before you save them.
Performance problem: the t() system preloads short strings for quick translation on the page. While this might not be a big performance problem with field properties, think about what would happen if we'd use t() for taxonomy terms or other data of big quantities. Since these are usually short strings, the t() string cache will load the translations for all of them on the page. On the other hand, since we have no object meta-information for these strings, we can only load the longer ones one by one when needed. This means SQL queries for the field help text, allowed value and default value each - multiplied by the number of fields displayed.
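
To make the source string change problem concrete, here is a minimal sketch of a lookup keyed by the source string. The function name and the array-based store are hypothetical stand-ins, not Drupal's actual locale storage; they only mimic the "one source string relates to its translations" idea described above.

```php
<?php
// Hypothetical stand-in for a source-keyed translation store (not the Drupal locale API).
function lookup_by_source(array $translations, $source, $langcode) {
  // Fall back to the source text when nothing is stored for this exact string.
  return isset($translations[$source][$langcode]) ? $translations[$source][$langcode] : $source;
}

// A help text saved with a typo, and its Hungarian translation.
$translations = array(
  'Enter yuor name' => array('hu' => 'Adja meg a nevét'),
);

// While the source keeps its typo, the translation is found.
$before = lookup_by_source($translations, 'Enter yuor name', 'hu');

// Once the typo is fixed, the key no longer matches and the translation is orphaned.
$after = lookup_by_source($translations, 'Enter your name', 'hu');
```

After the fix, the lookup silently returns the untranslated source, which is exactly the "redo them all over again" situation.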

The ideal system

Now let's see what an ideal system would do to avoid these issues and provide a generic translation service for objects in general (views, blocks, field properties, etc).

Timing problem: creation, updates and deletion of the object in question should register its translatable strings as available for translation.
Source string change problem: instead of using the source string as the key, we should introduce a string identifier; Java, .NET and many others have been doing this for a long time now - see property and resource files.
Source language assumption problem: whenever we save an object, we should know and store what language we saved the object in (which will filter down to its text properties); we don't know this now for blocks, views, field properties, your contact form configuration, your anonymous username, your default date format, site name, content type labels, etc. - this is a major missing piece
Overall UX problem: we should be able to generate in-place translation forms based on the string identifier (for which we need an index of objects with their properties to tell which are translatable)
Individual UX problem: we should be able to look up the widget to use from aforementioned index
Permission escalation via formats: this should not happen once we know the widget and related metadata from the index about the object
Cross-object permission escalation: this is a tough one, because it makes cross-object translation harder; I think we should avoid it by default by focusing people on the in-place translation tools and allow for it if not an issue or specifically required for the site
Workflow problem: this can be solved by implementing translation sets for more objects; like the Drupal core node translation module, that could allow for previews, workflows, even more fine grained permissions, etc. for the given object - opening a whole new set of features
Performance problem: translated object properties need different loading patterns implemented compared to simple string translation; this needs to be explored and probably different, overridable implementations provided for different use cases
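
A minimal sketch of the first three points, assuming a hypothetical in-memory `StringStore` class and an invented `field:title:label` identifier format (neither exists in Drupal core or i18n under these names). Strings are registered when the object is saved, keyed by identifier, and stored together with their source language.

```php
<?php
// Hypothetical identifier-keyed store; class name and ID format are assumptions.
class StringStore {
  protected $strings = array();

  // Saving the object registers (or updates) its text under a stable ID,
  // together with the language the text was written in.
  public function saveSource($id, $langcode, $text) {
    $this->strings[$id]['source_language'] = $langcode;
    $this->strings[$id]['text'][$langcode] = $text;
  }

  public function saveTranslation($id, $langcode, $text) {
    $this->strings[$id]['text'][$langcode] = $text;
  }

  // Return the text in the requested language, or the source text as fallback.
  public function get($id, $langcode) {
    if (!isset($this->strings[$id])) {
      return NULL;
    }
    $entry = $this->strings[$id];
    if (isset($entry['text'][$langcode])) {
      return $entry['text'][$langcode];
    }
    return $entry['text'][$entry['source_language']];
  }
}

$store = new StringStore();
// The site is built in Hungarian, so the source language is not English.
$store->saveSource('field:title:label', 'hu', 'Cím');
$store->saveTranslation('field:title:label', 'en', 'Title');
// Editing the source text keeps the existing translations attached to the ID.
$store->saveSource('field:title:label', 'hu', 'Címsor');

$english = $store->get('field:title:label', 'en');
$german = $store->get('field:title:label', 'de');
```

Because the key is the identifier, editing the Hungarian source leaves the English translation in place, and English is just another target language.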

Translation is a rendering operation

Ok, now this was all about saving the translatable and editing the translation. How do we actually display it? Well, that part was actually not a problem with using t(), since if you used t() consistently, it should work for display (even if it is a big mistake for all kinds of other reasons). However, this lets us learn another lesson: translation is a display/rendering task. When you use translatable fields in core for example (actual field values, not the field labels and friends mentioned above), your translations will be under the node, the right one being used depending on which version should be displayed. When you send hundreds of notification emails in a request, the right language translation of the value will be loaded for each email (users can have very different language preferences). In a similar sense, translating the strings should be a rendering operation.

There was considerable (but not really well known) work in the Drupal 7 core development cycle to tag database queries with 'translatable' and potentially implement contributed solutions to translate data right in the object loading phase. One of these experiments is in Berdir's sandbox at http://drupal.org/sandbox/berdir/1122562. In reality, while I think this approach fares well for performance (and solves the source string change problem), it does not solve any of the other problems. It works by making multiple copies of data tables, one per language (to translate menus, you'd have as many menu tables as languages you need). However, the code still needs to make original values available for translation (timing problem), it has no idea of the original source language, there is no UX plan in place that I know of which would not require similar code support from each module defining the objects, the individual UX problem is still there, and permissions and workflows would depend on the currently missing implementation. On top of that, it introduces new problems by multiplying tables in your database and translating your object at load time, which can be a problem if you re-save the object and overwrite the original with translated values, or if you need different language versions of the same object on a page (for which you need to make the load function language-aware, which requires the original objects to be language-aware as well).

All in all, I don't think it would be possible to escape assigning language to objects and defining metainformation about the objects to support building translation user interfaces and workflows as well as handling permissions properly. Neither using t(), nor just tagging your queries and then assuming someone else will take care of it for you, cuts it.

So what can aspiring contributed module developers do to multilingual-enable their objects? I'd like to do a rundown of the current i18n_string module API in the next post, from where I hope we can brainstorm on how to simplify and structure it further, and then form Drupal 8 plans either based on that or some other solution to the above detailed problems / goals.

This was cross-posted on http://hojtsy.hu/blog/2011-may-19/drupals-multilingual-problem-why-t-wro... to get more attention. Commenting is only open here.

Comments

Way to go for the moment?

Posted by Jerome F on May 19, 2011 at 10:41am

Thank you I learned a lot there.
Does that mean that there is no available solution for a multilingual drupal website in the meantime once t() functions are removed?

i18n module

Posted by Gábor Hojtsy on May 19, 2011 at 10:50am

The i18n module has submodules to work around core limitations. There is an i18n_field module for this specific case, which was in limbo due to how core translated some things in error. Now that it is not going to translate them, we can fix the module and make it translate objects properly. Unfortunately not all occasions of the use of field labels, descriptions, etc. will be translatable because the code does not mark them in any way (why would it?). So once again, the i18n module will fill in this gap.

I think this info should be in release notes

Posted by martin jinoch on May 30, 2011 at 1:20pm

I think we should be more careful about breaking existing functionality without telling users about it. There is no info in the 7.2 release notes (unless you study each patch's code or the related issue comments) that labels of default fields are no longer translated (even if you had them translated in 7.0). It is good to have an explanation like this article, but still I'd prefer to know it BEFORE I update my site to Drupal 7.2.

i18n_field module in today's dev version doesn't work for field labels

It's one of my biggest

Posted by gagarine on May 19, 2011 at 12:20pm

It's one of my biggest frustrations with D7 and I'm glad you opened the subject. As it stands, you can't build a multi-language website properly. As a Swiss citizen, where all websites are in at least 3 languages, and now with more and more European websites that want English and the national language... it's a deal breaker.

I'm going to follow the conversation with attention.

Excellent note. I have been

Posted by awm on May 19, 2011 at 2:46pm

Excellent note. I have been working on these issues and documenting workarounds and other solutions. There are ways to get things to work, but are they intuitive? I don't think so. Perhaps the problem here is architectural and addressing it would mean reevaluating the current Drupal architecture. I have had a scenario for a while which I find somewhat hard (not impossible) to implement; it goes as follows:
- Enterprise website that exists in several languages.
- Each language has a team of several people to maintain the website (integrity and consistency issues).
- Work is simultaneous and may or may not be in sync (the workflow problem you mentioned arises here).
- Permissions: each language team is only allowed to edit nodes in that language and view + add translations of other nodes in other languages. This can particularly be a problem with the current menu system.
- Media sharing/not sharing
...And others. This is just an example where Drupal + contrib modules do not provide the solution but rather, as Jose says, a flexible platform to build upon. I think it may be easier to work on a multilingual distribution of Drupal rather than waiting for the new releases and hoping they will provide the solution we need.

both

Posted by Gábor Hojtsy on May 19, 2011 at 2:56pm

I think we should work on both a Drupal distribution / better solutions for Drupal 7 and much better core support in Drupal 8. These should cross-validate each other (ie. hopefully closely resemble each other so the migration path is simple, so that our efforts are not scattered, etc). Right now i18n solves some of the 10 (10th is rendering) problems to some degree. I'd like to post about the approaches taken there, and hopefully get some brains on how to improve it in Drupal 7 and eventually make out a systemic solution for Drupal 8.

You are right that to solve this properly, we need systemic changes. However, we can get to better results with i18n too if we give it more care (that is, if more people contribute to improve it). Last week we were sprinting in Berlin to improve it and it really showed that when people get together to contribute, we can make huge improvements. Some of those were in the 10 areas discussed here, especially around solving UX problems.

I hope you are working on Drupal 7 sites now so you can enjoy all the improvements. :) Please keep providing feedback. It would be great if you could share your notes and suggestions for improvement (no promises your bug reports will be solved right away :).

source string change

Posted by adub on May 19, 2011 at 11:47pm

re: the source string change problem, I'm not sure that throwing gettext out in favour of java-style string ids is necessarily a clear-cut solution. It's pretty widely used outside Drupal (I think most open source projects favour it) - do we know how other projects approach this?

internal storage vs. external transport

Posted by Gábor Hojtsy on May 20, 2011 at 7:17am

Well, Drupal currently only uses gettext formats for transport, internal storage can be anything custom (and it is to some degree). Translations are used from the database, because we can query that much more efficiently. So as long as we can identify strings properly from the gettext files, there is no problem with using them for transport IMHO. The i18n_string module now uses context (a gettext feature) to identify concrete strings.

Me agrees. Keyword-based

Posted by donquixote on May 20, 2011 at 12:20am

Me agrees.
Keyword-based translation system
(forget the uploaded code in there..)

whenever we save an object, we should know and store what language did we save the object in

The way out would be to allow "language-neutral" objects to be translated into all available languages. The original object would then be the master and fallback.

The implication is that anything we forget to tag "English" will show up as "English translation missing", even if the fallback is perfectly sufficient. That's the price.
So, we would in fact be very close to a keyword-based system, except that the keyword itself would be quite sufficient as a fallback string.

Ok, I am mixing up two things here, by intention. One is the translation of object values, the other the translation of unique UI strings..

keyword based translation

Posted by Gábor Hojtsy on May 20, 2011 at 7:23am

Well, the object values specifically need identifiers (keywords) to base their translation system on, while the regular user interface translation could also benefit from that. Since your post on that issue, Drupal 7 introduced string contexts, which solved one of the three problems you've mentioned. Not the other two.

As for using language neutral as a fallback, while this might sound like a clever hack, I think it is not a good long term idea. We should know the language of the object as it is, for various reasons. Be it performance (we don't need to load its translations if we need the same language), migration, or rolling back to a single language website in one language (where we need all objects to use the same language). Having objects in mixed languages on a site is confusing enough. If we don't even know which language each of them is in, that can be a big pain down the road IMHO.

language neutral content

Posted by bwinett on May 20, 2011 at 7:30pm

I would love to see language neutral as a fallback. Here is the scenario it would solve (at least in D6, don't know if D7 has the same issue):

Multi-domain configuration.
U.S. site has English as the default language.
France site has French as the default language.
Spain site has Spanish as the default language.

We create a language neutral node on the U.S. site, and we allow that node to be displayed on the French site and the Spanish site. No problem.

Now a French translator decides to translate that node into French. To do this, s/he must first change the node's language to English. Then s/he can create the French translation. Problem: the node is no longer visible on the Spanish site to someone looking at Spanish language content.

If language-neutral nodes were allowed as a fallback (allowing language-neutral nodes to be translated), nodes such as these would still be visible in these cases. The logic could be: display the node in the visitor's selected language. If it doesn't exist, display the node in the site's default language. If it doesn't exist, display the language neutral version of the node.
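
The fallback order proposed here can be sketched as follows. This only illustrates the proposal, not something core or i18n does; the function name is hypothetical, and using 'und' as the language-neutral code is an assumption.

```php
<?php
// Hypothetical helper illustrating the proposed order: visitor's language,
// then the site default, then the language-neutral version.
function select_node_version(array $versions, $visitor_langcode, $default_langcode, $neutral = 'und') {
  foreach (array($visitor_langcode, $default_langcode, $neutral) as $langcode) {
    if (isset($versions[$langcode])) {
      return $versions[$langcode];
    }
  }
  // No usable version at all.
  return NULL;
}

// The original stays language neutral; only a French translation was added.
$versions = array(
  'und' => 'Original (language neutral)',
  'fr' => 'Traduction française',
);

// French site, French visitor: gets the translation.
$for_french = select_node_version($versions, 'fr', 'fr');
// Spanish site, Spanish visitor: still gets the neutral original.
$for_spanish = select_node_version($versions, 'es', 'es');
```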

views

Posted by Gábor Hojtsy on May 21, 2011 at 4:19pm

This all sounds like something your database queries (Views) would handle for you.

Views?

Posted by bwinett on May 23, 2011 at 5:18pm

I don't understand. This has nothing to do with views. It's just the display of nodes.

Views

Posted by Gábor Hojtsy on May 24, 2011 at 7:37am

Well, you wrote "the node is no longer visible on the Spanish site to someone looking at Spanish language content.". Drupal core itself does not have node visibility or listing limitations per language, so if you managed to build a system where a node disappears from the Spanish language listing if it's in a different language (or where a Spanish language listing even exists based on the node's language field), you certainly did not use core only. You either used custom code or Views. In Views, you can set the language criteria with fallback; in your custom code, you should write your queries like that.

more on our use case

Posted by bwinett on May 24, 2011 at 4:09pm

Thanks for this added info. Based on your previous postings, I know that you have thought more about Drupal's multilingual capabilities than I have, so I must not be communicating well.

If I go to admin/settings/language/i18n, I have the option of specifying the content selection mode. I have selected "Mixed current language (if available) or default language (if not) and language neutral." So back to my scenario:

We create a language neutral node on the U.S. site, and we allow that node to be displayed on the French site and the Spanish site. Given the content selection mode we have selected, the node will be displayed on the U.S., French, and Spanish sites, regardless of the visitor's selected language (because it is language neutral).

Next, in order to create a translation of the node, the French translator first changes the node's language to English. Because of this, a visitor to our Spanish site (which has a default language of Spanish) will no longer see the node, if that visitor's selected language is Spanish. Looking at the content selection mode, we see that the reason for this is that the node is not in the current language (Spanish), it is not in the default language (Spanish), and it is not language neutral (our French translator has changed it to English).

Now IF Drupal were to allow our French author to create a French translation of the node WITHOUT having to first change the node from language-neutral to English, this issue would not occur. Our original node could remain language-neutral. And our visitor to the Spanish site, viewing the site in Spanish, would still see the original node.

Maybe I'm not being clear about the visitor "not seeing" the node. I'm talking about search results and clicks on links to the node. Oh, and after testing just now, I see it's important to state that we are using pathauto (if I link to the original node using "/node/", the Spanish visitor can get to the node; but if I link to the original node using its alias, the Spanish visitor sees a "page not found" message).

Is this any clearer?

the idea of language neutral

Posted by Gábor Hojtsy on May 25, 2011 at 7:09am

The idea of language neutral is that conceptually, you cannot say it is in any particular language. An image without text attached is an example. Using language neutral to trick the i18n language selection system into listing nodes in multiple languages is just that, a trick, a hack. If you need a listing of nodes in multiple languages, please use Views, which is perfectly capable of this. Re-implementing more of Views in independent modules is definitely not a goal when we have a widely used module like that to do the job.

thanks

Posted by bwinett on May 25, 2011 at 6:37pm

Thank you for your input. Given that I've already stated that my comments have nothing to do with displaying node listings, discussion of views has no relevance.

I disagree with your opinion that using language neutral is a trick or hack. The "language neutral" concept is used for more than just images without text, as evidenced by the fact that i18n allows you to fall back to language neutral if the node isn't available in the current language or default language.

I will have to test the Active Translation module. And if it works, hope it - or something similar - is available in D7 when we move to that version.

evidence

Posted by Gábor Hojtsy on May 27, 2011 at 6:50am

Not sure what it is evidence for. Imagine a multilingual blog, where you also post photos without comments. In the Spanish version of the blog, you'd want to list your language independent posts (your photos without annotation), just like in your French blog. But otherwise Spanish and French blogs would be limited to their respective languages. That is what language independent nodes and the listing options are designed for. If you need to list multiple language posts, i18n module does not have features for that because Views has.

Structural directions

Posted by donquixote on May 20, 2011 at 12:35am

There are many ways to structure a translation system, and we can use more than one solution next to each other.

I think a quite generic problem description is "objects with multilingual values".
These objects can be nodes, entities, field instances, or just plain "table rows".

There are three ways to give these things multilingual fields:
1) Give each row a language, and copy all shared values (and keep them synchronized).
2) Introduce a separate table, for the language-specific values.
3) Re-use a general-purpose associative translation table, with (simplified) columns for "language", "associated_id" and "associated_table" and "value". If necessary, we could have separate tables for different combinations of value type and primary key type.

I think (2) is generally the preferable solution, although (3) can be nice if we want to build a generic translation interface.

Solution (2) can be a lot of work in hook_schema, and has the risk of reinventing the wheel each time we introduce a new language-sensitive object type.
This is why I would suggest an automatic mechanism in hook_schema, that will automatically define the additional tables. Just mark some fields as language-specific, and this mechanic will give you the table. This will not just reduce the amount of typing required, it also gives us consistent tables and column naming schemes.
On the other side, we could have some automation for table joins and value retrieval..
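
A rough sketch of what such an automatic mechanism might look like. The `build_translation_table()` helper and the `'translatable'` field flag are hypothetical and not part of Drupal's Schema API; the array layout just follows the usual hook_schema() shape.

```php
<?php
// Hypothetical helper: derive the language-specific table for option (2)
// from fields flagged 'translatable' in a hook_schema()-style definition.
function build_translation_table(array $table) {
  $translation = array('fields' => array(), 'primary key' => $table['primary key']);
  // Copy the primary key columns so rows can be joined back to the original.
  foreach ($table['primary key'] as $key) {
    $translation['fields'][$key] = $table['fields'][$key];
  }
  // The language becomes part of the primary key of the translation table.
  $translation['fields']['language'] = array('type' => 'varchar', 'length' => 12, 'not null' => TRUE);
  $translation['primary key'][] = 'language';
  // Move over only the fields marked as language-specific.
  foreach ($table['fields'] as $name => $field) {
    if (!empty($field['translatable'])) {
      unset($field['translatable']);
      $translation['fields'][$name] = $field;
    }
  }
  return $translation;
}

// Example table; the table and column names are made up for illustration.
$schema = array();
$schema['menu_item'] = array(
  'fields' => array(
    'id' => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE),
    'weight' => array('type' => 'int', 'not null' => TRUE, 'default' => 0),
    'title' => array('type' => 'varchar', 'length' => 255, 'translatable' => TRUE),
  ),
  'primary key' => array('id'),
);
$schema['menu_item_translation'] = build_translation_table($schema['menu_item']);
```

The generated table gets the columns id, language and title, with (id, language) as its primary key, which gives the consistent naming mentioned above.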

good thinking

Posted by Gábor Hojtsy on May 20, 2011 at 7:48am

Right. I think (unfortunately) depending on project, people either need a big translation table (or even a generic interface to export translatables altogether) or per-object translation user interfaces. We've seen examples of both. The Translation management module for example feeds all translatable sources to one queue to interface with translation services and just generally make sure you have an overview on your site translations specifically. The node translation solutions (both in-core translate.module and in-contrib entity_translation.module) do per-object translation because that is what makes most sense there.

However, choosing either storage mechanism should not stop us from implementing the other UX. So if we have separate language specific value tables, we can still implement hooks to feed generic UIs. If we have a common table, we can still implement helper functions to generate per-object UIs. I think these are cross-implementable.

Now solution (2) is what was tried with DDT in Drupal 7 with most core queries tagged with the 'translatable' tag, so contributed modules can implement the actual schema multiplication and query altering. There are however a couple of problems with that approach in that query altering happened in the loading phase, which as discussed above is a problem if the loaders themselves are not language aware (which was the whole point of injecting the language altering from the side). This causes problems for re-saving objects and when we need multiple language versions of an object on a page. The database level translation got stuck on stumbling blocks where translatables were stored in serialized values at some places (like field API): http://drupal.org/node/549698#comment-2244044. This can definitely be solved in Drupal 8, just saying that it requires critical thinking about the original object data storage.

However, we could still use any database storage and just load them later, unlike DDT's original idea. That would not let us use much of the schema and query tagging that was provided since we have processed objects at that point, which might or might not resemble the original schema structure.

Finally, there is also a huge push for Drupal not to rely on relational databases. Whole Drupal websites were built with little use of relational databases (see examiner.com). Now if we make our translation system rely on and support only database level translation, we lock ourselves out of other data storage. The field system itself for example is abstracted out so that any kind of data storage can be used and the translation system there (in Drupal 7 core) will still work fine. Again, that is not the most performant solution, but given directions in the Drupal data storage layers, thinking on that level might be problematic. (Eg. Drupal's default translation system could be implemented using database layer tricks but needs to be pluggable per object or something like that for different storage mechanisms IMHO).

if the loaders themselves are

Posted by donquixote on May 20, 2011 at 12:11pm

if the loaders themselves are not language aware

I think they should not.
The language(s) should be part of the arguments of the "fetchMeSomething" method. As if they were part of the primary key.
Maybe locally we could use a wrapper layer around the $somethingFetcher, that would be a "new SomethingInLanguageFetcher($fetcher, $lang)".

Sorry I cannot parse this

Posted by Gábor Hojtsy on May 20, 2011 at 12:19pm

Sorry I cannot parse this. If the language is part of the primary key for an object, then the loader should in fact be well aware of the language. I think some pseudo-code examples would help here to understand what you mean, because I'm afraid I'm not on the right path to understand you.

Ok, sorry. The idea is we

Posted by donquixote on May 20, 2011 at 12:44pm

Ok, sorry. The idea is we have:

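<?php
class SomethingFetcher {
  // The language is passed in on every call, as if it were part of the key.
  public function fetchSomething($id, $language) {
    // Load and return the object $id in $language here.
  }
}
?>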

as opposed to

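<?php
class SomethingFetcher {
  // The language is fixed once, when the fetcher is constructed.
  protected $_language;
  public function __construct($language) {
    $this->_language = $language;
  }
  public function fetchSomething($id) {
    $lang = $this->_language;
    // Load and return the object $id in $lang here.
  }
}
?>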

Then on top of that there

Posted by donquixote on May 20, 2011 at 12:48pm

Then on top of that there could be a wrapper,

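<?php
class SomethingInLanguageFetcher {
  protected $_fetcher;
  protected $_language;
  function __construct($fetcher, $language) {
    $this->_fetcher = $fetcher;
    $this->_language = $language;
  }
  function fetchSomething($id) {
    // Delegate to the wrapped fetcher with the fixed language.
    return $this->_fetcher->fetchSomething($id, $this->_language);
  }
}
?>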

But in general, components should be provided with the unwrapped fetcher.

well, if SomethingFetcher is an object loader...

Posted by Gábor Hojtsy on May 20, 2011 at 12:50pm

Well, if SomethingFetcher is an object loader then we are talking about the same thing.

running in circles

Posted by Gábor Hojtsy on May 20, 2011 at 9:25am

Thought it would be great to link in previous discussions:

A very similar post from me from 5 years ago :) String translation: why using t() for user specified text is evil?
Object translation overview
Object translation option #1: locale system, optimization strategies (which was a precursor to i18n_string's implementation)
Object translation option #3: 'translatable' schema field attribute, parallel tables (which was a precursor to above mentioned DDT efforts)

This is a great summary of

Posted by Jose Reyero on May 28, 2011 at 11:01pm

This is a great summary of most of our problems with string translation, Gábor.

Just to add my two cents, when you say 'Translation is a rendering operation', I'd go one step further to say 'Naming is a rendering operation'.

What I mean is 'naming things' is the issue, which is not yet 'translation'. Translation is when you have a name in a language and you need to display it in a different language.

Which takes us to the question: why are we loading all names/text/labels for things when we don't yet know whether we are going to display them? Basically we are moving a ton of data around that we may not need later, because most of the stuff we load is not going to be printed in the end.

In my ideal world we'd have objects loaded, tags, whatever, but without labels or names, just object ids and meaningful data. Then application logic applied to it, then we get to the 'page' data. Say we need for this page 'title label', 'body label', 'article help text' and the 'footer message', and 'term 1 name'. It is at this stage and not before when we should worry about loading our texts, and then it may be a single query:

SELECT text FROM big_strings_table WHERE textid IN ('title label', 'article help text'....) AND language = 'xx'.

Well, maybe two queries, one for current language and one more for the fallback language in case we don't have all the translations, but just in that case.

This would be the performance and memory saving killer feature to have everything not translated but just displayed in the right language. Also for single language sites, we wouldn't be loading and moving data around we are not going to need later.
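A rough sketch of this "collect the text ids first, load them in one go" idea could look like the following. It is only an illustration: a plain array stands in for the big strings table, and the function names strings_load_multiple() and strings_resolve() are hypothetical, not existing Drupal APIs.

```php
<?php
// Stand-in for the strings table: textid => language => text (sample data).
$big_strings_table = array(
  'title label' => array('en' => 'Title', 'hu' => 'Cím'),
  'body label' => array('en' => 'Body', 'hu' => 'Tartalom'),
  'footer message' => array('en' => 'Thanks for visiting'),
);

/**
 * Simulates one query: all given text ids in a single language.
 */
function strings_load_multiple(array $table, array $textids, $language) {
  $found = array();
  foreach ($textids as $textid) {
    if (isset($table[$textid][$language])) {
      $found[$textid] = $table[$textid][$language];
    }
  }
  return $found;
}

/**
 * Resolves every text id a page needs: one lookup for the current language,
 * and a second one in the fallback language only for ids still missing.
 */
function strings_resolve(array $table, array $textids, $language, $fallback) {
  $texts = strings_load_multiple($table, $textids, $language);
  $missing = array_diff($textids, array_keys($texts));
  if ($missing && $fallback !== $language) {
    $texts += strings_load_multiple($table, $missing, $fallback);
  }
  return $texts;
}

// The ids are only known once the page has been assembled.
$page_textids = array('title label', 'body label', 'footer message');
$texts = strings_resolve($big_strings_table, $page_textids, 'hu', 'en');
```

Here 'footer message' has no Hungarian text, so it is the only id that triggers the second (fallback) lookup.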

I like this! One thing

Posted by donquixote on May 28, 2011 at 11:33pm

I like this!
One thing though: We might not have everything in the same table. So, we would need more than one query.
Still, this is definitely a good idea. Ideally, we do this in drupal_render().

When building the nested array for drupal_render(), we nowadays do something like

<?php
  $element['#text'] = $entity->field_body[0]['rendered'];
?>

In future, we would do this instead (pseudo-logic):

<?php
  $element['#text'] = array(
    'source' => 'entity_field_type_text',
    'args' => array('field_body', $entity->id),
    'process' => array('ucfirst', 'trim'),
  );
?>

drupal_render() would collect all those pieces of text and sort them by the 'source' parameter.
For each 'source' it would obtain a loader, initialized with the current language and the fallback language. The loader can fetch a bunch of these strings at once. This loader might look into the database, or into a cache, or whatever. The query could be a single table lookup, or a SELECT with JOIN, etc. The loader would also take care of the language fallback logic.

This also would mean that one and the same render array can be rendered in different languages.
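To make the collecting step more concrete, here is a small sketch under heavy assumptions: sketch_render() is a flat stand-in and not the real drupal_render(), ArrayTextLoader is a hypothetical loader backed by an array, and the string ids are simply the 'args' joined with a colon.

```php
<?php
/**
 * Hypothetical loader for one 'source': fetches many strings at once and
 * handles the language fallback itself.
 */
class ArrayTextLoader {
  protected $_strings;
  protected $_language;
  protected $_fallback;

  public function __construct(array $strings, $language, $fallback) {
    $this->_strings = $strings;
    $this->_language = $language;
    $this->_fallback = $fallback;
  }

  // $requests is element key => args; returns element key => text.
  public function loadMultiple(array $requests) {
    $texts = array();
    foreach ($requests as $key => $args) {
      $id = implode(':', $args);
      foreach (array($this->_language, $this->_fallback) as $lang) {
        if (isset($this->_strings[$id][$lang])) {
          $texts[$key] = $this->_strings[$id][$lang];
          break;
        }
      }
    }
    return $texts;
  }
}

/**
 * Renders a flat list of elements, one line per element.
 */
function sketch_render(array $elements, array $loaders) {
  // Pass 1: group the deferred texts by their 'source'.
  $requests = array();
  foreach ($elements as $key => $element) {
    if (is_array($element['#text'])) {
      $requests[$element['#text']['source']][$key] = $element['#text']['args'];
    }
  }
  // Pass 2: one batch load per source.
  $texts = array();
  foreach ($requests as $source => $batch) {
    $texts += $loaders[$source]->loadMultiple($batch);
  }
  // Pass 3: apply the 'process' callbacks and build the output.
  $output = '';
  foreach ($elements as $key => $element) {
    $text = $element['#text'];
    if (is_array($text)) {
      $value = isset($texts[$key]) ? $texts[$key] : '';
      $callbacks = isset($text['process']) ? $text['process'] : array();
      foreach ($callbacks as $callback) {
        $value = call_user_func($callback, $value);
      }
      $text = $value;
    }
    $output .= $text . "\n";
  }
  return $output;
}

// Sample data, for illustration only.
$strings = array(
  'field_body:7' => array('en' => 'hello world ', 'hu' => 'szia világ '),
  'field_body:8' => array('en' => 'english only'),
);
$deferred = array('source' => 'entity_field_type_text', 'process' => array('ucfirst', 'trim'));
$elements = array(
  'a' => array('#text' => $deferred + array('args' => array('field_body', 7))),
  'b' => array('#text' => $deferred + array('args' => array('field_body', 8))),
  'c' => array('#text' => 'Plain string'),
);

$hu = sketch_render($elements, array('entity_field_type_text' => new ArrayTextLoader($strings, 'hu', 'en')));
$en = sketch_render($elements, array('entity_field_type_text' => new ArrayTextLoader($strings, 'en', 'en')));
```

The $elements array is built once and rendered twice; only the loader passed in knows about the language.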

The idea here is to have most

Posted by Jose Reyero on May 29, 2011 at 1:29pm

The idea here is to have most of the strings in one single table. All the 'entity' needs to provide are 'string ids' for its labels and descriptions.

However, we could have something like current textgroups, which would allow different storages for texts.

listing, sorting, searching

Posted by Gábor Hojtsy on May 29, 2011 at 9:07am

I like your idea for the display of objects we already know, it is very appealing. How do you solve issues with sorting and searching, if you need to sort the given objects by labels in a certain language or searching in a certain language? (Note these are not yet solved either as far as I know, so I don't mean your solution makes them break...).

If this sorting happens for

Posted by donquixote on May 29, 2011 at 12:15pm

If this sorting happens for objects that are displayed anyway, then we can sort them in the rendering layer.

But if we have paging, then "translation is rendering" no longer applies. Unless we do a lazy fetching of lists... maybe a good idea? The rendering object would have a list loader object that is only waiting for a language.

I'd say we don't have to decide this now for all use cases, we simply provide the mechanisms.

Sorting at the display level

Posted by Jose Reyero on May 29, 2011 at 1:38pm

Sorting at the display level would be done by drupal_render(), similar to element's weights.
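As a small illustration of display-level sorting, the sketch below orders a handful of already loaded object ids by their label in the current language, falling back to another language where a label is missing. The function name sort_ids_by_label() and the sample terms are hypothetical, and the comparison is a plain byte-wise string sort; a real implementation would probably need language-aware collation.

```php
<?php
// Sample labels: term id => language => label (illustration only).
$labels = array(
  10 => array('en' => 'Window', 'hu' => 'Ablak'),
  11 => array('en' => 'Apple', 'hu' => 'Alma'),
  12 => array('en' => 'Banana'),
);

/**
 * Returns the given ids ordered by label in $language, using $fallback
 * for ids that have no label in $language.
 */
function sort_ids_by_label(array $ids, array $labels, $language, $fallback) {
  $resolved = array();
  foreach ($ids as $id) {
    if (isset($labels[$id][$language])) {
      $resolved[$id] = $labels[$id][$language];
    }
    elseif (isset($labels[$id][$fallback])) {
      $resolved[$id] = $labels[$id][$fallback];
    }
    else {
      $resolved[$id] = '';
    }
  }
  // asort() sorts by label while keeping the id => label association.
  asort($resolved, SORT_STRING);
  return array_keys($resolved);
}

$order_hu = sort_ids_by_label(array(10, 11, 12), $labels, 'hu', 'en');
$order_en = sort_ids_by_label(array(10, 11, 12), $labels, 'en', 'en');
```

This only works when the full set of objects is already in memory; with paging, the ordering has to happen in storage, as discussed in this thread.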

However, if we want to sort in the database, like for selecting the first 20 terms by label or description, then we would need some joining fields. We can store along with each string its object type, id, etc., just like i18n_string does at the moment (see the i18n_string table).

So i18n_string is not really the way to go here, as it stores 'source' and 'translation' in different tables, but it could be some inspiration.

I think we could try out the concept by creating some 'string' module for D7 that provides generic key-string storage. It would be a proof of concept for D8 strings and maybe a better storage to be used by i18n_string in the meantime.
