Options for making objects translatable in core


Currently in Drupal core we have no general solution for making fields translatable.

There are several emerging options. Here are some. If you know of others, please add them in, and also add more information on those already listed.

Choosing between these approaches is the key challenge as we implement translation of user-defined strings in Drupal 7.

  1. Extend the locale system to support translatable objects
    • Summary: This option would extend the
    • Issue: http://drupal.org/node/141461.
    • Pros
      • Existing UI and storage.
      • Consistency with other string translation.
      • Built-in support for .po import/export.
      • Existing patch.
    • Cons
      • Requires separate handling for each field. E.g., each string is translated separately, out of context, and in the code a separate call is required each time a field is to be translated, inserted, updated, or deleted.
      • Records are fragmented among many rows, making it costly to load translations.
  2. Follow the node/tnid approach for other object types (users, etc.)
    • Summary: Currently the node table has a language field and a tnid field for linking nodes as part of a translation set. This approach would do the same for other objects. E.g., taxonomy terms would get language and ttid fields, so taxonomy/term/21 might be a French version of taxonomy/term/20 (English).
    • Issue: none.
    • Pros
      • Consistency with nodes.
    • Cons
      • The node approach of having a separate object per language has raised many challenges and difficulties--dealing with multiple paths, handling changes in tnid, determining between tnid and nid as primary object identifier, etc. These would multiply if all our other objects that need translation followed the same approach. Multiple records per user, one per language, each at a different address?
  3. Parallel tables, automated locale handling based on schema
    • Summary: In this approach, fields that are translatable would get a 'translatable' => TRUE attribute. The locale system would scan tables for translatable fields and create parallel tables with all primary keys, translatable fields, and a language field. Automated handling of translation on CRUD operations.
    • Issue: http://drupal.org/node/367603.
  4. Fields API includes built-in support for translation
    • Summary: In this approach, the new Fields API would include built-in support for translation. To make a new or existing field translatable, one would simply convert it to use the fields API.
    • Issue: http://drupal.org/node/367595.
    • Pros
      • Built-in support would mean no need for custom handling of each translatable field.
      • Greater incentive for use of the new Fields API.
      • Translations would be loaded on initial object load, rather than waiting until later and loading each translation as a separate override.
    • Cons
      • Wouldn't handle data not converted to the Fields API.
  5. Don't solve this in core, but introduce a hook to allow contrib to solve.
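
To make option 3 more concrete, here is a minimal sketch in Python (deliberately not Drupal code) of how a schema scan could derive a parallel translation table. The `translatable` attribute comes from the proposal itself. The schema layout, the `_translation` table suffix, the `language` column definition, and the function name `build_translation_schema` are all assumptions made for illustration.

```python
from copy import deepcopy

# Hypothetical schema, shaped loosely like a Drupal schema definition.
SCHEMA = {
    "term_data": {
        "fields": {
            "tid": {"type": "serial"},
            "vid": {"type": "int"},
            "name": {"type": "varchar", "length": 255, "translatable": True},
            "description": {"type": "text", "translatable": True},
            "weight": {"type": "int"},
        },
        "primary key": ["tid"],
    },
    "role": {
        "fields": {"rid": {"type": "serial"}, "name": {"type": "varchar"}},
        "primary key": ["rid"],
    },
}


def build_translation_schema(schema):
    """Return parallel table definitions for tables with translatable fields."""
    parallel = {}
    for table, spec in schema.items():
        translatable = [
            name for name, field in spec["fields"].items()
            if field.get("translatable")
        ]
        if not translatable:
            continue  # nothing to translate in this table
        fields = {}
        for key in spec["primary key"]:
            field = deepcopy(spec["fields"][key])
            # A serial key in the base table is a plain int in the parallel table.
            if field["type"] == "serial":
                field["type"] = "int"
            fields[key] = field
        for name in translatable:
            field = deepcopy(spec["fields"][name])
            field.pop("translatable")
            fields[name] = field
        # Assumed definition of the extra language column.
        fields["language"] = {"type": "varchar", "length": 12}
        parallel[table + "_translation"] = {
            "fields": fields,
            "primary key": spec["primary key"] + ["language"],
        }
    return parallel


TRANSLATION_SCHEMA = build_translation_schema(SCHEMA)
```

In this sketch only `term_data` gets a parallel table, keyed on the original primary key plus the language, which is what would let CRUD handling be automated per table rather than per field.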

I read an email saying this sprint should happen over the next two weeks. That's probably a bad idea considering the resources necessary to complete the Drupal.org redesign. --David Strauss on 1 Feb 2009


I like 4

catch

And just updated that issue to say so. Since nodes already have their own home-grown translation methods, the primary initial use for translatable fields would be for users, taxonomy terms, etc. These objects have far fewer modules taking advantage of their APIs, so I think it's a smaller set of functionality which would need converting to fields for this to work. Node translation is likely to continue to use the 'node-per-translation' method for a while, especially on existing multilingual sites, so there's much less urgency there in having a total solution.

One thing which occurred to me just now and didn't make it into the followup is: what about direct properties of these objects themselves? Obvious things would be $user->name (except I can think of very few cases where it'd be viable to translate user names, so we can forget that), but $term->name is a lot trickier. A possible option would be a contrib module like auto_node_title that replaces the title with a field. Since this has applications other than just on multilingual sites, it might well work. I have a feeling that title/name as a field will be a much stickier discussion to have, though.

Per language vs translatable

Jose Reyero

For all objects in general I see these two cases.
- Per language: You have different objects / items for each language. The obvious solution here is to add a language field to whatever (terms, menu items, blocks...) that will later allow selecting the right items depending on language.
- Translatable: You create the objects in the default language (or in any other) and want to have exactly the same objects showing up with some fields translated for each language (name, description)

The key differences:
- Having objects per language allows you to have different set of objects for each language. I.e. a site running different news for each language, or having multiple support forums, one for each language.
- Having translatable objects makes the assumption that the site is the same for all languages, just the (same) stuff needs to show up in different languages properly translated

When it comes to building a site, I see all the time the need for both options:
- For some objects like users, whatever the site is, it doesn't really make sense to have different sets of users for each language, though you may want to have some user fields translated ('hobbies', 'occupation', etc.).
- For content the needs vary greatly from the site on which the content is the same in all languages and is not published till you have it fully translated, to the one which has completely unrelated content in different languages
- Others like menus, menu items and blocks may be easier to handle just setting a language to them. They're just stuff that needs to show up or not depending on language, are navigation aids and usually point to some other object/s that may have a language of its own. (A menu item language may just be the language of the node/page/term/etc it points to)

Then comes taxonomy, which is a whole world of its own, and it may be treated as content or as UI navigational elements depending on how you are using it. If you have tagging enabled, you possibly want to create different terms that are tied to content in one language only. But when using predefined sets of terms for navigation, you may prefer that they're just the same for all languages but translatable (you may want to search filtering by one taxonomy term and get all content for that term in all languages). Moreover, some terms may be translatable while for others there's no one-to-one mapping between languages.

The point is there's no solution for all and it varies greatly depending on how you architect a specific site. Thus the solution from Drupal core IMHO should be to support but not to enforce any of the approaches.

In my ideal multilingual world it would go like this:
- You create a (anything: node, term, menu item, block)
- You either:
a) set a language for it
b) Make it language neutral
c) Make it translatable
- Then some flexible navigation system figures out for every page which stuff to show, and on which language (There are some options here too....)
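
As a rough illustration of the three choices above, the following Python sketch shows one way a navigation layer could decide what to show for a page in a given language. It is only a sketch: the `Mode` enum, the `Item` class, and `items_for_page` are hypothetical names, and falling back to the original title when a translation is missing is an assumption, not something settled in this discussion.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    LANGUAGE = "language"          # (a) tied to one language
    NEUTRAL = "neutral"            # (b) language neutral
    TRANSLATABLE = "translatable"  # (c) same object, translated fields


@dataclass
class Item:
    title: str
    mode: Mode
    language: str = ""  # only meaningful for Mode.LANGUAGE
    translations: dict = field(default_factory=dict)  # language -> title


def items_for_page(items, page_language):
    """Return the titles to display for a page in the given language."""
    shown = []
    for item in items:
        if item.mode is Mode.LANGUAGE:
            # Per-language items appear only on pages in their own language.
            if item.language == page_language:
                shown.append(item.title)
        elif item.mode is Mode.NEUTRAL:
            shown.append(item.title)
        else:
            # Translatable items always appear, translated when possible.
            shown.append(item.translations.get(page_language, item.title))
    return shown


ITEMS = [
    Item("Noticias", Mode.LANGUAGE, language="es"),
    Item("News", Mode.LANGUAGE, language="en"),
    Item("RSS", Mode.NEUTRAL),
    Item("Contact", Mode.TRANSLATABLE, translations={"es": "Contacto"}),
]
```

With these sample items, a Spanish page would list the Spanish-only item, the neutral item, and the translated "Contacto", while a French page would only see the neutral item and the untranslated "Contact".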

The problem: the overhead of the needed UI would be huge and the administration overhead for the site too (and possibly the performance too). So we need to simplify this a little bit.

I think adding a language field everywhere should be ok in principle, so we can support the first case (objects per language) without hacking the data model later, while localization is not necessarily tied to each object's table and can happen at a later stage (not when loading but when displaying the object)

There are two huge breakthroughs in D7 that will possibly make our life easier to build all these features:
- The powerful new DB layer. It will allow us to add language conditions at run time for most queries with no performance penalty for non-multilingual sites. The logic might go like: 'Does this table's schema have a language field? Are we doing language selection here? If yes, add the language field and a language condition.'
- The Fields API (which will hopefully be committed soon). This one should allow us to make any field translatable, or to build a generic mechanism that pulls in the translations for all the textual fields of a list of objects in a single DB hit. And maybe to have generic per-language fields.
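
The run-time logic described for the DB layer could look roughly like the Python sketch below. This is not the D7 database API: the `SCHEMA` dictionary, `add_language_condition`, and `build_select` are hypothetical stand-ins, and conditions are modelled as simple (column, value) pairs.

```python
# Hypothetical schema: only "node" carries a language column.
SCHEMA = {
    "node": {"fields": ["nid", "title", "status", "language"]},
    "users": {"fields": ["uid", "name", "status"]},
}


def add_language_condition(table, conditions, language=None):
    """Return conditions, plus a language filter when it applies.

    `conditions` is a list of (column, value) pairs; `language` is None when
    no language selection is active (e.g. on a monolingual site).
    """
    if language is None:
        return list(conditions)  # no selection in effect: query untouched
    if "language" not in SCHEMA[table]["fields"]:
        return list(conditions)  # table has no language field
    return list(conditions) + [("language", language)]


def build_select(table, conditions):
    """Render a parameterized SELECT from (column, value) pairs."""
    sql = "SELECT * FROM " + table
    if conditions:
        sql += " WHERE " + " AND ".join(col + " = ?" for col, _ in conditions)
    return sql, [value for _, value in conditions]
```

On a site without language selection the first check returns immediately, which is the "no performance penalty" property the bullet above is after.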

I think next I'll be building a matrix mapping the different types of objects we have to the options for making them multilingual / translatable; maybe I'll post that one as a different thread.

Just a remark

plach

I think the scenarios below are compatible if we adopt the translatable fields approach.

The key differences:
- Having objects per language allows you to have different set of objects for each language. I.e. a site running different news for each language, or having multiple support forums, one for each language.
- Having translatable objects makes the assumption that the site is the same for all languages, just the (same) stuff needs to show up in different languages properly translated

As I was pointing out in http://drupal.org/node/340355#comment-1220013, having for instance a "creation date" translatable field would allow us to treat an aggregation of translatable/untranslatable fields (a bundle) as a whole translation object (the first scenario).
If we have different news in English and, say, French, we just create one node in English and then another unrelated one in French: both nodes are field bundles with their own language id specified.
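
A small Python sketch may help show why the two scenarios can coexist under translatable fields. It assumes each field stores its values keyed by language; the `NEUTRAL` marker, the `view_bundle` function, and the optional fallback language are hypothetical choices for illustration, not part of any proposed API.

```python
NEUTRAL = "und"  # hypothetical marker for "no specific language"


def view_bundle(bundle, language, fallback=None):
    """Resolve each field of a bundle for one language.

    `bundle` maps field name -> {language: value}. A field that only has a
    NEUTRAL entry is untranslatable and is returned as-is.
    """
    resolved = {}
    for name, values in bundle.items():
        if language in values:
            resolved[name] = values[language]
        elif fallback is not None and fallback in values:
            resolved[name] = values[fallback]
        elif NEUTRAL in values:
            resolved[name] = values[NEUTRAL]
    return resolved


# "Translatable" scenario: one term, with its name translated per language.
TERM = {"name": {"en": "News", "fr": "Nouvelles"}, "weight": {NEUTRAL: 3}}

# "Per language" scenario: every field, creation date included, exists in
# English only, so the bundle behaves as a purely English object.
ENGLISH_NEWS = {
    "title": {"en": "Budget approved"},
    "created": {"en": "2009-02-01"},
}
```

Here `view_bundle(TERM, "fr")` yields the French name with the shared weight, while `view_bundle(ENGLISH_NEWS, "fr")` yields nothing at all, which is the per-language behaviour expressed with the same mechanism.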

Short implementation proposal for #1 (schema-based)

Jose Reyero

Use cases: content type definitions, menu items, ...

1. We mark in the schema which fields of the table are translatable ('localize' => TRUE)
2. We build on this one, using information from schema: http://drupal.org/node/141461
3. As they are limited groups of strings, we can cache all of them

- Generic support for user defined string translation in locale module, which in turn will take us to locale data API http://drupal.org/node/361597

So I think the locale data API is the one to really start with.

Having these few things translatable will make a lot of people happy, I hope :-)