Translatable fields implementation details


In the groundwork is being laid for multilingual support in the core fields api.

At a May 21, 2009 IRC meeting to plan translatable fields (see this summary), a consensus emerged that translatable fields should emulate the current Translation module behavior.

With translatable fields we are introducing a second way in which nodes (and other Drupal data objects) can be made translatable, alongside the existing (node-only) one.

The purpose of this wiki page is to begin to map out in detail just how field-based translation will work, taking as our basis the existing node-based translation. For each feature of node-based translation, will there be an equivalent in field-based translation? If so, what will that equivalent look like, and how will it be represented?

Most or all of this will go as follow-up patches to; but it's worth starting to map the work out now.

This page so far has a few preliminary notes and needs a lot of expansion. Please add and edit. What further aspects of the existing node-based translation will we need to emulate in field-based translation? What will we need to consider?

Also discussed (see below) is the question: can or should these two approaches to content translation coexist for a particular content type, or even on a given site?

Field-based vs. node-based translation

Each factor below lists the current node-based behaviour first, followed by the proposed field-based equivalent.

Data storage

Language information
Node-based: Each node has a node.language value, stored as a language code string.
Field-based: Each field value has a language field, stored as a language code string.

Source translation
Node-based: The first version of a node from which translations were made is the "source translation". This datum is stored in node.tnid: all members of a translation set share a node.tnid value, namely the nid of the source translation.
Field-based: On some sites the source translation is not an important datum; all translations are given equal weight, and translation can occur from any language to any other. In some cases, however, it is important to know which version is the original. For example, an original document may be the definitive one. Do we need a "source translation" concept with field-based translation? If so, where is it stored? One possibility is the node.language field.

Translation workflow

Translations tab
Node-based: Users with permission to create translations see a "Translations" tab on node view, listing languages with links to edit existing translations or create new ones.
Field-based: Probably we want the same tab and the same links.

Configuring a content type
Node-based: In the workflow fieldset of a content type's configuration form, site admins can select the desired translation behaviour.
Field-based: Probably we want the same options. How we present them, however, will depend on whether the two types of translation can coexist on a site. If not, the options we offer will be the same. If so, we need to introduce a distinction between node-based and field-based translation here.

Creating a new translation
Node-based: Clicking a link to create a new translation brings up the node/add/nodetype form. Language information is passed through URL query parameters giving the language to translate to and the node to translate from. The node to translate from is the node the user is currently viewing, not necessarily the source translation. For example, if the source translation is in English, and the user is viewing the French translation and clicks to create a Spanish one, the source values will come from the French translation. This detail is important, as a given translator may be able to translate from French to Spanish but not from English to Spanish. By the time the node add form is presented, hook_nodeapi_prepare_translation() has been called, and the form values are prepopulated with whatever implementing modules decide to put there. Translatable fields are generally prepopulated with strings in the language of the node being translated from (French, in our example).
Field-based: Clicking a link to create a new translation brings up the node edit form, possibly at a path like node/#/edit/es/fr, where es is the language being translated to and fr is the language being translated from. What about hook_nodeapi_prepare_translation()? Do we still need it? Probably we do, but it may need a new argument indicating which type of translation is enabled.

Marking translations as outdated
Node-based: When updating a node, a user can choose to mark the other members of the translation set as outdated. If this option is selected, node.translate is set to 1 for all other members of the translation set.
Field-based: Probably we want the same, but how to implement it is unclear. This datum will likely need to be stored at the field value level. Perhaps as another field value storage column alongside language?

Paths, links, and language switching

Translation links
Node-based: When viewing a node that has translations, links are provided to the existing translations of the node.
Field-based: Probably we want the same.

The language switcher block
Node-based: When viewing a node that has translations, the language switcher block's links are altered so that each leads to the member of the translation set in the given language. For example, when viewing an English node that has a Spanish translation, the block's Spanish link points to the Spanish translation.
Field-based: Probably we want the same, though at a path such as node/#/es for the Spanish version of the node.
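To make the storage difference concrete, here is a minimal Python model of both layouts. It is not Drupal code: all class and function names are hypothetical. The node-based rows mirror the node.language and node.tnid columns described above (with the source translation's tnid equal to its own nid), and the field-based links use the node/#/es path proposed for the language switcher, which is itself still only a proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# --- Node-based: one node row per language (mirrors node.language / node.tnid) ---
@dataclass
class Node:
    nid: int
    language: str  # node.language, e.g. "en"
    tnid: int      # nid of the source translation; 0 if the node is not translated


def translation_set(nodes: List[Node], node: Node) -> Dict[str, Node]:
    """Members of node's translation set, keyed by language code."""
    if not node.tnid:
        return {node.language: node}
    return {n.language: n for n in nodes if n.tnid == node.tnid}


def source_translation(nodes: List[Node], node: Node) -> Node:
    """The source translation is the member whose nid equals the shared tnid."""
    if not node.tnid:
        return node
    return next(n for n in nodes if n.nid == node.tnid)


def node_based_switcher(nodes: List[Node], node: Node) -> Dict[str, str]:
    """Each language link points at a different node in the set."""
    return {lang: f"node/{n.nid}" for lang, n in translation_set(nodes, node).items()}


# --- Field-based: one node; every field value carries its own language ---
@dataclass
class FieldValue:
    language: str
    value: str


@dataclass
class FieldNode:
    nid: int
    language: str  # one option discussed above: node.language marks the source language
    fields: Dict[str, List[FieldValue]] = field(default_factory=dict)


def field_based_switcher(node: FieldNode) -> Dict[str, str]:
    """Each language link points at the same node, e.g. node/1/es (proposed path)."""
    languages = sorted({v.language for values in node.fields.values() for v in values})
    return {lang: f"node/{node.nid}/{lang}" for lang in languages}
```

With an English source node 1 and French and Spanish translations 2 and 3, node_based_switcher returns three different node paths, while a single FieldNode holding the same three languages yields node/1/en, node/1/es and node/1/fr. Finding the source or a sibling translation requires a query over the translation set in the first model and none in the second.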

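The two open workflow questions in the comparison, prepopulating a new translation from the language being viewed and recording outdated translations per field value, can be sketched in a similar hypothetical Python model. The outdated flag stored next to each value is an assumption, a field-level analogue of node.translate. In this sketch only the values of fields that were actually edited get flagged, which is one possible design choice rather than a settled one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldValue:
    value: str
    outdated: bool = False  # hypothetical flag, the field-level analogue of node.translate


# Field name -> {language code -> value}; the language lives with each value.
Fields = Dict[str, Dict[str, FieldValue]]


def prepare_translation(fields: Fields, from_lang: str) -> Dict[str, str]:
    """Prepopulate the new translation's form from the language being viewed
    (from_lang), which is not necessarily the source translation."""
    return {name: by_lang[from_lang].value
            for name, by_lang in fields.items() if from_lang in by_lang}


def save_translation(fields: Fields, lang: str, values: Dict[str, str],
                     mark_others_outdated: bool = False) -> None:
    """Store values for one language; optionally flag the other languages of
    each edited field as outdated."""
    for name, value in values.items():
        by_lang = fields.setdefault(name, {})
        by_lang[lang] = FieldValue(value)  # a freshly saved value is never outdated
        if mark_others_outdated:
            for other_lang, field_value in by_lang.items():
                if other_lang != lang:
                    field_value.outdated = True


def outdated_languages(fields: Fields) -> List[str]:
    """Languages that have at least one outdated field value."""
    return sorted({lang for by_lang in fields.values()
                   for lang, fv in by_lang.items() if fv.outdated})
```

For example, with English and French values in place, prepare_translation(fields, "fr") returns the French strings to start a Spanish translation from. A later English edit saved with mark_others_outdated=True then flags the French and Spanish values of the edited fields only.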
Are node-based and field-based translation compatible?

Given that we're introducing a second approach to content translation, it will be important to map out the relationship between this and node-based translation. Can the two co-exist on a given site? If so, can they both be enabled for a given content type?

One possible precept would be: typical users shouldn't have to know or care what implementation is enabling multilingual content. They shouldn't have to ask themselves "should I use the field-based or node-based architecture to translate this piece of content?" They should just go ahead and translate, and whatever back end is in place should handle the translation needs, with the result being virtually identical in UI and presentation.

If we accept this precept, the implication would seem to be that, for a given content type at least, it should be one or the other. In practice this might mean extra options in the translation setting in the workflow fieldset of a content type's configuration.

But is there any reason for these two to co-exist on a site? That's less clear. If both are implemented in the Translation module, we could introduce a site-wide configuration option for the translation method (field-based vs. node-based).
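If each content type uses exactly one method, the precept above can hold: the "Add translation" link looks the same to the user whichever back end handles it. The following Python sketch assumes a hypothetical per-content-type setting and an optional site-wide restriction. The node-based path uses node/add/<type> with query parameters for the node to translate from and the target language (the parameter names translation and language are assumed here to match Drupal 6's translation module); the field-based path is the node/#/edit/es/fr pattern proposed above.

```python
from enum import Enum
from typing import Dict, List, Optional
from urllib.parse import urlencode


class TranslationMethod(Enum):
    NONE = "none"
    NODE = "node"    # existing node-based translation
    FIELD = "field"  # proposed field-based translation


# Hypothetical per-content-type setting chosen in the workflow fieldset.
Settings = Dict[str, TranslationMethod]


def add_translation_path(settings: Settings, node_type: str, nid: int,
                         from_lang: str, to_lang: str) -> Optional[str]:
    """Target of the "Add translation" link for the node being viewed (nid, in
    from_lang). The link text is identical for both methods; only the path differs."""
    method = settings.get(node_type, TranslationMethod.NONE)
    if method is TranslationMethod.NODE:
        # The viewed node is the one translated from, so from_lang is implicit.
        query = urlencode({"translation": nid, "language": to_lang})
        return f"node/add/{node_type.replace('_', '-')}?{query}"
    if method is TranslationMethod.FIELD:
        return f"node/{nid}/edit/{to_lang}/{from_lang}"
    return None  # translation is disabled for this content type


def conflicting_types(site_method: Optional[TranslationMethod],
                      settings: Settings) -> List[str]:
    """With a site-wide method chosen, list content types configured otherwise."""
    if site_method is None:
        return []
    return sorted(t for t, m in settings.items()
                  if m not in (TranslationMethod.NONE, site_method))
```

Keeping the choice per content type, rather than per node, is what lets the UI stay identical: the user clicks the same link, and only the content type's setting decides which back end receives the request.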


Node attachments are part of the problem

Comment by BordenR

One aspect which I think gets overlooked is the problem with node attachments. I built one node type to store PDF documents and another for photos. Using the node's native title and body areas, I could provide a summary of the PDF file or photo and then use templating to present that summary of the attached file along with a link to it.

That all worked fine and dandy until we had to translate the site into French. Node-based translation requires uploading the file a second time in order to get a "complete" node. This, in turn, creates all sorts of headaches: the site is bloated with redundant files, and updating one file means going to every translation to replace the same file in the other nodes.

Things get worse when you want to use a field from a node in some piece of PHP magic that needs to be translatable. It takes a fair bit of work (at least for me) to track down the node's sibling translation so I can pull the needed information from it.

I think that field-based translation would solve these problems more efficiently.
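The attachment scenario above can be sketched with a hypothetical Python model that mirrors Drupal 7's nested field array (field name, then language code, then a list of items), in which values of a non-translatable field are stored under the language-neutral code "und". The field name field_document and the simple fallback rule are assumptions for illustration; Drupal 7's actual language fallback logic is more involved.

```python
from typing import Any, Dict, List

LANGUAGE_NONE = "und"  # Drupal 7's language code for language-neutral field values

Item = Dict[str, Any]
Entity = Dict[str, Dict[str, List[Item]]]  # field name -> language -> items


def field_items(entity: Entity, field_name: str, lang: str) -> List[Item]:
    """Items in the requested language, falling back to the language-neutral
    copy (a simplified fallback rule, assumed for this sketch)."""
    by_lang = entity.get(field_name, {})
    return by_lang.get(lang, by_lang.get(LANGUAGE_NONE, []))


# One node holds every translation. The body is translatable; the PDF
# attachment (hypothetical field_document) is stored once, language-neutral.
document: Entity = {
    "body": {
        "en": [{"value": "Summary of the annual report"}],
        "fr": [{"value": "Résumé du rapport annuel"}],
    },
    "field_document": {
        LANGUAGE_NONE: [{"uri": "public://annual-report.pdf"}],
    },
}
```

Replacing the file means changing the single language-neutral entry, for example document["field_document"][LANGUAGE_NONE] = [{"uri": "public://annual-report-2.pdf"}], after which both the English and French views see the new file. No sibling translation has to be located or updated, and the file is stored only once.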

Fields are translatable

Comment by plach

Fields are now natively translatable in Drupal 7. See and