Why i18n is so hard

chx's picture

Because the problem is not creating a node (menu, etc.) in two languages; that's peanuts. You either add a lang column to the primary key of the node or create a (nid, nid, lang) relationship table and you are basically done.
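To make the second option concrete, here is a minimal sketch of a (nid, nid, lang) relationship table, using Python and an in-memory SQLite database. The table name `node_translation`, its column names and the helper `translation_of` are made up for illustration; none of this is an existing Drupal schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
    -- Hypothetical relationship table: (source nid, translated nid, language).
    CREATE TABLE node_translation (
        source_nid INTEGER NOT NULL,
        nid        INTEGER NOT NULL,
        lang       TEXT    NOT NULL,
        PRIMARY KEY (source_nid, lang)
    );
""")
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [(1, "Welcome"), (2, "Bienvenue"), (3, "Willkommen")],
)
conn.executemany(
    "INSERT INTO node_translation VALUES (?, ?, ?)",
    [(1, 1, "en"), (1, 2, "fr"), (1, 3, "de")],
)


def translation_of(source_nid, lang):
    """Return (nid, title) of the translation in `lang`, or None if there is none."""
    return conn.execute(
        "SELECT n.nid, n.title FROM node_translation t "
        "JOIN node n ON n.nid = t.nid "
        "WHERE t.source_nid = ? AND t.lang = ?",
        (source_nid, lang),
    ).fetchone()
```

With this data, `translation_of(1, "fr")` finds node 2, and asking for a language that has no translation returns `None`. That part really is the easy bit.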

Basically. Because with more complex node types, some things are shared. Dates of events, for example, are the same in every language, just formatted differently. On the other hand, places are not necessarily shared -- even the country names can be different. So, we either extend the node API with provisions for this (shared/not shared among languages) or every single module needs to think about this. I can't say I love either.
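A rough sketch of what the "shared/not shared" distinction could look like as a data model, in Python. The class `EventNode`, its attributes and the sample values are assumptions for illustration only, not a proposed node API.

```python
from dataclasses import dataclass, field


@dataclass
class EventNode:
    # Values that are identical in every language (e.g. the event date).
    shared: dict = field(default_factory=dict)
    # Values that differ per language: lang -> {field name: value}.
    per_language: dict = field(default_factory=dict)

    def view(self, lang):
        """Merge shared values with the values stored for one language."""
        merged = dict(self.shared)
        merged.update(self.per_language.get(lang, {}))
        return merged


event = EventNode(
    shared={"start": "2006-09-25"},
    per_language={
        "en": {"title": "Conference", "country": "Germany"},
        "de": {"title": "Konferenz", "country": "Deutschland"},
    },
)
```

Here `event.view("de")` and `event.view("en")` carry the same `start` value but different titles and country names. The open question stays the same: who decides which fields go in which bucket, the node API or each module.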

Sharing files has already been raised by Dries.

A normal document is usually created by someone and then it's up for consumption. But a multilingual document has a much more problematic workflow: someone enters it in one language, then a translator translates it, and usually an editor checks the translation. If we assume that it's a sync'd site where every document is immediately available in many languages, then we are looking at complex node access control on unpublished nodes and good workflow handling.

I'll end this here, but I am afraid that by the time the group finishes the list we will have the better half of contrib listed as a requirement or near-requirement. What gets in core? The current state (nothing) is definitely not a good approach. However, we should never forget that many, many, many Drupal sites are single language and we shall not make our APIs more complex, our code more bloated just to support the very small percentage of multilanguage sites. locale is done right -- when it's switched off, it carries a very small performance penalty, which in Drupal 5 serves security purposes anyway, so it's not wasted.


Throw it away?!

z.stolar's picture

I disagree with the suggestion at the end of your post, where you express your fear that Drupal's code will get too complex. I also find it wrong to claim that because

many, many, many Drupal sites are single language

and so we shouldn't try to

support the very small percentage of multilanguage sites

Following this logic doesn't lead to development or growth, but to mediocrity, and that is not the Drupal way as I know it. Should we not allow for easier uploads, a better and more flexible menu system, or a better templating system? They are all already good and nice, but improvement can always be achieved.
On the contrary, we should ask ourselves why there aren't many multilingual Drupal sites. Maybe because it's not THAT easy to build one?

The web is developing, and the world develops with it. Things that used to be done in a homogeneous environment are now becoming more universal and pluralist. Many sites and web applications don't stay within the confined borders of a certain country or culture, but offer their products, services and content to a wider variety of customers and publics. Drupal should support this tendency, prepare itself for it, and I'd say that it should even promote it. Drupal is a powerful tool for building and supporting communities that are not (only) geographically centered, which raises the need to serve each user in his/her language.
Beyond that, think of all the countries that are not unilingual. There is more than one. In fact, there are many countries where a multilingual system is the only right way to do it. Let's not disregard their needs. Why wouldn't Drupal be the leading CMS in multilingual support, as it is in many other respects?

I18n IS hard. I agree. But isn't it what this group is all about?

It's hard

Boris Mann's picture

And if we solve it in the same general way we've "solved" the general framework of Drupal....the rewards are HUGE.

nano nodes and lists !!

alkhulaifi's picture

I know this is a slightly weird idea, but I think Drupal at this stage needs to go down to the lower layers to organize itself again.

This idea started after I worked with the taxonomy module to develop a newspaper module. I discovered that I needed to create a different table and module for each list I created (dates, issue numbers, writers). I tried as much as I could to use the taxonomy module to store my information, but the module only stores listings of text fields.

As chx mentioned, there are shared items between modules, and they cannot be translated because they are not centralized or managed by the Drupal framework. This applies especially to listings of different field types.

After thinking about listings, I started to think about the nano node idea to serve them. A nano node can hold only one data type (string, number, date, node reference, user reference, list, etc.). The nano node is built like a node but with a simpler and lighter structure.

Lists store listings of different nano nodes, with a full API for building, manipulating and calling lists. Any list can be one of two types (ordered, unordered), and any list can have a parent list or mixed sub-lists of different listing types. This API could even be used to store the form API arrays and the menu arrays.

By implementing this API, there will be a centralized location to manage nano nodes and translate them to any language.
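A minimal sketch of how nano nodes and lists could fit together, written in Python. Everything here (the names `NanoNode` and `NanoList`, the `translations` mapping, the sample data) is hypothetical and only illustrates the idea of one central place to translate list items.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class NanoNode:
    nnid: int
    kind: str   # one data type only: "string", "number", "date", ...
    value: Any
    translations: dict = field(default_factory=dict)  # lang -> translated value

    def label(self, lang):
        """Translated value if one exists, otherwise the original value."""
        return self.translations.get(lang, self.value)


@dataclass
class NanoList:
    name: str
    ordered: bool = True  # stored as metadata only in this sketch
    items: list = field(default_factory=list)  # NanoNode or nested NanoList

    def labels(self, lang):
        """Labels of all items in `lang`; sub-lists become nested lists."""
        return [
            item.labels(lang) if isinstance(item, NanoList) else item.label(lang)
            for item in self.items
        ]


sections = NanoList("sections", items=[
    NanoNode(1, "string", "News", {"fr": "Nouvelles"}),
    NanoNode(2, "string", "Sports"),
    NanoList("issues", items=[NanoNode(3, "number", 42)]),
])
```

Calling `sections.labels("fr")` returns the French label where one was entered and falls back to the original value elsewhere, including inside the nested list.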

Basically, Drupal supports infinite listings through nodes and views, but 80% of our programming is finite listings and item sets (drop-down lists, trees, menus, tags ...), which do not have any support in Drupal.

You misunderstand me

Anonymous's picture

As Boris says it needs to be done right.

It is not just among Drupal sites that multilanguage sites are rare, but among web sites in general.

I just wanted to point out that we should be very careful to protect Drupal core values like lean code when implementing all this. I also expected more problems to be raised. I got none.

what is language?

adixon's picture

This is a great conversation to have, thanks for your posting. I come from Canada, where all federal government sites have to be "bilingual". I think that the interesting thing is that it's not exactly clear what this means in practice, and like many shoulds, the imperatives tend to obfuscate the interesting issues.

I think your third paragraph accurately captures the collective Drupal choke that happens every time this issue reaches a critical state of interest - we "have" to do it, but we actually don't know what "it" is. And since Drupal has so many rigorous thinkers, the proposed lists of features that are supposed to capture this "it" are quickly shot down, and rightly so, because there's a lack of underlying clarity of what it is (and so a fear of where it would all lead to, like 'bloat'). If you think I'm being too abstract, your second paragraph is a great example, and multilingual categories is another one.

So - what is language? For a start, it goes way beyond a feature, and trying to capture the concept of language using a language is doomed to be incomplete. So, we need to be specific about what aspects of language we want to implement, and recognize that there will always be incompleteness in any implementation. We'll never be "done" - at best we'll be successful at implementing various aspects of our use of "language" in different models.

1. The Current Approach to Language

Indeed, the locale module successfully implements one aspect of language - being able to provide the Drupal messages in a user's preferred 'locale'. And the current language field in the node table allows the i18n contrib module to do another aspect - being able to identify the 'locale' of the node, and to restrict the node lists to only those matching the chosen locale. But even here, it gets messy - what about looking at the list of nodes of one locale while using the Drupal interface of a different locale? Does that make sense? i18n chooses to override the user's locale choice if they've made an explicit language choice for the listing, but a quick look at the code logic makes it clear this is getting hackish in design.

And the part of i18n that's now a separate dependent module, called 'translation', adds another desired feature that connects nodes that are considered 'translations' of each other. But this creates more questions - e.g. when looking at a node in one language, should the link to its translated node also change the user's locale?

You can now see that once we get to questions about whether taxonomies and cities should be language-dependent, all hell definitely breaks loose.

So, let's consider two other approaches:

2. Use case.

Instead of implementing 'language', we could try and support specific use cases. Here are two:

a. official translation - where everything comes in every language, and every page has a link to its equivalent in every other language.
b. tower of babel - where anyone can contribute in any language, and the language tools for restricting by language are all explicit (i.e., there are no hidden dependencies).

If this is what we want, then I think a. should actually use a different model: a multi-site install (one site per language) plus some tools for managing translations of all the user-supplied content that goes into the database. And then b. is pretty much already done. Unfortunately, all the sites I have worked on fall somewhere between these two extremes: translations get done on an ad hoc basis, and some kind of "best" display is wanted that balances giving users content in their own language against applying too exclusive a filter, so that they still see untranslated content in its original language. For this model (which I'll bet most multilingual sites that I'm interested in aspire to ...), I propose a different approach:

3. Language as revision.

In this model, translations are revisions of the same node. The reason I like this is because it captures more accurately what a translation actually is, i.e., another way of saying the same thing. When you edit someone else's posting, aren't you just changing the language to your own? In this world, different languages aren't discrete, they're merely concentrations, and the labels we give to our language (english, spanish) are at best suggestions.

This would mean that all user-supplied content that gets stored in the database by a module would either need to be tied to a revision (and inherit its language), or provide its own language. It would also mean that any content that did neither would implicitly be language neutral (e.g. Latin names of trees, numbers).
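Here is one way the "language as revision" model and the "best" display could be sketched in Python. The `Revision` class, the `best_revision` helper and the fallback rule (newest revision of any language when nothing matches) are assumptions for illustration, not a description of Drupal's revision system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Revision:
    vid: int
    body: str
    lang: Optional[str]  # None means language neutral


def best_revision(revisions, preferred):
    """Newest revision in the preferred language (or language neutral);
    if there is none, fall back to the newest revision of any language.
    Assumes `revisions` is not empty."""
    matching = [r for r in revisions if r.lang in (preferred, None)]
    return max(matching or revisions, key=lambda r: r.vid)


history = [
    Revision(1, "Hello", "en"),
    Revision(2, "Bonjour", "fr"),
    Revision(3, "Hello again", "en"),
]
```

A French reader gets revision 2, an English reader gets revision 3, and a Spanish reader falls back to revision 3 rather than seeing nothing. Whether that fallback is the right one is exactly the kind of workflow and display decision that remains.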

The hard work would then be to think about how to implement the various use cases, in terms of both workflow and user display.


Gábor Hojtsy's picture

The revisions idea was raised before. If you think about the current revision implementation, you see it is linear. If one modifies the original language content, a new revision is created. If one modifies a translation, a new revision is created. Images might need to be different per language (e.g. screenshots), but attachments would then show up for all revisions of the node (i.e. Czech images in the Spanish translation and in all others). Permissions are tied to nodes, not to revisions, but many sites need permissions to be associated with translated content.

The question is whether these problems with the revision approach can be solved with a tree-like revision solution. Can we bend the permission system, attachments and node display code so that they support languages? The tree idea is a neat one (even mentioned by Dries as one possible way to look into), provided we have other components aligned with such a revision control system.
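To illustrate the difference from a linear history, here is a small Python sketch of tree-like revisions, where every revision records which revision it was derived from. The names `TreeRevision`, `heads_by_language` and `lineage` are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeRevision:
    vid: int
    lang: str
    parent: Optional[int]  # vid this revision was derived from; None for the root


def heads_by_language(revisions):
    """Newest revision of each language branch."""
    heads = {}
    for rev in sorted(revisions, key=lambda r: r.vid):
        heads[rev.lang] = rev
    return heads


def lineage(revisions, vid):
    """Path of vids from the given revision back to the root."""
    by_vid = {r.vid: r for r in revisions}
    path = []
    while vid is not None:
        path.append(vid)
        vid = by_vid[vid].parent
    return path


tree = [
    TreeRevision(1, "en", None),  # original
    TreeRevision(2, "es", 1),     # Spanish translation of revision 1
    TreeRevision(3, "en", 1),     # later edit of the English original
    TreeRevision(4, "es", 2),     # correction of the Spanish translation
]
```

In a linear history, revision 4 would simply be "the current revision" for everyone. In the tree, each language keeps its own head, and the lineage shows that the Spanish branch still derives from revision 1 and not from the newer English revision 3.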

Tagging revisions

robertDouglass's picture

I'm considering writing a module that would let you tag revisions. This would let you display a node and inform the viewer that there are other revisions available. My original motivation was for handbook documentation on Drupal.org, but I immediately wondered whether, should I build such a system, it could be useful for i18n. After all, you could tag revisions as translations and the interface would automatically take care of alerting the viewer to the other viewing options. It isn't a far stretch to think that the other necessary functionality could be built on top of that.

Interesting problem/idea with the tree revisions. I'll have to chew on that a bit more to get the full sense of it.

If Drupal decides to evolve

ludwikg's picture

If Drupal decides to evolve in the direction of CCK (all node data is divided into sub-tables) then I would imagine the translation being stored on the field level. This would solve the problem of some data being the same for all translations (this would be defined as a property of a field in CCK). Also, I believe this wouldn't change the performance much, because the translated content could be stored in additional columns of the field tables. What do you think?
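A rough sketch of the "additional columns" idea, using Python and an in-memory SQLite database. The table and column names (`field_title`, `value_en`, `value_fr`, `field_event_date`), the `LANGS` tuple and the fallback to English are all assumptions made for the example; this is not CCK's actual schema.

```python
import sqlite3

LANGS = ("en", "fr")  # assumed set of enabled languages

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Translatable field: one extra column per language.
    CREATE TABLE field_title (nid INTEGER PRIMARY KEY, value_en TEXT, value_fr TEXT);
    -- Shared field: a single column used by every translation.
    CREATE TABLE field_event_date (nid INTEGER PRIMARY KEY, value TEXT);
""")
conn.executemany(
    "INSERT INTO field_title VALUES (?, ?, ?)",
    [(1, "Workshop", "Atelier"), (2, "Conference", None)],
)
conn.executemany(
    "INSERT INTO field_event_date VALUES (?, ?)",
    [(1, "2006-09-25"), (2, "2006-09-26")],
)


def load_event(nid, lang):
    """Return (title, date); the title falls back to English if untranslated."""
    if lang not in LANGS:
        # The language ends up in a column name, so only accept known values.
        raise ValueError(f"unknown language: {lang}")
    query = (
        f"SELECT COALESCE(t.value_{lang}, t.value_en), d.value "
        "FROM field_title t JOIN field_event_date d ON d.nid = t.nid "
        "WHERE t.nid = ?"
    )
    return conn.execute(query, (nid,)).fetchone()
```

Loading a node in any language stays a single row per field table, which is the performance argument. The cost is a schema change every time a language is added.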