Defining a roadmap for i18n in Drupal 6 core

gábor hojtsy

Although the implementation details of the pieces we are trying to build for the Drupal 6 core i18n feature set (which will hopefully ship with Drupal 6) are still taking shape, we should define the goals we aim for, so that we can see the target. Unless we work in the same direction, there will be no usable, consistent i18n solution in Drupal core for people to build their sites on. So what is possible to accomplish in the Drupal 6 timeframe, and what will only be possible to implement in contrib modules?

Drupal 6 targets (AFAIS)

I think we should start with the basics: how we identify the language to use. This means implementing the URL handling code I discussed earlier and implementing proper browser language detection.
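Browser language detection usually means reading the HTTP Accept-Language header, which lists language tags with optional q weights. The following is a minimal sketch in Python (not Drupal code) of how such a negotiation could work. The names parse_accept_language() and negotiate_language() are hypothetical, and falling back from a regional tag like en-gb to en is an assumed policy.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (code, quality) pairs, best first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        code, *params = part.split(";")
        quality = 1.0
        for param in params:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat malformed weights as "not acceptable"
        entries.append((code.strip().lower(), quality, position))
    # Highest quality first; equal weights keep the order the browser sent.
    entries.sort(key=lambda entry: (-entry[1], entry[2]))
    return [(code, quality) for code, quality, _ in entries]


def negotiate_language(header, enabled, default):
    """Pick the best enabled language for the browser, or the site default."""
    for code, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if code in enabled:
            return code
        # Assumed policy: fall back from a regional variant such as "en-gb" to "en".
        base = code.split("-")[0]
        if base in enabled:
            return base
    return default
```

In this sketch, browser detection would only be the last resort, used after any language the URL has already determined.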

Once we know which language to display, we need to have content in that language. It should be possible to mark nodes (or their content) as being in a specific language (or as language neutral). Translations should be associated with each other. As far as I see, this will most probably be implemented with a user interface very similar to the present i18n.module, with the language code worked into node module.

After we have nodes in different languages, we should revisit how URL aliases are handled. A common request is to be able to specify different URL aliases for translations, or to share them if none are specified.
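The fallback rule in that request can be sketched as a two-step lookup: use a per-language alias if one exists, otherwise use a shared one. This is an illustrative Python sketch, not Drupal's path module. Keying aliases by (path, language) and using None to mark a shared alias are assumptions.

```python
def url_alias(aliases, path, language):
    """Return the alias for an internal path in the given language.

    `aliases` maps (path, language) to an alias; a language of None marks an
    alias shared by all languages. Falls back to the internal path itself.
    """
    for key in ((path, language), (path, None)):
        if key in aliases:
            return aliases[key]
    return path
```

For example, with {("contact", "it"): "contatto", ("node/12", None): "about"}, Italian visitors get "contatto" for the contact page and everyone gets "about" for node/12.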

We need a (probably generic) dynamic string translation service for translating menu items, taxonomy terms, profile field names, etc. I doubt we will be able to deliver a taxonomy or menu module with support for a different hierarchy or term layout per language in Drupal 6. I believe we should focus on providing simple interfaces to translate these user-defined items, without making it possible to define different structures for different languages.

Variables are an interesting question, since not all of them hold textual information. The current i18n module says that we should define a PHP array listing the variables to i18n-ize, and then we can set them in every language independently. I say we can improve on this by introducing a hook_variables(), similar in nature to hook_perm(). It would return a list of the variables used by the current module (with some description), so we can offer the user a list to choose i18n-ized variables from. This hook could also provide the default values of variables, which would fix a long-standing problem in Drupal: you can have different default values for the same system variable in different places in your code. All this would still only give a better UI on top of the variable handling code, which itself might not be what you want. Do you have a better idea?
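The proposal has two parts: a single declared default per variable, and a per-language value for the variables the administrator chose to i18n-ize. Both can be sketched in Python as follows. VARIABLE_REGISTRY stands in for what hypothetical hook_variables() implementations might return. variable_get() here is an illustration of the lookup order, not Drupal's function of that name, and the variable names are only examples.

```python
# Hypothetical stand-in for what hook_variables() implementations might return:
# one description and one default per variable, declared in one place.
VARIABLE_REGISTRY = {
    "site_name": {"description": "Name of the site", "default": "Drupal"},
    "anonymous": {"description": "Name used for anonymous users", "default": "Anonymous"},
    "default_nodes_main": {"description": "Posts per page on the front page", "default": 10},
}


def variable_get(name, language, site_values, language_values, i18n_variables):
    """Resolve a variable: per-language value, then site value, then declared default."""
    if name not in VARIABLE_REGISTRY:
        raise KeyError(f"undeclared variable: {name}")
    # Only variables the administrator chose to i18n-ize get per-language values.
    if name in i18n_variables and (name, language) in language_values:
        return language_values[(name, language)]
    if name in site_values:
        return site_values[name]
    return VARIABLE_REGISTRY[name]["default"]


def i18n_candidates():
    """The list an admin UI could offer when choosing variables to i18n-ize."""
    return sorted((name, info["description"]) for name, info in VARIABLE_REGISTRY.items())
```

Because the default lives only in the registry, two call sites can no longer disagree about it.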

A very big open question is block handling. As far as I see, the current i18n block translation support is hacky at best, and it is definitely not hitting core anytime soon. It is far from user friendly. We need better ideas. Come and share your view on this!

How the search module will handle multilingual sites is still an open question. Do we need separate indexes, or should we tag the extracted keywords with their language and use only one index?
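The second option, a single index with keywords tagged by language, can be sketched as postings keyed by (word, language). This is a toy Python model under that assumption. SearchIndex is a hypothetical name, and Drupal's real search index also stores scores and word counts, which are omitted here.

```python
from collections import defaultdict


class SearchIndex:
    """One index whose postings are keyed by (word, language)."""

    def __init__(self):
        self._postings = defaultdict(set)  # (word, language) -> set of node IDs

    def index(self, nid, language, text):
        for word in text.lower().split():
            self._postings[(word, language)].add(nid)

    def search(self, word, language):
        # .get() avoids creating empty postings for unknown words.
        return sorted(self._postings.get((word.lower(), language), set()))
```

A word shared by several languages (such as "Drupal") is stored once per language, so a search only returns content in the requested language.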

What is not going to be part of Drupal 6 (AFAIS)

I don't think you are going to see different taxonomy term and menu hierarchy support for your different languages in Drupal 6.

I also doubt you will see complex translation workflow support in Drupal 6. Workflow/action modules should be reused for this task, and the core modifications should be made with these modules and needs in mind. But core workflow support is not going to be a reality soon.

Support for external translation tools, like automatic Google translation helpers when you start translating a page and/or interfaces to other (desktop or web service based) translation tools, will hopefully be done in contrib modules, but definitely not in Drupal core.

Shared field support in translated nodes is not a Drupal core question unless the custom content type feature gets extended with field support in Drupal 6. Therefore, this might not be solved by Drupal core itself.

How can you help?

Provide constructive ideas, improve on existing ones or offer completely new (better) implementation possibilities. Technical details of the i18n collaboration will be posted soon, so you can follow the exact development process.

Comments

cascading variables

moshe weitzman

The backend of the cascading variables system was specced out a while ago by Adrian. As for a front end, I think your hook_variables() proposal is good. I agree that we should not support differing defaults for a single variable depending on where it is used; that feature causes more trouble than it is worth. Ideally, this cascading variables system gets implemented early so your project can depend on it. Some user preferences are currently stored in user.data (e.g. the comment viewing preference), so those might have to be migrated to the variables table.
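The details of Adrian's spec are not given here. One plausible reading of "cascading" is layered lookup, where a more specific source (a user preference) shadows a less specific one (a per-language value, then the site value, then the declared default). Python's collections.ChainMap models exactly that. The precedence order and the variable names below are assumptions for illustration.

```python
from collections import ChainMap


def cascade(user_prefs, language_values, site_values, defaults):
    """Layer variable sources; the first mapping that has a key wins."""
    # Assumed precedence: user > language > site > declared default.
    return ChainMap(user_prefs, language_values, site_values, defaults)


settings = cascade(
    {"comment_view_mode": "threaded"},           # hypothetical per-user preference
    {"site_name": "Beispiel"},                   # per-language value
    {"site_name": "Example", "comment_view_mode": "flat"},
    {"site_name": "Drupal", "comment_view_mode": "flat", "default_nodes_main": 10},
)
```

Writes to a ChainMap go to its first mapping only, which here would mean saving a user preference without touching the site-wide value.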

Hi Moshe, Do you know what

Development Seed

Hi Moshe,
Do you know what the current status of the cascading variable DEP is?

nothing interesting

moshe weitzman

Adrian posted his ideas and there were some comments, but he never made a revised document nor wrote any code.

you might want to crosspost

bertboerland

here

--

bert boerland

done

gábor hojtsy

Done.

Proposal and first objective

Roberto Gerola

Hi.
I agree with you.

Let me try to write down a proposal of very generic objectives on which we can base a plan and a roadmap.
1. Provide the mechanisms to identify (and to switch to) the language to use
2. Provide a language switching block
3. Content translation: provide a system that can work both for a complete node and for individual strings (for example taxonomy terms, image titles and descriptions, menu item titles)
4. Management user interface: where the user can work on translations of single strings and on node translations. I don't mean complex workflows, only some interfaces where the user can see all the translated / untranslated nodes or strings and act on them.
5. Provide a set of APIs that can be used by contributed modules, or by users directly in the content they edit (for example to print a switching block wherever they want)

Are the locale module and the interface translation system under discussion?
If I remember one of your previous posts correctly, you said that locale module
would need to be revisited.

As you say, the first objective we have to reach is a mechanism to
change / identify the language by URI.
Of course, this system should work not only for content translation, but also
for the user interface translation provided by locale module.

Following your previous post on this issue, let me briefly list the options with my comments:

  1. language as part of the hostname (en.example.com, de.example.com, ...): the language
    code should be configurable: italiano.example.com, english.example.com, and so on

  2. having different hostnames for different languages could be an option (example.co.uk, example.it).
    I think this system will not be widely used, because you have to register multiple domains and it cannot work in a shared hosting environment,
    but if we are able to implement it without too much effort, it would be nice to have this option too

  3. language as part of the URI with configurable codes: /it/, /italiano/, /en/, /english/ and so on.
    I think this will be the most complex option to implement, because it affects Drupal's
    URI management and its internal paths.
    As with option 1, the language code should be configurable.

  4. language as a property of the node, detected when the node is requested.
    I have this system implemented in Localizer.
    For example:
    node/10 is in English
    node/11 is its corresponding Italian translation

When you request a node, Localizer reads the language associated with the node and switches to that language.
It works the same way if you have defined aliases for the above nodes.
Language detection in this case is implicit, but I think it is very useful, because you can
have URIs like:

www.example.com/contact
www.example.com/contatto

that point to the correct localized versions.

Roberto
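Options 1 to 3 can be combined in one detection function that also strips a path prefix, so that the rest of the system sees the normal internal path. This is a minimal Python sketch. The prefix and domain tables, the order in which the options are checked, and the name detect_language() are all assumptions. Option 4 (the node's own language) would run later, once the path has been resolved to a node.

```python
from urllib.parse import urlsplit

# Hypothetical configuration: configurable codes mapped to language codes.
PATH_PREFIXES = {"it": "it", "italiano": "it", "en": "en", "english": "en"}
HOST_PREFIXES = {"it": "it", "italiano": "it", "en": "en", "english": "en"}
DOMAINS = {"example.co.uk": "en", "example.it": "it"}


def detect_language(url, default="en"):
    """Return (language, internal_path) for a request URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit() lowercases the hostname
    path = parts.path.strip("/")
    segments = path.split("/") if path else []

    # Option 3: language code as the first path segment; strip it off.
    if segments and segments[0] in PATH_PREFIXES:
        return PATH_PREFIXES[segments[0]], "/".join(segments[1:])
    # Option 2: a whole domain per language.
    if host in DOMAINS:
        return DOMAINS[host], path
    # Option 1: language code as the first hostname label.
    label = host.split(".")[0]
    if label in HOST_PREFIXES:
        return HOST_PREFIXES[label], path
    return default, path
```

For example, http://www.example.com/it/node/10 yields ("it", "node/10"), while http://italiano.example.com/node/10 yields the same pair from the hostname.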

locale system or not

gábor hojtsy

It is still an open question where we should support dynamic string translation. The locale module is an inviting target (and is used by existing modules), but there are also interesting new ideas, like the proof of concept solution from Rob Ellis. It would be nice to be able to export/import dynamic translations, and locale module already has this feature. But this needs to be implemented cleanly and extensibly, which is not yet possible with locale module.

Yes, as far as I see, we should start with path handling while debates are still active around the node translation implementation. There were no objections to the path handling proposal.

You have to be aware that

gerhard killesreiter

You have to be aware that currently the locale system stores all data in a gigantic serialized blob that is loaded for each page view. If you add the dynamic strings, then this blob will grow a lot and performance will suffer.

A possible alternative would be to split up the blob into per-page blobs as a recent patch of mine does for the path alias cache: http://drupal.org/node/100301

However, this would probably increase the storage requirements, as there would be a lot of duplication of cached strings. Making this configurable might be an idea (optimizing either for low overhead per page load or for low storage requirements).
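The trade-off between one global blob and per-page blobs can be modelled in a few lines. In this Python sketch, a plain dict stands in for the locale tables and each page path gets its own small cache. PerPageStringCache is a hypothetical name, and the sketch assumes a single language. It shows the benefit (a page loads only its own strings) and the duplication cost (a string used on N pages is stored N times).

```python
class PerPageStringCache:
    """Cache translated strings per page path instead of in one global blob."""

    def __init__(self, translations):
        self._db = translations  # stands in for the locale tables: source -> translation
        self._blobs = {}         # page path -> {source: translation}
        self.db_lookups = 0

    def translate(self, path, source):
        blob = self._blobs.setdefault(path, {})
        if source not in blob:
            # First use of this string on this page: fetch once and remember it.
            self.db_lookups += 1
            blob[source] = self._db.get(source, source)
        return blob[source]

    def stored_entries(self):
        """Total cached entries; a string used on N pages is stored N times."""
        return sum(len(blob) for blob in self._blobs.values())
```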

Another thought: Dynamic

gerhard killesreiter

Another thought: Dynamic strings should be marked as such in the translation table. Since they are site-specific we might want to export PO files that don't contain them.
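Marking dynamic strings could be as simple as a boolean flag on each translation row, which the PO exporter checks. This Python sketch assumes that flag (called "dynamic" here) and writes simplified PO entries (msgid/msgstr pairs separated by blank lines, with no comments or plural forms). It is not locale.module's exporter.

```python
def po_quote(text):
    """Escape backslashes, double quotes and newlines for a PO string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def export_po(strings, include_dynamic=False):
    """Render translations as PO entries, skipping site-specific dynamic strings by default."""
    entries = []
    for entry in strings:
        # The "dynamic" flag is an assumed column in the translation table.
        if entry["dynamic"] and not include_dynamic:
            continue
        entries.append(f"msgid {po_quote(entry['source'])}\nmsgstr {po_quote(entry['translation'])}\n")
    return "\n".join(entries)
```

A site could still export everything with include_dynamic=True, for example to back up its own menu and term translations.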

I am aware

gábor hojtsy

I am aware of how this cache works, since I use locale module extensively. I see potential in reusing the import/export functionality and maybe even the database storage. That does not mean we need to cache the dynamic strings in the same blob, or in a blob at all...

Storing the dynamic strings

gerhard killesreiter

Storing the dynamic strings in a separate blob would of course be an option, but I don't see why performance should be much better than storing them in the same blob. I don't think not storing them at all is an option. I am curious how much improvement, if any, the menu path approach could bring for locale.module.

I've just done a test and

gerhard killesreiter

I've just done a test, and the serialized blob for a single page is about 1% of the total blob. There are probably pages with a higher percentage, but if deserialization and memory consumption are the bottleneck, then the outlined approach would surely fix it. We could then probably relax the requirement that only very short strings are cached, to maybe 200 characters, which would include most form field help texts.

Have you considered using

jose reyero

Have you considered using the same page cache for both the URL aliases and the localized strings, and maybe some more things like blocks, etc.?
This way, we'd need a single database query to retrieve most of the data needed to build the page.

No, I had not considered

gerhard killesreiter

No, I had not considered this, but it seems like a good idea that needs exploring.

About locale module, I think

jose reyero

About locale module, I think it does a nice job translating short static strings, but it's not that good when you use it for big texts.

For example, help texts for modules would be better stored entirely in the database, including the English version, with modules using only some kind of text IDs. This would also help translators (and module developers, as they won't have to patch modules to fix these texts).

Another option for this gigantic blob may be to store it per module, for which we'd need some kind of context information for the t() function. This would also help manage module translations and allow different translations of the same string in different modules (which would sometimes be nice).
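Passing context to the translation function could work as a lookup that tries the context-specific translation first and falls back to the generic one. In this Python sketch, t_with_context() is a hypothetical function, not Drupal's t(). The (source, context, language) key layout and the "shop" module context are assumptions for illustration.

```python
def t_with_context(source, translations, language, context=None):
    """Translate `source`, preferring a translation registered for `context`."""
    # Assumed key layout: (source, context, language); context None = generic.
    for key in ((source, context, language), (source, None, language)):
        if key in translations:
            return translations[key]
    return source  # untranslated: show the source string


TRANSLATIONS = {
    ("Order", None, "de"): "Reihenfolge",    # generic meaning: sort order
    ("Order", "shop", "de"): "Bestellung",   # a purchase, in a hypothetical shop module
}
```

Here the same English string gets two German translations, which a single flat blob keyed only by the source string cannot express.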

So I think the locale module may be extended with some features and different options to handle different kind of strings.

Only the short strings are

gerhard killesreiter

Only the short strings are stored in the blob, long strings with more than 74 chars are always looked up in the database. The blob is about 130kB on a site of mine, compared to the menu cache for a simple user of about 40kB.
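This split means short strings are served from one preloaded blob, while longer strings are queried on each use. It can be modelled as below. This is a minimal Python sketch, not locale.module itself. A dict stands in for the database, the default cutoff of 75 mirrors the "more than 74 chars" figure above, and the limit parameter shows the effect of relaxing the cutoff to about 200 characters, as suggested earlier in this thread.

```python
class LocaleLookup:
    """Serve short strings from a preloaded blob, look up long ones individually."""

    def __init__(self, db, limit=75):
        self.db = db            # stands in for the translation tables: source -> translation
        self.limit = limit      # strings shorter than this are preloaded into the blob
        self.blob = {s: tr for s, tr in db.items() if len(s) < limit}
        self.db_queries = 0

    def t(self, source):
        if len(source) < self.limit:
            # Short string: answered from the blob, no query.
            return self.blob.get(source, source)
        self.db_queries += 1  # long string: one query per call
        return self.db.get(source, source)
```

Raising the limit trades a larger blob for fewer per-string queries.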

Existing DEP and some comments

jose reyero

I had already posted this DEP, http://drupal.org/node/77266 [Multilanguage support in Drupal core].

For this next release I think we should focus on the language handling/selection part, language in paths, and multilingual variables. From my experience with i18n, these are the most difficult parts to implement as a contributed module without patching the core.

So, for now, instead of having real usable multilingual features implemented as a core module, I'd rather have only some basic support in core, which would allow other modules to build on that.

"We need to have a (probably) generic dynamic string translation service to use for translating menu items, taxonomy terms, profile field names, etc. I doubt we will be able to deliver a taxonomy or menu module with different hierarchy/term layout support per language in Drupal 6."

While for profile fields it's mostly a question of string translation, for menus and taxonomy I'd rather see some support for having multiple objects per language, thus they don't really need string translation. Having some language field linked to menu items or taxonomy terms, and then using the query rewriting mechanism will allow other modules to handle multilingual taxonomies and menus.

"A very big open question is block handling. As far as I see, the current i18n block translation support is hacky at best, and it is definitely not hitting core anytime soon. It is far from user friendly. We need better ideas. Come and tell your view on this!"

Again, language field and query rewriting for block selection would fix the problem.
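The "language field plus query rewriting" idea can be demonstrated with an in-memory SQLite table. In this Python sketch, rewrite_for_language() is a hypothetical stand-in for a query-rewrite hook that appends a language condition. The language column on the blocks table is an assumption made for the sketch, and an empty string marks language-neutral blocks.

```python
import sqlite3


def rewrite_for_language(sql, language, alias):
    """Append a language condition to a query (hypothetical rewrite hook).

    Rows match if they are in the current language or language neutral ('').
    """
    joiner = " AND " if " WHERE " in sql.upper() else " WHERE "
    return sql + joiner + f"{alias}.language IN (?, '')", (language,)


def visible_block_titles(conn, language):
    sql, params = rewrite_for_language(
        "SELECT b.title FROM blocks b WHERE b.status = 1", language, "b")
    return [row[0] for row in conn.execute(sql + " ORDER BY b.bid", params)]


conn = sqlite3.connect(":memory:")
# The language column is an assumption made for this sketch.
conn.execute("CREATE TABLE blocks (bid INTEGER, title TEXT, status INTEGER, language TEXT)")
conn.executemany("INSERT INTO blocks VALUES (?, ?, ?, ?)", [
    (1, "Welcome", 1, "en"),
    (2, "Benvenuto", 1, "it"),
    (3, "Search", 1, ""),
    (4, "Old news", 0, "en"),
])
```

The block module's own query stays unchanged; only the rewrite step knows about languages. The same pattern would apply to menu items or taxonomy terms with a language field.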

"Variables are an interesting question, since they are not all textual information. The current i18n module says that we should define a PHP array with variables to i18n-ize, and then we can set them in every language independently. I say we can improve on this with introducing a hook_variables()..."

This hook_variables() may be a good thing (it would also help to clean up the variables table when uninstalling a module, by the way). Another option may be to have the forms API handle variables marked as 'localizable' by the modules.

simple menu and term translation as first target

gábor hojtsy

"While for profile fields it's mostly a question of string translation, for menus and taxonomy I'd rather see some support for having multiple objects per language, thus they don't really need string translation. Having some language field linked to menu items or taxonomy terms, and then using the query rewriting mechanism will allow other modules to handle multilingual taxonomies and menus."

Sure, this is possible to implement as a contrib module for Drupal 6. But many need simple string translation (as reflected by localizer module for example). We need to somehow provide this functionality anyway, not just the (too complex for simple needs) multiple hierarchy support for terms and menus. My goal would be to support this via simple string translation in Drupal 6, which does not mean there should not be a contrib module to provide different hierarchies. We can add support for this kind of extensibility into core if need be.

If you need different menus, different taxonomy, translated interface and different language content, what distinguishes this need from a multisite setup? As far as I see, the only distinguishing factor is that you would like to manage it more easily and possibly have connections between items (nodes, terms, menu items). Apart from that, you are building multiple sites simultaneously in the same Drupal instance.

Multisite vs. multilingual

jose reyero

"If you need different menus, different taxonomy, translated interface and different language content, what distinguishes this need from a multisite setup? As far as I see, the only distinguishing factor is that you would like to manage it more easily and possibly have connections between items (nodes, terms, menu items). Apart from that, you are building multiple sites simultaneously in the same Drupal instance."

Those "connections between items" are what really make the difference from a multisite set-up. And I may need them for some sites but not for others.
From having localization plus a few translated items to having everything different for each language (multisite), there's a full gradation of features you may or may not need. Also, having menu items or taxonomy terms per language would save us a lot of trouble with string translation, while really being a more flexible solution.

Having translatable menu items is one thing. But the real thing would be to have per-language menu items, which may or may not be translations of each other.

Realistic targets

gábor hojtsy

Indeed, but as far as I see we need to have realistic targets and I don't think that per language menu items or taxonomy terms are realistic for the Drupal 6 timeframe.

i18n vs. Localizer

funana

Although I'm not a Drupal Developer I have to say that I would really appreciate localization as a core module for Drupal 6.

BUT I also have to say that imho i18n (for 4.7) is a real pain in da a**.
Have you guys actually taken a look at the (relatively) new module "localizer" (http://drupal.org/project/localizer)?
AFAIC it is way better than i18n...

Just my two cents,

have a nice day!

Can Localizer be extended?...

dahacouk

Yes, Localizer does seem to be a very good starting point for internationalisation work. Could it also be a good springboard to see where its limitations are and then figure out what's needed to extend it? Where does it fall short of what the team requires?

I, for one, would like to see an option to have the language selectors be drop down menu boxes.

Also, I'd like to see an auto translate option for the content: a robot that could chuck some text into one of the many online translator sites, select the source and target languages and then collect the results. OK, it's a bit cheap, but hey, it's a good start!

Cheers Daniel

Localizer can be extended

Roberto Gerola

Hi.

"I, for one, would like to see an option to have the language selectors be drop down menu boxes"
Yes, this is only a block. It could also be implemented in another module calling
the Localizer API.

"Also, I'd like to see an auto translate option for the content."
I think you mean menu and taxonomy titles and descriptions.
Yes, it is possible to create a similar automatic system in an external
module too, then using Localizer's API to populate its localizertranslation table.

Roberto

http://www.speedtech.it

Auto translate option...

dahacouk

"Also, I'd like to see an auto translate option for the content."

"I think you mean about menus and taxonomies titles and descriptions."

Well, actually, what I was talking about was the node body content. So I would be able to submit a blog post and it would automatically get translated into French, German, etc. And, like I said, it's a bit dirty; those online translators aren't always the most accurate. But it would be a great way of getting a multilingual site off the ground. People will come along and say "that translation is no good" and I could let them do a better job! ;-)

Cheers Daniel

landover

Hi all,

the more I play around with locales, the more I find that the translation-based system (with a semi-hardcoded default language, English, and a complicated translation system for all other languages) is the root of many problems.
The fact that one language is treated differently creates lots of artificial problems.
Also, unless I misunderstood how Drupal handles translations, it adds significant overhead (SQL etc.) to every access to a non-English string: English source string -> lid, then lid -> translation.
The solution would be a completely symmetric language system, where English is just one language among others, and instead of processing everything in two translation steps from English, the starting point would be a language-independent string ID (instead of a hard-coded English string).
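A symmetric store would key every string by an ID and treat English as just one more column. This Python sketch illustrates that proposal, not how Drupal works. The IDs, the STRINGS table and the text() helper are hypothetical, and the fallback language is a plain parameter rather than a privileged source language.

```python
# Hypothetical symmetric store: every language, English included, is just one
# translation of a language-independent string ID.
STRINGS = {
    "user.login.submit": {"en": "Log in", "de": "Anmelden", "hu": "Belépés"},
    "node.more": {"en": "Read more", "de": "Weiterlesen"},
}


def text(string_id, language, fallback="en"):
    """Return the string in `language`, else in the configured fallback, else the ID."""
    variants = STRINGS.get(string_id, {})
    if language in variants:
        return variants[language]
    # The fallback is a site setting, not a privileged source language.
    return variants.get(fallback, string_id)
```

Lookup becomes a single step from ID to text. The cost is that code now reads text("user.login.submit", ...) instead of showing the English string inline.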

Fortunately the sql overhead

gerhard killesreiter

Fortunately the sql overhead is much less than you think because all the translations are cached. Translation IDs in the way you propose them would make the code totally unreadable.

it's not just sql overhead

landover

SQL or other translation overhead isn't even the real problem.
The real problem is that there are a couple of places where treating one language differently, and not storing the English strings the same way, causes problems.
That principle of a basically monolingual design, then translating,
is the root of the problem.

"Taxonomy on the fly" with freetagging to Language

macm

Hi

I love i18n, and since time is running out, I would like to make a suggestion.

I think the most important feature that should be improved is "Taxonomy on the fly" with freetagging per language.

Let me explain. This feature ("Taxonomy on the fly" with freetagging) is amazing, but when a user submits a new term, the term becomes available in all languages. I think that if the user is Korean, the term should only be available in Korean; if Russian, only in Russian; and so on.

It makes no sense to make this new term available in all languages.

If this feature is easy to implement with a snippet, let me know; if not, please consider it.

Cheers

Mario

Multilingual tag handling

StevenK

When I first saw the i18n module, I expected multilingual handling of tags, and I thought this should be handled like synonyms, but with language information.
I think there should be a table where:

  • the first column is the reference term in the base language (eg Base)
  • the second column is the same but displays the base language name (eg English)
  • and each other column is for another specific language (Italian, Spanish etc), where you can put there its translation

And then you just select what will be displayed (options: only the specific language, all tags, etc.), as with content. If a term is translated, the translation takes precedence over the base term, and that's all.
Entering a new term in any language would automatically create entries in the other languages (and the base), waiting for translation.

Regards,
Stefanos
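The table described above can be modelled as one row per base term with a slot per language. A translation is shown when present, the base term otherwise, and new terms start with empty slots waiting for translation. This is a minimal Python sketch. TagTable is a hypothetical name, and for simplicity new terms are entered as base terms rather than in an arbitrary language.

```python
class TagTable:
    """Base terms with one translation slot per language (None = not translated yet)."""

    def __init__(self, languages):
        self.languages = list(languages)
        self.rows = {}  # base term -> {language: translation or None}

    def add_term(self, term):
        # A new term gets an empty slot in every language, waiting for translation.
        self.rows.setdefault(term, {language: None for language in self.languages})

    def translate(self, term, language, translation):
        self.rows[term][language] = translation

    def display(self, term, language):
        # A translation takes precedence; otherwise show the base term.
        return self.rows.get(term, {}).get(language) or term

    def pending(self, language):
        """Terms still waiting for a translation in `language`."""
        return sorted(term for term, slots in self.rows.items() if slots.get(language) is None)
```

The pending() list is what a translator-facing screen could show for each language.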

How about integrating with a Google Translate module to translate tags automatically?

Unofficial Drupal Roadmap
