Posted by gábor hojtsy on October 4, 2006 at 8:21pm
As part of our voyage to build up a solid base for Drupal core i18n, it is important to see how other systems approach this problem. Maybe you have used some other system which has incredible i18n capabilities, or you have heard of one? Tell us the story, or just post a link (as a comment) to the system we should learn from. I plan on setting up some of those you suggest (those that seem the most valuable in this field) and writing up my experiences, as well as the workflows, capabilities, and features we can learn from.

Comments
LiveJournal's i18n/l10n
I have some experience with and knowledge of LiveJournal and the way it does i18n (and l10n); however, I'm currently way tired (having rehearsed twice today with two different groups, plus school, plus transport, plus being somewhat ill still, etc...) and so can't really think of how to describe the system. I might use the subject for an English assignment though, which will make it all fit together in a nice academic unit. I'll get back to you on this. :)
--
Frederik 'Freso' S. Olesen
Gettext
Then there's, of course, gettext. It is worth looking both at how Drupal uses it and at how that compares with the way 'traditional' (non-web) applications use it, with regard to string clean-up etc.
I also realise that neither this nor LiveJournal's system is for (dynamic) content i18n, but studying them couldn't do any harm, could it?
--
Frederik 'Freso' S. Olesen
gettext?
You mean gettext is used somewhere for translation of user-defined content, i.e. news items or articles? In what system?
gettext.
I also can't recall any system having used either of these for (dynamic) content. And it's been so long I can't recall what I had in mind when writing these two comments. Oh well.
--
Frederik 'Freso' S. Olesen
content i18n
You might want to take a look at Trados and WordFast. These are the most popular tools that translators use.
Trados is the industry standard, $$$.
WordFast is the freelancer's choice of tool, a Word macro package free with translation memories up to 110 Kbytes, and/or 500 TU (translation units).
As far as I understand, there are 3 things that a translator wants from a CAT solution:
As i18n stands now, content translation is probably a way off in the future. But it may be worth your time to take a look at how this industry works (where the goalposts are). Let's say Drupal could take care of access control and workflow, and content translation could be assisted by a Drupal-conversant local solution, like a Firefox plugin...:)
Some people mentioned that
Some people mentioned that gettext would be a lot faster than our solution. I've tried using gettext, but didn't find it faster.
xaraya
.. and the others Jose showed during his presentation at drupalcon.
When I was first looking for a CMS I checked out a couple, and Xaraya and Drupal in detail. Xaraya touted i18n in core, which was very interesting to me, but I really liked the way Drupal was built; I found my way through the code instantly.
Then I also found out that the way i18n was built around that time (I think 4.3) allowed me to have translated labels and content but mixed comments, which was one of the requirements of the site. They wanted the content to be translated but everyone to be able to view all comments. The i18n module in those days required patching core and creating tables for each content type (e.g. nodes/comments) that you wanted in each language (e.g. en_node, nl_node).
Anyway, I also read your post on the drupal devel list, and it's nice that it is being done the school way. First document what needs to be achieved, what you are going to achieve and how you are going to achieve it. Good luck!
Unfortunately I couldn't
Unfortunately I couldn't participate at drupalcon, but has someone mentioned Typo3? It has some pretty OK i18n, if I recall correctly.
Xaraya
As far as I know Xaraya does only localization, not internationalization like the i18n module does.
Midgard
I have never used Midgard, but here is an explanation of how their system works:
http://www.midgard-project.org/discussion/developer-forum/midgard-s-mult...
From my point of view there are two possible alternatives for content localization:
* Provide a system that permits the translation of every field of every table that is necessary. This system has the maximum flexibility: no changes are necessary when a new table / module is added. But the performance could be very poor.
* Make a copy of every node for every language necessary. But it could be a problem for new modules and their multilanguage support.
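The two alternatives above could be sketched roughly as follows. This is only an illustration using in-memory arrays in place of database tables; the key layout, the `set_id` column, and both function names are assumptions made for the example, not part of Drupal or Midgard.

```php
<?php
// Hypothetical stand-ins for database tables; every name here is illustrative.

// Alternative 1: one generic store of field translations,
// keyed by table name, row id, field name and language.
$nodes = array(
  1 => array('title' => 'Hello world', 'body' => 'Some text'),
);
$field_translations = array(
  'node:1:title:nl' => 'Hallo wereld',
);

function field_level_lookup($nodes, $field_translations, $nid, $field, $lang) {
  $key = "node:$nid:$field:$lang";
  // Fall back to the original value when no translation exists.
  return isset($field_translations[$key]) ? $field_translations[$key] : $nodes[$nid][$field];
}

// Alternative 2: a full copy of the node per language,
// with the copies linked by a shared (hypothetical) set_id.
$node_copies = array(
  array('nid' => 1, 'set_id' => 1, 'lang' => 'en', 'title' => 'Hello world'),
  array('nid' => 2, 'set_id' => 1, 'lang' => 'nl', 'title' => 'Hallo wereld'),
);

function node_copy_lookup($node_copies, $set_id, $lang) {
  foreach ($node_copies as $node) {
    if ($node['set_id'] == $set_id && $node['lang'] == $lang) {
      return $node;
    }
  }
  // No copy exists for this language.
  return NULL;
}
```

In the first variant every field read needs an extra lookup, which is where the performance worry comes from; in the second, each module has to know that several nodes can belong to one translation set.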
Gettext: I think that gettext is better suited to interface localization than to content.
--
http://www.speedtech.it
Internationalization of OpenACS / dotLRN
I think OpenACS has incredible i18n capabilities. BTW, dotLRN is the main distro that added many features, but core is the same as OpenACS.
Internationalization has been achieved for the UI, not for content (yet).
Note: The demo site will be renewed daily with a fresh checkout from CVS, AFAIK. It may happen that the server setup is corrupted. Then just wait a day or more. (Btw, it may also be in the process of the new checkout.)
Instructions to find i18n UI of the demo site :
I think it is very sophisticated. The problem lies in the number of active and good translators, the enormous number of message keys, and the lack of collaboration between translators. For example, translators should avoid using words like "you" in most languages. In Dutch "you" could be translated as "u" or "jij", depending on the audience. A collaboration tool for translators would be an important extension.
sorry
My mistake... I didn't see you are not revamping UI i18n, but looking for a multi-language content solution.... That is marvelous, however my post is less relevant in that respect. Somehow it may still be interesting.
Pear/cakephp/ror
There are a lot of internationalization packages in PEAR (http://pear.php.net/packages.php?catpid=28&catname=Internationalization), but these are not content management systems, only solutions for handling i18n. There is a solution for CakePHP that uses the PEAR Translation2 package: http://wiki.cakephp.org/tutorials:i18n_v2, and another without it: http://wiki.cakephp.org/tutorials:i18n, but I think this is not a good way.
And there is another solution in Ruby on Rails: http://ri18n.berlios.de/ but this is very similar to our t() function.
Plone has very nice i18n
As part of the CMS search I performed a year ago, I evaluated i18n functionality.
The best-scoring system, in my opinion, was plone.org.
They have it all: interface Localization, ML Content and ML Content Integration (translation).
Very nice is the content translation page where you can see the original and the translation side-by-side on one page.
Let's note that my evaluation was taken strictly from the end-user point of view (no code reviews).
CityDesk
Even though CityDesk isn't a web-based CMS, it is worthwhile looking at. The model they use makes it very easy to maintain a site in a number of languages.
I've developed a few sites in Swedish, English and Japanese using CityDesk and it is by far the easiest way to maintain multi-lingual sites that I have come across.
http://www.fogcreek.com/CityDesk/
Cheers,
Joakim
Translation Use-Cases, Actors and Feature Requests
Here is a use-case analysis based on my experience with the Hebrew translation of Drupal. I hope this is the right place to publish it, so that it will get the most attention.
Needed Translation Actors:
* UI Translator
* End-User (Surfer)
* Content Translator
* Webmaster (Translates the site-specific definitions)
* Drupal Developer
Needed Translation Use Cases:
==== Webmaster ====
* Webmaster (Translates the site-specific definitions) must be able to easily translate vocabulary names, taxonomy terms, categories, and profile field names, as well as site-specific properties like default email messages.
* Webmaster should be able to translate the name of a taxonomy vocabulary.
* Webmaster should be able to translate the name of a CCK field.
* Webmaster should be able to translate the name of a profile field.
* Webmaster should be able to Bulk-translate taxonomy terms in different languages.
* Webmaster (and End-users) must be able to easily see which content was already translated and which content still needs translation.
==== UI Translator ====
* UI Translator must be able to get translations from other translators, review them, approve them, and upload them to the CVS (or to an Online Translation website).
* Modify an existing module translation to upgrade it to a new version.
* UI Translator should be able to easily upgrade a translated module POT file with all the new strings from the new version of the module. It should be possible to fetch all new strings from a module and merge them with the current translation.
* UI Translator should be able to CHANGE the translation of a single word across the site. For all the strings which contain the word, the translator must see the ORIGINAL sentence and the translated string TOGETHER, in context, in order to decide on the best new translation.
* UI Translator should still be able to do offline translation, then merge it back.
* UI Translator should be able to merge several different translations and view a DIFF between them. Decide which translation to approve and which translation to deny.
* UI Translator should be able to easily find the name of a settings field (e.g. site mission) in order to know how to define it.
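The upgrade use case in the list above (fetch the strings of a new module version and merge them with the current translation) might look roughly like this. It is a minimal sketch on plain PHP arrays, not a PO file parser; `merge_translation()` and `untranslated()` are hypothetical names, and for real PO files the gettext msgmerge tool does this job, including fuzzy matching that this sketch leaves out.

```php
<?php
// $template: source strings (msgids) extracted from the new module version.
// $old: the existing translation, as msgid => msgstr.
function merge_translation(array $template, array $old) {
  $merged = array();
  foreach ($template as $msgid) {
    // Keep an existing translation; new strings start out empty (untranslated).
    // Strings that no longer appear in the template are dropped.
    $merged[$msgid] = isset($old[$msgid]) ? $old[$msgid] : '';
  }
  return $merged;
}

// List the strings that still need a translator's attention.
function untranslated(array $merged) {
  return array_keys($merged, '', TRUE);
}

$template = array('Home', 'Log in', 'Search');
$old = array('Home' => 'Start', 'Log out' => 'Afmelden');
$merged = merge_translation($template, $old);
```

The same merged array also answers the "what still needs translation" question from the Webmaster list, since every empty value marks an untranslated string.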
==== End User ====
* End user should be able to view taxonomy terms in a selected language only.
* End user should be able to view a certain taxonomy hierarchy embedded in the menu.
==== Modules Developer ====
* Modules Developer must be able to use multilingual features automatically, built into the written code.
Missing Features:
* Language direction (ltr, rtl, or ttb) must be embedded in the lang table in core!
* Themes should be able to display an item correctly according to its direction.
* Templates should merge misc/drupal-rtl.css (after the normal misc/drupal.css). drupal-rtl.css should only contain the missing definitions. Merging should only be done if the current i18n language is RTL. The same goes for drupal-ttb.css.
* Vocabulary names must be translated. The same goes for CCK fields, profile fields, and other definitions.
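The direction-related requests above could be sketched like this, assuming a language table that carries a direction value. The `$languages` array and `stylesheets_for_language()` are hypothetical; only the stylesheet paths come from the list above.

```php
<?php
// Hypothetical language table with a direction column, as requested above.
$languages = array(
  'en' => array('name' => 'English', 'direction' => 'ltr'),
  'he' => array('name' => 'Hebrew', 'direction' => 'rtl'),
);

function stylesheets_for_language(array $languages, $lang) {
  // The normal stylesheet is always loaded first.
  $sheets = array('misc/drupal.css');
  // Unknown languages are treated as left-to-right.
  $direction = isset($languages[$lang]) ? $languages[$lang]['direction'] : 'ltr';
  if ($direction != 'ltr') {
    // The override sheet comes after the base sheet so its rules win,
    // which lets it contain only the definitions that differ.
    $sheets[] = "misc/drupal-$direction.css";
  }
  return $sheets;
}
```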
Known Bugs:
* There is a bug with the captured translation URL: it must subtract the base path from the recorded path. For example: http://www.example.com/~levavie/admin/settings must be recorded as admin/settings.
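A fix along these lines would strip the base path before the path is recorded. This is a minimal sketch, not the actual code in Drupal; `strip_base_path()` is a hypothetical helper, and it assumes the request path and base path are both available as plain strings.

```php
<?php
function strip_base_path($request_path, $base_path) {
  // Normalise both values so '/~levavie' and '/~levavie/' behave the same.
  $base = '/' . trim($base_path, '/');
  $path = '/' . ltrim($request_path, '/');
  // Only strip when the base path matches a whole leading path segment.
  if ($base != '/' && ($path == $base || strpos($path, $base . '/') === 0)) {
    $path = substr($path, strlen($base));
  }
  return trim($path, '/');
}

$recorded = strip_base_path('/~levavie/admin/settings', '/~levavie/');
```

Matching on a whole segment matters: a site installed under /~levavie should not strip anything from a path that starts with /~levavie2.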
known bugs?
Thanks for your opinion and tips. RTL support is clearly important. But could you please submit bug reports in the bug system (or search for the bug if one exists for this issue)? Thanks!
Indymedia Collaborative Translation & Transcription
http://www.indymedia.org/en/index.shtml
Indymedia's affiliated newswires are in many ways the grand daddy of RSS CMS news portals like Drupal. It manages lots of affiliates in different countries. People from different countries can contribute translations. They also often have transcripts of videos and audio submitted by users for those without A/V on their machine. I don't know how it is implemented exactly but community translation and transcription are two things I'd like to see enabled as options.
My use case
I'm trying it
http://johan.vanherreweghe.com
I'll be glad to help.
Bart Van Herreweghe
Views and Translation
We are currently developing a Canadian bilingual site, with all content, including blogs, translated. The site is at reducefees.catalystinternet.ca (it is not even beta yet). We have come to use the Views module extensively and, unfortunately, it does not work cleanly with i18n. Here are some comments about i18n and Drupal based on my user experience:
We are contemplating writing a hack in the next few days to deal with the views module (we have a couple of other multilingual sites in our near future) but would wait for 5.0's release if this will be taken care of in core.
Thanks for working on this. Drupal's usability will elevate greatly with integrated internationalization.
help is appreciated
If you have some ideas about how to make Views i18n compatible, then I bet the Views/i18n maintainers will be happy to hear about it.
With your point on views -
With your point on views - if CCK worked with i18n module, you could use viewfield (http://drupal.org/project/viewfield) and then your header, footer, and title would just be additional fields and title on the node, and you have a viewfield on the CCK which will pull in your view.
I'm not sure which approach is better for this specific case...
A generic approach
Hi there,
For a couple of years I have been developing a framework that can be used for PHP-based sites (not only CMSs), and I'm currently using it with success. Like everyone else, I needed multilingual support, so I came up with a partially custom i18n solution:
Texts are split into static and dynamic:
The idea of the dynamic translations is that the real text and the 'label' of it are different things (just like gettext does). For example, my name is 'Stefanos' but this is only the English representation of it, so the user database includes a pointer to the translations table that holds the translations of my name, even though I'm entering it using the interface. This means that the users table has an INTEGER column (which points to the translations) to hold the name instead of a VARCHAR.
To accomplish this a modifiable hierarchy is required to hold the data. This can be done using this table (postgresql):
CREATE SEQUENCE seq_i18n_str_hierarchy;
CREATE TABLE i18n_str_hierarchy (
  id INTEGER PRIMARY KEY DEFAULT NEXTVAL('seq_i18n_str_hierarchy'),
  parentid INTEGER REFERENCES i18n_str_hierarchy(id) ON DELETE CASCADE ON UPDATE CASCADE,
  k VARCHAR,
  v VARCHAR,
  UNIQUE(parentid, k, v)
) WITHOUT OIDS;

This table can hold any hierarchy for every possible module.
sample data are:
This is a hierarchy under node '92': 92->persons->firstname (for example: module1->submodule2->persons->firstname)
So, person with id 4 has the name that is defined by entries with id 153.
Notice that we need a key/value pair so that hierarchies can be relative to user ids etc. If you think about it, you'll find that this can hold every sort of information in a well-organized tree that can be viewed by a generic translation page and will not allow conflicts between modules.
The translations are held in two separate tables:
--
-- Original strings to be translated
--
CREATE SEQUENCE seq_i18n_str;
CREATE TABLE i18n_str (
  id INTEGER PRIMARY KEY REFERENCES i18n_str_hierarchy(id) ON DELETE CASCADE ON UPDATE CASCADE,
  str TEXT,
  t_change TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) WITHOUT OIDS;

--
-- Translated strings
--
CREATE TABLE i18n_str_tr (
  id INTEGER REFERENCES i18n_str(id) ON DELETE CASCADE ON UPDATE CASCADE,
  lang VARCHAR,
  str_tr TEXT,
  t_change TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY(id, lang)
) WITHOUT OIDS;

The original string is held in i18n_str:
and the translated strings in i18n_str_tr:
Having the lang as a string instead of a reference to another table was a choice I made because I believe that translated content is data that should not depend on the installation.
After that each module that needs to hold translations does something like this (in a table named test):
firstname INTEGER NOT NULL REFERENCES i18n_str(id) ON DELETE RESTRICT ON UPDATE CASCADE,

We need one more thing:

SELECT func_i18n_add_trigger('test', 'firstname');

This will add an on delete trigger to the 'test' table so that whenever an entry is removed the translations of it will also be removed.
By providing a set of i18n helper functions, the modules are able to retrieve the translations from the database with minimum coding. An array like this does the job:
<?php
$t=array('tei'=>NULL, 'persons'=>'@id', 'address'=>NULL);
$data=array('id'=>1);
$v=array('en'=>'asdf', 'el'=>'ασδφ');
// Set translations
$t_hid=i18n_form_hid($t, $data);
$t_id=i18nAddHid($t_hid);
i18nSetTranslationsById($t_id, $v);
// Retrieve translations
$t_hid=i18n_form_hid($t, $data);
$t_id=i18nGetIdByHid($t_hid);
// and then:
$str=i18nGetTranslatedStringById($t_id);
// or:
$str=i18nGetTranslatedStringByIdOnly($t_id, 'en');
?>
i18nGetTranslatedStringById() is a best effort function that returns the translated string in the current (or a specified) language, or a default string (predefined language, string-id, whatever) if the translation doesn't exist. i18nGetTranslatedStringByIdOnly() searches only for a language and returns null if the translation is not found.
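The difference between the two lookups can be illustrated with a small sketch. This is not the framework's code: the arrays stand in for the i18n_str and i18n_str_tr tables, the function names are hypothetical, and the fallback order (requested language, then a default language, then the original string, then the string id) is one assumed reading of "best effort".

```php
<?php
// Hypothetical in-memory stand-ins for i18n_str and i18n_str_tr.
$strings = array(153 => 'asdf', 154 => 'qwer');
$translations = array(
  153 => array('en' => 'asdf', 'el' => 'ασδφ'),
  154 => array(),
);

// Strict lookup: one language only, NULL when the translation is missing.
function translated_string_only($translations, $id, $lang) {
  return isset($translations[$id][$lang]) ? $translations[$id][$lang] : NULL;
}

// Best-effort lookup: always returns something displayable.
function translated_string($strings, $translations, $id, $lang, $default_lang = 'en') {
  foreach (array($lang, $default_lang) as $candidate) {
    $str = translated_string_only($translations, $id, $candidate);
    if ($str !== NULL) {
      return $str;
    }
  }
  // Assumed fallbacks: the original string, then a marker built from the id.
  return isset($strings[$id]) ? $strings[$id] : "#$id";
}
```

The strict variant suits an editing screen, where a missing translation has to be visible as missing; the best-effort variant suits page output.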
Of course the retrieval is a lot easier since the translated string id is held in the table and can be fetched without using the hierarchy any more:
<?php
// After we fetch a record from table 'test' in $rec:
$address_str=i18nGetTranslatedStringById($rec['address']);
?>
There are some more things like metadata to identify the translated strings type (one-line text, pages etc) for easier editing but this is too much for a simple post.
I hope it helps...
Oh and one more thing... Personally I'm trying to provide as many translatable strings as possible. I currently enclose in {t}{/t} everything that is translatable and this includes the summary attribute of tables too. I believe that you should do it too since there are people out there that rely on summaries and other information like it and need the translations more than the rest of us...
dynamic strings
We already have a pretty stable static string translation mechanism via PO files, and we already wrap every possible static string in a t() call, so this is not the area we are looking to improve (although if you have better ideas, we are open to them).
Our open question is how we should handle dynamic strings. The main problem is that a user who does not want i18n functionality should not get noticeably degraded performance just because we are grabbing the dynamic strings with a JOIN of (at least) three tables. Do you have any performance tests comparing your solution with a simple cached varchar value in the original table?
There are two things that I
There are two things that I can say:
I'm sorry that I don't have any benchmarks but I believe that the overhead of one read (or join) is reasonable and low.
As for the static strings, we're doing the same thing and I must confess that I got the idea of the .po files from drupal :-) and of course I liked gettext (and standards)...