I was thinking through dynamic variable translation with Konstantin Kaefer yesterday, and some good ideas came to me after I had gone off IRC. This could solve variables, content types, menus, taxonomies, blocks and user profiles, as long as we add dynamic string based translation to these (which will be possible to turn off by type :).
- We cannot reuse the locale functionality, because it is designed for "static English string to static foreign language string translation" from the ground up: http://groups.drupal.org/node/1827
- We cannot use a simple t() like function either, as we need special widgets with validation and submit functions for some values (like the site logo or default user picture).
- We cannot reuse the admin interfaces (menu, taxonomy, content type setup, settings pages) as localizer does, because many sites will not give administrator, menu editor, taxonomy editor, content type admin or site setting admin permissions to translators. BUT we need to reuse some forms from there.
- Dries would like to see a simple reusable API anyway :)
So, we need to somehow get together what is good in t() and what is missing from t(), as well as provide a specialized UI for translation of these strings, a one-stop-shop for dynamic interface translation for the user.
Let's start with the API, it is easier :)
<?php
// Simple string translation with textfield providing a default value
dt('menu item', 2, 'Our products');
dt('taxonomy term', 16, 'multilanguage');

// Slightly more complex, we need a textarea, not a textfield
dt('variable', 'site_mission', 'Our mission is...', 'textarea');

// Slightly more complex, we need a textarea with some
// documentation about tokens used as placeholders and such.
// Note that instead of just a form item type, we provide a
// full FAPI array
dt(
  'variable',
  'user_welcome_mail',
  'Dear %user...',
  array(
    'user_welcome_mail' => array(
      '#type' => 'textarea',
      '#description' => t('Use placeholders like...'),
    )
  )
);

// Even more complex, we need validation and submission,
// so provide these in the FAPI array.
dt(
  'variable',
  'site_logo',
  'mysitelogo.jpg',
  array(
    'site_logo' => array(
      '#type' => 'file',
      ...
    ),
    '#validate' => ...,
    '#submit' => ...,
  )
);
?>
Note that we could possibly provide a generic dynamic translation layer for variables, menus, taxonomies, user profile options and blocks with this system.
But how do we make it extensible? Contrib might have more stuff to translate on top of these basic core Drupal objects. Well, the first parameter of dt() is a 'domain' (in the sense of how GNU gettext uses this term), identifying what domain the value is in to translate. Contrib modules can reuse the domains used by core (when adding more variables for example), or use new domains themselves.
But how do we make this user friendly? Well, we should not throw the whole list of translatable stuff at the user at once, so I say we can either provide a filter based on domains or tabs for domains on the translation interface (in core we would have at most 5 domains). Well, these tabs need to have user friendly titles, so 'variable' is not suitable there. Here we connect to how extensibility can be implemented.
<?php
/**
 * Implementation of hook_dt_domains().
 */
function system_dt_domains() {
  return array('variable' => t('Configuration'), 'menu item' => t('Menu item'), ...);
}
?>
This provides a human readable list of names for domains provided by the actual module.
But what if Jose would like to add taxonomy term and menu item translation with standalone taxonomy terms and menu items, not dynamic string translation? Well, we have our domains handy, so we can easily provide an interface to select what stuff actually gets translated by this API. The user can be provided a selection list, where he or she can check off some of the domains he or she is not interested in translating with dynamic string translation (and translate with some other means, not included in Drupal core).
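To make the opt-out idea a bit more concrete, here is a minimal standalone sketch. Everything in it is an assumption for illustration only: example_dt_domains() stands in for the merged result of all hook_dt_domains() implementations, and example_dt_enabled_domains() with its plain array of disabled keys is a hypothetical helper, not an existing Drupal function.

```php
<?php
// Hypothetical stand-in for the merged return values of every module's
// hook_dt_domains(). Real code would run the labels through t(); plain
// strings keep this sketch standalone.
function example_dt_domains() {
  return array(
    'variable' => 'Configuration',
    'menu item' => 'Menu item',
    'taxonomy term' => 'Taxonomy term',
  );
}

// Drop the domains the site admin opted out of, so that another module
// can translate those objects by other means.
function example_dt_enabled_domains(array $domains, array $disabled) {
  return array_diff_key($domains, array_flip($disabled));
}

$enabled = example_dt_enabled_domains(example_dt_domains(), array('taxonomy term'));
print implode(', ', array_keys($enabled)) . "\n";
```

The translation interface would then only build tabs (or filter options) for the domains that remain in $enabled.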
OK, this way the system is both extensible and restrictable. We can assemble translation forms for dt() values based on the default textfield, or the type of form item requested in the string parameter or the form chunk provided in the FAPI array. By default, most values are simply textfields, some are textareas, and there are a few exceptions, where custom upload widgets and validation are needed. Hopefully the above API is scalable enough for those needs.
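The three cases (default textfield, a form item type given as a string, a full form chunk) could be resolved roughly like this. This is only a sketch under assumptions: example_dt_form_element() is a hypothetical helper name, and the arrays merely imitate the shape of Form API elements without needing Drupal.

```php
<?php
// Build a Form API style element for one translatable value.
// $widget may be NULL (default textfield), a string naming a form item
// type, or a full form chunk supplied by the module.
function example_dt_form_element($key, $default, $widget = NULL) {
  if (is_array($widget)) {
    // Full form chunk: use it as provided by the module.
    return $widget;
  }
  return array(
    $key => array(
      '#type' => is_string($widget) ? $widget : 'textfield',
      '#default_value' => $default,
    ),
  );
}

$simple = example_dt_form_element('site_name', 'Drupal');
$area = example_dt_form_element('site_mission', 'Our mission is...', 'textarea');
$custom = example_dt_form_element('site_logo', 'mysitelogo.jpg', array(
  'site_logo' => array('#type' => 'file'),
));
print $simple['site_name']['#type'] . ' ' . $area['site_mission']['#type'] . ' ' . $custom['site_logo']['#type'] . "\n";
```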
Now for the harder part: how do we implement the above API? Well, my API examples above actually mix two things: providing a domain name, a domain key and a default value AND providing information about the widget which should be presented to edit the value. We need both of these to present a UI to the user to edit the value, but not to grab a previously stored value from the database and use it on the site. It would be a show stopper if developers had to include the UI widget information at all places where the value is used. So we should decouple the UI definition from the API used to call dt(). This could be done with a new hook:
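A rough standalone sketch of what that decoupling means in practice: the lookup needs only domain, key and default, while the widget is asked for only by the translation UI. All names here (example_dt_lookup(), example_dt_widget(), the two arrays, the $langcode parameter) are hypothetical and only illustrate the split; storage is faked with arrays.

```php
<?php
// Hypothetical in-memory stand-ins for stored translations and for the
// widget definitions that a separate hook might collect.
$example_translations = array(
  'hu' => array('variable:site_mission' => 'Küldetésünk...'),
);
$example_widgets = array(
  'variable:site_mission' => array('#type' => 'textarea'),
);

// Lookup used wherever the value is displayed: no widget information.
function example_dt_lookup($domain, $key, $default, $langcode, array $translations) {
  $id = $domain . ':' . $key;
  return isset($translations[$langcode][$id]) ? $translations[$langcode][$id] : $default;
}

// Only the translation UI asks for the widget, defaulting to a textfield.
function example_dt_widget($domain, $key, array $widgets) {
  $id = $domain . ':' . $key;
  return isset($widgets[$id]) ? $widgets[$id] : array('#type' => 'textfield');
}

$mission = example_dt_lookup('variable', 'site_mission', 'Our mission is...', 'hu', $example_translations);
$menu = example_dt_lookup('menu item', 2, 'Our products', 'hu', $example_translations);
$widget = example_dt_widget('menu item', 2, $example_widgets);
print $mission . ' / ' . $menu . ' / ' . $widget['#type'] . "\n";
```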
<?php
/**
 * Implementation of hook_dt_forms().
 *
 * Provides form API chunks for all special dt() values
 * defined in this module.
 *
 * OR alternatively we can reuse form_alter here, but with some creative thinking,
 * as we don't need all form chunks for all values on the same page, we need paging
 * to flip through them all (or a next-next-next... kind of form with editing for one value
 * on every page load, in which case it gets quite easy to reuse form_alter here!).
 */
function system_dt_forms() {
  return array(array(
    'variable',
    'site_logo',
    array(
      'site_logo' => array(
        '#type' => 'file',
        ...
      ),
      '#validate' => ...,
      '#submit' => ...,
    )
  ));
}
?>
This is all nice and good (or prove me wrong if not :), but there are three unresolved issues still:
- How do we collect the translation base values for dt()? Just as we go along, like t() or variable_get/_set() does? We won't have variable values for translation whose default values are acceptable for the base language... Come out with your good ideas!
- How do we store the translated values? Interestingly, we previously had a good storage strategy for variables but no UI; now we have the UI, but an extended API applicable to more than just variables, so storage is a question again. We can model our tables on the locale module's tables: we need a source and a target table, the source storing the domain and the key with the default value, and the target storing values for the translations. We could even (oh the horror) look into reusing the locale tables altogether, as we have most things needed already there (but adding a flag to signify dynamic vs. static strings). That way we could even get PO import and export for the dt() values as a sweet topping :) We also need to somehow work out an in-memory cache of the values, especially variables, used on every page (we already have an in-memory cache for variables in our i18n SVN repository, but not for other values).
- What about database level translation, so we can do SELECTs whose results are directly in the requested language? For that to work, we would need to be able to join the translation table to all possible object tables (which would require these objects to all be in some table, having a unique id, whose type is the same as our translation table id for every object). I am not sure we are about to solve this. If all the above are solved, that is still a huge step. Anyway, if you have ideas in this department, do not stop yourself from shouting up!
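For the storage question, the source/target split plus an in-memory cache could look roughly like the following. This is a sketch with made-up data: the row layouts, the column names (sid, domain, key, default, language, translation) and example_dt_build_cache() are all assumptions, and arrays stand in for the database tables.

```php
<?php
// Rows as they might look in a hypothetical source table: one row per
// translatable value, identified by domain and key.
$source = array(
  1 => array('domain' => 'variable', 'key' => 'site_name', 'default' => 'My site'),
  2 => array('domain' => 'menu item', 'key' => '2', 'default' => 'Our products'),
);
// Rows of a hypothetical target table: translations keyed by source id.
$target = array(
  array('sid' => 2, 'language' => 'hu', 'translation' => 'Termékeink'),
);

// Build an in-memory cache for one language in a single pass, so a page
// request does not need one query per value.
function example_dt_build_cache(array $source, array $target, $langcode) {
  $cache = array();
  // Start from the defaults of every source row.
  foreach ($source as $sid => $row) {
    $cache[$row['domain']][$row['key']] = $row['default'];
  }
  // Overwrite with translations available in the requested language.
  foreach ($target as $row) {
    if ($row['language'] === $langcode && isset($source[$row['sid']])) {
      $s = $source[$row['sid']];
      $cache[$s['domain']][$s['key']] = $row['translation'];
    }
  }
  return $cache;
}

$cache = example_dt_build_cache($source, $target, 'hu');
print $cache['menu item']['2'] . ' / ' . $cache['variable']['site_name'] . "\n";
```

Values without a translation simply keep their default, which matches the single language case where the target table stays empty.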
Finally I hope that my writeup was clear enough to start some active discussion and get a good solution into our i18n SVN repository and then to Drupal 6 as soon as possible. Come with your criticism and ideas!

Comments
Yes, looks good. Actually, I
Yes, looks good. Actually, I see it quite similar to the Idea Roberto and I were outlining here, http://drupal.org/node/133745
Though it also has some nice new concepts, like the 'domain' idea and the related hooks, so we can have the modules providing some information about their translatable variables, which will be useful to provide some translation interface. And we were missing the replaceable parameters issue.
But we need also a language parameter here, to be able to provide messages localized in more than one language, like for sending emails on cron.
I'd say we better collect variables as we go, like t. Something like this is implemented -roughly- in i18n package, i18nstrings module...
The other option is having the modules declare everything, which would result in some 'monster' hook I don't like that much.
I think we better do some 'dt' specific storage, while we still can store values in the current tables for a default language. This will mean better performance for the case of running with a single language.
About variables, I think what is currently implemented is better for that case, so variables still be some quickly accessible data with no performance penalty.
We can make it possible, but not mandatory. And this is again about performance for single language sites, so maybe we wouldn't like to be always joining language tables everywhere.
Also, when loading some objects -I.e. taxonomy terms- which may have 2 or even 3 translatable texts this may result in multiple joins or in more complex schemas -object specific translation tables-, which also will make everything more complex.
Again please, see http://drupal.org/node/133745, where we have some more ideas about keeping the function simple, while allowing to pass on more than one variable which will be used as index field to retrieve the text, like 'content_type:story:name' parameters.
Yes, actually I'd like to :-). This really depends on how much we want to achieve.
So, do we want only 'translatable terms', same for all languages, or do we also want to be able to define completely different terms per language?
The thing is I'd like to have both, they can exist together, but anyway it will depend on how much we can get into next Drupal version, so let's start coding something... :-)
About blocks, I really think the approach of having a language field there is much more flexible than running block texts through this. Also in this case, there's really no point in having 'block translations' but the really useful feature would be having blocks per language. Please, see http://drupal.org/node/135464
So, while we definitely need some of this mechanism implemented, I don't really think this is the solution for everything. I rather have it working, so we can use it for some cases, but some specific solutions would be better for others -like blocks, variables, maybe taxonomy terms, etc...
we are on the same track!
Excuse me, maybe I did not indicate your prior art clearly. I was thinking hard about a user interface for variables, and this came to me, so I figure I'd better hash in some concepts to get it to a suitable state for all of us. (Posting it in the group and not a mailing list post or issue follow up was solely based on visibility, we need more people to review higher level concepts such as this).
I gave more thought to dt() specific storage in recent hours, and figured we'd better see the current locale interface tables here:
Now, what we have already is a storage for source strings with an id and translations for that string. What would we need for dt() storage? Well, storing string ids, a default string (from which the translations are created) and translations for every language. Huh, just as we have already in the locale tables! You and Roberto expressed your desire to use semi-structured string keys (like cache table keys) for identification. This just fits nicely into the locales_source table's location field. What we would need to do here however is to decouple the source code based and the dynamic translations. Text domains help here. We add a domain column to the locales_source table, which would contain "interface" for currently stored static strings, and would contain "menu", "taxonomy", "block", "content type" and so on for new domains. Reusing the locale storage brings us a lot of good stuff:
- import/export in gettext (yes, we can export the structured key as PO location comments, and get that back into the database with existing tools)
- existing search and review screens are already there, but probably need some small improvement
- users will easily get onto the new translation screens
We should be careful here however and should look into what types of queries we need on this table. Substring matching is not as speedy as exact string matching in any SQL database, so if we need a lot of substring matching, maybe we should still look into breaking up some fields (but that might still be possible in the locales tables, which should be renamed btw :)
Your comments about SQL level solutions are right on, I don't see good solutions either.
Jose, I see you are about to implement taxonomy, menu and block translation with different object instances, but as far as I see (code freeze is approaching in around a month) it is not feasible for Drupal 6. And at the same time, Roberto and others repeatedly point out that string level translation is many times better suited for translators (with which we can also cut through the permission maze much easier). As Roberto pointed out at one of the two issues you pointed to: "In every case, by the users feedback, it seems that many times you need to translate only the block title and not the body."
Anyway, the domains are there to be able to disable this functionality per domain. So you can decide to implement any of the domains with different implementation (taxonomies, blocks, menus, whatever you want), but a simple solution would be provided by Drupal 6 out of the box. That would fit with Roberto's user experience impressions and would provide a complete solution set which is flexible enough to fit different needs too.
Yes, I think we are
Yes, I think we are basically on the same page.
This sounds like a very good idea to me.
I was talking about using domains and subdomains for parameters, but we can split them before doing the actual query. This will allow also some smart caching for some kind of objects.
function dt($stringid /* , ... */) {
  // Where stringid is like domain:field:id, to simplify parameters
  list($domain, $field, $id) = explode(':', $stringid);
  // Then we search for strings using these three keys and we can pre-fetch
  // all the strings for the same domain and field and have them cached.
}
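Filled in a bit further, the pre-fetch idea might look like this standalone sketch. It assumes the domain:field:id order used above; example_load_strings() and example_dt_by_id() are hypothetical names, and the loader returns made-up sample data where real code would run one query per domain and field pair.

```php
<?php
// Hypothetical loader standing in for one database query that fetches
// every string of a domain/field pair at once.
function example_load_strings($domain, $field) {
  $data = array(
    'content_type:name' => array('story' => 'Historia', 'page' => 'Página'),
  );
  $group = $domain . ':' . $field;
  return isset($data[$group]) ? $data[$group] : array();
}

function example_dt_by_id($stringid, $default) {
  static $cache = array();
  // Limit to three parts so an id that itself contains ':' stays intact.
  list($domain, $field, $id) = explode(':', $stringid, 3);
  $group = $domain . ':' . $field;
  if (!isset($cache[$group])) {
    // First request for this group: pre-fetch all of its strings.
    $cache[$group] = example_load_strings($domain, $field);
  }
  return isset($cache[$group][$id]) ? $cache[$group][$id] : $default;
}

print example_dt_by_id('content_type:name:story', 'Story') . "\n";
print example_dt_by_id('content_type:field:missing', 'Fallback') . "\n";
```

The second call for the same domain and field never reaches the loader again, which is where the caching pays off for objects loaded many times per page.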
I agree the main priority at this point is the string translations. So I guess I'd be happy enough just wrapping the blocks query in 'db_rewrite_sql', just to add some options... and avoid dirty hacks in the contrib module I'll add later for that :-)
But anyway, about the blocks, you'll also want full block translation for static text blocks. And for menu items the patch would be quite simple and would allow different items per language, which is one feature I think it's important enough.
A mixed approach
I think, that, perhaps a mixed approach could guarantee more flexibility.
It would be important to have dt used in all core modules : menu, taxonomy, block, and so on.
(For example, right now, translating single menu items with an external module is very complex and not
all features can work. Perhaps the menu structure permits this ... I don't know.
For example, the expansion mechanism doesn't work with cache support in Drupal 5.
The problem is that we change the link of the menu on the fly.
I think it is another important issue that we should consider.)
Then we should introduce db_rewrite_sql wrapper everywhere it could be useful.
In this manner we should be able to add additional features to these modules in contrib modules.
For example, it could be useful to add not only a single language for which a block should be shown, but a list of
languages. The same is valid for taxonomy and menus and menu items, I think.
--
http://www.speedtech.it
develop, get feedback, polish, get into core
Well, there are a lot of things which would be nice, but we need to go through "develop, get feedback, polish, get into core" for all of them, so look into what we can do in a month, and do that nicely. A lot of features rushed into Drupal do not help our cause much. The code freeze is here in five weeks. We should try to implement a lot of extension points, not adding everything into core necessarily. IMHO we should be closer to thinking hard about "what else we need in contrib, and how to support that in core", rather than "what else needs to be put into core".
hook_variable_defaults
Ok, so let's take anonymous, which defaults to Anonymous, as an example. The user is viewing the English version of the site information settings page and wants to keep that value, so he or she doesn't submit the page. When the form is built, we variable_get() the default value: variable_get('anonymous', t('Anonymous')). The problem is that in hook_dt_forms() we would need to duplicate that definition in some way.
This problem could be solved by adding a hook_variable_defaults():
<?php
function system_variable_defaults($localizable = FALSE) {
  if ($localizable) {
    return array(
      'site_name' => 'Drupal',
      'anonymous' => 'Anonymous',
      'site_logo' => '',
    );
  }
  else {
    return array(
      'site_mail' => ini_get('sendmail_from'),
      'site_frontpage' => 'node',
    );
  }
}
?>
Now we can query system_variable_defaults(TRUE) to obtain a list of localizable variables and their default values. variable_get caches the result and looks up the default values if it doesn't find the appropriate value instead of returning the value from the function call. variable_get first looks in the localizable storage, then in the non-localized array to get default values. That allows modules to make previously unlocalizable values localizable by returning them in the $localizable = TRUE branch. We could even go so far and display a list with available variables to the user and let him or her choose what variables to localize. I don't think this would result in a "monster hook" as José stated because a module usually doesn't have more than 20 variables.
This central place for variable defaults would also save us from having different default values in different places (I recall the switch from bluemarine to Garland...).
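The lookup order described here could be sketched as follows. This is only an illustration under assumptions: example_variable_defaults() and example_variable_get() are hypothetical names, stored values are passed in as a plain array instead of coming from the variable table, and the $translate callback stands in for t() with a one-entry sample map.

```php
<?php
// Hypothetical module hook following the hook_variable_defaults() idea.
function example_variable_defaults($localizable = FALSE) {
  if ($localizable) {
    return array('site_name' => 'Drupal', 'anonymous' => 'Anonymous');
  }
  return array('site_frontpage' => 'node');
}

// A stored value wins; otherwise check the localizable defaults first,
// then the non-localized ones. $translate stands in for t().
function example_variable_get($name, array $stored, $translate) {
  if (isset($stored[$name])) {
    return $stored[$name];
  }
  $localizable = example_variable_defaults(TRUE);
  if (isset($localizable[$name])) {
    return call_user_func($translate, $localizable[$name]);
  }
  $plain = example_variable_defaults(FALSE);
  return isset($plain[$name]) ? $plain[$name] : NULL;
}

// Sample translation callback with made-up data.
$to_hungarian = function ($string) {
  $map = array('Anonymous' => 'Névtelen');
  return isset($map[$string]) ? $map[$string] : $string;
};

print example_variable_get('anonymous', array(), $to_hungarian) . "\n";
print example_variable_get('site_frontpage', array(), $to_hungarian) . "\n";
```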
When we return an unchanged localized variable in variable_get(), we pipe it through t() first; if we can find a localized value, then we simply return that one.
hook_settings()?
Good idea! We actually chalked up a similar concept to the drawing board (actually a big wrap paper :) in Budapest with Jose and Chx earlier this year. In fact default variable values would be nice to get centralized anyway, so this can be submitted as a standalone improvement. We still need the $localizable switch in the standard Drupal version, because those are what are run through t(), the others are not (if we think in how the current variable defaults system works). So all your above suggestions are good improvements for the current Drupal codebase :)
If you have some time, could you work on implementing such a patch? We have a lot of changes in the pipe already from our SVN repo, and mixing this in would be another big change to filter out when submitting a patch to Drupal.
BTW maybe we can use hook_settings() as a name, because all settings will be in there anyway, the hook is good to define a list of settings not only their defaults. Also hook_variable is still taken by themes as far as I remember (so hook_variable_defaults could be confused with being in connection to theme variables).
Missing: Locale Data
If I may pitch-in my two cents, there is one glaring hole in Drupal i18n: no support for Locale data.
Let me explain what I mean by Locale.
The way I understand internationalization, and arguably many would agree, i18n has two sides: Locale data and translations of custom phrases. Locale data is language/geography based, usually standardized by United Nations standards committee, diff. standards bodies in countries, different intl. consortiums and it rarely changes.
Locale contains information like: country names, language names, date formatting, currency names, currency formatting, number formatting, sorting and ordering etc. You CAN NOT and MAY NOT translate those using simple phrase-translation (e.g. t() function through PO files) unless you want to get yourself in a big mess. Locale is rules-based and can not be represented adequately by phrase translation.
I apologize if I was unable to find it but AFAIK there is no support for Locales in Drupal and if that is true - it is a huge problem for any serious localization effort.
Thankfully we are not in the position where we need to reinvent the wheel. As far as API goes - there is a lot done in Java that we could learn from. As far as standards go - God bless unicode.org's Common Locale Data Repository (CLDR - http://www.unicode.org/cldr/data/charts/summary/root.html) - it's a gold mine in its purest form.
Anybody?
.............................................
http://twitter.com/inadarei
would be great
Better locale functionality in Drupal would be great. What we will hopefully be able to do in Drupal 6 is piggy-backing on some existing translation functionality in Drupal for stuff like date formatting (we have already done that for language names in previous Drupal versions). Currency related stuff is not relevant for Drupal core itself, but contributed functions in the ecommerce field would greatly benefit from that. Sorting and ordering are entirely different, decidedly implemented on the database layer, so unless that layer supports correct sorting per locale, I am not sure that we can fulfill that need properly (MySQL's Hungarian collation support for example is far from correct).
We already have lots of stuff on our hands to do for Drupal 6, so unless someone comes around to implement CLDR, I don't think we can commit ourselves to it.
Separation?
Maybe I did not make myself clear enough...
Of course implementing the entire CLDR is a truly mammoth task. However, what really worries me is the lack of clear separation of Locale from String Translation in Drupal. We do need a simple, flexible API that supports Locale in all its shapes and forms.
Once we have that, I am sure many people will sign up to write implementations of the Locale for different locales and the work load will distribute in a truly communal manner.
If we do not agree for the need of a separate Locale API, then we are stuck. If we do but it's just a matter of time and effort - well, I raise my hand and hope that core team will have enough time to at least bump ideas off of you, guys :)
cheers
draft?
Well, unless you show how a locale API would be different from the current basic string translation, there is not much to agree on :) Maybe you have some draft ideas to show? I mean, like: "this is bad to implement with t()" vs. "this is how I would do it with a locale API".
I would suggest posting that as a new Translations group posting, as this topic did not deal with locales, until you got that topic going.
sure
:-) will do. thanks
How about an Interface layer?
I have translated a fairly large piece of software from English to another language, and have seen how the program was organised from within:
It had TWO text fields for its basic language:
That way, you point the program at the right basic language, and the program finds what needs to be displayed (in its original language) and displays the translated one.
This means that if you want a multilingual site, you design it in a base language, and then duplicate everything and this becomes the first language. Then, you just add translations which point to the base language, NOT the first. And so it goes on. You first make a base reference system (be it in English, as the core language of Drupal) and then make a translation system available that points to this reference.
If you do not have a base reference for, e.g., a category, then you must have a workflow to create the reference first, and then everything else. This is the best practice available.
StevenK
BTW, the software was Compiere ERP. The main developer had multilingual experience from being in charge of localization at SAP, and its translation was based on XML files (which I didn't like). Here's an example, from base language to English (the standard language - the displayed text is exactly the same as the base reference):
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Compiere(tm) Version 2.5.0/2003-06-16 - Smart ERP & CRM - Copyright (c) 1999-2003 Jorg Janke - 2.5.0 20030704-0200 - ComPiere,Inc. USA
-->
<compiereTrl language="en_US" table="AD_Process_Para">
  <row id="227" trl="Y">
    <value column="Name" original="Delete existing Accounting Entries">Delete existing Accounting Entries</value>
    <value column="Description" original="The selected accounting entries will be deleted! DANGEROUS !!!">The selected accounting entries will be deleted! DANGEROUS !!!</value>
    <value column="Help" original="" />
  </row>
  <row id="233" trl="Y">
    <value column="Name" original="Pricelist Version">Pricelist Version</value>
    <value column="Description" original="Only used if Price List is used to set future cost price">Only used if Price List is used to set future cost price</value>
    <value column="Help" original="" />
  </row>
  <row id="302" trl="Y">
    <value column="Name" original="Target Payment Rule">Target Payment Rule</value>
    <value column="Description" original="How you pay the invoice">How you pay the invoice</value>
    <value column="Help" original="The Payment Rule indicates the method of invoice payment.">The Payment Rule indicates the method of invoice payment.</value>
  </row>
</compiereTrl>
thanks for explaining
Thanks for explaining what we do with an eagle eye view. :) Using simple text fields is very tempting, and might be our first implementation in fact, but it is unfortunately not enough. Many times you are not only translating simple text, but you need placeholders and/or special form widgets.
Getting translation from a dynamic database node
Hi Guys,
Pardon me if I ask the wrong question here. Basically, I'm looking for some ideas similar to the above to tackle dynamic translation of data coming from database tables, e.g.:
I have tables like Company Sectors and Company Types ('Common', 'Public', 'Private', etc.) in MySQL.
So, if I were to display company types and lang=english is set, it should return:
Common
Public
Private
but, if lang=french, it should display corresponding eng->french conversion
e.g.
Fr-Common...
...
...
My question is how to do the above. Any ideas on how the MySQL structure holding the translation values etc. should look would be much appreciated.
I've been searching the web for sample code for this. I hope you can help here. Any guidance is much appreciated.
Many Thanks...