As a follow-up to Gabor's post, "String translation: why using t() for user specified text is evil", which I fully agree with, I'd like to introduce an idea about how we could have web-configurable translatable texts.
Instead of reworking all modules that provide web-configurable field names, descriptions, etc., we can handle these strings by giving each one a unique id, with minimal changes to the modules. How? Please read on.
Instead of storing these translatable texts in the main tables, we can store only a unique id there, like 'profile_field_title_xxx' or 'flexinode_field_description_xyz'.
These texts will be stored in a table like
string id
language
text
key(string id, language)
Then a function, which could very well be called 'tt()', returns the string based on string id and language. Quite trivial, I'd say.
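A minimal sketch of that lookup, assuming an in-memory array in place of the real table. The name tt_lookup() and the sample texts are made up for illustration; a real implementation would query the database instead.

```php
<?php
// Hypothetical in-memory stand-in for the proposed table:
// one text per (string id, language) pair, which together form the key.
// The texts are invented sample data.
$string_table = array(
  'profile_field_title_1' => array(
    'en' => 'Full name',
    'es' => 'Nombre completo',
  ),
  'profile_field_description_1' => array(
    'en' => 'Your first and last name.',
  ),
);

// Return the text stored for (string id, language), or NULL if there is none.
function tt_lookup(array $table, $string_id, $language) {
  return isset($table[$string_id][$language]) ? $table[$string_id][$language] : NULL;
}

echo tt_lookup($string_table, 'profile_field_title_1', 'es'), "\n";
```

What to do when no row exists (return NULL, fall back to another language, return the id) is left open here.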
And how can we easily rework all Drupal modules having web configurable texts to work with this?
Guess... Forms API
$form['mytranslatablefieldname'] = array(
  '#type' => 'textfield',
  ....
  '#localizable' => TRUE,
  '#string_id' => 'unique_string_id_generated_by_the_module',
  '#locale' => $locale, // language code here, which defaults to current locale
  ......
);
Then, an extended Forms API could
a) on form display, fill in the field with the current string for that language
b) on form submit, save the value in our table and replace the value with the string id (the module creating the form can save the string id or nothing, depending on each case)
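Steps a) and b) could be sketched roughly as follows. This is plain PHP over arrays, not real Forms API code: the function names localizable_form_prepare() and localizable_form_submit() are hypothetical, the string table is an in-memory array, and '#localizable', '#string_id' and '#locale' are the proposed properties, not existing ones.

```php
<?php
// a) On display: fill each localizable element with the stored text
// for its language (empty string when nothing is stored yet).
function localizable_form_prepare(array $form, array $table, $current_locale) {
  foreach ($form as $name => $element) {
    if (is_array($element) && !empty($element['#localizable'])) {
      $lang = isset($element['#locale']) ? $element['#locale'] : $current_locale;
      $id = $element['#string_id'];
      $form[$name]['#default_value'] = isset($table[$id][$lang]) ? $table[$id][$lang] : '';
    }
  }
  return $form;
}

// b) On submit: save the submitted text in the string table and hand
// the string id back to the module in place of the text.
function localizable_form_submit(array $form, array $values, array &$table, $current_locale) {
  foreach ($form as $name => $element) {
    if (is_array($element) && !empty($element['#localizable']) && isset($values[$name])) {
      $lang = isset($element['#locale']) ? $element['#locale'] : $current_locale;
      $table[$element['#string_id']][$lang] = $values[$name];
      $values[$name] = $element['#string_id'];
    }
  }
  return $values;
}

// Invented sample data for illustration.
$table = array('profile_field_title_1' => array('en' => 'Full name'));
$form = array(
  'title' => array(
    '#type' => 'textfield',
    '#localizable' => TRUE,
    '#string_id' => 'profile_field_title_1',
  ),
);
$shown = localizable_form_prepare($form, $table, 'en');
$saved = localizable_form_submit($form, array('title' => 'Nombre completo'), $table, 'es');
```

After the submit step the module only ever sees 'profile_field_title_1' as the value of the field, while the Spanish text ends up in the string table.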
For a module to display these localizable texts, just call
tt(string id)
This unique id should be easy to get, as modules like profile and flexinode, which are the most outstanding examples of these web configurable texts, have a unique id for each configurable field created.
I really think such a mechanism could fix a lot of our current problems with multilingual sites (profile, flexinode...) and eventually it may be a good replacement for the huge set of translatable texts we are handling with t() right now. On the other hand, it shouldn't be a big deal to provide some interface for translators to do their work.
Please, comment. I'd like to elaborate on this idea..

Comments
from where do we get that data?
This looks like a nice idea at first, but let us think about a few possible problems.
First, every module author needs to explicitly mark localizable form items as such. What if a user would like to localize some form item which the author did not intend to be localizable (i.e. the author was not aware of this feature)?
The bigger problem is the source of the data. How do you populate the form field with properly translated data? The module using the form will surely store some value for this field somewhere in the database (how this is done depends on whether you edit a profile field description, a site variable, or something else). What is stored by that module? If you populate the field with the properly translated value for the current locale, will that get stored in the database (resulting in values with mixed locales, depending on which locale you edited the value with)? If you modify the value using some locale, then the originally stored value will be invalid (i.e. your new value in your locale will not be a translation of the primary value stored in the database).
Maybe I don't understand your proposal completely.
Well, maybe my explanation
Well, maybe my explanation was not that good :-)
I'll try to explain with an example: profile fields
The profile module doesn't even need to store this data. It has to be aware only of the unique ids for strings, and add the right fields to the form.
I.e. for the first configurable profile field, id = 1 in the profile_fields table, the string ids may be:
- profile_field_title_1
- profile_field_description_1
So, for display:
tt('profile_field_title_1');
tt('profile_field_description_1');
Then, our 'tt' function looks into this table
About explicitly marking localizable form items, that's true. The rule is simple: all web-configurable texts should be localizable unless there is a good reason against it.
If you want to make the transition easier for module authors -which I wouldn't- there's another option. Just use tt() for all text output -when the texts are configurable-. Then we can have some prefix like '#' to know whether we are about to display a localizable string -so it is a string id and should be localized- or plain text.
The forms api + some extended locale module could take care of all this string replacement, so the module gets the string id saved instead of the string.
I think that as long as we could have some standard function (tt) for all these texts to be run through this extra localization layer, all the functionality could be provided using some form_alter functions. The module can be tricked into saving string ids instead of real strings with some Forms API magic.
function tt($string) {
  global $locale;
  if (strpos($string, '#') === 0) {
    // String id, get the right text
    // Some fallback mechanism to default language may be added here
    return db_result(db_query("SELECT text FROM {our_locale_table} WHERE string_id = '%s' AND language = '%s'", $string, $locale));
  }
  else {
    // Plain string. Just return
    return $string;
  }
}
The module doesn't need to know whether what it's saving is a string id or a real string. It may have been replaced on form display and form submission by our super-smart-localization-system. Of course, everything would be much easier if module authors marked localizable strings on forms as such -so they wouldn't need to be saving string ids-.
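The fallback mentioned in the code comment could look something like this. It is only a sketch under assumptions: the table is an in-memory array whose keys include the '#' marker (as in the query above), tt_marked() is a made-up name, and falling back to a single default language and then to the id itself is just one possible policy.

```php
<?php
// Sketch of tt() with the '#' marker and a fallback chain:
// current locale, then default locale, then the string id itself.
function tt_marked($string, array $table, $locale, $default_locale = 'en') {
  if (strpos($string, '#') !== 0) {
    // Plain string. Just return it.
    return $string;
  }
  foreach (array($locale, $default_locale) as $lang) {
    if (isset($table[$string][$lang])) {
      return $table[$string][$lang];
    }
  }
  // Nothing stored for this id: return it unchanged.
  return $string;
}

// Invented sample data for illustration.
$table = array(
  '#profile_field_title_1' => array('en' => 'Full name', 'es' => 'Nombre completo'),
  '#profile_field_title_2' => array('en' => 'City'),
);

echo tt_marked('#profile_field_title_1', $table, 'es'), "\n";
echo tt_marked('#profile_field_title_2', $table, 'es'), "\n";
echo tt_marked('Just a plain label', $table, 'es'), "\n";
```

One thing the marker approach has to live with: a plain text that happens to start with '#' would be treated as a string id.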
I know this is quite a complex thing and not sure if I can fully explain myself without a complete patch. So just hope I kind of transmit the general idea...
understandable without a patch
I probably see what you mean without a patch. Let's keep profile as an example.
What would happen, if you disable the interface translation feature? Single language strings should be written into the profile tables, so that profile module still works. What if you enable this feature again? How would it write the strings back to the profile table and the other affected tables? What if you never enable this feature? Do we keep the current database structure or should i18nless sites use a string table even if you only have one language on your site?
We have two options 1. Use
We have two options
1. Use it for all the strings, whether the site needs localization or not, so you don't even need to save this data in the profile table, as the system will take care of all these strings.
2. Have mixed strings and string ids. We'd need some marker -it was '#' in the previous code, just as an example- to be added to the text to know when we are using a string id instead of a string. The module would need to save the data, but when localization is enabled it will be saving string ids instead of strings.
I think the first one is the cleanest approach and will save a lot of db fields, as all the strings will be stored in the same table for all the modules. And then you can just enable localization and have all the strings there waiting to be translated. It will make module development even easier -you don't need to worry about saving strings- but may come with some minor performance drawbacks even for single-language English sites: a few more queries for some pages.
However, option 2 is also workable, and I hope it will make the 'only-english' part of the world happy -they only need to wrap all those texts in some function- while allowing the rest of us at least some options.
Anyway, the only one that can allow playing with enabling/disabling localization without side effects -like maybe losing some strings on the way- is the first one, which is like having localizable strings built in as default.
And I'd like to introduce another option yet :-)
tt($string_id, $mycurrentlocalizabletext);
// For a profile field it will look like
tt("profile_field_title_$field->pid", $field->title);
We have here a default value stored in the main table -may be english or default language- while passing on some reference for localization. This will really allow enabling/disabling/whateveryoudo with the localization part.
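A rough sketch of this variant, again with an in-memory array in place of the table. The name tt_default() and the $enabled switch are hypothetical, only there to show that turning localization off simply gives back the text stored by the module.

```php
<?php
// Sketch of tt($string_id, $default_text): use the translation when
// localization is enabled and one exists, else the module's own text.
function tt_default($string_id, $default_text, array $table, $locale, $enabled = TRUE) {
  if ($enabled && isset($table[$string_id][$locale])) {
    return $table[$string_id][$locale];
  }
  return $default_text;
}

// Invented sample data for illustration.
$table = array('profile_field_title_1' => array('es' => 'Nombre completo'));
$field = new stdClass();
$field->pid = 1;
$field->title = 'Full name';

echo tt_default("profile_field_title_$field->pid", $field->title, $table, 'es'), "\n";
echo tt_default("profile_field_title_$field->pid", $field->title, $table, 'es', FALSE), "\n";
```

Since the main table always keeps a usable text, no strings are lost when localization is disabled and enabled again.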
"minor performance drawbacks"
It is probably not a minor performance drawback to do joins on all possible term, profile field, menu item, etc displays.
While tt() seems nice, we should think about massive database result sets, like when you fetch all your profile fields at once in a single query. If you then wrap them in tt() on display, you need a SELECT on every tt() call, which does not seem too efficient (e.g. with ten profile fields you will run ten SQL queries for the translated strings, which you could have done in the JOIN had you known in advance which language you need and which table it comes from). This is more of a problem on node listing pages, where you display multiple taxonomy term names on all nodes.
Too many queries? First, if
Too many queries?
First, if you look at the query log of any Drupal page since Drupal 4.7, you'll see a few dozen queries related to the path system. It doesn't seem to be a problem -though I'd really like a way to disable this when not using path aliases, for small sites...
However, there are ways to avoid this, like having the string id split in (type, number) like ('profile_field_title', 1) and then joining the table as you select the profile fields. That's exactly one query.
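To illustrate the idea without a database, here is an in-memory equivalent of that join: the string table rows carry (type, number, language, text), and all fields get their translation in one pass instead of one lookup per field. The function name join_translations() and the sample rows are made up; in SQL this would roughly correspond to a LEFT JOIN from the profile fields to the string table on type, number and language.

```php
<?php
// Hypothetical string table rows with the id split into (type, number).
// The texts are invented sample data.
$rows = array(
  array('type' => 'profile_field_title', 'number' => 1, 'language' => 'es', 'text' => 'Nombre completo'),
  array('type' => 'profile_field_title', 'number' => 2, 'language' => 'es', 'text' => 'Ciudad'),
  array('type' => 'profile_field_title', 'number' => 2, 'language' => 'fr', 'text' => 'Ville'),
);
$fields = array(
  array('id' => 1, 'title' => 'Full name'),
  array('id' => 2, 'title' => 'City'),
  array('id' => 3, 'title' => 'Country'),
);

// In-memory stand-in for a LEFT JOIN: every field keeps its own title
// unless a translation of the given type and language exists for its id.
function join_translations(array $fields, array $rows, $type, $language) {
  $by_number = array();
  foreach ($rows as $row) {
    if ($row['type'] === $type && $row['language'] === $language) {
      $by_number[$row['number']] = $row['text'];
    }
  }
  foreach ($fields as $i => $field) {
    if (isset($by_number[$field['id']])) {
      $fields[$i]['title'] = $by_number[$field['id']];
    }
  }
  return $fields;
}

$localized = join_translations($fields, $rows, 'profile_field_title', 'es');
foreach ($localized as $field) {
  echo $field['title'], "\n";
}
```

The third field has no Spanish row, so it keeps its original title, which is the behaviour a LEFT JOIN with a fallback to the stored column would give.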
We could even use some extended query rewriting mechanism to add some joins when localization is enabled..
Also, I'd only use this for configurable fields -profile, cck- but not really for taxonomy -I like more the multilingual taxonomy idea as currently implemented by i18n module. That would leave us with only a few pages on which this has to be used. Anyway, once the mechanism is in place, it could be used for a few more things, like module text help (?)
So, my question is: do you think such a patch -once polished enough and benchmarked, of course- would have any chance of getting into Drupal core? I'm willing to work on something like that because otherwise we'll need to address localization for each case -i.e. profile, cck...- separately, which in the end will mean a lot more work.
Thus, we could have in place some mixed approach which may be enough to translate the whole site.
- This translatable strings system
- Current localization
- Nodes with language and translations
- Multilingual variables
...
the reach of this approach
What about the reach of this approach? Why wouldn't it be possible to handle variables via this mechanism? Why wouldn't it be possible to handle simple taxonomy translations via this mechanism? (I seriously doubt we will have different taxonomy/menu tree capabilities per language in core in Drupal 6).