Move interface strings entirely out of Drupal code files?

Gábor Hojtsy's picture

This was suggested several times: one, two (see dropcube's comment, not hotlinkable), and most importantly a whole discussion thread started by Jose Reyero at http://groups.drupal.org/node/151169#comment-507464.

The basic problem here is that strings are stored in the source code of Drupal projects, so (a) they all need to be written in English, (b) when they are changed, all translations for those specific strings become obsolete, and (c) when you need to override them for English sites, you need to act as if you are translating the site.

The suggestion is to replace the strings themselves with keys, like

<?php
string('mymodule_hello_world');
?>
(Jose's example) or
<?php
some_function('user.login-form.username.label');
?>
(droplet's example). Lots of systems in the industry do this: just look at how Java property files or .NET resource files work (the latter originate from decades-old Windows programming standards). Here is an example on Eclipse plugin internationalization that explains "string externalization" and how the property files fit there: http://www.eclipse.org/articles/Article-Internationalization/how2I18n.html
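To make the idea concrete, here is a minimal sketch of a key-based lookup, loosely modelled on how property files are used. The names `example_string()` and `example_string_table()`, the sample keys, and the fallback order are all assumptions made for illustration; none of this is an existing Drupal API.

```php
<?php
/**
 * Returns a sample string table, standing in for external resource files.
 *
 * Hypothetical data; a real system would load this from files or the database.
 */
function example_string_table() {
  return array(
    'en' => array(
      'mymodule_hello_world' => 'Hello world',
      'user.login-form.username.label' => 'Username',
    ),
    'hu' => array(
      'mymodule_hello_world' => 'Helló világ',
    ),
  );
}

/**
 * Looks up a string by key (hypothetical helper, not a Drupal function).
 *
 * Tries the requested language, then English, then returns the key itself.
 */
function example_string($key, $langcode = 'en') {
  $table = example_string_table();
  foreach (array($langcode, 'en') as $lang) {
    if (isset($table[$lang][$key])) {
      return $table[$lang][$key];
    }
  }
  return $key;
}

print example_string('mymodule_hello_world', 'hu') . "\n";
print example_string('user.login-form.username.label', 'hu') . "\n";
```

Note that even the English text goes through the lookup here, which is the point behind the performance concern raised further down.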

There are various advantages to "externalizing strings":

(a) the module can originally be written in another language and can be later translated to English
(b) small changes like typo fixes will not invalidate translations
(c) no code patches required to change strings (or maybe yes, see below)
(d) code size will go down in some cases

However, there is a sizable set of disadvantages:

(a) you need to make up the string IDs for your module, how do you do that for a hook_help() which can have tens of strings? or a form definition or hook_form_alter() which can again have tens of strings?

(b) it reduces module readability because you cannot look up the strings in the module, you need a two step process to find where a string is used; it also disconnects placeholders from strings, eg.

<?php
string('user.login-form.username.link', array('@user' => $account->name));
?>
disconnects the placeholders from their use a lot

(c) it slows down sites in every language (English sites had the best performance before), because you need a string lookup for English too

(d) We'd also need to figure out how this plays with database-based editability of translations. Jose suggests module authors would get their initial strings imported into the db at install time, then they'd need to write update functions for string changes. That sounds like a more complex way to change things.

(e) if the translations are tied to modules, we lose the sharing feature of Drupal, which worked fantastically in some cases but did prove to have issues in others (for which context could already work)

To be honest, I'm not sure either way. We have a system which has its disadvantages, and we know them well. Also, the same system is used by other CMSes like eZ Publish and WordPress (not necessarily the exact same file formats and processes, but the original English key based translations). On the other hand, string externalization can solve some of the problems of foreign language module developers and translators. Maybe we can somehow get the best of both worlds? Discuss!

Comments

Jose's long text proposal

Gábor Hojtsy's picture

Forgot to mention that Jose already has a proposal for moving longer texts to individual files: http://drupal.org/node/365934 - that does solve some of the big text chunk related issues but tens of strings can still easily appear in a hook_menu or hook_form_alter or a form function.

"(a) the module can

aspilicious's picture

"(a) the module can originally be written in another language and can be later translated to English"

I fear we will have an invasion of Chinese, Arabic, ... modules, unreadable for (most of) us, so it will be hard for the reviewers of the "project application queue" to get a grip on these projects.

Invasion?

cleaver's picture

I don't think we need xenophobic architecture decisions.

There's this guy with over 10,000 commits to core who happens to not be a native English speaker.

Xenophobia aside, it's a fair

IceCreamYou's picture

Xenophobia aside, it's a fair point that it makes modules harder to review. That's not necessarily a compelling argument against this proposal, but we do have a lot of problems with the review process already (and it's not just for project maintainership -- organizations have to evaluate modules as well. In many cases it may be easier and cheaper to duplicate a module than find someone who can translate an existing one).

I'd hope to grow Drupal to be

cleaver's picture

I'd hope to grow Drupal to be more international. More foreign-language modules will hopefully lead to more foreign-language reviewers. Keeping barriers in place certainly won't lead to more reviewers.

Chinese users, Arabic users (and other RTL languages) will have different needs. Some modules might not even be useful in English.

This is a bit of a side discussion, but the original proposal does help remove some barriers and implied bias in the code. Personally, the logic of having a separate key is more important--normalization shouldn't be restricted to databases. But all my modules have been English only.

xenophobic?

chx's picture

aspilicious made a sound argument. Don't try to read more than what he said. You can replace 'invasion' with 'influx' etc. He is merely concerned that we barely have enough reviewers already and if modules begin to pour in -- cue in the number of non-English speakers vs English speakers -- then what? Can't see that xenophobic. Whether this is a serious problem enough to affect a core decision I am not sure but aspilicious does not deserve the xenophobic label. We are debating social and technical issues here but never call others names. You are warned.

Thnx for your support chx. It

aspilicious's picture

Thnx for your support chx. It wasn't meant to be so hard :), as I'm not a native English speaking guy (yes I speak the same language as the guy with 10 000 commits) I sometimes use the wrong words to express my feelings. Don't take it personally lol. I was just concerned because I reviewed 3 projects in the queue that had Chinese comments everywhere and it made the job very hard for me.

But stop responding to my single line response and move on to the more important stuff in the bottom of the conversation ;)

aspilicious does not deserve

cleaver's picture

aspilicious does not deserve the xenophobic label. I was addressing the statement not the person. I thought I was careful in how I was phrasing my comment--sorry if it wasn't clear.

Drupal is international, but tends to be more European and North American. I hope to see more global involvement.

English is required to be the original source

sun's picture

There's not really much to discuss here and I'm not sure why Gábor brought up the point as something debatable. I don't think it is.

  1. All code, documentation, and strings in Drupal are in English.
  2. If you work on and with Drupal, you have to understand English.
  3. If you contribute back your code, other international developers and collaborators have to be able to understand it.
  4. If you happen to suddenly contribute back code that wasn't intended to be contributed back originally, and it contains non-English stuff for some reason, then it needs to be rewritten/redone in English, complicating the process of contributing back, leading to fewer contributions in the end.

The only place where one may use non-English strings is in customization modules for a specific site and hacks. In those cases, you either don't use t() at all in the first place (because your site is monolingual), or you'll use it the regular way.

I don't think it makes sense to introduce exceptions and special support for non-English source strings.

That is, as far as strings in code are concerned. The situation is entirely different with user input strings (on fields and configuration).

Daniel F. Kudwien
netzstrategen

The only place where one may

aries's picture

"The only place where one may use non-English strings is in customization modules for a specific site and hacks. In those cases, you either don't use t() at all in the first place (because your site is monolingual), or you'll use it the regular way."

The problem with this is that many sites are multilingual but the developer doesn't know the terminology of the site in English. It would be less "risky" for developers (or eliminate the English interface translation from the development prerequisites) to use id-based translations.

What happens if the site is monolingual in the beginning, but after a while, as a new demand, it has to be translated? All of the strings have to be replaced with the English terms, then translated again as a .po file, etc. This step is much easier with "externalization".

Aries

I must say I like Drupal's

meba's picture

I must say I like Drupal's approach to english strings. When I worked on (my own) system that used keys, I suffered royally and I remember that many times I just told myself "gosh, f*ck it" and hardcoded the strings in. So I think one of the disadvantages might be users that will skip this whole step. Right now, they mostly just embed the strings in t()

Yes! Yes! Yes!

cleaver's picture

As I mentioned elsewhere t() is fundamentally flawed. There should be a separate key from the value. I think the Drupal community can benefit from looking at how this is handled in other systems.

The performance aspect does require some testing to see how this would impact things. We could test on typical site configuration with some of the more common modules (testing on a pure core site might be misleading).

Personally, I like the Java-like strings as I find them very readable and easier to type (underscore is awkward).

As for the readability of code, this hasn't been a huge issue in other languages.

Contrib developers should attempt to use core strings (eg. 'user.login-form.username.label', or whatever we use) whenever possible.

on UX implications

Gábor Hojtsy's picture

tuag (the user advocate guy) has a whole set of articles on how we could enable people to improve UX in projects by doing this, notably http://www.tuag.ca/articles/core/section1/string-practice and http://www.tuag.ca/articles/core/section1/string-manager. I've asked for a summary/comments here, so we'll hopefully get more from there, but in the meantime, these are very good reads. Thanks James Walker for the pointer (https://twitter.com/#!/walkah/status/78501799778193409).

User Advocate's picture

Nice to see this post. I am greatly in favour of external strings for a number of reasons.

My perspective as a UX designer is that usability is greatly affected by the on screen 'User Narrative' - i.e. speaking precisely to targeted users groups. This is a concept I describe in detail in a series of articles starting around here:
http://www.tuag.ca/articles/core/section1/user-narratives-intro

My perspective as a developer is that the t() function does not allow control over strings in a systematic and precise enough manner that is required to even think in terms of User Narratives. Embedding strings in code leads to ambiguous interfaces and misses the potential for defining effective Narratives.

Translation from one language to another is still too broad for my UX design needs. Defining User Narratives requires string management at the level of idioms. To that end, I've written a module that defines a string manager class that allows me do this through the use of a key method and external string resource files. I haven't got this module up to the level of contributing publicly just yet but I have had great success with it in several projects (Drupal and non-Drupal).

I've just recently posted an article that describes my String Manager in some more detail. It also touches on some of the magic of building keys algorithmically that can factor in context. I've used this technique to define and display context aware help strings. This article was written for both technical and non-technical readers so it requires a bit of patience from both sides :) You can find it here:
http://www.tuag.ca/articles/core/section1/string-manager

Cheers

Michael Keara

Michael Keara
User Interface Systems Architect,
The User Advocate Group

The performance issue is

mikey_p's picture

The performance issue is reason to be concerned. The reason that this works well for large Java or .NET or desktop apps is because they aren't PHP ;) I've seen similar approaches in many Rails projects as well, but the difference with all of the above examples, Rails included, is that they don't bootstrap from scratch for every request. Once the strings are parsed and stored statically in a data structure, they don't need to be re-parsed and re-processed for the next request.

I am sure we could find a way to use caching (and in turn memcache) to make string lookups fast, but this will definitely impact smaller sites on shared hosting.
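As a rough illustration of the caching idea, the sketch below parses a made-up "key = text" resource format and keeps the result in a static variable. The function names and the format are assumptions. A static cache like this only avoids re-parsing within a single request; sharing the parsed table across requests would need a persistent cache such as the memcache setup mentioned above.

```php
<?php
/**
 * Parses "key = text" lines (a simplified, made-up resource format).
 */
function example_parse_strings($raw) {
  $strings = array();
  foreach (explode("\n", $raw) as $line) {
    $line = trim($line);
    // Skip blank lines and lines without a key/text separator.
    if ($line === '' || strpos($line, '=') === FALSE) {
      continue;
    }
    list($key, $text) = explode('=', $line, 2);
    $strings[trim($key)] = trim($text);
  }
  return $strings;
}

/**
 * Returns the parsed table, parsing at most once per request.
 *
 * $parse_count exists only to make the effect of the cache visible.
 */
function example_cached_strings($raw, &$parse_count) {
  static $cache = NULL;
  if ($cache === NULL) {
    $cache = example_parse_strings($raw);
    $parse_count++;
  }
  return $cache;
}

$raw = "user.login-form.username.label = Username\nglobal.button.ok = OK";
$parse_count = 0;
$first = example_cached_strings($raw, $parse_count);
$second = example_cached_strings($raw, $parse_count);
// The second call is served from the static cache, so the count stays at 1.
print $parse_count . "\n";
```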

Why not both?

IceCreamYou's picture

Is there any particular reason that we would have to have one system or the other? It seems to me that for things like form labels, t('Username') works perfectly well. For text that needs to be more descriptive, it could use the external system. That also helps alleviate some of the pain induced by removing strings from the context they are supposed to explain. t() could even be changed to use string() internally.

I am concerned that this

pwolanin's picture

I am concerned that this would greatly raise the barrier to entry (both for reading and writing code), and also, as Jakub suggests, lead people to ignore the internationalization system altogether when writing modules.

Is there a way to combine keys with default English strings in code? In other words, something closer to variable_get(). Is there a way to provide a tool (coder?) that would auto-generate appropriate keys for you once you have a 1st pass of the module?

For example:

<?php
t('Enter your @s username.', array('@s' => variable_get('site_name', 'Drupal')));
?>

has a NULL 3rd parameter, so an attempt to translate it might only use the hash of the string as the key, but if you write this:

<?php
t('Enter your @s username.', array('@s' => variable_get('site_name', 'Drupal')), 'user.login-form.username.label');
?>

Then you have a meaningful key that allows translations to be insensitive to changes in the English text. If the site is running English-only, then the speed would be the same as now.
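A standalone sketch of how such a lookup could behave is below. `example_keyed_t()` and its `$translations` parameter are hypothetical and only imitate the proposal; Drupal's real t() has a different signature, and placeholder escaping is left out.

```php
<?php
/**
 * Hypothetical t()-like function with an optional key as third argument.
 *
 * $translations stands in for whatever storage the site would use.
 */
function example_keyed_t($string, array $args = array(), $key = NULL, array $translations = array()) {
  // Without an explicit key, fall back to a hash of the English text,
  // so any edit to the text also changes the key.
  if ($key === NULL) {
    $key = md5($string);
  }
  $text = isset($translations[$key]) ? $translations[$key] : $string;
  // Substitute placeholders such as @s (no escaping in this sketch).
  return strtr($text, $args);
}

// Sample German translation stored under the semantic key.
$translations = array(
  'user.login-form.username.label' => 'Geben Sie Ihren @s-Benutzernamen ein.',
);
$args = array('@s' => 'Drupal');

// Keyed call: the translation is found even though the English was reworded.
print example_keyed_t('Please enter your @s username.', $args, 'user.login-form.username.label', $translations) . "\n";
// Unkeyed call: nothing is stored under the hash, so the English text is used.
print example_keyed_t('Enter your @s username.', $args, NULL, $translations) . "\n";
```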

interesting combination

Gábor Hojtsy's picture

Interesting combination; however, it does not solve all the problems. I.e., to solve the string change invalidates translation issue, we'd need update functions to change the source string with module updates for this to work and/or we need to require keys to be specified on all t() calls and store each translation related to that key instead of the English text. Similarly, for it to be useful for cases like those explained by "User advocate" in his excellent blog posts, we'd need to make this key mandatory on all strings.

I sense there is a serious educational component involved here. I.e., how do we make it obvious for people how to make up those keys? This is a problem anyhow, regardless of whether we replace strings with keys or just add on keys to strings.

It does mostly solve the

pwolanin's picture

It does mostly solve the string-change-invalidates-translation issue since the translation would go off the key, not the string. So in core and well-maintained contrib modules that defined a sensible key for each string, there is no issue.

In the case where it's a contrib module that didn't define keys (NULL 3rd param), it would still be a problem, but you could work around it in the worst case by manually putting the original hash into the codebase as the key (3rd arg). Yes, you'd have an ugly non-semantic key, but at least existing translations would remain valid.

I like the pattern I'm suggesting also because it has an obvious code upgrade path - your existing D7 code would work fine in D8, and then you run the automated tool on it to build the keys, maybe based on module name + function name (or class + method) + some index or perhaps the form key.

Having both the English text and the key in the source means it's also very easy to automatically extract from the source an up-to-date mapping of all keys and their untranslated text.
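One way such an extraction could look, sketched with PHP's tokenizer: the function below walks the tokens of a source string and collects key => text pairs from calls of the hypothetical three-argument form. The function name is made up, and the sketch assumes the text and the key are plain quoted literals; escape sequences inside them are not processed.

```php
<?php
/**
 * Extracts key => English text pairs from t($text, $args, $key) calls.
 *
 * Sketch only: handles the hypothetical three-argument form discussed above.
 */
function example_extract_keyed_strings($source) {
  $tokens = token_get_all($source);
  $count = count($tokens);
  $map = array();
  for ($i = 0; $i < $count; $i++) {
    // Look for an identifier named "t".
    if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_STRING || $tokens[$i][1] !== 't') {
      continue;
    }
    $depth = 0;
    $arg = 0;
    $literals = array();
    for ($j = $i + 1; $j < $count; $j++) {
      $token = $tokens[$j];
      if ($token === '(') {
        $depth++;
      }
      elseif ($token === ')') {
        $depth--;
        if ($depth === 0) {
          // Closing parenthesis of the t() call itself.
          break;
        }
      }
      elseif ($token === ',' && $depth === 1) {
        // A comma directly inside t() starts the next argument.
        $arg++;
      }
      elseif (is_array($token) && $token[0] === T_CONSTANT_ENCAPSED_STRING && $depth === 1) {
        // Remember string literals that are direct arguments; strip the quotes.
        $literals[$arg] = substr($token[1], 1, -1);
      }
    }
    if (isset($literals[0]) && isset($literals[2])) {
      $map[$literals[2]] = $literals[0];
    }
  }
  return $map;
}

$source = <<<'EOT'
<?php
$label = t('Enter your @s username.', array('@s' => variable_get('site_name', 'Drupal')), 'user.login-form.username.label');
$plain = t('Username');
EOT;

print_r(example_extract_keyed_strings($source));
```

Calls without a key, like the second one in the sample, are simply skipped here; a real tool would probably want to report them.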

Using strictly keys is a non-starter for me - I'd not support it for core.

Plus 1 for optional key in t()

chaps2's picture

I second this approach - i.e. t() taking a key parameter.

Working in Java for many years I at first deplored the lack of resource keys in Drupal but soon grew to love the simplicity and directness of the t() function. Having the option to specify a key in addition to default text is surely the most pragmatic solution. Quick, easy, less error-prone (mismatched/misnamed keys, missing key values etc), backwards compatible, self-documenting, supporting and encouraging best-practice but not mandating it. What's not to like? You could allow for idiomatic strings via a fallback key discovery mechanism, but don't pin everything on something that won't be used 99% of the time.

As for name-spacing, I don't see why this should be any more of a problem than ensuring unique function names.

A better option IMO

jpstrikesback's picture

I really like this approach - it seems workable for I18n and also because it leaves English as the base for core & contrib. I think it's important to the community to have a lowest common denominator language to communicate in, even if it's a barrier to entry. At the size we're at now I don't think it's xenophobic to consider that this could lead to fragmentation, duplicate efforts, etc...maybe the benefits outweigh the risk but I'm not sure...and I'm biased. Food for thought...
JP

Strong +1 on the key + text

Damien Tournoud's picture

Strong +1 on the key + text idea.

Let's also make the key mandatory and use it as a hierarchical context.

Damien Tournoud

+1 for key+string

valthebald's picture

Nice suggestion!

Good, but with proper namespacing

cosmicdreams's picture

I think the syntax for the key should be like what we have for Drupal.settings.

Perhaps like: Drupal.user.login-form.username.label. That way over time, these keys will never need to be duplicated and can be module specific.

Software Engineer @ The Nerdery

vote for II->b such system

podarok's picture

vote for II->b
such a system makes the strings mostly short (an id is much shorter than the full text) and gives us the possibility of changing original strings without a code change


Andriy Podanenko
web: http://druler.com

I dunno, using ids instead of

bojanz's picture

I dunno, using ids instead of full strings always seemed perfectly logical to me before coming to Drupal. Saw it in plenty of custom PHP projects.
Plus, it's nice for someone to be able to see a page with ids, and be able to edit the strings mapped to those ids, practically recreating the UI without having to touch the code.

I don't think it would open a problem with non-English modules. Nobody's stopping people from writing comments and naming functions in other languages right now, and yet people don't do that, because they follow the general convention.

Of course, there are real technical problems (performance?), but the basic idea is sound to me.

I don't think it would make

cleaver's picture

I don't think it would make it harder. Most software uses something like the proposed scheme. If someone is throwing together a one-off unilingual module today, they may omit the t() function.

Breaking the rules (normalization in this case) for the sake of convenience makes it harder in the long run. I think we're seeing the consequences.

I have to strongly disagree.

pwolanin's picture

I have to strongly disagree. Writing a generically useful contrib module is already hard. Having to cross-reference every damn UI string in a separate file would be a huge impediment to me.

Same here

Michelle's picture

I don't know much about this issue, having only looked at this thread, but I have to say that it sounds like a DX nightmare. I put a lot of comments in my code because it's easier for me to read English than code and having numbers instead of text in there would totally disrupt that. Unless, of course, I added a comment to each one that gave what the string was, but then that would need to be kept in synch as well.

Michelle

It wouldn't be numbers

drunken monkey's picture

As I understand it, it wouldn't be numbers, but any key of your liking, which could be as descriptive as you want. It still makes things harder in my opinion, though, if you a) have to come up with descriptive keys and b) don't immediately see the exact text a user will see, especially in more complicated situations. When there are several lines of text describing some functionality, it would (I think) become much easier to forget updating the description when you update the code (which already happens too often).

The big problem I see with the original suggestion, however, is the disruption of the workflow. Instead of coding a user interface just like other things, you'd have to make up a key, then switch to some other file and map that key to its text, for every piece of frontend text you encounter. This sounds to me like unnecessarily increasing the work load to create a module.

It of course appears a bit biased when native English speakers declare using English as the base language is the best. However, as a non-native speaker I can assure you that I'd rather code a module in English and afterwards translate it to German with poedit, than use that construct with keys and a properties file. (However, I can see this being more difficult for people who have troubles with English.)
I can imagine this works a bit better in Eclipse, where you have an automated tool to later make strings translatable with that method. Originally and manually using these keys doesn't seem to me justified by the advantages. And even if we whipped up a similar tool for Drupal, the problem with code readability and UI updates later would still remain.

Adding a third parameter to t() calls would also mean some extra work (inventing new keys for all strings) and also would make re-using translated strings harder (you would have to specifically look up the key that is used by the original module – also, would this mean you could only re-use strings by enabled modules, i.e., ones you depend on or the mandatory core ones?). However, as I see it, it would have almost the same benefits as the other variant, with much less effort. So I think I'd support that method.

There might be ways to lessen

bojanz's picture

There might be ways to lessen the DX impact.
Create a module that lists all ids on the page and allows you to enter the text. When you're done, it spits out a finished language file.

So the workflow would be:
1) Code a module. Place the ids. For example commerce_product.title.
2) Open the form you just coded, enter the text. Get the text file.
3) Save the text file.
4) Commit.

This is essentially what l10n_client already does, just extended for the new use case.
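The "get the text file" step could be as simple as the sketch below. The "key = text" file format, the function name, and the second sample id are invented for illustration; this is not how l10n_client stores anything.

```php
<?php
/**
 * Builds the contents of a simple "key = text" language file.
 *
 * Ids without entered text are written out empty so they are easy to spot.
 */
function example_build_language_file(array $ids, array $entered) {
  $lines = array();
  foreach ($ids as $id) {
    $text = isset($entered[$id]) ? $entered[$id] : '';
    $lines[] = $id . ' = ' . $text;
  }
  return implode("\n", $lines) . "\n";
}

// Step 1: ids placed in the module code.
$ids = array('commerce_product.title', 'commerce_product.sku');
// Step 2: text entered through the (imagined) form.
$entered = array('commerce_product.title' => 'Title');
// Step 3: the file contents that would be saved and committed.
print example_build_language_file($ids, $entered);
```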

How does that lessen the

pwolanin's picture

How does that lessen the impact? I still have to cross-reference the code and it would break the workflow of initial coding and testing as you go to e.g. build up a form with many elements.

Really, a key-only solution is unacceptable.

In my experience, it doesn't

cleaver's picture

In my experience, it doesn't make anything harder and it fixes the problem with t() breaking when strings change and a host of other problems.

If as a Drupal developer you can handle hooks and nested arrays, then "get_string('user.login-form.username.label')" won't cause a problem. Most systems I've seen use something exactly like this.

If you're writing a one-off unilingual module, then simply hard coding a simple English string is perfectly acceptable.

Developer tool support

hansfn's picture

As a developer I dislike using keys instead of normal texts. If we switch to using keys, we need good tool support in our IDEs:

a) Easy search for existing keys when you need to insert some text.

b) Hovering a key should display the normal text.

And so on. Just like we have for functions and variables (in our APIs) ... I would prefer if we could solve this problem without using (only) keys in the code. I'm all for good UX, but it should be easy to read/develop code too.

Interesting discussion. I

KarenS's picture

Interesting discussion. I would add another advantage and another disadvantage:

Advantage:
- English-speaking developers would be able to immediately see when they have created something that does not translate well, possibly resulting in more robust modules that work better in all languages. They are somewhat protected now from things that break when run through the translation system if they only work on English language sites.

Disadvantage:
- I fear a gigantic mess of name collisions/confusion when developers don't realize they have created a key that some other module has used for some other purpose.

Of the other advantages and disadvantages mentioned above, my biggest concerns are readability of the code and performance.

a possible solution to

Hadi Farnoud's picture

A possible solution to the disadvantage you mentioned above would be a unique module identifier that acts as a prefix to the string, something unique like the d.o module name used in the URL. For example, for the myworld module (http://drupal.org/project/myworld) the unique identifier would be 'myworld', as this is unique. It could be a randomly generated number too.

e.g.

<?php
string('myworld_hello_world');
?>

There is another benefit here too: some words may have more than one meaning in different sentences. If we translate a word like "set", another module may use this one word for a button or column name somewhere else with a different meaning. Now (as far as I know, I may be wrong) "set" can have just one translation, even though other languages may have two different words for it.
Do I make sense, or have I confused you all?

meaning of words

Gábor Hojtsy's picture

The meaning problem already has an optional solution in core in Drupal 7 called context, so you (=developer) can give context to your string. With the suggestions here, it would still be up to the developer, except they would be required (or maybe not so much, see above) to give a unique identifier. That sounds like it would not replace context; see the discussion with Jose linked above, where people gave feedback to keep the linguistic context information too, which can be much more helpful (I think also very useful for the UX arguments from above).

two unique identifiers != a unique identifier

silloyd's picture

Concatenating two unique identifiers does not give you a unique identifier.

For example, a module called foo with a string index bar_baz and a second module called foo_bar with a string index baz would create two keys of foo_bar_baz
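The collision is easy to reproduce. In the sketch below the second half relies on an assumption that is not stated in this thread: that module names never contain ":", so it can serve as an unambiguous separator.

```php
<?php
// Joining module name and string index with a character that both parts
// may contain ("_") is ambiguous:
$key_a = 'foo' . '_' . 'bar_baz';
$key_b = 'foo_bar' . '_' . 'baz';
// Both are "foo_bar_baz", so this prints bool(true).
var_dump($key_a === $key_b);

// Assumption: module names never contain ":", so using it as the
// separator keeps the two keys apart.
$safe_a = 'foo' . ':' . 'bar_baz';
$safe_b = 'foo_bar' . ':' . 'baz';
// "foo:bar_baz" versus "foo_bar:baz", so this prints bool(false).
var_dump($safe_a === $safe_b);
```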

As far as your disadvantage

cleaver's picture

As far as your disadvantage goes, I think I see it as an opportunity for logical hierarchy in the key structure.

global.button.ok
user.login-form.username.label
views.whatever.whatever

We already have potential semantic collisions with t(). For example, t('Book') in the context of Book module could mean one thing, while t('Book') in the context of reserving a room could mean something entirely different. Each would be translated differently in many languages. Thus the context in D7 t().
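The 'Book' case can be pictured with a small standalone sketch. `example_context_t()` is not Drupal's t(); it only imitates the idea of storing translations per context, and the context names and German sample translations are made up.

```php
<?php
/**
 * Context-aware lookup (standalone sketch, not Drupal's t()).
 *
 * Translations are stored per context, so the same English source string
 * can map to different target strings.
 */
function example_context_t($string, $context, array $translations) {
  if (isset($translations[$context][$string])) {
    return $translations[$context][$string];
  }
  // No translation for this context: fall back to the English source.
  return $string;
}

// Sample German translations: the noun versus the verb.
$book_translations = array(
  'Book module' => array('Book' => 'Buch'),
  'Room reservation' => array('Book' => 'Buchen'),
);

print example_context_t('Book', 'Book module', $book_translations) . "\n";
print example_context_t('Book', 'Room reservation', $book_translations) . "\n";
```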

For me, the biggest issue we have to tackle is performance. We need to be sure that a typical unilingual site will not suffer a performance burden from this scheme.

The DX argument is IMHO just cranky programmers (of which I am one). Many other environments deal with this issue similarly.

I strongly agree with all

DjebbZ's picture

I strongly agree with all your points. Hierarchy is clearly a strong advantage, as it will provide a clear namespace and avoid all kinds of collisions. What remains is performance.

We are actually suffering a

gdd's picture

We are actually suffering a similar problem in configuration management. We want to remove the need to specify defaults in variable_get() (or whatever its replacement is) because it can create a lot of really weird bugs and it's hard keeping those up to date with each other throughout code. However what this means is that we will have to have the defaults specified in the files where we are storing configuration, and thus when you start using a variable you will need to go into that file and add it by hand somehow. We may also be able to come up with some tools to manage this, but ultimately it is the same problem you're facing in terms of the DX hassle. We have internally decided to go this route, but I wouldn't be surprised if we get a lot of flak for the decision and have to re-evaluate after feedback from the community.

The number of variables in

pwolanin's picture

The number of variables in code is generally far less than the # of strings, and you often fetch the same variable multiple places in code, while relatively rarely do you use the same string multiple times.

So, I see this as substantially different in terms of DX impact and acceptance. In that case, I'd be happy with a switch to putting defaults elsewhere - in any number of modules we've done one-off versions of that via e.g. an extra wrapper function to avoid the annoyance of bugs due to a missed instance of a variable with a changed default.

I already developed custom

DjebbZ's picture

I already developed custom PHP sites with i18n in mind, and the easier solution is clearly using keys. It helps decouple the work of the developer and of the UX designer/translator (completely different jobs), as the translator doesn't have to handle code, and the coder doesn't have to do the work of the designer/translator (writing and translating). This system is already well in place in other technologies, as mentioned above.
Workflow problems can be solved with the idea suggested by bojanz, i.e. a module listing keys and allowing input for translations. This workflow already exists in D6 in the translation_table module.
Name collisions are a false problem; we don't have many function/variable name collisions today, and it's the same behavior.
Code readability is ok as long as you provide descriptive keys, really. It even allows lighter PHP files.

The performance may be the only real concern, I don't have suggestions here.

User Advocate's picture

Just to add some thoughts on these two problems based on the work I've done using keyed strings (in Drupal and elsewhere)...

The Creation of Keys Problem:
I can understand the concern that it might appear that defining keys is a lot of work. But I've found it not a problem in actual practice. Beyond that, as @DjebbZ just mentioned, the developer is relieved of the burden of trying to do UX work while in the middle of writing code. IMO mixing the two tasks (which require quite different states of mind) is a recipe for UX disasters. Developers (myself included) might think they are being clear in their wordsmithing but I would argue it is impossible to do this properly while writing code. In practice, I enjoy the freedom to define keys (using the method I describe below) and stay focused on the task of building code.

I think a solution to figuring out how to create a key is to have a defined method for deriving sensible key semantics. There could be a defined lexicon of prefixes as well as an agreed upon convention for syntax. Just like any code.

BTW, I also like to use UPPERCASE_AND_UNDERSCORES because it helps me visually distinguish THE_KEYS from the rest of the code. I know Cleaver, for example, prefers a Java style with dot connectors. That's a relatively minor issue that could be worked out.

The Meaningful Context Problem:
The context that I use in my key definitions is the 'screenware' context - i.e. what describes the element that uses the string? This suggests some obvious prefixes such as:
TITLE
MENU
OPTION
PROMPT
LABEL
TEXTFIELD
RADIO
CHECKBOX
BUTTON

Other higher level prefixes could describe the page, block or form level so that string wordsmiths can get the general idea of usage context. Knowing the 'physical' usage context is critical to defining strings that enhance, rather than inhibit, usability. (Note I'm not saying 'use case' here - that's a whole other discussion which I talk about in my recent articles.)

Construction Syntax:

A possible construction syntax might be:
container : screenware : item description

Which means:
(name of page/form/block) : (element type) : (element detail)

So some examples are:
'LOGIN_FORM_LABEL_USER_NAME'
'LOGIN_FORM_EDIT_USER_ADDRESS'
'CHECKOUT_PAGE_PROMPT_CREDIT_CARD_NUMBER'

Or, in Cleaver's preferred Java style:
'login.form.label.user.name'
'login.form.edit.user.address'
'checkout.page.prompt.credit.card.number'

(As I write this it occurs to me that last one looks longer in lowercase!)

Hmm, one big reason why I prefer underscores over dots is that I can 'pick up' the strings much more easily in my editor - i.e. one double click selects the whole thing whereas dots separate the key components. That alone saves me hours of time when manipulating strings.

So a compromise would look like this:

'login_form_label_user_name'
'login_form_edit_user_address'
'checkout_page_prompt_credit_card_number'

// When I see this it looks ok again. I think muscle memory was telling me it would be a lot of work to pick up those dotted keys!
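The "container : screenware : item description" convention above could be sketched as a small helper. This is only an illustration: `build_string_key()` is a hypothetical function, not part of Drupal, and the separator and case options are assumptions added to show both the underscore and the dotted styles.

```php
<?php
// Hypothetical helper (not a Drupal API): joins the container, the
// screenware type and the item detail into one key.
function build_string_key($container, $screenware, $detail, $separator = '_', $uppercase = TRUE) {
  $parts = array();
  foreach (array($container, $screenware, $detail) as $part) {
    // Split each part into words on anything that is not a letter or digit.
    $words = preg_split('/[^A-Za-z0-9]+/', trim($part), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
      $parts[] = $uppercase ? strtoupper($word) : strtolower($word);
    }
  }
  return implode($separator, $parts);
}

// Underscore style, as in the first set of examples.
echo build_string_key('login form', 'label', 'user name'), "\n";
// Dotted lowercase style.
echo build_string_key('checkout page', 'prompt', 'credit card number', '.', FALSE), "\n";
```

A helper like this would mostly be useful in tooling (for example, to check that keys follow the agreed convention), since the keys themselves would normally be written literally in the code.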

MK

Michael Keara
User Interface Systems Architect,
The User Advocate Group

not enough

Gábor Hojtsy's picture

In reality there are multiple login and checkout modules, so you'll at least need a module/package prefix there too, like COMMERCE_CHECKOUT_PAGE_PROMPT_CREDIT_CARD_NUMBER or LOGINTOBOGGAN_LOGIN_FORM_LABEL_USER_NAME. It is getting pretty long, isn't it?

Also, if you need a label/prompt and a description for your credit card number field, you'd use COMMERCE_CHECKOUT_PAGE_PROMPT_CREDIT_CARD_NUMBER and COMMERCE_CHECKOUT_PAGE_DESCRIPTION_CREDIT_CARD_NUMBER?

That is, sadly, hilarious :)

User Advocate's picture

Hmm, they are pretty long keys.

Ok, so really this gets down to a question of scoping the keys. Once again I'll refer to practices I've been using and that problem is addressed by directing the String Manager object to locations for specific resource files. Despite what I had said in that last comment (thinking out loud) I tend to not include the outer context (page/form/block) and stick more to the last two specifiers - element type and detail. The example I used in my article is:

$str_mgr = get_string_manager('my_module', 'forms');

$ui_string = $str_mgr->t('PROMPT_EXAM_REGISTER_VERIFY_ADDRESS');

or the cases we're talking about here:

'PROMPT_CREDIT_CARD_NUMBER'
'LOGIN_FORM_LABEL_USER_NAME'

I don't mind keys around that length but the real question is how to reduce them down to the minimum required and that's why 'pointing' the string manager to a resource file is useful.
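A minimal sketch of how such scoping might work, assuming the resource files are represented here by plain arrays. The class `ExampleStringManager`, the global `example_string_resources` and the sample texts are all hypothetical; only the `get_string_manager()` call and the `t()` method name come from the example above.

```php
<?php
// Stand-in for per-module resource files (one group of strings per module
// and file). The layout and the sample texts are assumptions.
$GLOBALS['example_string_resources'] = array(
  'my_module' => array(
    'forms' => array(
      'PROMPT_EXAM_REGISTER_VERIFY_ADDRESS' => 'Please verify your address.',
    ),
  ),
  'commerce' => array(
    'forms' => array(
      'PROMPT_CREDIT_CARD_NUMBER' => 'Credit card number',
    ),
  ),
);

class ExampleStringManager {
  protected $strings;

  public function __construct(array $strings) {
    $this->strings = $strings;
  }

  // Returns the text for a key, or the key itself when nothing is defined,
  // so that missing strings stay visible instead of failing.
  public function t($key) {
    return isset($this->strings[$key]) ? $this->strings[$key] : $key;
  }
}

// The module and group select the "space" the keys live in, which is why
// the keys themselves do not need a module prefix.
function get_string_manager($module, $group) {
  $resources = $GLOBALS['example_string_resources'];
  $strings = isset($resources[$module][$group]) ? $resources[$module][$group] : array();
  return new ExampleStringManager($strings);
}

$str_mgr = get_string_manager('my_module', 'forms');
echo $str_mgr->t('PROMPT_EXAM_REGISTER_VERIFY_ADDRESS'), "\n";
```

Because the scope comes from the manager, two modules can both define PROMPT_CREDIT_CARD_NUMBER without colliding.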

The matter of scope is really a matter of defining a logical 'space' to keep the keys which ultimately represent things we want to say to users. The simplest 'space' available in the short term is the module file system. (There are others but I won't go into that here.)

I'd like to hear more about why distributing resource files through the module file system would be a bad idea. (I'm not saying it's a good idea - I just want to understand the specific arguments against it.) It seems to be a natural way to begin the process and it does allow the long key name problem to be addressed reasonably well.

Michael Keara
User Interface Systems Architect,
The User Advocate Group

Completely agree with User

DjebbZ's picture

Completely agree with User Advocate, and I would choose the UPPERCASE_AND_UNDERSCORE syntax for the same reasons as you. This would allow and push UX people to write meaningful form field labels and descriptions, for example (really, we developers are bad at it, because it's not our job).

Madness

chx's picture

There was a time when Drupal was simple and fun. We have deviated far and this proposal would move even further. What's wrong with treating the English string as the key? Let's see your pros:

  1. the module can originally be written in another language and can be later translated to English -- that's a horrible idea and this from someone who is not a native English speaker. We will end up with contrib that no one can reuse because the doxygen, the function names, the variables make no sense Edit: to an English speaker.. English is the language of coders, sorry! Remember the madness Hungarian Excel is.
  2. small changes like typo fixes will not invalidate translations -- keep a map of typos in a variable and write those typofixes into the variable on a hook_update.
  3. no code patches required to change strings (or maybe yes, see below) -- that's one 'big' plus.
  4. code size will go down in some cases. -- another 'big' plus. Don't forget that in memory size will certainly go up OTOH.
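The typo-map idea in point 2 might look roughly like the sketch below. It is self-contained rather than real Drupal code: in Drupal the map would presumably live in a variable written by an update function and the translations would come from the locale system, whereas here both are plain arrays, and `example_translate()` and the sample strings are hypothetical.

```php
<?php
// Corrected source string => previous source string (sample data).
$typo_map = array(
  'You have an error in @input.' => 'You have error in @input.',
);
// Existing translations, still keyed by the old source string (sample data).
$translations = array(
  'You have error in @input.' => 'Vous avez une erreur dans @input.',
);

function example_translate($string, array $translations, array $typo_map) {
  if (isset($translations[$string])) {
    return $translations[$string];
  }
  // Follow the map back through earlier spellings; $seen guards against
  // accidental loops in the map.
  $seen = array();
  $current = $string;
  while (isset($typo_map[$current]) && !isset($seen[$current])) {
    $seen[$current] = TRUE;
    $current = $typo_map[$current];
    if (isset($translations[$current])) {
      return $translations[$current];
    }
  }
  // No translation found: fall back to the source string.
  return $string;
}

echo example_translate('You have an error in @input.', $translations, $typo_map), "\n";
```

The cost is one extra lookup per corrected string and a map that grows with every typo fix, which is the "minor update wart" this approach carries.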

haha

Gábor Hojtsy's picture

Well, I've started the post off with recognizing all those (rightly so foreign language) speakers who suggested this idea, because obviously those are the people who are exposed to foreign languages and all the pressures of that, right? Not much surprise there :) I'm merely trying to centralize the discussion about it, so we can properly gather feedback :) I did not suggest that people should contribute non-English code to drupal.org.

BTW the "Drupal is easy for developers" train is gone (so they say in Hungary) with Entity/Field API and to some extent DBTNG, I think... Don't get me wrong, I want to make/keep this as simple as possible, that is why I enumerated the problems that need to be solved, so we can concentrate on trying to find a solution for those... I really wanted your http://london2011.drupal.org/conference/sessions/engineering-80-too session to be accepted, let's do it as a core conversation maybe?

Please realize

chx's picture

that I have actually suggested an alternative that, aside from the minor update wart, does not have many cons.

update functions

Gábor Hojtsy's picture

Yes, update functions are in fact something that could ALREADY happen for these string updates in all supported Drupal versions if we introduce such a process; there is no need to wait for Drupal 8.

I also think that this does not solve all the issues. The UX complaint and the translator complaint remain that "context" is still often missing from strings. Of course that is an optional Drupal 7 feature on t() - also not supported universally for watchdog messages and not too well for menu items... Using string keys would put us on the other end of providing context - in effect all strings would have context built in by having unique ids, or at least ids only reused when they really mean the same thing vs. just using the same English words. We clearly did not get much use of context with the optionality it has in Drupal 7. I think the question still stands as to how to improve in this area. The update function problem was just one piece.

DX and UX

DjebbZ's picture

We could try combining string/key and context.
Let me try to weigh pros and cons :
- by combining the string (as we already use it) with a context identifier, we would write something like t("Buy", "checkout page") or string("Buy", "checkout page"). What does it mean? The developer is still responsible for writing/translating the text, but he's trying to provide an easy way for translators to do their job.
- by combining a unique key and a context, we would write something like string("BUY", "checkout page"). Developers and translators are now happy, because the developer would rarely change this part of the code and would use an update function to update the value. And the translator would have everything he needs: a unique file/page with text to translate, each one accompanied by a context.

Now let's go a bit deeper. Whenever a developer provides the context of the text to translate, he's communicating with translators, not with Drupal. This human interaction/communication needs to be clearly defined, because while developers communicate well with machines, that skill is not optimal when it comes to communicating with people, especially if these people are not tech savvy. Let me provide examples to illustrate this:
- string("BUY", "checkout page"): the context is a specific "physical" place in the web site. That means the translator may need to visit this page to see the context and provide the proper translation. That also means that the outer context needed to properly translate the word is engraved in the string. In another place in the site we may need string("BUY", "about page"). So for another meaning of the same word we would need another identifier. This design (forcing outer context into an object) is a bad idea.
- string("BUY", "commerce.module"). This one is tempting for developers, but it means that the translator needs to know what commerce.module is and how it works, and even what a module is. Please let's never do this.
- string("BUY", "Call to action"). This is what I think is the best approach. Describing the role, the function of the text helps the writer understand what kind of translation to provide. It's good because it does not tie the sentence to a particular place, but ensures this text is used in the proper context.
Of course we could try combining these ideas: context + module + function, or any other. Anyway, let's not forget that it will be the developer's job to write this context. If we do it right (performance, documentation, good examples in core), we could provide both a good DX and UX.
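To make the key + role-based context idea a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `example_keyed_string()`, `example_translator_listing()`, the registry layout and the sample texts are hypothetical and not an existing Drupal API.

```php
<?php
// Sample registry: context (the role of the text) => key => text.
$example_strings = array(
  'Call to action' => array('BUY' => 'Buy now'),
  'Menu item' => array('BUY' => 'Shop'),
);

// Looks up a key within a context; falls back to the key when undefined.
function example_keyed_string($key, $context, array $strings) {
  return isset($strings[$context][$key]) ? $strings[$context][$key] : $key;
}

// What a translator might be shown: every key together with its context.
function example_translator_listing(array $strings) {
  $rows = array();
  foreach ($strings as $context => $keys) {
    foreach ($keys as $key => $text) {
      $rows[] = $key . ' [' . $context . ']: ' . $text;
    }
  }
  return $rows;
}

echo example_keyed_string('BUY', 'Call to action', $example_strings), "\n";
echo implode("\n", example_translator_listing($example_strings)), "\n";
```

The same key can carry different texts in different roles, and the listing gives the translator the role next to each text without requiring any knowledge of modules or pages.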

already in Drupal 7

Gábor Hojtsy's picture

Yes, Drupal 7 already has string context support built in, and even core uses it minimally. See http://drupal.org/node/1035716 for the issue about defining guidelines, where your feedback would be much appreciated.

Chx, the real issue is

cleaver's picture

Chx, the real issue is normalization. With t() as originally conceived, we denormalized for the sake of convenience to make the value the key. In my experience, whenever you denormalize, you have a series of side-effects. In Drupal, the side effects have been non-English sites breaking on string change, semantic collisions, avoiding necessary user interface updates, etc.

The t() context argument in D7 works around the semantic collisions, but to me feels like a bit of a hack. When I see a composite key like that (two attributes for the primary key, one dependent attribute), I just want to normalize it to become a single key. (OK, a hierarchical key violates atomicity, but I don't see a downside in practice.)

Using a key makes the translation process much better. Every site needs to be translated... sometimes from English to another language, but always (to paraphrase User Advocate) from programmer-English to user-English.

Having worked in Java with property files and a little bit in .NET, I can honestly say the DX argument is a bit of a red herring.

Agree. Especially

DjebbZ's picture

Agree. Especially programmer-English to user-English.
Let's just remember that for translators, a context may still be useful. Even if it's not used for building the key, it may still help people who actually do the translation/writing job. UI and/or translation tools may build upon these contexts too.

I would like to see more

meba's picture

I would like to see more evidence of the DX argument being a red herring. I don't think there are many .NET and Java systems that can compare to Drupal in terms of its community and number of developers. It is extremely easy for you to work with the property files if you are a disciplined programmer yourself. In Drupal, you have 100 000 additional programmers.

I guess we need to try

cleaver's picture

I guess we need to try putting together a solution and see how well people can work with it. Like any design consideration, there's some degree of compromise.

To me, a single key for string lookup is already simpler than t() with context. As more programmers come to Drupal from Java and .NET, they might feel the same. But that's not everyone in the community.

Crazyness

Sahin's picture

  • "the module can be ... translated to English -- that's a horrible idea..."
    -- We are not talking about the code or comments, it is the ui which needs translation, and yes, sometimes into English, sorry!
  • "...keep a map of typos in a variable and write those typofixes into the variable on a hook_update."
    -- Horrible!
  • It is painful to face such a mentality here while the whole virtual world out there is working harder to find better ways to reach and respect non-English speaking people. If you were a film director in India, I am afraid you would shoot films in English with Indian dubbing :)

    keep focused

    Gábor Hojtsy's picture

    Sahin, please keep your comments focused on the discussion. Name calling never advances your cause at least not in the Drupal community.

    Agree, but the address is wrong

    Sahin's picture

    Gabor, I only meant to reply to chx in his own wording, so you had better address this to him. Anyway, I appreciate your care for the community spirit.

    Wondering whether this could

    sun's picture
    1. Wondering whether this could be a two-step process/transition; e.g.
      D8:

      <?php
        t('user.user_login_form.username.label', 'Username', array(...));
      ?>

      D9:
      <?php
        t('user.user_login_form.username.label', array(...));
      ?>
    2. Wondering whether this context could replace and incorporate the existing context; e.g.,
      Current:

      <?php
        t('View', array(...), array('context' => 'verb'));

        t('View', array(...), array('context' => 'noun'));
      ?>

      New:
      <?php
        t('node.admin.content.list.link.view', array(...));

        t('views.menu.link.view', array(...));

        function node_string_info() {
          $strings['node.admin.content.list.link.view'] = array(
            'text' => 'View',
            'context' => 'verb',
          );
          return $strings;
        }

        function views_string_info() {
          $strings['views.menu.link.view'] = array(
            'text' => 'View',
            'context' => 'noun',
          );
          return $strings;
        }
      ?>
    3. I don't understand why source strings would have to live in the database. Monolingual English sites could still run without the major overhead of retrieving non-localized strings from the database. Given a pattern similar to 2) above, a current call to t() would stay the same and would merely have one additional internal call to fetch the source from the info hook.
      That said, to make that possible without a string info cache, it'd have to be:

      <?php
        function t($module, $id, $args = array());

        t('user', 'user_login_form.username.label', array(...));
      ?>
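The transitional (D8-style) call from point 1 could behave roughly as in the sketch below: the key identifies the string, while the default text stays readable in the code. This is only an assumption about how such a function might work; `example_keyed_t()` and the override array are hypothetical, and the real proposal would use t() itself.

```php
<?php
// Hypothetical site-level overrides keyed by string ID; empty on most sites.
$example_overrides = array(
  'user.user_login_form.username.label' => 'Login name',
);

// Uses the override (or translation) registered for the key when there is
// one, and the default text written in the code otherwise.
function example_keyed_t($key, $default, array $args, array $overrides) {
  $text = isset($overrides[$key]) ? $overrides[$key] : $default;
  // Simple placeholder substitution, without any escaping.
  return strtr($text, $args);
}

echo example_keyed_t('user.user_login_form.username.label', 'Username', array(), $example_overrides), "\n";
echo example_keyed_t('user.user_login_form.password.label', 'Password', array(), $example_overrides), "\n";
```

With this shape, changing the default text in code does not change the key, so existing overrides and translations attached to the key stay valid.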

    Daniel F. Kudwien
    netzstrategen

    No, I really don't want to

    pwolanin's picture

    No, I really don't want to have to write a hook or look elsewhere in the code for every string, and have the overhead of many extra function calls. Note also that in my initial suggestion contrib modules JUST WORK with the same t() call as in 7 in most cases, allowing for an initially simple upgrade.

    While the DX of D7 is certainly much worse than D6, why are we trying to make D8 impossible?

    I advocate removing context

    cleaver's picture

    I advocate removing context entirely and just making it implied by the key. Defining a hierarchical key correctly incorporates context. Simpler, more normalized, fewer potential side effects.

    "I don't understand why source strings would have to live in the database."
    I hope they don't live in the database, but rather in the file system. I think the strings would be distributed around the file structure of the modules. I was thinking that they might be aggregated by a caching mechanism (like CSS and JS optimization), but I'm not sure this would be the most efficient, i.e. why load the strings for Views if you don't have a view on the page?

    As some of the strings can be quite long (help text, etc.) we probably don't want to load stuff we don't need. Admittedly this works better in other languages where you have an application context to store these values.
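One way to avoid loading strings that a page never uses is to read each module's strings only on first request. The sketch below assumes that; `example_load_module_strings()` stands in for reading a strings file from a module's directory, and both it and `ExampleLazyStrings` are hypothetical names.

```php
<?php
// Stand-in for reading a strings file from the module's directory.
function example_load_module_strings($module) {
  $files = array(
    'user' => array('user_login_form.username.label' => 'Username'),
    'views' => array('menu.link.view' => 'View'),
  );
  return isset($files[$module]) ? $files[$module] : array();
}

class ExampleLazyStrings {
  // Modules whose strings have been read so far, in load order.
  public $loaded = array();
  protected $cache = array();

  public function get($module, $id) {
    // Read a module's strings the first time one of them is requested.
    if (!isset($this->cache[$module])) {
      $this->cache[$module] = example_load_module_strings($module);
      $this->loaded[] = $module;
    }
    return isset($this->cache[$module][$id]) ? $this->cache[$module][$id] : $id;
  }
}

$strings = new ExampleLazyStrings();
echo $strings->get('user', 'user_login_form.username.label'), "\n";
// Only the user module's strings were read; the views strings were not.
echo implode(', ', $strings->loaded), "\n";
```

Passing the module name separately, as in the t($module, $id, ...) signature suggested above, is what makes this possible without a global string cache.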

    <?php 

    meba's picture

    <?php
      t('user.user_login_form.username.label', 'Username', array(...));
    ?>

    I like this

    no verb/noun

    Gábor Hojtsy's picture

    Just a little note that verb/noun are not good context names. "View" as a noun can still mean various things. Just compare "how nice is your view from the window" to "save this view to the database". How contexts are best defined is being discussed at http://drupal.org/node/1035716

    Its going to become an utter

    lathan's picture

    It's going to become an utter mess!

    Just look at those tokens: 'node.admin.content.list.link.view'. OMG! This is going to make the code so unreadable. Is it really worth it? I say no way. Keeping English in code keeps things readable and easier for people to figure out. Changing this is going to cut a bunch of people who could contribute out of the loop.

    my 2 cents.

    Waste of time

    chx's picture

    We are not going to make the life of every Drupal developer utterly miserable for the perceived possible comfort of those relatively few who work with multilanguage sites.

    Edit: i unsubscribed from this thread so this was my last reply.

    Pluggable translation systems

    wizonesolutions's picture

    (Edit: I seem to have misunderstood "plugins" conceptually. So take the following for what it's worth (DX and commentary) and not for its technical merit.)

    A major initiative in D8 is to make subsystems pluggable. This looks like an obvious shoo-in for one of the systems that should be pluggable. I know this is a bit off-topic, but I wanted to chime in because it's obviously a very polarized discussion.

    So, why not:

    • allow one internationalization plugin that does things the old way, or the old way with optional keys/context
    • have another one that uses keys instead of source strings for those who need it
    • allow plugins to run side-by-side - in other words, core modules can use one, contrib modules can use another. users/developers wouldn't need to do anything fancy or rewrite their code for things to work

    This is a really conceptual suggestion. I haven't thought about how this would actually be done, though I do think it's doable. But basically, I think some of the beef with Drupal 7 is that it gets in your way. "Managed files? Entity/field API? jQuery.noConflict() is on? wtf? What is all this stuff I have to learn?"

    Being able to have Drupal be fun for those who prefer that style while offering other coding styles to other people...would be nice. I guess this might divide the community into cliques based on preference, but look, that happens and is going to happen anyway. As long as users don't suffer, it'd improve DX and probably mitigate developer attrition.

    WizOne Solutions - https://wizone.solutions - Drupal module development, theme implementation, and more
    FillPDF Service - https://fillpdf.io - Hosted solution for FillPDF

    Context in t()

    RobLoach's picture

    Isn't this why we added the "context" argument in the t() function? I'm probably thinking of something different...

    <?php
      t('Username', array(), array(
        'context' => 'user_login_form', // Or some other useful context
      ));
    ?>

    Haven't really dealt with translating Drupal websites into other languages much, so I can't really chime in that much here... But as a developer who's used other platforms that required string registries, I can say they are a pain to deal with. The experience I had was quite similar to the one meba mentioned earlier.

    While I work with other

    Xen's picture

    While I work with other language sites, I'm not too fond of the external strings approach. Some points:

    1. Often you create a module that reuses existing strings. For instance, if I create a login form for authenticating with an external service, I can reuse t('Username') and get the translation for free. With external strings I either cheat and reuse the key for the user form (and then we're back to the problem of context that t() suffers from) or have to create a new key, requiring that new key to be translated again (putting a burden on translators).

    2. It's not uncommon for the client to want a string translated differently depending on context. "Please enter your username" on the login page, just "Username" in the login block. Using just keys doesn't help here, as either you'll have a user.login.username that's used for both, or user.login.username.page and user.login.username.block, which is then in 99% of cases the same string translated to the same string (putting extra work on the translators to support the 1% usecase).

    In order for keys without context to work, we'll have to have some sort of hierarchical system where a module can use form.username.login.block and the system falls back up the hierarchy, so translators can just translate form.username and it'll catch form.username.block, form.username.page, form.username.user.profile.edit. But I don't want to think of the overhead of that (never mind dealing with placeholders), or how to organize such a translation tree without crippling it.
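For illustration, the hierarchical fallback described above could be sketched as follows. `example_lookup_with_fallback()` is a hypothetical function and the sample keys and texts are assumptions; the point is only to show the shape of the lookup and where the overhead comes from.

```php
<?php
// Sample translations: one general entry and one more specific entry.
$example_translations = array(
  'form.username' => 'Username',
  'form.username.login.page' => 'Please enter your username',
);

// Tries the full key first, then drops the last segment and tries again,
// until a translation is found or no segments are left.
function example_lookup_with_fallback($key, array $translations) {
  $segments = explode('.', $key);
  while (count($segments) > 0) {
    $candidate = implode('.', $segments);
    if (isset($translations[$candidate])) {
      return $translations[$candidate];
    }
    array_pop($segments);
  }
  return NULL;
}

// No specific entry for the block, so the general form.username is used.
echo example_lookup_with_fallback('form.username.login.block', $example_translations), "\n";
// The login page has its own, more specific entry.
echo example_lookup_with_fallback('form.username.login.page', $example_translations), "\n";
```

In the worst case this costs one lookup per key segment for every string on the page, which is the kind of overhead mentioned above.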

    If t() could have the file and function as context, it would pretty much solve all my problems (apart from the same-form-different-places issue, but that's a form_alter thing). Then I can translate View to Vis in the general space, and View to Oversigt in the context of most of the Views module (save the few places where Views uses the verb).

    I decided to look how this is

    meba's picture

    I decided to look how this is solved in Joomla. Joomla uses keys stored in {languagecode}.ini files. I do not know if we can compare quality of Joomla vs. Drupal extensions but from the 5 extensions I have checked, I can clearly see that using JText() class to output translations is definitely not used as often as t().

    Too soon, keep in contrib

    wizonesolutions's picture

    Yeah, this kind of comment implies that externalizing strings might be best left to the contrib space unless the majority of people who actually work with i18n want externalized strings. Many of those people probably have the resources to sustain a bit of custom development anyway...or to hire developers who can code in English :)

    WizOne Solutions - https://wizone.solutions - Drupal module development, theme implementation, and more
    FillPDF Service - https://fillpdf.io - Hosted solution for FillPDF

    Joomla Way

    wojtha's picture

    Fortunately (or ironically?) my flatmate is co-owner of probably the biggest Czech Joomlashop, so I could interview him about his experience with this approach.

    string context

    adraskoy's picture

    We also have PHP's __LINE__, __FILE__, and __FUNCTION__ "magic constants" to work with. Perhaps passing some of these to t() could give us something to work with. I too have run into the situation where I want to change a particular instance of a phrase, but not the phrase in general.

    Not too fond of having to

    Xen's picture

    Not too fond of having to pass magic constants to t(). It would be possible with debug_backtrace, but it wouldn't exactly be considered kosher to use it for that purpose.
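For reference, the debug_backtrace() variant might look like the sketch below. It is only an illustration of the idea, not a recommendation: `example_t()` and the override array are hypothetical, and a backtrace on every call has a runtime cost of its own.

```php
<?php
// Hypothetical per-caller overrides: calling function => string => text.
$example_overrides = array(
  'example_profile_form' => array('Save' => 'Save your profile'),
);

function example_t($string, array $overrides = array()) {
  // Frame 0 describes this call to example_t(); frame 1, when present,
  // names the function that made the call.
  $trace = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 2);
  $caller = isset($trace[1]['function']) ? $trace[1]['function'] : '';
  if (isset($overrides[$caller][$string])) {
    return $overrides[$caller][$string];
  }
  return $string;
}

function example_profile_form(array $overrides) {
  return example_t('Save', $overrides);
}

function example_node_form(array $overrides) {
  return example_t('Save', $overrides);
}

// Same source string, different result depending on the calling function.
echo example_profile_form($example_overrides), "\n";
echo example_node_form($example_overrides), "\n";
```

This changes one instance of a phrase without touching the phrase in general, and without the caller having to pass anything extra.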

    This would be easier to do if PHP had a preprocessor.

    Regarding:
    "This is being done by lots of systems in the industry. If you just look at how Java property files work or .NET resource files (which originate from decades old Windows programming standards). " and "Most software uses something like the proposed scheme."

    Our t() is modeled on gettext, which is the base of translations in Linux. Gettext's first release was in '95, back when Windows still didn't grasp the concept of 'one binary, many languages'.

    Regarding the comments that developers shouldn't be doing UX while coding:
    Well, excuse me, but it happens to be part of the job a lot of the time. I don't know where you work, but in my experience the UXers do 'the frontend', which means the stuff that the client's end users (often their customers) see, and leave all the administration pages that manage said frontend to us Drupal people. Not to speak of the occasions where the client "just needs this little thing" and doesn't want to involve too many people.

    A few clarifications...

    User Advocate's picture

    Regarding item 1: Sorry Xen, I'm not grasping your point about constants yet. Can you spell out the problem for me? Thx.

    Quick point about item 2: I started programming for Windows in 1990 at Corel Corporation. String resource files (pre-compiled DLLs), accessed via constants, were critical to the extensive internationalization of Corel products back then.

    Regarding item 3: Apologies if my earlier comments appeared to show lack of understanding of developers' responsibilities. I realize the huge commitment developers make to projects like Drupal. I think it's nothing less than heroic for the most part, given the voluntary nature of it all. I also understand that in many circumstances developers are the ones who have the responsibility for doing all aspects of the job.

    Here are a few clarifications on my part:
    I see Drupal as having a continuum of 'front ends' because the various levels of administration require some sort of UI. UI strings are present throughout.

    I believe clients vary in terms of how much they can or want to take on the task of administrating a site. Administrative UI strings play a critical part in setting the boundaries of clients' involvement.

    I believe that developers should focus on code and UX designers should focus on UX. They can be the same person. As such a person, I find that when I'm focused on code I make, hmm, not such good UX decisions. I prefer to wait until I've gotten out of the technical mindset before I get into UX stuff. This is especially important with regard to UI strings.

    In terms of DX, I find the use of keyed UI strings allows me to separate these two aspects of my work so that I can do a better job of each.

    Having said all of this, I'll mention again that, despite success at the level of custom modules, I have no idea how the widespread use of keyed UI strings would affect performance. I'd love to hear more about that. @cleaver and I also intend to do some tests in that regard.

    This is a great discussion. I'm learning lots.

    Michael Keara
    www.tuag.ca

    Michael Keara
    User Interface Systems Architect,
    The User Advocate Group

    dropcube's comment is at

    User Advocate: Item number

    Xen's picture

    User Advocate:
    Item number one was on adraskoy's suggestion of passing magic constants to the t() function. I wouldn't like to do t("Save", __FILE__, __FUNCTION__).

    On item number 2, Corel was ahead of the times. Most software, Windows included, was released in different language versions. I remember because I came from the Amiga, which had a locale system that allowed changing the language on the fly.

    "As such a person, I find that when I'm focused on code I make, hmm, not such good UX decisions."

    Well, that's how you work. That's not how I work. When I code a form, I know I want a textfield here with the label "Surname". Having to figure out that the key should be mymodule_form_label_surname (never mind hating multiple underscore keys), open the string resource file, add the key and the string, and go back to adding the next field... That's seriously gonna mess with my mojo.

    Another point is that by adding the indirection of the keys, you add another thing to maintain. Whenever you rework your form, you have to add, rename and remove strings from the resource file. You're not just adding a key, you're adding rules and semantics on the keys. It's Hungarian notation all over again. I'll bet that it won't be very long before someone who doesn't understand or care for the key system uses 'module_str123' as a key.

    Regarding performance, English sites will be slowed to localised site speed, but I don't think that it'll be much difference for localised sites, unless the translation function does funky things like looking upwards in key hierarchies.

    Just spotted this one: @cleaver
    "Contrib developers should attempt to use core strings (eg. 'user.login-form.username.label', or whatever we use) whenever possible."

    No they should not. Else you're implementing the thing you're trying to fix in t(). You cannot have a global.button.save that's used by everybody, because you'll have someone like User Advocate saying "hey, it shouldn't say 'Save' on the user profile form, it should say 'Save your profile'", and without unique keys, he won't be able to change that single instance.

    But that means that every translator out there has to deal with 302 different keys (and that's just core) for what amounts to 'Save'. And has to deal with every new key in contrib modules that also points to 'Save', unless we develop tools that essentially do the same thing as t() does now.

    My point of view is from

    cleaver's picture

    My point of view comes from consistency of interface, limiting the translation burden and knowing I could change "Save" in one place and have the change show throughout my site.

    I think it's a good idea, but I hear you that not everyone will want to do that. For sure you would want to be able to override "global.button.save" with "mysite.profile.button.save" or something like that. I think that hook_form_alter() would take care of most needs.

    Right now, it is essentially equivalent to "global.button.save" in most cases. I think most module developers will do a t('Save') and be done with it.

    hook_form_alter already there

    Gábor Hojtsy's picture

    Well, hook_form_alter() is already there to do that, so this would not be anything new, right?

    Lets do evolution, not revolution.

    wojtha's picture

    As I'm one of the lead Drupal translators here in the Czech Republic, I'm sometimes angry when I see how irresponsible developers are. I remember translating one release of Übercart to an almost 100% complete translation. (Except for a bunch of words in Drupal 6 which can't be translated since they have completely different meanings in different contexts; unfortunately, with Drupal 7 and the lack of context usage, the situation is almost the same.) So I had the translation done after spending a couple of days on it and was proud of it, but within a week there was another (minor) release and 300 strings had changed. WTF. And almost all of the changes were in the following style:

    • "Some form label:" => "Some form label"
    • "You have error in !input." => "You have error in @input."
    • "Hello @user." => "Hello %user."
    • "My orders" => "My Orders"

    I was very angry when I spotted these kinds of changes, and that was the time when I would have swapped t() for translation based on string constants. Fortunately, some translator tools like Poedit have translation memory and an automatic translation feature which can perform fuzzy translation. If we implement similar functionality on l.d.o, it will save a big part of the translators' effort that is wasted on fixing changes like this.
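As a rough illustration of the fuzzy-matching idea (not how Poedit or l.d.o actually implement it), a translation memory lookup could use PHP's built-in similar_text() to suggest the translation of the closest old source string. `example_fuzzy_suggest()`, the 85% threshold and the sample data are assumptions.

```php
<?php
// Translation memory: old source string => existing translation (sample data).
$example_memory = array(
  'My orders' => 'Moje objednávky',
  'Some form label:' => 'Nějaký popisek formuláře:',
);

// Returns the closest entry at or above the threshold (in percent) as a
// suggestion for a changed source string, or NULL when nothing is close.
function example_fuzzy_suggest($source, array $memory, $threshold = 85.0) {
  $best = NULL;
  $best_percent = 0.0;
  foreach ($memory as $old_source => $translation) {
    $percent = 0.0;
    similar_text((string) $source, (string) $old_source, $percent);
    if ($percent >= $threshold && $percent > $best_percent) {
      $best_percent = $percent;
      $best = array(
        'source' => (string) $old_source,
        'translation' => $translation,
        'similarity' => $percent,
      );
    }
  }
  return $best;
}

// "My Orders" differs from "My orders" only in capitalization.
$suggestion = example_fuzzy_suggest('My Orders', $example_memory);
echo $suggestion !== NULL ? $suggestion['translation'] : '(no suggestion)', "\n";
```

A suggestion found this way would still need a translator's confirmation (a "fuzzy" state), since a small change in the source can matter, for example a placeholder switching from ! to @.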

    This experience of mine (or rant, if you wish) would speak for "moving interface strings entirely out of Drupal code files"; however, I'm against that idea.

    Constant-based translation would be another barrier for most developers, as someone mentioned earlier. I'm giving -1 to that. I see the way forward in better translator tools. Automatic translation isn't too hard to implement (at least for European languages; I have no experience with Chinese or similar). I have been thinking about that for a year, but time is still against me.

    I would like to see some standard for the context, and maybe we could force the use of at least the module name as the context. It would help in most cases, but e.g. for Ubercart there would still be a problem with words like Order in the contexts of "ecommerce" and "sorting". So probably only some forced and standardized marking with appropriate contexts (no matter if hierarchical or not) would work. Another idea: if we passed the module name to t(), we could define the default context or contexts in the module's info file... just thinking.

    Someone in this thread also called for non-English-based translation; I don't. But I do call for an equivalent of the string formatter functions like t() or format_plural(), one that cleans up and formats strings but skips the translation processing. When I'm developing custom modules for local Czech clients, it's idiotic to force a Drupal developer to write the strings in English and translate them through Drupal's translation interface. I often just use sprintf(), but it does not have the full formatting potential of t() and format_plural(). What about nt() (not translate), or maybe local_t() and local_format_plural()? Or passing some option to t()? But since no untranslated strings are allowed in Drupal core and contrib (I agree!), this may have to live in contrib anyway. I just feel bad about creating a module for just one or two functions, but it is probably the best and only way...

    Desired usage:

    <?php
    t('Hello %user', array('%user' => $user->name)); // going through translation interface
    local_t('Ahoj %user', array('%user' => $user->name)); // not going through translation interface
    ?>
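
    For illustration only, a local_t() along these lines could look roughly like the sketch below. It is a hypothetical function, not part of Drupal; it mimics the @, % and ! placeholder conventions of Drupal 7's t() using plain PHP functions and never looks up a translation.

```php
<?php
/**
 * Hypothetical local_t(): formats placeholders in the style of Drupal 7's
 * t(), but performs no translation lookup. Not part of Drupal.
 */
function local_t($string, array $args = array()) {
  foreach ($args as $key => $value) {
    switch ($key[0]) {
      case '@':
        // Escaped plain text.
        $args[$key] = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        break;

      case '%':
        // Escaped and emphasized.
        $args[$key] = '<em class="placeholder">' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '</em>';
        break;

      // Any other prefix, such as '!', is inserted as-is.
    }
  }
  return strtr($string, $args);
}

$greeting = local_t('Ahoj %user', array('%user' => 'Petr <admin>'));
```

    The string itself passes through untouched, so it can be written in Czech directly in the code.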

    Sorry, the last paragraph is somewhat off-topic; it just came to my mind.

    Summary:

    • Let's do evolution, not revolution. I feel like we are trying to replace a whole system which more or less fits 95% of use cases with a system which will theoretically solve all current conflicts and problems (the last 5%), but will be less friendly for developers and totally different. It seems like replacing a bunch of problems (which are known, and which we know how to live with) with another set of problems (most of them unforeseen now).
    • Current problems of the translation system could be solved 1) with forced usage of context together with a good standard for how to create a good context name, and 2) with a fuzzy automatic translation system which automatically generates suggestions for untranslated strings and allows changes to be fixed quickly.
    • Site-level string overrides can be done: "there is a module for that" (Drupal TM), http://drupal.org/project/stringoverrides. Global site string overrides can be done via the translation interface, and big changes can be done directly in the theme layer. Maybe just a proper tool for importing/exporting the local string overrides is missing.

    Although I don't agree with the idea of "moving interface strings entirely out of Drupal code files?", thank you Gábor for opening and promoting the discussion about it. The Drupal interface translation system needs to be improved for sure.

    Issue for that

    Gábor Hojtsy's picture

    Moving the formatting out of the t() function was part of the big patch at http://drupal.org/node/361597 (Write locales data API), but I've moved it to its own issue and applied it to JS as well at http://drupal.org/node/1191614 (Make t() formatter available as its own function). Please support it and spread the word far and wide to get it accepted. It should be very helpful in solving some of the problems mentioned here.

    Thanks! You are scratching my

    wojtha's picture

    Thanks! You are scratching my itch :-)

    CSS3 - solution for designers?

    wojtha's picture

    The text-replace CSS3 property might soon be a way for designers.

    http://www.cssportal.com/css-properties/text-replace.htm

    Well, it doesn't solve tokens. But for basic interface strings it could be pretty useful. I can even imagine multilingual use: a CSS file per language, or replacement based on an element's language class.

    While, as wojtha points out,

    Xen's picture

    While, as wojtha points out, it is annoying that minor text changes mean that the translator has to update the string, it is a feature, not a bug. Consider a module with a text field with the following changes in the #description:

    v1.0: "User to mall daily roports to."
    v1.1: "User to mail daily reports to. "
    v1.2: "User to mail daily reports to. Alternatively an e-mail address."

    Evidently a new feature was introduced in v1.2. With t(), this will require the eyeball of the translator for each version, which will be annoying going from 1.0 to 1.1, but a Good Thing for version 1.2.

    If the key is the logical mymodule.form.reports.user, how will the translator discover that the meaning of the text has changed? Are we to require versioned keys à la mymodule.form.reports.user.v2? I'll wager good money that that's a requirement that'll slip through the cracks often, resulting in old, out-of-date translations.

    right

    Gábor Hojtsy's picture

    Right, people would need to change their keys whenever their text meaning changes. Yeah.

    I know we're not discussing

    te-brian's picture

    I know we're not discussing implementation by any means, but what would you change the key to in the case of v1.0 to v1.1? I think this is a case where the keys fail, because you'll have to change a perfectly logical key just for the sake of changing the key. There was no intentional change in meaning (just fixing a typo, after all).

    I suppose there is an argument to be made that the translator would have noticed the typo and translated it correctly.

    From v1.1 to v1.2 there was a change in meaning, so the key could be updated to something like mymodule.form.reports.account or mymodule.form.reports.user-or-email, but even then, the translator still has to view the original string, in its context, in order to be able to translate it properly. I'm not sure where the gains are.

    Edit: In summary, I was happily going along with the conversation; starting to agree with the key supporters. But I really like how Xen puts it in that in many cases some of the annoyances are a "feature not a bug", in that it guarantees string changes always have an impact.

    The 'elephant in the room'

    User Advocate's picture

    I agree with Xen and Gábor on the matter of a potential text-replace CSS3 solution.

    'Changing keys whenever their text meaning changes' defeats the purpose of keys, which is to 'nail down' a fixed code reference point for a variable value (the UI string).

    But as I understand it, this is pretty much how we use t() at the moment isn't it? The text passed into t() is in fact a key that is used to retrieve the translation from the db. Unless it is an English site in which case we display the key itself. This is the 'key=text' condition.

    But logically, doesn't this amount to changing the keys whenever their text meaning changes?

    The problem with shifting keys, as Xen's scenario indicates, is that it is a poor way to do version control. Version control is better left to a version control system (which incidentally supports the argument for external files, I think).

    However, I don't think this is primarily a version control type of problem. Version control systems support serial changes and translation requires parallel changes.

    And a subtle variation on the concept of 'parallel changes' is idioms, which I refer to in my first comment. This is what I need as a UX designer.

    Idiomatic string management requires parallel string definitions because it's a form of 'sub-language' translation. Because of the key=text condition, which is prevalent in t() usage, idiomatic control is excluded from the equation when it comes to UX design in Drupal.

    In addition to 'out of date' translations problems there are also countless 'out of context' UI string problems. This means that UX problems that might be resolved fairly easily with multiple sets of targeted UI strings are diverted to a variety of other 'screenware solutions' that may or may not be as effective (or cost effective).

    To use a popular English idiomatic expression, this is the 'elephant in the room' for matters pertaining to t().

    Michael Keara
    User Interface Systems Architect,
    The User Advocate Group

    CSS3 and key changes

    Gábor Hojtsy's picture

    I did not comment on the CSS3 idea, and I do not agree that this is a display-level operation like that. It does not help with your search indexing, for example; I think that is too late a stage. Also, most strings are not wrapped in HTML elements that CSS3 could address, and many strings have replacement items, escaping and such, which make them impossible to replace with CSS3. I don't think that is a good idea.

    On changing keys, if we do not change keys when the underlying text changes, we'll not only need the keys but also supplemental "meaning of this key" information, that can be changed when the meaning is changed, so we can properly mark translations outdated. Otherwise useful changes will slip through the cracks. That seems to complicate this idea even more. :|
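
    One possible shape for such supplemental information, sketched under assumptions: each stored translation remembers a hash of the source text it was made from, and a mismatch with the current source text marks it outdated. The function name and array layout below are made up for this example.

```php
<?php
/**
 * Hypothetical check: a translation is outdated when the source text it was
 * made from no longer matches the current source text for the same key.
 */
function translation_is_outdated(array $translation, $current_source) {
  return $translation['source_hash'] !== md5($current_source);
}

// A translation recorded against the v1.1 text of the key (assumed data).
$translation = array(
  'key' => 'mymodule.form.reports.user',
  'source_hash' => md5('User to mail daily reports to.'),
);

$same_text = translation_is_outdated($translation, 'User to mail daily reports to.');
$new_text = translation_is_outdated($translation, 'User to mail daily reports to. Alternatively an e-mail address.');
```

    A plain hash flags typo fixes as well as real changes in meaning, so it does not by itself tell the two apart; that distinction would still have to come from the developer.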

    I still think we agree on the CSS3 matter

    User Advocate's picture

    Gábor, I still think we agree on the CSS3 matter - I don't think it's a good idea either. It looked to me that your last comment was addressing that. Apologies if I misconstrued. (I had initially hit 'reply' to your comment but somehow my comment fell out of the stream.)

    With regard to catching outdated strings that need translation, I still think this can be addressed through external files that are managed through a version control system. It would require a particular resource file to hold the 'definitive' meanings (i.e. what is currently defined in the t() functions). For consistency with past practice this may always be English, or maybe not. I would think well-defined version labeling would allow diffs to be generated that identify which strings require modification. A key-based system would allow those particular strings to be found within each language resource file.

    That is how I've seen it done in the non-open source world. Is it possible in the Drupal context?

    Tell me if I'm missing something.

    Thanks,

    Michael Keara
    www.tuag.ca

    Michael Keara
    User Interface Systems Architect,
    The User Advocate Group

    Jose Reyero's picture

    After having read many of the comments, here are some more thoughts.

    • That each string has a unique id doesn't mean you always need to write it. Most of the short strings around are menu texts, form texts, etc. These can get an automatic id from some string parser: menu.item.my/menu/path.title, form.my_form_id.field_name.description, etc.

    So you'd only need to add an explicit string id for random texts in the module that are not part of a known structure.

    • Moving long texts out of the code can actually increase performance and readability. Performance, because we may save some thousands of lines from being parsed on each page request; we only load the strings needed for each page. About readability, I don't see how a 20-line help text with each line wrapped in t() and '<p>' is more 'readable' than the same text in its own plain text file. So about performance I don't mean it will be faster, I just mean we cannot make the straight assumption it will be slower.

    • About having the defaults in the code (for short texts), it sounds good to me if that helps. We just need to kill the assumption that the default will always be English. Still, we could require defaults to be English for contributing the module.

    When you are developing a module for a non-English site, translating the default texts to English in order to contribute it doesn't pay off at the moment, because then you need to go through the full translation/update workflow for every text. But if we've got a key for every text, just replacing the default strings with English ones and moving the non-English ones to a separate file would make the module 'contributable' while keeping your original module working on your site without more work.

    So it sounds good if all the contributed code needs to have string defaults in English. As long as all texts have keys, translating it and contributing it while still keeping it working for your original language would be way easier.
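
    To make the earlier point about automatic ids concrete, here is a minimal sketch of a string parser for form arrays. The function name is hypothetical and only top-level #title and #description properties are handled; the key pattern follows the form.my_form_id.field_name.description example above.

```php
<?php
/**
 * Hypothetical string parser: derives keys such as
 * form.my_form_id.field_name.description from a form array.
 * Only top-level elements are inspected in this sketch.
 */
function collect_form_string_keys($form_id, array $form) {
  $strings = array();
  foreach ($form as $field_name => $element) {
    foreach (array('title', 'description') as $property) {
      if (isset($element['#' . $property])) {
        $strings["form.$form_id.$field_name.$property"] = $element['#' . $property];
      }
    }
  }
  return $strings;
}

$form = array(
  'name' => array(
    '#type' => 'textfield',
    '#title' => 'Username',
    '#description' => 'Enter your username.',
  ),
  'pass' => array(
    '#type' => 'password',
    '#title' => 'Password',
  ),
);
$keys = collect_form_string_keys('user_login', $form);
```

    The developer never types a key here; the keys fall out of the form id and the field names.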

    Also having string keys can make code developed in a different language readable for English speakers. See the difference:

    <?php
    // Current. You may have no clue about what the module is printing out.
    // Also it will need changing the code to make it translatable, wrapping in t(), which I wouldn't do for a string written originally in Spanish.
    print "Hola mundo";

    // Proposed. Now you can see what the string is about, then just rewrite it in English
    // This will work for my Spanish site, but also would make the string translatable right away.
    // Still, for contributing the module I would translate it and move my Spanish string to mymodule.es.txt file or similar. With that
    // the module would still work for my original site without further translation work.
    print some_function('mystring.hello_world', "Hola mundo");
    ?>
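
    A minimal sketch of how such a function could resolve a key, assuming per-language string files have already been loaded into an array. The name keyed_string() and the extra parameters are assumptions for this example, not a proposed API.

```php
<?php
/**
 * Hypothetical lookup: return the string stored for $key in the given
 * language, or the in-code default when no entry exists.
 */
function keyed_string($key, $default, array $strings, $langcode) {
  return isset($strings[$langcode][$key]) ? $strings[$langcode][$key] : $default;
}

// Stand-in for the contents of per-language files (assumed data).
$strings = array(
  'en' => array('mystring.hello_world' => 'Hello world'),
);

// No Spanish entry exists, so the in-code default is used.
$spanish = keyed_string('mystring.hello_world', 'Hola mundo', $strings, 'es');
// An English entry exists and wins over the default.
$english = keyed_string('mystring.hello_world', 'Hola mundo', $strings, 'en');
```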

    So please, let's make some important difference: Being able to write non English strings in code doesn't mean we need to open contributions to non English texts. It just means it will be much less work for me to translate and contribute the module while still using it for my original site.

    The current system makes code written for a different language both untranslatable and unreadable (for English speakers).

    The real big problem is not exactly having the text in the code. It is not having a language independent key to refer to it (override, translate, etc). So just adding keys (still having source strings there) would fix most of it.

    We don't need to force everybody to move texts out of the code. But we can enable that option by having strings with a mandatory key and an optional default in code.

    So, the whole point is: we need to enforce unique string keys. Whatever else doesn't matter that much.

    +1

    DjebbZ's picture

    I read many comments too, and I really agree with you, Jose. Externalizing translation will enable developers to focus on code, translators to work with known files (.po files), and git (or any other version control system) to handle most of the translation change problem, except maybe marking translations as outdated. I really think you are onto something good.

    Before I started to work with

    Pedro Lozano's picture

    Before I started to work with Drupal I was involved in some projects that used gforge (http://gforge.org/). Its code uses the key-based string translation method that Gábor is proposing.

    From a developer standpoint it was a pain. When working on a project you have to be constantly updating the translation file. Sometimes things broke and the keys ended up appearing everywhere on the website.

    When I switched to Drupal I felt that the t() method was more friendly. It has its glitches, but the other method will have too.

    DX versus UX?

    User Advocate's picture

    I've said enough on this discussion already and I realize it may appear that I'm adding red herrings (UX issues) into an already difficult discussion. I'd like to offer a vote of support for Jose's summary because I think this is moving in the right direction.

    Pedro, I can fully understand your position too but I'd like to again point out that what makes things simple for developers can have negative consequences for users, down the road.

    At the same time, I don't think it has to be DX versus UX choice. I have posted further thoughts on that on my web site.

    I think the UX dimension (beyond translation issues) that I've been injecting into this discussion requires further proof and certainly discussion on the UX side, so that's where I'll be turning my attention.

    Thanks for your patience with my long comments.

    Michael Keara

    Michael Keara
    User Interface Systems Architect,
    The User Advocate Group

    User-defined text

    donquixote's picture

    There is one important use case that made me consider a keyword-based system:
    Text that is supposed to be specified by the user / "the client", and not be hardcoded.

    Typical situation: The client wants a custom-designed page with a fancy JavaScript widget, a list of latest whatever, and some text boxes with intro and description text. The developer decides this is not a node type, it is not a view, it is not a panel, but a custom hook_menu() page callback.

    The client is still undecided about the text to put, so the developer can either
    - hardcode something, and offer to change it later.
    - fetch the text(s) with variable_get(), and provide a custom admin form to customize the text. The variable has to be added to i18nvars.
    - Use t() in combination with the String Overrides module.
    - Or, if the site does not have any English, the developer can simply hardcode a t() string, and the client can use the translation tools to customize it.


    I think we do not need to make a hard decision here.
    - We can keep the t() system for the biggest part of the interface, especially for most of the strings in contrib and core.
    - We add tools similar to String Overrides, but better integrated in the overall architecture. These allow customizing the English original and making case-specific overrides for any translation.
    - As discussed above, we might add an option for non-English original strings. However, those non-English strings need to be cleaned up before they go to contrib.
    - In addition, we add a keyword-based system, for those cases where the developer has only a very rough idea of the intended string. This would be similar to variable_get() + i18nvars, but we get the admin form for free, integrated in a translation browser with search. This solution can grow in contrib in D7, and go into core in D8.

    Some ideas about syntax

    donquixote's picture

    If we do add a keyword-based system, in addition to t(), then probably we want to namespace-prefix by module.
    Yes, this adds some limitations to cross-module reuse, but this is intended. We still have t() available, remember.
    We might even have additional prefix by page or sub-section etc (that's something the module author has to decide).

    Now this offers some possibilities about syntax:

    <?php
    print t_keyword('mymodule.userpage.description');
    // becomes
    print t_prefix('mymodule')->prefix('userpage')->t('description');
    // and finally
    $t = t_prefix('mymodule.userpage');
    print $t->t('key1');
    print $t->t('key2');
    ?>

    In some cases we can inject the $t into functions and methods, so those don't need to specify the prefixes anymore.

    Yes, this is all implementation detail, but interesting as a "what can it look like".
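
    As one way to picture it, the prefix object could be sketched as below. The class name is invented for this example, and t_keyword() is stubbed to return the full key so the code runs without any string storage behind it.

```php
<?php
/**
 * Stub for the sketch: a real t_keyword() would look the key up in a string
 * store. Here it returns the key so the composed result is visible.
 */
function t_keyword($key) {
  return $key;
}

/**
 * Hypothetical prefix object behind t_prefix().
 */
class KeywordPrefix {
  private $prefix;

  public function __construct($prefix) {
    $this->prefix = $prefix;
  }

  // Returns a new object with a longer prefix; the original is unchanged.
  public function prefix($suffix) {
    return new KeywordPrefix($this->prefix . '.' . $suffix);
  }

  // Resolves a key relative to the prefix.
  public function t($key) {
    return t_keyword($this->prefix . '.' . $key);
  }
}

function t_prefix($prefix) {
  return new KeywordPrefix($prefix);
}

$t = t_prefix('mymodule')->prefix('userpage');
$full_key = $t->t('description');
```

    Because prefix() returns a new object, a $t injected into a function can be narrowed further there without affecting the caller.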

    I strongly agree with the key

    LaurentGoderre's picture

    I strongly agree with the key approach, having worked so much with multilingual web sites and applications. A default language makes multilingual work much harder.

    One thing I would add to the proposition is the ability to pass an array to the function to fill some placeholders. Consider this error message in two languages.

    2 search results were found.

    La recherche a généré 2 résultats.

    The ideal way of achieving this is with an array.

    English string : "{0} search results were found."
    French string: "La recherche a généré {0} résultats."

    and the function call:

    t('search.results', [$results->count]);

    placeholders

    Gábor Hojtsy's picture

    Support for placeholders has been implemented for quite a few years at least. The earliest one still documented on api.drupal.org is Drupal 5 from 2007: http://api.drupal.org/api/drupal/includes%21common.inc/function/t/5

    Suggestion

    travis uribe's picture

    What about changing the t function slightly?

    <?php
    // Allow providing a key in the third options parameter
    $string = t('click', $args, ['key'=>'mymodule.instruct.click']);
    ?>

    This would allow string keys, but not require them. If a key is provided, find the translation by key. Otherwise, search by the passed string.
    This would be the best of both worlds. I apologize if this has already been suggested. There's a lot to read here.
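
    A minimal sketch of that lookup order, under assumptions: the function name t_with_key() is made up, the translations are passed in as a plain array instead of coming from the database, and placeholder handling is left out.

```php
<?php
/**
 * Hypothetical variant of t(): look up by $options['key'] when one is given,
 * otherwise by the source string itself. Falls back to the source string.
 */
function t_with_key($string, array $translations, array $options = array()) {
  $lookup = isset($options['key']) ? $options['key'] : $string;
  return isset($translations[$lookup]) ? $translations[$lookup] : $string;
}

// Assumed translation data for one language.
$translations = array(
  'mymodule.instruct.click' => 'Klicken Sie hier',
  'click' => 'klicken',
);

$by_key = t_with_key('click', $translations, array('key' => 'mymodule.instruct.click'));
$by_string = t_with_key('click', $translations);
```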

    context

    Gábor Hojtsy's picture

    An optional context is already possible to provide as of Drupal 7 which makes the string unique and disconnected from other strings of the same value: http://api.drupal.org/api/drupal/includes%21bootstrap.inc/function/t/7 (among $options). It does not make the context used in place of the string like you propose, that is true. These are variants on making it possible to specify individual unique strings when needed.

    As this thread showed up on

    Xen's picture

    As this thread showed up on Twitter, it got me thinking.

    While I'm still a big fan of t(), I am aware of its limitations and issues. And looking at the state of D8 lately, I'm not sure that we haven't outgrown it.

    Grabbing some of the ideas already mentioned:

    Take this:

    <?php
    string('username.label.login_form.user', 'Username');
    ?>

    The first argument is the key, the second a fallback value. Note the wording here, "fallback", not "default". It is defined as a fallback in "developer English", which, in my experience, tends to be terse and jargon filled.

    The idea of the fallback is to provide exactly that while developing, and providing something string extraction tools can use as a comment to jog the memory of the developer when doing the proper "translation" to the default English. In addition, it provides a human readable string in the code, so you don't have to mentally translate 'username.label.login_form.user' when scanning the code.

    And we'll not even frown upon

    <?php
    string('key.label.settings.google_analytics', 'Key, kthxbye');
    string('duplicate_key.error.settings.my_module', 'Et tu, Brute?');
    ?>

    The fallback is internal and not for public consumption, much like a code comment. What we will frown upon is contrib modules without the default translation into English that'll hide the internal strings.

    The key is the real thing used for lookups, but it's done recursively, like so:

    username.label.login_form.user
    username.label.login_form
    username.label
    username

    Which allows core to have a translation for username, and my_auth_module can then use:

    <?php
    string('username.label.login_form.my_auth_module', 'Username');
    ?>

    for its login form and reuse core's translation, while still allowing a translator to target its form specifically.

    It encourages reuse:

    username.label.login_form.user
    username.label.user_edit.user
    username.header.user_list.user
    username.label.node_edit_form.node

    can all fall back to username, while

    username.label.irc_login.irc_gateway can override it to "IRC Nick".
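
    The lookup chain described above can be sketched roughly as follows. The function name string_lookup() and the flat translations array are assumptions for this example; the sketch uses a loop instead of recursion, trying the full key first and dropping the last period-separated segment on each miss.

```php
<?php
/**
 * Hypothetical lookup: try the full key, then drop the last period-separated
 * segment until a translation is found; finally use the fallback.
 */
function string_lookup($key, $fallback, array $translations) {
  $parts = explode('.', $key);
  while ($parts) {
    $candidate = implode('.', $parts);
    if (isset($translations[$candidate])) {
      return $translations[$candidate];
    }
    array_pop($parts);
  }
  return $fallback;
}

// Assumed translation data: one general entry and one specific override.
$translations = array(
  'username' => 'Username',
  'username.label.irc_login.irc_gateway' => 'IRC Nick',
);

$general = string_lookup('username.label.login_form.my_auth_module', 'Username (dev)', $translations);
$specific = string_lookup('username.label.irc_login.irc_gateway', 'Username', $translations);
```

    A four-segment key can cost up to four lookups before the fallback is used.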

    As for the translation files themselves, it could be handy if they supported pointing one translation to another ('show.button.image_details.image' to 'view', for instance), and "soft" specific translations, so that a file can define a translation for 'username.label.login_form.my_auth_module' that'll only be used if a more general translation doesn't exist. Or maybe instead being able to promote a specific translation to a general one if there is none; I'm not entirely sure on that one.

    Some sets of patterns will need to be discussed, of course. I came up with these while pondering:

    <short string>.<type>.<form/page>.<module>

    Where:
    short string: username, view, google_app_id, help, ..
    type: label, button, description, header
    form/page: form_id, router name
    module: well, you get it...

    And perhaps

    entity.<type>.<what>

    Where:
    <type>: view, node, user, order, whathavewe
    <what>: name, plural, perhaps

    And

    help.<module>.<topic>

    And yes, lowercase separated by periods. That's not up for discussion.

    It'll be a bit more difficult to implement than the current t() function, and performance is probably going to be a challenge, but it'll bring some advantages over both t() and the simpler "just keys being looked up" approach.

    So, does this move the discussion along?