Addressing languages by URIs

gábor hojtsy's picture

If you look at the existing implementations I reviewed, you see different methods used to let you address content on your site with a URI. The most requested feature (which was implemented in tr.module and is somewhat implemented in localizer) is to be able to use different hostnames for different languages. Let's see the good and bad of all the options used.

No language in URI

The node/2 type of URIs are kept, and the node language specifies what interface should be shown. The good side is that nothing needs to be done to specially handle the URL; the bad side is that we don't know where to link if we would like to address a foreign language version of the node.

Some language in URI

This is desired, because this way search engines index the pages under that specific URI, and we get explicit information about what language is requested. The user can bookmark it, and send the link to friends. In this regard, the localizer approach is fundamentally flawed, since it does a language redirection by changing the user session, and keeps displaying the same URI, while the content is translated. The language somehow should be in the URI!

Additional language URI parameter

Both tr.module and localizer.module allow for q=node/2&locale=de type of URIs. This is good, because the language comes in a separate variable, so we don't need to guess it. The problem is that you have this extra variable even if you have short URIs enabled. Tr.module overcomes this by rewriting /tr/de/node/2 type of URLs to q=node/2&locale=de URIs. That prefix is needed to reliably identify tr URIs, and is certainly not desired. But we can't be sure that a URI prefix is a language code, since we don't know the list of enabled language codes when the rewrite happens (in .htaccess).
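To make the rewrite concrete, here is a minimal Python sketch of what such a rule does. It is not tr.module's actual .htaccess rule; the pattern and the function name rewrite_tr_path are assumptions made for illustration. It shows why the fixed /tr/ marker is needed: the rule has no list of enabled languages, so it can only trust the marker.

```python
import re
from urllib.parse import urlencode

# Hypothetical pattern: a fixed "tr" marker, a language-like segment,
# then the remaining Drupal path. The rule cannot check whether the
# language is actually enabled on the site.
TR_PATTERN = re.compile(r"^/tr/([a-z]{2,3})/(.*)$", re.IGNORECASE)


def rewrite_tr_path(request_path):
    """Return the query string for a tr-style URI, or None if it is not one."""
    match = TR_PATTERN.match(request_path)
    if match is None:
        return None
    locale, drupal_path = match.groups()
    # safe="/" keeps the slashes of the Drupal path readable.
    return urlencode({"q": drupal_path, "locale": locale.lower()}, safe="/")


print(rewrite_tr_path("/tr/de/node/2"))
print(rewrite_tr_path("/de/node/2"))
```

Without the marker, /de/node/2 is left alone, because at this stage "de" could just as well be an ordinary path segment.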

Prefixing $_GET['q'] with the language code

i18n.module and localizer.module have this as an option (i18n only supports this type of URIs). The good part is that we are compatible with nice URIs, and we can (need to) handle them in our source code, so we know the available languages, and can also check if there is something to display for this language under that URL. We can also strip this code out, so the remaining Drupal processing can work as if there was no language (or can take the stripped out and globally available language code and use that too). This is simple to implement with URI generation automation by using the ISO language codes. For some reason (which is not clear to me) localizer module does not allow for automatic ISO language code prefixing, only manually entered URL aliases are supported this way, which is bad.
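The stripping step described above could look roughly like the following Python sketch. It is not the i18n module's code; the enabled language set, the default language and the function name split_language_prefix are assumptions for illustration.

```python
# Hypothetical site configuration: the languages enabled on the site.
ENABLED_LANGUAGES = {"en", "de", "hu"}
DEFAULT_LANGUAGE = "en"


def split_language_prefix(q, enabled=ENABLED_LANGUAGES, default=DEFAULT_LANGUAGE):
    """Split a path like 'de/node/2' into (language, language-less path).

    The first segment is treated as a language only if it is an enabled
    language code; otherwise the path is returned untouched with the
    default language.
    """
    path = q.strip("/")
    first, _, rest = path.partition("/")
    if first in enabled:
        return first, rest
    return default, path


print(split_language_prefix("de/node/2"))
print(split_language_prefix("node/2"))
```

Because this runs in application code, the enabled language list is available, which is exactly what the .htaccess rewrite lacks. The rest of the request handling only ever sees the language-less path.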

This type of aliasing is already implemented in i18n module, but there is one remaining problem with it. Many users desire to have different hostnames for their translated content, so that search indexers (Google, Yahoo, etc) index their different language content under different names. This is good for later searching.

Different hostnames for different languages

tr.module and localizer module have the option to support different hostnames for different languages. Tr.module is more versatile, allowing arbitrary hostnames to be specified (like example.de, example-home.it and example.hu). Localizer only allows for automatic hostname prefix handling with the fixed locale names (i.e. de.example.com, it.example.com, hu.example.com). As far as I see, since this is only a one-time setup option, the tr.module approach is preferable.
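The difference between the two approaches can be sketched in Python as follows. Neither function is taken from tr.module or localizer; the mapping, the enabled language set and the function names are assumptions for illustration.

```python
# Hypothetical one-time configuration for the tr-style approach:
# any hostname can be mapped to any language.
HOST_LANGUAGES = {
    "example.de": "de",
    "example-home.it": "it",
    "example.hu": "hu",
}


def language_from_host_map(host, host_languages=HOST_LANGUAGES):
    """tr-style: look the hostname up in an arbitrary, configured mapping."""
    hostname = host.lower().split(":")[0]  # drop an optional port
    return host_languages.get(hostname)


def language_from_host_prefix(host, enabled=frozenset({"de", "it", "hu"})):
    """localizer-style: the first hostname label must be a language code."""
    label = host.lower().split(".")[0]
    return label if label in enabled else None


print(language_from_host_map("example-home.it"))
print(language_from_host_prefix("de.example.com"))
print(language_from_host_prefix("example.de"))
```

The prefix variant needs no configuration but cannot recognize a hostname like example.de, while the mapping covers both naming styles if de.example.com is simply added as a key.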

For this to work, you of course need special server setup, pointing different hostnames to your single Drupal setup. Since this is not always possible, we should not force our users to rely on this option alone.

Conclusion

So what comes out of the analysis above in my opinion? We need to have the language in the URI. We need to support language prefixing of the path, and we need to support arbitrary hostnames to be used for different languages (tr-like feature). The path language prefixing could also be arbitrary to allow for prefixes being in the target language for example, like: example.com/deutsch/, example.com/italiano/ and such. This is also a one-time config option, and the built in language codes could be the defaults.
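As a rough illustration of how these options could fit together when generating links, here is a Python sketch. The LanguageConfig structure, its field names and the language_url function are hypothetical, not an existing Drupal API; the only rule taken from the text is that the language code is the default prefix and that a prefix or hostname can be overridden per language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageConfig:
    """Hypothetical per-language, one-time configuration."""
    code: str                     # built-in language code, e.g. "de"
    prefix: Optional[str] = None  # custom path prefix; None means use the code
    host: Optional[str] = None    # dedicated hostname; None means use a prefix


def language_url(path, lang, base_host="example.com"):
    """Build the outgoing URL for a language-less path in the given language."""
    path = path.strip("/")
    if lang.host is not None:
        # A dedicated hostname already identifies the language.
        return f"http://{lang.host}/{path}"
    prefix = lang.prefix if lang.prefix is not None else lang.code
    return f"http://{base_host}/{prefix}/{path}"


print(language_url("node/2", LanguageConfig("de")))
print(language_url("node/2", LanguageConfig("de", prefix="deutsch")))
print(language_url("node/2", LanguageConfig("de", host="example.de")))
```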

What do you think? Is this enough for your needs? Is this too much? What else should we take into account? Discuss!

Comments

Languages by URIs - Localizer

Roberto Gerola's picture

"the localizer approach is fundamentally flawed, since it does a language redirection by
changing the user session, and keeps displaying the same URI, while the content is translated"

Yes. Because it uses the locale parameter stored in the session, which
is the same system as the base localization system in Drupal.
Yesterday I released a new version that addresses the issue you are reporting
(this was also causing some problems with the webform module).
Now localizer changes all menu item links to the correct localized node links
and it doesn't need a language prefix anymore.
Example: if you have two aliases like contact (for English) and contatto (for Italian), you can
call directly http://www.example.com/contact or http://www.example.com/contatto
and the correct version of the site is shown.
You can also use :
http://www.example.com/en/contact or http://www.example.com/it/contatto
http://en.example.com/contact or http://it.example.com/contatto

If you are on the English version, you see http://www.example.com/contact as the link
for the contact page in your menu. When you change the language
to Italian, the link for the contact page becomes http://www.example.com/contatto
It uses the alias that you have specified for every node.
The language switching block has also been fixed to point to
the localized URL alias.

"For some reason (which is not clear to me) localizer module does not
allow for automatic ISO language code prefixing, only manually entered
URL aliases are supported this way, which is bad"
No specific reason. Simply not implemented. I can also add this as an additional option.

"Tr.module is more versatile, allowing for arbitrary hostnames
to be specified to be used (like example.de, example-home.it and example.hu)"
For me it doesn't make much sense to use different domains.
You have to register different domains, and in a shared hosting environment every
domain has a different workspace, so you cannot share the Drupal code, only the
database. But, of course, it can also be implemented as an additional option,
perhaps extending the system of hostname prefix handling.

"The path language prefixing could also be arbitrary to allow for prefixes being in the target language"
Yes, this is pretty simple to implement. I agree with you.

--
http://www.speedtech.it

Mostly agree. This type of

jose reyero's picture

Mostly agree.

"This type of aliasing is already implemented in i18n module, but there is one remaining problem with it. Many users desire to have different hostnames for their translated content, so that search indexers (Google, Yahoo, etc) index their different language content under different names. This is good for later searching."

I just wouldn't call it a problem, but an unimplemented feature :-)

And a desirable feature, I'd say.

However, as the language-in-path is stripped out for incoming requests, so that all further processing is done with language-less paths, I think this would fit nicely into how i18n works. So the reason why this isn't in i18n is only that I've never needed it and nobody else has provided such a patch for the module...

I see

gábor hojtsy's picture

Maybe my wording was a bit misleading, I just intended to point out that this feature should be implemented IMHO in Drupal core.

More natural

mki's picture

More natural, user-friendly (non technical) solution:

http://www.example.com/node/1 - for English,
http://www.example.com/noeud/1 - for French,
http://www.example.com/inhalt/1 - for German.

In that case the language variable is smuggled into the path and users will see the page in the proper language. They are happy because they don't need to remember and type foreign-language paths or add a language sign.

Ambiguity problem:

  1. theoretically, languages can have the same spelling for some words (e.g. proper nouns),
  2. some URIs may not contain any words to translate, certainly the main page. Furthermore, we can't assume that in the future we will still use http://www.example.com/node/1 instead of
    http://www.example.com/1, if everything is going to be a node. So this will work only for non-default operations: http://www.example.com/1/translated-operation (unfortunately, default means frequently used).

Because of the above issues, it seems that this idea is ineffectual.

On the other hand, if we decide to translate the path, we will have a path that is redundant and does not look too good, e.g.:

http://www.example.com/en/node/add
http://www.example.com/fr/noeud/ajouter
http://www.example.com/de/inhalt/hinzufügen

What is your opinion? Where's the golden mean?

I described this here once before, but I don't know whether you saw it.

prefixing is required

gábor hojtsy's picture

Prefixing (of the path part or the domain name) is the standard on all major sites you see with different languages. With that you can do www.example.com/node/add and www.example.com/deutsch/inhalt/hinzufügen if you wish. It is very flexible. Mixing the URL space with different languages would require users or us to provide translations of the URLs themselves, or you would not be able to display a taxonomy listing in different languages, for example. This is both a performance and a human resources problem as far as I see.

Paths already translated

mki's picture

The majority of path components are already translated (locale packs), but these translations are related only to page content. That applies to translations provided by the user as well. Paths are reflected in page titles, which have translations. If we don't have a translation for the path, we probably don't have a translation for the page content either, so displaying this content in the proper language is impossible anyway.

Utility example: if I want to read some book in my national language, I will search for this book by its title in my national language, not by its original-language title plus a language sign. The majority of product names are translated or designed to be international. This is a really natural approach, and "Prefixing (...) is the standard on all major sites you see with different languages" doesn't convince me; we should place a bet on IRI.

Performance problem? It's difficult to say to what extent we can't solve this.

have you tried?

gábor hojtsy's picture

I have been operating a site with mostly automatically translated Hungarian paths since Drupal 4.4, that is, for more than three years. That site now runs Drupal 4.6 and is being upgraded to 5.0. We did have several performance-related problems along the way with path translation, like resolving recursive path translations (i.e. one should not need to provide two translations or translation rules for taxonomy/term/12 and taxonomy/term/12/0/feed). We have a lot of preg magic stuffed into two functions for translating incoming and outgoing paths, sometimes recursively calling themselves. After three years I am still not sure it is worth it, although we have not recently done actual benchmarks of how this subsystem of our site performs, and what percentage of a page load this whole conversion takes.

Site url: http://weblabor.hu/

BTW I would love to see a better implementation of path translation if you have ideas, even if I can only use it at this Hungarian site :)

Unfortunately I just theorize, but...

mki's picture

... all paths are precisely defined in hook_menu functions. We can collect all the paths easily, and the number of paths to search for translations isn't that large. The search for translations should start there (in the menu cache system). With this approach, is this really a performance problem?

not all paths are defined in hook_menu()

gábor hojtsy's picture

If you define a "/example" path in hook_menu(), every path that starts with "/example/" will end up in your handler functions, like "/example/foo", "/example/bar" and "/example/foo/2". Many of the modules use this to register menu items where the number of real paths used is possibly endless. For example, taxonomy pages are "/taxonomy/term/2" but they also have "/taxonomy/term/2/0/feed" for feeds, which are not registered at all, but simply parsed on the fly. Clearly "feed" is an English word, which never appears in the menu system. This is just a tiny example though; "/node/2/edit" and "/user/2/edit" all work in a similar way.
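The prefix behaviour described above can be sketched in Python like this. It is a simplified model, not Drupal's menu code; the registry contents, the handler names and the function find_handler are hypothetical.

```python
# Hypothetical registry: only these paths are known to the menu system.
REGISTERED = {
    "taxonomy/term": "handle_taxonomy_term",
    "node": "handle_node",
    "user": "handle_user",
}


def find_handler(path, registered=REGISTERED):
    """Find the handler for the longest registered prefix of the path.

    Returns (handler name, unregistered trailing segments). The trailing
    segments are parsed by the handler itself and never appear in the
    registry.
    """
    parts = path.strip("/").split("/")
    for n in range(len(parts), 0, -1):
        candidate = "/".join(parts[:n])
        if candidate in registered:
            return registered[candidate], parts[n:]
    return None, parts


print(find_handler("/taxonomy/term/2/0/feed"))
print(find_handler("/node/2/edit"))
```

In this model the words "feed" and "edit" only ever show up in the trailing segments, so a translation lookup that starts from the registry alone would never find them.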
