Why are URLs available for non-translated content?

I've been having a look at the internationalization module for the past week or so, and a couple of things have put me off using it. Of primary concern is that it generates URLs for untranslated content. For example, if I create an English page with the URL 'A', then the same page will automatically be available via the URL 'ja/A'.

This isn't great at all in SEO terms, as two URLs are being created which contain the same content. This can get you penalized by Google. Is there any way of preventing this?

Also, when the language switcher is clicked while viewing a node which doesn't have a Japanese translation, it simply displays the same page but with the 'ja' prefix. This doesn't seem very friendly, as the user is likely to think that the language switcher isn't working. Surely if a translation isn't available the user should be redirected to a designated page in the required language (either the homepage or a page which explains that a translation isn't available).
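To make the behaviour I'd expect a bit more concrete, here is a rough sketch in Python. It is purely illustrative and is not Drupal or i18n code: the function `switch_target`, the `translations` mapping, and the paths 'ja/A-japanese' and 'ja/not-translated' are all made-up names for the example.

```python
# Hypothetical illustration only: none of these names come from Drupal
# or the internationalization module.

def switch_target(path, lang, translations, fallback_paths):
    """Return the path the language switcher should send the user to.

    translations maps a source path to {language: translated path}.
    fallback_paths maps a language to a designated page in that language
    (for example the homepage or a "no translation available" page).
    """
    translated = translations.get(path, {}).get(lang)
    if translated is not None:
        return translated
    # No translation exists: go to the designated page instead of
    # showing the same content again under a different prefix.
    return fallback_paths[lang]


# Assumed sample data: page 'A' has a Japanese translation, page 'B' does not.
translations = {"A": {"ja": "ja/A-japanese"}}
fallback_paths = {"ja": "ja/not-translated"}

print(switch_target("A", "ja", translations, fallback_paths))
print(switch_target("B", "ja", translations, fallback_paths))
```

With this kind of lookup, an untranslated page never gets a second URL of its own; the switcher only ever points at a real translation or at the designated fallback page.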

I found other odd things too. For instance, when creating menus the language selection seemed to serve no purpose whatsoever. Visibility of menu items seemed to be determined by the language of the node which they pointed to rather than by the menu item's own language selection.

Also, why does the module try to translate the administration menu when 'Japanese' is selected? When I select Japanese I want to view the content which I've provided in Japanese, but I've no interest in having Drupal's menus translated.

I also noticed that URLs to translations which I provide can't be directly referenced when creating menus. I get a message that the path is not accessible, or something to that effect.

Overall, I haven't been very impressed by the Internationalization module so far, but then again, I could be making some silly mistakes, in which case I'd be really grateful if somebody could point me in the right direction.

My site goes live in Japanese too by the end of next week. Currently I'm planning not to use this module and instead simply display menus and blocks based on the prefix of the URL. This works fine provided that I don't have pages which I want to be viewable in both languages; it gets trickier then.

Anyway, thanks in advance for any help,

Regards,

Richard



Jose Reyero - Tue, 2009-09-01 11:46

Just fyi, this behavior comes from Drupal core, not from the i18n module.

About the SEO issues. Honestly: fuck Google. What I care about most are the website's users, and surely they prefer to get the page in the original language (with the UI translated) if no translation exists yet.


hmmm

RichieRich - Sat, 2009-09-05 02:20

Well, you can say fuck Google, but if you end up with the wrong URL being indexed you are shooting yourself in the foot. For example, if Google were to index pages with English content under a 'ja' URL prefix, then people searching in English could be directed to a page which has a Japanese menu. This certainly isn't user-friendly.

You could get seriously penalized for having so many pages with duplicate content. Imagine if you've got 6 languages and a number of pages which aren't translated. Having 6 pages with the same content wouldn't go unnoticed.

I'd love to say fuck Google too... in fact, I say it a lot, but you have to be pragmatic.

I've actually gone ahead without using this module now for the above reason. Doing so has given me some minor problems, but on the whole I think I've made the best decision.

Thanks,

Rich



mfb - Sat, 2009-09-05 04:12

I think Drupal meshes pretty well with Google's recommendations. See http://googlewebmastercentral.blogspot.com/2008/08/how-to-start-multilin... Drupal core allows you to easily "Put the content of every language in a different subdirectory. This is easier to handle when updating and maintaining your site. For our example, you would have example.com/en/, example.com/de/, and example.com/es/."

Drupal makes it pretty obvious what language each page is in via e.g. <html xmlns="http://www.w3.org/1999/xhtml" lang="ja" xml:lang="ja"> and hopefully many/most bots do take this into account. My limited testing suggests that a page with lang="ja" but containing some English content as well will be indexed by Googlebot as both Japanese and English. Which isn't ideal when you're searching for English pages, but at least it works when you're searching for Japanese pages...


Thanks

RichieRich - Sat, 2009-09-05 05:52

Thanks, that was a really useful link. On the link you provided it says that content is not considered to be a duplicate if it is in a different language. However, in cases where no translation for a node exists you can clearly end up with multiple pages with exactly the same content in the same language.

Yep, you may be right about the lang="ja" thing. Actually, this is a problem which I'm having to deal with at the moment. As I've gone ahead without the internationalization module, all of my pages have lang="en" at the top irrespective of whether they contain Japanese text or English. It looks like I'm going to have to select a specific page.tpl.php file to override this setting based on the page's path... unless of course anybody knows of an easier way that I can go about modifying this language attribute.
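For what it's worth, the path-based override I'm describing boils down to something like the sketch below. It's written in Python just to show the logic and is not Drupal theme code: `language_for_path`, `html_open_tag`, `KNOWN_PREFIXES` and `DEFAULT_LANGUAGE` are names I've invented, and I'm assuming that 'ja' is the only language prefix and that un-prefixed paths are English.

```python
# Hypothetical sketch: the names and the set of prefixes are assumptions,
# not Drupal APIs.

KNOWN_PREFIXES = {"ja"}   # language prefixes used in URLs (assumed)
DEFAULT_LANGUAGE = "en"   # language of un-prefixed paths (assumed)


def language_for_path(path):
    """Return the language code implied by the first path segment."""
    first_segment = path.strip("/").split("/", 1)[0]
    if first_segment in KNOWN_PREFIXES:
        return first_segment
    return DEFAULT_LANGUAGE


def html_open_tag(path):
    """Build the opening <html> tag with matching lang and xml:lang."""
    lang = language_for_path(path)
    return ('<html xmlns="http://www.w3.org/1999/xhtml" '
            'lang="%s" xml:lang="%s">' % (lang, lang))


print(html_open_tag("ja/A"))
print(html_open_tag("A"))
```

Matching on the whole first segment rather than on the first two characters matters here: a path such as 'japan/guide' should still be treated as English.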

Google currently isn't indexing my Japanese pages and I suspect that this may be the cause. Having said that, my translation has only been up for 3 days.

Here's my site by the way. I highly recommend that you visit this beautiful island if you ever get the chance.

www.ishigaki-japan.com

Thanks,

Rich