Internal href links are getting generated incorrectly by Drupal

Anonymous's picture

I have an English/Japanese Drupal site, and some of the URL links in my pages are not generated correctly when I view the translated page.

For example, in English I wrote the link as href="documents" and in my translation as href="資料". But when I view the Japanese version of the page, the translated link is rendered as "node/資料". The correct link should be ja/content/資料.

Why is Drupal adding 'node' in front of my link and how can I fix this?

Comments

using kanji or anything other

ussher's picture

Using kanji or anything other than romaji in the address bar is asking for trouble. It will not look as nice, but try either raw URL encoding (rawurlencode) for the address or the node/id path for the Japanese side of things.

If you must use Japanese in the URL, then take a look at the Transliteration module: http://drupal.org/project/transliteration
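
For reference, the percent-encoding suggested above can be sketched as follows. This is only an illustration in Python, not PHP: urllib.parse.quote with safe="" is used here as a stand-in for PHP's rawurlencode, and the path segment 資料 is taken from the original question.

```python
from urllib.parse import quote, unquote

# Path segment from the original question.
segment = "資料"

# Percent-encode the UTF-8 bytes of the segment; safe="" means "/" would be escaped too.
encoded = quote(segment, safe="")

# Decoding the percent-encoded form restores the original text.
decoded = unquote(encoded)

print(encoded)
print(decoded)
```

The encoded form contains only ASCII characters, which is why it survives being pasted into places that mangle non-ASCII URLs.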

Found a solution

totsubo's picture

It looks like Drupal has trouble automatically generating internal links to translated content when the link is of the form x/y.

I've found a solution: write the link as /x/y (with a leading forward slash). With this, Drupal correctly changes the link to ja/x/y.

PS: So far no issues using Japanese in links. Then again, I'm not using IE, which I guess I should, since that is what most Japanese users use ...
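
One possible explanation for the difference between x/y and /x/y is ordinary relative URL resolution: a link without a leading slash is resolved against the path of the page it appears on. The sketch below shows only that standard behaviour, using Python's urllib.parse.urljoin. The host example.com and the page paths content/about and ja/node/12 are assumptions made up for the illustration, and the sketch does not reproduce whatever rewriting Drupal itself applies to add the ja/ prefix.

```python
from urllib.parse import urljoin

# Hypothetical pages: an aliased English page and an unaliased Japanese node.
english_page = "http://example.com/content/about"
japanese_page = "http://example.com/ja/node/12"

# A relative link replaces the last segment of the current page's path.
english_link = urljoin(english_page, "documents")
japanese_link = urljoin(japanese_page, "資料")

# A link with a leading slash is resolved from the site root instead.
rooted_link = urljoin(japanese_page, "/content/資料")

print(english_link)   # http://example.com/content/documents
print(japanese_link)  # http://example.com/ja/node/資料
print(rooted_link)    # http://example.com/content/資料
```

Under these assumptions the relative form only works when the current page happens to live under the same path prefix as the target, which would make the English link work by coincidence.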

Nice one. Glad you found a

ussher's picture

Nice one. Glad you found a solution.

IE will have issues, and so will a lot of places where pasted links are automatically converted into links, like forum comments.

Also check out what some major sites do. Take a look at something like Yahoo Japan and see how many of their URLs contain kanji. There is a reason.

You probably need to double

Antoine Lafontaine's picture

You probably need to double check your Pathauto configuration to make sure it handles auto-aliasing of your Japanese content properly. Another possibility is that your content was aliased before you changed your Pathauto settings. In that case, just re-edit that content, delete the current alias, and set it to be generated automatically. You can also delete all aliases from nodes and have Pathauto batch-generate the aliases again...

@ussher:

It doesn't look nice in browsers that don't display UTF-8 URLs properly... The standard exists, is valid, and should be supported correctly. Providing URLs in UTF-8 is quite valid and will not break on old browsers (the display will just be unintelligible to humans). Wikipedia and Amazon provide such links. I think it is a good idea to start providing more content and more reasons for browsers to display UTF-8 URLs properly. In my experience, most 'regular' users don't even bother to look at or remember a URL... I believe supporting and promoting it will just improve search results and content findability.

my two cents :)

Thanks for the vote of

totsubo's picture

Thanks for the vote of confidence Antoine :)

Antoine: I don't see any translation or i18n related settings in Pathauto. Did I miss something?

No, there is not.

aiwata55's picture

In my experience, Pathauto handles Japanese text out of the box. For example, suppose you have a Japanese title for a node and want to use the title as a token for Pathauto. Pathauto just takes the Japanese title and uses it in the aliased URL.


Aki Iwata
FOREST & trees

You should have two (or more)

Antoine Lafontaine's picture

You should have two (or more) Pathauto settings for each of your content types, one for each language available on your site. Then you can set the auto-aliasing pattern for both languages... Someone might want a path like article/name-of-article to be 記事/記事ータイトル in Japanese (if you decide to have UTF-8 URLs), or kiji/romaji-title flavored URLs.
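
The idea of one pattern per language with a fallback can be sketched roughly as below. This is a made-up illustration in Python, not Pathauto's code or storage format: the PATTERNS table, the DEFAULT_PATTERN value, the {title} placeholder and the build_alias helper are all hypothetical names chosen for the example.

```python
# Hypothetical per-language alias patterns (not Pathauto's real configuration format).
PATTERNS = {
    "en": "article/{title}",
    "ja": "記事/{title}",
}

# Hypothetical fallback used when a language has no pattern of its own.
DEFAULT_PATTERN = "content/{title}"


def build_alias(title: str, language: str) -> str:
    """Build a path alias from a node title using the pattern for its language."""
    pattern = PATTERNS.get(language, DEFAULT_PATTERN)
    # Lowercase the title and join its whitespace-separated words with hyphens.
    slug = "-".join(title.lower().split())
    return pattern.format(title=slug)


print(build_alias("Name of Article", "en"))
print(build_alias("記事タイトル", "ja"))
print(build_alias("Bonjour", "fr"))
```

A language with no entry of its own falls back to the default pattern, which mirrors the "default path pattern" versus "pattern for all paths in a given language" split described in this thread.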

If you don't have this in your Pathauto settings, please double check your module version... Maybe you need a dev version of it... (but I doubt you need dev)

A bit worried now ...

totsubo's picture

Looking under admin/build/path/pathauto I only see the following and nothing looks like it's for i18n

General settings
Punctuation settings
Node path settings
Taxonomy term path settings
User path settings
Catalog path settings
Forum path settings

Under the Page content type I have multilingual settings but no pathauto related settings ...

When I create content I have just the one regular Pathauto setting:

[ ] Automatic alias

[ ]
Optionally specify an alternative URL by which this node can be accessed. For example, type "about" when writing an about page. Use a relative path and don't add a trailing slash or the URL alias won't work.

On one of my bilingual site I

Antoine Lafontaine's picture

On one of my bilingual sites I have this

under admin/build/path/pathauto >
NODE PATH SETTINGS>

Default path pattern for Article (applies to all Article node types with blank patterns below):
Pattern for all language neutral Article paths:
Pattern for all Article paths in English:
Pattern for all Article paths in Japanese:

My Article content type has:

OPTIONS FOR NODE LANGUAGE
Set current language as default for new content selected
Require language (Do not allow Language Neutral) selected

under MULTILANGUAGE OPTIONS
Extended language support:
Normal - All enabled languages will be allowed selected

I have most of the i18n modules turned on except
poll aggregate
profile translation

Maybe you didn't install the i18n module (you probably did, but sometimes...)

Correct!

totsubo's picture

You're right, those settings are there. For the Pathauto settings I just left them as is and didn't change the node patterns. I figured the default one (content/[title-raw]) was good enough. As aiwata pointed out, it just takes the content title to create the link.

Maybe I was wrong to create internal links as a/b, and the correct way is really /a/b, and it was just a fluke that a/b was working in English?

I'm using 6.x-1.x-dev

Antoine Lafontaine's picture

I'm using 6.x-1.x-dev (2009-Dec-02) of the module.
The dev version available now is a bit more up-to-date.
It is perfectly stable from what I can tell.

Hello, I see what you are

56rosa's picture

Hello,

I see what you are saying there. I can see the pattern for the Japanese language, but what do I need to do in order to have Japanese in the URLs? So far, I'm getting lots of question marks in place of the page title, for example. It doesn't seem to handle the Japanese characters. What do I need to do, please?

56rosa, check that your

Garrett Albright's picture

56rosa, check that your browser is interpreting the page using the correct encoding. Check that your browser's text encoding setting is set to something like "Auto-detect." Unfortunately, this setting lives in different places in different browsers, so I can't tell you exactly where to find it, but if you browse through the options in your browser's menu bar, you'll probably find something along these lines.

The issues that i have had in

ussher's picture

The issues that I have had in the past have come from validating the URLs once they get to PHP. It's easier to check that a URL does not contain anything malicious if you know that it's only going to contain numbers, letters or hyphens.

PHP just doesn't have anything nice to validate kanji yet (or didn't the last time I looked, anyhow).

And even Wikipedia is using rawurlencode for their addresses. It's either the browser converting that encoded form into the correct Japanese display for the user, or some piece of JavaScript. When you copy that address from the address bar and paste it somewhere like here:
http://ja.wikipedia.org/wiki/%E3%83%81%E3%83%A3%E3%82%A4%E3%83%AD%E3%82%...

you can see that there are no Japanese characters to be seen (using FF).

Not sure what happens if you change that link to the native, un-urlencoded form and paste it here. Let's see:
http://ja.wikipedia.org/wiki/チャイロイエヘビ

--edit--
Oooh, Drupal doesn't think it's a link and doesn't turn it into one.
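
The two Wikipedia links above appear to be the same address in two spellings. As a rough check, the Python sketch below percent-encodes the native title and compares it with the part of the encoded link that is still visible before the truncation. urllib.parse.quote stands in for PHP's rawurlencode here, and the variable names are made up for the example.

```python
from urllib.parse import quote

# Page title from the native (un-encoded) link.
title = "チャイロイエヘビ"

# The encoded path as far as it is visible in the truncated link.
visible_prefix = "%E3%83%81%E3%83%A3%E3%82%A4%E3%83%AD%E3%82"

# Percent-encode the UTF-8 bytes of the title.
encoded = quote(title, safe="")

print(encoded)
print(encoded.startswith(visible_prefix))
```

So the kanji-free form is simply the byte-level spelling of the same title, and whether the reader sees Japanese text or percent signs is a display matter.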

You are right about some

Antoine Lafontaine's picture

You are right that some browsers urlencode URLs and don't display the UTF-8 equivalents, but this is changing rapidly. Those links are still totally functional: ugly, but functional. The advantage of those links is that they are more "future proof" and are in line with the semantic web.

My point is that it's a question of choice. Both options are valid for now. In the (near) future, semantic urls will have more value than "computer" friendly links...

When it comes to validation, PHP (5 only? not sure) offers mb_ (multibyte) variants of most (all?) of its string functions. This will let you do some limited validation. I guess the next step would be to find libraries developed by Japanese developers that provide more complex validation patterns (I'm confident some open source solutions exist, although they may not be easy for us to find).
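
To make the validation trade-off concrete, here is a small sketch contrasting the "numbers, letters or hyphens" check with a Unicode-aware one. It is written in Python for illustration only and is not PHP's mb_ API; the helper names is_ascii_slug and is_unicode_slug are hypothetical, and a real site would likely need stricter rules than this.

```python
import re

# ASCII-only rule: letters, digits and hyphens.
ASCII_SLUG = re.compile(r"[A-Za-z0-9-]+")


def is_ascii_slug(segment: str) -> bool:
    """Accept only ASCII letters, digits and hyphens."""
    return ASCII_SLUG.fullmatch(segment) is not None


def is_unicode_slug(segment: str) -> bool:
    """Accept letters and digits from any script, plus hyphens."""
    return bool(segment) and all(ch.isalnum() or ch == "-" for ch in segment)


for candidate in ["my-page", "資料", "../etc", "<script>"]:
    print(candidate, is_ascii_slug(candidate), is_unicode_slug(candidate))
```

The Unicode-aware check still rejects slashes, dots, spaces and angle brackets, so accepting kanji in a path segment does not have to mean accepting arbitrary input.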

And yes, many tools in Drupal are English centric... or at least letter centric... (like the URL filter). This means we have two (maybe three) choices: don't use them, transliterate (often bad when done automagically), or fix the tools... I think we should aim to provide a fix (while dodging in the meantime).