Another duplicate content problem

TonW's picture

Hi,

I just checked Google Webmaster Tools and found that I have 4 duplicated pages from categories (taxonomy). One of them is:

/taxonomy/term/10, which is duplicated by /taxonomy/term/10/0

Is the correct method to disallow all of those duplicate category pages this rule?

Disallow: /taxonomy/term/*/0

I would like to block every /taxonomy/term/*/0 URL across all categories, for example:

/taxonomy/term/1/0
/taxonomy/term/2/0
/taxonomy/term/3/0
...

Thanks.

Comments

modules

Z2222's picture

It looks like that code would work, but I would double check it in the robots.txt validator in Google Webmaster Tools...

Disallow: /taxonomy/term/*/0

Do you have any extra taxonomy-related modules installed?

One possible problem: if your own pages link to those URLs, you might accidentally block a lot of the internal links to your taxonomy pages.

No modules

TonW's picture

I'm not using any taxonomy-related module. I have noticed that it doesn't matter what you write after the last "/": the URL still returns the category page with a 200 status code.

Example: /taxonomy/term/1/0jkdhajkhckajnca works too, so if somebody outside your site links to a URL like that, you get a new duplicate page in the index.
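The behaviour described above can be confirmed with a small script. This is only a rough sketch: `status_of` and `probe` are hypothetical helper names, and example.com stands in for the real site. Note that `urlopen` follows redirects, so a URL that redirects to the canonical page reports the status of the final page rather than the 301 itself.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def status_of(url):
    """Return the HTTP status code for url (after any redirects)."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return err.code


def probe(base, suffixes, fetch=status_of):
    """Map each suffix to the status code returned for base + suffix."""
    return {suffix: fetch(base + suffix) for suffix in suffixes}


# Example call (placeholder domain, not run here):
# probe("http://example.com/taxonomy/term/1", ["", "/0", "/0jkdhajkhckajnca"])
```

If every suffix comes back as 200, any of those URLs can end up indexed as a duplicate of the category page.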

taxonomy modules

Z2222's picture

If you go to Yahoo Site Explorer and find out what pages link to this page, it might give a clue to where those URLs are coming from. Just type this into Yahoo Search, replacing example.com with your domain:
link:example.com/taxonomy/term/10/0

If you send me the URL of your site I could take a look.

Looks ok...

niklp's picture

Should be fine. Have a look at this article for a full rundown on robots.txt in Drupal; it's pretty thorough and I've found it very useful.

http://drupalzilla.com/robots-txt

Web Development in Nottingham, UK by Kineta Systems

how?

TonW's picture

This (Disallow: /taxonomy/term/*/0) would work for a 0 at the end, but what would the rule be for anything at the end?

Disallow: /taxonomy/term/*/* ?

I checked that page and it looks interesting. I added a few rules to my robots.txt, but I couldn't find what I am looking for.

Thanks for the help.

robots.txt

Z2222's picture

The behaviour where Drupal returns a 200 status for any URL might be caused by Views. (?)
See also: http://groups.drupal.org/node/8795

You could block everything with this:

Disallow: /taxonomy/term/*/

That means: block any URL that has another slash after the term ID. /taxonomy/term/10 itself is not blocked.

Double check it in Google Webmaster Tools though.
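To see which paths each of the two rules in this thread would cover before testing them in Webmaster Tools, here is a minimal sketch of wildcard matching. It assumes Google-style semantics (`*` matches any run of characters, a trailing `$` anchors the end, and a rule is otherwise a prefix match); `rule_to_regex` and `is_blocked` are hypothetical names, and other crawlers may interpret wildcards differently.

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any characters, a trailing '$' anchors the
    end of the URL, and everything else is matched as a prefix.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(rule_path, url_path):
    """True if url_path falls under the Disallow pattern rule_path."""
    return rule_to_regex(rule_path).match(url_path) is not None


paths = [
    "/taxonomy/term/10",
    "/taxonomy/term/10/0",
    "/taxonomy/term/1/0jkdhajkhckajnca",
    "/taxonomy/term/1/abc",
]
for rule in ("/taxonomy/term/*/0", "/taxonomy/term/*/"):
    for path in paths:
        print(rule, path, is_blocked(rule, path))
```

Under these assumptions the first rule only catches trailing segments that start with 0, while the second catches any trailing segment and leaves /taxonomy/term/10 crawlable.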

Hm

taxonomy and global redirect

Z2222's picture

I think that Global Redirect will only help if the taxonomy URLs are aliased.

Check this issue is related

TonW's picture

Check this issue, which is related to the problem: http://drupal.org/node/258399

Search Engine Optimization (SEO)
