Another duplicate content problem

TonW - Thu, 2008-06-26 05:10

Hi,

I just checked Google Webmaster Tools and found that I have four duplicate category (taxonomy) pages. One of them is:

/taxonomy/term/10, which is a duplicate of /taxonomy/term/10/0

Is this the correct way to disallow all of those duplicate category pages:

Disallow: /taxonomy/term/*/0

I would like to remove all the /taxonomy/term/*/0 pages for every category, for example:

/taxonomy/term/1/0
/taxonomy/term/2/0
/taxonomy/term/3/0
...

Thanks.

modules

J. Cohen - Thu, 2008-06-26 07:59

It looks like that rule would work, but I would double-check it in the robots.txt validator in Google Webmaster Tools:

Disallow: /taxonomy/term/*/0

Do you have any extra taxonomy-related modules installed?

One possible problem: if your own pages link to those URLs, you might accidentally block a lot of your internal links to your taxonomy pages.
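For a quick local sanity check before using the Webmaster Tools validator, the rule can be emulated in a few lines. This is a rough sketch assuming Google's wildcard extension ("*" matches any run of characters, rules match from the start of the path, a trailing "$" anchors the end); other crawlers may not support "*" at all, and the helper names below are made up for this example. Python's standard urllib.robotparser is not used because it does plain prefix matching without expanding "*".

```python
import re


def robots_rule_to_regex(rule):
    """Translate a Google-style Disallow path into a compiled regex.

    Hypothetical helper: '*' becomes '.*', a trailing '$' anchors the end,
    and every other character is matched literally.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))


def is_blocked(path, rule):
    # Pattern.match only anchors at the start, so unanchored rules act as prefixes.
    return robots_rule_to_regex(rule).match(path) is not None


RULE = "/taxonomy/term/*/0"
for path in [
    "/taxonomy/term/10",       # canonical category page
    "/taxonomy/term/10/0",     # duplicate reported by Webmaster Tools
    "/taxonomy/term/3/0",
    "/taxonomy/term/10/feed",  # hypothetical other suffix with no '/0' in it
]:
    print(f"{path:26} blocked={is_blocked(path, RULE)}")
```

Run as-is, it should report only the two /0 variants as blocked, leaving /taxonomy/term/10 and the /feed variant crawlable. Because the rule is a prefix match, it would also catch anything that merely starts with /0 after the term ID.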


No modules

TonW - Thu, 2008-06-26 09:53

I'm not using any taxonomy-related modules. I have noticed that it doesn't matter what you write after the last "/": the URL still returns the category page with a 200 status code.

Example: /taxonomy/term/1/0jkdhajkhckajnca works too, so if somebody outside your site links to a URL like that, a new duplicate page can get indexed.
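Since Drupal serves the same term page whatever trails the term ID, one way to see how many crawled URLs collapse onto the same category is to reduce each path to its /taxonomy/term/ID base. This is an illustrative sketch only: the URL list is a made-up sample, the helper names are hypothetical, and it assumes plain numeric term IDs (aliased URLs and multi-term paths are ignored).

```python
import re
from collections import defaultdict

# Captures the numeric term ID right after /taxonomy/term/ (assumption: numeric IDs).
TERM_PATH = re.compile(r"^/taxonomy/term/(\d+)(?:/.*)?$")


def canonical_term_path(path):
    """Return /taxonomy/term/<id> for any variant of a term URL, else None."""
    m = TERM_PATH.match(path)
    return f"/taxonomy/term/{m.group(1)}" if m else None


def group_duplicates(paths):
    """Group paths by the term page they render; keep terms with 2+ URLs."""
    groups = defaultdict(list)
    for p in paths:
        canon = canonical_term_path(p)
        if canon is not None:
            groups[canon].append(p)
    return {c: ps for c, ps in groups.items() if len(ps) > 1}


crawled = [  # hypothetical sample, not real crawl data
    "/taxonomy/term/1",
    "/taxonomy/term/1/0",
    "/taxonomy/term/1/0jkdhajkhckajnca",
    "/taxonomy/term/10",
    "/taxonomy/term/10/0",
    "/taxonomy/term/2",
    "/node/5",
]
for canon, variants in group_duplicates(crawled).items():
    print(canon, "<-", variants)
```

Here term 1 is reachable under three URLs and term 10 under two, while term 2 and the node path are left out.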

taxonomy modules

J. Cohen - Fri, 2008-06-27 00:28

If you go to Yahoo Site Explorer and find out what pages link to this page, it might give a clue to where those URLs are coming from. Just type this into Yahoo Search, replacing example.com with your domain:
link:example.com/taxonomy/term/10/0

If you send me the URL of your site I could take a look.


Looks ok...

NikLP - Thu, 2008-06-26 10:01

Should be fine. Have a look at this article for a full rundown on robots.txt in Drupal; it's pretty thorough and I've found it very useful.

http://drupalzilla.com/robots-txt

Web Development in Nottingham, UK by Kineta Systems


how?

TonW - Thu, 2008-06-26 10:34

This (Disallow: /taxonomy/term/*/0) would work for a 0 at the end, but how would I block anything at the end?

Disallow: /taxonomy/term/*/* ?

I have checked that page and it looks interesting. I have added a few of its rules to my robots.txt, but couldn't find what I am looking for.

Thanks for the help.

robots.txt

J. Cohen - Fri, 2008-06-27 00:40

The problem where Drupal returns a 200 status for any URL might be caused by Views (?).
See also: http://groups.drupal.org/node/8795

You could block everything with this:

Disallow: /taxonomy/term/*/

That means: block any URL that has another slash after the term ID, such as /taxonomy/term/10/0, while /taxonomy/term/10 itself stays crawlable.

Double-check it in Google Webmaster Tools, though.
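A rough local check of the broader "Disallow: /taxonomy/term/*/" rule, as a sketch assuming Google-style wildcard handling ("*" matches any characters, unanchored rules match as prefixes); the helper name is made up for this example. It also suggests the "/taxonomy/term/*/*" form asked about earlier should behave the same, since a trailing "*" adds nothing to a prefix match. Note that anything below the term path is caught too, for example a feed URL such as /taxonomy/term/10/0/feed, which ties in with the warning about blocking your own internal links.

```python
import re


def is_blocked(path, rule):
    """Hypothetical helper: Google-style Disallow matching ('*' wildcard, '$' end anchor)."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    # re.match anchors at the start only, so unanchored rules behave as prefixes.
    return re.match(regex + ("$" if anchored else ""), path) is not None


RULE = "/taxonomy/term/*/"
for path in [
    "/taxonomy/term/10",                  # the real category page stays crawlable
    "/taxonomy/term/10/0",
    "/taxonomy/term/1/0jkdhajkhckajnca",  # junk suffix from an outside link
    "/taxonomy/term/1/anything",
    "/taxonomy/term/10/0/feed",           # anything below the term path is caught too
]:
    print(f"{path:36} blocked={is_blocked(path, RULE)}")
```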


Hm

NikLP - Fri, 2008-06-27 11:29

Is some of that problem negated by using Global Redirect...?


taxonomy and global redirect

J. Cohen - Fri, 2008-06-27 20:36

I think that Global Redirect will only help if the taxonomy URLs are aliased.


Check this related issue

TonW - Fri, 2008-06-27 11:41

Check this issue; it is related to the problem: http://drupal.org/node/258399