Another duplicate content problem
public
group: Search Engine Optimization
TonW - Thu, 2008-06-26 05:10
Hi,
I just checked Google Webmaster Tools and found that I have 4 duplicated pages coming from categories (taxonomy). One of them is:
/taxonomy/term/10, which is duplicated by /taxonomy/term/10/0
Is the correct way to block all of those duplicate categories a rule like this?
Disallow: /taxonomy/term/*/0
I would like to just remove every /taxonomy/term/*/0 URL across all categories, for example:
/taxonomy/term/1/0
/taxonomy/term/2/0
/taxonomy/term/3/0
...
Thanks.

modules
It looks like that code would work, but I would double check it in the robots.txt validator in Google Webmaster Tools...
Disallow: /taxonomy/term/*/0
Do you have any extra taxonomy-related modules installed?
One possible problem: if your own pages link to those URLs, you might accidentally block a lot of the internal links to your taxonomy pages.
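If you want a rough idea of which paths a rule like this would catch before pasting it into the validator, here is a minimal Python sketch. It assumes Google-style matching: "*" matches any run of characters, a trailing "$" anchors the end of the URL, and anything else is a prefix match. Other crawlers may read wildcards differently. The helper names (robots_pattern_to_regex, is_blocked) are made up for illustration and are not part of any library.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: Google-style semantics, where '*' matches any run of
    characters and a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern, path):
    # re.match anchors at the start of the path, which gives prefix matching.
    return robots_pattern_to_regex(pattern).match(path) is not None


RULE = "/taxonomy/term/*/0"  # the rule discussed in this thread

for path in [
    "/taxonomy/term/10",
    "/taxonomy/term/10/0",
    "/taxonomy/term/1/0jkdhajkhckajnca",
]:
    print(path, "blocked" if is_blocked(RULE, path) else "allowed")
```

Under these assumptions the rule leaves /taxonomy/term/10 alone and catches anything that continues with "/0", including URLs with junk after the 0. This is only a local approximation, so the validator in Google Webmaster Tools is still the thing to trust.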
No modules
I'm not using any taxonomy-related module. I have noticed that it doesn't matter what you write after the last "/": the URL still returns the category page with a 200 status code.
Example: /taxonomy/term/1/0jkdhajkhckajnca works too, so if somebody outside your site links to a URL like that, you get a new duplicate page indexed.
taxonomy modules
If you go to Yahoo Site Explorer and find out what pages link to this page, it might give a clue to where those URLs are coming from. Just type this into Yahoo Search, replacing example.com with your domain:
link:example.com/taxonomy/term/10/0
If you send me the URL of your site I could take a look.
Looks ok...
Should be fine. Have a look at this article for a full rundown on robots.txt in Drupal; it's pretty thorough and I've found it very useful.
http://drupalzilla.com/robots-txt
Web Development in Nottingham, UK by Kineta Systems
how?
This (Disallow: /taxonomy/term/*/0) would work for a 0 at the end, but what would the rule be for anything at the end?
Disallow: /taxonomy/term/*/* ?
I have checked that page and it looks interesting. I have added a few rules to my robots.txt, but I couldn't find what I am looking for.
Thanks for the help.
robots.txt
The issue where Drupal returns a 200 status for any URL might be caused by Views. (?)
See also: http://groups.drupal.org/node/8795
You could block everything with this:
Disallow: /taxonomy/term/*/
That means block any URL that has two slashes after term.
Double check it in Google Webmaster Tools though.
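To compare the broader rule with the narrower /0 rule side by side, a small Python sketch like the one below can help. It assumes that "*" matches any run of characters and that rules are prefix matches (Google-style behaviour). The blocked() helper and the sample paths are hypothetical, made up for this example.

```python
import re


def blocked(rule, path):
    # Assumption: '*' matches any run of characters and the rule is a
    # prefix match, so the regex is only anchored at the start of the path.
    regex = re.escape(rule).replace(r"\*", ".*")
    return re.match(regex, path) is not None


NARROW = "/taxonomy/term/*/0"
BROAD = "/taxonomy/term/*/"

# Hypothetical sample paths for illustration only.
samples = [
    "/taxonomy/term/1",
    "/taxonomy/term/1/0",
    "/taxonomy/term/1/0jkdhajkhckajnca",
    "/taxonomy/term/1/anything",
]

for path in samples:
    print(path, "narrow:", blocked(NARROW, path), "broad:", blocked(BROAD, path))
```

Under these assumptions the broad rule catches every path that has something after the term ID, while the plain /taxonomy/term/1 page stays crawlable. It would also catch any legitimate deeper path under a term, so it is worth checking what your site actually links to first.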
Hm
Is some of that problem negated by using Global Redirect...?
taxonomy and global redirect
I think that Global Redirect will only help if the taxonomy URLs are aliased.
Check this issue is related
Check this issue, which is related to the problem: http://drupal.org/node/258399