robots.txt

?q parameter

I just discovered an unfortunate behavior in Drupal 5.x (Drupal 5.20) that creates duplicate content in Google:

http://www.example.com/?q=Drupal

Here, Drupal is a URL alias.

http://www.example.com/Drupal
and
http://www.example.com/?q=Drupal
are of course the same page, but Google crawls both and indexes them.

Adding Disallow: /?q= to robots.txt will block these duplicate URLs.
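To check a rule like this before deploying it, one option is Python's standard urllib.robotparser module. The sketch below assumes a robots.txt containing only the Disallow: /?q= rule and reuses the example.com URLs above. Note that robotparser does plain prefix matching and does not implement Google's * and $ wildcard extensions, so it is only a rough stand-in for how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt containing only the rule discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /?q=
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# The ?q= form of the alias should be blocked; the clean alias stays crawlable.
for url in ("http://www.example.com/?q=Drupal", "http://www.example.com/Drupal"):
    print(url, "allowed" if parser.can_fetch("Googlebot", url) else "blocked")
```

Because the rule is a prefix match on "/?q=", it blocks every ?q= URL at the site root, not just aliased ones.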

&from=1289 and node?page= produce duplicate pages and fictional pages

In Drupal 5.10 it currently produces duplicate content under multiple URLs:

domain/?page=16&from=1289
domain/?page=16&from=1357

Both are currently indexed by Googlebot, but Google Webmaster Tools shows them as duplicate content for the same page. In fact, both display the ?page=16 listing.
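One way to spot this kind of duplication in a list of indexed URLs is to strip the parameter that does not change the page and group what is left. This is a minimal sketch, assuming (as the two URLs above suggest) that the from parameter has no effect on the content; www.example.com stands in for the elided domain, and the function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: 'from' does not change what the page shows. Adjust per site.
IGNORED_PARAMS = {"from"}

def canonicalize(url: str) -> str:
    """Return the URL with ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

def find_duplicates(urls):
    """Map each canonical URL to the indexed URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonicalize(url)].append(url)
    return {canon: members for canon, members in groups.items() if len(members) > 1}

indexed = [
    "http://www.example.com/?page=16&from=1289",
    "http://www.example.com/?page=16&from=1357",
]
print(find_duplicates(indexed))
```

Both URLs collapse onto http://www.example.com/?page=16, which matches what Webmaster Tools reports.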

Similarly, ?page= produces fictional pages past the last page of the tracker pages.

These pages are indexed by Google:
domain/node?page=565
domain/node?page=751
domain/node?page=759
domain/node?page=787&%24Version=0&%24Path=/&%24Domain=.domainname.xx

But currently the last page is:
domain/?page=568
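The fictional pages can be flagged by comparing each indexed ?page= value with the real last page (568 above). This is a minimal sketch, assuming www.example.com for the elided domain and comparing only the page number, as the post does; the helper name is hypothetical. The stray %24Version, %24Path and %24Domain parameters on the last URL are simply ignored.

```python
from urllib.parse import parse_qs, urlsplit

def pages_beyond_last(urls, last_page):
    """Return the URLs whose ?page= value is greater than last_page."""
    flagged = []
    for url in urls:
        values = parse_qs(urlsplit(url).query).get("page", [])
        if values and values[0].isdigit() and int(values[0]) > last_page:
            flagged.append(url)
    return flagged

indexed = [
    "http://www.example.com/node?page=565",
    "http://www.example.com/node?page=751",
    "http://www.example.com/node?page=759",
    "http://www.example.com/node?page=787&%24Version=0&%24Path=/&%24Domain=.domainname.xx",
]
for url in pages_beyond_last(indexed, last_page=568):
    print(url)
```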

Problem with search engines and the thousands of pages created by Refine by Taxonomy

A while back I enabled the Refine by Taxonomy module (http://drupal.org/project/refine_by_taxo) and didn't think much about it, until I discovered in Google Webmaster Tools that it had produced some 50,000 additional pages, which of course were indexed by Googlebot!

My site currently has some 6,500 nodes covering politics in Denmark, with the option of 12 taxonomies on each Refine by Taxonomy page. I have some 500 taxonomies defined. Refine by Taxonomy is currently only available for Drupal 5.x.
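A rough count shows why the page total grows so fast. This sketch assumes, purely for illustration, that every combination of up to k of the roughly 500 taxonomies could get its own Refine by Taxonomy URL; in practice only combinations that match some of the 6,500 nodes will be linked, so the numbers are upper bounds, not predictions. It needs Python 3.8+ for math.comb, and the function name is hypothetical.

```python
from math import comb

def max_refinement_pages(terms: int, max_selected: int) -> int:
    """Upper bound on distinct combinations of 1..max_selected terms."""
    return sum(comb(terms, k) for k in range(1, max_selected + 1))

for k in (1, 2, 3):
    print(k, max_refinement_pages(500, k))
```

Under this assumption, pairs alone give 124,750 combinations, already more than the 50,000 pages Webmaster Tools reported.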

J. Cohen

Drupal Robots.txt

The default robots.txt file in Drupal 5.* has some problems. Also, the more modules one adds, the more duplicate content and low-quality URLs are created.

What robots.txt issues have people come across? Here are a few of my common modifications:
