&from=1289 and node?page= produce duplicate pages and fictional pages

FlemmingLeer

In Drupal 5.10, the same content is currently served at multiple URLs:

domain/?page=16&from=1289
domain/?page=16&from=1357

Both are currently indexed by Googlebot, but Google Webmaster Tools reports them as duplicate content for the same page. In fact, both display ?page=16.
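To see why these count as duplicates, here is a minimal Python sketch that normalizes a URL by keeping only the query parameters that change what is shown. It assumes, hypothetically, that only `page` affects the listing and that `from` is noise; the allowlist and the `canonical_url` name are illustrative and not part of Drupal, and the `http://` scheme is added only so the URLs parse.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical allowlist: assume only "page" changes what the listing shows.
CONTENT_PARAMS = {"page"}

def canonical_url(url: str) -> str:
    """Drop query parameters that do not affect page content (e.g. "from")."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k in CONTENT_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

a = canonical_url("http://domain/?page=16&from=1289")
b = canonical_url("http://domain/?page=16&from=1357")
print(a == b, a)  # True http://domain/?page=16
```

Both indexed URLs collapse to the same canonical form, which is essentially what Google is reporting.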

Similarly, ?page= produces fictional pages beyond the last page of the tracker pages.

These pages are indexed by Google:
domain/node?page=565
domain/node?page=751
domain/node?page=759
domain/node?page=787&%24Version=0&%24Path=/&%24Domain=.domainname.xx

But currently the last page is:
domain/?page=568

You can stop this by blocking it in robots.txt:
Disallow: *?page
Disallow: *from=
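Both rules rely on the `*` wildcard, which Googlebot supports as an extension to the original robots.txt convention. The Python sketch below approximates that matching (`*` matches any run of characters, an optional trailing `$` anchors the end, and the pattern is matched from the start of the path plus query string) so you can check which URLs the rules would block. It is a simplified model, not Google's actual parser, and the helper names are hypothetical. Note that the first rule also blocks paginated listings such as /tags/drupal?page=2.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Google-style robots.txt path pattern into a regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal and matched from the start of path+query.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path_and_query: str, disallow: list[str]) -> bool:
    """True if any Disallow pattern matches the given path and query."""
    return any(robots_pattern_to_regex(p).match(path_and_query) for p in disallow)

rules = ["*?page", "*from="]
for url in ["/?page=16&from=1289", "/node?page=565",
            "/tags/drupal?page=2", "/tags/drupal"]:
    print(url, is_blocked(url, rules))
# /?page=16&from=1289 True
# /node?page=565 True
# /tags/drupal?page=2 True
# /tags/drupal False
```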

Bug report for Drupal 5.10 here:
http://drupal.org/node/307244

Comments

*?page=

Z2222

I think it might be risky to block *?page= because you may then block all links to your older content. When I tried something similar on my blog, I noticed a traffic drop (see my robots.txt where it says "removed 2may07"; that rule was blocking the paginated taxonomy views).

Example:

A taxonomy term called "drupal" in a "tags" vocabulary might create this URL:
example.com/tags/drupal

Everything that you tag with "drupal" gets listed in that paginated "drupal" section. After you have a few dozen posts with that tag, you'll have pages like this:
example.com/tags/drupal?page=1
example.com/tags/drupal?page=2
example.com/tags/drupal?page=3
etc.

Once a post moves off the first page (to ?page=1 or later), the only link to that post may be from the paginated "drupal" section (?page=1, ?page=2, etc.). So blocking the pagination blocks search engines from reaching your older posts (unless you have an alternate view set up that leads search engines into your content).

I think the solution is for Drupal to use clean URLs for pagination and to send 404s if a page doesn't exist.

example.com/tags/drupal
example.com/tags/drupal/1
example.com/tags/drupal/2
example.com/tags/drupal/3

...and also to change the HTML <title> on each pagination page like this:

<title>Drupal | Site Name</title>
<title>Drupal - Page 1 | Site Name</title>
<title>Drupal - Page 2 | Site Name</title>
<title>Drupal - Page 3 | Site Name</title>

(The tag is "Drupal".)
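The title scheme can be sketched in Python as follows, assuming the "Page N" label is Drupal's 0-based ?page= value, matching the URLs in the example. `pager_title` is a hypothetical helper, not a function of Drupal or of any module; in an actual Drupal site this logic would be written in PHP.

```python
def pager_title(term: str, page: int, site_name: str) -> str:
    """Build a distinct <title> per pager page.

    `page` is Drupal's 0-based ?page= value, so page 0 keeps the plain title.
    """
    if page == 0:
        return f"{term} | {site_name}"
    return f"{term} - Page {page} | {site_name}"

for n in range(4):
    print(f"<title>{pager_title('Drupal', n, 'Site Name')}</title>")
# <title>Drupal | Site Name</title>
# <title>Drupal - Page 1 | Site Name</title>
# <title>Drupal - Page 2 | Site Name</title>
# <title>Drupal - Page 3 | Site Name</title>
```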

That should fix the problem in Google Webmaster Tools...

I think the nonexistent pages (like ?page=987) can appear if you change the number of posts per page. For example, with 1000 posts at 5 posts per page you get 200 paginated pages (up to example.com/node?page=199), and those URLs get indexed. If you then change it to 10 posts per page, you have only 100 paginated pages (up to example.com/node?page=99). Google will keep requesting ?page=199 because there used to be links to it, and it won't de-index the page because the site still returns a "200 OK" status for it.
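The arithmetic above, and the 404 check the pager would need, can be sketched in Python. The helper names are hypothetical, and the sketch assumes Drupal's 0-based ?page= numbering, where the first page carries no ?page= parameter.

```python
def last_page_index(total_items: int, per_page: int) -> int:
    """Highest valid 0-based ?page= value (0 when there are no items).

    Assumes per_page > 0; uses integer ceiling division for the page count.
    """
    pages = (total_items + per_page - 1) // per_page
    return max(pages - 1, 0)

def pager_status(page: int, total_items: int, per_page: int) -> int:
    """HTTP status a pager URL should return: 200 if the page exists, else 404."""
    return 200 if 0 <= page <= last_page_index(total_items, per_page) else 404

print(last_page_index(1000, 5))     # 199
print(last_page_index(1000, 10))    # 99
print(pager_status(199, 1000, 5))   # 200
print(pager_status(199, 1000, 10))  # 404
```

With a 404 instead of "200 OK", Google would eventually drop the stale ?page=199 URL after the switch to 10 posts per page.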

--
My Drupal Tutorials

I'm having the same problems

patchak

I'm having the same problems on several sites with Drupal 6. I have hundreds and hundreds of pages with the same title, simply because the pager for one particular view has hundreds of pages.

Could you be a bit more specific about how to insert the page number into the view's page title, please?

Thanks a lot; I think this would solve a lot of SEO issues with Drupal.

Patchak

Yeah, you're right about changing posts per page

FlemmingLeer

J.Cohen,

Yeah, you're right that changing the number of posts per page is the cause. As I recall, I went from 6 per page to 10 per page, and that is probably why the extra pages are being indexed. But the right thing would be to return a 404 for those pages.

For the last couple of hours I've been searching the forums for a way to show the page number in the title, but I found nothing that worked.

I found http://groups.drupal.org/node/3472#comment-31717, but it created errors on my pages, as did http://groups.drupal.org/node/3472#comment-34711.

I tried http://drupal.org/project/cleanpager, but to get it to work with paging you have to add the taxonomy terms and pages manually, and since I currently have 500 taxonomy terms with URL aliases, that's quite a task.

So instead I ended up requesting a feature in the Page Title module that shows the page number:
http://drupal.org/node/307796

That way a lot more Drupal sites can benefit from a suitable solution.

Regarding the risk of losing traffic: I have commented out the
Disallow: *?page
rule for now. :/

Nobody wants that.

Even a turtle reaches its goal...

FlemmingLeer

In "Demystifying the 'duplicate content penalty'", Google says there is no such thing as a "duplicate content penalty".
Read more here:
http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicat...


The nodewords module has a fix for this

FlemmingLeer

Described here:
http://drupal.org/node/294996#comment-1241947

Get nodewords module here:
http://drupal.org/project/nodewords

There are both Drupal 5.x and Drupal 6.x versions.
