Beware duplicate content - redirect IP to URL with .htaccess

Posted by PRFB

Today I discovered something disturbing: a number of pages on my Drupal 5 cruise guide site were missing from Google's index under the domain name and showed up under the server's IP address instead. Worse yet, Google has been known to penalize sites for large-scale duplicate content, which is a real risk if it sees the same pages at both the domain name and the IP address. I did some digging, and I think I found a solution worth sharing.

First, an example: my blog entry “How to get Seasick” was in Google's index when I searched by IP address (site:72.52.247.79/site_blog/how_to_get_seasick), but NOT when I searched by domain name (site:cruisesavvy.com/site_blog/how_to_get_seasick).

In theory this shouldn't be a problem, because Googlebot should never find the IP address: who would ever link to something so unwieldy? But as it turns out, Google somehow picked up the IP address, as well as the domain name, from a post I wrote in Drupal Groups and from some of my own pages. That's odd, because the links were all domain-name URLs, even in Google's cache, yet those pages still show up in searches for link:72.52.247.79. Needless to say, I never use the IP address when linking pages.

Does Drupal sometimes change URLs to IP addresses for some reason? I'm perplexed.

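Whatever the cause, the fix seems to be to add the following line to the rewrite rules in .htaccess, directly above the RewriteCond line of the domain redirect you uncomment (described next), so that the [OR] flag combines the two conditions: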

RewriteCond %{HTTP_HOST} ^[0-9]+(\.[0-9]+){3} [OR]

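That's in addition, of course, to uncommenting (and adapting to your own domain) one of the two redirects already present in Drupal's .htaccess, either the one that adds the www. prefix: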

# RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

or, to redirect to the domain without the www. prefix:

# RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
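For reference, here is a sketch of how the combined rules might look if you choose the www. variant. It assumes mod_rewrite is enabled, that `RewriteEngine on` is set as in Drupal's stock .htaccess, and that example.com stands in for your own domain (cruisesavvy.com in my case). The IP condition sits directly above the host condition, so the redirect fires when either one matches.

```apache
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Redirect requests whose Host header is a bare IPv4 address
  # (e.g. 72.52.247.79) OR the non-www domain to the canonical www host.
  # The [OR] flag on the first condition joins it to the next one;
  # without it, consecutive RewriteCond lines are combined with AND.
  RewriteCond %{HTTP_HOST} ^[0-9]+(\.[0-9]+){3} [OR]
  RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
  RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

  # Drupal's remaining rewrite rules (clean URLs) follow here unchanged.
</IfModule>
```

With this in place, a request for http://72.52.247.79/site_blog/how_to_get_seasick should get a 301 redirect to http://www.example.com/site_blog/how_to_get_seasick. If you prefer the bare domain, put the same IP line directly above `RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]` instead.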

Comments

THANK YOU!

emeelio

I wish I had thought of this before. One of my clients had this issue in the past, and it was really frustrating.

THANKS! :O)

Search Engine Optimization (SEO)
