Today I discovered something disturbing: a bunch of pages of my Drupal 5 cruise guide site were missing from Google's index under the site name, and showed up under the IP address instead. Worse yet, Google has been known to penalize sites for duplicating content on a large scale - a real risk if it's seeing both the URL and the IP address. I did some digging, and I think I found a solution worth sharing.
First, here's an example: my blog entry on “How to get Seasick” was in the Google index when I searched by IP address, site:72.52.247.79/site_blog/how_to_get_seasick, but NOT when I searched by domain name, site:cruisesavvy.com/site_blog/how_to_get_seasick.
In theory this shouldn't be a problem, because Googlebot should never find the IP address: who would ever link to something so unwieldy? But, as it turns out, Google somehow picked up both the URL and the IP address from a post I wrote in Drupal Groups and from some of my own pages. That's weird, because the links all used the domain name, even in the Google cache, yet the pages still show up in searches for link:72.52.247.79. Needless to say, I don't use the IP address when linking to pages.
Does Drupal sometimes change URLs to IPs for some reason?? I'm perplexed.
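But whatever the cause, it seems that the way around this is to add the following line to .htaccess, as part of the rewrite rules. It has to sit directly above the host-name RewriteCond shown further down, because the [OR] flag joins it to the condition that follows: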
RewriteCond %{HTTP_HOST} ^[0-9]+(\.[0-9]+){3} [OR]
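That’s in addition, of course, to un-commenting one of the two redirect blocks that are already in .htaccess (and replacing example.com with your own domain), either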
# RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
Or
# RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
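Put together, the finished block might look something like the sketch below. This is only an illustration for the "www" variant: example.com is a placeholder for your own domain, and it assumes mod_rewrite is enabled and that "RewriteEngine on" already appears earlier in the file.

```apache
# Sketch only: example.com is a placeholder domain.
# Assumes "RewriteEngine on" is already set earlier in .htaccess.

# Condition 1: the request arrived under a bare IP address ...
RewriteCond %{HTTP_HOST} ^[0-9]+(\.[0-9]+){3} [OR]
# Condition 2: ... or under the non-www host name.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# If either condition matched, send a permanent redirect to the www host.
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
```

For the non-www variant, the same IP condition would go above the ^www\.example\.com$ condition instead. Because the redirect is a 301, search engines should eventually fold the IP-address pages into the domain-name ones.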
Comments
THANK YOU!
I wish I had thought of this before. One of my clients had this issue in the past, and it was really frustrating.
THANKS! :O)