Posted by pkchoo on October 12, 2011 at 4:10pm
Hello, I was hoping someone could help me with a question I have with search engine indexing.
I was doing a search recently for a site I built, so I put the words "nuggucciet cellars" in to perform the search and the 3rd and 4th results are references to the site, but using the development URL (the URL that was being used before the domain name was pointed to hosting).
Obviously, if I type in the domain name fully "nugguccietcellars.com" it finds the site correctly.
How do I keep Google (or any search engine) from displaying the development URL (nuggucci.www65.a2hosting.com/)?
Thank you for your help.
Comments
Modify your robots.txt file
Modify your robots.txt file on the development server so it tells most search engines not to crawl your site:
User-agent: *
Disallow: /
I believe you can also use Google Webmaster Tools to claim that domain and then tell it to recrawl the site; once it does, the entries should be gone. Modifying the robots.txt file should get you most of the way there, though.
You could also put a password in front of it via a .htaccess file. http://www.elated.com/articles/password-protecting-your-pages-with-htacc... has a lot of info on this, and http://tools.dynamicdrive.com/password/ could get you a chunk of the way there.
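As a rough illustration of the password idea, a minimal .htaccess sketch could look like the following. The realm name and the AuthUserFile path are placeholders rather than values from this thread, and the sketch assumes the development copy lives in its own directory; if the temporary URL and the real domain serve the same files, this would lock both.

```apache
# Ask for a user name and password before serving anything in this directory
AuthType Basic
# Realm text shown in the browser's login prompt (placeholder)
AuthName "Development site"
# Hypothetical path to the password file; keep it outside the web root
AuthUserFile /home/example/.htpasswd
# Any user listed in the password file may log in
Require valid-user
```

The password file itself is usually created with Apache's htpasswd utility, for example htpasswd -c followed by the file path and a user name.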
Thanks BTMash. This is
Thanks BTMash.
This is probably a stupid question, but when I do the
Disallow: /
Robots.txt wants site directories, right? Plus, I want the site to be found by the search engines. I just don't want the development URL to be found. Is there a way to add nuggucci.www65.a2hosting.com so that URL is blocked from search engines and can't be referenced?
I will check out the links you suggested.
Thanks.
That's correct
Robots.txt wants site directories to look at or not look at. So to keep crawlers out of anything under admin, you could have
Disallow: /admin
In the case above, the / essentially says do not index this site. But from what I understand now, are both your development URL and your production URL pointing to the same codebase on the server?
I see one neat trick you can use: have .htaccess rewrite requests for robots.txt to a different file when the request comes in on the dev URL (an example is at http://www.webmasterworld.com/robots_txt/3957412.htm).
Someone else should be able to confirm it but I think something like:
RewriteCond %{HTTP_HOST} ^nuggucci.www65.a2hosting.com [NC]
RewriteRule ^robots.txt$ /robots-dev.txt [L,NC]
should do the trick?
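A slightly stricter variant of the same idea is sketched below. It assumes mod_rewrite is available, that the rules live in the .htaccess file in the web root, and that a file named robots-dev.txt exists there containing the two lines "User-agent: *" and "Disallow: /". The escaped dots and the RewriteEngine line are additions for illustration, not part of the rule as posted.

```apache
# Turn on the rewrite engine (harmless if it is already enabled elsewhere)
RewriteEngine On
# Only act on requests that arrive via the temporary host name
RewriteCond %{HTTP_HOST} ^nuggucci\.www65\.a2hosting\.com$ [NC]
# Internally serve robots-dev.txt in place of robots.txt; no redirect is sent
RewriteRule ^robots\.txt$ /robots-dev.txt [L,NC]
```

Requests for robots.txt on the real domain do not match the condition, so they keep getting the normal robots.txt.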
This solution worked really
This solution worked really well for my site. Thanks BTMash!
Mark W. Jarrell
Online Applications Developer
Richland Library
http://www.richlandlibrary.com
http://fleetthought.com
Twitter: attheshow
301 Redirect
The best solution would be to set up a 301 redirect, as suggested by Google: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=3444...
Add the following to your .htaccess file:
RewriteCond %{HTTP_HOST} !^nuggucci.www65.a2hosting.com$ [NC]
RewriteRule ^(.*)$ http://nugguccietcellars.com/$1 [L,R=301]
Hope that helps
-G
Redirect Will Redirect Everything
Note that gclicon's suggestion will redirect all traffic from nuggucci.www65.a2hosting.com to nugguccietcellars.com, including your development work. You could add another RewriteCond to exempt your IP address so you can continue to develop.
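A sketch of what that extra condition might look like is below. The address 203.0.113.10 is a placeholder from the documentation range, not anyone's real IP. The host condition here matches the temporary host name directly (no leading !), which is the form reported as working later in this thread.

```apache
RewriteEngine On
# Only redirect requests that arrive via the temporary host name
RewriteCond %{HTTP_HOST} ^nuggucci\.www65\.a2hosting\.com$ [NC]
# ...and skip the redirect for the developer's own address (placeholder IP)
RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$
# Everyone else is sent to the same path on the real domain
RewriteRule ^(.*)$ http://nugguccietcellars.com/$1 [L,R=301]
```

Consecutive RewriteCond lines are ANDed by default, so the rule fires only when both the host matches and the visitor is not the exempted address.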
How about using google
How about using Google Webmaster Tools to remove the URLs you don't want from search results?
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=164734
dev url
I think in order to provide you with a proper solution, we might need some more info on how you're using both URLs.
The nuggucci.www65.a2hosting.com URL is actually a temporary url provided by a2hosting that is created so users can work with their sites before changing domain records.
If you're using both URLs in some sort of multisite configuration and the two sites are not identical, then realEuph's solution works.
If you're not using both URLs in some sort of multisite configuration and both URLs contain the exact same site, then my initial solution would be enough.
Hope that makes sense.
-G
Everyone, thank you for your
Everyone, thank you for your help. I totally appreciate it!
gclicon's first suggestion best fits my scenario. The nuggucci.www65.a2hosting.com was just a temporary url that the hosting company provided so I could work with the site before pointing the domain.
I applied your suggestion to the htaccess file... hopefully, it will work.
The google webmasters links are invaluable... thank you.
Thank you again, everyone!
Hello, I tried...
Hello,
I tried...
RewriteCond %{HTTP_HOST} ^nuggucci.www65.a2hosting.com [NC]
RewriteRule ^robots.txt$ /robots-dev.txt [L,NC]
and I got an error page saying,
"The page isn't redirecting properly
Firefox has detected that the server is redirecting the request for this address in a way that will never complete"
Any ideas?
Thanks.
no need to worry about robots.txt
The change in the robots.txt file will prevent Google from indexing your temporary domain, but will not keep visitors from visiting the temporary domain. If you applied the redirect that I described, you don't have to worry about the robots.txt. Anyone, including Google, will be redirected to your main domain if they visit the temporary domain. Google will not index your temporary domain in future crawls.
Google says, "If you need to change the URL of a page as it is shown in search engine results, we recommend that you use a server-side 301 redirect."
https://www.google.com/support/webmasters/bin/answer.py?answer=93633
Based on what you want to do,
Based on what you want to do, @gclicon's method would be the way to go. BTW, my approach is mainly meant to keep the site from getting indexed; if I didn't want anyone accessing it I would password protect it, or, if access is OK, let users get redirected to the production domain. For my approach, you can try:
RewriteCond %{HTTP_HOST} ^nuggucci.www65.a2hosting.com [NC]
RewriteRule ^robots.txt$ http://nuggucci.www65.a2hosting.com/robots-dev.txt [NC,R=301,L]
Sorry, in my previous post, I
Sorry, in my previous post, I meant to write that when I use the code that gclicon gave me, I get the error I mentioned above from Firefox.
When I use the code that BTMash gave me above... nothing seems to happen.
401 challenge
Using a 401 challenge will keep out prying eyes and block bots from your site as well. I use this multi-purpose approach to keep things from being accidentally promoted by editors, too.
Thank you BTMash! He made
Thank you BTMash!
He made this change for me (he took the ! out of the expression)
RewriteCond %{HTTP_HOST} ^nuggucci.www65.a2hosting.com$ [NC]
and it's working!
Thank you everyone for your input.