Are Views block exposed filters getting indexed?

Hello, recently I have been concerned about my Drupal site's SEO.

I recently created a block on my site with exposed filters, and I noticed that Googlebot is trying all the filters and sometimes triggering this message:

"Message Illegal choice in field_age_value_many_to_one element"

Should I block spiders from crawling those filters?

Thanks

smk-ka - Sun, 2009-05-10 14:22

The handbook page "Search engine optimization (SEO) for Views" explains how to prevent this. Basically, the answer is yes.

Stefan Kudwien
unleashed mind


halisemre - Sun, 2009-05-10 20:35

Thanks for your fast reply. I have already read that page, but, as I stated in the thread, I have a question.
If I use:

Disallow: /*?
Allow: /*?page=
Disallow: /*?page=*&*

http://www.mysite.com/?page=1 is ok
http://www.mysite.com/?page=2 is ok

But what about

http://www.mysite.com/?page=0

It is the same as http://www.mysite.com/, so it effectively duplicates the front page.

Is there a way to eliminate this problem?

In the meantime, this discussion seems to be private. I am new to groups; how can I make it public so everyone can benefit?


J. Cohen - Sun, 2009-05-10 22:39

Block "page=0" like this:

Disallow: /*page=0$

There is a simpler way to write the rules from that SEO for Views handbook page. Here is what the handbook page says:

# Disallow all URL variables except for page
Disallow: /*?
Allow: /*?page=
Disallow: /*?page=*&*

[EDIT: removed mistake...]

You could rewrite it as:

Disallow: /*&

That would block all URLs with more than one parameter.
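To check offline which URLs each rule set blocks, here is a small Python sketch of Google-style robots.txt matching. It assumes Google's documented semantics ("*" matches any run of characters, a trailing "$" anchors the end of the URL, the longest matching rule wins, and Allow wins a tie); other crawlers may behave differently. The sample URLs are made up; the value "All" for field_age_value_many_to_one is a hypothetical example. Under these assumptions the handbook rules and the one-line "Disallow: /*&" are not equivalent: a URL with a single non-page parameter, such as /?field_age_value_many_to_one=All, is blocked by the handbook rules but still allowed by "Disallow: /*&". Adding "Disallow: /*page=0$" blocks /?page=0 while /?page=1 and /?page=10 stay crawlable.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Google-style robots.txt path pattern into a regex.

    '*' matches any run of characters and a trailing '$' anchors the end
    of the URL; every other character is matched literally. re.match()
    anchors at the start, so patterns are matched from the beginning of
    the path (query string included).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list, path: str) -> bool:
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs, where directive is
    "allow" or "disallow". Approximating Google's behaviour: the longest
    matching pattern wins, and Allow wins a tie of equal length.
    """
    best = None  # (pattern length, is_allow) of the strongest match so far
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty "Disallow:" line blocks nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# Rule sets discussed in this thread.
HANDBOOK = [("disallow", "/*?"), ("allow", "/*?page="), ("disallow", "/*?page=*&*")]
HANDBOOK_NO_PAGE0 = HANDBOOK + [("disallow", "/*page=0$")]
SIMPLE = [("disallow", "/*&")]

# Hypothetical sample URLs.
URLS = [
    "/",
    "/?page=1",
    "/?page=0",
    "/?page=1&field_age_value_many_to_one=All",
    "/?field_age_value_many_to_one=All",
]

if __name__ == "__main__":
    for url in URLS:
        verdicts = [is_allowed(r, url) for r in (HANDBOOK, HANDBOOK_NO_PAGE0, SIMPLE)]
        print(url, ["allow" if v else "block" for v in verdicts])
```

The tie-breaking is encoded in the tuple comparison: a longer pattern always beats a shorter one, and for equal lengths True (Allow) sorts above False (Disallow).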

--
My Drupal Tutorials


First Page

mikeytown2 - Mon, 2009-05-11 19:06

Does Disallow: /*& work on the first page, since the page variable isn't in the URL?


robots

J. Cohen - Mon, 2009-05-11 20:09

Do you mean this URL?
/?page=0

This will block that one:

Disallow: /*page=0

I think it would be better to fix the module so it doesn't link to that kind of URL.

This rule only blocks URLs with more than one parameter:

Disallow: /*&



halisemre - Mon, 2009-05-11 19:57

Yes, it works on the first page.

There is a great tool to test it.

In Google Webmaster Tools, click the "Tools" link, then choose "Analyze robots.txt". There you can see the robots.txt that Google most recently downloaded for your site.

If you want to try new Disallow combinations, enter them there, then enter URLs in the box below to test whether they are blocked.


halisemre - Mon, 2009-05-11 20:39

In the meantime, do you think Disallow: /frontpage is a good idea?

Does this look like an OK robots.txt? I have attached an image because, for some reason, the "*" character gets stripped out on submission.

Attachment: robots2.png (66.23 KB)

/frontpage

J. Cohen - Tue, 2009-05-12 01:04

You don't need Disallow: /frontpage if you have the Global Redirect module installed. Global Redirect will automatically redirect /frontpage to the front page in a search-engine-friendly way.



halisemre - Tue, 2009-05-12 01:49

Well, for some reason the robots are also indexing http://www.example.com/frontpage?field_
The ?field_ part comes from the Views exposed filter search query, so I have included Disallow: /frontpage*


J. Cohen - Tue, 2009-05-12 02:18

Either this:

Disallow: /*?field_

or

Disallow: /frontpage

or

Disallow: /frontpage?

would work... depending on what other URLs might be getting spidered.

You could grep the logs to be sure. Something like this will show which unusual URLs Google might be hitting:

$ grep 'Googlebot/' access.log | grep 'field_' > googlebot_field_.txt
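If you would rather summarise the log than read the raw grep output, the following Python sketch applies the same two filters and counts how often each distinct path was requested. It assumes the common Apache/nginx "combined" log format (request line as the first quoted field); the script and file names are hypothetical. Like the grep, it trusts the User-Agent string, which any client can fake.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumption: "combined" log format, where the request line is the first
# quoted field, e.g. "GET /frontpage?field_age=All HTTP/1.1".
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[0-9.]+"')


def googlebot_paths(lines: Iterable[str], needle: str = "field_") -> Counter:
    """Count request paths fetched by a Googlebot user agent that contain `needle`."""
    hits: Counter = Counter()
    for line in lines:
        # Same two filters as the grep pipeline: Googlebot UA, then the needle.
        if "Googlebot/" not in line or needle not in line:
            continue
        match = REQUEST_RE.search(line)
        # Unlike grep, count the line only if `needle` is in the requested
        # path itself, not merely in the referrer or elsewhere on the line.
        if match and needle in match.group(1):
            hits[match.group(1)] += 1
    return hits


if __name__ == "__main__":
    # Usage (hypothetical names): python googlebot_paths.py access.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for path, count in googlebot_paths(log).most_common():
                print(f"{count:6}  {path}")
```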



halisemre - Tue, 2009-05-12 21:47

J. Cohen I really appreciate your feedback.