a noindex tag

greggles's picture

I just recently learned about the meta robots noindex tag and realized that it could be useful for removing some of the lower quality pages in a site from the index. For example, user/, user/register, and user/password are all pages which frequently get indexed early and stay in the index, but which are not really useful parts of the SERPs.

I'm sure that you all can think of other pages which you would like to exclude.

My proposal for the module is pretty simple:

Provide an interface where users select a Drupal path, whether it should be index/noindex, whether it should be follow/nofollow.
Then, somewhere early in the page calling cycle, check the page and add in any meta tags specified.
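
The proposal above could be sketched roughly as follows. This is a Python illustration rather than Drupal code; the rule table, the `robots_meta_tag` function, and the use of shell-style path patterns are all assumptions made for the example, not part of any existing module.

```python
from fnmatch import fnmatchcase
from html import escape

# Hypothetical rule table: path pattern -> (index?, follow?).
# In the proposed module these would come from the admin interface.
RULES = [
    ("user/register", (False, True)),
    ("user/password", (False, True)),
    ("user/*/track", (False, False)),
]


def robots_meta_tag(path, rules=RULES):
    """Return a robots meta tag for the first matching rule, or '' if none match."""
    for pattern, (index, follow) in rules:
        if fnmatchcase(path, pattern):
            content = ", ".join([
                "index" if index else "noindex",
                "follow" if follow else "nofollow",
            ])
            return '<meta name="robots" content="%s" />' % escape(content)
    return ""
```

Paths without a matching rule get no tag at all, so search engines fall back to their default behaviour for those pages.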

Anyone have any thoughts on the usefulness of this idea or how to make it better?

I know that the tag is not supported by all engines, but it is supported by enough that I think it could be worth it.

Comments

Good idea

codevelopment's picture

Yes, sounds like a good idea. I've been updating the .htaccess file so far to try and do this, but the problem with that is that those pages that have already been indexed often stay indexed for some time after. I imagine this would specifically tell them to now exclude the page as opposed to not indexing it in future.

?

niklp's picture

Isn't this what robots.txt is supposed to be for? Am I missing something here...? :p

robots.txt not always an option

hansfn's picture

I know I'm replying very late, but I found this thread while searching for the Meta tags (eh, Node Words) module. For completeness I just wanted to add:

In many situations you don't have access to robots.txt - you don't have access to the root folder of the domain where your Drupal is installed. Think personal home pages for students at universities, people using hosting services but without their own domain name and so on. Then the robots meta tag is the only option.

Hans

robots.txt is different to noindex

eaochoac's picture

With robots.txt you can prevent a URL from being crawled, but it can still be indexed (without a description or title, anyway).
You should combine the meta tag with /robots.txt.

robots.txt

Z2222's picture

You can do that with robots.txt. See this post.

really?

greggles's picture

Perhaps my knowledge of robots.txt is not as strong as it should be, then!

I guess I was thinking for pages like:

I would like to have pages with this pattern in the index:
http://example.com/user/*

I would potentially like to remove pages like this from the index:
http://example.com/user/{any}/track
http://example.com/user/{any}/feed
http://example.com/node/{any}/feed

As far as I know, that kind of wildcarding would not be possible in robots.txt which would mean that I would need to have a

Disallow: /user/1/feed
Disallow: /user/2/feed
Disallow: /user/.../feed
Disallow: /user/n/feed

for every user, right?

I look forward to the insight from you and NikLP on this idea.

--
Knaddisons Denver Life | mmm Chipotle Log | The Big Spanish Tour

theory vs. practice

greggles's picture

Of course, the difference between theory and practice is that in theory there is no difference ;)

I just read the rest of that robots.txt discussion you linked to and realized that while wildcards aren't supported in the spec they are supported in practice by the major engines, so who cares anymore.

Great news!

/me goes to edit his robots.txt module entries ;)


wildcards

Z2222's picture

You can also block the /feed URLs with this:

Disallow: /*/feed$

All three major search engines recognize that format. It will block all RSS feeds except Drupal's main RSS feed, /rss.xml.

If you have other important feeds you would like indexed you could give them aliases that don't end with /feed

Drupal feeds are a problem for Google -- they basically contain the same text content as the taxonomy list pages except that they are marked up with RSS/XML instead of X/HTML. Duplicate content.

To block /track pages you could use this:

Disallow: /*/track$
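
To check which paths such patterns would catch, here is a small Python sketch. It assumes the wildcard semantics the major engines document (`*` matches any run of characters, a trailing `$` anchors the end of the URL, and patterns match from the start of the path). The helper names are made up for illustration, and the sketch ignores Allow rules and rule precedence.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a Disallow pattern using '*' and a trailing '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each '*' into "any characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """True if the URL path matches any of the Disallow patterns."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


patterns = ["/*/feed$", "/*/track$"]
for path in ["/user/1/feed", "/node/12/feed", "/user/1/track", "/rss.xml"]:
    print(path, is_blocked(path, patterns))
```

Under these assumptions the three /feed and /track paths are reported as blocked, while /rss.xml is not.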

There is a module that adds

yaph's picture

There is a module that adds meta tags: http://drupal.org/project/nodewords
The above features could be added in this module, which is quite useful because meta tags are not ignored by search engines and their users.

A discussion on robots.txt can be found here: http://groups.drupal.org/node/5391

--
Websites: SEO-Expert-Blog.com (http://www.seo-expert-blog.com) | Torlaune.de

hmm

niklp's picture

I think that's a fair point, but realistically, you're unlikely to build a site and then have an increasing number of pages that you don't want indexed - I suggest the probable outcome would be that you have "finished" a site, and then have a static list of pages that you do not wish to have indexed, and then can easily control that list via robots.txt.

Of course it would be easier from an administrator's (site manager rather than dev, here) point of view to perform this task at a node level from the admin pages. However, it's really just bloating the system in effect, as it will perform checks of one sort or another per node view, which impacts on "us" (and the users) rather than the target audience, ie the search engines, who, in regards to robots.txt, currently "take the strain" of the processing.

So ultimately probably not worth it. The only time I can see that option being really useful is if a site manager was really adding/deleting massive swathes of content (like sections or something) all the time, and there was a need to block one or two specific pages in each section. Even that's pretty tenuous at best, I think.

good point

greggles's picture

I always forget that module because "nodewords" doesn't ring in my brain as "meta tags". Thanks for the tip!


Ah!

niklp's picture

Heh, yeah this is one of those cases where the word "misnomer" comes into play... :)

From the nodewords.module page: (for the sake of completion)

The module may also be known under following names: nodewords or node (key)words. Meta tags is the preferred name.

The moral being, get it right first time, or pay the ultimate price! :p

to create tags cloud - module meta tags

infortec's picture

Good morning, friend! My name is Fernando, and I am a Brazilian programmer (Porto Feliz-SP).
I would like to ask you a question, please: how do I create a tag cloud
like the one on your site below, and how do I show the tags in large fonts with links,
exactly as on your site? Thank you very much, and a happy 2008!

A hug from the Drupal users of Brazil!

Fernando H. Santorsula

Tags example
http://www.ramiro.org/

Metatag extension