a noindex tag

Posted by greggles on August 2, 2007 at 2:35pm

I just recently learned about the meta robots noindex tag and realized that it could be useful for removing some of the lower-quality pages of a site from the index. For example, user/, user/register, and user/password are all pages which frequently get indexed early and stay in the index, but which are not really useful parts of the SERPs.

I'm sure that you all can think of other pages which you would like to exclude.

My proposal for the module is pretty simple:

Provide an interface where users select a Drupal path, whether it should be index/noindex, whether it should be follow/nofollow.
Then, somewhere early in the page calling cycle, check the page and add in any meta tags specified.
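To make the proposal concrete, here is a rough, language-neutral sketch in Python (not Drupal code). The rule table, the function names, and the shell-style pattern matching are all assumptions for illustration; a real module would store the rules in the database and hook into Drupal's page building.

```python
from fnmatch import fnmatchcase
from html import escape

# Hypothetical rule table: (path pattern, index?, follow?).
# The first matching rule wins; "*" matches any run of characters.
RULES = [
    ("user/register", False, True),
    ("user/password", False, True),
    ("user/*/track", False, False),
    ("user", False, True),
]


def robots_directives(path, rules=RULES):
    """Return e.g. 'noindex, follow' for the first matching rule, else None."""
    normalized = path.strip("/")
    for pattern, index, follow in rules:
        if fnmatchcase(normalized, pattern):
            return ", ".join([
                "index" if index else "noindex",
                "follow" if follow else "nofollow",
            ])
    return None


def robots_meta_tag(path, rules=RULES):
    """Build the meta tag for a path, or an empty string if no rule applies."""
    content = robots_directives(path, rules)
    if content is None:
        return ""
    return '<meta name="robots" content="%s" />' % escape(content, quote=True)
```

Calling `robots_meta_tag("user/register")` would give a noindex, follow tag, while a path with no rule (say `node/1`) gets no tag at all, so the engines fall back to their default behaviour.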

Anyone have any thoughts on the usefulness of this idea or how to make it better?

I know that the tag is not supported by all engines, but it is supported by enough that I think it could be worth it.

Comments

Good idea

Posted by codevelopment on August 2, 2007 at 3:21pm

Yes, sounds like a good idea. I've been updating the .htaccess file so far to try and do this, but the problem with that is that those pages that have already been indexed often stay indexed for some time after. I imagine this would specifically tell them to now exclude the page as opposed to not indexing it in future.

?

Posted by niklp on August 2, 2007 at 5:23pm

Isn't this what robots.txt is supposed to be for? Am I missing something here...? :p

Web Development in Nottingham, UK by Kineta Systems / Follow me on Twitter! @NikLP

robots.txt not always an option

Posted by hansfn on March 12, 2008 at 11:08am

I know I'm replying very late, but I found this thread while searching for the Meta tags, eh Node Words, module. For completion I just wanted to add:

In many situations you don't have access to robots.txt - you don't have access to the root folder of the domain where your Drupal is installed. Think personal home pages for students at universities, people using hosting services but without their own domain name and so on. Then the robots meta tag is the only option.

Hans

Ok then...

Posted by niklp on March 12, 2008 at 9:38pm

...how about this module? http://drupal.org/project/robotstxt

Web Development in Nottingham, UK by Kineta Systems / Follow me on Twitter! @NikLP

robots.txt is different to noindex

Posted by eaochoac on June 29, 2010 at 5:43pm

With robots.txt you can prevent a URL from being crawled, but it can still be indexed (without a description or title, anyway).
You should combine the meta tag with /robots.txt.

robots.txt

Posted by Z2222 on August 2, 2007 at 10:15pm

You can do that with robots.txt. See this post.

really?

Posted by greggles on August 3, 2007 at 8:23pm

Perhaps my knowledge of robots.txt is not as strong as it should be, then!

I guess I was thinking of cases like this:

I would like to have pages with this pattern in the index:
http://example.com/user/*

I would potentially like to remove pages like this from the index:
http://example.com/user/{any}/track
http://example.com/user/{any}/feed
http://example.com/node/{any}/feed

As far as I know, that kind of wildcarding would not be possible in robots.txt which would mean that I would need to have a

Disallow: /user/1/feed
Disallow: /user/2/feed
Disallow: /user/.../feed
Disallow: /user/n/feed

for every user, right?

I look forward to the insight from you and NikLP on this idea.

--
Knaddisons Denver Life | mmm Chipotle Log | The Big Spanish Tour

knaddison blog | Morris Animal Foundation

theory vs. practice

Posted by greggles on August 3, 2007 at 9:13pm

Of course, the difference between theory and practice is that in theory there is no difference ;)

I just read the rest of that robots.txt discussion you linked to and realized that while wildcards aren't supported in the spec they are supported in practice by the major engines, so who cares anymore.

Great news!

/me goes to edit his robots.txt module entries ;)

--
Knaddisons Denver Life | mmm Chipotle Log | The Big Spanish Tour

knaddison blog | Morris Animal Foundation

wildcards

Posted by Z2222 on August 3, 2007 at 9:27pm

You can also block the /feed URLs with this:

Disallow: /*/feed$

All three major search engines recognize that format. It will block all RSS feeds except Drupal's main RSS feed, /rss.xml.

If you have other important feeds you would like indexed you could give them aliases that don't end with /feed

Drupal feeds are a problem for Google -- they basically contain the same text content as the taxonomy list pages except that they are marked up with RSS/XML instead of X/HTML. Duplicate content.

To block /track pages you could use this:

Disallow: /*/track$
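For anyone wondering how those two patterns get interpreted, here is a minimal sketch in Python of the wildcard matching described above. It assumes that `*` matches any run of characters, that a trailing `$` anchors the end of the URL path, and that everything else is a plain prefix match. The function names are made up for this example, and it ignores Allow lines, user-agent groups, and rule precedence, which real crawlers also take into account.

```python
import re


def pattern_to_regex(pattern):
    """Translate a Disallow pattern using * and a trailing $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Without "$" the rule is a prefix match, so the end is left open.
    return re.compile("^" + body + (r"\Z" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """True if any Disallow pattern matches the URL path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


DISALLOW = ["/*/feed$", "/*/track$"]
```

With these two rules, `/user/1/feed`, `/node/12/feed` and `/user/1/track` come out blocked, while `/rss.xml` and `/user/1` do not. The `$` matters: without it, `/*/feed` would also catch a path such as `/user/1/feedback`.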

There is a module that adds

Posted by yaph on August 2, 2007 at 10:35pm

There is a module that adds meta tags: http://drupal.org/project/nodewords
The above features could be added in this module, which is quite useful because meta tags are not ignored by search engines and their users.

A discussion on robots.txt can be found here: http://groups.drupal.org/node/5391

--
Websites: SEO-Expert-Blog.com (http://www.seo-expert-blog.com) | Torlaune.de

hmm

Posted by niklp on August 3, 2007 at 12:17pm

I think that's a fair point, but realistically, you're unlikely to build a site and then have an increasing number of pages that you don't want indexed - I suggest the probable outcome would be that you have "finished" a site, and then have a static list of pages that you do not wish to have indexed, and then can easily control that list via robots.txt.

Of course it would be easier from an administrator's (site manager rather than dev, here) point of view to perform this task at a node level from the admin pages. However, it's really just bloating the system in effect, as it will perform checks of one sort or another per node view, which impacts on "us" (and the users) rather than the target audience, ie the search engines, who, in regards to robots.txt, currently "take the strain" of the processing.

So ultimately probably not worth it. The only time I can see that option being really useful is if a site manager was really adding/deleting massive swathes of content (like sections or something) all the time, and there was a need to block one or two specific pages in each section. Even that's pretty tenuous at best, I think.

Web Development in Nottingham, UK by Kineta Systems / Follow me on Twitter! @NikLP

good point

Posted by greggles on August 3, 2007 at 8:23pm

I always forget that module because "nodewords" doesn't ring in my brain as "meta tags". Thanks for the tip!

--
Knaddisons Denver Life | mmm Chipotle Log | The Big Spanish Tour

knaddison blog | Morris Animal Foundation

Ah!

Posted by niklp on August 4, 2007 at 1:45pm

Heh, yeah this is one of those cases where the word "misnomer" comes into play... :)

From the nodewords.module page: (for the sake of completion)

The module may also be known under following names: nodewords or node (key)words. Meta tags is the preferred name.

The moral being, get it right first time, or pay the ultimate price! :p

Web Development in Nottingham, UK by Kineta Systems / Follow me on Twitter! @NikLP

to create tags cloud - module meta tags

Posted by infortec on May 10, 2012 at 1:15am

Good morning, friend! My name is Fernando, and I am a Brazilian programmer (Porto Feliz-SP).
I would like to ask you a question, please. How do I create a tag cloud
like the one on your site below, and how do I show the tags in large fonts with links,
exactly as on your site? Thank you very much and a happy 2008!

A hug from the Drupal users of Brazil!

Fernando H. Santorsula

Tags example
http://www.ramiro.org/

Metatag extension

Posted by Sahin on January 24, 2017 at 6:19am

For future Google visitors: I think the Metatag extension (https://drupal.org/project/metatag) answers this issue (and much more).
