taxonomy and views pages and SEO -- what's the best practice?

samcohen

Hi,

If you have a site where the same teasers can appear on many different pages -- such as taxonomy pages that show all teasers with that tag, or views pages that use arguments -- what's the best practice for SEO?

I'm concerned that having the same teasers show up on so many pages/URLs may be hurting the site's overall SEO.

Should some of these taxonomy and views pages be excluded in robots.txt and the XML sitemap, so that search engines only index the pages that really count? Or is there no penalty for having the same content show up on many URLs?
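
For example, would something like this in robots.txt be the right idea? (The paths below are just placeholders for my taxonomy and views pages.)

User-agent: *
# keep the listing pages out of the index so only the nodes get indexed
Disallow: /taxonomy/
Disallow: /archive/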

Any advice would be greatly appreciated.

Sam

Comments


I wouldn't worry too much

binary basketball

I wouldn't worry too much about having multiple teasers show up on multiple taxonomy and/or views pages, especially if your main concern is duplicate content.

I have articles that get filtered into around 40 different terms/categories at the moment, with the option of selecting multiple terms for the pages to show up on. Really, as long as that view is semi-active, it can only work in your favor, because it's only a partial duplicate that ends up linking to a single version of the page, which is what Google really cares about for that particular item.

As such, if that category is active enough and different stories get filtered through, it'll rank well; and if it's not active, it's not something to worry too much about anyway.

Duplicate content doesn't hurt your overall SEO at all. All it does is force Google to choose which version it feels is the most important. Often, if you pay enough attention, you can figure out why Google felt that particular page was more important and make your changes accordingly.

My experience has given me the understanding that the benefit of having a couple more teasers linking to my article outweighs any possible issue you may have. And if your teaser takes up roughly 25% of the page or less, then you don't need to worry at all.

duplicate content

Z2222

Duplicate content doesn't hurt your overall SEO at all. All it does is force Google to choose which version it feels is the most important.

It depends on the site, how the duplicate content is being generated, and the extent of the duplicate content. I've seen big improvements by eliminating duplicate content in cases where it's out of control (e.g., by redirecting thousands of pages at a time to canonical versions).
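
As a rough illustration, a bulk redirect like that can be done from a small custom module. This is only a sketch, assuming Drupal 6, a hypothetical module named "seo_redirect", and a purely illustrative printer-friendly URL pattern:

<?php
// Sketch: 301-redirect printer-friendly URLs such as print/node/123
// back to the canonical node page. The pattern is illustrative only.
function seo_redirect_init() {
  if (preg_match('#^print/(node/\d+)$#', $_GET['q'], $matches)) {
    drupal_goto($matches[1], NULL, NULL, 301);
  }
}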

Google gives mixed messages about whether duplicate content is bad. Here is an example where they warn about it:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=66359

Minimize similar content: If you have many pages that are similar, consider expanding each page or consolidating the pages into one. For instance, if you have a travel site with separate pages for two cities, but the same information on both pages, you could either merge the pages into one page about both cities or you could expand each page to contain unique content about each city.
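
One way to point duplicates at a single version in Drupal is a canonical link tag. A minimal sketch, assuming Drupal 6, its drupal_set_html_head() function, and a hypothetical module named "seo_canonical":

<?php
// Sketch: add a self-referencing rel=canonical tag on full node pages,
// nudging Google to treat the node, not the listing pages, as primary.
function seo_canonical_nodeapi(&$node, $op, $teaser = NULL, $page = NULL) {
  if ($op == 'view' && $page) {
    $href = url('node/' . $node->nid, array('absolute' => TRUE));
    drupal_set_html_head('<link rel="canonical" href="' . $href . '" />');
  }
}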


alright, alright

binary basketball

You got me

But in this particular situation we aren't talking about the final, static page. We are talking about pages that will likely change fairly often. Obviously it's not going to be in his best interest to go nuts. I think the main goal would be to figure out your overall navigation strategy. If he is hoping to get all of these views ranked well, then the idea is just not to spread yourself too thin, i.e., not to let the same content sit on the same pages for too long.

Say, if for a period of time he has 2 different views that share 6 out of 10 teasers, that's a hefty amount of duplicate content, depending on what else is on the page... Decreasing the amount of text in the teaser is going to help slightly, but IMO you are heading in the wrong direction. You want more content on the page, not less.

If instead, while creating the view, you add a decent chunk of intro text to the header, you would not only be decreasing the amount of overall duplicate content, you would also be adding a very necessary SEO component.

My guess is that with a proper descriptive intro for each page, if he had enough content cycling through those views pages to get Google excited enough to get a decent crawl rate going, and if for a short period of time they shared some duplicate content, that views page isn't going to drop off the SERPs, and the node itself should be safe as well. That is, assuming the teaser isn't half of the node.


Say if for a period of time

Z2222

Say, if for a period of time he has 2 different views that share 6 out of 10 teasers, that's a hefty amount of duplicate content, depending on what else is on the page... Decreasing the amount of text in the teaser is going to help slightly, but IMO you are heading in the wrong direction. You want more content on the page, not less.

I would prefer less, or unique, content on the views pages. In WordPress you can make it unique, but not yet in Drupal...

IMHO, 10 short duplicate teasers are better than 10 long duplicate teasers...


I block a lot of drupal generated pages

FlemmingLeer

I block a lot of Drupal-generated pages like these (samples from my robots.txt):

User-agent: Googlebot
Disallow: /tagadelic
Disallow: /?page
# forum sort drupal code
Disallow: *from=
Disallow: *sort=
# stop indexing printer friendly pages
Disallow: */print
Disallow: /print/
Disallow: /printpdf/
Disallow: /printmail/
# no tracker pages
Disallow: /tracker
Disallow: /*/*/track
Disallow: /node$
# I want traffic from blogsearch.google.
Allow: /node/feed
Disallow: /taxonomy/
# I want traffic from blogsearch.google.
Allow: /*/*/*/feed
# but I don't supply duplicate feeds
Disallow: /*/*/*/0/feed
Disallow: /*/*/*/all/feed
Disallow: /taxonomy/term/*/all/feed?page=

Basically, I want the clean-URL version of taxonomies, and I don't mind having 10 or 15 tags in one post as long as they're relevant to that specific blog entry.

Instead, I have a hidden link at the bottom of all my pages to a sitemap that lists the tags as clean URLs, built with the sitemap module and a URL alias for it.

I block front page paging via Disallow: */?page to stop duplicate content from the front page, but paged taxonomy listings like /vocabulary/taxonomy-term?page=1 are still being indexed by Google, so older content is still accessible.
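
Maybe a broader wildcard would catch those pager URLs too; I haven't verified this myself, so test it in Webmaster Tools before relying on it:

User-agent: Googlebot
# catch any URL with a pager query string, including taxonomy listings
Disallow: /*?page=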

The result is this in Google Webmaster Tools:

Crawl errors

HTTP 21
In Sitemaps 0
Not followed 0
Not found 217
Restricted by robots.txt 14,311
Timed out 0
Unreachable 3

The HTML suggestions look like this:
Meta description Pages
Duplicate meta descriptions 2
Long meta descriptions 0
Short meta descriptions 56

Title tag Pages
Missing title tags 0
Duplicate title tags 203
Long title tags 0
Short title tags 0
Non-informative title tags 0

Before I used this aggressive tactic in my robots.txt, I had thousands of duplicate title tags. Also, Google now only has some 14,000 URLs from about 7,700 nodes on this site, compared to over 45,000 before; actually, there were 55,300 pages for this site in August 2008! And I even have more content now.

My point is that even though Google can use algorithms to sort through content, and almost all Drupal sites serve a lot of duplicate content, I as a webmaster can choose which pages to serve for Google to digest. No one knows what algorithms Google will apply in the future, so why not limit the clutter in the first place, as a kind gesture to Google and your users? Fewer choices means, in my opinion, a better chance for relevant URLs to make it into Google, and therefore more traffic in the long term.

Dynamic internet is a dead end for information

Also, I keep all URL aliases forever, and I mean forever! The term "dynamic internet" is a myth and an obstacle to searching for and acquiring relevant information quickly and efficiently. I believe that if I can serve relevant information to a user on multiple occasions, information which may be old but is still relevant, I will gain an advantage over other sites. This of course means sifting through entries to update outdated outgoing links, but the Link checker module is a very handy tool for that. And I think it will be time well spent.

Even a turtle reaches its goal...

best practice imho would be

bara.munchies

Best practice imho would be to have individual content on every page, at least a short header. I know it's a lot of work, but it's worth it imho. It can be achieved by calling another view in your view header or by some PHP scripting (switch by arg()), depending on how many tag pages you need to fill up.
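
A minimal sketch of the arg() approach, assuming a Views header with the PHP input format and paths like taxonomy/term/123, where arg(2) is the term ID; the term IDs and intro texts here are hypothetical:

<?php
// Views header sketch: print a unique intro per taxonomy term page.
switch (arg(2)) {
  case '12':
    print '<p>All of our articles about home recording, updated weekly.</p>';
    break;
  case '34':
    print '<p>Everything we have written about live gear in one place.</p>';
    break;
  default:
    print '<p>Articles filed under this tag.</p>';
}
?>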

Taxonomy and Ecommerce products duplicates

MakeOnlineShop

Are taxonomy and e-commerce product duplicates seen on different pages bad? Should these taxonomy words not be clickable?

Hello,

Do you add taxonomy keywords to your websites or shops?

Imagine that you sell t-shirts.

A black Nike t-shirt can be seen under the taxonomy terms BLACK TSHIRT and NIKE TSHIRTS, but is that bad for SEO?

Should we just avoid typing any taxonomy keywords, or is there a way to make them not clickable?
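
For example, would something like this in template.php be the way to do it? (A sketch I pieced together for a Drupal 6 style theme; "mytheme" is just a placeholder for the theme name.)

<?php
// Sketch: replace the linked taxonomy terms on nodes with plain,
// unlinked names, so the term pages collect fewer internal links.
function mytheme_preprocess_node(&$vars) {
  if (!empty($vars['node']->taxonomy)) {
    $names = array();
    foreach ($vars['node']->taxonomy as $term) {
      $names[] = check_plain($term->name);
    }
    $vars['terms'] = implode(', ', $names);
  }
}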

Thank you for your help.
