Comment Permalinks and Duplicate Content
I'm working on backporting the comment permalinks from Drupal 7 to Drupal 6 and am curious what the SEO folks think about the way it works.
The code was added to Drupal 7 in this commit as a result of this issue, which is really about making the "recent comments" links work properly regardless of which page the comments are on.
What this code does is:
- Create a new path - "comment/CID" which points to comment_permalink
- comment_permalink figures out which page the comment is currently on, then does some fakery in the menu system to pretend the request is for that page
- It then adds a canonical link in the header which points to the "real" version of the page
- And then it runs along letting the normal Drupal node/comment mechanisms render the rest of the page based on the fakery from step 2
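To make the steps above concrete, here is a rough sketch in Python (not the actual Drupal PHP). It assumes a flat, non-threaded list of comment IDs in display order and a fixed number of comments per page; the function names, the `/node/NID?page=N` URL shape, and the choice to put the page number in the canonical href are illustrative assumptions, not what Drupal necessarily emits.

```python
from typing import Optional, Sequence


def comment_page(comment_ids: Sequence[int], cid: int, per_page: int) -> Optional[int]:
    """Return the zero-based page that shows comment `cid`.

    Returns None when the comment is unknown (deleted or nonsense ID),
    which is the case where the caller would answer with a 404.
    """
    try:
        position = list(comment_ids).index(cid)
    except ValueError:
        return None
    return position // per_page


def canonical_url(nid: int, page: int) -> str:
    """Build the URL of the "real" page (hypothetical URL shape)."""
    query = f"?page={page}" if page > 0 else ""
    return f"/node/{nid}{query}"


def canonical_link_tag(nid: int, page: int) -> str:
    """Render the <link> element that goes into the page header."""
    return f'<link rel="canonical" href="{canonical_url(nid, page)}" />'


# Example: node 42 has five comments, shown two per page.
ids_in_display_order = [11, 12, 13, 14, 15]
page = comment_page(ids_in_display_order, 14, per_page=2)
if page is None:
    print("404 Not Found")
else:
    print(canonical_link_tag(42, page))
```

The point of the sketch is only that comment/CID never needs its own content: it resolves to an existing node page, and the header says which one.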
My question is around the canonical link entry.
Is the canonical link enough to avoid duplicate content with this page? In the issue where it was added Dries suggests that additions to robots.txt wouldn't help. Anyone have thoughts on that?
Duplicate content seems to be a really murky area of SEO where some people say it doesn't matter, or that it matters but not much, or that certain ways to "fix" it don't matter. So, I'm interested in hearing all points of view.
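For anyone who wants to see what the robots.txt alternative would amount to, here is a small sketch using Python's standard `urllib.robotparser`. The `Disallow: /comment/` rule and the example.com URLs are assumptions for illustration (clean URLs, site at the domain root); this only shows which URLs a compliant crawler would skip, not how any search engine treats links pointing at them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt addition that blocks the new permalink paths.
robots_txt = """\
User-agent: *
Disallow: /comment/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The permalink path is blocked; the node page itself stays crawlable.
permalink_allowed = parser.can_fetch("*", "http://example.com/comment/123")
node_allowed = parser.can_fetch("*", "http://example.com/node/42")

print("comment/123 crawlable:", permalink_allowed)
print("node/42 crawlable:", node_allowed)
```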


Touchy Subject
I'm fighting this issue currently on my vBulletin forum where I have the canonical URL defined but I still have multiple versions of the page being displayed which have been disallowed in robots.txt. Currently I have about 19,000 pages that are being prohibited by robots.txt on a community with 10k threads and 190k posts. I haven't taken a major hit yet but I can't see that it's helping.
I also kind of question the usage of canonical reference since technically the comment page isn't going to risk being considered an exact duplicate...
source - my link to Google's blog triggered the spam filter so you'll just have to take my word for that quote... :-)
Obviously it's going to be an exact duplicate of "some" of the content on the page but I don't think this usage is exactly what they had in mind since it's not a different version of a particular page. I'm interested to see what others have to say about this though.
As far as indexing goes, though, I haven't noticed any search engine indexing the pages I don't want indexed, given my use of canonical links and robots.txt exclusion.
Basketball Blogs
What happens when the original node is removed? comment page
Hi Greggles,
I have two questions.
What happens when a canonical link of a comment is used after the original node has been deleted along with all its comments?
I use comment page, and at present it produces a PHP error in the Drupal 5.x version when a deleted comment page is requested.
Comment page is here:
http://drupal.org/project/comment_page
Comment page also uses
comment/UID
Is it possible for you to allow updating from the Drupal 5.x versions of comment page to your Drupal 6.x version, as many will most likely update to Drupal 6.x once Drupal 7.x is released?
Even a turtle reaches its goal...
it gives a 404
Right now both 7.x and 6.x give a 404 error if the comment ID is "invalid" in some way (deleted, or nonsense data).
I think comment_page is not exactly the same feature as what I added to permalink (oh, by the way, the code is now added).
--
http://growingventuresolutions.com | http://drupaldashboard.com | http://drupal.org/books
SEO warning
I think this is bad for SEO. Making new URLs for comments will probably negatively affect Drupal's SEO-friendliness in a significant way.
There IS such a thing as duplicate content and it can be very bad when done on a large scale.
Don't rely on a canonical link tag. It's a Google thing, and Google is not the only search engine. It might not even be reliable with Google. Also, it's not good to have a lot of links on a page. Google will distribute PR to those new comment URLs even if they are blocked with robots.txt.
The hash ("#") can't be fixed? It's definitely not a good idea from an SEO perspective to make new URLs for comments.
--
» Twitter » Blog » Website
Agree but also Disagree
While I agree with J. Cohen that creating a different path for each comment is a bad idea, I cannot agree with the comment on Canonical links.
The Canonical link tag is not just a Google tag; it has been adopted by all 3 major search engines. I have used it extensively, and it does what it was intended to do for pages whose content ranges from similar to identical.
Bing adopts Canonical Link tag:
http://www.bing.com/community/blogs/webmaster/archive/2009/02/12/partner...
Yahoo adopts Canonical Link tag:
http://www.ysearchblog.com/2009/02/12/fighting-duplication-adding-more-a...
By nature, search engines want to index more useful content. URLs with parameters and hashtags used to cause issues with indexing this deep content. Using the robots.txt (which is also recognized by all 3 major SEs) instead of a canonical link is going against what they are trying to accomplish with adopting this tag across all 3 search engines.
Jason
Rapid Waters Development
http://www.rapidwatersdev.com
Canonical Links
I'm not saying to ignore the use of Canonical links -- I am recommending, "don't rely on it" for SEO. One of the core rules of SEO is to keep things as simple as possible for search engines. Search engines frequently make mistakes and have a lot of bugs. IMHO, creating a lot of similar URLs and trying to rely on Canonical links isn't a good idea for SEO.
(BTW, search engines typically don't index hashes in the URL. MSN used to do it, but it looks like it was fixed in Bing.)