
Tuesday, August 20, 2013

How to Solve Duplicate Content Issues by Specifying a Canonical Web Address

Duplicate content on your own website, where different web addresses ("URLs") on your site display identical content, can hurt your website's performance in search engines. This article discusses one way to resolve this for the Google search engine.

The Problem with Duplicate Content

I have discussed the problems that arise when different URLs show the same content, or what webmasters often call "duplicate content", in earlier articles.
In case it isn't clear how this can come about, take the example of a site selling a product called "Widget A". The site links to a product page showing details of Widget A using the URL "http://www.example.com/widget-A/". Like all sites with high usability, it also provides other ways for visitors to reach that product page. For example, if a visitor uses the site's "Help" function to look for a product with certain features, the site may show information about Widget A, since it meets the visitor's criteria. That information page may use a different address, like "http://www.example.com/help.php?features=fix+kitchen+sink". Both addresses show exactly the same information, since they describe the same product.
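If you want to spot this kind of duplication on your own site, one rough approach is to fetch each URL and compare what comes back. The Python sketch below assumes you already have the pages' HTML in a dictionary (the page bodies and the Widget B URL are made up for illustration) and groups URLs whose content is byte-for-byte identical. It only catches exact copies; pages that differ by a date or a breadcrumb trail would slip through, and this is not how any search engine actually detects duplicates.

```python
import hashlib
from collections import defaultdict

# Hypothetical crawl results: URL -> HTML served at that URL.
# The two Widget A addresses serve identical content.
pages = {
    "http://www.example.com/widget-A/": "<h1>Widget A</h1><p>Fixes kitchen sinks.</p>",
    "http://www.example.com/help.php?features=fix+kitchen+sink": "<h1>Widget A</h1><p>Fixes kitchen sinks.</p>",
    "http://www.example.com/widget-B/": "<h1>Widget B</h1><p>Polishes taps.</p>",
}


def find_duplicates(pages):
    """Group URLs whose content has the same SHA-256 digest."""
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only groups containing more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


for group in find_duplicates(pages):
    print("Duplicate content at:", group)
```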
Even if you don't use scripts on your website, you can still end up with duplicate content. For example, when a visitor accesses a website (or one of its directories) without specifying a filename, the web server usually displays that directory's "index.html" page. That is, "http://www.example.com/index.html" and "http://www.example.com/" are usually the same page, showing the same content. (For more information about this behaviour and its ramifications, see "Should Your URLs Point to the Directory or the Index Page?".)
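To see why these two addresses count as one page, here is a small Python sketch that rewrites an "index.html" URL to its directory form. It assumes the server's default page is "index.html"; servers can be configured to use other names (such as "index.php"), so the hypothetical `DEFAULT_PAGES` tuple would need adjusting to match your own setup.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the web server's default document is "index.html".
DEFAULT_PAGES = ("index.html",)


def normalize(url):
    """Rewrite ".../index.html" to the equivalent directory URL ".../"."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    for page in DEFAULT_PAGES:
        if path.endswith("/" + page):
            path = path[: -len(page)]  # keep the trailing slash
            break
    if not path:
        path = "/"  # "http://www.example.com" means the root directory
    return urlunsplit((scheme, netloc, path, query, fragment))


print(normalize("http://www.example.com/index.html"))  # http://www.example.com/
print(normalize("http://www.example.com/"))            # http://www.example.com/
```

Both calls produce the same string, which is exactly the situation that leaves a search engine seeing one page under two addresses.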
When a page can be accessed through multiple web addresses, you run the risk of link dilution. I've discussed this in How to Create a Search Engine Friendly Website, so if you're not familiar with the term, please read that article for details. In short, link dilution causes the affected page on your site to rank lower in search results than it otherwise would.

The New Canonical URL Link Tag

To help webmasters solve this problem, Google has announced that it will recognize a new HTML / XHTML tag which, when inserted into your web page, lets you state which URL you want to be the "official", or "canonical", address for that content.
This tag needs to be inserted into the HEAD section of your web page. It has the following format:
<link rel="canonical" href="http://www.example.com/correct-page.html" />
Replace "http://www.example.com/correct-page.html" with your actual web address. Remember: the code has to go into the HEAD section of your web page, where all the meta data are, and not into the BODY section, where your content lives. If you use a WYSIWYG web editor (where WYSIWYG means "What You See Is What You Get"), switch to its "Source" mode to locate the right section.
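Since it is easy to paste the tag into the wrong section, you may want to check a saved copy of your page. The following Python sketch uses only the standard library's `html.parser` to find every `rel="canonical"` link and report whether it sits inside HEAD. The sample page and its URL are hypothetical, and this is just a convenience check, not a description of how Google reads pages.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect rel="canonical" hrefs, noting whether each is inside HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (href, was_in_head) pairs

    def handle_starttag(self, tag, attrs):
        # Self-closing "<link ... />" tags also reach this method, because
        # HTMLParser's default handle_startendtag calls handle_starttag.
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False  # BODY started, so HEAD is over
        elif tag == "link":
            attrs = dict(attrs)
            rels = (attrs.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.found.append((attrs.get("href"), self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


page = """<!DOCTYPE html>
<html>
<head>
<title>Widget A</title>
<link rel="canonical" href="http://www.example.com/widget-A/" />
</head>
<body>
<h1>Widget A</h1>
</body>
</html>"""

finder = CanonicalFinder()
finder.feed(page)
for href, in_head in finder.found:
    print(href, "(in HEAD)" if in_head else "(WARNING: outside HEAD)")
```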

What the Canonical URL Link Tag Solves

The canonical URL link tag will cause Google to take the web address you put into the tag as the "official" or "correct" version of your web address. If you have two URLs that resolve to the same content, Google will use the one declared as canonical as the actual URL. This means the following:
  • In search engine results, it will display the canonical URL instead of all the variants it finds on your website.
  • You will avoid the link dilution problem mentioned earlier. Links from other sites that point to your content using any of its myriad URLs will be regarded as pointing to your canonical URL. That is to say, the PageRank earned through all those different URLs will flow to the page it is supposed to be credited to.
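A toy Python model makes the effect of this consolidation easier to picture. It simply counts inbound links, first per URL and then after mapping each variant to the canonical URL its tag declares. The links and the `canonical_of` mapping are invented for illustration, and a plain link count is of course not how Google actually computes PageRank.

```python
from collections import Counter

# Hypothetical inbound links from other sites, each pointing at one
# of the URL variants of the Widget A page.
inbound_links = [
    "http://www.example.com/widget-A/",
    "http://www.example.com/widget-A/",
    "http://www.example.com/help.php?features=fix+kitchen+sink",
    "http://www.example.com/widget-A/index.html",
]

# Assumed: the canonical URL each variant declares in its link tag.
canonical_of = {
    "http://www.example.com/help.php?features=fix+kitchen+sink": "http://www.example.com/widget-A/",
    "http://www.example.com/widget-A/index.html": "http://www.example.com/widget-A/",
}

# Without a canonical tag, the links are split across three URLs.
per_url = Counter(inbound_links)

# With it, every link is credited to the canonical URL.
per_canonical = Counter(canonical_of.get(url, url) for url in inbound_links)

print(per_url)
print(per_canonical)  # all 4 links credited to http://www.example.com/widget-A/
```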

Limitations of the Canonical URL Tag

There are some limitations to what the new link tag can do.
  • The information about the canonical URL does not work across different domain names. It does, however, work across subdomains.
    For example, if you have a URL like "a.com/something" that is identical to "b.com/something-else", Google will not honour a canonical URL link tag on b.com that points to a.com.
    However, if you have URLs on multiple subdomains of your domain that show the same content, like "www.example.com/xyz.html", "my.example.com/whatever.html" and "example.com/index.html" all showing the same page, putting a canonical link tag on those pages will cause Google to accept the URL in your link tag as the real URL.
    Update: Google now accepts cross-domain canonical tags. That is, this limitation no longer exists.
  • The tag is currently only recognized by Google. As such, you should still continue to find ways of reducing multiple URLs that lead to the same content on your website.
    Update: some of the other search engines have said that they will support the canonical tag as well, although they may not necessarily give it the same weight as Google does. Nor will they necessarily support its use across different domains (see the point above).
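Because search engines may still treat subdomains and separate domains differently, it can help to know which case your tag falls into. The Python sketch below is deliberately naive: its hypothetical `naive_site` function treats the last two labels of the hostname as "the domain", which is wrong for suffixes like ".co.uk"; a proper check would need the Public Suffix List. The URLs are the examples from the list above, written out with "http://".

```python
from urllib.parse import urlsplit


def naive_site(url):
    """Return the last two hostname labels, e.g. "example.com".

    Assumption: a crude stand-in for the registrable domain. It gives the
    wrong answer for suffixes such as ".co.uk".
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def is_cross_domain(page_url, canonical_url):
    """True if the canonical URL is on a different (naively computed) domain."""
    return naive_site(page_url) != naive_site(canonical_url)


# Subdomains of the same domain: not cross-domain.
print(is_cross_domain("http://my.example.com/whatever.html",
                      "http://www.example.com/xyz.html"))  # False
# Two different domains: cross-domain.
print(is_cross_domain("http://b.com/something-else",
                      "http://a.com/something"))  # True
```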

Solving the Problem of Duplicate Content

It's probably too early to say whether this will become the definitive method that helps webmasters solve the pesky duplicate content problem that plagues many sites. (The tag was only officially announced by Google on 12 February 2009.)
I personally think it is an ingenious solution, and it puts the power of resolving the issue into the hands of webmasters themselves, rather than leaving the search engine, which usually does not have enough information, to figure out the correct URL. Hopefully, the other search engines will also recognize this tag, making duplicate content a thing of the past.
