Using Canonical URLs

A canonical URL is the single authoritative URL for a page on your web site, and it is a term Google uses often.

Canonicalization is the process of picking the best url when there are several choices, and it usually refers to home pages. For example, most people would consider these the same urls:

  • www.example.com
  • example.com/
  • www.example.com/index.html
  • example.com/home.asp

But technically all of these urls are different. A web server could return completely different content for all the urls above. When Google "canonicalizes" a url, we try to pick the url that seems like the best representative from that set.

Matt Cutts
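
To make the quoted idea concrete, here is a minimal Python sketch of canonicalization: it maps the four example URLs onto one representative. It is only an illustration, not how Google does it. The names PREFERRED_HOST, INDEX_NAMES and canonicalize are hypothetical, and the choice of the www host as the preferred one is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical settings for illustration; adjust them to your own site.
PREFERRED_HOST = "www.example.com"
INDEX_NAMES = {"index.html", "index.php", "home.asp"}


def canonicalize(url: str) -> str:
    """Map common home-page variants onto one representative URL."""
    # Bare hostnames such as "example.com/" have no scheme, so add one
    # before parsing; otherwise urlsplit treats the host as a path.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www subdomain as the same site.
    if host in ("example.com", "www.example.com"):
        host = PREFERRED_HOST
    path = parts.path or "/"
    # Collapse a default index document onto its directory URL.
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_NAMES:
        path = head + "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = [
    "www.example.com",
    "example.com/",
    "www.example.com/index.html",
    "example.com/home.asp",
]
# All four variants collapse to a single canonical URL.
canonical = {canonicalize(u) for u in variants}
```

With these assumed settings, every variant in the list ends up as the same URL, which is the goal the redirects later in this post try to achieve on the server side.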

Not only can URLs such as the examples Matt Cutts used return different content, they can also return the same content, which search engines may treat as duplicate content and possibly penalise. Because of the way most shared hosting servers are set up, the content served from your bare domain will be exactly the same as the content served from the www subdomain of your site. This can dilute the value of your content and split page ranking across what are effectively two different sites.

Technically, www is a subdomain of the no-www site. It is just the same as, for example, http://forum.example.com or any other sub.domain.com, although it exists for a very different reason (related to the early days of the Internet).

So, why don't we get our hosts to just set us up as one or the other? We can, but that also leads to problems. If you use the no-www version for your site, you will still find that some other sites link to you using www. The reverse also happens. If you are already listed in links or search engines under both www and no-www, you need to make changes that standardise your search engine listings without harming your page rank or making pages inaccessible through either URL. This can be done in several ways.

If you want to redirect all traffic to www, add this to your .htaccess file:

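Options +FollowSymLinks
RewriteEngine on
# Match only the bare domain (case-insensitive), not hosts such as example.com.au
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# Permanently redirect to the same path on the www host
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]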

To redirect traffic from www to no-www, use this:

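Options +FollowSymLinks
RewriteEngine on
# Match any host other than the bare domain (www included); dots are escaped to match literally
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
# Permanently redirect to the same path on the bare domain
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]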
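
The host patterns in the RewriteCond lines are ordinary regular expressions, so it can help to try them against a few hostnames before touching the server. The sketch below does that in Python. The helper name host_matches and the sample hostnames are made up for illustration, and Python's re module is close to, but not identical with, the PCRE engine Apache uses, so treat the results as a rough guide.

```python
import re


def host_matches(pattern: str, host: str, nocase: bool = True) -> bool:
    """Approximate a RewriteCond %{HTTP_HOST} test; nocase mimics [NC]."""
    flags = re.IGNORECASE if nocase else 0
    return re.search(pattern, host, flags) is not None


# Sample Host header values (hypothetical).
hosts = ["example.com", "EXAMPLE.com", "www.example.com",
         "example.com.au", "examplexcom"]

# Fully anchored with an escaped dot: only the bare domain matches.
anchored = [h for h in hosts if host_matches(r"^example\.com$", h)]
# Without the trailing $, longer hostnames starting the same way also match.
unanchored = [h for h in hosts if host_matches(r"^example\.com", h)]
# With an unescaped dot, "." matches any character, not just a literal dot.
unescaped = [h for h in hosts if host_matches(r"^example.com$", h)]
```

Comparing the three lists shows why it is worth anchoring the pattern with $ and escaping the dots.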

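If you have access to your DNS settings, you can use a CNAME record to point the www name at the non-www name so that both hostnames resolve to the same server. Be aware that a CNAME record is only a DNS alias and does not redirect visitors or search engines: both hostnames will still serve the same content, so you still need an HTTP redirect, such as the htaccess directives above, to settle on one of them.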

What to do if you are using WWW for different content?

This is not as uncommon as some may think, and there are many reasons why a site may offer different content on its www subdomain. To ensure that search engines and visitors realise that your www site is different to the no-www site, I always recommend moving the www content to a different subdomain with a meaningful name. You might be amazed at how many more visitors the renamed subdomain site gets!

Canonical Issues with Index Pages

Every web site has an index or home page, and Apache automatically serves it when a visitor goes to your domain name in their browser. This causes duplicate content because the same page is reachable at http://example.com/ (browsers treat http://example.com as the same request) and at http://example.com/index (with or without a file extension such as .htm, .html, .php, .asp, etc). See the problem? Several URLs = one page of content. Never fear, htaccess comes to the rescue again.

To make sure your home/index page loads as http://example.com/ (or http://www.example.com/) without index.php, index.html or whatever your default index page is called, use the following htaccess directive:

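Options +FollowSymLinks
RewriteEngine on
# Only act when the browser itself asked for /index.php (checks the original request line)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
# Permanently redirect the index page to the root URL
RewriteRule ^index\.php$ http://example.com/ [R=301,L]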

If your default index page has a different name or extension, replace "index" and "php" in both lines with the name and file extension your page uses (for example "home" and "asp").
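
The RewriteCond tests THE_REQUEST, which is the request line exactly as the browser sent it, so the rule only fires when a visitor explicitly asks for the index file and not when Apache serves it internally for the root URL. The Python sketch below tries the same pattern against a few sample request lines. The function name and the sample lines are made up for illustration, and Python's re module is only an approximation of Apache's regex engine.

```python
import re

# Same pattern as the RewriteCond above; "\ " is an escaped space.
THE_REQUEST_PATTERN = re.compile(r"^[A-Z]{3,9}\ /index\.php\ HTTP/")


def explicitly_requests_index(request_line: str) -> bool:
    """True when the raw request line asks for /index.php directly."""
    return THE_REQUEST_PATTERN.search(request_line) is not None


# The visitor typed /index.php: the condition matches, so a redirect follows.
direct = explicitly_requests_index("GET /index.php HTTP/1.1")
# The visitor asked for the root URL: no match, so there is no redirect loop.
root = explicitly_requests_index("GET / HTTP/1.1")
# A query string sits between "php" and the space, so this pattern skips it.
with_query = explicitly_requests_index("GET /index.php?id=3 HTTP/1.1")
```

The last case suggests that, as written, the pattern leaves requests with a query string alone; whether that is what you want depends on your site.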

Canonical URLs in Mambo or Joomla

Mambo, Joomla and many other dynamic web applications cannot use the htaccess directive given above because the software itself needs to use index.php. In this case, we test the Apache "IS_SUBREQ" server variable to tell Apache to skip the rewriting rule if the current request is an internal sub-request.

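Options +FollowSymLinks
RewriteEngine on
# Redirect Mambo index.php to fully-qualified domain only
# Skip internal sub-requests so the software can still load index.php itself
RewriteCond %{IS_SUBREQ} false
# No leading slash: in .htaccess the pattern is matched against the path without it
RewriteRule ^index\.php$ http://example.com/ [R=301,L]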

That's it! Apache rewrites won't work on some sites (for example, where mod_rewrite is not enabled or the host does not allow htaccess overrides), and there is no guarantee that what I have posted here will work for you on your server configuration. They do, however, work for me, so I thought I would share. I hope you find them useful, either as is or as a starting point for your own htaccess directives.
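
Since results vary between servers, it is worth checking what your server actually returns after adding the directives. Here is a rough Python sketch that sends a single HEAD request without following redirects and checks for a 301 pointing at the expected URL. The function names fetch_head, is_canonical_redirect and report are hypothetical, and the sketch assumes plain HTTP on port 80, as in the examples in this post.

```python
import http.client


def fetch_head(host: str, path: str = "/"):
    """Send one HEAD request and return (status, Location header)."""
    # http.client does not follow redirects, so the 301 itself is visible.
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_canonical_redirect(status, location, expected):
    """True when the response is a 301 pointing at the expected URL."""
    return status == 301 and location == expected


def report(host: str, path: str, expected: str) -> str:
    """Fetch one URL and summarise whether it redirects as expected."""
    status, location = fetch_head(host, path)
    verdict = "ok" if is_canonical_redirect(status, location, expected) else "check"
    return f"{host}{path}: {status} -> {location} ({verdict})"
```

For example, calling report("example.com", "/index.php", "http://example.com/") against your own domain should say "ok" if the index page rule is working, and "check" if the server returned something else.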

Topic: Search Engine Optimisation
Tagged as: Apache, canonical url, canonicalization, DNS, Google, Joomla, Mambo, page rank, PHP, rewriterule, search engine listings, search engines, web server
