Search Engines Focus on Duplicate Content
Duplicate content has always been frowned upon in search engine optimisation. The search engines do not want to show the same content more than once in their results. Instead, they look for the canonical, or authoritative, version of a page and filter out the other pages that duplicate it.
The trouble has been for marketers trying to work around the duplicate content issue. Those with computer science backgrounds do not have the same problems as the rest of us; they typically understand how to set up their links so that search engine bots do not flag their pages as duplicates. Most often this has meant editing the robots.txt file and configuring server redirects. Since it is so tricky to get around the duplicate content problem on an SEO-heavy site, it behooves you to know what changes the search engines are making.
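As an illustration of the older robots.txt approach, here is a minimal Python sketch that uses the standard library's urllib.robotparser to confirm that a duplicate URL is blocked from crawlers while the real page stays crawlable. The robots.txt rules, the /print/ path, the thesite.com URLs, and the is_crawlable helper are all hypothetical examples, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers away from printer-friendly
# duplicates stored under /print/ (an assumed site layout).
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(is_crawlable("http://www.thesite.com/real-page/"))        # True
    print(is_crawlable("http://www.thesite.com/print/real-page/"))  # False
```

The drawback the paragraph hints at is visible here: blocking a duplicate this way requires knowing every duplicate path in advance and keeping the rules in sync with the site.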
The major search engines, Yahoo, Google, and MSN, have just announced a new way to designate a canonical link. Instead of needing to be a programmer, you just need to know where to put the tag:
<link rel="canonical" href="http://www.thesite.com/real-page/" />
This tag is a new way to solve duplicate content problems. It saves search engine crawlers time deciding which page should be indexed, and it saves you, the website manager, from having the wrong page indexed by the search engine.
By placing the tag in the <head> section of each duplicate version of a page, where the title and meta description tags go, you tell the search engine which page should be indexed. The href value is the address of the page you do want indexed; in other words, the real page. Always give that address as an absolute URL, such as href="http://www.domain.com/page.html", rather than a relative one such as href="page.html".
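To make the absolute-URL advice concrete, here is a minimal Python sketch, assuming you generate your pages from a script. The helper names canonical_link_tag and add_to_head are hypothetical, and the session-ID URL is made up for illustration. It uses urllib.parse.urljoin to resolve a relative href such as page.html into a full address before writing the tag into the <head> section.

```python
from html import escape
from urllib.parse import urljoin


def canonical_link_tag(page_url: str, preferred_href: str) -> str:
    """Build a rel="canonical" link tag whose href is always absolute.

    page_url: URL of the (duplicate) page the tag will be placed on.
    preferred_href: address of the real page, absolute or relative.
    """
    # urljoin resolves a relative href like "page.html" against the current
    # page's URL; an href that is already absolute comes back unchanged.
    absolute = urljoin(page_url, preferred_href)
    return f'<link rel="canonical" href="{escape(absolute, quote=True)}" />'


def add_to_head(html_doc: str, tag: str) -> str:
    """Insert the tag just before the first </head>, alongside the title and meta tags.

    Assumes the closing tag is written in lowercase as </head>.
    """
    return html_doc.replace("</head>", f"  {tag}\n</head>", 1)


if __name__ == "__main__":
    # A duplicate copy of the page, reached through a session-ID URL.
    duplicate_url = "http://www.domain.com/page.html?sessionid=123"
    tag = canonical_link_tag(duplicate_url, "page.html")
    print(tag)  # <link rel="canonical" href="http://www.domain.com/page.html" />
    print(add_to_head("<html><head><title>Page</title></head><body></body></html>", tag))
```

Resolving the href in code, rather than typing it by hand, means a relative link can never slip into the tag, which is exactly the mistake the paragraph warns against.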
There is one small downside to this change. Telling the search engine which page is the “real page” to be indexed does not oblige it to follow your advice; the canonical tag is a suggestion, and an engine may choose to ignore it. That said, the top search engines have designed their software to accept these suggestions in order to eliminate duplicate content issues, so while the tag could be ignored, the search engine will most likely take the suggestion you have made.
There are exceptions in which the canonical tag may be ignored: when the canonical page returns a 404 (page not found) error, when the canonical page has not been indexed yet, or when more than one page with the same content is specified as the canonical. For example, if someone copies content from your site and adds a canonical tag pointing to their own copy, both pages now claim to be the original, and you have to hope the search engine picks yours instead of theirs. The only other issue is when the canonical page is on a completely different domain. Google, for example, has said it will take canonical suggestions into account within a domain and across its subdomains, but not across different domains.
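One way to review your own pages against these exceptions is a small audit script. The sketch below uses only Python's standard library; the audit_canonical function and CanonicalFinder class are hypothetical names, and this is one possible approach rather than how any search engine checks the tag. It can only check what is visible in the HTML: whether a canonical tag exists, whether its href is absolute, and whether it points to a different host name. It cannot tell whether the target returns a 404 or has been indexed, so those still need checking by hand. Comparing host names is stricter than comparing domains, so a canonical on a subdomain will also be flagged for a manual look.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collect every rel="canonical" href found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and attributes.get("href"):
            self.hrefs.append(attributes["href"])
    # Self-closing tags like <link ... /> go through handle_startendtag,
    # which calls handle_starttag by default, so they are covered too.


def audit_canonical(page_url: str, html_doc: str) -> list:
    """Return a list of human-readable problems with the page's canonical tag."""
    finder = CanonicalFinder()
    finder.feed(html_doc)
    problems = []
    if not finder.hrefs:
        problems.append("no canonical tag found")
    page_host = urlsplit(page_url).netloc.lower()
    for href in finder.hrefs:
        parts = urlsplit(href)
        if not parts.scheme or not parts.netloc:
            problems.append(f"relative canonical URL: {href}")
        elif parts.netloc.lower() != page_host:
            problems.append(f"canonical points to another host: {href}")
    return problems


if __name__ == "__main__":
    page = "http://www.thesite.com/real-page/?ref=newsletter"
    good = ('<html><head><title>Real page</title>'
            '<link rel="canonical" href="http://www.thesite.com/real-page/" /></head></html>')
    print(audit_canonical(page, good))  # []
```

An empty list means the tag passed the checks the script can make; it does not guarantee the search engine will honor it.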