Author: James Clark
It’s said that 25%–30% of content on the web is duplicative, meaning that the content is very similar to another piece of existing content. For search engines, like Google, serving all that duplicate content isn’t particularly useful for users.
That’s why search engines will choose one of the two (or more) versions to show in search results. Canonicalization can help you tell search engines which version is the original one, which can help your most important pages rank better and improve crawl budget.
In this post, I’ll walk you through:
What is canonicalization?
When managing a website, canonicalization is the process by which you declare a web page or URL to be the original (or canonical) version of your content.
It allows you to tell search engines which version of the content is the most authoritative, and that anything canonicalized to it is simply another version of that content. This makes canonicalization an important part of both your site management and your content strategy.
Without canonicalization, you’d have no control over which URL a search engine chooses to show in relevant search results. Once you add a canonical tag to a URL, you are, in effect, telling search engines that this is the original version of the content and the one that should appear in search results. Search engines can choose to ignore this canonical tag, but generally, canonicalization is considered to be an effective way of managing your duplicate content.
Let’s explain this with a working example: You're browsing an eCommerce website looking to buy some shoes. To locate shoes in your desired price range, you use the sorting options to show the most expensive shoes first. As you do this, you notice that the URL of the page changes. It was https://example.com/shoes, but it's now https://example.com/shoes?price=high.
Is it the same page as before, or a different page? You could argue it's the same page—the block of text about shoes at the top remains the same, the header and the footer are the same, the filtering options on the side of the page are the same. The page may even show the same shoes, just in a different order.
But, to Google and other search engines, it's a different page because the URL is different. So, this website now has two different pages with the same content—or, as it's called in SEO, ‘duplicate content.’
This poses a problem for the site owner. Google won't want to show both pages in its search results because it’s not very valuable for searchers, so it will choose just one. But, what if the site owner wants the “price=high” page and Google chooses the other page, or vice versa?
Enter canonicalization, a grand word for quite a straightforward concept. Where you have duplicate content, it's a way of telling search engines which page is your main or “canonical” version. Google also uses the phrase “most representative.” There are lots of reasons why a site might have duplicate content, and we’ll look at the most common ones later in this article.
Duplicate content doesn’t necessarily mean identical pages: “Minor changes in sorting or filtering of list pages do not make the page unique,” Google said, just like our shoe search results example. Here are some other phrases that Google uses to describe duplicate content:
“appreciably similar”
“largely identical”
“similar content”
You may well ask, “How similar is ‘appreciably similar’?” Well, that’s up for debate in SEO circles, so use your best judgment. However, if you apply a canonical tag to a URL (more on this below) that search engines deem to be dissimilar, they may ignore the tag.
What is a canonical tag? How to canonicalize URLs
The most common way to designate a canonical is to add a link element, commonly called the ‘canonical tag,’ to the <head> section of the page. The canonical tag looks like this:
<link rel="canonical" href="https://mysite.com/page">
A canonical tag can point to any URL, either on the same website or on a different website. If it points to a different website, it’s called a ‘cross-domain canonical tag.’ But, in most cases it will point to the current URL, indicating that the current URL is—you guessed it—canonical. This is known as a self-referential canonical tag because the page is referring to itself.
Wherever the tag points, Google says the URL should include the domain name. In other words, it should be something like https://mysite.com/page rather than just /page.
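To make these distinctions concrete, here is a minimal Python sketch that reads the canonical tag from a page's HTML and reports whether it is self-referential, cross-domain, or missing its domain name. The function name, the labels it returns, and the sample URLs are illustrative assumptions, not part of any search engine's tooling.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def classify_canonical(page_url, html):
    """Describe where a page's canonical tag points (labels are hypothetical)."""
    finder = CanonicalFinder()
    finder.feed(html)
    href = finder.canonical
    if not href:
        return "missing"
    target, page = urlsplit(href), urlsplit(page_url)
    if not target.scheme or not target.netloc:
        return "relative"  # no domain name, e.g. "/page"
    if target.netloc.lower() != page.netloc.lower():
        return "cross-domain"
    if href == page_url:
        return "self-referential"
    return "same-site, different URL"


html = '<head><link rel="canonical" href="https://mysite.com/page"></head>'
print(classify_canonical("https://mysite.com/page", html))
print(classify_canonical("https://mysite.com/page?price=high", html))
```

A "relative" result is the case to fix, since the tag should carry the full URL including the domain name.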
While Google has strong opinions on canonical tags, it still sometimes ignores them. This might happen if the canonical tag points to a page with significantly different content, or if the page loads so slowly that Google has trouble indexing it, for example.
While canonicalization is straightforward from a technical point of view, it isn't always clear why or when you should do it. Let’s look at both of these considerations.
Why do you need canonicalization?
Now, we know that canonical URLs are important to search engines like Google. But, search engines don't just use them to decide which pages to index and show in their search results—they also use them to decide how often to crawl (visit) a page.
This means canonicalization can help you optimize your crawl budget (the number of pages a search engine bot will crawl and index on a given site, within a given time period). If you have a site with thousands of pages (such as an eCommerce site), it might take Google a long time to crawl all of them. You certainly don’t want Google to waste your crawl budget on lots of pages that have the same content and potentially leave out other important pages.
By using the canonical tag, you are telling search engines which pages are duplicates so they will crawl those pages less often. This means canonicalization frees up Googlebot to crawl your other pages, finding and indexing new content more quickly.
That’s not all: canonicalization can actually help your pages rank higher in Google Search. Although the exact algorithm that Google uses is a secret, we know it's influenced by lots of different factors. These factors include (but aren’t limited to) the content of your page, whether your page is user-friendly, and how quickly it loads on mobile. Links are particularly important: so-called “inbound” links (also known as backlinks) from reputable sites tell Google your page is high quality as well.
But, if you have duplicate content, the different versions of a page may have different inbound links. For example, the first version could have five links from various sites, and another version might have only two. Sometimes this happens if you run a marketing campaign that uses a special URL to help with tracking. For example, you might run an email newsletter campaign with a URL that adds tracking information after a question mark.
The parts of the URL after the question mark are called “URL parameters” and, in this case, are just there to help with campaign tracking and reporting. But, other sites might link to this special campaign URL rather than just to https://www.mysite.com. So, the benefit gets diluted across the different versions of the page.
Canonicalization helps you address this by consolidating the benefit of those links. With proper canonicalization, the version you want to appear in search results benefits from all the links to all versions of the page. This can potentially give that page a boost in search engine rankings.
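The arithmetic behind this consolidation can be sketched in a few lines of Python. This is only an illustration of the idea: the campaign URL and its parameter name are hypothetical, and a simple sum is an assumption made for clarity, not a description of how Google actually weighs links.

```python
from collections import Counter


def consolidate_links(inbound_links, canonical_of):
    """Add up inbound link counts under each page's canonical URL."""
    totals = Counter()
    for url, count in inbound_links.items():
        # A URL with no entry in canonical_of is treated as its own canonical.
        totals[canonical_of.get(url, url)] += count
    return dict(totals)


# Link counts taken from the example above: five links and two links.
inbound_links = {
    "https://www.mysite.com": 5,
    "https://www.mysite.com?campaign=newsletter": 2,  # hypothetical tracking URL
}

# What each duplicate's canonical tag points to.
canonical_of = {
    "https://www.mysite.com?campaign=newsletter": "https://www.mysite.com",
}

consolidated = consolidate_links(inbound_links, canonical_of)
print(consolidated)  # {'https://www.mysite.com': 7}
```

Without the canonical mapping, the two versions would keep their separate counts of five and two.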
When do you need canonicalization?
Canonicalization is useful in lots of different situations, not all of which are obvious. Here are some common scenarios.
01. Republishing content across sites
First, think about canonicalization whenever you publish the same piece of content across multiple sites. Although this may seem like something only larger publishers do, it happens surprisingly often with smaller and local businesses, too.
For example, an osteopath writes a useful article about the common causes of back pain and publishes it on the website for their clinic. They then open a new clinic across town and set up a new website specifically for this second location. The article is relevant here as well, so they also publish it on the second site.
In this case, the osteopath would be well advised to canonicalize one version of their choosing rather than rely on Google to make the choice for them. (In an ideal world, of course, each site should have its own unique content.)
02. Syndication
Another similar scenario is syndication. If you run a blog, you may have chosen to syndicate your content on third-party sites. This is often done using RSS feeds and can be an effective way to reach new or larger audiences.
You may want to ask your syndication partners to add a canonical tag to any republished post, specifying your original blog post as the canonical version. You’ll definitely want to include a self-referential canonical tag on the original. Otherwise, you may find that the syndicated version is the one that Google decides to index—and your original blog post doesn’t appear in search results at all.
Canonicalization of syndicated copies helps the original in Google News as well: “Publishers that allow others to republish content can help ensure that their original versions perform better in Google News by asking those republishing to block or make use of canonical.”
But, even with a canonical tag in place, the syndicated copy may outrank your original in search results. This is more likely to happen if “there’s a lot of other content around that page that is completely different,” according to Google’s John Mueller.
03. Parameterized URLs
Canonicalization is also important whenever a website has parameterized URLs. Many websites use parameters for:
Marketing campaigns, like our Christmas email example above
Search filters, as we saw with the shoe results page example
Keyword searches on content such as blogs
Whatever the reason for them, parameters create a new URL. These versions should have a canonical tag pointing to the original (and the original should have a self-referential canonical tag).
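As a rough illustration, the Python sketch below builds the canonical tag for a parameterized URL by dropping everything after the question mark. It assumes that none of the parameters change the main content of the page, which holds for the tracking and sorting examples above but not for every site. The function name is hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_tag_for(url):
    """Return a canonical tag pointing at the URL without its parameters."""
    parts = urlsplit(url)
    # Keep scheme, domain, and path; drop the query string and fragment.
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


print(canonical_tag_for("https://example.com/shoes?price=high"))
# <link rel="canonical" href="https://example.com/shoes">
```

Calling the same function on the original URL returns a tag that points to itself, which matches the self-referential tag recommended for the original page.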
04. URL variants
You may have noticed that some web addresses contain “www” and others don't. Similarly, some end in a slash (/) and others don't. And, some are secure (starting in https) and others aren't (starting in http). In a worst-case scenario, these three factors give us eight different versions of the same URL: every combination of http or https, with or without “www,” and with or without a trailing slash.
In an ideal world, seven of those variants should automatically redirect the user to the eighth. (There are lots of different types of redirects, but the most appropriate one here would be a permanent 301 redirect.) However, if redirects aren’t in place, a canonical tag could mitigate the problems that these different URL variants cause.
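To see where the number eight comes from, the following Python sketch enumerates every combination of the three factors for a hypothetical page and maps the other seven to one preferred version. The domain, the path, and the choice of preferred URL are assumptions made for illustration only.

```python
from itertools import product


def url_variants(host, path):
    """List every scheme / www / trailing-slash combination for one page."""
    variants = []
    for scheme, prefix, slash in product(("http", "https"), ("", "www."), ("", "/")):
        variants.append(f"{scheme}://{prefix}{host}{path}{slash}")
    return variants


variants = url_variants("mysite.com", "/page")  # hypothetical domain and path
preferred = "https://www.mysite.com/page"       # assumed preferred version

# Each of the other seven variants should 301 redirect to the preferred URL.
redirects = {url: preferred for url in variants if url != preferred}

for url in variants:
    print(url)
print(len(variants), "variants,", len(redirects), "redirects")
```

The same mapping could also drive canonical tags when redirects aren't available, with each variant's tag pointing at the preferred URL.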
Canonicalization on Wix
Wix automatically adds a self-referential canonical tag to every page on your site. While parameterized URLs aren’t particularly common on Wix sites, they do appear in a few situations (collections on Wix Stores make use of parameters, for example). Wix automatically adds the correct canonical tag to these pages so you can be confident you won't have any problems with this type of duplicate content.
Additionally, Wix URLs follow the format https://www.mysite.com/. All other variants will automatically 301 redirect to this. This is also the format used in the canonical tag.
In most instances this is all you need. But in some very particular situations (for example, if you’ve already published a similar piece of content on an external site), you may want to change that default canonical tag. Instead of being self-referential, it should point to the canonical version of the content.
How to customize the canonical tag on a Wix site
To customize a page’s canonical tag on Wix, click on Menus & Pages on the left-hand side of the Editor. Next, click the Show More icon next to the relevant page and select SEO Basics.
Then, go to the Advanced SEO tab and click on canonical under Additional Tags. Click on the Show More icon and select Edit to customize the tag and select Apply to save the change.
For vertical pages, such as blog posts, you can customize the canonical tag by navigating to the desired post within the Editor. Next, click on SEO in the left-hand menu and go to the Advanced tab. Similar to the workflow above, the Additional Tags section is where you can customize your canonical tag.
If you do delete a custom canonical tag, the page will automatically revert to a self-referential tag. This removes the risk of accidentally having no tag at all.
Even though Wix offers the ability to customize your canonical tags, in most cases, you’ll be just fine relying on the self-referential canonical tag that Wix adds for you.
Use canonical tags to manage duplicate content
When used properly, canonical tags allow you to present Google and other search engines with one, canonical, version of each page. Make sure to use them with the best practices outlined above so that search engines and, ultimately, users land on the right page.
James Clark is a web analyst from London, with a background in the publishing sector. When he isn't helping businesses with their analytics, he's usually writing how-to guides over on his website Technically Product.