How I add canonicals into Perch CMS sites
The canonical link in a page's head lets the search engines know where the original page resides. Google tends to choose the oldest version of a page that it can find, and any other pages with the same or very similar content are considered duplicates.
Canonicals can trip up your site's SEO
Canonicals were originally conceived for situations where articles were duplicated, so that the copies could reference the original. Google tends to choose the oldest version of a page that it can find (though age is not the only signal it uses), and any other pages with the same or very similar content are considered duplicates. Duplicates will not do well on the search engine results pages (SERPs), and we want our pages to do well there for the traffic.
In most content management systems, developers tend to take the quick option and reference the URL the page is on. To an extent this works very well, but duplicate pages can occur by accident rather than by design. For example, if you are using Perch and you decide to prettify your URLs by removing the .php, you will have set up .htaccess rules to remove it. But did you decide whether your URLs should end in a / or not? Search engines index URLs with and without the / as different pages, so you can suffer from duplication.
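To make the duplication concrete, here is a minimal Python sketch (not Perch code) that collapses the usual variants onto one form. The example.com URLs, the normalise function and the "no trailing slash" policy are all assumptions made for illustration; your own site may settle on the opposite convention.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise(url: str) -> str:
    """Collapse common duplicate-URL variants onto one form.

    Hypothetical policy: no .php, no /index, no trailing slash
    (except on the site root).
    """
    parts = urlsplit(url)
    path = parts.path
    # Drop a trailing .php, mirroring what prettified URLs look like.
    if path.endswith(".php"):
        path = path[: -len(".php")]
    # Treat /index as the folder it sits in.
    if path.endswith("/index"):
        path = path[: -len("index")]
    # Assumed convention: strip the trailing slash everywhere but the root.
    if len(path) > 1:
        path = path.rstrip("/")
    if not path:
        path = "/"
    # Query string and fragment are deliberately discarded in this sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


variants = [
    "https://example.com/about",
    "https://example.com/about/",
    "https://example.com/about.php",
]
# Three URLs that a search engine may index separately, one page of content.
unique = {normalise(u) for u in variants}
```

All three variants end up as the single URL https://example.com/about, which is the one the canonical should name.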
All of these URL variants are essentially the same page of content (a home page, in this case), and the search engines have to work out which one is the original. They are getting much better at this, but that’s not a reason not to help them understand your website.
For subpages, canonicals are more critical, as the search engines are less likely to be tolerant and will often find your site through links to a subpage rather than down through the home page. Having the canonical automatically generated means that any URL that resolves, including ones you do not actually want on the site, will carry an incorrect canonical pointing at itself. If you remove the .php from the URLs, as I tend to do, you may have situations where Perch is outputting links with the .php. The canonical would then include the .php and cause duplicate content issues. Footer menus are an example of where this may happen.
I like to add the canonical manually so that I know I am in control, but this can lead to issues if an editor mistypes the URL. The technique I use therefore grabs the list of pages from within Perch and presents it as a dropdown list for the editor to choose from.
Perch field type — Pagelist
You will need to add the Perch field type into /perch/addons/fieldtypes/. Drop the folder and its PHP file in there and you are good to go.
The Perch 2 field type Page list is available from the Perch CMS site. At the time of writing there is no Perch 3 version, but the archived Perch 2 version seems to work fine.
Perch template code
The following code goes into perch/templates/pages/attributes/seo.html
<link rel="canonical" href="<perch:pages id="domain" /><perch:pages id="canonical" type="pagelist" output="pageurl" replace=".php|,/index|" label="Canonical page" help="Please select the page you wish to have as the canonical URL for this page (normally just choose this page)" required="true" />">
replace=".php|" removes the .php from the URL; the second pair, "/index|", removes /index so that the home page does not point at a URL that doesn't exist.
type="pagelist" provides the list of pages on your site.
On each page in the CMS, a drop-down box appears listing the pages you have on your site. The editor can select from this list, which avoids typing errors, though they could still choose the wrong page, so that’s worth checking!
The output code in the head:
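As a rough sketch of what that looks like, the Python below emulates the replace pairs from the template above. It assumes that each comma-separated pair is a plain "find|replacement" substring swap applied in order, that the domain attribute holds https://example.com, and that Perch stores page paths such as /about.php. None of that is taken from Perch's source, and apply_replace and canonical_tag are hypothetical helper names.

```python
def apply_replace(value: str, rules: str) -> str:
    """Apply comma-separated 'find|replacement' pairs to value, in order.

    Assumption: plain substring replacement, as in the template's
    replace=".php|,/index|" attribute.
    """
    for rule in rules.split(","):
        find, _, replacement = rule.partition("|")
        if find:  # skip empty pairs
            value = value.replace(find, replacement)
    return value


def canonical_tag(domain: str, page_path: str) -> str:
    """Build the canonical link element for one page (illustrative only)."""
    path = apply_replace(page_path, ".php|,/index|")
    return f'<link rel="canonical" href="{domain}{path}">'


# A subpage: only the .php is stripped.
print(canonical_tag("https://example.com", "/about.php"))
# The home page: .php and then /index are stripped, leaving the bare domain.
print(canonical_tag("https://example.com", "/index.php"))
```

Under those assumptions the subpage comes out as `<link rel="canonical" href="https://example.com/about">` and the home page as `<link rel="canonical" href="https://example.com">`.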
And there is more...
Clive Walker asked me how I deal with pagination. Generally, I don’t, as pagination is the work of the devil and advertisers. There are so many sites that make you click through a series of pages to read an article. This is just to sell advertising, not to make it easy for you to read, as usually the whole article could easily go on one page and you would scroll down to read it.
There is, however, a situation where pagination is very useful: lists of article entries, categories, topics and tags. In these situations, it is recommended that there is a view-all page and that the paginated pages are canonicalised to it. With huge lists, though, a view-all page is impractical because it would take far too long to load, and in that case the paginated pages can be self-canonicalised. If you want to know more, head over to Deep Crawl’s information on canonicalisation and pagination.
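The choice between the two approaches can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Perch code: the listing_canonical function, the ?page=N URL pattern and the example.com addresses are all assumptions.

```python
from typing import Optional


def listing_canonical(base_url: str, page: int,
                      view_all_url: Optional[str] = None) -> str:
    """Pick the canonical URL for one page of a paginated listing."""
    # If a practical view-all page exists, every paginated page points to it.
    if view_all_url is not None:
        return view_all_url
    # Otherwise each page is self-canonical; page 1 is the listing itself.
    if page <= 1:
        return base_url
    # Hypothetical pagination pattern: ?page=N
    return f"{base_url}?page={page}"


# Small list with a view-all page.
small = listing_canonical("https://example.com/blog", 3,
                          view_all_url="https://example.com/blog/all")
# Huge list, no view-all page: page 3 canonicalises to itself.
huge = listing_canonical("https://example.com/blog", 3)
```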
18 December 2017 Update for home page
I have also updated the Perch code I used, as there was an issue. The home page was outputting ‘/index’, which canonicalised the home page to a URL that didn’t exist, and that is a bad thing! I have added ‘/index’ to the replace statement to fix it. Apologies to anyone who had used the code prior to today.