Canonical disaster - How do you treat pagination? [Opinions]

Nikos · Aug 1, 2017

Hello Local experts,

Let's switch to a little classic SEO for a bit.

I came across a massive news site with many thousands of paginated category pages. The developers have placed canonicals pointing to the first page in each series.

Of course Google did not respect the canonical (because the pages are not duplicates), and thousands of paginated pages are getting indexed, e.g. www.example.com/news/politics?page=6511

How do you treat pagination?
You:

a. do nothing
b. block URLs in robots.txt
c. place noindex tag
d. place noindex,nofollow
e. use 301s
f. Other (e.g. URL parameters tool)

I did not include rel=next/prev because we want those pages to be eliminated somehow.

Also, if you have an article/case study showing how a website benefited after removing thousands of pages that did not receive organic traffic, I would be grateful.

Cheers.
Nikos

LJ Ferguson · Aug 1, 2017

Hi Nikos,

I'm not sure I quite understand the issue. You say you want those pages to be "eliminated". Can you explain what you mean by this? Is the content on those pages elsewhere on the site or are you consolidating it somewhere?

Best,
Linda

- - - - - - - -
Linda J. Ferguson
Technical SEO Strategist
https://www.linkedin.com/in/linda-ferguson-83b20a28/

Nikos · Aug 1, 2017

Hi Linda,

Thanks for joining.

Well, ideally I think those pages should not appear since they consume crawling budget and also have very little chance to receive impressions/clicks.

How do you treat pagination?
Do you use the rel=prev/next directive? It can consolidate pages in a series, but it won't stop them from being crawled and indexed, right? Please outline your best practices.

Thanks.

James Watt · Aug 1, 2017

Hi Nikos,

There's a lot of evidence I've seen that pruning low quality pages from a larger site can be helpful. Here are two case studies.

For archive pages, pointing the canonical of every page to the first in the series isn't the best idea. Robots.txt isn't a great idea either: you're explicitly telling Googlebot not to go to the page at all. It's a club when you have much better tools. If you're worried about duplicate content and don't want the page in the index, you can tell Google that directly... that's what the noindex tag is for. Use that.
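
To make the noindex suggestion concrete, here is a minimal sketch in Python of how a template could pick the robots meta tag per page. It assumes pagination is driven by a `page` query parameter (as in the example URL above) and that page 1 should stay indexable. The function name `robots_meta_for` is made up for illustration, and the choice of "noindex, follow" rather than plain "noindex" is an assumption, not something settled in this thread.

```python
from urllib.parse import urlparse, parse_qs


def robots_meta_for(url: str) -> str:
    """Return a robots meta tag: noindex for page 2 and beyond of a series."""
    query = parse_qs(urlparse(url).query)
    # Assumption: the series is paginated with a `page` query parameter.
    raw_page = query.get("page", ["1"])[0]
    try:
        page = int(raw_page)
    except ValueError:
        # Treat an unparsable value as the first page.
        page = 1
    if page > 1:
        # Hypothetical policy: keep links followable, drop the page from the index.
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'


print(robots_meta_for("https://www.example.com/news/politics?page=6511"))
```

The page has to remain crawlable for the tag to be seen at all, which is why this does not combine with a robots.txt block on the same URLs.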

LJ Ferguson · Aug 1, 2017

Good case studies, James.

The problem I have with using noindex as a rule of thumb is that those pages still will be crawled. Crawling a large number of "dead weight" pages can create crawl budget issues, especially if they constitute a large proportion of the site. In some cases I would delete the content altogether.

I don't know if this will help but I'll give an example of the effect of deleting content. A site I used to work on had a license agreement with a news publisher. The agreement stipulated that news pages had to be taken down after 90 days. So after 90 days, the page would be taken down and the URL would return a 404. This didn't have any measurable effect on the site and the news was heavily used. News pages weren't a major percentage of the content but enough to say that I am confident that deleting the pages wasn't a major issue.

But let's assume there's a good reason to keep some of the content and have it indexed (e.g., heavy user engagement, client wishes). In that case, I'd remove the canonical pointing to the first page and use rel=next/prev on the paginated series.

-Linda

Linda J. Ferguson
Technical SEO Strategist

James Watt · Aug 1, 2017

Good point Linda, I mostly work on smaller sites so I haven't butted up against crawl budget too many times. You're right, that does change things on bigger sites.

LJ Ferguson · Aug 1, 2017

Size matters ;-) Seriously though: yes, you are right, James. Crawl budget may not matter much if at all on smaller sites.

Nikos · Aug 1, 2017

James,

Thank you for these guides. They definitely inspired me.

I like the concept of dead pages. I also like using noindex until crawlers respider the links and then disallowing through the robots file.

From what I've seen canonicals are the worst solution, since Google honors them only if the content is identical.

Robots.txt can be effective, although it is only a quick fix.

Returning a 404 error is another choice if we are talking about thousands of zero-traffic pages. I don't know whether 301-redirecting them to the first archive page makes sense, or whether Google will treat that as a soft 404 (?)

Rel next/prev in combination with self-referencing canonicals is usually the best option. However, it does not solve possible duplication or crawl budget issues, so it is better suited to small and medium-sized sites.
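
For anyone who wants to see what that combination looks like in the page head, here is a rough Python sketch. The helper name `pagination_head_tags` is hypothetical, and it assumes page 1 lives at the bare category URL while later pages use `?page=N`; adjust to whatever the site's URL scheme really is.

```python
from typing import List


def pagination_head_tags(base_url: str, page: int, last_page: int) -> List[str]:
    """Build a self-referencing canonical plus rel=prev/next links for one page."""
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")

    def page_url(n: int) -> str:
        # Assumption: page 1 is the bare category URL, later pages use ?page=N.
        return base_url if n == 1 else f"{base_url}?page={n}"

    # Each page canonicalizes to itself, not to the first page of the series.
    tags = [f'<link rel="canonical" href="{page_url(page)}">']
    if page > 1:
        tags.append(f'<link rel="prev" href="{page_url(page - 1)}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{page_url(page + 1)}">')
    return tags


for tag in pagination_head_tags("https://www.example.com/news/politics", 2, 3):
    print(tag)
```

The first page gets no rel=prev and the last page gets no rel=next, so the series has clear ends.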

James Watt · Aug 1, 2017

Yep, sounds like you've got it.
