Full Site Character Count Tool? (Preferably Free)

RobertCarnitas · Nov 11, 2021

Does anyone know of a tool that calculates the total number of characters on a website? Screaming Frog has word count per page, which is pretty easy to total up, but it doesn't have a character count (unless I am missing it, which is possible).

I am trying to calculate the potential cost of Google Cloud Translation services across a large number of websites.

Thanks in advance!

Conor Treacy · Nov 11, 2021

Hey Robert, ready to get complicated?

You could use IMPORTXML in Google Sheets: scrape the site's pages, use XPath to pull specific blocks/areas of text, then count characters from there.

One of the biggest issues with word count systems is that they count ALL words on a page. That's the menu, dropdown menu, footer links: if text is on the page, it gets counted. The problem, however, is that the actual "meat" of your page may be just a 100-word product description, while the tool reports 270 words for the entire page.
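To make the menu/footer problem concrete, here is a minimal Python sketch that counts characters in a page's HTML while skipping site chrome. It uses only the standard library. The list of tags to skip is an assumption about how the target site is marked up, and `count_body_characters` is an illustrative name, not part of any tool mentioned in this thread.

```python
from html.parser import HTMLParser

# Assumption: these tags hold navigation, boilerplate, or non-visible content.
SKIP_TAGS = {"nav", "header", "footer", "script", "style"}

class BodyTextCounter(HTMLParser):
    """Collects text that sits outside any SKIP_TAGS element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)

def count_body_characters(html):
    parser = BodyTextCounter()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs so markup indentation is not counted.
    # Joining chunks with a space can add a space between adjacent inline tags.
    text = " ".join(" ".join(parser.chunks).split())
    return len(text)

sample = (
    "<nav>Home Shop</nav>"
    "<main><p>Blue widget, 10 cm.</p></main>"
    "<footer>Contact</footer>"
)
print(count_body_characters(sample))
```

Only the product sentence is counted here; the "Home Shop" and "Contact" text is ignored. Sites that build their menus from plain div elements would need a different rule.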

The usual rule of thumb is about 5 characters per word, so the lazy way would be to take the word count in Screaming Frog and multiply it by 5. That would give you an estimate.
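Rather than trusting a fixed multiplier, you could measure the real average on a sample of your own copy and use that. A small Python sketch follows; both function names are made up for illustration. The average here includes one space per word, on the assumption that the translation service bills whitespace too, so check the pricing page for the service you use.

```python
def average_chars_per_word(text):
    """Average characters per word, counting one space between words."""
    words = text.split()
    if not words:
        return 0.0
    return len(" ".join(words)) / len(words)

def estimate_characters(word_count, chars_per_word=5.0):
    """Rough character total from a word count and a multiplier."""
    return round(word_count * chars_per_word)

sample_copy = "Paste a few representative paragraphs from the site here"
multiplier = average_chars_per_word(sample_copy)
print(multiplier, estimate_characters(270, multiplier))
```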

I can't find any option for counting characters in Screaming Frog. But if you wanted to go the IMPORTXML route and define XPaths etc., that might be the "easiest" way.

Conor Treacy · Nov 11, 2021

Oh, also: under Configuration in Screaming Frog, you can select "Content > Area" and from there define classes and areas to total. But again, it's just word count, not character count.

You could possibly use Configuration > Custom > Extraction to pull the content, export it to CSV, import that CSV into Google Sheets, then count the characters in the cell that holds the exported content?
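If the export is large, the counting step could also be done outside Sheets. This is a rough Python sketch that sums the character count of one column in a CSV. The column name "Content" and the sample rows are placeholders; the real export will use whatever name the extractor was given.

```python
import csv
import io

def total_characters(csv_file, column):
    """Sum the length of one column across all rows of an open CSV file."""
    reader = csv.DictReader(csv_file)
    total = 0
    for row in reader:
        # Missing or empty cells count as zero characters.
        total += len((row.get(column) or "").strip())
    return total

# Stand-in for a real export; with a file on disk you would use something like
# open("export.csv", newline="", encoding="utf-8") instead (hypothetical name).
sample = io.StringIO(
    "Address,Content\n"
    'https://example.com/a,"Hello, world"\n'
    "https://example.com/b,Goodbye\n"
)
print(total_characters(sample, "Content"))
```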

RobertCarnitas · Nov 12, 2021

Awesome! Thanks, Conor.

I did try the lazy way with Screaming Frog: total words multiplied by 5 and by 6.5 (a Google search listed 5-6.5 letters per word) to get a range. What I did not take into account was the header/footer/navigation.
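The range calculation described above can be sketched in a few lines of Python. The price per million characters is deliberately left as an argument: the 10.0 in the example is a made-up placeholder, not a quoted Google Cloud Translation rate, and the word count is illustrative.

```python
def character_range(word_count, low=5.0, high=6.5):
    """Low and high character estimates from a total word count."""
    return round(word_count * low), round(word_count * high)

def cost_range(word_count, price_per_million_chars, low=5.0, high=6.5):
    """Low and high cost estimates for a given price per million characters."""
    low_chars, high_chars = character_range(word_count, low, high)
    return (
        low_chars / 1_000_000 * price_per_million_chars,
        high_chars / 1_000_000 * price_per_million_chars,
    )

# Hypothetical inputs: 200,000 words at a placeholder price of 10.0 per million.
print(character_range(200_000))
print(cost_range(200_000, 10.0))
```

Because header, footer, and navigation words are still in the total, both ends of the range will run high until those are filtered out.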

I will try importxml into Google Sheets. I am not super technical, but I can probably figure it out with a few tutorials, along with some trial and error.

I appreciate your help.

Conor Treacy · Nov 12, 2021

RobertCarnitas said:
What I did not take into account was the header/footer/navigation.

From the Configuration > Content > Area setting, it looks like you can define classes or IDs to either scan or ignore. So if you were scanning all the contents of an "article" class on our site, for example (which is built with Elementor), you'd set the class to "elementor-widget-theme-post-content", as that would contain only the body of the post and ignore the header, featured image, comments and other recommended posts.
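For anyone scripting this instead, the same include-only idea can be sketched in Python with the standard library: count only the text inside elements carrying a given class. This is not how Screaming Frog does it internally. The helper names are illustrative, and the sketch assumes reasonably well-formed markup where every non-void tag is closed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class ClassTextCounter(HTMLParser):
    """Collects text found inside elements that carry target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0    # nesting depth inside the matched element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)

def count_characters_in_class(html, target_class):
    parser = ClassTextCounter(target_class)
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs so markup indentation is not counted.
    return len(" ".join(" ".join(parser.chunks).split()))

sample = (
    "<header>Menu</header>"
    '<div class="elementor-widget elementor-widget-theme-post-content">'
    "<p>First line.<br>Second line.</p></div>"
    '<div class="comments">Nice post</div>'
)
print(count_characters_in_class(sample, "elementor-widget-theme-post-content"))
```

Only the two sentences in the post body are counted; the header and comments are skipped without having to list them.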

So rather than defining all the areas to EXCLUDE from the scan, you can define the area to INCLUDE in the scan. But again, you'd still need to multiply the word count by 5 or 6.5 characters. I hunted through the Screaming Frog documentation yesterday but didn't see how many characters they use to define a word. Maybe open a ticket with them on that?

Going the Screaming Frog way is definitely easier than setting up IMPORTXML, parsing, etc.

Either way, please do post how you got on. I'm interested to hear how it went!
