
Does anyone know of a tool that calculates the total number of characters on a website? Screaming Frog reports word count per page, which is easy to total up, but it doesn't seem to have a character count (unless I am missing it, which is possible).

I am trying to calculate the potential cost of Google Cloud Translation services across a large number of websites.

Thanks in advance!
 
Hey Robert, ready to get complicated? :)

You could use IMPORTXML in Google Sheets: scrape the site's pages, use XPath to pull out specific blocks/areas of text, then count the characters from there.

One of the biggest issues with word count systems is that they count ALL the words on a page. That includes the menu, dropdown menu and footer links: if text is on the page, it gets counted. The problem is that the actual "meat" of your page may be just a 100-word product description, while the tool reports 270 words for the page as a whole.
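To make the "meat vs. whole page" gap concrete, here is a rough Python sketch using only the standard library. It is not what Screaming Frog or IMPORTXML do internally, just an illustration: the class name "product-description", the sample HTML and the helper names are all made up for the example, and it assumes reasonably well-formed HTML where every non-void tag is closed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlockTextCounter(HTMLParser):
    """Collects visible text for the whole page and for one target class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class  # hypothetical class name to isolate
        self.depth = 0        # > 0 while inside the target element
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.block_text = []
        self.page_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag in ("script", "style"):
            self.skip_depth += 1
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0 or self.target_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.skip_depth > 0:
            return
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text:
            return
        self.page_text.append(text)
        if self.depth > 0:
            self.block_text.append(text)


def count_characters(html, target_class):
    """Return (characters in the target block, characters on the whole page)."""
    parser = BlockTextCounter(target_class)
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.block_text)), len(" ".join(parser.page_text))


# Made-up page: navigation and footer surround the real content.
SAMPLE = """
<html><body>
<nav><a href="/">Home</a> <a href="/shop">Shop</a></nav>
<div class="product-description"><p>Blue widget.<br>Ships fast.</p></div>
<footer>Contact us</footer>
</body></html>
"""

block_chars, page_chars = count_characters(SAMPLE, "product-description")
print(f"content block: {block_chars} chars, whole page: {page_chars} chars")
```

On real pages with unclosed tags the depth tracking here would drift, so treat it as a starting point rather than a finished scraper.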

The usual rule of thumb is about 5 characters per word, so the lazy way would be to take the word count from Screaming Frog and multiply it by 5. That would give you a rough estimate.

I can't find any option for counting characters in Screaming Frog. But if you wanted to go the IMPORTXML route and define XPaths etc., that might be the "easiest" way.
 
Oh, also: under Configuration in Screaming Frog, you can select "Content > Area" and from there define the classes and areas to total. But again, it only gives a word count, not a character count.

You could possibly use Configuration > Custom > Extraction to pull the content, export it to CSV, import that CSV into Google Sheets, and then count the characters in the cell that holds the exported content?
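If Sheets gets unwieldy across a lot of sites, the same counting step could be done on the exported CSV directly. A minimal Python sketch, assuming the export has one row per URL; the column names "Address" and "Content 1" are placeholders for illustration, so check the header row of your actual export and adjust them.

```python
import csv
import io


def total_characters(csv_file, text_column, url_column="Address"):
    """Return ({url: character count}, grand total) for one text column."""
    per_url = {}
    for row in csv.DictReader(csv_file):
        text = (row.get(text_column) or "").strip()
        per_url[row.get(url_column, "")] = len(text)
    return per_url, sum(per_url.values())


# Made-up stand-in for an exported file. For a real export, something like
# open("export.csv", newline="", encoding="utf-8") would be passed instead.
SAMPLE_EXPORT = io.StringIO(
    "Address,Content 1\n"
    "https://example.com/a,Blue widget.\n"
    'https://example.com/b,"Red widget, large."\n'
)

per_url, total = total_characters(SAMPLE_EXPORT, "Content 1")
for url, chars in per_url.items():
    print(url, chars)
print("total characters:", total)
```

Python's len() counts Unicode characters rather than bytes, which is worth comparing against how the translation service defines a billable character before trusting the total.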
 
Awesome! Thanks, Conor.

I did try the lazy way :) with Screaming Frog: total words multiplied by 5 and by 6.5 (a Google search listed 5-6.5 letters per word) to get a range. What I did not take into account was the header/footer/navigation.
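For anyone repeating the lazy estimate across many sites, the arithmetic can be sketched in a few lines of Python. The word total and the price per million characters below are placeholders, not real figures; plug in your own crawl totals and the current rate from the Google Cloud Translation pricing page.

```python
def estimate_cost(total_words, price_per_million_chars, chars_per_word=(5.0, 6.5)):
    """Turn a word count into a (low, high) character range and cost range."""
    low, high = chars_per_word
    low_chars = total_words * low
    high_chars = total_words * high
    return {
        "chars": (low_chars, high_chars),
        "cost": (
            low_chars / 1_000_000 * price_per_million_chars,
            high_chars / 1_000_000 * price_per_million_chars,
        ),
    }


# Placeholder inputs: 200,000 words crawled, 20.0 currency units per million characters.
estimate = estimate_cost(total_words=200_000, price_per_million_chars=20.0)
print("characters:", estimate["chars"])
print("cost range:", estimate["cost"])
```

Because the word count includes header, footer and navigation text on every page, both ends of this range will run high unless the crawl is limited to the content area.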

I will try IMPORTXML in Google Sheets. I am not super technical, but I can probably figure it out with a few tutorials, along with some trial and error.

I appreciate your help
 
What I did not take into account was the header/footer/navigation.
From the Configuration > Content > Area setting it looks like you can define classes or IDs to either scan or ignore. So if you were scanning all contents of an "article" class on our site for example (which is built with Elementor), you'd set the class as "elementor-widget-theme-post-content" as that would contain only the body of the post and ignore the header, featured image, comments and other recommended posts.

So rather than defining all the areas to EXCLUDE from the scan, you can define the area to INCLUDE in the scan. But again, you'd still need to multiply the word count by 5 or 6.5 characters. I hunted through the Screaming Frog documentation yesterday but didn't see how many characters they use to define a word. Maybe open a ticket with them on that?

Going the Screaming Frog way is definitely easier than setting up IMPORTXML, parsing, etc.

Either way, please do post how you got on. I'm interested to hear how it went!
 
