Google: don’t use TF-IDF to optimize your web pages

In a webmaster hangout on YouTube, Google’s John Mueller said that you shouldn’t rely on TF-IDF to optimize your web pages for high rankings. What is TF-IDF and what are better ways to optimize your web pages?

What is TF-IDF?

TF-IDF is short for term frequency–inverse document frequency. It’s a value that is intended to reflect how important a word is to a document in a collection or corpus.

In the case of the term frequency, the simplest choice is to use the raw count of a term in a document, i.e., the number of times that term occurs in the document. The inverse document frequency is a measure of how much information the word provides, i.e., if it’s common or rare across all documents.

More precisely, the inverse document frequency is the logarithmically scaled inverse fraction of the documents that contain the word. It is obtained by dividing the total number of documents by the number of documents containing the term, and then taking the logarithm of that quotient.
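Written as a formula, the description above looks roughly like this. The symbols are notation chosen here for illustration and are not taken from Google: t is a term, d is a document, D is the collection of documents, N is the number of documents in D, and tf(t, d) is the raw count of t in d.

```latex
\mathrm{idf}(t, D) = \log \frac{N}{\left|\{\, d \in D : t \in d \,\}\right|},
\qquad
\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)
```

A word that appears in every document gets an inverse document frequency of log(1) = 0, so its TF-IDF value is 0 no matter how often it occurs on the page.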

Basically, TF-IDF checks how often a word appears in a document, and then weighs this against how many of the documents in the index contain that word.
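To make this concrete, here is a minimal Python sketch of the textbook calculation, using the raw count as the term frequency and the logarithm of (total documents / documents containing the term) as the inverse document frequency. The function name tf_idf, the whitespace tokenization and the three-sentence sample corpus are assumptions made for this illustration. This is not how Google or any particular SEO tool computes the value.

```python
import math
from collections import Counter


def tf_idf(term, document, corpus):
    """Textbook TF-IDF: raw term count times log(N / document frequency)."""
    # Term frequency: how often the term occurs in this one document.
    tf = Counter(document.lower().split())[term]
    # Document frequency: how many documents in the corpus contain the term.
    df = sum(1 for doc in corpus if term in doc.lower().split())
    if df == 0:
        return 0.0  # term does not occur anywhere in the corpus
    return tf * math.log(len(corpus) / df)


# Hypothetical mini "index" of three documents.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
page = corpus[0]

common = tf_idf("the", page, corpus)  # "the" appears in every document
rare = tf_idf("mat", page, corpus)    # "mat" appears in only one document

print(f"the: {common:.3f}")
print(f"mat: {rare:.3f}")
```

In this sample, "the" occurs twice on the page but scores 0 because every document contains it. "mat" occurs only once but scores higher because it is rare across the collection.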

What’s the problem with TF-IDF?

TF-IDF compares the frequency of words in one web page to all web pages that are indexed by Google. No tool has access to all pages that Google has indexed. That means that TF-IDF tools can only show very rough estimates instead of the real value.
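The following Python sketch illustrates why the choice of document collection matters. It computes the inverse document frequency of the same word against two made-up collections. The function name idf and both sample collections are hypothetical. A real tool would use a much larger sample, but that sample would still not be Google's index.

```python
import math


def idf(term, corpus):
    """Inverse document frequency: log(N / number of documents with the term)."""
    n_containing = sum(1 for doc in corpus if term in doc.split())
    if n_containing == 0:
        return 0.0  # term is absent from this corpus
    return math.log(len(corpus) / n_containing)


# Two hypothetical document samples that a tool might use as its "index".
seo_blogs = [
    "keyword research tips",
    "keyword density myths",
    "link building and keyword tools",
]
general_web = [
    "keyword research tips",
    "chocolate cake recipe",
    "weather forecast today",
    "football results",
]

idf_seo = idf("keyword", seo_blogs)        # every document contains the word
idf_general = idf("keyword", general_web)  # only one of four documents does

print(f"SEO blog sample:    {idf_seo:.3f}")
print(f"General web sample: {idf_general:.3f}")
```

The same word gets a different weight depending on the sample, so two tools with different document sets will report different TF-IDF values for the same page.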

According to Google’s John Mueller, you cannot reproduce the real value. In addition, TF-IDF is a very old metric:

“TF-IDF is essentially a metric that is used in information retrieval so if you’re building a search engine with regards to trying to understand which are the relevant words on a page. We use a ton of different techniques from information retrieval and there’s tons of these metrics that have come out over the years […]

My general recommendation here is not to focus on these kind of artificial metrics and because it’s something where on the one hand you can’t reproduce this metric directly because it’s based on the overall index of all of the content on the web. […]

This is a fairly old metric and things have evolved quite a bit over the years. There are lots of other metrics as well, so just blindly focusing on one kind of theoretical metric and trying to squeeze those words into your pages – I don’t think that’s a useful thing. I think that’s very short-sighted thinking because you’re focusing just purely on a search engine […]

I would strongly recommend focusing on your website and its users and making sure that what you’re providing is something that Google will in the long term still recognize and continue to use as something valuable.”

There are better ways to optimize your web pages

John Mueller’s recommendation is to create a valuable website that helps your users. The content of your web pages is important. It’s also important that your web pages can be found for the right keywords. In general, you should do the following:

  1. Identify keywords that will bring targeted visitors to your website.
  2. Create good content that is related to these keywords and optimize these pages.
  3. Remove technical errors from your web pages to make sure that search engines can index all of your pages.
  4. Make sure that your web pages work on desktop and mobile.

Don’t focus on a particular metric. Improve the overall value of your web pages. The tools in SEOprofiler help you to do that.

Johannes Selbach

Johannes Selbach is the CEO of SEOprofiler. He blogs about search engine optimization and website marketing topics.