SEO Blog - Internet marketing news and views  

Chocolate and PaIRs and Jaguars Oh My!

Written by David Harry   
Wednesday, 23 September 2009 13:33

(the following is a post from Sarah Goodwin)

A beginner's guide to Phrase Based Indexing & Retrieval

Let’s start this post with a caveat: I am not an expert in this area. This post is about how I have used the information that David has shared here over the last couple of years in my own efforts. A starter guide, if you will.

The concept behind PaIR (phrase based indexing and retrieval), on the most basic level, seemed to me to be this: Google would look at all of the pages that mention a particular word, let’s say ‘chocolate’. They analyze what other phrases occur on pages that contain the word chocolate (as well as other related phrases)…

Basics of phrase based IR

…and then use calculations and weights to look at the occurrences on each page (compared to predetermined sets) and find the “ideal” occurrence ratio, which determines the “best” page.
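As a rough illustration of that first step (not Google's actual method, which isn't public), here is a toy Python sketch. It takes a handful of made-up page texts, keeps only the pages that mention a seed word, and counts how many of those pages each other word appears on. The function names and the sample pages are invented for this example, and it looks at single words rather than full phrases to keep things short.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a page's text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cooccurring_words(pages, seed, top_n=5):
    """Count, for each other word, how many seed-containing pages it appears on."""
    doc_freq = Counter()
    seed_pages = 0
    for text in pages:
        words = set(tokenize(text))
        if seed not in words:
            continue  # only pages that mention the seed word are analyzed
        seed_pages += 1
        doc_freq.update(words - {seed})
    return seed_pages, doc_freq.most_common(top_n)


# Made-up sample pages, purely for illustration.
pages = [
    "Luxury chocolate truffles made with dark cocoa",
    "Dark chocolate and cocoa butter recipes",
    "Jaguar car dealers and luxury car sales",
]

seed_pages, related = cooccurring_words(pages, "chocolate")
print(seed_pages, related)
```

Words that show up on most of the pages mentioning the seed word are the candidates for "related phrases"; a real system would also weigh them against how common they are everywhere else.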

It’s all about relations

As with all things Google, there is no way to know exactly what this ideal ratio is. The various patents do suggest some numbers; however, I don’t believe that every subject could have the same values. It is far more natural to mention some words lots of times when discussing certain subjects, so an analysis of the language used when discussing those subjects would result in a different pattern to others.

Google search for jaguar

We also don’t know the special maths that Google would use (could I make it any clearer that I’m no maths geek? “Special maths”, you don’t hear that term too often round here). However, we have a small advantage: we know which sites Google thinks are good, because they rank at positions 1, 2, and 3.

Look at the keywords and keyword phrases on the pages that rank highly for your chosen terms, and compare them as a whole to the terms on the page you want to rank.
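One simple way to do that counting, sketched in Python under a few assumptions: you have already copied the visible text of a page into a string, and you have your own list of phrases to check. The function name, the page text and the phrase list below are all invented for illustration.

```python
import re


def phrase_counts(text, phrases):
    """Return how often each phrase appears in text (case-insensitive, whole words only)."""
    counts = {}
    for phrase in phrases:
        # \b keeps "car" from matching inside longer words such as "carpet".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Invented page text and phrase list, for illustration only.
page_text = "The Jaguar XF is a luxury car. Jaguar dealers offer car sales and car repairs."
phrases = ["jaguar", "car", "car sales", "big cat"]

counts = phrase_counts(page_text, phrases)
print(counts)
```

Run the same phrase list over each of the top ranking pages and over your own page, and you have the raw numbers to compare.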

Jaguar the Car

Jaguar the Big Cat

Finding optimal occurrences

Comparing these phrase occurrences can help you to look for patterns. In doing this analysis on some of the sites I work with, I have found that high ranking sites follow similar patterns of on-page related phrase occurrence, and that the page I am trying to rank does not fit the same shape, despite being relevant content.

So if the top ranking pages for chocolate have a high occurrence of the term ‘luxury’ and a surprisingly low occurrence of the term ‘smooth’, while my page never mentions luxury but has a high occurrence of smooth, I can determine that I am not fitting into Google’s ideal shape, and tweak the content accordingly.

It’s really important that you take other factors into account when you look at these phrase patterns.

I have found that there are anomalous results where a single page has an extremely high link value, or where it has the key phrase as the domain. Even so, this analysis has produced some startling results, and a clear way of presenting content change requirements to clients.
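The chocolate comparison above can be sketched in a few lines of Python. This is only a rough illustration: the counts are invented, the function name is made up, and using the median of the top pages is my own assumption, chosen so that one anomalous page does not skew the picture.

```python
from statistics import median


def profile_gaps(top_pages, my_page):
    """Compare my page's phrase counts with the median count on the top pages.

    A positive gap means the top pages use the phrase more than my page does;
    a negative gap means my page uses it more than they do.
    """
    phrases = set(my_page)
    for counts in top_pages:
        phrases.update(counts)

    gaps = {}
    for phrase in sorted(phrases):
        typical = median(counts.get(phrase, 0) for counts in top_pages)
        gaps[phrase] = typical - my_page.get(phrase, 0)
    return gaps


# Invented phrase counts for three top ranking pages and my own page.
top_pages = [
    {"luxury": 6, "smooth": 1},
    {"luxury": 5, "smooth": 0},
    {"luxury": 7, "smooth": 1},
]
my_page = {"luxury": 0, "smooth": 8}

gaps = profile_gaps(top_pages, my_page)
print(gaps)
```

Here the output suggests mentioning luxury more and smooth less, which is exactly the kind of content change that is easy to explain to a client.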

So there you have it, a starter guide to using PaIR in your everyday ranking efforts. Hope it proves as useful to you as it has to me.

Sarah Goodwin

About the author - Sarah Goodwin is a UK-based SEO who says she's nowhere near as geeky as she likes to pretend (I guess that makes her a geek in training - but def a SOSG). When she's not working or blogging with the boys at she can usually be found playing with her 8 pet rats, or getting away from technology at her spinning wheel.

Dave says: what is important about this particular semantic analysis approach is that we learn to think not only in terms of KWs and modifiers (e.g. 'cars' and 'car repairs' or 'car sales in New York') but also in terms of related phrases. It is not a rankings gold mine; it is an easily implemented practice in the content development cycle... Here's a bunch of stuff on it over the years:

On my SEO Company site -

Spam detection in a phrase based IR system
Phrase based IR (part one)
Phrase based IR (part two)

Here on the Trail;

What you need to know about phrase based IR
Phrase based IR one more time
Lost Google patent on Phrase Based IR
Google awarded another Phrase based IR patent
Phrase based optimization resources

From Bill

Phrase Based Information Retrieval and Spam Detection
Google Phrase Based Indexing Patent Granted
Yahoo Phrase Based Indexing in a Nutshell (Yahoo stuff even)
What are the Top Phrases for Your Website?

