SEO Blog - Internet marketing news and views  

Phrase Based Indexing and Retrieval one more time

Written by David Harry   
Tuesday, 16 September 2008 10:21

How Cuil is that?

It seems phrase based IR (PaIR) is back – momentarily at least. Super search geek Bill Slawski had a post today about a late entry to the set of patents from Google on the topic (see: phrase based indexing and clusters), which is as good a reason as any to revisit it.

Simply put, it is one of the methods for understanding the relevance of concepts and topical anomalies through phrases and semantics. It is a probabilistic learning model that seeks to add more relevance to the ranking process. Many people search with phrases rather than single keywords, so this method certainly has some merit.
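To make the phrase-versus-keyword point concrete, here is a minimal sketch of an inverted index keyed on phrases instead of single words. It is a toy under stated assumptions: every word n-gram up to three words long is treated as a candidate phrase (a real system would need some way to separate meaningful phrases from noise, which this skips), and the helper names `candidate_phrases`, `build_phrase_index` and `search` are hypothetical, not anything from the patents.

```python
from collections import defaultdict


def candidate_phrases(text: str, max_len: int = 3) -> list[str]:
    """Return every word n-gram of length 1..max_len (naive, hypothetical phrase extraction)."""
    words = text.lower().split()
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrases.append(" ".join(words[i:i + n]))
    return phrases


def build_phrase_index(docs: dict[str, str], max_len: int = 3) -> dict[str, set[str]]:
    """Inverted index: each phrase maps to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for phrase in candidate_phrases(text, max_len):
            index[phrase].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Look the whole query up as one phrase, not as separate keywords."""
    return index.get(" ".join(query.lower().split()), set())


# Hypothetical toy documents.
docs = {
    "d1": "phrase based indexing improves search relevance",
    "d2": "search engines rank pages by relevance",
    "d3": "indexing phrase lists for search",
}
index = build_phrase_index(docs)
print(sorted(search(index, "phrase based indexing")))  # ['d1']
print(sorted(search(index, "search")))                 # ['d1', 'd2', 'd3']
```

Note that d3 contains both "phrase" and "indexing" but not the phrase "phrase based indexing", so a phrase lookup skips it where a bag-of-keywords match would not.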

For more, see these:

Phrase based optimization resources (here on the trail)
Phrase based indexing and retrieval (Reliable SEO)
Phrase based IR – a second look (Van SEO Design)
Phrase Based information retrieval and spam detection (SEO by the Sea)

This one, from what I gather, deals with clustering concepts/related phrases and using their occurrences to create a ranking signal. For any given topic (query space), there are related phrases that will also occur in a given document. By looking at the ratios of related phrases, clusters can be created to evaluate other pages and so forth… It also seems to have some ranking mechanisms not covered previously.
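One way to picture "looking at ratios of related phrases" is to compare how often two phrases actually appear together with how often they would if they were independent. The sketch below assumes exactly that ratio (actual co-occurrence rate over expected rate), an arbitrary threshold of 1.5 for calling two phrases related, connected components as a simple stand-in for clustering, and a plain count of a query's related phrases as the page-level signal. The function names, threshold and toy corpus are all hypothetical; this is an illustration of the general idea, not the method in the patent.

```python
from itertools import combinations

# Hypothetical toy corpus: each document reduced to the set of phrases it contains.
docs = [
    {"president", "white house"},
    {"president", "white house", "election"},
    {"president", "election"},
    {"cake", "recipe"},
    {"cake", "recipe", "oven"},
    {"recipe", "oven"},
    {"oven"},
    {"election"},
]
RELATED_THRESHOLD = 1.5  # arbitrary cut-off chosen for this toy corpus


def cooccurrence_ratio(docs: list[set[str]], a: str, b: str) -> float:
    """Actual co-occurrence rate of a and b divided by the rate expected under independence."""
    n = len(docs)
    p_a = sum(a in d for d in docs) / n
    p_b = sum(b in d for d in docs) / n
    actual = sum(a in d and b in d for d in docs) / n
    expected = p_a * p_b
    return actual / expected if expected else 0.0


def related_pairs(docs: list[set[str]], phrases: set[str], threshold: float) -> set[tuple[str, str]]:
    """Every (a, b) pair, a < b, whose co-occurrence ratio exceeds the threshold."""
    return {
        (a, b)
        for a, b in combinations(sorted(phrases), 2)
        if cooccurrence_ratio(docs, a, b) > threshold
    }


def clusters(phrases: set[str], pairs: set[tuple[str, str]]) -> list[set[str]]:
    """Group phrases into connected components of the 'related' graph."""
    neighbours = {p: set() for p in phrases}
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(phrases):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over related phrases
            p = stack.pop()
            if p in group:
                continue
            group.add(p)
            stack.extend(neighbours[p] - group)
        seen |= group
        groups.append(group)
    return groups


def related_phrase_score(doc: set[str], query_phrase: str, pairs: set[tuple[str, str]]) -> int:
    """Hypothetical page signal: how many of the query's related phrases the page contains."""
    related = {b for a, b in pairs if a == query_phrase} | {a for a, b in pairs if b == query_phrase}
    return len(related & doc)


phrases = set().union(*docs)
pairs = related_pairs(docs, phrases, RELATED_THRESHOLD)
groups = clusters(phrases, pairs)
print([sorted(g) for g in groups])  # [['cake', 'oven', 'recipe'], ['election', 'president', 'white house']]
print(related_phrase_score({"president", "election", "white house"}, "president", pairs))  # 2
```

The two topics fall out as separate clusters purely from co-occurrence, and a page about "president" that also mentions its related phrases scores higher than one that does not.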

I’ve only read Bill’s breakdown at this point, not the patent itself, but that’s my take so far.

Now, these patents were first applied for back in 2004, when Anna Patterson first arrived at Google. The others were granted in 2006; this one apparently went through some back and forth. Since the patents are four years old, if any of these concepts are at play in Google's results, they have been for a while.

Is phrase based IR Cuil?

For those of you playing along at home, Anna Patterson might be a name you’re familiar with of late. No longer with Google, she and the hubby recently launched the search engine Cuil. This raises a few questions:

Is Cuil built on similar technology?
Why doesn’t Cuil work?
Is Google using PaIR?

I wonder if Google combines these methods with other ranking signals such as links and behavioural metrics to come up with better results. We’ve all heard the tales of LSI and Google; was phrase based IR the next link in the chain? Was this the relevance change many noticed years ago? Who knows…

What I do know is that it is a very interesting method worth reading up on. Whether or not Google is using it, understanding phrase based IR can spark ideas for sound content creation in your SEO (and for avoiding spam flags).
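A rough way to picture a phrase-based spam flag (an assumption for illustration, not the patent's actual test) is to count how many of a topic's related phrases each page contains and flag pages whose count sits far above the norm. The sketch below uses a mean plus k population standard deviations cut-off with k = 2; the cut-off, the related-phrase list, the page data and the names `related_phrase_count` and `flag_outliers` are all hypothetical.

```python
from statistics import mean, pstdev


def related_phrase_count(page: set[str], related: set[str]) -> int:
    """How many of the topic's related phrases appear on the page."""
    return len(page & related)


def flag_outliers(pages: dict[str, set[str]], related: set[str], k: float = 2.0) -> list[str]:
    """Flag pages whose related-phrase count exceeds mean + k * population std dev."""
    counts = {page_id: related_phrase_count(p, related) for page_id, p in pages.items()}
    values = list(counts.values())
    cutoff = mean(values) + k * pstdev(values)
    return sorted(page_id for page_id, c in counts.items() if c > cutoff)


# Hypothetical phrases related to the topic "president".
related = {"white house", "election", "senate", "campaign",
           "vice president", "policy", "congress", "speech"}

# Hypothetical pages: ordinary pages mention one or two related phrases,
# the stuffed page mentions all eight.
pages = {
    "p1": {"white house"},
    "p2": {"election", "speech"},
    "p3": {"congress"},
    "p4": {"policy", "campaign"},
    "p5": {"white house", "senate"},
    "p6": {"speech"},
    "p7": {"election"},
    "p8": {"campaign", "congress"},
    "p9": {"policy", "recipe"},
    "stuffed": set(related),
}
print(flag_outliers(pages, related))  # ['stuffed']
```

The takeaway for content is the same one the post hints at: writing naturally on a topic tends to use a typical spread of related phrases, while cramming in every related phrase is exactly the kind of pattern a statistical check like this would notice.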

If you haven’t read up on it… I suggest you do ;0)


