SEO Blog - Internet marketing news and views  

Phrase Based Indexing and Retrieval one more time

Written by David Harry   
Tuesday, 16 September 2008 10:21

How Cuil is that?

It seems phrase based IR (PaIR) is back – momentarily at least. Super search geek Bill Slawski had a post today about a late entry to the set of Google patents on the topic (see: phrase based indexing and clusters), which is as good a reason as any to revisit it.

Simply put, it is one method for understanding the relevance of concepts, and for spotting topical anomalies, through phrases and semantics. It is a probabilistic learning model that seeks to add more relevance to the ranking process. A large number of people search in phrases, not single keywords, so the method certainly has some merit.

For more, see these:

Phrase based optimization resources (here on the trail)
Phrase based indexing and retrieval (Reliable SEO)
Phrase based IR – a second look (Van SEO Design)
Phrase Based information retrieval and spam detection (SEO by the Sea)

This one, from what I gather, deals with clustering concepts/related phrases and using their occurrences to create a ranking signal. For any given topic (query space) there are related phrases that will also occur in a given document. By looking at the ratios of related phrases, clusters can be created to evaluate other pages, and so forth. It seems to include some ranking mechanisms not covered previously.
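To make the idea a bit more concrete, here is a toy Python sketch of related phrases and a related-phrase ratio. It is not the method described in the patent; the function names, the simple substring matching, the "more often than chance" test and the 1.5 threshold are all assumptions made up for illustration.

```python
from collections import Counter
from itertools import combinations


def phrases_in(text, phrases):
    """Return the known phrases that appear in the text (naive substring match)."""
    lowered = text.lower()
    return {p for p in phrases if p in lowered}


def related_phrases(docs, phrases, threshold=1.5):
    """Map each phrase to the phrases that co-occur with it more often than chance.

    The threshold is a hypothetical value chosen for this example only.
    """
    n = len(docs)
    doc_phrases = [phrases_in(d, phrases) for d in docs]
    seen = Counter(p for found in doc_phrases for p in found)
    together = Counter(
        pair for found in doc_phrases for pair in combinations(sorted(found), 2)
    )
    related = {p: set() for p in phrases}
    for (a, b), actual in together.items():
        # Co-occurrences we would expect if a and b were independent.
        expected = seen[a] * seen[b] / n
        if actual / expected >= threshold:
            related[a].add(b)
            related[b].add(a)
    return related


def related_ratio(doc, query_phrase, related, phrases):
    """Fraction of the query phrase's related phrases that also occur in the document."""
    cluster = related.get(query_phrase, set())
    if not cluster:
        return 0.0
    return len(cluster & phrases_in(doc, phrases)) / len(cluster)


# Made-up sample data, purely for demonstration.
phrases = ["search engine", "ranking signal", "patent filing",
           "chocolate cake", "baking powder"]
docs = [
    "A search engine uses a ranking signal. A patent filing describes it.",
    "Every search engine needs a good ranking signal.",
    "This chocolate cake recipe calls for baking powder.",
    "A chocolate cake rises thanks to baking powder.",
]

related = related_phrases(docs, phrases)
for doc in docs:
    print(related_ratio(doc, "search engine", related, phrases), doc)
```

In this toy setup, a page that mentions more of the phrases related to "search engine" gets a higher ratio, which is the kind of number that could feed into a ranking signal alongside everything else.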

I’ve only read Bill’s breakdown at this point, not the patent itself – but that’s what I am getting.

Now, these patents were first applied for back in 2004, when Anna Patterson first arrived at Google. The others were granted in 2006; this one seems to have had some back and forth. Since the ideas are four years old, if any of these concepts are at play in the Google results, they have been for a while.

Is phrase based IR Cuil?

For those of you playing along at home, Anna Patterson might be a name you’re familiar with of late. No longer with Google, she and the hubby recently launched the search engine Cuil. This raises a few questions as well:

Is Cuil built on similar technology?
Why doesn’t Cuil work?
Is Google using PaIR?

I wonder if Google combines these methods with other ranking signals, such as links and behavioural metrics, to come up with better results. We’ve all heard the tales of LSI and Google; was phrase based IR the next link in the chain? Was this the relevance change many noticed years ago? Who knows…

What I do know is that it is a very interesting method worth reading up on. Whether Google is using it or not, ideas for sound content creation in your SEO (and for avoiding spam flags) can be born from understanding phrase based IR.

If you haven’t read up on it… I suggest you do ;0)

L8TR

 

Comments  

 
0 # Bill 2008-09-16 13:09
Hi Dave,

I didn't expect to see this patent when I went digging around in the patent database, but it was exciting to see.

It provides an alternative way to think about how a search engine might rerank search results.

Is Google using it? Are they considering using it? Has it been part of their ranking system in the past, and they've since moved on to something new?

It's hard to tell, but for people who haven't dug into it in depth, and who are curious about how search engines may work, it's a system worth spending some time with, and as you note, it suggests some good ideas for content creation that can help people build richer pages.

Thanks.
 
 
0 # Dave 2008-09-16 13:23
Hiya Mr Bill.... yes, it was quite the thing to see in my reader this morning - haven't seen much on phrase based IR in more than 18 months. Since then Anna has moved along from Google and that was the closest we've had.

But yes, I always found the goodies I learned from the original set of patents helped my SEO a ton, as well as potential relevance implications for content development.

So, whether the peeps at the plex are using it or not, it's always worth studying for any search geek.

Thanks for dropping by ...hope things are going well for U
 
 
0 # Neal "thePuck" Jansons 2008-09-17 00:08
Nice post, Dave. I have to say that I predicted that phrase-based would be a fruitful idea, but I think that the technology to really make it work is just starting to surface.

There is an issue that comes to mind. I was talking with Aaron Brazell from Lijit, etc at the last Wordcamp, and he made a good point: Google has trained people to search a certain way. They have taught us a language of search. This means that any large scale development in search that doesn't speak in that language has a barrier to entry. This is something I have been increasingly thinking about since then, and it seems to be true: no matter how spiffy a new search technology is, unless it fits the way Google has taught people to search, people will not be as likely to adopt it.
 
 
0 # Dave 2008-09-17 09:17
Well, back in 03 Google acquired Applied Semantics, and so when relevance changes were noticed in 05-06 peeps started talking about LSI being more than ad serving technology.

Ultimately, it may have been the 04 addition of Anna that was responsible.

I am not totally sure whether Google has peeps trained, or if it is just current search technologies. I have done qualitative research by watching (in person) how peeps search for things. It varies wildly..lol
 
 
0 # waveshoppe 2008-09-17 10:50
Aloha Dave, I don't think it ever left, it's just that some have not been
paying attention. Those who have been hammering away at phrase based
indexing (cough, cough) are seeing positive results. It's interesting that
you mentioned people search in phrases. I have been messing with it for a
few weeks now, and I am quite intrigued with the results so far.

MSN/Live did a big turn around today (for the good) and it seems that phrase
based indexing is in play over there, as well as at goog
 
