SEO Blog - Internet marketing news and views  

Google granted a very Cuil search patent

Written by David Harry   
Wednesday, 01 October 2008 00:49

Yet another patent on phrase based indexing and retrieval (YaPaIR)

Unfortunately, the Cuil buzz has cooled, but at least co-founder Anna Patterson is back on the radar. If only in name and memories of what used to be, that is. And if that radar is for patent droolers and semantic simians... then it is a name of note.

You see, things were different back in Oh4, when she was working at Google and toiling on a computer learning model for understanding concepts and semantic relationships through phrasing. Ah yes, I remember it like it was yesterday…

That’s Cuil

Oh, we were just talking about Phrase based indexing and retrieval last week? You say Bill wrote about it as well? And we even managed to slip it into yesterday’s rant? Wow… just doesn’t want to go away huh?

Google's Cuil ideas

Hot on the heels of a delayed phrase based IR patent, another has surfaced. Also filed back in 2004 with the rest of the collection, it was granted to Google today: Phrase-based indexing in an information retrieval system

Follow the drama…

Now this little ditty is part of an exciting ongoing mini-series. The excitement is waiting for pieces to pop up every few years. That’s the best part… like a treasure hunt. Here's the set as we know it:

Phrase Identification in an Information Retrieval System,
Filed on Jul. 26, 2004;
Granted; Jan 26 2006

Phrase-Based Generation of Document Descriptions,
Filed on Jul. 26, 2004;
Granted; Jan 26 2006

Phrase-Based Searching in an Information Retrieval System,
Filed on Jul. 26, 2004;
Granted; Feb 09 2006

Automatic Taxonomy Generation in Search Results Using Phrases,
Filed on Jul. 26, 2004;
Granted; Sept 16 2008

Phrase-based indexing in an information retrieval system
Filed on Jul. 26, 2004;
Granted; Sept 30 2008

Missing links?

Phrase-Based Detection of Duplicate Documents in an Information Retrieval System,
filed on Jul. 26, 2004, ?? (it is referenced in the others)

Phrase-Based Personalization of Searches in an Information Retrieval System,
filed on Jul. 26, 2004; ?? (it is referenced in the others)


Then in 2005, a continuation:

Multiple index based information retrieval system
Filed on Jan 25, 2005;
Granted; May 18 2006

And in June 2006 came:

Detecting spam documents in a phrase based information retrieval
Filed on June 28 2006;
Granted; Dec 28 2006

So it has been a ride full of thrills and fun for the whole family. There certainly was some interest in this for a while at Google. Sadly, the passion Cuiled and they parted ways. Whatever did become of phrase based IR? Did Google use some of it for personalization and query revisions? As a minor signal, even? What about Cuil? Did Anna take the same mindset over there when they built that bad boy?


Will we ever know?

How the hell should I know? It does make for some interesting reading, and it opens up new dimensions on how better relevancy for concepts and ideas could be found through phrases. Last time I checked, single-word searches were on the decline and 2-3 word phrases were becoming commonplace.

In simplest terms, the system would learn based on commonalities in phrasing across a given document set. Instead of a linear (Boolean) approach, page content, link text and so forth can be analyzed for topically related themes. A good example given was the query ‘blue merle agility training’:

"blue merle":: "Australian Shepherd," "red merle," "tricolor," "aussie";

"agility training":: "weave poles," "teeter," "tunnel," "obstacle," "border collie".

These are examples of related phrases that appear in other documents in the result set. By looking at them statistically, one can find the documents with the optimal occurrences. The patents also discuss snippets, personalization, spam detection, duplicate content and more through the phrase based IR approach.
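To make the 'learn from commonalities in phrasing' idea a little more concrete, here is a minimal Python sketch. It is my own toy, not the patent's actual math: the function name `related_phrases`, the `min_ratio` threshold and the sample documents are all made up for illustration. It calls two phrases related when they show up in the same documents more often than chance alone would predict.

```python
from collections import Counter
from itertools import combinations


def related_phrases(docs, phrases, min_ratio=1.5):
    """Map each phrase to the phrases that co-occur with it more than chance predicts.

    Hypothetical helper: a plain actual-vs-expected co-occurrence ratio,
    used here as an assumption, not as the patented formula.
    """
    n = len(docs)
    # Which of the known phrases appear in each document (plain substring match).
    present = [{p for p in phrases if p in doc.lower()} for doc in docs]
    # Number of documents containing each phrase, and each pair of phrases.
    df = Counter(p for found in present for p in found)
    pair_df = Counter(
        pair for found in present for pair in combinations(sorted(found), 2)
    )
    related = {p: set() for p in phrases}
    for (a, b), together in pair_df.items():
        # Co-occurrences we would expect if the two phrases were independent.
        expected = df[a] * df[b] / n
        if together / expected >= min_ratio:
            related[a].add(b)
            related[b].add(a)
    return related


# Invented sample documents, loosely based on the example query above.
docs = [
    "Blue merle Australian Shepherd puppies for sale",
    "Our blue merle Australian Shepherd loves the park",
    "Agility training with weave poles",
    "Weave poles are the hardest part of agility training",
    "Australian Shepherd agility training tips",
]
phrases = ["blue merle", "australian shepherd", "agility training", "weave poles"]

related = related_phrases(docs, phrases)
print(related["blue merle"])
print(related["agility training"])
```

On this tiny set, "blue merle" pairs up with "australian shepherd" and "agility training" with "weave poles", while "australian shepherd" and "agility training" share only one document and fall below the threshold.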

Think of it like having related phrases to a given topic and then adding phrase extensions to further max out the list. Now look at the potential web pages to be ranked and start finding the ones with the optimal related terms. You can also look at the inbound and outbound link anchor text for further scoring. Check for duplication and away you go. Sure, that’s simplified; but we’ve been down this trail before.
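The ranking idea above can be sketched in a few lines of Python. Treat it as a rough illustration under my own assumptions: `score_page`, the `anchor_weight` value and both sample pages are hypothetical, the related-phrase lists are the ones from the 'blue merle agility training' example, and the duplicate check is left out entirely.

```python
# Related phrases taken from the 'blue merle agility training' example.
RELATED = {
    "blue merle": ["australian shepherd", "red merle", "tricolor", "aussie"],
    "agility training": ["weave poles", "teeter", "tunnel", "obstacle", "border collie"],
}


def score_page(body, inbound_anchors, query_phrases, related, anchor_weight=0.5):
    """Score one page by which query and related phrases it contains.

    Hypothetical scoring: 1 point per distinct phrase found in the body,
    plus a smaller boost (an assumed weight) if it appears in inbound anchor text.
    """
    body = body.lower()
    anchors = " ".join(inbound_anchors).lower()
    score = 0.0
    for phrase in query_phrases:
        # The query phrase itself counts, and so does each related phrase.
        for candidate in {phrase, *related.get(phrase, ())}:
            if candidate in body:
                score += 1.0
            if candidate in anchors:
                score += anchor_weight
    return score


query = ["blue merle", "agility training"]

# A page that actually covers the topic, with one inbound link.
on_topic = score_page(
    "Our blue merle Australian Shepherd flies through the weave poles "
    "and the tunnel during agility training.",
    ["aussie agility training"],
    query,
    RELATED,
)

# A page that just repeats the query phrases over and over.
stuffed = score_page(
    "Blue merle agility training blue merle agility training buy now.",
    [],
    query,
    RELATED,
)

print(on_topic, stuffed)
```

Because each phrase counts only once per page, repeating the query terms does nothing for the stuffed page. The on-topic page pulls ahead by covering the related phrases, which is the point of the approach.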

And my spammer friends would be hard pressed to sort out what exactly the right magic mix is. And that’s assuming it was a standalone system.

Can I get some PageRank with that please?

Now let’s say we already have a system in place that uses a wide variety of factors for indexing and retrieval. This means we can take our phrase based model and slowly play with it and tune it so that it finds a home alongside the other algos in the big Happy Googley world.

Because it covers a variety of areas, from personalization and query analysis down to snippet generation and spam detection, it would likely be a handy tool.

A blast from the past

It almost seemed funny, you know, funny-smirk-giggle, not bwaaa ha ha haha funny, that Cuil (Anna’s new beau) talks about not using personal info and not relying on links. Because when I read this, it almost sounded like a shot, considering all that’s happened:

“Search engines that use a ranking algorithm that relies on the number of links that point to a given document in order to rank that document can be "bombed" by artificially creating a large number of pages with a given anchor text which then point to a desired page. As a result, when a search query using the anchor text is entered, the desired page is typically returned, even if in fact this page has little or nothing to do with the anchor text.”

Ouch… meeeooow!! You tell ‘em, girl. I guess the writing was on the wall, erm, patent for this young couple. And as you can see… Google got to keep all her stuff in the divorce… How Cuil is that?

For more on Phrase Based Indexing and Retrieval see; Phrase Based Optimization resources


