SEO Dojo - training for search geeks   the Community for search engine optimization geeks

SEO Blog - Search and internet marketing news
How relevant is Cuil?
Monday, 04 August 2008 08:21


Building a better search engine

We all know about the latest Google killer by now, right? You know, Cuil (pronounced ‘cool’)… the relevance-based search engine that boasts more pages in its index than the mighty peeps at the Plex?

While there has been no lack of FAIL floating around after its recent launch, there are a few reasons why this Gypsy might just take a deeper look.

The phrase based IR connection

What I found interesting was that one Anna Patterson is on board... which in itself might not mean much, but spare me a few moments to explain. She’s one of the ‘ex-Googlers’ who are part of the team at Cuil (along with hubby Tom). I know... all falling into place now, right? OK... a little further...

...let me quote from the management team page;

“Anna was the architect of Google’s large search index, TeraGoogle, that launched in early 2006. While at Google, Anna was the technical lead of one of the two Web ranking groups at Google, in charge of GoogleBase, and the manager for the core piece of Google’s ad-matching technology. She joined Google in 2004 after designing, writing and selling Recall—the largest search engine in existence at the time at 12 billion pages..”

Is Cuil using phrase based IR?

 

What made any of this remarkable to me was the fact that she has done a lot of work in the ‘Phrase based indexing and retrieval’ area… as you can see in these patents;

This is of interest because I covered these patents back in a set of posts in ‘06 (hat tip to Senor Slawski)… and held high hopes for such systems. To me it makes for a good approach, though potentially limited without secondary signals… in my opinion at least. Now how does this play out with Cuil? Not sure, but from what they posted on the philosophy page, it seems it could be in the mix;

“Cuil prefers to find all the pages with your keyword or phrase and then analyze the rest of the content on those pages. During this analysis we discover that your keywords have different meanings in different contexts. Once we’ve established the context of the pages, we’re in a much better position to help you in your search.” - the Cuil Philosophy

In simplest terms, they are looking to rank documents via relevance, not popularity. This is certainly in line with the probabilistic modelling of the phrase based approach. So one does have to believe that it is somewhere near the core of the Cuil indexation and ranking methods. But is it enough?
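To make the relevance-over-popularity idea concrete, here is a minimal Python sketch. It is an illustration only, not Cuil's method nor the one in the patents: the `RELATED` table, the sample pages and the `context_score` function are all hypothetical, and a real phrase based system would presumably learn its related phrases from co-occurrence statistics across the index rather than from a hand-written dictionary.

```python
# Hypothetical sketch: score pages by how many phrases related to the
# query also appear on the page (its context), ignoring links entirely.

# Assumption: a hand-written table of related phrases per context. A real
# system would derive these from co-occurrence statistics over its index.
RELATED = {
    "jaguar": {
        "car": ["dealer", "engine", "sedan"],
        "animal": ["rainforest", "predator", "big cat"],
    }
}


def context_score(query, text):
    """Return (best_context, score) for a page, or (None, 0) if nothing matches."""
    text = text.lower()
    if query not in text:
        return None, 0
    best_context, best_score = None, 0
    for context, phrases in RELATED.get(query, {}).items():
        # Count how many related phrases for this context occur on the page.
        score = sum(1 for phrase in phrases if phrase in text)
        if score > best_score:
            best_context, best_score = context, score
    return best_context, best_score


pages = {
    "page-a": "The jaguar is a predator and the largest big cat of the rainforest.",
    "page-b": "Visit our Jaguar dealer to test drive the new sedan.",
    "page-c": "Jaguar news and links.",
}

# Rank pages by their context score, highest first.
ranked = sorted(
    ((name, *context_score("jaguar", text)) for name, text in pages.items()),
    key=lambda row: row[2],
    reverse=True,
)
for name, context, score in ranked:
    print(name, context, score)
```

Note that the page with the keyword but no supporting context ends up at the bottom, while the other two are separated into different meanings of the same query. Nothing here looks at links, which is exactly why secondary signals come up later in this post.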

 

Is it really a FAIL?

It is truly hard to say, but early reviews (at the end of the post) and some of my own tinkering around leave one wondering. Of the many problems I have with it so far, one is certainly the historical relevance of results. That is, there are many older documents (2004 and earlier) ranking in query spaces I know to be somewhat time sensitive. This is certainly one area where the old ‘query deserves freshness’ angle plays out well for Google. I would question how on-page relevance deals with temporal issues. While topical relevance is great, historical signals are also important.
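As a rough illustration of why a temporal signal matters, the Python sketch below blends an on-page relevance score with a freshness score that decays with document age. Everything in it is an assumption made for the example: the `freshness` and `blended_score` functions, the one-year half-life, the `time_sensitivity` weight and the sample scores are hypothetical, and this is not how Google's ‘query deserves freshness’ or any other engine actually works.

```python
from datetime import date

# Hypothetical sketch: blend an on-page relevance score with a freshness
# score that decays as the document ages. Not any engine's real formula.


def freshness(published, today, half_life_days=365.0):
    """Exponential decay: 1.0 for a page published today, 0.5 after one half-life."""
    age_days = max((today - published).days, 0)
    return 0.5 ** (age_days / half_life_days)


def blended_score(relevance, published, today, time_sensitivity):
    """time_sensitivity in [0, 1]: 0 ignores age, 1 ranks purely on freshness."""
    fresh = freshness(published, today)
    return (1 - time_sensitivity) * relevance + time_sensitivity * fresh


today = date(2008, 8, 4)

# A highly relevant page from 2004 versus a slightly less relevant recent one.
old_page = blended_score(0.9, date(2004, 8, 4), today, time_sensitivity=0.5)
new_page = blended_score(0.7, date(2008, 7, 4), today, time_sensitivity=0.5)
print(old_page < new_page)
```

With `time_sensitivity` set to 0 the 2004 page wins on relevance alone, which is roughly the behaviour described above. For a time sensitive query space, any non-trivial weight on freshness lets the newer page overtake it.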

Another area I am bullish on is user behavioural metrics as ranking signals, along with personalized search methods. This is certainly NOT going to be in the Cuil bag of tricks, as part of their mantra is not collecting the related data;

“Cuil analyzes Web pages and not click-throughs, we don’t need to know your search history and habits. So our privacy policy is very simple: when you search with Cuil, we do not collect any personally identifiable information, period. We have no idea who sends queries: not by name, not by IP address, and not by cookie. Your search history is your business, not ours.”

While there are those who have privacy issues with such data, it hasn’t stopped Google in its tracks, now has it? At the end of the day these signals are important in understanding the user’s relationship with the search results and sites listed. They shouldn’t be discounted. Much like links and historical signals, popularity has its place in the results, as does authority. Working from a relevance centric approach may seem utopian, but it can’t carry the day on its own.

I also found problems with duplicate results, or more than one result from a given site, which is something the indented listings in our familiar format lend themselves to (as do site links). There are a few problems out of the gate to be sure…

 

Somebody get me a blender

Even if it doesn’t rule the space, we might just be able to take away some ideas that can improve the next generation of search…

As recently as a few weeks ago I was writing about a world beyond links, in an effort to illuminate ranking signals that we can consider beyond mere links. The offering from Cuil seems to be light on methods beyond mere page relevance. I truly yearn for a search engine that gets the mix right. As for Cuil, we haven’t even begun to look at the spam-ability of this approach either. I haven’t read much from the Cuil team on this end of things, which is always a serious area of concern for any search engine. Are they using the PaIR spam detection approach? Remains to be seen...

For the moment I am not going to write them off entirely and will dig deeper to see what positives there may be… if only for the fact that I was once a phrase based IR (PaIR) junkie.

If it is also using probabilistic learning, then it may (theoretically) get better over time. That would certainly explain holes early on with certain query spaces. I will do some more playing around with it next weekend.

 

Until then…more reactions;

 

Comments (4)
  • Internet Marketing Joy
    Although there are some problems with the site's result page..I still think that this SE will come a long way..^^
  • RedEvo  - Thought provoking
    Thanks for digging below the surface and providing some great insights and comments on Cuil. It's all too easy to dismiss others in a 'how could they top Google' kind of way. If we all thought that way all cars would be black.

    Thanks

    d
  • Chris Telfer
    I must say that I'm not a great fan of the results, as they appear just as spammy as Google. The other thing I would love to know is how their search engine technology decides which image to show next to the listing...as it doesn't work very well. I've seen buttons, bits of page headers and 'we take credit cards' seals. Another thing are the columns, I do not really like the look of that to be honest. It will be interesting to see it once they have ironed out the creases.
  • chris  - search engines
    It's been over a month now and I still don't think Cuil is returning good results. Keep on trying, Cuil... more search engine competition would be nice.