How relevant is Cuil?
Monday, 04 August 2008


Building a better search engine

We all know about the latest Google killer by now, right? You know, Cuil (pronounced ‘cool’)… the relevance-based search engine that boasts more pages in its index than the mighty peeps at the Plex?

While there has been no lack of FAIL floating around after its recent launch, there are a few reasons why this Gypsy might just take a deeper look.

The phrase-based IR connection

What I found interesting was that one Anna Patterson is on board... which in itself might not mean much, but spare me a few moments to explain. She’s one of the ‘ex-Googlers’ on the team at Cuil (along with hubby Tom). I know... it's all falling into place now, right? OK... a little further...

...let me quote from the management team page:

Anna was the architect of Google’s large search index, TeraGoogle, that launched in early 2006. While at Google, Anna was the technical lead of one of the two Web ranking groups at Google, in charge of GoogleBase, and the manager for the core piece of Google’s ad-matching technology. She joined Google in 2004 after designing, writing and selling Recall—the largest search engine in existence at the time at 12 billion pages.

Is Cuil using phrase-based IR?

 

And what made any of this remarkable to me was the fact that she has done a lot of work in the ‘phrase-based indexing and retrieval’ area, as her patents in that space show.

This is of interest because I covered those patents back in a set of posts in ‘06 (hat tip to Senor Slawski)… and held high hopes for such systems. To me it makes for a good approach, though one potentially limited without secondary signals… in my opinion at least. Now how does this play out with Cuil? Not sure, but judging from what they posted on their philosophy page, it seems it could be in the mix:

Cuil prefers to find all the pages with your keyword or phrase and then analyze the rest of the content on those pages. During this analysis we discover that your keywords have different meanings in different contexts. Once we’ve established the context of the pages, we’re in a much better position to help you in your search. - the Cuil Philosophy

In simplest terms, they are looking to rank documents via relevance, not popularity. This is certainly in line with the probabilistic modelling of the phrase-based approach, so one has to believe that it sits somewhere near the core of Cuil's indexing and ranking methods. But is it enough?
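To make "relevance, not popularity" a little more concrete, here is a minimal Python sketch loosely modelled on the co-occurrence idea in the phrase-based indexing patents. It is not Cuil's code and not the patents' exact method: it assumes each document has already been reduced to a set of phrases, scores phrase pairs by an information-gain ratio (how often two phrases actually co-occur versus how often they would by chance), then scores a page by the related phrases it contains. No link or popularity signal is involved. The function names, the `min_gain` threshold and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def related_phrases(docs, min_gain=1.2):
    """Find 'related' phrase pairs in a corpus (hypothetical sketch).

    docs: list of sets, each holding the phrases found in one document.
    Returns {(phrase_a, phrase_b): gain} for pairs whose information gain,
    actual co-occurrence rate / expected co-occurrence rate, is >= min_gain.
    """
    n = len(docs)
    doc_freq = Counter()  # number of documents containing each phrase
    co_freq = Counter()   # number of documents containing each phrase pair
    for phrases in docs:
        doc_freq.update(phrases)
        for pair in combinations(sorted(phrases), 2):
            co_freq[pair] += 1

    related = {}
    for (a, b), count in co_freq.items():
        actual = count / n
        # Rate we would expect if the two phrases occurred independently.
        expected = (doc_freq[a] / n) * (doc_freq[b] / n)
        gain = actual / expected
        if gain >= min_gain:
            related[(a, b)] = gain
    return related


def context_score(query_phrase, doc_phrases, related):
    """Sum the gains of the document's phrases that are related to the query.

    A purely on-page relevance score: no links, clicks or popularity.
    """
    score = 0.0
    for (a, b), gain in related.items():
        if query_phrase in (a, b):
            other = b if a == query_phrase else a
            if other in doc_phrases:
                score += gain
    return score


corpus = [
    {"jaguar", "big cat", "rainforest"},
    {"jaguar", "big cat"},
    {"jaguar", "car dealer"},
    {"car dealer", "used cars"},
]
rel = related_phrases(corpus)
print(context_score("jaguar", corpus[0], rel))  # animal-sense page
print(context_score("jaguar", corpus[2], rel))  # car-sense page
```

With a corpus this tiny the ratios are noisy; the point is only that the page using "jaguar" alongside "big cat" and "rainforest" scores on context, which is the kind of analysis Cuil's philosophy page describes.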

 

Is it really a FAIL?

It is truly hard to say, but early reviews (at the end of the post) and some of my own tinkering leave one wondering. Of the many problems I have with it so far, one is certainly the historical relevance of results. That is, there are many older documents (from 2004 and earlier) ranking in query spaces I know to be somewhat time-sensitive. This is certainly one area where the old ‘query deserves freshness’ angle plays out well for Google. I would question how on-page relevance deals with temporal issues: while topical relevance is great, historical signals are also important.
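Google has not published how ‘query deserves freshness’ works, so the following is only a generic Python sketch of the temporal problem described above. For a query assumed to be time-sensitive, a recency score (an exponential decay with a hypothetical 180-day half-life) is blended with on-page relevance; other queries fall back to relevance alone. Deciding which queries are time-sensitive is the hard part and is simply passed in as a flag here. The function name, weights and dates are hypothetical.

```python
from datetime import date


def blended_score(relevance, published, today, time_sensitive,
                  freshness_weight=0.5, half_life_days=180.0):
    """Blend on-page relevance with a recency signal (hypothetical sketch).

    relevance: on-page relevance score, assumed to lie in [0, 1].
    For time-sensitive queries, freshness decays exponentially and halves
    every half_life_days; all other queries use relevance alone.
    """
    if not time_sensitive:
        return relevance
    age_days = max(0, (today - published).days)  # clamp future dates to 0
    freshness = 0.5 ** (age_days / half_life_days)
    return (1 - freshness_weight) * relevance + freshness_weight * freshness


today = date(2008, 8, 4)
old_page = (0.9, date(2004, 8, 4))  # very relevant, but four years old
new_page = (0.7, date(2008, 7, 1))  # a bit less relevant, but recent

for sensitive in (False, True):
    old = blended_score(*old_page, today, sensitive)
    new = blended_score(*new_page, today, sensitive)
    print(sensitive, round(old, 3), round(new, 3))
```

Without the freshness flag, the four-year-old page wins on relevance alone; with it, the recent page moves ahead, which is exactly the behaviour a purely on-page approach misses.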

Another area I am bullish on is the use of user behavioural metrics as ranking signals, along with personalized search methods. These are certainly NOT going to be in the Cuil bag of tricks, as part of their mantra is not collecting such data:

Cuil analyzes Web pages and not click-throughs, we don’t need to know your search history and habits. So our privacy policy is very simple: when you search with Cuil, we do not collect any personally identifiable information, period. We have no idea who sends queries: not by name, not by IP address, and not by cookie. Your search history is your business, not ours.

While there are those who have privacy concerns about such data, it hasn’t stopped Google in its tracks, now has it? At the end of the day, these signals are important in understanding the user’s relationship with the search results and the sites listed. They shouldn’t be discounted. Much like links and historical signals, popularity has its place in the results, as does authority. Working from a relevance-centric approach may seem utopian, but it can’t carry the day.

I also found problems with duplicate results and with multiple results from a given site, something the indented listings of our familiar results format (as well as site links) handle better. There are a few problems out of the gate, to be sure…

 

Somebody get me a blender

Even if it doesn’t rule the space, we might just be able to take away some ideas that can improve the next generation of search…

As recently as a few weeks ago, I was writing about a world beyond links, in an effort to illuminate ranking signals we can consider beyond mere links, and the offering from Cuil seems light on methods beyond mere page relevance. I truly yearn for a search engine that gets the mix right. As for Cuil, we haven’t even begun to look at the spam-ability of this approach. I haven’t read much from the Cuil team on this front, which is always a serious area of concern for any search engine. Are they using the PaIR spam detection approach? That remains to be seen...

For the moment, I am not going to write them off entirely, and will dig deeper to see what positives there may be… if only because I was once a phrase-based IR (PaIR) junkie.

If it is also using probabilistic learning, then it may (theoretically) get better over time. That would certainly explain the early holes in certain query spaces. I will do some more playing around with it next weekend.

 

Until then… more reactions:

 

Comments
Internet Marketing Joy | 2008-08-07 03:28:28
Although there are some problems with the site's results page... I still think that this SE will come a long way. ^^
RedEvo - Thought provoking | 2008-08-07 15:41:24
Thanks for digging below the surface and providing some great insights and comments on Cuil. It's all too easy to dismiss others in a 'how could they top Google' kind of way. If we all thought that way, all cars would be black.

Thanks

d
Chris Telfer | 2008-08-07 18:52:46
I must say that I'm not a great fan of the results, as they appear just as spammy as Google's. The other thing I would love to know is how their search engine technology decides which image to show next to each listing... as it doesn't work very well. I've seen buttons, bits of page headers and 'we take credit cards' seals. Another thing is the columns; I do not really like the look of them, to be honest. It will be interesting to see it once they have ironed out the creases.
chris - search engines | 2008-10-09 21:54:41
It's been over a month now and I still don't think Cuil is returning good results. Keep on trying, Cuil... more search engine competition would be nice.