SEO Blog - Internet marketing news and views  

Real time search engines; should SEOs care?

Written by David Harry   
Thursday, 02 July 2009 11:41

The hype and the reality

(Update: more rambling on the topic in my latest post on real time social search)

One of the more popular buzz words in search over the last while is ‘real time search’. For starters, that’s a bit of a misnomer; there are NO actual ‘real time’ engines… that’s simply not possible (even Twitter search updates in intervals, not entirely in ‘real time’). Regardless, people keep getting worked up about this particular area to the point of wondering how it will affect SEO efforts.

Last week the folks at Media Post contacted me for some quotable quips for a piece on real-time search and (One Riot’s) PulseRank. They wanted to know if SEOs were considering this the ‘wave of the future’ or something new to the lexicon.

For their part, One Riot has said, "We believe PulseRank will replace PageRank over time for the real-time Web. The reasons are clear. PageRank is based on the number of links to a page or a specific URL builds over time, as people link to pages. It provides the searcher with the ‘authoritative answer’." These are some bold statements, and yet another in a long line of self-professed ‘Google Killers’, but is there credence? We’re going to look at the world of real time search and see if there really is anything to be looking at (as SEOs)… care to come along?

What is real time search?

Ok, for starters, the commonly known ‘real time’ search engines are often simply seeking out social mentions, not really crawling the web and indexing it. This, to me, is the first part of the problem… are they really search engines? Or simply buzz monitoring tools?

Real-time search is one of the more difficult areas for search engineers to deal with. Much of the problem lies in establishing the most authoritative answers without being bogged down by spam. Dealing with web spam requires a great many signals that are hard to come by in 'real time'. At the end of the day, if 'real time' search were an effective approach, Google and others would (likely) be doing it already.

There are more than a few problems associated with real time search, including:

  • Spam – this would be the most problematic area for any search engine. It would be nearly impossible to combat web spam in a large-scale environment. This is why most of the major engines haven’t moved beyond ‘almost real time’ (such as Google’s ‘query deserves freshness’ approach).
  • Ranking – as with QDF, there needs to be some type of evaluation of quality and authority to make ranking of documents effective. Some of these engines use domain and (social) user authority, but this approach does kill some of the democratic nature of the web: the rich get richer. What about lesser known domains and new users/content?
  • Social dependencies – most of these real time engines are reliant on social signals, which really does limit their abilities as search engines. They are NOT indexing pages that aren’t getting social luvin’. And despite popular belief, not ALL content on the web is social worthy, and this makes such search engines limited in scope.
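To make the ranking problem a bit more concrete, here is a rough sketch of what blending freshness with authority could look like. To be clear, this is NOT how QDF, PulseRank or any real engine works; the scoring formula, the half-life value, the authority numbers and the URLs are all made up for illustration.

```python
# Hypothetical sketch: none of this reflects how any real engine scores pages.

def blended_score(authority, age_minutes, half_life_minutes=60.0):
    """Authority (0..1) discounted by freshness, which halves every half-life."""
    freshness = 0.5 ** (age_minutes / half_life_minutes)
    return authority * freshness


def rank_by_recency(docs):
    """What the testing suggests some RT engines do: newest mention wins."""
    return sorted(docs, key=lambda d: d["age_minutes"])


def rank_blended(docs, half_life_minutes=60.0):
    """Freshness still matters, but an authority signal can outweigh it."""
    return sorted(
        docs,
        key=lambda d: blended_score(d["authority"], d["age_minutes"], half_life_minutes),
        reverse=True,
    )


# Made-up documents and authority values, purely illustrative.
docs = [
    {"url": "established-site/post", "authority": 0.9, "age_minutes": 120},
    {"url": "new-blog/post", "authority": 0.2, "age_minutes": 5},
    {"url": "spam/tweeted-link", "authority": 0.01, "age_minutes": 1},
]

print([d["url"] for d in rank_by_recency(docs)])
print([d["url"] for d in rank_blended(docs)])
```

With these made-up numbers, recency alone puts the spam link on top, while the blended score pushes it to the bottom. It also shows the 'rich get richer' issue: the established site outranks the fresher post from the new blog, and any such authority signal still has to come from somewhere.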

If we consider the above, there is a lot of work to be done if such real time search approaches are to ever be of value.

Rubber meets the road

In the testing we ran so far there is no real sense to the ranking mechanisms. In looking at some of the more general queries (in this case, ‘PageRank’) it seems that merely being the most recent citation is all that matters. Sadly, the post that ranked is devoid of any real content relating to PageRank and thus shouldn’t really be ranking. But it did… see the problem here?

One Riot - This is the application with the vaunted ‘PulseRank’, which claims to be a superior method to Google’s PageRank. Our early testing showed it to be generally inferior to Collecta, but in this test it did OK comparatively.

Test time; 49 Minutes - this was only AFTER it was Tweeted by someone that had my blog RSS hooked up to a Twitter account. What is interesting is this was the first one to list the actual blog post, not the Tweets.

One Riot test results

Scoopler - This one also tends to rely on Twitter, but it does pull out the most popular links, which include the actual blog post. This is a nice feature and of the social search engines, this one would be my fav….

Test time; 48 minutes; as with the above, this is obviously not REAL TIME and really only a social mention aggregator - though the actual post links are a nice touch.

Scoopler

Whos Talkin - For the record, my man Joe has never claimed to be running a real-time search engine or even a search engine for that matter. But considering many of these apps tend to be more buzz monitoring than search/indexing, I decided to include it in the mix.

Test Time; 50 minutes – and once again, the Twitter feeds are what it has picked up.

Whos Talkin

Crowd Eye - This is nothing more than a Twitter monitoring tool, and unless you have content Tweeted, it won’t show up. Thus, once more, this is not really a traditional search engine, just a limited buzz monitoring tool. And really, it’s little more than a pretty Twitter search… soooo…. FAIL.

Test Time; 56 Minutes - and it was the Twitter account with my feed in it. Furthermore, it only picked up 1 of the 3 Tweets for the post.

Crowd Eye

Collecta - Test Time; 49 minutes - This is another one that seems to be dominated by Twitter results more than anything else, although there are some blog results as well (for the record they claim: ‘We draw from the web at large - not just social networks.’). While this may be true, it would seem many of the ‘real time’ aspects relate to social mentions.

Collecta

Google – Test Time; 3hrs 15 minutes - Of the big 3 only Google managed to actually get the post indexed. And while it wasn’t as fast as the social engines, I’d still say that 3hrs is pretty good considering that there are more involved ranking methods and better quality of results. Obviously more popular sites that are crawled more frequently would be even faster.

Google

Google Blog Search – Test Time; 15hrs - Strangely, the blog search took considerably longer to index the page. Once more though, the sorting/ranking abilities make this a more usable search engine. I am going to do some testing with more popular blogs to see how short a time frame there can be with this one.

Google Blog Search

Yahoo – Test Time; 19hrs - Not only was Yahoo slower, but it never did get the right page and was picking up the title from my side bar links, not the actual page…. Have to give this one a FAIL.

Yahoo

Bing – Test Time; 23hrs - Much like Yahoo, not only did it take a fair length of time to index, but it also got the wrong page by picking up side bar links. Also a FAIL.

Bing

What is obvious is that the current crop of RT engines is wired for social. What that means is there is likely a great deal of content on the web that isn't getting a social mention and so wouldn't get picked up by them (unlike a traditional engine such as Google). This is a serious problem and makes the case that these are buzz tools rather than actual search engines that crawl the web and make decisions on levels of indexation for a given site/page. We also did some minor testing with pages on sites not getting social traction and, as you may imagine, the RT engines failed miserably.
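For those that like to see the numbers side by side, here is a quick sketch that lines up the test times reported above. The times are the ones from this single test on one blog post; the grouping into 'social' engines and the helper names are my own, so treat it as a convenience, not a benchmark.

```python
# Times to first appearance (in minutes), as reported in the test above.
# Note: Yahoo and Bing picked up side bar links, not the actual post.
results_minutes = {
    "Scoopler": 48,
    "One Riot": 49,
    "Collecta": 49,
    "Whos Talkin": 50,
    "Crowd Eye": 56,
    "Google": 3 * 60 + 15,
    "Google Blog Search": 15 * 60,
    "Yahoo": 19 * 60,
    "Bing": 23 * 60,
}

# Grouping is an assumption made for this summary only.
SOCIAL_ENGINES = {"Scoopler", "One Riot", "Collecta", "Whos Talkin", "Crowd Eye"}


def format_duration(minutes):
    """Render a minute count as hours and minutes, e.g. 195 -> '3h 15m'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins:02d}m"


def fastest_first(results):
    """Return (engine, minutes) pairs sorted from fastest to slowest."""
    return sorted(results.items(), key=lambda item: item[1])


def average_minutes(results, engines):
    """Average time across the named engines."""
    times = [results[name] for name in engines]
    return sum(times) / len(times)


for name, minutes in fastest_first(results_minutes):
    print(f"{name:<20} {format_duration(minutes)}")

social_avg = average_minutes(results_minutes, SOCIAL_ENGINES)
print(f"Social engines average: {social_avg:.1f} minutes")
```

Even the fastest of the social tools took the better part of an hour, and only after the post had been Tweeted, which is hardly 'real time'.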

 

Nothing more than social regurgitation

What is really happening with the current stock of RT engines? For the most part this is social mention regurgitation rather than any type of formal crawling/indexation. That is certainly NOT what a search engine is or does…

And that's essentially what most of this so-called real time search is: not so much real time indexation and ranking as buzz monitoring or glorified Twitter search applications.


What ARE the big three doing?

They seem to be looking at social structures as spheres of influence. Google has a few patents on a system that has been dubbed 'FriendRank' and 'InfluencerRank' (though there is no wording as such in the patents), which could hint at a social structure and ranking system. As does Microsoft, with a related system.

Now, these approaches are primarily for ad targeting and recommendation engines, but they do seem to show the direction that search engines may take, in concert with traditional search approaches, to develop some type of 'social search' aspect. Integrating social signals into the existing approaches does seem the logical path ultimately. Sure, it's more social search than real-time, but it does help solve some of the problems we outlined earlier.



The verdict

Real time search is still something that hasn't been effectively conquered from a technical and critical mass vantage point. As such, it shouldn't be a serious consideration for any SEO beyond the potential for buzz monitoring. In many ways these are barely what we would even call a search engine traditionally. If the big players are to ever broach this realm, much work would need to be done (especially with web spam).

Verdict? Should SEOs be concerned with real time search optimization? Not at this point… It's a passing fancy, and until a serious engine (that drives traffic) embarks on a real-time adventure, SEOs are best to watch from the wings. And to those that tout Twitter search as an RT search engine, I submit that it is nothing more than a site search... not really a search engine.

 

More reading:

Real time search off - TechCrunch

Twitter’s real-time spam problem - Search Engine Land

Race is on for best real-time search engine - Seattle PI

Who rules real time search? - Venture Beat

Bing Keeps Its Foot On The Gas, Adds Tweets To Results - TechCrunch

 
