The hype and the reality
(Update: more rambling on the topic in my latest post on real time social search)
One of the more popular buzz words in search over the last while is real time search. For starters, that's a bit of a misnomer; there are NO actual real time engines. That's simply not possible (even Twitter search updates in intervals, not entirely in real time). Regardless, people keep getting worked up about this particular area, to the point of wondering how it will affect SEO efforts.
Last week the folks at Media Post contacted me for some quotable quips for a piece on real-time search and (One Riot's) PulseRank. They wanted to know if SEOs were considering this the wave of the future or just something new to the lexicon.
For their part, One Riot has said, "We believe PulseRank will replace PageRank over time for the real-time Web. The reasons are clear. PageRank is based on the number of links to a page or a specific URL builds over time, as people link to pages. It provides the searcher with the "authoritative answer --
These are some bold statements, and yet another in a long line of self-professed Google Killers, but is there credence? We're going to look at the world of real time search and see if there really is anything to be looking at (as SEOs). Care to come along?
What is real time search?
OK, for starters, the commonly known real time search engines I see are simply seeking out social mentions, not really crawling the web and indexing it. This, to me, is the first part of the problem: are they really search engines? Or simply buzz monitoring tools?
Real-time search is one of the more difficult areas for search engineers to deal with. Much of the problem lies in establishing the most authoritative answers without being bogged down by spam. Dealing with web spam requires a great many signals that are hard to come by in 'real time'. At the end of the day, if 'real time' search were an effective approach, Google and others would (likely) be doing it already.
There are more than a few problems associated with real time search, including:
- Spam: this would be the most problematic area for any search engine. It would be nearly impossible to combat web spam in real time in a large scale environment. This is why most of the major engines haven't moved beyond 'almost real time' (such as Google's 'query deserves freshness' approach).
- Ranking: as with QDF, there needs to be some type of evaluation of quality and authority to make ranking of documents effective. Some of these engines use domain and (social) user authority, but this approach does kill some of the democratic nature of the web, and the rich get richer. What about lesser known domains and new users/content?
- Social dependencies: most of these real time engines are reliant on social signals, which really does limit their actual abilities as a search engine. They are NOT indexing pages that aren't getting social luvin. And despite popular belief, not ALL content on the web is social worthy, and this makes such search engines limited in scope.
If we consider the above, there is a lot of work to be done if such real time search approaches are to ever be of value. But...
Rubber meets the road
In the testing we ran so far, there is no real sense to the ranking mechanisms. In looking at some of the more general queries (in this case, PageRank) it seems that merely being the most recent citation is all that matters. Sadly, the post itself is devoid of any real content relating to PageRank and thus shouldn't really be ranking. But it did... see the problem here?
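To make the problem concrete, here is a minimal sketch contrasting a "most recent citation wins" ordering with one that also weighs quality. Everything in it is an assumption for illustration: the `Mention` fields, the 0 to 1 authority and relevance scores, the 60 minute half-life and the equal weighting are all hypothetical, and this is not how any of the engines tested here actually rank.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    url: str
    age_minutes: float  # how long ago the mention appeared
    authority: float    # hypothetical 0..1 quality/authority score
    relevance: float    # hypothetical 0..1 topical relevance to the query


def rank_by_recency(mentions):
    """Newest first, ignoring quality entirely."""
    return sorted(mentions, key=lambda m: m.age_minutes)


def rank_blended(mentions, half_life_minutes=60.0):
    """Relevance scaled by an equal mix of authority and decaying freshness."""
    def score(m):
        # Freshness halves every half_life_minutes (an assumed decay curve).
        freshness = 0.5 ** (m.age_minutes / half_life_minutes)
        return m.relevance * (0.5 * m.authority + 0.5 * freshness)

    return sorted(mentions, key=score, reverse=True)


# Made-up data: a fresh post that barely mentions the topic,
# and an older post that actually covers it.
mentions = [
    Mention("example.com/offhand-mention", age_minutes=2, authority=0.2, relevance=0.1),
    Mention("example.com/pagerank-explained", age_minutes=90, authority=0.8, relevance=0.9),
]

print([m.url for m in rank_by_recency(mentions)])
print([m.url for m in rank_blended(mentions)])
```

With recency alone, the off-hand mention comes out on top simply because it is two minutes old; once relevance and authority enter the score, the older but substantive post wins. The catch, of course, is that those quality signals are exactly what is hard to come by in 'real time'.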
One Riot - This is the application with the vaunted PulseRank, which claims to be a superior method to Google's PageRank. Our early testing showed it to be generally inferior to Collecta, but in this test it did OK comparatively.
Test time: 49 minutes - this was only AFTER it was Tweeted by someone that had my blog RSS hooked up to a Twitter account. What is interesting is that this was the first one to list the actual blog post, not the Tweets.
Scoopler - This one also tends to rely on Twitter, but it does pull out the most popular links, which does list the actual blog post. This is a nice feature and, of the social search engines, this one would be my fav.
Test time: 48 minutes - as with the above, this is obviously not REAL TIME and really only a social mention aggregator, though the actual post links are a nice touch.
Whos Talking - For the record, my man Joe has never claimed this to be a real-time search engine, or even a search engine for that matter. But considering many of these apps tend to be more buzz monitoring than search/indexing, I decided to include it in the mix.
Test time: 50 minutes - and once again, the Twitter feeds are what it has picked up.
Crowd Eye - This is nothing more than a Twitter monitoring tool, and unless you have content Tweeted, it won't show up. Thus, once more, this is not really a traditional search engine, just a limited buzz monitoring tool. Really, it's a pretty Twitter search.
Test time: 56 minutes - and it was the Twitter account with my feed in it. Furthermore, it only picked up 1 of the 3 Tweets for the post.
Collecta - Test time: 49 minutes - This is another one that seems to be dominated by Twitter results more than anything else, although there are some blog results as well (for the record, they claim: "We draw from the web at large - not just social networks."). While this may be true, it would seem much of the real time aspect relates to social mentions.
Google - Test time: 3hrs 15 minutes
Of the big 3, only Google managed to actually get the post indexed. And while it wasn't as fast as the social engines, I'd still say that 3hrs is pretty good considering that there are more involved ranking methods and a better quality of results. Obviously, more popular sites that are crawled more frequently would be even faster.
Google Blog Search - Test time: 15hrs
Strangely, the blog search took considerably longer to index the page. Once more though, the sorting/ranking abilities make this a more usable search engine. I am going to do some testing with more popular blogs to see how short a time frame there can be with this one.
Yahoo - Test time: 19hrs
Not only was Yahoo slower, but it never did get the right page and was picking up the title from my side bar links, not the actual page. Have to give this one a FAIL.
Bing - Test time: 23hrs
Much like Yahoo, not only did it take a fair length of time to index, but it also got the wrong page by picking up side bar links. Also a FAIL.
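For a side-by-side view, the test times reported above can be normalized to minutes and sorted. This is just a small convenience sketch: the figures are copied from the results above, while the `to_minutes` helper and the dictionary layout are my own hypothetical choices, not part of any tool used in the test.

```python
import re

# Test times exactly as reported in the write-up above.
REPORTED = {
    "One Riot": "49 minutes",
    "Scoopler": "48 minutes",
    "Whos Talking": "50 minutes",
    "Crowd Eye": "56 minutes",
    "Collecta": "49 minutes",
    "Google": "3hrs 15 minutes",
    "Google Blog Search": "15hrs",
    "Yahoo": "19hrs",
    "Bing": "23hrs",
}


def to_minutes(text):
    """Convert strings like '3hrs 15 minutes' or '49 minutes' to total minutes."""
    hours = re.search(r"(\d+)\s*hrs?", text)
    minutes = re.search(r"(\d+)\s*min", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


# Fastest first; ties keep the order in which they were listed.
by_speed = sorted(REPORTED, key=lambda name: to_minutes(REPORTED[name]))

for name in by_speed:
    print(f"{name:<20}{to_minutes(REPORTED[name]):>6} min")
```

Keep in mind the numbers only measure how long it took the post to show up at all; they say nothing about whether the right page was listed, which is where Yahoo and Bing fell down.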
What is clearly obvious is that the current RT engines are wired for social. What that means is there is likely a great deal of content on the web that isn't getting a social mention and so wouldn't get picked up by them (unlike with a traditional engine such as Google). This is a serious problem and makes the case that these are buzz tools rather than actual search engines that crawl the web and make decisions on levels of indexation for a given site/page. We also did some minor testing with pages on sites not getting social traction and, as you may imagine, the RT engines failed miserably.
Nothing more than social regurgitation
What is really happening with the current stock of RT engines? For the most part, these are social mention regurgitation more than any type of formal crawling/indexation. That is certainly NOT what a search engine is or does.
And that's essentially what most of this so-called real time search is. These tools are not as much about real time indexation and ranking as they are about buzz monitoring, or they are glorified Twitter search applications.
What ARE the big three doing?
They seem to be looking at social structures as spheres of influence. Google has a few patents on a system that has been dubbed 'FriendRank' and 'InfluencerRank' (though there is no wording as such in the patents), which could hint at a social structure and ranking system. Microsoft is doing the same with a related system.
Now, these approaches are primarily for Ad targeting and recommendation engines, but they do seem to show more of the direction that search engines may take, in concert with traditional search approaches, to develop some type of 'social search' aspect. Integrating social signals into the existing approaches does seem the logical route ultimately. Sure, it's more social search than real-time, but it does help solve some of the problems we outlined earlier.
Real time search is still something that hasn't been effectively conquered from a technical and critical mass vantage point. As such, it shouldn't be a serious consideration for any SEO beyond the potential for buzz monitoring. In many ways, these are barely what we would even call a search engine traditionally. If the big players are to ever broach this realm, much work would need to be done (especially with web spam).
Verdict? Should SEOs be concerned with real time search optimization? Not at this point. It's a passing fancy, and until a serious engine (that drives traffic) embarks on a real-time adventure, SEOs are best to watch from the wings. And to those that tout Twitter search as a RT search engine, I submit that it is nothing more than a site search... not really a search engine.