Shhhhhh be very quiet, we’re hunting information retrievers
Let's bounce off in a new direction, begin anew as it were, and look at what resides in the craniums that index the world’s information. I would like to introduce to you a concept (much like page segmentation) that isn’t a new one. It has been in front of you all this time, but like a black hatter at a dollar domain bazaar, you were too busy to notice.
As the regular Trail riders would know, we’ve gone from extreme interest in behavioural metrics and personalized search to more tempered views of potential usage. Yes, it’s true, I seem to have a bit of a personality disorder and waffling seems the call of the day…. to the uninitiated that is.
You see, it is in the regular index that implicit feedback signals seem more difficult to grasp than tumbleweed in a tornado. Time and time again this wayward web wanderer has mused that it is far more likely these signals could find value within personalized search than out in the wild.
And what do we know from our counterparts at Google? We know they’ve called them noisy and spammable, a claim that some research done here gives credence to. We also know the mantra for 09 (and recent years past) has been ‘personalization’. OK, this makes some sense and may be worth delving into deeper. But where do we look?
Time travelling SEO style
Let’s take a journey back in time… waaaay back, in tech years at least, to 2003. That year the little engine that could purchased a company named Kaltix (interestingly, a few months after the Applied Semantics purchase – of LSI fame).
At the time it was but a 3 month old operation put together by a few Stanford geeks that Larry Page (of PageRank fame lol) noted as, “working on a number of compelling search technologies, and Google is the ideal vehicle for the continued development of these advancements” – of particular interest was that they were “developing personalized and context-sensitive search technologies”. (Google’s press release).
Now, you see, Kaltix was headed up by an uber-smart fellow named Sepandar Kamvar (Sep). By no small coincidence Sep is now Google’s go-to guy (technical lead) in the personalized search department and works with iGoogle home pages as well. That all makes this Gypsy curious indeed… here’s a snippet from their Stanford PageRank Project:
“Ideally, each user should be able to define his own notion of importance for each individual query. While in principle a personalized version of the PageRank algorithm can achieve this task, its naive implementation requires computing resources far beyond the realm of feasibility. In the past couple of years, we have developed algorithms and techniques towards the goal of scalable, online personalized web search. Our focus is on the efficient computation of personalized variants of PageRank.” - Stanford PageRank Project
Enter Personalized PageRank
With this in hand we now look a little deeper and, sure enough, Sep had worked on more than a few papers, including this one on his personal site: An Analytical Comparison of Approaches to Personalizing PageRank
This is an interesting read that looks into the problems associated with a pure Personalized PageRank, as far as trying to calculate a more granular flavour of PR goes. As you can imagine, calculating a personalized PageRank for each of the billions of people on the web is a massive and resource-heavy endeavour that simply isn’t feasible in large-scale implementations such as Google. Thus trading off quality against feasibility is the call of the day, it would seem.
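For the code-curious among you, here is a minimal sketch of the general idea, not anything lifted from the paper or from Google. Classic PageRank lets the random surfer jump to any page with equal odds; a personalized flavour swaps that uniform jump for a per-user preference vector. The toy graph, the page names and the `teleport` parameter are all made up for illustration, and the sketch assumes the preference weights sum to 1 and every link target is itself a key in the graph.

```python
def personalized_pagerank(links, teleport, damping=0.85, iterations=50):
    """Power iteration where random jumps follow `teleport` instead of being uniform."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Start each round with the share of rank that arrives via random jumps.
        new_rank = {page: (1.0 - damping) * teleport.get(page, 0.0) for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly across the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks hands its rank back via the jump vector.
                for other in pages:
                    new_rank[other] += damping * rank[page] * teleport.get(other, 0.0)
        rank = new_rank
    return rank


# Hypothetical four-page web: each key links to the pages in its list.
TOY_WEB = {
    "news": ["sports", "tech"],
    "sports": ["news"],
    "tech": ["news", "gadgets"],
    "gadgets": ["tech"],
}

# Everyone's PageRank: jump anywhere with equal probability.
generic = personalized_pagerank(TOY_WEB, {page: 0.25 for page in TOY_WEB})

# One user's PageRank: this (imaginary) searcher only ever jumps to tech pages.
tech_fan = personalized_pagerank(TOY_WEB, {"tech": 0.5, "gadgets": 0.5})
```

Note that every distinct preference vector needs its own full run over the graph, which is exactly why doing this per user at web scale is out of reach and why the approaches that follow look for shortcuts.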
While I am sure things have evolved, let’s look at some of the approaches they talk about to help deliver acceptable quality while not overburdening the resource pool.
- Topic Sensitive PageRank: while not a direct personalization, it can be used to adapt rankings based on query topics and context. It would be calculated ahead of time and adapted at the time of a search using context elements of a given query.
- Modular PageRank: this aspect essentially restricts the random walk to more authoritative/trusted documents. So in a round-about way, not so random a journey ultimately.
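The topic-sensitive idea in the list above can be sketched roughly as follows: the expensive per-topic scores are computed ahead of time, and at query time they are merely blended according to how strongly the query leans toward each topic. This is a hedged illustration only; the topic names, page names, scores and the `blend_topic_ranks` helper are all invented, and a real engine would derive the topic weights from a query classifier rather than have them handed over.

```python
# Made-up per-topic scores, standing in for vectors computed offline.
TOPIC_RANKS = {
    "sports": {"scores.example": 0.6, "gadgets.example": 0.1, "news.example": 0.3},
    "tech": {"scores.example": 0.1, "gadgets.example": 0.6, "news.example": 0.3},
}


def blend_topic_ranks(topic_ranks, query_topic_weights):
    """Mix precomputed per-topic scores using the query's topic weights."""
    total = sum(query_topic_weights.values())
    blended = {}
    for topic, weight in query_topic_weights.items():
        for page, score in topic_ranks[topic].items():
            # Each topic contributes in proportion to its normalized weight.
            blended[page] = blended.get(page, 0.0) + (weight / total) * score
    return blended


# A query judged (hypothetically) to be three parts tech to one part sports.
blended = blend_topic_ranks(TOPIC_RANKS, {"tech": 3.0, "sports": 1.0})
best_page = max(blended, key=blended.get)
```

The query-time work is just a weighted sum over a handful of stored vectors, which is the whole appeal: the heavy lifting happens once per topic instead of once per user.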
Going beyond the work at Stanford, he also worked a while back on a patent for personalizing anchor text scores in a search engine (filed May 2004 and assigned August 2007), which you can find well covered by Bill as well as my old chum Aaron (aka the MadHat).
This patent deals with a variety of link analysis factors, including the use of user profiles (User information database) and personalization (Page importance ranking). Depending on the query and related user profile information, various documents in the results set can be given a boost (or demoted accordingly). Other factors may include past searches and selections from the user’s search history. It also discusses giving more weight based on anchor texts (link analysis), which weren’t really dealt with in traditional PageRank.
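To picture the boost-or-demote step, here is a deliberately crude sketch. It is not the patent’s method; the `rerank_for_user` function, the multipliers, the topic labels and the URLs are all assumptions made up for illustration. It simply nudges a result up when its topic matches an interest in the user’s profile and nudges it down otherwise.

```python
def rerank_for_user(results, profile_interests, boost=1.5, demote=0.8):
    """Re-order (url, base_score, topic) results using a user's interests."""
    reranked = []
    for url, base_score, topic in results:
        # Assumed rule: matching topics are boosted, everything else is demoted.
        factor = boost if topic in profile_interests else demote
        reranked.append((url, base_score * factor))
    # Highest adjusted score first.
    return sorted(reranked, key=lambda item: item[1], reverse=True)


# Hypothetical result set for one query, in its unpersonalized order.
RESULTS = [
    ("travel-deals.example", 1.0, "travel"),
    ("slow-cooker.example", 0.8, "cooking"),
]

# A searcher whose (imaginary) profile shows an interest in cooking.
personalized = rerank_for_user(RESULTS, {"cooking"})
```

Same query, same documents, different order depending on who is asking; that is the part worth keeping in mind.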
And Sep has also worked on the following patents:
Methods for ranking nodes in large directed graphs – filed August 2003 and assigned May 2007
Adaptive computation of ranking – filed August 2004 and assigned April 2006
Query boosting based on classification – filed Nov. 2004 and assigned Oct. 2008
It’s not just a Google thing
While this is an interesting trail to follow, it isn’t actually limited to Google (although Google do seem to take the lead in personalization). The folks at Yahoo have also looked at personalized PageRank, as noted in this patent: User Sensitive PageRank (filed June 2006 and assigned Jan. 2008). The authors of that one also worked on Anchor-based Proximity Measures (PDF) – 2007, which discusses both Personalized PageRank (PPR) and Harmonic Rank (HR - I prefer HarmonyRank). This particular flavour also brings behavioural data into the mix:
“The present invention relates to techniques for computing authority of documents on the World Wide Web and, in particular, to techniques for taking user behaviour into account when computing PageRank.”
As with the Google offerings, demographic (user profiles), behavioural and location data can be used in calculating a more granular user sensitive PageRank. They also discuss blocks as we noted earlier with some of Sep’s research and patent filing endeavours. The Yahoo papers even include a temporal factor (somewhat like Google’s query deserves freshness). As with the Google approach, they also discuss anchor text scoring as added layers of relevance.
Why am I telling you this?
You see my wayward SEO web wanderers, given all the interest in behavioural metrics and Google’s pimping of personalized search, this is definitely a path worth travelling. As I have mentioned many times, implicit user feedback makes far more sense in a personalized setting and so we must begin to look in this direction. Let the journey begin…
Considering what we know about the fascination with personalization and behavioural data, we might be best served by understanding more about how search engines are going about such systems. This is an important area of interest for SEOs in 2009 and a far better use of pixels than debating the use of singular implicit aspects such as bounce rates – I’m just sayin’
Stay tuned, as next time out we’ll look at interesting tidbits relating to personalized search, how it works and potential data collection points.
To start your own journey, also read:
Personalization gurus Sep Kamvar and Marissa Mayer – SearchEngineLand
Interview with Sep Kamvar – StoneTemple