Microsoft's take on user behavioural data
A search-related patent published by Microsoft the other day touches on a popular theme of late (with me, at least): user behaviour analysis. In simplest terms, they look at various interactions with the search results pages (SERPs) and listing pages to try to determine the relevance of a set of results. This has been a common theme among the Big 3, as noted in these recent posts:
- Google confirms using query analysis (use of past queries in the regular index)
- Google on User Performance Metrics (trio of patents on the subject)
- Yahoo Personalized PageRank (user behaviour and PageRank)
- Yahoo on Personalized Networks (user annotations used to populate networks)
...you get the idea...
While concepts utilizing user behaviour for ad serving have been around for a while, there is increasing interest in ways to harness it within the main SERPs. That is, beyond mere personalized-search implications, though such signals work best there.
The Microsoft approach
The patent: Search system using user behaviour data. Filed: December 2003. Published: April 22, 2008.
As with other approaches we've looked at, both explicit and implicit data can be collected. Meaning, data can be gathered from direct interactions (such as annotations, bookmarking, voting, etc.) or implied, culled from behavioural activities.
One interesting aspect of this patent that I don't recall seeing before is the attempt to account for time that appears to be spent on a result but may not be accurate, such as when a user has a web page open but is actually chatting on IM, writing in a Word doc, etc.,
as noted with:
the implicit feedback data comprises time spent reviewing a specific item of the results list, wherein the time spent is calculated by subtracting any time that a user switched to another application while reviewing the specific item
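That subtraction is easy to picture in code. Here is a minimal sketch of an adjusted dwell-time calculation over a timestamped event log; the event names and log format are my own invention, as the patent describes the behaviour rather than a schema:

```python
# Hypothetical event log: (timestamp_seconds, event_name) pairs.
# Event names are illustrative, not taken from the patent text.
EVENTS = [
    (0.0, "result_opened"),
    (5.0, "focus_lost"),     # user switched to another application (e.g. IM)
    (45.0, "focus_gained"),  # user switched back to the result
    (60.0, "result_closed"),
]

def adjusted_dwell_time(events):
    """Gross time on the result minus time spent in other applications."""
    total = away = 0.0
    opened = lost = None
    for ts, name in events:
        if name == "result_opened":
            opened = ts
        elif name == "focus_lost":
            lost = ts
        elif name == "focus_gained" and lost is not None:
            away += ts - lost
            lost = None
        elif name == "result_closed" and opened is not None:
            total = ts - opened
    return total - away

print(adjusted_dwell_time(EVENTS))  # 60s gross - 40s away -> 20.0
```

In this toy session the user had the page open for a minute but spent forty seconds of it elsewhere, so only twenty seconds count toward the dwell signal.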
Some examples given of potential data to track are:
- navigation to a new page using a hyperlink;
- navigation to a new page using a history list;
- navigation to a new page using an address bar;
- navigation to a new page using a favorites list;
- user scrolling behavior;
- user document printing behavior;
- adding a document to said favorites list;
- switching focus to a different application;
- switching focus back from a different application;
- and closing a window.
Some of the usual suspects, with interesting additions such as the switching between applications and the history-list selections.
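For illustration, that list of trackable actions could be encoded as a small event vocabulary and aggregated per session; the names and the session format below are my own, since the patent specifies no schema:

```python
from collections import Counter
from enum import Enum, auto

# One possible encoding of the implicit signals listed above
# (names are illustrative, not from the patent).
class ImplicitSignal(Enum):
    NAV_HYPERLINK = auto()
    NAV_HISTORY = auto()
    NAV_ADDRESS_BAR = auto()
    NAV_FAVORITES = auto()
    SCROLL = auto()
    PRINT_DOCUMENT = auto()
    ADD_TO_FAVORITES = auto()
    FOCUS_AWAY = auto()
    FOCUS_BACK = auto()
    CLOSE_WINDOW = auto()

def tally(session):
    """Aggregate a timestamped event stream into per-signal counts."""
    return Counter(signal for _, signal in session)

# A toy session: click through, scroll, drift off to IM, return, close.
session = [
    (1.2, ImplicitSignal.NAV_HYPERLINK),
    (8.7, ImplicitSignal.SCROLL),
    (15.3, ImplicitSignal.FOCUS_AWAY),
    (42.0, ImplicitSignal.FOCUS_BACK),
    (55.9, ImplicitSignal.CLOSE_WINDOW),
]
counts = tally(session)
```

Counts like these, alongside the adjusted dwell time, are the raw material an engine would feed into any downstream relevance model.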
The explicit data is described as a feedback system that could collect:
- a user rating of the quality or usefulness of the specific item reviewed from the results list and
- the user response to the at least one question concerning the results list as a whole
- and performing a context-dependent evaluation of the results of the search engine acquired during the search session
This survey element is interesting, as in the past explicit data has usually been a simpler proposition, such as ratings and direct actions, rather than the formal approach described in this filing. They could just be covering their bases, though. One potential collection method suggested is a dialog box to solicit the information:
regarding a search, a query, or a specific result, the user may be asked, via a dialog box, "Did this answer your question?" and allowed to enter a response. As another example, regarding a specific result which the user ignored, the user may be asked "Why didn't you try this result?" and given choices including "I didn't think this would answer my question."
Another method allows for a bulleted set of radio buttons in the side column, such as:
Would you say that (evaluating a result):
- This result answered your question
- This result somewhat answered your question
- This result did not answer your question
- You did not get a chance to evaluate this result (broken link, foreign language, etc.)
While an interesting undertaking, I personally put more stock in implicit signals and have to wonder whether some of this methodology is aimed more at specific focus groups for quality control than at the larger population of regular index users. The concepts seem bloated, and adoption would likely not be widespread enough to yield strong ranking signals.
The search for relevance goes on
As with most behavioural signals, the goal is to deliver more relevant results. Through analysis and cross-referencing of explicit and implicit data, a probabilistic model can be developed to establish the contextual usefulness/relevance of the query response for a given search.
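The patent doesn't specify a model form, but to make the idea concrete, here is one simple way such a probabilistic combination could look: a logistic function over a handful of implicit and explicit features. The feature names and weights are invented for the sketch, not taken from the filing:

```python
import math

# Illustrative feature weights -- made up for this sketch.
WEIGHTS = {
    "dwell_seconds": 0.05,   # adjusted time on result (implicit)
    "printed": 1.5,          # user printed the document (implicit)
    "added_favorite": 2.0,   # user bookmarked the result (implicit)
    "rating": 0.8,           # explicit 1-5 quality rating, 0 if absent
}
BIAS = -2.0  # baseline: a result with no signals scores low

def relevance_probability(features):
    """Logistic combination of implicit and explicit signals."""
    z = BIAS + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

# A result with 30s of adjusted dwell time and a 4/5 explicit rating:
p = relevance_probability({"dwell_seconds": 30, "rating": 4})
```

In a real system the weights would be learned from logged behaviour rather than hand-set, but the shape is the same: each observed signal nudges the estimated probability that the result was useful for that query.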
While there are only a few differentiators here from other user-behaviour methods we've seen before, the application-focus tracking and the explicit collection methods are interesting additions. There is no real discussion of how the data can be used beyond creating a relevance model for improving future search quality.
Considering this was filed in 2003 and has no real ranking applications associated with it, the real value here is more about cross-referencing these ideas with others in the space.
But is it a Social Behaviour?
With all this talk of social search of late (thanks Jason C... keep it up), I can't help but wonder why so many in the SEO community fail to fully grasp how search engines are seeking out passive behavioural data from users and groups of user types. For me this is a type of algorithmic social signal in many ways. At the very least, it's an interesting topic worth exploring further.
One thing is for sure: we're being watched ;0)