Putting behavioural metrics in perspective
“Wise men don’t judge: they seek to understand.” - Wei Wu Wei
So here’s the question: are behavioural metrics being used in modern search? You do remember them, right? Those warm and fuzzy little signals, such as bounce rates, that were all the rage in late 2008 in the search engine optimization world? Sure you do… but let’s take one last look.
Although bounce rates received the most attention, we would be remiss not to start by quickly listing some signals commonly looked at by information retrieval folks. They fall into two categories, implicit and explicit data (actions and interactions) – examples include:
- Query history (search history)
- SERP interaction (revisions, selections and bounce rates)
- User document behaviour (time on page/site, scrolling behaviour)
- Surfing habits (frequency and time of day)
- Interactions with advertising
- Demographics and geography
- Data from different applications (application focus – IM, email, reader)
- Closing a window
- Adding to favourites
- Voting (a la Search Wiki or toolbar)
- Printing a page
- Emailing a page to a friend (from the site)
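To make the list above a little more concrete, here’s a minimal sketch of what one logged implicit-feedback event might look like. Every field name here is an illustrative assumption of mine, not any engine’s actual schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record for a single implicit-feedback event covering
# the kinds of signals listed above. Field names are assumptions
# made for illustration only.
@dataclass
class ImplicitFeedbackEvent:
    query: str                              # the search query issued
    result_url: str                         # which result the event concerns
    rank: int                               # position of the result in the SERP
    action: str                             # e.g. "click", "bookmark", "print", "email"
    dwell_seconds: Optional[float] = None   # time on page, if known
    scrolled: bool = False                  # did the user scroll the document?
    session_id: str = ""                    # ties events into one search session

# A "bounce" might then be represented as a click with a very short
# dwell time before the user returns to the results page.
bounce = ImplicitFeedbackEvent(
    query="behavioural metrics",
    result_url="http://example.com/page",
    rank=1,
    action="click",
    dwell_seconds=4.0,
)
```

The point of the sketch is simply that each bullet above reduces to a logged event of roughly this shape before anyone can try to infer satisfaction from it.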
Now that we’re past that, let’s get a little geeky so those information retrievers don’t shake their heads too hard at us – the terminology. I am as guilty as the next Gypsy of flinging the term ‘behavioural metrics’ about over the last year or so, even ‘performance metrics’. If you want to research this more, start by searching for ‘implicit/explicit user feedback signals’ – because that’s what we’re talking about.
(Thanks to Steve Gerencser for sending the pic.)
Follow the bouncing timeline
While you can trace the timeline back many years in the search (blogging/reporting) world, it really came home when Search Engine Watch mentioned it (Oct. 2008), followed a few months later by a Search Engine Land post. Being the venerable publications that they are, grumblings around the SEO world soon followed. If you go and do some buzz monitoring and searching (which I have), you’ll find much of the talk began after that. The cracks in the dam began to fissure, and this Gypsy was left without enough chewing gum.
So what can we do? Where does one start to truly look for answers as to whether such methods are being implemented by the top public access search engines? It would stand to reason that we begin by looking at the information retrieval world itself. Over the last month I have given the community the benefit of the doubt and gone deeper to find some type of more definitive answer (list of research papers at the end).
Inherent problems with implicit signals
One thing that became obvious real fast is that the IR world is still not entirely sure of the value of implicit feedback signals when it comes to inferring engagement and satisfaction. While there is a long list of problematic areas, let’s consider:
- You save the link for later and continue your search (in a doc, let’s say)
- You found what you needed on the page and went looking for more information
- You walk away from your browser and leave the window on a page for an hour
- Multiple users in your home during a given session
- You open a listing in a new window (when further tracking is unavailable)
- You found the information in a SERP snippet and selected nothing
- You were unsatisfied with the page selected and dug 3 pages deeper (unsatisfied, not engaged)
- Queries from automated tools (like a rank checker) which add noise to the overall data
- SERP bias – do peeps simply click the top x results regardless of relevance?
- Different users having different understanding of the relevance of a document (result)
…and on and on. Think about it, some situations can tell the search engine you’re pleased with the results and other times such signals mean nothing. You see, the essential motive is to attempt to assign an emotional evaluation of engagement with the search results. Unfortunately there are too many noisy elements which make this a very difficult task to do effectively.
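To see just how muddy this gets, here is a toy version of the naive rule an engine might be tempted to use: short dwell means dissatisfied, long dwell means satisfied. The thresholds are arbitrary numbers I picked for illustration, and the scenarios above are exactly what breaks it:

```python
# Naive dwell-time rule (thresholds are arbitrary assumptions).
def naive_satisfaction(dwell_seconds: float) -> str:
    if dwell_seconds < 10:
        return "dissatisfied"  # a "bounce" -- but maybe the snippet answered them
    if dwell_seconds > 600:
        return "satisfied"     # long stay -- but maybe they walked away
    return "unknown"

# Identical signal, opposite realities:
walked_away = naive_satisfaction(3600)  # left the window open for an hour
engrossed = naive_satisfaction(3600)    # actually reading carefully
# The signal literally cannot tell these two users apart.
assert walked_away == engrossed == "satisfied"
```

The user who wandered off for lunch and the user who read every word produce the exact same number, which is the noise problem in a nutshell.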
Noise and confusion
It’s widely felt that ‘implicit feedback is more difficult to interpret and potentially noisy’, as noted in Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search (partially funded via a grant from Google). In looking at click behaviour, there was indeed a clicking bias based on a few elements:
“….First, we show that there is a “trust bias” which leads to more clicks on links ranked highly by Google, even if those abstracts are less relevant than other abstracts the user viewed. Second, there is a “quality-of-context bias”: the users’ clicking decision is not only influenced by the relevance of the clicked link, but also by the overall quality of the other abstracts in the ranking.”
Other research (on click data) looked at how users actually interact with search results as far as bias is concerned. People are often consistent in clicking patterns (clicking the top result, second, third) regardless of the underlying data. This means the entire data set can be skewed, as not clicking on the 8th result may not necessarily be a vote against the link in that result, but more of an ingrained habit on the part of the searcher.
“Our results show that click behaviour does not vary systematically with the quality of search results. However, click behaviour does vary significantly between individual users, and between search topics. This suggests that using direct click behaviour—click rank and click frequency—to infer the quality of the underlying search system is problematic.”
“Analysis of our user click data further showed that the action of clicking is not strongly correlated with relevance —only 52% of clicks in a search result list led to a document that the user actually found to be relevant. Attempts to use clicks as an implicit indication of relevance should therefore be treated with caution.” From - Using Clicks as Implicit Judgments: Expectations Versus Observations
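One common way the research tries to compensate for this position bias is a simple "examination hypothesis": a click requires the user to have actually looked at the position first, so raw click-through rate gets divided by how often each rank is examined. The examination probabilities below are made-up illustrative values (real ones would come from eye-tracking or result-randomization studies), so treat this as a sketch of the idea, not a real model:

```python
# Toy examination probabilities per rank -- illustrative assumptions,
# not measured values. Under the examination hypothesis:
#   P(click) = P(examine rank) * P(relevant)
EXAMINE_PROB = {1: 0.90, 2: 0.60, 3: 0.45, 4: 0.30, 5: 0.20}

def debiased_ctr(clicks: int, impressions: int, rank: int) -> float:
    """Estimate relevance by dividing raw CTR by how often
    the position is even looked at."""
    raw_ctr = clicks / impressions
    return raw_ctr / EXAMINE_PROB[rank]

# A rank-1 result and a rank-4 result with the same raw CTR:
top = debiased_ctr(clicks=90, impressions=1000, rank=1)
low = debiased_ctr(clicks=90, impressions=1000, rank=4)
# After correction the rank-4 result looks far more relevant,
# because users rarely examined it yet clicked anyway.
assert low > top
```

Even with a correction like this, the quotes above make clear that a click still only lines up with actual relevance about half the time, so debiasing the position effect is necessary but nowhere near sufficient.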
Beyond that many of the papers have various elements of implicit user feedback that they felt warranted more study. In short, there is no consensus in the IR community about the validity of these signals – they’re not ready for prime time.
The Spam connection
And this my friends, as they say, is the proverbial fly in the ointment. While there is a ton of research and even patents on behavioural metrics, dealing with click-spam has not been addressed in any detail to this point. Many of the papers openly admit they are light in the spam detection area and more research is needed.
“A natural question that arises in this setting is the tolerance of this method to noise in the training data, particularly should users click in malicious ways. While we used noisy real-world data, we plan to explicitly study the effect of noise, words with two meanings, and click-spam on our approach.” From - Query Chains: Learning to Rank from Implicit Feedback
And that’s just one; it was a common theme among the papers on the topic. This, for me, goes a long way toward understanding why it is premature to suggest the search engines we optimize for are using such signals. There is hope, though, as some tests run by Microsoft concluded:
“ranking accuracy decreases indeed when more documents are spammed, but the decrease is within a small range. When only a small number of documents are spammed per query, ranking accuracy is only slightly affected even if a large number of queries are spammed.” From- Are click-through data adequate for learning web search rankings?
They felt that such a large percentage of queries are long tail queries that it would be more difficult to effectively disrupt the majority of query spaces (I hear Ralph rumbling somewhere with that one). But once more, there seems to be a lot more work to be done in this area to effectively combat spam in such a system. To this we add thoughts from a Cornell paper:
“… it might also be possible to explore mechanisms that make the algorithm robust against “spamming”. It is currently not clear in how far a single user could maliciously influence the ranking function by repeatedly clicking on particular links.” From - Optimizing Search Engines using Click through Data – Cornell (pdf)
For me, there simply isn’t enough research or hard data to suggest that the spam issues related to implicit user feedback and click data have been solved. This is a crucial element to the case of them being used today by Google or anyone else.
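For what it’s worth, the simplest defence the literature hints at is rate-limiting: cap how many times one user’s clicks on one result can count. Here’s a minimal sketch of that idea; the cap, the event format, and the function name are all my own assumptions, and it obviously says nothing about distributed surfbot-style attacks:

```python
from collections import Counter

def filter_repeated_clicks(clicks, max_per_user_url=3):
    """Discard repeated clicks from the same user on the same result.

    clicks: iterable of (user_id, result_url) tuples.
    Returns only clicks up to the per-user, per-result cap.
    A toy illustration -- not a production spam defence.
    """
    seen = Counter()
    kept = []
    for user_id, url in clicks:
        seen[(user_id, url)] += 1
        if seen[(user_id, url)] <= max_per_user_url:
            kept.append((user_id, url))
    return kept

# One node hammering its own page 100 times contributes at most 3,
# while an honest single click passes through untouched:
spammy = [("botnet-node-1", "http://example.com/my-page")] * 100
honest = [("user-42", "http://example.com/other")]
kept = filter_repeated_clicks(spammy + honest)
assert len(kept) == 4
```

A cap like this handles only the crudest single-user case; anything coordinated across many identities defeats it, which is precisely the open problem the papers keep flagging.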
Not enough? Then also try this recent post by your friend and mine, CJ, on clickstream spam detection, or Fantomaster’s Behavioral Metrics and the Birth of SEO Surfbot Nets. Let’s get to the future now, shall we?
Getting beyond the geeky; looking to the future
Are we getting somewhere yet? Great… but it’s not all doom and gloom, no need to call the coroner just yet. You see, for the most part researchers have been finding some great improvements in search performance; they simply haven’t worked out all the values of such signals, nor the spam concerns. In an enterprise environment, where manipulation/spam is far less likely, implicit feedback can be a more useful tool. It is the larger public access environment, where spam is far more prevalent, where the nut has yet to be cracked.
I stand by my original assertion that this type of approach is best served in a personalized environment. This would go a long way in dealing with the apparent spam issues, as it is kinda’ hard to spam one’s self, you see. That makes personalized search a likely candidate for user feedback signals. Either way, it simply hasn’t been solved yet.
So what are we left with? Some noisy signals that are spammable… hmmm… where have we heard that before?
And so now I leave all of this in your capable hands my weary web warriors. If you can go through the research papers listed below (or elsewhere) and find me strong evidence of how they deal with noise reduction and click-spam, then we can discuss it further. That is my challenge to you; because from what is out there, it is not yet viable in a large scale environment.
I submit to you, my enthusiastic optimizers, that bounce rates and their implicit feedback brethren are simply not likely to be in Google’s (nor any major search engine’s) current ranking schemes. They are a novelty item at best, with potential in a personalized environment.
Care to dispute this? I am more than happy to review any research to the contrary.
Want to know what I think is actually causing what we believe we’re seeing? You’re just going to have to wait until next week.
“Muddy water, let stand becomes clear.” - Lao Tzu
Research looked at for this post;
Using Clicks as Implicit Judgments: Expectations Versus Observations - RMIT
Improving Web Search Ranking by Incorporating User Behaviour Information - Microsoft
Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search - Cornell (funded in part via grant from Google)
Learning user interaction models for predicting web search result preferences - Microsoft
Improving rankings in small-scale web search using click-implied descriptions - ICT Centre
Identifying “Best Bet” Web Search Results by Mining Past User Behavior - Microsoft
Using Clickthrough Data to Improve Web Search Rankings - Algorithms for Data Base Systems Seminar
Query Chains: Learning to Rank from Implicit Feedback - Cornell
Are click-through data adequate for learning web search rankings? - Microsoft / Nankai University
Automated Evaluation of Search Engine Performance via Implicit User Feedback - Pennsylvania State University
A Comparison of Evaluation Measures Given How Users Perform on Search Tasks – RMIT University
Modelling a User Population for Designing Information Retrieval Metrics (Dec. 2008) - Microsoft
Bayesian Adaptive User Profiling with Explicit & Implicit Feedback - USC
Users’ Effectiveness and Satisfaction for Image Retrieval – University of Sheffield
Accurately Interpreting Clickthrough Data as Implicit Feedback - Cornell
Active Exploration for Learning Rankings from Clickthrough Data - Cornell
Web Search Engine Evaluation Using Clickthrough Data and a User Model - Yahoo
Identifying Web Spam with User Behavior Analysis - Tsinghua University
Optimizing Search Engines using Click through Data – Cornell (pdf)
Be sure to visit - Fourth International Workshop on Adversarial Information Retrieval on the Web (2008)
and the 2007 stuff
Not enough for you? Here are some videos to pass the time away…
Implicit feedback learning in semantic and collaborative information retrieval systems - Gérard Dupont, EADS
This presentation tries to provide an overview of one way to resolve those gaps: feedback learning. The aim is to have the system learn from user behaviour in order to better define the user’s current needs. Machine learning algorithms applied to the signals coming from a user while performing a search can lead to an understanding of what is really relevant to that user, which can then be exploited to help them during their tasks.
User models from implicit feedback for proactive information retrieval - Samuel Kaski, University of Helsinki
Our prototype application is information retrieval, where the feedback signal is measured from eye movements or user’s behavior. Relevance of a read text is extracted from the feedback signal with models learned from a collected data set. Since it is hard to define relevance in general, we have constructed an experimental setting where relevance is known a priori.
Proactive Information Retrieval by User Modeling from Eye Tracking - author: Jarkko Salojärvi, Helsinki University of Technology
I now leave this to you.... where do you stand? Are search engines using behavioral metrics?