Modeling Google's Florida Update and Universal Search
(the following is a guest post from Terry Van Horne)
A number of papers have come out of the SIGIR conference, and for me one of the most interesting is the paper from the Microsoft research team using Conditional Random Fields (CRF) and contextual analysis (pdf) to determine query classification. The paper discusses some interesting uses for click data and query "labeling". I found this paper especially intriguing in that as I read through it I kept thinking I was reading a Google paper about Universal Search.
Once I do an overview of the paper I'll write a bit about what I believe is similar and in use at Google. I will base my conjecture... er... thesis on the belief that the Florida update added a taxonomy to the algo for the purpose of classifying queries. Further to that, the current Universal Search algo is just the next step in refining those query classifications, and it was done in a fashion similar to what the Microsoft paper describes. These are observations suggested by the current blended/Universal Search SERPs.
I'm going to first go through the highlights of the paper and try to give you the elevator version. I'm not as interested in formulas or IR geek stuff because, well, it doesn't really help me do what I do. The thing to keep in mind about papers is that they are research: they indicate problems that someone is trying to solve and the methods that can be used to solve them. As a webmaster I want to be part of that solution by aiding the search engines when I'm building a site. Note: aiding, not manipulating... there's a difference.
The Microsoft team chose some interesting features of CRF, and there is no use in me explaining when they summed it up pretty well themselves...
To answer these questions, we propose to use the Conditional Random Field model (CRF for short) to help incorporate the search context information. We have several motivations for using this model. First, CRF is a sequential learning model which is particularly suitable for capturing the context information of queries. Second, the CRF model does not need any prior knowledge for the type of conditional distribution.
Finally, compared with Hidden Markov Models, the CRF model is more flexible to incorporate richer features, such as the knowledge of an external Web directory.
...the advantages of using CRF
"When we use the CRF to model a search context, one of the most important parts is to choose the effective feature functions. In this section, we introduce the features used for building a CRF model of the search context for QC. In general, the features can be divided into two categories. The features that do not rely on the context information are called local features, and those that are dependent on context information are called contextual features."
Three local features are query, pseudo feedback and implicit feedback. I'll try and describe the problem that the research is trying to solve in layman's terms.
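To make the local-versus-contextual distinction concrete, here's a toy sketch of how the two feature families fit into a linear-chain CRF over a session of queries. The category names, queries, feature definitions and weights are all mine, invented for illustration, not anything from the paper:

```python
# Illustrative sketch (my notation, not the paper's) of the two feature
# families in a linear-chain CRF over a session of queries q1..qn with
# category labels y1..yn. Local features look only at the current query;
# contextual features also look at the previous label in the session.

def local_feature(query, label):
    # e.g. "does the query contain a term strongly associated with the label?"
    return 1.0 if label == "Autos" and "car" in query else 0.0

def contextual_feature(prev_label, label):
    # e.g. "adjacent queries in a session tend to stay in the same category"
    return 1.0 if prev_label == label else 0.0

def score_labeling(session, labeling, w_local=1.0, w_ctx=0.5):
    """Unnormalized CRF score of one candidate label sequence."""
    total = 0.0
    for i, (query, label) in enumerate(zip(session, labeling)):
        total += w_local * local_feature(query, label)
        if i > 0:
            total += w_ctx * contextual_feature(labeling[i - 1], label)
    return total

session = ["jaguar price", "jaguar car dealers"]
print(score_labeling(session, ["Autos", "Autos"]))    # 1.5: local hit + context agreement
print(score_labeling(session, ["Animals", "Autos"]))  # 1.0: local hit only
```

A real CRF would learn the weights and normalize over all possible label sequences; the point here is just that the contextual features are what let the second query's label pull on the first one's.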
Query is the starting point for refining the association between query term and classification. The reason the query needs the contextual element was summed up well in the paper:
The available training data are usually with limited size and could not cover a sufficient set of query terms that are useful for reflecting the association between queries and category labels.
Long story short, there are not enough query terms associated with category labels to be useful, so the association has to be refined by adding pseudo and implicit feedback.
Pseudo feedback is basically using a taxonomy from the web, for example ODP or Yahoo!, where the directory category results are mapped to a category in a target taxonomy and a confidence score is calculated. I will admit the confidence score math took me a bit to understand, so here is an excerpt that, IMO, explains it pretty well:
Finally, we calculate a general label confidence score: GConf(ct, qt) = Mct,qt / M
where Mct,qt means the number of returned related search results of qt whose category labels are ct after mapping. Intuitively, the GConf score reflects the confidence that qt is labeled as ct gained from pseudo feedback; the larger the score, the higher the confidence.
In a nutshell, a confidence score is attached to the query term based on how many search results matched the category term from the outside/target taxonomy. IMO, this does not have to be a web directory; in fact, for some queries an offline taxonomy might be a better choice, for example the occupational/professional categories used by human resource organizations.
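As a rough illustration of that nutshell version, here's the GConf calculation in a few lines of code. The query, categories and counts are made up by me for the example; only the formula itself comes from the paper:

```python
# Rough sketch of the pseudo-feedback confidence score from the paper:
# GConf(ct, qt) = Mct,qt / M, where M is the number of related results
# returned for query term qt, and Mct,qt is how many of them map to
# category label ct in the target taxonomy.

def gconf(mapped_labels, target_label):
    """Fraction of returned results whose mapped category is target_label."""
    m = len(mapped_labels)
    if m == 0:
        return 0.0
    m_ct = sum(1 for label in mapped_labels if label == target_label)
    return m_ct / m

# Say the directory returned 10 results for qt = "jaguar" and, after
# mapping to the target taxonomy, their labels came out like this:
labels = ["Autos"] * 6 + ["Animals"] * 3 + ["Sports"] * 1
print(gconf(labels, "Autos"))    # 0.6 -> fairly confident "jaguar" is an Autos query
print(gconf(labels, "Animals"))  # 0.3
```

The larger the fraction, the more confident the system is that the query term belongs to that category, which is exactly the intuition the excerpt above describes.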
Implicit (user) Feedback:
An interesting aspect of this research was the use of logs from the web directory to refine the confidence score with user data. Logs reduce the chance of spam and make this useful for a "compiled confidence score". Where user data like this gets tricky is that some people will inflate clicks once the method becomes common knowledge among marketers.
Direct Hit tried to use click data many years back. Let's just say there were/are lots of rumors as to who may have been gaming it with scripting. That said, for the purpose of smoothing the category terms this seems a reasonable solution. For those with the low forehead: this stuff is basically compiled, so clicking on links in ODP is not going to affect your Google rankings! There, I just saved some forum mod a few hours of time.
One debate within the SEO community is the use of bounce rate data in rankings. Take the scenario above, where "historical click data" is used to smooth the category terms. I can see where "seasonal" bounce rates in particular could appear to affect the rankings of a group of similar sites, but I would say it's a myth that individual site bounce rates do. IMO, in the scenario outlined above you would see similar sites drop at the same time, since they would share the same category term and be affected equally.
IMO, Google Universal Search has this characteristic. To do it the way it is talked about in the community would be... well... very expensive in computing terms. Just an opinion and not worth debating.
The third element of the local features is building a Vector Space Model. Here's a quote from the paper on its purpose. This is extremely boring stuff but obviously interesting to any IR professional:
(...) we build a Vector Space Model (VSM) for each category from its document collection and make the cosine similarity between the term vector of ct and the term vector of ut as CConf(ct, ut). The snippets of the web pages are used for generating the term vectors... If a user does not click on any URL for qt, or qt is the current query to be classified, this score cannot be calculated.
In layman's terms, they do some fancy math and build a model that has one flaw: if the user didn't click any sites in the web results, or the query is the current one being classified, the final confidence score cannot be calculated.
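For the curious, the "fancy math" is less scary than it sounds. Here's a toy sketch of the cosine-similarity step using raw term counts and made-up snippet text; the paper's actual term weighting may well differ, so treat this as the shape of the idea rather than their implementation:

```python
# Toy sketch of the Vector Space Model step: build term vectors for a
# category (from its document collection) and for a clicked URL's snippet,
# then take their cosine similarity as CConf(ct, ut). Raw term counts are
# used here for simplicity.
import math
from collections import Counter

def term_vector(text):
    return Counter(text.lower().split())

def cosine(v1, v2):
    dot = sum(v1[t] * v2[t] for t in v1)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # the flaw from the quote: no clicks, no vector, no score
    return dot / (norm1 * norm2)

# Invented example: a category's document text vs. one clicked snippet.
category_vec = term_vector("used cars car dealers auto parts car reviews")
snippet_vec = term_vector("car reviews and used car prices")
print(round(cosine(category_vec, snippet_vec), 3))  # 0.671
```

The closer the score is to 1, the more the clicked snippet looks like the category's own documents, which is how a click becomes evidence for a classification.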
The last part of this experiment is to implement the contextual features. Below are quotes I felt best summed up the contextual features.
Since there is no existing approach for query classification that takes into account the context information, we design a naive context-aware approach as the second baseline to further evaluate the modeling power of CRF in this problem.
Yada... Yada... no one has used the context of the search to smooth the query classification.
To use the context information, we consider some features that can reflect the association between adjacent category labels.
To reduce the bias of training data, besides considering the feature of direct association between adjacent labels, we also consider the structure of the taxonomy. Intuitively, the association between two sibling categories is stronger than that of two non-sibling categories.
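That sibling intuition is easy to picture in code. The tiny taxonomy and the scores below are invented by me purely to illustrate the idea of giving adjacent labels a stronger association when they sit under the same parent:

```python
# Toy illustration of the taxonomy-structure feature: adjacent labels get
# a stronger association score when they are siblings (share a parent)
# than when they come from unrelated branches. Taxonomy is invented.

PARENT = {
    "Autos/Parts": "Autos",
    "Autos/Dealers": "Autos",
    "Sports/Football": "Sports",
}

def sibling_feature(label_a, label_b):
    if label_a == label_b:
        return 1.0   # same category: strongest tie
    if PARENT.get(label_a) and PARENT.get(label_a) == PARENT.get(label_b):
        return 0.5   # siblings under the same parent
    return 0.0       # unrelated branches

print(sibling_feature("Autos/Parts", "Autos/Dealers"))   # 0.5
print(sibling_feature("Autos/Parts", "Sports/Football")) # 0.0
```

In the paper's setting this kind of score would be one more contextual feature feeding the CRF, so a session that drifts from "Autos/Parts" to "Autos/Dealers" stays plausible while a jump to "Sports/Football" costs more.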
...the bridging classifier introduced by Shen et al. The idea of this approach is training a classifier on an intermediate taxonomy and then bridging the queries and the target taxonomy in the online step of QC. Experiments show this approach outperforms the winning approach in KDD Cup'05.
Let's take the query classifications we compiled, compare them to the search query, and refine the query classification with each new search query. To some degree, real-time personalization.
The Microsoft team implemented a method that surprised me: they used three "human labelers" to test whether search context affected the ability to classify queries.
A query's final label is voted by the three labelers. Since each query is associated with context information (except for the beginning queries of sessions) and real user clicks which can help determine the meaning or intent of the query, the consistency among the labelers is quite high. For more than 90% queries, the three labelers give the same labels. This is very different from the general query classification problem.
Outcome: this stuff works!
One of the benefits of having some interest in the real "science of SEO" or IR is that it not only fills the time I'd spend watching the idiot box, it gives me insight into what I should be doing as an SEO. If Universal Search isn't screaming "I like video, pictures, news, local locations and product feeds" at you, then there's a seat waiting for you at the "home for washed-up SEOs".
In fact, text relevancy has less value than at any time I remember. The number of blue slots on the page for text/link manipulators is limited and has decreased by over 40% since I semi-retired in 2004. That in turn means links are less valuable and real marketing is where the game is being played. Mike Grehan has been alluding to this for over a year.
Product/shopping, News/blogs/articles, Local Search/Maps, video and pictures are a lot more about managing content development and marketing strategy than manipulating Search engine algos and chasing links. IMO, if you gotta' chase/beg links... you likely don't deserve them. My bias is out there for you all to know and chide me, so... let the fun begin.
So, first let me warn you: this is pure conjecture. I'm not an engineer at Google, and at the very most, if I'm right, it was part luck and part watching Google and SE SERPs for many years. The paper I attempted to walk you through is, IMO, a model of the Florida update and possibly how Universal/blended search evolved.
Evolution of Universal Search
To some degree there are strong ties between the ODP (owned by AOL) and Google. It has long been discussed in some circles that the Florida update added a taxonomy, or topics/classification, to queries on Google. I saw it in anomalies which, when you find them, are like a red flag on the results. I knew Google was updating the index whenever these showed up in one of my "legacy SERPs".
A legacy SERP is one I have watched forever; I know every site in it, and when new ones show up they get closely reviewed, because that is always a clue as to what is working content-wise, or what the "in vogue spam method" is. To be truthful, sometimes it's a fine line between the two.
Query classification, or QC, is an interesting concept in that doing it algorithmically is difficult. Kind of like hidden text: it seems easy enough to stop, but at what cost? Too many false positives.
Query classification makes it much harder to obtain top positions just with link text, because all the keywords on the page are mapped to a classification and a classification to a query; hence, it becomes pretty tough for pages to rank that don't have the term on the page. Sure, it happens, but are they terms that matter? It also takes a massive amount of links to do it. Feasible, yes; plausible... sure, if you've got shit for brains and are bored stiff!
The Human Touch
The human labeling in the Microsoft research was one piece of the experiment that I didn't see the point of at first. Then I realized it was there to smooth the web/target taxonomy query classifications.
The methodology there was also interesting in that labelers would act like a person engaged in searching. The research used old Excite data; if memory serves me right, Excite used to make data available for research on their site.
Google was on a hiring frenzy for a while, and there were questions about what all those people were doing... labeling? Was Google employing labelers? The other day I sat in on a seminar with my old amigo Marshall D Simmonds, Greg Boser and Todd Malicoat. They came to roughly the same conclusion as me: thousands of people were QC-labeling Google SERPs. Where the Google method differs, I think, is that they also wanted a variety of media that can't really be ranked algorithmically, so labelers were also slotting media.
Refining the QC provided by the web taxonomy is, IMO, what Google Universal/Blended/Personalized Search is all about. When I look at the Google SERP, there aren't ten links; there are ten slots, sometimes more. As a marketer and SEO I now have to consider media type and brand message when I'm planning a marketing strategy for an audience, keyword term or promotion I want to position in the results. SEO is as much about PR and branding as it is about links and on-page optimizing. That will be reflected in the "socialization of SERPs" by social media.
Related Research
Query type classification for web document retrieval
A unified and discriminative model for query refinement
Hourly analysis of a very large topically categorized web query log
Automatic Web Query Classification Using Labeled and Unlabeled Training Data
Improving Automatic Query Classification via Semi-supervised Learning
Context-Aware Query Suggestion by Mining Click-Through and Session Data
Towards Context-Aware Search by Learning A Very Large Variable Length Hidden Markov Model from Search Logs
About the author: Terry is an old-school SEO geek who works out of International Website Builders and is the founder of SEOPros.org. You can also hook up with him on Twitter. I'd like to REALLY send a huge thanks out to Terry for not only producing a search-geek-worthy post, but also for being a smart and open-minded guy. We didn't meet under the best of circumstances, but our love of search marketing pulled us together in the end. I look forward to many years of great chats together!