2 ½ Years of insight and what we can learn
I thought it would be interesting to have a look at the patents that the folks over at Google have been awarded over the last few years. Can we see a pattern? Does it speak to the future in any way? We shall see.
Recently I wrote about how Google might get beyond links. As an interesting follow-up, I thought it would be useful to see just where they may be headed. It would be ever so interesting to do the same for the last 5-6 years, but that was something I really didn't have the time to dig into. Maybe some day.
As with all things, we don't want to get too excited by the adventure. Patents are merely concepts filed and protected over the years. Some are older, others more recent; I simply did this because it seemed interesting. I'm sharing it because, well, it is always fun to think out loud.
The raw numbers
I actually started keeping a list of search patents (of interest) awarded to the Big 3 (so sad to see Yahoo go) back in 2008, and we will look from that point to now. For starters, here's the breakdown:
Behavioural – 15
Systemic – 13
Geo local – 14
Semantic/NLP – 13
Query Analysis – 8
Universal – 8
Duplicate content – 5
News – 3
Spam – 3
Page Segmentation – 3
Temporal – 2
Links – 2
Ranking methods – 2
Named Entities – 2
Recommendation – 2
Social – 1
Deep web – 1
Video – 1
Semantic Markup – 1
Trust – 1
Looked at this way, the breakdown probably isn't all that surprising to those who have been following the Trail over the years. If there are two topics I am often on about, they are behavioural data and semantic analysis (such as the phrase-based series of patents). We also know that Google has had a keen interest in geo-local over the last few years, so that isn't surprising either.
As for the systemic stuff, I doubt we'd need to look further than the recent Caffeine update and Google's 'need for speed' to see their interest in the systemic elements of improving the search engine's performance.
The active areas
Now we can look at a few of the more active areas that stand out for me:
Systemic – I don't want to get too deep into this one, as there are a wide variety of patents I have lumped together in this category. The main take-away for me is the constant need for better infrastructure and systems beyond what we would consider the traditional elements. The 'need for speed' is now a hallmark of Google, and the need for better processing should be obvious.
Behavioural – the main areas here are what are known as explicit (when the user consciously takes an action) and implicit (gleaned from user actions) user feedback. This has been an important area of the IR world for some time now. The problem is that explicit feedback is hard to get and implicit feedback is often noisy. Nonetheless, Google, like the rest of the IR world, has shown great interest here for many years, so this is no surprise.
Geographic – this one isn't surprising either, really. If we consider that personalization (which falls under behavioural) has been the mantra at Google for many years now, this is a no-brainer. In many ways the geo-localized aspects are just that: a form of personalization.
Semantic analysis – this is another area that didn't catch me off guard. Much like the behavioural areas, search engines are constantly striving to better understand web objects in an effort to provide more relevant results. This holds true in the academic world as well as the land of patents. As we get more and more powerful tools (processing, database structures, etc.), the ability to understand concepts and themes increases. It was once said that Google understands the context of objects and text at a sixth-grade level. I dare say they've moved up to an eighth-grade level of late. Who knows, maybe they'll make it to high school some day soon.
Query analysis – another area that seems to be a given in the IR world. Understanding, through query logs and click data, how users reformulate queries, show click bias and more can be noisy at times, but it is very effective in producing quality results. While I've often considered this the domain of Yahoo, Google does seem to be using this type of data more and more.
Universal – this one is actually larger than it would seem, because I have broken out the 'News', 'Video' and 'Geo' related patents from this section. Adding just the News and Video patents back in would bring it to 12. It likely doesn't take a rocket scientist (or a computer scientist, for that matter) to see the massive changes in many query spaces in recent years. Vertical search is most certainly one area that has grown, and will continue to grow, in the years to come.
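As a quick sanity check on the arithmetic, the counts from the raw numbers list can be tallied in a few lines of Python. This is a minimal sketch: the dictionary simply restates that list, and `folded_total` is a hypothetical helper name introduced here for illustration. It shows that the tracked patents sum to 100, that folding News and Video back into Universal gives 12, and that also folding in the 14 Geo local patents would give 26.

```python
# Patent counts per category, restated from the "raw numbers" list (2008 onward).
counts = {
    "Behavioural": 15, "Systemic": 13, "Geo local": 14, "Semantic/NLP": 13,
    "Query Analysis": 8, "Universal": 8, "Duplicate content": 5, "News": 3,
    "Spam": 3, "Page Segmentation": 3, "Temporal": 2, "Links": 2,
    "Ranking methods": 2, "Named Entities": 2, "Recommendation": 2,
    "Social": 1, "Deep web": 1, "Video": 1, "Semantic Markup": 1, "Trust": 1,
}


def folded_total(counts, base, extras):
    """Hypothetical helper: counts[base] plus the counts of categories folded into it."""
    return counts[base] + sum(counts[name] for name in extras)


total = sum(counts.values())
universal_plus = folded_total(counts, "Universal", ["News", "Video"])
universal_all = folded_total(counts, "Universal", ["News", "Video", "Geo local"])

print(f"Total patents tracked: {total}")                 # 100
print(f"Universal + News + Video: {universal_plus}")     # 12
print(f"Universal + News + Video + Geo: {universal_all}")  # 26
```

Keeping the counts in a single mapping makes it easy to re-slice the categories (for example, folding Geo local into Universal or not) without retyping numbers.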
The lower levels
Next I wanted to touch on some of the areas seemingly not getting as much attention over the last few years.
Links – considering that most people relate links to rankings, it might seem strange that they are not more prominent here. There is a reason for that. Many of the other sections we've looked at (behavioural, geo-local, semantic analysis, etc.) do actually include elements relating to links; I only put link-only patents in this section. Have no fear, links are still getting their fair share of attention.
Spam – this one was also curious. While, as with links, there are elements of spam detection in the other categories, they aren't as prevalent. If I had to guess, I would say either Google didn't see the need to patent/document its spam detection methods, or it was concerned that patent hounds like us would spread the word on those methods, thus defeating the purpose. The latter seems unlikely; I'd lean towards them simply not feeling the need to patent the processes.
Trust – it struck me as strange that there wasn't more out there on trust-related elements. Sure, as with many at the lower end, there are some elements in the other areas, but one would think there would be more. I know Yahoo has more than a few on Trust/Harmonic Rank approaches; Google just doesn't. This is odd, because I have felt Google is putting more of a premium on trust signals over the last few years.
Social – another one that seemed strange to me. That said, there are the 'social profiling' patents (actually PPC-geared) and the social graph API, just not a lot of straight search-related offerings. This one, I know for a fact (as with trust), is another huge area of interest. It is, to me, one of the cornerstones of the personalization efforts that have been ongoing over the last few years. Thus I wouldn't read too much into the lack of specific patents in this area.
What does it all mean?
So, in the final analysis, what does it all mean? As stated at the top, I'd not read too much into this. Yes, it does paint somewhat of a picture of where Google's interests and focus have been, but that likely serves more as support for our own instinctual feelings. After all, these are patent awards; some were filed more recently, while others were filed some 5+ years ago.
Personally, I look at this as a cross-reference to, and an addition to, what I have been seeing over the last few years. I believe it isn't that far off, as behavioural (and personalization), geo-local, social and deeper semantic analysis are a large part of what Google has been up to.
This was simply an exercise in exploring some possible correlations, and a bit of a history lesson. What is important is that we, as search optimizers, always have some insight into where we've been and where we're headed. A great SEO practitioner should be able to construct programs that stand the test of time, and journeys such as this one can help us do that.
I hope you enjoyed the ride.
Get your geek on!