SEO Blog - Internet marketing news and views  

SEO Magic Bullet: 2010 Edition

Written by David Harry   
Wednesday, 05 May 2010 12:48

Phrase based WTF?

Often one finds themselves looking in the rear view mirror at topics that just won’t go away. One such shadow for me is a set of patents produced in whole, or in part, by Anna Patterson (a former Googler who left ‘em for Cuil): the now (in)famous Phrase Based IR offerings from Google. I’ve lost count of how many times I’ve discussed or written about them over the years. It seems they just won’t go away (we’ll get back to that shortly).

The other topic that keeps coming back? Well, that’s what is best known as the SEO Magic Bullet.

There are those that seem to believe in the magic bullet. Then, there are some sane people that passed on the tooth fairy long ago. For the purpose of today’s discussion, we’re going to go back to 2007: The Magic Bullet - A chat with Bill Slawski

the SEO Magic Bullet - Bill Slawski

An email conversation with grand master Slawski that turned into a post. The gist of the gig was that we shouldn’t treat patents/papers as gospel. Absorb them. Here’s some wisdom from that post:


“The main benefit from looking at patents isn't necessarily seeing the methods that they describe, but rather being able to view the assumptions and the mindsets that they uncover.  We can be so absorbed in looking at things from the perspective of marketers, and make up our own folklore and mythology (sandbox, anyone?) that having this other perspective can be really helpful.

Patent filings and white papers from search engineers don't necessarily provide a magic bullet, but they do provide the chance to look at information that comes directly from people working in search.  To ignore those documents means not taking advantage of publicly available information that gives us a glimpse what those search engines find valuable enough to protect as intellectual property.

There are trade secrets that will likely never be disclosed in patent applications.  And, the descriptions of processes in patent filings are only examples, and illustrations, that describe enough to protect the intellectual property behind the documents, while not disclosing enough so that they can be easily reverse engineered.” - Bill Slawski


Ok? Get the idea here or what? While it is a great exercise, learning about search engines, some perspective is required. Remember, this ain’t rocket science, it’s computer science.


Patent Pending

One thing that we need to remember is that when a patent hits the streets, that’s simply the award date. We can have a patent awarded today that was submitted back in 2004. Does that mean the search engine was waiting around and WHAM… started implementing it today? …erm… of course not. It has simply been in patent-pending status.

This means it’s quite likely it was at least in some semblance of beta when the patent was written, and was then implemented, morphed and evolved in the years that passed. On the other hand, it may never have been used, or it may have been used and abandoned. Either way, they weren’t waiting around for the award to start implementing the technology/methods.

Which all brings me back to my first redundant shadow; Phrase Based IR.

 

...Meh...

Was all one o’ me mates had to say when confronted with a recent spate of misconceptions I came across. One of the first technologies that captured my fire and forever geekified me was phrase based IR. Monsieur Slawski introduced us and it was love at first site.

What’s odd though, is that it gets mentioned/rediscovered from time to time and people start to spark it up as an explanation for the oddest things. Witness:


“I'm wondering if Google has made a change in their phrase-based indexing approach - something that the new Caffeine infrastructure makes feasible. Recently there has been more patent activity in that area.” – Google MAYDAY update - Ted via WMW


Hmmmm. Well, the patent in question was filed more than 3 years ago. We also know they had an interest way back in 2004. Obviously bringing the author, Anna, into the fold meant there was great interest. We can also note that in the later one, there were multiple authors (Anna was on the way to Cuil street?)

Would Caffeine help? Sure, if, as advertised, it is an infrastructure update. But that could be said for a lot of things (Open HTMM? PLSA? See? I can guess too...sigh). That’s not the point. What happens next is we see:


 “I'm still trying to get a handle on some of the odd fluctuations in site metrics attributed to what are undoubtedly bits and pieces of the Caffeine implementation. If you follow Google's patent activity, there's been some interesting recent activity in the area of phrase-based indexing.” – Dave Cosper via SEG


Awww…. Crap. See? This is how it happens. We’ve been down the LSI trail a time or two as well, dontcha know. And who can forget the bounce rate fun? This is mis-reported and entirely unprovable at the end of the day. But it does get around. Ok, it’s a few posts, albeit in authority locales, so it’s not that bad… I mean, it’s not like there’s widespread insanity over the phrase based stuff, right?


Phrase Based Information Retrieval

Dammit! Dammit! Dammit! DAMMIT!!!! Here we go again...


Slow down the ride! I wanna get off!

This, my weary web wanderer, is where the need to understand the magic bullet theory comes into play. These patents and papers are nothing more than insight. Even if we knew Google used it. Even if we knew there were no other signals. We’d still be lost, as we don’t know the weights/thresholds/dampeners in place.

But alas, there are far more signals that we can’t account for in the mix. This makes isolation of any one signal next to impossible for mere mortals. Let us not do the chicken (little) dance, running about stating what Caffeine (an infrastructure thang big daddy) is being driven by. Nor blame the poor algo for wrecking Tom, Dick and Harry’s rankings. It’s grasping at straws. Ok? Thanks. I hope we don't have to do this again (however unlikely).
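For the geeks in the crowd, here is a minimal sketch of why isolating one signal from the outside is so hard. Everything in it is made up for illustration: the three signals, the page values and both sets of weights are assumptions, not anything taken from a patent or from Google. It scores three pages with a plain weighted sum, once with a heavy "phrases" weight and once with that weight set to zero.

```python
# Toy illustration only: signal names, values and weights are invented
# for this sketch and do not describe any real search engine.

PAGES = {
    "page_a": {"links": 0.9, "phrases": 0.8, "freshness": 0.3},
    "page_b": {"links": 0.6, "phrases": 0.7, "freshness": 0.5},
    "page_c": {"links": 0.2, "phrases": 0.1, "freshness": 0.9},
}


def rank(weights):
    """Return page names ordered by weighted-sum score, best first."""
    def score(signals):
        return sum(weights[name] * value for name, value in signals.items())

    return sorted(PAGES, key=lambda page: score(PAGES[page]), reverse=True)


# Hypothetical engine 1: the phrase signal dominates.
with_phrases = {"links": 0.3, "phrases": 0.6, "freshness": 0.1}

# Hypothetical engine 2: the phrase signal is ignored entirely.
without_phrases = {"links": 0.8, "phrases": 0.0, "freshness": 0.2}

print(rank(with_phrases))
print(rank(without_phrases))
```

Both calls return the same ordering, so someone who only sees the results has no way to tell whether the phrase signal is in play at all. That is with three signals and no thresholds or dampeners; the real thing has far more of each.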

Oh, along the way, I also discovered what was surely the catalyst – that Bill guy again. Whaddya know. Well, at least we know that he doesn’t believe in magic bullets. Do you?

Until next time… play safe!


More reading

Here are some other PaIR posts not mentioned here for those interested in learning more;

Blog Posts;

 

Related Patents;

Phrase Identification in an Information Retrieval System,
Filed on Jul. 26, 2004;
Assigned; Jan 26 2006

Phrase-Based Generation of Document Descriptions,
Filed on Jul. 26, 2004;
Assigned; Jan 26 2006

Phrase-Based Searching in an Information Retrieval System,
Filed on Jul. 26, 2004;
Assigned; Feb 09 2006

Automatic Taxonomy Generation in Search Results Using Phrases,
Filed on Jul. 26, 2004;
Assigned; Sept 16 2008

Phrase-based indexing in an information retrieval system
Filed on Jul. 26, 2004;

Phrase-based personalization of searches in an information retrieval system
Filed on Jul. 26, 2004;
Assigned; August 25 2009



Comments  

 
+1 # Bill Slawski 2010-05-05 14:44
Hi Dave,

Happy to see the return of the magic bullet.

I agree that it's important to keep in mind that patent filings are a good way to get a glimpse into the mindsets of people at the search engines, but might not provide an accurate and actual view at what is going on with the engines themselves.

Another of those first generation phrase-based indexing patents was granted this week, on duplicate content detection in a phrase-based indexing system.

The second generation patents from Google on phrase-based indexing are pretty interesting because they focus upon how such a system could be technically implemented in a very large file system/server system. They also provide some insights into how that system could work that the first generation didn't include.

Regardless of whether or not Google is using such a phrase-based indexing system, the patent's description of the environment in which that system works, including the inverted index, the term (or phrase) posting lists, etc., provide an interesting look at the architecture of a search engine.

When I write about patents, I try to present what the patent may mean as a possibility rather than a certainty, and hope that it provokes more questions than conclusions, more ideas for testing than proof that a search engine is doing one thing or another.

For instance, in my post that describes phrasification, I hope that at least some of its readers started asking themselves, what if Google were using this process now. What would that mean to the way I do keyword research? What implications does it have regarding the keywords tools I might be using?

What does it mean for the way I attempt to optimize pages for certain phrases, or what I decide to use in anchor text pointing to pages? What can I do to test that might make a difference? The questions that patents raise are more important than the answers.
 
 
0 # Dave 2010-05-05 15:23
Agreed. Most certainly over the years I have always kept one eye on developing themes/concepts throughout the SEO process, much due to Anna's work. But I have also looked at other semantic analysis approaches as well. And that's what I've generally done: learn how search engines tick. Nailing down the specifics is daunting at the very best, given the many layers/signals.

There are plenty of actionable tidbits that we can glean from IR, just not the specifics, which is what too many people seem to try and do.

As for the more recent additions, it was interesting that the last one had multiple authors and many of the early ones had just Anna named. Was that why she left? Or were they passing the torch? There's some fun stuff in there just looking at the evolution of them all. Who says IR watching isn't exciting? =D

Anyway, good to see ya as always, let me know when life has subsided and we can get together for a chat. This post seemed warranted given a few too many discussions in the sphere; had to try and muzzle it a bit.

Happy Trails dude!
 
 
+1 # Bill Slawski 2010-05-07 08:52
There's temptation for drawing conclusions and definites about search engines upon what we learn from patents and papers and experience. But we know so little about the things that we don't know that making those conclusions and stating them as absolutes really is folly.

But they do unveil things we can take action upon from time-to-time that can have positive effects.

One of the most recent of phrase-based indexing patents granted last month (but initially filed 3 years ago), Index updating using segment swapping, didn't even have Anna Patterson's name on it.

Given its focus, I guess that isn't a surprise since it covers optimizing query processing, taking the initial ideas and implementing them, or as you stated, passing the torch. It does make things interesting.

Would love to talk sometime. Thanks.
 
 
0 # Dave 2010-05-07 08:53
See? Here we go again. Isn't this how the first post got started? =D

Quote:
"stating them as absolutes really is folly."


I think the reason I felt like revisiting this once more (and the other times I've 'gone off') is that when the more popular publications start to state such things, that's where the fire starts. There really does need to be some perspective on it. Some form of balance.

As for Anna not being on that one, have a look again - www.seobythesea.com/?p=3689

:eyebrow:

She's tucked away in it, thus I found it interesting as up til then most had been her alone. A gang on that one.... precursor to her leaving? oooo... the drama! (patents can be fun)

Anyway, deffo agree on the fact that learning IR can be immeasurably valuable to SEOs, we simply can't state things as fact. That's the part that bothers me... sigh...

I suppose we'll be back to this next year...and the year after... and after...and...
 
 
0 # Bill Slawski 2010-05-10 20:29
Hi Dave,

It's exactly how the first post was started, and I have seen a wide amount of speculation attributed to the Phrase-Based indexing patents that take it much further than they probably should.

Anna Patterson was in on the Phrasification patent, but she's not listed on the other Phrase-Based indexing patent granted a couple of weeks later:

Index updating using segment swapping

The latest patents are definitely related, but she isn't listed as an inventor on this one
 
