A ranting we will go, A ranting we will go
silly stuff SEOs seem not to know
a ranting we shall go
Over the last while, the spectre of LSI (latent semantic indexing) and SEO has reared its ugly head once more.
It began with a piece my good friend Virginia wrote called "SEO strategy for semantic search" (a good post), which was based in part on a post from SEO Phill (and comments on the BC blog). Later on, another couple of pals (Andy Beard and CJ) were musing on Twitter about it.
All of this had me wondering what the state of affairs with the LSI-Google train was of late.
A look at the Twitter-Hose shows that this topic is alive and well.
For the record, we've already covered the LSI madness here on the Trail back in '07 with "Stay off the LSI bandwagon", but it can't hurt to look at it again, oui? You see, it is endlessly frustrating to see those peddling so-called "LSI Programs", which are nothing short of pure friggen garbage (don't even get me started with referential integrity, sigh).
And so, a ranting we must go.
In researching this point I was (somewhat) shocked at the number of even (cough cough) A-List SEO bloggers that have written about LSI over the years.
Did anyone do their homework? Or just regurgitate what another expert had surmised? Dunno.
What many SEOs don't (want to?) understand about Google and LSI
The question remains: why do we still keep hearing about the magic bullet that is LSI? I'd have to imagine much of it is ignorance with a side helping of snake oil.
"Google bought/developed technology that meant their computers could make intelligent decisions on whether a piece of content was good or not. This technology is called Latent Semantic Indexing (LSI)." - (from some SEO snake oil site)
I found the above gem in something a mate sent me.
They can't even decide if it was bought or developed, never mind how it played into Google's evolution.
You see, the whole thing started when Google purchased Applied Semantics back in 2003; strangely, for their ad matching technology, NOT necessarily for an IR approach. Google hoped it would "(...) make online advertising more useful to users, publishers, and advertisers alike."
They spoke of their interest in "Applied Semantics' AdSense product that enables web publishers to understand the key themes on web pages to deliver highly relevant and targeted advertisements."
Did you catch that? Some odd program called AdSense. Hmmmm, sound familiar?
This (purchase of Applied Semantics) is by no means evidence of Google's use of LSI/A in the regular index, but 'twas the beginning of the bullshit that ensued.
Moreover, Google also picked up Anna Patterson and her phrase-based IR methods around the same time (also a semantic analysis approach), but no one in the SEO world picked up on that (guess it didn't have a catchy acronym, hehe... jackasses). It is this type of subjective blindness that leaves me unsure if I should laugh, scream or cry.
Yes, Google is VERY interested in semantic analysis; most search engines are.
But the whole limited LSI view is actually more suited to PPC peeps, not necessarily the SEOs. It was (originally) for AdSense, after all. Nonetheless,
would-be SEO snake oil is still rampant.
Getting past LSI
Let us get rid of the whole LSI thang, ok? Let's work it back to the parent group: LSA (latent semantic analysis). Now, we certainly can't dismiss this approach, as most search engines do employ various types of semantic analysis.
This part I have no problem with.
But is it really simple LSA? I'd doubt that too.
I am more inclined to think along the lines of PLSA (as well as LDA and HTMM), as the engineers over there did seem to have a fancy for it in '07, as noted in this Google Research post on HTMM (Hidden Topic Markov Models). You see, these are more evolved versions of semantic analysis. Most in the IR world I've talked to about this agree that simple LSA approaches are limited and aren't likely being employed at Google.
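For the curious, here is a minimal sketch of what plain-vanilla LSA boils down to: build a term-document matrix, keep only the top few latent dimensions, and compare documents in that reduced space. Everything in it is my own assumption for illustration: the toy corpus is made up, the function names are hypothetical, and a simple power iteration stands in for the library SVD routine a real implementation would use. It says nothing about what Google actually runs.

```python
import math

# Toy corpus (hypothetical, for illustration only).
docs = [
    "car engine",         # d0
    "automobile engine",  # d1
    "car automobile",     # d2
    "banana fruit",       # d3
    "fruit smoothie",     # d4
]

vocab = sorted({w for d in docs for w in d.split()})
# One raw term-count vector per document (columns of the term-document matrix).
vectors = [[d.split().count(t) for t in vocab] for d in docs]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def top_eigenpairs(matrix, k, iterations=300):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iterations):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in m])
        pairs.append((lam, v))
        # Deflate: subtract the component just found before looking for the next.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def lsa_doc_coords(doc_vectors, k):
    """Coordinates of each document in a k-dimensional latent space."""
    # Eigenvectors of the document Gram matrix are the right singular vectors
    # of the term-document matrix; eigenvalues are the squared singular values.
    gram = [[dot(a, b) for b in doc_vectors] for a in doc_vectors]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(lam) * v[j] for lam, v in pairs]
            for j in range(len(doc_vectors))]

coords = lsa_doc_coords(vectors, k=2)
raw_sim = cosine(vectors[0], vectors[1])    # "car engine" vs "automobile engine", raw counts
latent_sim = cosine(coords[0], coords[1])   # same pair, in the latent space
cross_sim = cosine(coords[0], coords[3])    # "car engine" vs "banana fruit", latent space
print(raw_sim, latent_sim, cross_sim)
```

On this toy data the two car documents end up much closer in the latent space than their raw word overlap suggests, while the fruit documents stay apart. That is the whole trick, and also the limit of it: it is linear algebra over word counts, which is part of why the IR folks lean toward the probabilistic cousins mentioned above.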
Which all begs the question: are SEO types really just ignorant schmucks that seek to glorify themselves or cash in on this? Why no talk (in SEO circles; IR peeps do fine) about these other technologies? It doesn't take a rocket scientist (maybe a comp scientist) to venture a guess that it is simply folks trying to scoop a few $$$$... or straight-up ignorance...
Expanded Snippets do not prove LSI
Another area I'd seen mentioned by those stating Google is using LSI is the recent updates to expanded snippets. That part, while possibly using some form of semantic analysis, is part of the Orion algorithm from Ori Allon (more on that here).
Of course, when Google picked up Ori, they also purchased the related patents, and those were removed from the AU patent database, so I'm not entirely sure what types of approaches are involved. It may be a semantic analysis approach, it may not be. But it is NOT using LSI.
Great, glad we're past that also.
Noticing a connect here?
I hear you asking, "Ok Mr. Smart Ass, what is Google doing?" I have no friggen definitive answer, ok ya mook?!?
It is likely that a variety of different signals/approaches are being used to understand concepts via various semantic analysis (and NLP) methods. I am not that keen on using the oft-used catch phrase LSA/I, as it tends to cloud SEOs' abilities to think laterally and subjectively.
Understanding how search engines are using semantic analysis (and other methods) to define concepts is something not well discussed in the industry, and it should be. A large share of queries each day are ambiguous and new (according to Google), so they are constantly toying with signals such as semantic analysis (which works hand in hand with query analysis) to better understand what the user is looking for and the context therein.
At the end of the trail, Applied Semantics was (primarily) an ad matching technology using LSI; we can't assume it was added to the reg search processes (remember, the phrase-based approach, for the reg index, was the same year). It's time for some perspective, lest we get another spate of SEOs peddling "Google LSI Compliant" services once more (circa 2005-06)
and making us all look like jackasses, mkay?
You have been warned: the ignorant, the short-sighted and the snake oil peddlers professing LSI and Google will get the smack from this author!! Unless yer talking about AdSense.
We shall return to regular programming shortly; thanks for your time on this - see LSI snake oil? You know where to send 'em :0)
Phrase based indexing and retrieval methods
Incorporating data based upon user query sessions
Probabilistic latent semantic analysis
Latent Dirichlet allocation
Hidden Topic Markov Models
And of course, if you're looking to learn more about NLP and other semantic analysis approaches, try these blogs:
Science for SEO
Natural Language Processing Blog
The Lousy Linguist
*Note: It should be understood that I do believe in understanding various semantic approaches to building content that Google will eat up.
It is the fascination with the term LSI that irks me (and the resulting systems, sigh).