A ranting we will go, A ranting we will go
silly stuff SEOs seem not to know
a ranting we shall go
Hiya Kids
Over the last while, the spectre of LSI (latent semantic indexing) and SEO has raised its ugly head once more.
It began with a piece my good friend Virginia wrote called "SEO strategy for semantic search" (a good post), which was based, in part, on a post from SEO Phill (and comments on the BC blog). Later on, another couple of pals (Andy Beard and CJ) were musing on Twitter about it.
All of this had me wondering how the state of affairs with the LSI-Google train was of late...
A look at the Twitter-Hose shows that this topic is alive and well.

For the record, we've already covered the LSI madness here on the Trail back in '07 with "Stay off the LSI bandwagon", but it can't hurt to look at it again, oui? You see, it is endlessly frustrating to see those peddling so-called "LSI Programs", which are nothing short of pure friggen garbage (don't even get me started with referential integrity... sigh)
... and so, a ranting we must go.
In researching this point I was (somewhat) shocked at the number of even (cough cough) A-List SEO bloggers that have written about LSI over the years...
Did anyone do their homework? Or just regurgitate what another expert had surmised? Dunno...
odd...
What many SEOs don't (want to?) understand about Google and LSI
The question remains, why do we still keep hearing about the magic bullet that is LSI? I'd have to imagine much of it is ignorance with a side helping of snake oil.
"Google bought/developed technology that meant their computers could make intelligent decisions on whether a piece of content was good or not. This technology is called Latent Semantic Indexing (LSI)." - (from some SEO snake oil site)
I found the above gem in something a mate sent me...
they can't even decide if it was bought or developed, never mind how it played into Google's evolution.
You see, the whole thing started when Google purchased Applied Semantics back in 2003; strangely, for their ad matching technology, NOT necessarily for an IR approach. Google hoped it would "(...) make online advertising more useful to users, publishers, and advertisers alike."
They spoke of their interest in "Applied Semantics' AdSense product that enables web publishers to understand the key themes on web pages to deliver highly relevant and targeted advertisements."
Did you catch that? Some odd program called AdSense... hmmmm...
sound familiar? This (purchase of Applied Semantics) is by no means evidence of Google's use of LSI/A in the regular index, but 'twas the beginning of the bullshit that ensued...
grumble mumble
Moreover, Google also picked up Anna Patterson and her phrase-based IR methods around the same time (also a semantic analysis approach), but no one in the SEO world picked up on that (guess it didn't have a catchy acronym, hehe... jackasses). It is this type of subjective blindness that leaves me unsure if I should laugh, scream or cry.
Yes, Google is VERY interested in semantic analysis; most search engines are...
but the whole limited LSI view is actually more suited to PPC peeps, not necessarily the SEOs. It was (originally) for AdSense, after all. Nonetheless...
would-be SEO snake oil is still rampant.

Getting past LSI
Let us get rid of the whole LSI thang, ok? Let's work it back to the parent group: LSA (latent semantic analysis). Now, we certainly can't dismiss this approach, as most search engines do employ various types of semantic analysis...
this part I have no problem with.
But is it really simple LSA? I'd also doubt that.
I am more inclined to think along the lines of PLSA (as well as LDA and HTMM), as the engineers over there did seem to have a fancy for it in '07, as noted in this Google Research post on HTMM (Hidden Topic Markov Models). You see, these are more evolved versions of semantic analysis. Most in the IR world I've talked to about this agree that simple LSA approaches are limited and aren't likely being employed at Google.

Which all begs the question: are SEO types really just ignorant schmucks that seek to glorify themselves or cash in on this? Why no talk (in SEO circles; IR peeps do fine) about these other technologies? It doesn't take a rocket scientist (maybe a comp scientist) to venture a guess that it is simply folks trying to score a few $$$$... or straight-up ignorance...
Expanded Snippets do not prove LSI
Another area I'd seen mentioned by those stating Google is using LSI is the recent updates to expanded snippets. That part, while possibly using some form of semantic analysis, is part of the Orion algorithm from Ori Allon (more on that here).
Of course, when Google picked up Ori, they also purchased the related patents, and those were removed from the AU patent database, so what types of approaches are involved, I'm not entirely sure. It may be a semantic analysis approach, it may not be. But it is NOT using LSI...
ok?
Great, glad we're past that also...
whew
Noticing a connect here?
I hear you asking, "Ok Mr. Smart Ass, what is Google doing?" I have no friggen definitive answer, ok ya mook?!?
It is likely that a variety of different signals/approaches are being used to understand concepts via various semantic analysis (and NLP) methods. I am not that keen on using the oft-used catch phrase LSA/I, as it tends to cloud SEOs' abilities to think laterally and subjectively.
Understanding how search engines are using semantic analysis (and other methods) to define concepts is something not well discussed in the industry, and it should be. A large share of queries each day are ambiguous and new (according to Google), so they are constantly toying with signals such as semantic analysis (which works hand in hand with query analysis) to better understand what the user is looking for and the context therein.
At the end of the trail, Applied Semantics was (primarily) an ad matching technology using LSI; we can't assume it was added to the reg search processes (remember, the phrase-based approach, for the reg index, was the same year). It's time for some perspective, lest we get another spate of SEOs peddling "Google LSI Compliant" services once more (circa 2005-06)...
and making us all look like jackasses, mkay?
You have been warned: the ignorant, short-sighted or snake oil peddlers professing LSI and Google will get the smack from this author!! ...unless yer talking about AdSense.
We shall return to regular programming shortly; thanks for your time on this - see LSI snake oil? You know where to send 'em :0)
More reading;
Phrase based indexing and retrieval methods
Incorporating data based upon user query sessions
Probabilistic latent semantic analysis
Latent Dirichlet allocation
Hidden Topic Markov Models
And of course, if you're looking to learn more about NLP and other semantic analysis approaches, try these blogs;
Science for SEO
Cog Blog
Thought Process
Natural Language Processing Blog
Ontology News
The Lousy Linguist
*Note: It should be understood that I do believe in understanding various semantic approaches to building content that Google will eat up...
it is the fascination with the term LSI that irks me (and the resulting systems... sigh)
Comments
Wonderful post Dave, and it all needed saying!
It seems like we need to say this every flippin' year as the LSI/Google crap still is rampant out there. I hadn't really checked the state of affairs in a while, but after I did... the ranting came naturally (grrrrrrrrrrr)... we need to form a posse on this shit...
As for the snake-oil and confusion in the SEO world on this one, LSI, it is my hope that we can get past such limited concepts, leave IR to the IR peeps, and work with what we do know... Will it happen? I doubt it, this has been going on since 03... no signs of letting up.
I hope the post was worth the wait, it's always cathartic to 'go a ranting' and get these things off me little ol' chest... sigh... should sleep well this evening!
http://www.guardian.co.uk/media-academy/search-engine-opimisation
Latent Semantic Indexing
• An overview of LSI
• How to write copy for LSI search engines
Running an SEO campaign
• Structured approaches to running SEO activity
• Everything you need to know about landing pages, and how to test them
Suhweet... nice to see it's not just the schmucks peddling it... hehe...
It's interesting that he was on about the post Leslie did, as that was the 'referential integrity' that I ever so briefly touched on in this post.
Thanks for taking the time tho.... Dr. G was always close in my mind when writing this... I've only gone on about it twice - he's on a mission...
I think one reason for the continued use of the term LSI is that people don't understand the subject matter at all - it's a terrifically detailed and difficult area - one that's been around for a long time (I believe one of the first attempted uses was in library categorisation systems).
It's true, that given the index Google has (and this is a supposition based on their Backrub paper at Stanford) that LSI is POSSIBLE without changing much. What's often not noted is that it would be so horrifically slow and unwieldy as to vastly limit its use.
Semantics yes, LSI no.
I didn't know that this myth is still around. And so widespread. Bandwagon picking up speed?
LSI - Latent Semantic Indexing - just sounds so fancy, there is "latent" and "semantic" and "indexing". oh wow.
Abusing a fancy vocabulary is the sure-fire way to writing fantastic bullshit. It might even impress some customers: Google LSI Compliant? aha. That isn't even creative - it seems that the other guys are all selling the same stuff. Maybe the latest Must-Have-SEO-Hooha since keyword density and XML-sitemaps for well-linked 10-page-sites.
Abusing vocabularies, like in this case terminology from information retrieval, has a long tradition in esoteric writing and postmodern philosophy. Dazzling the readers with bullshit - could work with SEO as well. I'm waiting for Quantum SEOlogy.
p.s. I really liked the Assholes Inc.
p.s. II: sorry for posting this again, it seems that half of my comment got filtered - single quotes in BBCode are not allowed?
You've hit the nail on the head though, I would be ever so happy if people talked about semantic analysis instead of wrapping programs up in the term LSI - that's where the snake oil comes in...sigh...
@Infinity, np, shall nuke the dupes. It really is sad and I've been doing research/buzz monitoring this week and it is certainly alive and well (hadn't looked around for some in a few years). I do really like "Quantum SEOlogy" - hehe... that's a classic. We should write a post and see how many peeps fall for it... heehee
Thanks on Assholes Inc... have been meaning to follow that up with a site.... maybe soon..
http://www.aimclearblog.com/2009/06/03/is-whats-good-for-google-good-for-seo/
Nice Post
Informative One
Thanks for great stuff
Great post as always. Your in-depth analysis of various algorithms sheds quite a bit of light on a controversial subject that nobody has absolute answers on.
One thing I'd like to point out: regardless of what search professionals are calling it, there are now some features of Google that definitely give evidence of latent indexing, or at the very least attention to it (put whatever euphemism you'd like on it).
For example, on the Google SERP we're now seeing synonyms highlighted, especially for less competitive words. I believe that over time, as their database grows stronger with content, so will their web intelligence to see the parallels, with more relevant, pertinent keywords/content highlighted, much like we're currently seeing in newer personalized search.
My other confidence that there is some shape or form of latent indexing is the Google Webmaster Tools interface:
"Below are the most common keywords Google found when crawling your site. These should reflect the subject matter of your site."
It is for the above two reasons that I believe it to be important to, at the very least, ensure that content is relevant/on topic, with a specific amount of analytics. Bruce Clay's book seems to recommend a ratio equal to or slightly less than competitors' sites for specific keywords, as doing more than that could lead to a drop in ranking.
I can certainly tell you I don't go around talking about LSI/LSA to clients, because relevancy and expert keywords for the clients far outrank sitting on a page measuring how many times a word was mentioned. One has to consider the ROI of the time spent on well-written content vs the loss of visitors for content that's congested/repetitive. There are far more important KPIs to pay attention to, and for the most part these aren't even followed. It would seem that, for now at least, the beast that is Google is a very literal machine expecting content in an almost thesis-like structure, and slowly adapting to human dynamics.
But now that I am reading your article, I agree with what you're shouting. I love LSI, from an artistic perspective, something like "isn't that cool how language is constructed?" perspective instead of "let's kill all competitors with our insanely awesome LSI methods!" one.
And LSI is short-sighted, to say the least. Human "robots" running around shouting about how important LSI is are missing a critical element in the entire game.
Brand. Message.
How about just a good f**king article worth reading, for Pete's sake. Which, BTW, also makes articles go - by getting them spread around by real, live humans. Sustainability through human popularity.
Thanks for putting it all in perspective, David.
So. Um, if you want to answer that, like, awesome.