I wanted to talk about something that has been bugging me lately: SEO folks who are quoting or selling LSI-related advice or services. It's starting to really get on my nerves.
For the uninitiated (lucky peeps), it stands for Latent Semantic Indexing, which was worked on by Applied Semantics, building on Latent Semantic Analysis concepts. Then, waaaaayyyyy back in April of 2003, Applied Semantics (AS) was acquired by Google. For the most part, it would seem the main usage at the time was geared towards the AdWords and AdSense side of things.
From the press release at the time:
"Applied Semantics is a proven innovator in semantic text processing and online advertising," said Sergey Brin, Google's co-founder and president of Technology. "This acquisition will enable Google to create new technologies that make online advertising more useful to users, publishers, and advertisers alike."
Enter the Bandwagon
Somewhere along the line, the soothsayers and prognosticators of SEO far and wide began to relate LSI to the organic SERPs, to the point where some would claim it could supersede the traditional backlink-heavy model (PageRank). Lately I see the term used loosely by SEO types and find it on websites espousing how they can use it to make your site rank better.
Now, I have no problem with folks making pages that are more relevant, instead of whacking the keywords they target all over the freakin' place. That makes the web better as far as I am concerned. The problem I am having is the lack of general understanding. There is no direct evidence that LSI ever ended up, in whole or in part, in the organic indexing and retrieval system. Actually, Michael Duz makes a good case for the LSI Myth.
So why would anybody claim that Google or any other search engine was using LSI? Two possible reasons: simple ignorance or, as Dr. E. Garcia (information retrieval researcher) puts it, snake oil marketers, SEO firms and individuals who find some commercial value in pretending they have an understanding of LSI. Here are some typical quotes right off their web pages:
So what IS Google doing?
Who cares? We are not going to know definitively one way or the other, so let's not get worked up over it. There are many other methods search engineers are looking at to find relevance, such as:
Phrase based indexing and retrieval methods
Incorporating data based upon user query sessions
Probabilistic latent semantic analysis
Latent Dirichlet allocation
Hidden Topic Markov Models
Just to name a few.
So let's give it a break
In the end, learning about the various models and looking for common elements is, for me at least, the best way to approach your studies of search engines. It is a multi-layered system that could have any number of methods at work. Without knowing exactly which are being used and what thresholds and weightings are given to each, we cannot run around stating that LSI in Google is a certainty. I certainly don't feel people should be selling services based upon it, nor blogging about how to use LSI to improve one's rankings.
The next time someone tells you about the secret sauce that is LSI, shake your head gently, pat them on the head and walk away... the believers can be a touchy bunch.
Latent Semantic Indexing Resources:
LSI Explained
Meaning-based advertising and document relevance determination
Meaning-based information organization and retrieval
Dr. E. Garcia - SVD and LSI Tutorial 5: LSI Keyword Research and Co-Occurrence Theory
LSI Implementation and Scale-ability issues