SEO Blog - Internet marketing news and views  

Phrase Based Bomb Buster

Written by David Harry   
Thursday, 01 February 2007 22:59
Did Phrase Based Indexing and Retrieval stop GoogleBombs?

So, for some time now I have been a ‘relevance’ junkie as far as the future of organic search technologies goes. Early last year it was semantics (LSA/LSI and their ilk), and it soon migrated to Phrase Based Indexing and Retrieval (which I lovingly began to call PaIR).

I stopped by a fellow warrior's SEO blog and noticed he was relaying a suspicion that ‘phrase based’ technologies could be at play: ‘Somebody at Webmaster-Talk said that Google was going to use "phrase-checking" to see if a site is being Google-bombed.’

While going through some Google patents the other day, I noticed a couple of interesting descriptions that sounded a lot like they were pulled from the pages on the death of the GoogleBomb. I'll let you fine folks make your own decisions… so here they are:

Phrase identification in an information retrieval system:

"[0152] This approach has the benefit of entirely preventing certain types of manipulations of web pages (a class of documents) in order to skew the results of a search. Search engines that use a ranking algorithm that relies on the number of links that point to a given document in order to rank that document can be "bombed" by artificially creating a large number of pages with a given anchor text which then point to a desired page. As a result, when a search query using the anchor text is entered, the desired page is typically returned, even if in fact this page has little or nothing to do with the anchor text. Importing the related bit vector from a target document URL1 into the phrase A related phrase bit vector for document URL0 eliminates the reliance of the search system on just the relationship of phrase A in URL0 pointing to URL1 as an indicator of significance or URL1 to the anchor text phrase.

[0153] Each phrase in the index 150 is also given a phrase number, based on its frequency of occurrence in the corpus. The more common the phrase, the lower phrase number it receives. The indexing system 110 then sorts 506 all of the posting lists in the index 150 in declining order according to the number of documents listed in each posting list, so that the most frequently occurring phrases are listed first. The phrase number can then be used to look up a particular phrase."
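To make the bombing argument in [0152] a bit more concrete, here's a toy Python sketch. It is my own simplification, not Google's code: it assumes a hand-picked list of related phrases for an anchor phrase, naive substring matching, and a made-up anchor_support() score that counts how many of those related phrases the *target* page actually contains (the set bits of its related phrase bit vector). The function names, phrases and pages are all hypothetical.

```python
from typing import Dict, List

# Toy model, NOT Google's implementation: related phrases are supplied by hand
# and "occurs in" is a plain case-insensitive substring test.

def related_phrase_bit_vector(doc_text: str, related_phrases: List[str]) -> int:
    """Return an int whose bit i is set when related_phrases[i] appears in doc_text."""
    text = doc_text.lower()
    bits = 0
    for i, phrase in enumerate(related_phrases):
        if phrase.lower() in text:
            bits |= 1 << i
    return bits


def anchor_support(anchor_phrase: str, target_text: str,
                   related: Dict[str, List[str]]) -> int:
    """Hypothetical per-link score: rather than trusting the anchor text,
    import the target page's bit vector for that phrase and count the set bits."""
    vector = related_phrase_bit_vector(target_text, related.get(anchor_phrase, []))
    return bin(vector).count("1")


def bomb_check(links: List[str], target_text: str,
               related: Dict[str, List[str]]) -> Dict[str, int]:
    """Compare naive link counting with the phrase-aware score for one target page."""
    return {
        "raw_link_count": len(links),
        "phrase_support": sum(anchor_support(a, target_text, related) for a in links),
    }


if __name__ == "__main__":
    related = {"cheap flights": ["airfare", "airline", "booking", "departure"]}
    travel_page = "Compare airfare from every airline and finish your booking before departure."
    recipe_page = "My grandmother's favourite soup recipes."

    bomb = ["cheap flights"] * 1000  # 1,000 links with identical anchor text
    print(bomb_check(bomb, recipe_page, related))
    # {'raw_link_count': 1000, 'phrase_support': 0}
    print(bomb_check(["cheap flights"], travel_page, related))
    # {'raw_link_count': 1, 'phrase_support': 4}
```

A thousand "cheap flights" links pointing at the soup page still add up to zero under this toy score, which is the gist of the patent's claim that piling up anchor text alone stops working.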
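The numbering scheme in [0153] is easier to picture with a tiny example. This Python is a rough sketch under my own assumptions: a phrase's frequency is taken to be the number of documents in its posting list, ties are broken alphabetically, and number_and_sort() is a made-up name. The most common phrase gets the lowest number, posting lists are sorted in declining order of document count, and the number then doubles as a lookup key.

```python
from typing import Dict, List, Set, Tuple

# Toy version of [0153]: "frequency" is approximated by the number of
# documents in each phrase's posting list (an assumption), ties broken alphabetically.

def number_and_sort(index: Dict[str, Set[str]]) -> Tuple[Dict[str, int], List[Tuple[str, List[str]]]]:
    """Give the most common phrase the lowest number and sort posting lists
    so the most frequently occurring phrases come first."""
    ordered = sorted(index.items(), key=lambda item: (-len(item[1]), item[0]))
    phrase_number = {phrase: n for n, (phrase, _) in enumerate(ordered)}
    postings = [(phrase, sorted(docs)) for phrase, docs in ordered]
    return phrase_number, postings


if __name__ == "__main__":
    index = {
        "google bomb": {"d2"},
        "phrase based": {"d1", "d2", "d3"},
        "anchor text": {"d1", "d3"},
    }
    numbers, postings = number_and_sort(index)
    print(numbers)  # {'phrase based': 0, 'anchor text': 1, 'google bomb': 2}
    # The phrase number is also the posting list's position, so it works as a lookup key.
    print(postings[numbers["anchor text"]])  # ('anchor text', ['d1', 'd3'])
```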

Call me a whacked-out conspiracy theorist, but I think we have something here. Is it outright evidence that Google has migrated to a PaIR-based model? Of course not. I would surmise that it is simply another layer laid over the existing system, one that the last major infrastructure update (the dreaded BigDaddy) made possible… but that's just me.

At the very least, it has piqued my curiosity even further about all things PaIR-related… More soon…

Meanwhile, keep busy with these:

PaIR Spam Detection - Bill Slawski
Phrase Based Optimization - me yammerin'
Phrase Based Indexing and Retrieval - another meme
