SEO Blog - Internet marketing news and views  

SEO Higher learning - Page 2

Written by David Harry   
Monday, 05 January 2009 07:19


Now you can call me biased (and you’d be right), but patents are always a great source of insight into how the gang at each of the big 3 may be thinking (now and historically). Bill has been a huge influence and help over the last few years, and patents are where my increasing fascination really started to kick into high gear. Unfortunately the list is endless, so we’ll keep it to some of the more fundamental and interesting ones worth looking into:

General IR related patents;

System and method for characterizing a web page using multiple anchor sets of web pages – Yahoo (sort of like TrustRank concepts)

Regression framework for learning ranking functions using relative preferences (machine learning) - Yahoo

System and method for determining semantically related terms using an active learning framework – Yahoo

Using link structure for suggesting related queries – Microsoft

Detecting Duplicate and near-duplicate files  - Google

Searching to identify web page(s) – Microsoft

Method and system for creating improved search queries – Google

Extraction of information from documents - Microsoft


Personalized search / behavioural signals

Systems and methods for analyzing a user's web history – Google

Systems and methods for modifying search results based on a user's history - Google

Method and apparatus for learning a probabilistic generative model for text - Google

Search system using user behaviour data   - Microsoft

User sensitive (personalized) PageRank – Yahoo

Re-ranking search results based on query log – Microsoft (covered by Bill)

User Distributed Search Results – Google

Search pogosticking benchmarks – Yahoo

Using search trails to provide enhanced search interaction  - Microsoft

Personalization of web page search rankings – Microsoft

Accounting for behavioral variability in web search – Microsoft

User query data mining techniques – Yahoo

Bookmarks and Ranking  - Google


Page Segmentation

Document segmentation based on visual gaps – Google

System and method for detecting a web page – Yahoo

Vision-based document segmentation – Microsoft

Retrieval of structured documents - Microsoft

Systems and methods for analyzing boilerplate - Google


Historical ranking factors

Information retrieval based on historical data  - Google

Keyword usage score based on frequent impulse and frequency rate (and covered by Bill) - Microsoft

Calculating importance of documents factoring historical importance - Microsoft

Temporal ranking of Search results  - Microsoft




Local and geographic search

System and method for providing preferred country biasing of search results – Google

System for providing geographically relevant content to a search query with local intent – Yahoo

System for determining local intent in a search query - Yahoo

Detecting a user's location, local intent and travel intent from search queries - Microsoft


Phrase Based IR and semantics

Phrase-based indexing in an information retrieval system - Google

Phrase Identification in an Information Retrieval System - Google

Phrase-Based Generation of Document Descriptions - Google

Phrase-Based Searching in an Information Retrieval System - Google

Automatic taxonomy generation in search results using phrases - Google

Diverse Topic Phrase Extraction (using LSA) – Microsoft

Synonym and similar word page search (more semantics) – Microsoft


Fact Extraction / Object Level Search

Generating structured information - Google

Learning facts from semi-structured text - Google

Designating data objects for analysis - Google


Spam Detection

Detecting spam documents in a phrase based information retrieval – Google

Discovering and determining characteristics of network proxies – Yahoo

Search Ranger System and Double-Funnel Model for Search Spam Analyses and Browser Protection (cloaking) – Microsoft

Web spam page classification using query dependent data – Microsoft

Detecting web spam from changes to links of websites – Microsoft



Link analysis

Method for node ranking in a linked database - Google

Method for scoring documents in a linked database - Google

Methods for ranking nodes in large directed graphs - Google -- covered on SEO by the Sea


You can find a list of more search patents from 2008 here: Google, Yahoo, Microsoft



IR journals and publications

ACM Transactions on Information Systems (TOIS)
Information Processing and Management (IP&M)
Information Retrieval
International Journal on Digital Libraries
Journal of the American Society of Information Science and Technology (JASIST)
SIGIR Forum
Journal of Documentation
D-Lib Magazine
Data & Knowledge Engineering
Information Processing Letters
Information Research
Information Systems
Journal of Intelligent Information Systems
Knowledge and Information Systems
Foundations and Trends in Information Retrieval


Resources and tools

10 free NLP tools for the SEO – Science for SEO
Glossary (Modern Information Retrieval) - Berkeley
Information retrieval research links @ Search Tools – Search
Information Retrieval Links - BUBL
Information Retrieval Systems - LSU
Open Directory: Information Retrieval Links - ODP
Indexing Resources - UBC
IR & Neural Networks, Symbolic Learning, Genetic Algorithms
Stop list (a list of stop words) - MIT
NLP resources - Chris Manning
Text mining links - Weiguo Patrick Fan
LSA/LSI source code & tools – Science for SEO


Blogs and websites

IR related Blogs

Stanford Infoblog
Geeking with Greg
IR Thoughts
Jeff's Search Engine Caffe
Daniel Lemire's blog
Natural language processing blog

SEO Geeks

SEO by the Sea (careful, I hear he's whacked)
Know of others? Please let me know, as I am admittedly light here...

Other IR websites;

Information Retrieval Specialist Group
Information Retrieval (newsletters, free and paid) - Springer
Fast Search white papers - Fast Search
Wiki – Association for Computational Linguistics
Latent Semantic Analysis – Colorado U


That should keep you busy, oui?

And there you have it. When this journey began it was in frustration, but along the ride I fell deeper into the abyss that is my fascination (near obsession?) with all things search. Having the holidays to get immersed was timely, and hopefully there were some finds that you can take with you.

The ultimate goal of this exploration is merely to encourage a sense of understanding, fire up the imagination and hopefully stir the passion. When those in the SEO industry speak of ‘standards’ and pontificate about the future, pausing to look at the IR community may better serve us all. Should we seek to avoid being labelled link whores and hype merchants, having deeper technical skills and knowledge may lead us to that end. This is my challenge to all those in the world of search engine optimization.

/end journey….


I can’t thank those that helped with this post enough. Each lead turned me in new directions and fired up my passions again. Gone is the angst of the bickering and drama, replaced by new paths into creative bliss :0)

A little help from my friends

