Patents
Now you can call me biased (and you’d be right), but patents are always a great source of insight into how the gang at each of the big 3 may be thinking (now and historically). Bill has been a huge influence and help the last few years, and patents are where my increasing fascination started to really kick into high gear. Unfortunately the list is endless, so we’ll keep it to some of the more fundamental and interesting ones worth looking into;
General IR related patents;
System and method for characterizing a web page using multiple anchor sets of web pages – Yahoo (sort of like TrustRank concepts)
Regression framework for learning ranking functions using relative preferences (machine learning) - Yahoo
System and method for determining semantically related terms using an active learning framework – Yahoo
Using link structure for suggesting related queries – Microsoft
Detecting Duplicate and near-duplicate files - Google
Searching to identify web page(s) – Microsoft
Method and system for creating improved search queries – Google
Extraction of information from documents - Microsoft
Personalized search / behavioural signals
Systems and methods for analyzing a user's web history – Google
Systems and methods for modifying search results based on a user's history - Google
Method and apparatus for learning a probabilistic generative model for text - Google
Search system using user behaviour data - Microsoft
User sensitive (personalized) PageRank – Yahoo
Re-ranking search results based on query log – Microsoft
(covered by Bill)
User Distributed Search Results – Google
Search pogosticking benchmarks – Yahoo
Using search trails to provide enhanced search interaction - Microsoft
Personalization of web page search rankings – Microsoft
Accounting for behavioral variability in web search – Microsoft
User query data mining techniques – Yahoo
Bookmarks and Ranking - Google
Page Segmentation
Document segmentation based on visual gaps – Google
System and method for detecting a web page – Yahoo
Vision-based document segmentation – Microsoft
Retrieval of structured documents - Microsoft
Systems and methods for analyzing boilerplate - Google
Historical ranking factors
Information retrieval based on historical data - Google
Keyword usage score based on frequency impulse and frequency weight (also covered by Bill) - Microsoft
Calculating importance of documents factoring historical importance - Microsoft
Temporal ranking of Search results - Microsoft
Document scoring based on query analysis – Google
Geo-location
System and method for providing preferred country biasing of search results – Google
System for providing geographically relevant content to a search query with local intent – Yahoo
System for determining local intent in a search query - Yahoo
Detecting a user's location, local intent and travel intent from search queries - Microsoft
Phrase Based IR and semantics
Phrase-based indexing in an information retrieval system - Google
Phrase Identification in an Information Retrieval System - Google
Phrase-Based Generation of Document Descriptions - Google
Phrase-Based Searching in an Information Retrieval System - Google
Phrase-based indexing in an information retrieval system - Google
Automatic taxonomy generation in search results using phrases - Google
Diverse Topic Phrase Extraction (using LSA) – Microsoft
Synonym and similar word page search (more semantics) – Microsoft
Fact Extraction / Object Level Search
Generating structured information - Google
Learning facts from semi-structured text - Google
Designating data objects for analysis - Google
Spam Detection
Detecting spam documents in a phrase based information retrieval – Google
Discovering and determining characteristics of network proxies – Yahoo
Search Ranger System and Double-Funnel Model for Search Spam Analyses and Browser Protection (cloaking) – Microsoft
Web spam page classification using query-dependent data – Microsoft
Detecting web spam from changes to links of websites – Microsoft
PageRank
Method for node ranking in a linked database - Google
Method for scoring documents in a linked database - Google
Methods for ranking nodes in large directed graphs - Google -- covered on SEO by the Sea
You can find a list of more search patents from 2008 here; Google – Yahoo – Microsoft
Journals
ACM Transactions on Information Systems (TOIS)
Information Processing and Management (IP&M)
Information Retrieval
International Journal on Digital Libraries
Journal of the American Society for Information Science and Technology (JASIST)
SIGIR Forum
Journal of Documentation
D-Lib Magazine
Data & Knowledge Engineering
Information Processing Letters
Information Research
Information Systems
Journal of Intelligent Information Systems
Knowledge and Information Systems
Foundations and Trends in Information Retrieval
Resources and tools
10 free NLP tools for the SEO – Science for SEO
Glossary (Modern Information Retrieval) - Berkeley
Information retrieval research links @ Search Tools – Search Tools.com
Information Retrieval Links - BUBL
Information Retrieval Systems - LSU
Open Directory: Information Retrieval Links - ODP
Indexing Resources - UBC
IR & Neural Networks, Symbolic Learning, Genetic Algorithms
Stop list (a list of stop words) - MIT
NLP resources - Chris Manning
Text mining links - Weiguo Patrick Fan's
LSA/LSI source code & tools – Science for SEO
Blogs and websites
IR related Blogs
Stanford Infoblog
Geeking with Greg
IR Thoughts
Jeff's Search Engine Caffe
Daniel Lemire's blog
Apperceptual
Natural language processing blog
SEO Geeks
SEO by the Sea
Seo-theory.com
ScienceforSEO.com
Huomah.com (careful, I hear he's whacked)
Know of others? Please let me know as I am admittedly light here....
Other IR websites;
Information Retrieval Specialist Group
Information Retrieval (newsletters- free and paid) Springer
Fast Search white papers - Fast Search
Wiki – Association for Computational Linguistics
Latent Semantic Analysis – Colorado U
That should keep you busy, oui?
And there you have it. When this journey began it was in frustration but along the ride I fell deeper into the abyss that is my fascination (near obsession?) with all things search. Having the holidays to get immersed was timely and hopefully there were some finds that you take with you.
The ultimate goal of this exploration is merely to encourage a sense of understanding, fire up the imagination and hopefully stir the passion. When those in the SEO industry speak of ‘standards’ and pontificate about the future, pausing to look at the IR community may better serve us all. Should we seek to avoid being labelled link whores and hype merchants, having deeper technical skills and knowledge may lead us to that end. This is my challenge to all those in the world of search engine optimization.
/end journey….
I can’t thank those that helped with this post enough. Each lead turned me in new directions and fired up my passions again. Gone is the angst of the bickering and drama, replaced by new paths into creative bliss :0)

Watch out Wikipedia, with all of this info, once mastered folding up double rankings like Wiki will be like throwing back a few cold ones.
Great post and thanks for the research bro.