Patents
Now you can call me biased (and you’d be right), but patents are always a great source of insight into how the gang at each of the big 3 may be thinking (now and historically). Bill has been a huge influence and help the last few years, and patents are where my increasing fascination really started to kick into high gear. Unfortunately the list is endless, so we’ll keep it to some of the more fundamental and interesting ones worth looking into;
General IR related patents;
System and method for characterizing a web page using multiple anchor sets of web pages – Yahoo (sort of like TrustRank concepts)
Regression framework for learning ranking functions using relative preferences (machine learning) - Yahoo
System and method for determining semantically related terms using an active learning framework – Yahoo
Using link structure for suggesting related queries – Microsoft
Detecting Duplicate and near-duplicate files - Google
Searching to identify web page(s) – Microsoft
Method and system for creating improved search queries – Google
Extraction of information from documents - Microsoft
Personalized search / behavioural signals
Systems and methods for analyzing a user's web history – Google
Systems and methods for modifying search results based on a user's history - Google
Method and apparatus for learning a probabilistic generative model for text - Google
Search system using user behaviour data - Microsoft
User sensitive (personalized) PageRank – Yahoo
Re-ranking search results based on query log – Microsoft (covered by Bill)
User Distributed Search Results – Google
Search pogosticking benchmarks – Yahoo
Using search trails to provide enhanced search interaction - Microsoft
Personalization of web page search rankings – Microsoft
Accounting for behavioral variability in web search – Microsoft
User query data mining techniques – Yahoo
Bookmarks and Ranking - Google
Page Segmentation
Document segmentation based on visual gaps – Google
System and method for detecting a web page – Yahoo
Vision-based document segmentation – Microsoft
Retrieval of structured documents - Microsoft
Systems and methods for analyzing boilerplate - Google
Historical ranking factors
Information retrieval based on historical data - Google
Keyword usage score based on frequent impulse and frequency rate (and covered by Bill) - Microsoft
Calculating importance of documents factoring historical importance - Microsoft
Temporal ranking of Search results - Microsoft
Document scoring based on query analysis – Google
Geo-location
System and method for providing preferred country biasing of search results – Google
System for providing geographically relevant content to a search query with local intent – Yahoo
System for determining local intent in a search query - Yahoo
Detecting a user's location, local intent and travel intent from search queries - Microsoft
Phrase Based IR and semantics
Phrase-based indexing in an information retrieval system - Google
Phrase Identification in an Information Retrieval System - Google
Phrase-Based Generation of Document Descriptions - Google
Phrase-Based Searching in an Information Retrieval System - Google
Automatic taxonomy generation in search results using phrases - Google
Diverse Topic Phrase Extraction (using LSA) – Microsoft
Synonym and similar word page search (more semantics) – Microsoft
Fact Extraction / Object Level Search
Generating structured information - Google
Learning facts from semi-structured text - Google
Designating data objects for analysis - Google
Spam Detection
Detecting spam documents in a phrase based information retrieval – Google
Discovering and determining characteristics of network proxies – Yahoo
Search Ranger System and Double-Funnel Model for Search Spam Analyses and Browser Protection (cloaking) – Microsoft
Web spam page classification using query dependent data – Microsoft
Detecting web spam from changes to links of websites – Microsoft
PageRank
Method for node ranking in a linked database - Google
Method for scoring documents in a linked database - Google
Methods for ranking nodes in large directed graphs - Google -- covered on SEO by the Sea
You can find a list of more search patents from 2008 here; Google – Yahoo – Microsoft
Journals
ACM Transactions on Information Systems (TOIS)
Information Processing and Management (IP&M)
Information Retrieval
International Journal on Digital Libraries
Journal of the American Society for Information Science and Technology (JASIST)
SIGIR Forum
Journal of Documentation
D-Lib Magazine
Data & Knowledge Engineering
Information Processing Letters
Information Research
Information Systems
Journal of Intelligent Information Systems
Knowledge and Information Systems
Foundations and Trends in Information Retrieval
Resources and tools
10 free NLP tools for the SEO – Science for SEO
Glossary (Modern Information Retrieval) - Berkeley
Information retrieval research links @ Search Tools – Search Tools.com
Information Retrieval Links - BUBL
Information Retrieval Systems - LSU
Open Directory: Information Retrieval Links - ODP
Indexing Resources - UBC
IR & Neural Networks, Symbolic Learning, Genetic Algorithms
Stop list (a list of stop words) - MIT
NLP resources - Chris Manning
Text mining links - Weiguo Patrick Fan
LSA/LSI source code & tools – Science for SEO
Blogs and websites
IR related Blogs
Stanford Infoblog
Geeking with Greg
IR Thoughts
Jeff's Search Engine Caffe
Daniel Lemire's blog
Apperceptual
Natural language processing blog
SEO Geeks
SEO by the Sea
Seo-theory.com
ScienceforSEO.com
Huomah.com (careful, I hear he's whacked)
Know of others? Please let me know, as I am admittedly light here...
Other IR websites;
Information Retrieval Specialist Group
Information Retrieval (newsletters- free and paid) Springer
Fast Search white papers - Fast Search
Wiki – Association for Computational Linguistics
Latent Semantic Analysis – Colorado U
That should keep you busy, oui?
And there you have it. When this journey began it was in frustration, but along the ride I fell deeper into the abyss that is my fascination (near obsession?) with all things search. Having the holidays to get immersed was timely, and hopefully there were some finds that you can take with you.
The ultimate goal of this exploration is merely to encourage a sense of understanding, fire up the imagination and hopefully stir the passion. When those in the SEO industry speak of ‘standards’ and pontificate about the future, pausing to look at the IR community may better serve us all. Should we seek to avoid being labelled link whores and hype merchants, having deeper technical skills and knowledge may lead us to that end. This is my challenge to all those in the world of search engine optimization.
/end journey….
I can’t thank those that helped with this post enough. Each lead turned me in new directions and fired up my passions again. Gone is the angst of the bickering and drama, replaced by new paths into creative bliss :0)

Comments
Watch out, Wikipedia. With all of this info, once mastered, folding up double rankings like Wiki will be like throwing back a few cold ones.
Great post and thanks for the research bro.
If you're into it, join our merry band on the next adventure, I can keep you updated on things in the pipeline :0)
Dave, if you can, let's keep this page current. While it summarizes our last 4 years of conversation (about IR stuff), I am sure that it's not the end. You deserve extra credit for taking on the MS papers, as only a few dare to venture into that territory.
Looks like I have some reading to do...
Let me just add another great site from Dr. E. Garcia: http://www.miislita.com/
Simply wow, awesome post. I have read some of the material but at least 80% of it is new to me.
Thanks for posting this!
How much would one profit from the info? Lol... damned marketers, always after the buck huh? he he... To be honest it's all about what one does with it. One important area is 'future-proofing' one's SEO efforts. What works today may not some day soon; watching search evolution ultimately helps one plan ahead in many ways.
As for senor Garcia, we were certainly aware of him (old school here ya know) - but I culled him from the list as his blog doesn't seem all that active anymore unfortunately.
And to the 'wow' 'amazing' stuff... not necessary, just drop by often, hook up on Twitter and just keep this SEO geek from getting lonely ok?
THANK YOU... all for taking the time to comment, makes the effort worth it!! (pretty good chit chat on Sphinn as well if anyone is interested)
I had thought about doing something along the same lines following the guest post I kindly had from Marie-Claire Jenkins from Science for SEO, but the posts you've been putting out there lately are a tad more comprehensive than what I could muster.
I'm heading off to Sphinn to see what the take-up on it is...
Really appreciated, thanks a lot!
Ben
I would also recommend the less technical, fuzzier "Web Dragons" book by Witten et al. as a good starting point for a general understanding of IR/SEO issues, although I don't agree with all it says.
@Chris - well, as U might know (from the sounds of it) each study and research paper is skewed. Researchers have varied data sets and to be honest, may not always be that objective - there is plenty to debate.
Thanks for the lead, shall follow it up.
Bottled aardvark Milk (www.bottled-aardvark-milk.com/)
Great stuff, love it when search marketers find resources to really sink their teeth into IR.
I'd like to plug the AIRWEB resource: http://airweb.cse.lehigh.edu/ Following the annual AIRWEB contest is a great way to keep tabs on spamming vs. anti-spam!
See-ya on twitter! @lucasng
Nice call on the AirWeb, I just so happened to mention the 07 and 08 stuff in a post the other day (Tweeted ye the link). AIR is kind of the yin and yang of search to me; what would they do without those nasty manipulators? The world is an imperfect place, what can one do?
In any case, I'm linking/blogging/citing etc. because this is a very nice resource.
Owner- www.CollegiateLiving.com
http://www.semmys.org/category/search-tech/
The physics courses are awesome! MIT offers great courses on Python too; they're available for download on iTunes, and I've been learning (slowly but surely) on my iPhone while I run on the treadmill every day.