SEO Blog - Internet marketing news and views  

SEO Higher learning - Page 2

Written by David Harry   
Monday, 05 January 2009 07:19


Now you can call me biased (and you’d be right), but patents are always a great source of insight into how the gang at each of the big 3 may be thinking (now and historically). Bill has been a huge influence and help over the last few years, and patents are where my growing fascination really kicked into high gear. Unfortunately the list is endless, so we’ll keep it to some of the more fundamental and interesting ones worth looking into:

General IR-related patents

System and method for characterizing a web page using multiple anchor sets of web pages – Yahoo (sort of like TrustRank concepts)

Regression framework for learning ranking functions using relative preferences (machine learning) - Yahoo

System and method for determining semantically related terms using an active learning framework – Yahoo

Using link structure for suggesting related queries – Microsoft

Detecting Duplicate and near-duplicate files  - Google

Searching to identify web page(s) – Microsoft

Method and system for creating improved search queries – Google

Extraction of information from documents - Microsoft


Personalized search / behavioural signals

Systems and methods for analyzing a user's web history – Google

Systems and methods for modifying search results based on a user's history - Google

Method and apparatus for learning a probabilistic generative model for text - Google

Search system using user behaviour data   - Microsoft

User sensitive (personalized) PageRank – Yahoo

Re-ranking search results based on query log – Microsoft (covered by Bill)

User Distributed Search Results – Google

Search pogosticking benchmarks – Yahoo

Using search trails to provide enhanced search interaction  - Microsoft

Personalization of web page search rankings – Microsoft

Accounting for behavioral variability in web search – Microsoft

User query data mining techniques – Yahoo

Bookmarks and Ranking  - Google


Page Segmentation

Document segmentation based on visual gaps – Google

System and method for detecting a web page – Yahoo

Vision-based document segmentation – Microsoft

Retrieval of structured documents - Microsoft

Systems and methods for analyzing boilerplate - Google


Historical ranking factors

Information retrieval based on historical data  - Google

Keyword usage score based on frequent impulse and frequency rate (and covered by Bill) - Microsoft

Calculating importance of documents factoring historical importance. - Microsoft

Temporal ranking of Search results  - Microsoft




Local and geographic search

System and method for providing preferred country biasing of search results – Google

System for providing geographically relevant content to a search query with local intent – Yahoo

System for determining local intent in a search query - Yahoo

Detecting a user's location, local intent and travel intent from search queries - Microsoft


Phrase Based IR and semantics

Phrase-based indexing in an information retrieval system - Google

Phrase Identification in an Information Retrieval System - Google

Phrase-Based Generation of Document Descriptions - Google

Phrase-Based Searching in an Information Retrieval System - Google

Phrase-based indexing in an information retrieval system - Google

Automatic taxonomy generation in search results using phrases - Google

Diverse Topic Phrase Extraction (using LSA) – Microsoft

Synonym and similar word page search (more semantics) – Microsoft


Fact Extraction / Object Level Search

Generating structured information - Google

Learning facts from semi-structured text - Google

Designating data objects for analysis - Google


Spam Detection

Detecting spam documents in a phrase based information retrieval – Google

Discovering and determining characteristics of network proxies – Yahoo

Search Ranger System and Double-Funnel Model for Search Spam Analyses and Browser Protection (cloaking) – Microsoft

Web spam page classification using query dependent data – Microsoft

Detecting web spam from changes to links of websites – Microsoft



Link analysis

Method for node ranking in a linked database - Google

Method for scoring documents in a linked database - Google

Methods for ranking nodes in large directed graphs - Google -- covered on SEO by the Sea


You can find a list of more search patents from 2008 here: Google, Yahoo, Microsoft



Journals

ACM Transactions on Information Systems (TOIS)
Information Processing and Management (IP&M)
Information Retrieval
International Journal on Digital Libraries
Journal of the American Society of Information Science and Technology (JASIST)
SIGIR Forum
Journal of Documentation
D-Lib Magazine
Data & Knowledge Engineering
Information Processing Letters
Information Research
Information Systems
Journal of Intelligent Information Systems
Knowledge and Information Systems
Foundations and Trends in Information Retrieval


Resources and tools

10 free NLP tools for the SEO – Science for SEO
Glossary (Modern Information Retrieval) - Berkeley
Information retrieval research links @ Search Tools – Search
Information Retrieval Links - BUBL
Information Retrieval Systems - LSU
Open Directory: Information Retrieval Links - ODP
Indexing Resources - UBC
IR & Neural Networks, Symbolic Learning, Genetic Algorithms
Stop list (a list of stop words) - MIT
NLP resources - Chris Manning
Text mining links - Weiguo Patrick Fan
LSA/LSI source code & tools – Science for SEO


Blogs and websites

IR related Blogs

Stanford Infoblog
Geeking with Greg
IR Thoughts
Jeff's Search Engine Caffe
Daniel Lemire's blog
Natural language processing blog

SEO Geeks

SEO by the Sea (careful, I hear he's whacked)
Know of others? Please let me know, as I am admittedly light here....

Other IR websites

Information Retrieval Specialist Group
Information Retrieval (newsletters, free and paid) – Springer
Fast Search white papers - Fast Search
Wiki – Association for Computational Linguistics
Latent Semantic Analysis – Colorado U


That should keep you busy, oui?

And there you have it. When this journey began it was in frustration, but along the ride I fell deeper into the abyss that is my fascination (near obsession?) with all things search. Having the holidays to get immersed was timely, and hopefully there were some finds that you can take with you.

The ultimate goal of this exploration is simply to encourage a sense of understanding, fire up the imagination and hopefully stir the passion. When those in the SEO industry speak of ‘standards’ and pontificate about the future, pausing to look at the IR community may better serve us all. Should we seek to avoid being labelled link whores and hype merchants, having deeper technical skills and knowledge may lead us to that end. This is my challenge to all those in the world of search engine optimization.

/end journey….


I can’t thank those that helped with this post enough. Each lead turned me in new directions and fired up my passions again. Gone is the angst of the bickering and drama, replaced by new paths into creative bliss :0)

A little help from my friends



0 # Jeffrey Smith 2009-01-05 09:04
Dave you weren't kidding, this is a monster post. You gave me enough material to read for weeks.

Watch out Wikipedia, with all of this info, once mastered folding up double rankings like Wiki will be like throwing back a few cold ones.

Great post and thanks for the research bro.
0 # Dave 2009-01-05 09:42
Lol... likewise. As I mentioned, the post came largely from friends, so there is a great deal of new goodness there for me as well. An adventure became an odyssey.

If you're into it, join our merry band on the next adventure, I can keep you updated on things in the pipeline :0)
0 # waveshoppe 2009-01-05 10:19
Talk about a big breakfast dude!

Dave, if you can, let's keep this page current. While it summarizes our last 4 years of conversation (about IR stuff), I am sure that it's not the end. You deserve extra credit for taking on the MS papers, as only a few dare to venture into that territory.
+1 # Antares 2009-01-05 11:42
Colleagues greetings! I uzbek SEO master :-) Write still .....
0 # Hugo @ Zeta Interactive 2009-01-05 15:55
Great stuff! I've seen some of it, but a lot of it is new to me.

Looks like I have some reading to do...
0 # Lorna Li 2009-01-05 15:57
Whoa, this is serious stuff. It will take me months to get thru this and fully comprehend...without a day job. How much of a boost to one's earning potential as an SEO do you anticipate from this?
0 # Alex 2009-01-05 16:39
Hey this post is amazing!
Let me just add another great site from Dr. E. Garcia:

Mi Islita is a research site about information retrieval, data mining, and search engine technologies. Our content is frequently referenced or used by both IR scholars and search engine marketers.
0 # MikevanderHeijden 2009-01-05 17:36
Simply wow, awesome post. I have read some of the material but at least 80% of it is new to me.

Thanks for posting this!
0 # Jun 2009-01-05 19:15
Haven't finished reading the patents yet! LOL! Now more reading material. Thanks Dave!
+1 # Dave 2009-01-05 20:40
Well thanks ya'll - it was a labour of love among friends.

How much would one profit from the info? Lol... damned marketers, always after the buck huh? he he... To be honest it's all what one does with it. One important area is 'future-proofing' one's SEO efforts. What works today may not some day soon; watching search evolution ultimately helps one plan ahead in many ways.

As for senor Garcia, we were certainly aware of him (old school here ya know) - but I culled him from the list as his blog doesn't seem all that active anymore unfortunately.

And to the 'wow' 'amazing' stuff... not necessary, just drop by often, hook up on Twitter and just keep this SEO geek from getting lonely ok?

THANK YOU... all for taking the time to comment, makes the effort worth it!! (pretty good chit chat on Sphinn as well if anyone is interested)
0 # Ben McKay 2009-01-06 00:06
Phenomenal resource Dave - I've not seen a compilation of resources like this ever I don't think, so I'm guessing I'm not alone in thanking you for taking the (long) time in compiling it.

I had thought about doing something along the same lines following the guest post I kindly had from Marie-Claire Jenkins from Science for SEO, but the posts you've been putting out there lately are a tad more comprehensive than what I could muster.

I'm heading off to Sphinn to see what the take-up on it is...

Really appreciated, thanks a lot!

Ben ;-)
0 # Jagdeep Singh Pannu 2009-01-06 10:00
Thanks for this awesome post Dave, Bill, Marie and Charles. That's a haystack of information. Bookmarked and Sphinned :-) Will attack the unfamiliar ones one by one. My in-house consultancy stint has kind of thrown me out of sync with the industry as I have a whole lot of e-learning stuff to attend to. Your post is a place, which i can revisit to catch up.
0 # Chris McGiffen 2009-01-06 10:39
Great resource, and nice to see one of my favs up near the top - Managing Gigabytes :D
I would also recommend the less technical, fuzzier "Web Dragons" book by Witten et al as a good starting point for general understanding of IR/SEO issues - although I don't agree with all it says :-)
0 # Dave 2009-01-06 11:33
@jagdeep - THANK YOU for mentioning the rest of my merry band of IR mayhem makers. I will pass along your comment. Once it started it really did need the help of others to ensure it was a balanced but useful resource. It wouldn't be what it is without them :woohoo:

@Chris - well, as U might know (from the sounds of it) each study and research paper is skewed. Researchers have varied data sets and to be honest, may not always be that objective - there is plenty to debate.

Thanks for the lead, shall follow it up.
0 # Jagdeep Singh Pannu 2009-01-08 05:21
It would be awesome guys, if you can present and discuss every one of these topics publicly online. If you want to, just let me know and I will set it up in a virtual environment, where anyone can jump in and discuss in real-time. You can embed the event to announce it on your blogs. These sessions are recorded, so anyone can access them later also. You can email me directly if you want to do this. To have a feel, you can follow the link I provided for this post.
0 # Search Engine Optimization 2009-01-08 05:35
Hey thanks for so many resources. And yeah m going to download your SEO handbook. M sure as your articles have proven to be helpful, your book also will be.. :-)
-1 # Bottled aardvark Milk 2009-01-12 01:09
Very useful resources. I like the way you have presented your article. Thanks for sharing!
-1 # Bottled aardvark Milk 2009-01-12 01:11
Do you have any idea about black/white hat seo? where can I get those details?
0 # Robbert 2009-01-12 09:01
Wow, impressive list. I'll be checking it out tonight. Definitely!
0 # web design bournemouth 2009-01-15 17:02
thanks dave - but could you just summarise for me in a tidy email so i don't have to bother reading it all! :P
0 # Jack 2009-01-16 06:18
Thanks for the extensive list..
0 # Lucas Ng 2009-01-18 21:23
My work colleague pointed me to your site, specifically this monster treasure trove of IR papers and links!

Great stuff, love it when search marketers find resources to really sink their teeth into IR.

I'd like to plug the AIRWEB resource: Following the annual AIRWEB contest is a great way to keep tabs on spamming vs. anti-spam!

See-ya on twitter! @lucasng
0 # Dave 2009-01-19 06:31
Glad you found it useful... more search geeks out here than I thought, at least in this neck of the woods.

Nice call on the AirWeb, I just so happened to mention the 07 and 08 stuff in a post the other day (Tweeted ye the link). AIR is kind of the yin and yang of search to me; what would they do without those nasty manipulators? The world is an imperfect place, what can one do?
+1 # Avi Rappoport, Search Tools Co 2009-05-15 17:17
I just wandered over here and am very impressed by your diligence in assembling all these links. You may also want to keep an eye on the article posts in citeulike, including their IR group

In any case, I'm linking/blogging/citing etc. because this is a very nice resource.
0 # college lamp guy 2009-12-16 15:16
This list has legs...I am still using this reference guide and it's almost a year old. Thank you, Thank you, Thank you!
0 # WaveShoppe 2010-02-03 20:43
By the way Dave, congratulations on your win!
0 # Ben Joven 2010-02-14 00:53
I LOVE MIT'S video courses online!!! =D

The physics courses are awesome! Also MIT offers great courses on Python too, they're available for download on iTunes and I've been learning (slowly but surely) on my iPhone while I run on the treadmill every day.
0 # Matt Pennebaker 2011-10-21 15:03
Good God Man, this is more information than I had in my senior thesis! Great stuff though. Thanks for the post.
