SEO Blog - Internet marketing news and views  

Are the search engines spying on SEOs?

Written by David Harry   
Monday, 10 August 2009 09:37

Finding link spam via search marketing forums

OK, sure… the title is a bit egregious, but in many ways so is the patent that came out from the folks at Microsoft last week. As I was perusing my feed reader on Thursday I noticed a patent that DEFINITELY caught my attention (I even laughed a bunch, too):

Forum Mining for Suspicious Link Spam Sites Detection - Microsoft - Filed: Feb. 6, 2008; Assigned: Aug. 6, 2009

Enemy sent by Gates

And yes… it is exactly what it seems to be. Why SEO forums? Well… that’s because webmasters use them:

“To conveniently and efficiently exchange link trade information, spammers usually log onto SEO forums to communicate with each other for trading links, including link exchange, link sale, and recommendation link exchange. These forums are increasingly more popular. Spammers post requests for "link exchange", "buy & sell link", and "recommendation exchange" in these forums, along with the URLs of their websites, and other interested spammers may reply the requests and provide the URLs of their websites.”

The first thing that really stuck out was: “Why in the world would they even bother to patent such a system?” As long-time readers of the trail know, we’ve covered a great many spam detection systems, so we’re well aware that search engines are always on the hunt for link spam… but to actually patent such a system? Very odd…

I mean, seriously, I am sure all the major engines have related programs, so why a patent? Because there are a bunch of us who cover them? Naw… Bill covered the patent and put it well in the comments:

“I’m not filled with enough hubris to think that they did solely because I might blog about it.”

Either way, it’s such a fun ride that I thought we’d cover it anyway…

Search engines: the ultimate forum lurkers


Right away they discuss ‘protecting rankings’ and targeting ‘SEO forums’:

An anti-spam technique for protecting search engine ranking is based on mining search engine optimization (SEO) forums. The anti-spam technique collects webpages such as SEO forum posts from a list of suspect spam websites, and extracts suspicious link exchange URLs and corresponding link information from the collected webpages.

They then move on to the application of penalties for suspect URLs:

A search engine ranking penalty is then applied to the suspicious link exchange URLs. The penalty is at least partially determined by the link information associated with the respective suspicious link exchange URL. To detect more suspicious link exchange URLs, the technique may propagate one or more levels from a seed set of suspicious link exchange URLs generated by mining SEO forums.

That last part is interesting: they move from the forum to suspect URLs, and then analyze the link profiles of those sites to find other possible reciprocal manipulations. That means if you’re doing recips with a webmaster who is dumb enough to post them on an SEO board, you might be penalized by association.
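The “propagate one or more levels from a seed set” step can be pictured as a breadth-first walk outward from the forum-mined sites. What follows is a hypothetical sketch, not Microsoft’s method. The `links` dictionary, the `decay` factor and the choice to follow only reciprocal links are all our assumptions.

```python
from collections import deque


def propagate(seed, links, levels=2, decay=0.5):
    """Spread a suspicion score outward from seed sites (hypothetical sketch).

    links maps each site to the set of sites it links to. Only reciprocal
    pairs (A links to B and B links back to A) are followed, and each hop
    multiplies the score by `decay`. Seed sites start at 1.0.
    """
    scores = {site: 1.0 for site in seed}
    frontier = deque((site, 0) for site in seed)
    while frontier:
        site, depth = frontier.popleft()
        if depth == levels:
            continue  # stop expanding once the level limit is reached
        for other in links.get(site, set()):
            is_reciprocal = site in links.get(other, set())
            if is_reciprocal and other not in scores:
                scores[other] = scores[site] * decay
                frontier.append((other, depth + 1))
    return scores


links = {
    "seed.example": {"a.example", "x.example"},
    "a.example": {"seed.example", "b.example"},
    "b.example": {"a.example", "c.example"},
    "c.example": {"b.example"},
    "x.example": set(),
}
print(propagate({"seed.example"}, links))
```

In this toy graph `a.example` scores 0.5 and `b.example` scores 0.25. `x.example` never links back, so it is skipped. `c.example` sits beyond the two-level limit.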

Reciprocal link detection

The thinking is that it would be easier to mine SEO forums for reciprocal link exchanges than to actively seek out spammy link profiles and analyze the link graphs…

Some of the factors listed for the spam activity scoring are:

  1. the number of posts by the user who posted the URL
  2. the post time sequence of the user who posted the URL, etc.
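The patent excerpt doesn’t say how those two factors combine. Here is one hypothetical way to score them in Python. The function name, the one-hour burst window and the “at most double” weighting are invented for illustration.

```python
from datetime import datetime, timedelta


def activity_score(post_times, burst_window=timedelta(hours=1)):
    """Hypothetical spam activity score for the user who posted a URL.

    post_times: datetimes of that user's link-exchange posts.
    Combines the number of posts with how bursty the posting sequence is.
    """
    if not post_times:
        return 0.0
    times = sorted(post_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    bursty = sum(1 for gap in gaps if gap <= burst_window)
    burst_ratio = bursty / len(gaps) if gaps else 0.0
    # Each post counts once; a fully bursty sequence at most doubles the score.
    return len(times) * (1.0 + burst_ratio)


posts = [datetime(2009, 8, 6, 10, 0), datetime(2009, 8, 6, 10, 20),
         datetime(2009, 8, 6, 10, 40), datetime(2009, 8, 6, 15, 0)]
print(activity_score(posts))
```

Four posts, two of the three gaps under an hour: the score comes out at roughly 6.67. The same four posts spread over weeks would score 4.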


And the posts can be analyzed for keywords such as:

  1. "exchange"
  2. "look for+{partner|site|link}"
  3. "reciprocal link"
  4. "{add|submit}+{link|site}"
  5. "backlink"
  6. "three way"
  7. "link partner"
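Here is a rough sketch of how those patterns might be turned into regular expressions. We read `+` as “followed by, with up to three words in between” and `{a|b}` as “either a or b”. That reading of the notation is our guess. `to_regex` and `matched_patterns` are hypothetical helpers.

```python
import re

# The patterns listed above. Reading "+" as "followed by, allowing up to three
# words in between" and "{a|b}" as "a or b" is our assumption.
PATTERNS = [
    "exchange",
    "look for+{partner|site|link}",
    "reciprocal link",
    "{add|submit}+{link|site}",
    "backlink",
    "three way",
    "link partner",
]

GAP = r"(?:\W+\w+){0,3}?\W+"  # up to three intervening words


def to_regex(pattern):
    """Compile one pattern into a case-insensitive regular expression."""
    parts = []
    for piece in pattern.split("+"):
        piece = piece.strip()
        if piece.startswith("{") and piece.endswith("}"):
            options = piece[1:-1].split("|")
            parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
        else:
            # Allow any run of whitespace between the words of a phrase.
            parts.append(r"\s+".join(re.escape(w) for w in piece.split()))
    return re.compile(r"\b" + GAP.join(parts), re.IGNORECASE)


COMPILED = [to_regex(p) for p in PATTERNS]


def matched_patterns(text):
    """Return the listed patterns that occur in a forum post."""
    return [p for p, rx in zip(PATTERNS, COMPILED) if rx.search(text)]


print(matched_patterns("I look for a partner site. Please add my link."))
```

That sample post trips the “look for” and “add/submit” patterns but none of the others.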


Essentially, a seed set of known offenders or suspicious link profiles would be used as a starting point, and various SEO forums would then be monitored for activity from those offenders. At that point the link graph can be analyzed and other spammers identified.
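For the lurking step itself, here is a minimal sketch of pulling URLs out of collected posts and grouping them by site. The post dictionaries, `extract_suspect_urls` and the `www.` normalization are hypothetical. We also assume the posts have already been narrowed down to link-exchange threads.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Rough URL matcher; stops at whitespace, quotes, angle brackets and ")".
URL_RE = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)


def extract_suspect_urls(posts):
    """Map each linked site to the ids of the posts that mention it.

    posts: hypothetical dicts with "id" and "text" keys, assumed to come
    from threads already identified as link-exchange discussions.
    """
    suspects = defaultdict(list)
    for post in posts:
        for url in URL_RE.findall(post["text"]):
            # Normalize the host: lowercase, drop a trailing dot and "www."
            host = urlparse(url).netloc.lower().rstrip(".")
            if host.startswith("www."):
                host = host[4:]
            if host and post["id"] not in suspects[host]:
                suspects[host].append(post["id"])
    return dict(suspects)


posts = [
    {"id": 1, "text": "Link exchange? http://www.example.com/page"},
    {"id": 2, "text": "Sure, add https://example.org and http://example.com."},
]
print(extract_suspect_urls(posts))
```

Grouping by site rather than by full URL means different pages of the same domain pile up as evidence against one site. That matches the site-level seed set described above.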



Smacking your competitors

What is VERY problematic right away is that it opens the door to false positives: people could use this information to link spam their competitors. The patent doesn’t discuss how to deal with this, so I am not entirely clear how effective the method could be, especially now that they’ve filed a patent and made it public. (Tinfoil note: maybe it’s corporate espionage and they’re really trying to mess with Google’s PageRank, lol.)

Sure, they limit it somewhat by applying penalties “on at least some of the suspicious link exchange URLs” and basing the penalty on “the link information associated with the respective suspicious link exchange URL” (meaning mining the profile for excessive recips), but there do seem to be some problems with it…
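Here is a hedged sketch of a penalty that is “at least partially determined” by link information. The patent excerpt gives no formula. Treating link information as forum mentions plus the share of reciprocated outlinks is our assumption, and so are the five-mention cap and the 0.9 ceiling.

```python
from dataclasses import dataclass


@dataclass
class LinkInfo:
    # Hypothetical stand-ins for the patent's "link information".
    forum_mentions: int       # times the URL was posted in link-exchange threads
    outlinks: int             # total outgoing links found on the site
    reciprocal_outlinks: int  # outgoing links whose target links back


def penalty(info: LinkInfo, max_penalty: float = 0.9) -> float:
    """Return a penalty in [0, max_penalty]; larger means more demotion."""
    reciprocity = info.reciprocal_outlinks / info.outlinks if info.outlinks else 0.0
    # More forum mentions -> more confidence, saturating at five mentions.
    confidence = min(info.forum_mentions, 5) / 5
    return max_penalty * reciprocity * confidence


def penalized_score(score: float, info: LinkInfo) -> float:
    """Scale a ranking score down by the penalty."""
    return score * (1.0 - penalty(info))


print(penalized_score(1.0, LinkInfo(forum_mentions=5, outlinks=10, reciprocal_outlinks=5)))
```

Here a site posted five times with half its outlinks reciprocated keeps about 55% of its score. A site with no reciprocal links takes no penalty, no matter how often it is mentioned.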

They speak of:

“(…) identifying a thread of user posts on search engine optimization forums contained in the one or more suspect spam websites; and downloading the identified thread of posts.”


“(…) analyzing the content of the webpage comprises detecting keywords indicating link exchange, links sale and recommendation exchange.”

Catch that part? The system is looking not only for recips but for potential link sales/placements as well. Once more, this type of system is not all that surprising, and we’d have to imagine that Google (which truly dislikes paid links) would be using a similar approach. (Not that any old forum hound would be surprised, tho’.)


Web Spam – a brief history


As part of the filing they include a short but fairly reasonable history of web spam, which is worth reproducing here:

“Web spamming techniques have also evolved in time. The first generation spam involved keyword stuffing when ranking was dependent on document similarity. The second generation spam involved link farms when ranking was largely dependent on site popularity. The third generation spam uses mutual link exchange through "mutual admiration societies" when ranking is largely dependent on page reputation. In general, the third-generation Web spamming is harder to detect than the previous generations.

Link spamming techniques, which include buying/selling links, exchanging links, and constructing link farms, are a major category of the commonly used spam techniques. Link spamming refers to the cases where spammers set up structures of interconnected pages in order to boost their rankings in link structure-based ranking system such as PageRank. Since link analysis is a crucial factor for commercial search engines, link spam is among the most popular and harmful techniques for search engines nowadays.”

They then go on to discuss the traditional problems with various link spam detection methods such as TrustRank (better for determining authority), BadRank (better at finding link farms) and SpamRank. Alas, they ultimately concede that the “link spam problem has yet to be solved”… like we didn’t know that already.

SEO forum spying process


Getting adversarial

OK, we already knew that SEOs aren’t criminals; they’re the enemy. That’s nothing new. It is important to note that this is specifically targeted at SEOs and their communities. Furthermore, I highly doubt this is a unique idea, and it should serve as a warning to ALL search geeks and webmasters alike. I’ve already told you that reciprocal links are useless, and we’ve shown ways search engines detect paid links, so really… if you’re using them, find other ways, would you?

Much like we covered with Yahoo’s patent on excessive reciprocal links, they also discuss 3-4 way linking schemes. They are merely using SEO boards as a starting point from which the profiles (link graph) can be further analyzed for other potential linking anomalies (almost a TrustRank type of approach). The penalties don’t seem to be domain-wide, though; the system is geared more toward finding links and removing their value.
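We’ve simplified three- and four-way schemes into directed link cycles here: A links to B, B links to C, and C links back to A. With that simplification, they can be found with a short depth-limited search. The sketch and its toy graph are hypothetical.

```python
def short_cycles(links, lengths=(3, 4)):
    """Find directed link cycles of the given lengths (hypothetical sketch).

    links: dict mapping a site to the set of sites it links to.
    Each cycle is reported once, starting from its alphabetically smallest site.
    """
    found = set()
    max_len = max(lengths)

    def walk(start, path):
        for nxt in links.get(path[-1], ()):
            if nxt == start and len(path) in lengths:
                found.add(tuple(path))  # closed a cycle of a wanted length
            elif nxt > start and nxt not in path and len(path) < max_len:
                # Only extend with sites "larger" than start so each cycle
                # is discovered from a single rotation.
                walk(start, path + [nxt])

    for start in links:
        walk(start, [start])
    return sorted(found)


links = {
    "a": {"b"}, "b": {"c"}, "c": {"a"},                 # three-way
    "d": {"e"}, "e": {"f"}, "f": {"g"}, "g": {"d"},     # four-way
    "x": {"y"}, "y": {"x"},                             # plain reciprocal pair
}
print(short_cycles(links))
```

The plain two-site swap between `x` and `y` is deliberately left out. Reciprocal pairs are the easy case; the point of this search is the wider loops that avoid a direct A-to-B-to-A pattern.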

Oh, and don’t even bother being sneaky; they covered that too:

“There are also many "hidden" spammers in these forums. These hidden spammers may behave very cautiously and artfully and do not explicitly post URLs of their own sites. Instead, they may do link-exchanges with the sites whose URLs are explicitly posted by other spammers, all without explicitly posting their own URLs on an SEO forums.”

Thus you can see why they’d want to look at the link graph of the posted URLs. You may think you’re safe by not posting your URL and being a spam lurker yourself, but no dice, according to Bada-Bing… (or so the story goes).
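Here is a rough way to surface those “hidden” spammers from the posted URLs’ link graph. It looks for sites that trade links in both directions with posted sites but never appear in a post themselves. The graph format and `hidden_partners` are hypothetical.

```python
def hidden_partners(posted, links):
    """Map each never-posted site to the posted sites it trades links with.

    posted: set of sites whose URLs appeared in forum posts.
    links: dict mapping a site to the set of sites it links to.
    """
    hidden = {}
    for site in posted:
        for other in links.get(site, set()):
            trades_back = site in links.get(other, set())
            if other not in posted and trades_back:
                hidden.setdefault(other, set()).add(site)
    return hidden


posted = {"a.example", "b.example"}
links = {
    "a.example": {"h.example", "z.example"},
    "h.example": {"a.example", "b.example"},
    "b.example": {"h.example"},
    "z.example": set(),
}
print(hidden_partners(posted, links))
```

Here `h.example` never posts but swaps links with both posted sites, so it shows up. `z.example` only receives a link and is ignored. Counting how many distinct posted sites a lurker trades with gives a natural measure of how strong the evidence is.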

Obvious link exchanges

Why I don’t give a rat’s ass

Well, that’s simple… I don’t care for reciprocal link programs. Once upon a time it was very tough to get the word out and attract links. These days we can use social media effectively for link building, so these schemes aren’t that attractive. One important thing about this filing is that it describes one layer of link spam detection. If you are innocently linking back and forth, you need not fear: this system would be used in concert with other approaches. If you were flagged as potential link spam, the engine would look through forum data for supporting evidence.

That fact is about the only thing that gives this approach any credence. As a stand-alone method it really doesn’t make a lot of sense: there is too great a chance of false positives and, now that it’s public, of people link spamming their competition.
For the record, I still have no idea why they’d even bother patenting the system…
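To make the layering idea concrete, here is a hypothetical decision function. In it, forum evidence can only confirm a flag raised by some other detector and can never create one. The thresholds and labels are invented for the example.

```python
def classify(other_score: float, forum_mentions: int,
             flag_threshold: float = 0.5, min_mentions: int = 2) -> str:
    """Hypothetical layered decision: forum data only backs up another flag.

    other_score: output of some other link spam detector, in [0, 1].
    forum_mentions: how often the site turned up in link-exchange posts.
    """
    if other_score < flag_threshold:
        return "ok"        # unflagged sites are never looked up in forum data
    if forum_mentions >= min_mentions:
        return "penalize"  # independent flag corroborated by forum posts
    return "review"        # flagged, but no supporting forum evidence


print(classify(0.2, 10), classify(0.8, 3), classify(0.8, 0))
```

Under this arrangement, a competitor who plasters your URL across SEO forums cannot trigger a penalty alone. The site also has to look suspicious to an independent detector first.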

Something you may want to consider, though: if they are mining SEO forums for this type of data, what else might they be doing? Many of the old forum hounds would often warn that “the search engines are watching us”, and this, if nothing else, shows they certainly are… Oh crap… probably blogs too… OK, that’s it… I quit!

/end transmission


Give it a try

Just for fun, I whipped together a Google Custom Search Engine that covers a bunch of popular SEO forums… give it a whirl and see if you can find a few reciprocal exchanges.


HINT: I suggest searching for something like this (copy and paste the query below):

look for AND partner OR site OR link AND exchange

