Finding link spam via search marketing forums
OK, sure, the title is a bit egregious, but in many ways so is the patent that came out from the folks at Microsoft last week. As I was perusing my feed reader on Thursday, I noticed a patent that DEFINITELY caught my attention (and I even laughed a bunch too):
Forum Mining for Suspicious Link Spam Sites Detection (Microsoft)
Filed: Feb. 06, 2008; Assigned: Aug. 06, 2009

And yes, it is exactly what it seems to be. Why SEO forums? Well, that's because webmasters use them:
To conveniently and efficiently exchange link trade information, spammers usually log onto SEO forums to communicate with each other for trading links, including link exchange, link sale, and recommendation link exchange. These forums are increasingly more popular. Spammers post requests for "link exchange", "buy & sell link", and "recommendation exchange" in these forums, along with the URLs of their websites, and other interested spammers may reply the requests and provide the URLs of their websites.
The first thing that really stuck out was: why in the world would they even bother to patent such a system? As long-time readers of the trail know, we've covered a great many spam detection systems, so we're well aware that search engines are always on the hunt for link spam. But to actually patent such a system? Very odd. I mean seriously, I am sure all the major engines have related programs, so why a patent? Because there are a bunch of us who cover them? Naw.
Bill covered the patent too, and put it well in the comments:
I'm not filled with enough hubris to think that they did solely because I might blog about it.
Either way, it's such a fun ride that I thought we'd cover it anyway.
Search engines: the ultimate forum lurkers
Right away they discuss protecting rankings and targeting SEO forums:
An anti-spam technique for protecting search engine ranking is based on mining search engine optimization (SEO) forums. The anti-spam technique collects webpages such as SEO forum posts from a list of suspect spam websites, and extracts suspicious link exchange URLs and corresponding link information from the collected webpages.
And then they move on to applying penalties to suspect URLs:
A search engine ranking penalty is then applied to the suspicious link exchange URLs. The penalty is at least partially determined by the link information associated with the respective suspicious link exchange URL. To detect more suspicious link exchange URLs, the technique may propagate one or more levels from a seed set of suspicious link exchange URLs generated by mining SEO forums.
That last part is interesting, as they move from the forum to suspect URLs, and then analyze the link profiles of those sites to possibly find other reciprocal manipulations. That means that if you're doing recips with a webmaster who is dumb enough to post them on an SEO board, you might be penalized by association.
The thinking is that it would be easier to mine SEO forums for reciprocal link exchanges than to actively seek out spammy link profiles and analyze the link graphs.
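To make the "propagate one or more levels from a seed set" idea concrete, here is a minimal sketch of how such an expansion could work as a breadth-first walk over the link graph. It is not Microsoft's implementation: it assumes the link graph is available as a plain dict of outbound links, and it treats only reciprocal links (A links to B and B links back to A) as grounds for pulling a site into the suspect set. The function name `propagate_suspects` and the `max_levels` parameter are hypothetical.

```python
from collections import deque


def propagate_suspects(seeds, links, max_levels=1):
    """Expand a seed set of suspect sites along reciprocal links.

    seeds: sites whose URLs were mined from forum posts (hypothetical input).
    links: dict mapping each site to the set of sites it links out to.
    max_levels: how many hops to expand beyond the seed set.
    Returns a dict mapping each suspect site to its hop distance from the seeds.
    """
    level = {site: 0 for site in seeds}
    queue = deque(level)  # breadth-first, so each site gets its shortest distance
    while queue:
        site = queue.popleft()
        if level[site] >= max_levels:
            continue  # don't expand past the requested depth
        for other in links.get(site, set()):
            # Only a reciprocal pair (site -> other and other -> site) counts.
            if other not in level and site in links.get(other, set()):
                level[other] = level[site] + 1
                queue.append(other)
    return level


if __name__ == "__main__":
    graph = {
        "a.com": {"b.com", "c.com"},
        "b.com": {"a.com", "d.com"},
        "c.com": set(),
        "d.com": {"b.com"},
    }
    print(propagate_suspects(["a.com"], graph, max_levels=2))
```

With `max_levels=2`, `b.com` is pulled in at one hop and `d.com` at two, while `c.com` (which `a.com` links to but which doesn't link back) stays out. That last case is the "penalized by association" scenario: `d.com` never posted anything, it just traded links with someone who did.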
Some of the factors listed for scoring spam activity are:
- the number of posts by the user who posted the URL
- the posting time sequence of the user who posted the URL, etc.
And the posts can be analyzed for phrases such as:
- "exchange"
- "look for+{partner|site|link}"
- "reciprocal link"
- "{add|submit}+{link|site}"
- "backlink"
- "three way"
- or "link partner"
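A rough sketch of that keyword screening, using the phrases listed above as regular expressions. The patent doesn't say how the "+" patterns are matched, so this assumes "A+B" means A followed by B with up to three words in between; the pattern list and the `matched_patterns` / `is_suspicious` helpers are illustrative, not the patent's actual rules.

```python
import re

# Phrases from the list above. ASSUMPTION: "A+B" means A followed by B,
# with up to three words in between; the patent doesn't spell this out.
LINK_EXCHANGE_PATTERNS = [
    r"\bexchange\b",
    r"\blook(?:ing)? for\b(?:\W+\w+){0,3}?\W+(?:partners?|sites?|links?)\b",
    r"\breciprocal links?\b",
    r"\b(?:add|submit)\b(?:\W+\w+){0,3}?\W+(?:links?|sites?)\b",
    r"\bbacklinks?\b",
    r"\bthree[- ]way\b",
    r"\blink partners?\b",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in LINK_EXCHANGE_PATTERNS]


def matched_patterns(post_text):
    """Return the patterns that occur anywhere in a forum post."""
    return [rx.pattern for rx in COMPILED if rx.search(post_text)]


def is_suspicious(post_text, min_hits=1):
    """Flag a post that matches at least `min_hits` distinct patterns."""
    return len(matched_patterns(post_text)) >= min_hits
```

For example, "Looking for link partners in the travel niche" trips both the "look for" rule and the "link partner" rule, while "Please add your address below" trips nothing (the word boundaries keep "address" from counting as "add"). A real system would presumably weigh these hits together with the user-level factors listed earlier rather than flag on a single keyword.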
Essentially, a seed set would be used to identify known offenders or suspicious link profiles, and the system would then lurk in various SEO forums looking for activity from those offenders. At that point the link graph can be analyzed and other spammers identified.
Smacking your competitors
What is VERY problematic right away is that it opens the door to false positives, in the form of people who might use this information to link spam their competitors. There is no discussion of dealing with these problems in the patent, so I am not entirely clear on how effective this method could be, especially now that they've filed a patent and made it public. (Tin-foil note: maybe it's corporate espionage and they're really trying to mess with Google's PageRank, lol.)
Sure, they limit it somewhat by applying penalties to at least some of the suspicious link exchange URLs, with the penalty based on the link information associated with the respective suspicious link exchange URL (meaning mining the profile for excessive recips), but there do seem to be some problems with it.
They speak of:
identifying a thread of user posts on search engine optimization forums contained in the one or more suspect spam websites; and downloading the identified thread of posts.
And:
analyzing the content of the webpage comprises detecting keywords indicating link exchange, links sale and recommendation exchange.
Catch that part? The system is looking not only for recips but for potential link sales/placements as well. Once more, this type of system is not all that surprising, and we'd have to imagine that Google (which truly dislikes paid links) would be using such an approach as well (no surprise to any old forum hound).
Web spam: a brief history
As part of the filing they include a fairly reasonable, if short, history of web spam, which is worth quoting here:
Web spamming techniques have also evolved in time. The first generation spam involved keyword stuffing when ranking was dependent on document similarity. The second generation spam involved link farms when ranking was largely dependent on site popularity. The third generation spam uses mutual link exchange through "mutual admiration societies" when ranking is largely dependent on page reputation. In general, the third-generation Web spamming is harder to detect than the previous generations.
Link spamming techniques, which include buying/selling links, exchanging links, and constructing link farms, are a major category of the commonly used spam techniques. Link spamming refers to the cases where spammers set up structures of interconnected pages in order to boost their rankings in link structure-based ranking system such as PageRank. Since link analysis is a crucial factor for commercial search engines, link spam is among the most popular and harmful techniques for search engines nowadays.
They then go on to discuss the traditional problems with various link spam detection methods, such as TrustRank (better for determining authority), BadRank (better at finding link farms), and SpamRank. Alas, they ultimately concede that the link spam problem has yet to be solved... like we didn't know that already.

Getting adversarial
OK, we already knew that SEOs aren't criminals; they're the enemy. That's nothing new. It is important to note that this is specifically targeted at SEOs and their communities. Furthermore, I highly doubt this is a unique idea, and it should serve as a warning to ALL search geeks and webmasters alike. I've already told you that reciprocal links are useless, and we've shown ways search engines detect paid links, so really, if you're using them, find other ways, would you?
Much like Yahoo's patent on excessive reciprocal links, which we covered earlier, this one also discusses 3- and 4-way linking schemes. They are merely using SEO boards as a starting point from which the profiles (link graph) can be further analyzed for other potential linking anomalies (almost a TrustRank type of approach). The penalties don't seem to be domain-wide, though; the approach is more geared towards finding links and removing their value.
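Three- and four-way schemes dodge the obvious "A links to B, B links back to A" pattern by routing the return link through other sites (A links to B, B links to C, C links back to A). One plausible way to spot them, which the patent text quoted here does not spell out, is to look for short directed cycles in the link graph. The sketch below assumes the graph is a dict of outbound links; `short_link_cycles` is a hypothetical helper and a simple bounded depth-first search, not the patent's method.

```python
def short_link_cycles(links, lengths=(3, 4)):
    """Find directed link cycles of the given lengths (e.g. A->B->C->A).

    links: dict mapping each site to the set of sites it links to.
    Each cycle is reported once, rotated so its smallest site comes first.
    Plain two-site reciprocal swaps are excluded unless 2 is in `lengths`.
    """
    max_len = max(lengths)
    found = set()

    def walk(start, path):
        current = path[-1]
        for nxt in links.get(current, ()):
            if nxt == start and len(path) in lengths:
                found.add(tuple(path))  # closed a cycle of a wanted length
            elif nxt not in path and nxt > start and len(path) < max_len:
                # Only visit sites "larger" than start, so each cycle is
                # found from its smallest member and not once per rotation.
                walk(start, path + [nxt])

    for start in sorted(links):
        walk(start, [start])
    return sorted(found)
```

Requiring every other site on the path to sort after the starting site is what keeps each cycle from being reported three or four times; the depth bound (`max_len`) keeps the search cheap even on a large graph.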
Oh, and don't even bother being sneaky; they covered that too:
There are also many "hidden" spammers in these forums. These hidden spammers may behave very cautiously and artfully and do not explicitly post URLs of their own sites. Instead, they may do link-exchanges with the sites whose URLs are explicitly posted by other spammers, all without explicitly posting their own URLs on an SEO forums.
Thus you can see why they'd want to look at the link graph of the posted URLs. You may think you're safe by not posting your URL, by being a spam lurker yourself, but no dice according to Bada-Bing (or so the story goes).

Why I don't give a rat's ass
Well, that's simple: I don't care for reciprocal link programs. Once upon a time it was very tough to get the word out and attract links. These days we can effectively use social media for link building, and these schemes aren't that attractive. One important thing about this filing is that it is a layer of link spam detection. If you are innocently linking back and forth, you need not fear. This system would be used in concert with other approaches: if you were flagged as potential link spam, the engine would look through forum data to see if there is supporting evidence.
That fact is about the only one that gives this approach any credence. As a stand-alone method it really doesn't make a lot of sense. There is too great a chance of false positives and, now that it's public, of people link spamming their competition.
For the record, I still have no idea why they'd even bother patenting the system.
Something you may want to consider, though: if they are mining SEO forums for this type of data, what else might they be doing? Many of the old forum hounds would often warn that "the search engines are watching us," and this, if anything, shows they certainly are.
Oh crap... probably blogs too.
OK, that's it.
I quit!
/end transmission
Give it a try:
Just for fun, I whipped together a Google Custom Search Engine to search a bunch of popular SEO forums. Give it a whirl and see if you can find a few reciprocal exchanges.
HINT: I suggest searching for something like this (copy and paste the query below):
look for AND partner OR site OR link AND exchange
Comments
I know you stated it, and so did others, but it does feel like they only registered it so we would talk about it. Big hole in that plan, though.
Tactical, astute SEO/SEM people read these patent/science blogs; spammers play on those crap forums. Do they really think most people who practice 3-way linking out in the open are even going to hear about this, let alone understand it?
I think not.
As you (and the dude above me) mentioned, the biggest problem is the negative SEO coming out of this. It's like they make negative SEO easier every month.
Now, offer to buy links, get said links pointing to a competitor's site... but even then they would appear to only devalue the links... not tank an entire site, so it likely wouldn't work either...
@Craig well yea, it does seem an odd system to patent. As for abuse, I'd imagine what I stated above would be the case... that it is but ONE signal and they could simply devalue the links; this would make it a waste of time for the webmaster or the competitor who tried to link spam...
At the end of the day any SEO that was using boards for recips is a dork...lol... hell, doing them at all can be a waste IMO...
I'm not going to list them, but imagine utilising 3 or 4 of these alongside the fact that you know someone is doing SEO on a specific site anyway; I think it would be pretty easy to make it look way worse than it is.
Hopefully Google are smarter than that. I'm sure they are, but it's always a concern.
For me really, it highlights the fact that we're not generally the search engineers best friend and peeps might just want to be more careful about how they do things...
As always, a very interesting read!
As I said a few times, it's more of a layer, really. If you tried to spam a competitor:
A. You'd have to actually get recips on the site
B. Or find a competitor engaging in it and create posts in their name...
BUT
If you noticed, the number of posts a user has plays a part, so it's unlikely you could simply create an account and drop...
AND
All that would happen in this scenario is devaluation of said links (and potentially other recips on the site)
And of course...
It's a link spam detection LAYER not standalone... this all makes it not only tough to do but also not really worth it for the most part...
I think we're safe from espionage at this point.. more likely they'd simply report yer ass...lol... (if ye had excessive or PAID ones)....
It really seems like no more than an additional tool in the belt...
You should see what is being said in some of the forums!
The search gestapo is after us!
PISSML
Absolutely fantastic article, the search bar was a GAG!
great job!
It's sort of like the curfew idea: if no one is out in a bad neighborhood, there will be less crime.