SEO Blog - Internet marketing news and views  

Are the search engines spying on SEOs?

Written by David Harry   
Monday, 10 August 2009 09:37

Finding link spam via search marketing Forums

OK, sure… the title is a bit egregious, but in many ways so is the patent that came out from the folks at Microsoft last week. As I was perusing my feed reader on Thursday I noticed a patent that DEFINITELY caught my attention (and I even laughed a bunch too);

Forum Mining for Suspicious Link Spam Sites Detection - Microsoft - Filed; Feb. 06 2008 – Assigned; Aug. 06 2009

Enemy sent by Gates

And yes… it is exactly what it seems to be. Why SEO forums? Well….that’s because webmasters use them;

“To conveniently and efficiently exchange link trade information, spammers usually log onto SEO forums to communicate with each other for trading links, including link exchange, link sale, and recommendation link exchange. These forums are increasingly more popular. Spammers post requests for "link exchange", "buy & sell link", and "recommendation exchange" in these forums, along with the URLs of their websites, and other interested spammers may reply the requests and provide the URLs of their websites.”

The first thing that really stuck out was, “Why in the world would they even bother to patent such a system?” – As long time readers of the trail know, we’ve covered a great deal of spam detection systems and so we’re well aware that search engines are always on the hunt for link spam… but to actually patent such a system? Very odd…

I mean seriously, I am sure all the major engines have related programs; so why a patent? Because there are a bunch of us that cover them? Naw… Bill covered the patent and put it well in the comments with;

“I’m not filled with enough hubris to think that they did solely because I might blog about it.”

Either way…. It’s such a fun ride that I thought we’d cover it anyways…

Search Engines; the ultimate forum lurkers

 

Right away they discuss ‘protecting rankings’ and targeting ‘SEO Forums’;

“An anti-spam technique for protecting search engine ranking is based on mining search engine optimization (SEO) forums. The anti-spam technique collects webpages such as SEO forum posts from a list of suspect spam websites, and extracts suspicious link exchange URLs and corresponding link formation from the collected webpages.”

And move onto the application of penalties for suspect URLs….

“A search engine ranking penalty is then applied to the suspicious link exchange URLs. The penalty is at least partially determined by the link information associated with the respective suspicious link exchange URL. To detect more suspicious link exchange URLs, the technique may propagate one or more levels from a seed set of suspicious link exchange URLs generated by mining SEO forums.”

That last part is interesting as they move from the forum, to suspect URLs and then analyze the link profiles of those sites to possibly find other reciprocal manipulations. That means if you’re doing recips with a webmaster that is dumb enough to post them on an SEO board you might be penalized by association.

Reciprocal link detection

The thinking is that it would be easier to mine SEO forums for reciprocal link exchanges than to actively seek out spammy link profiles and analyze the link graphs…

Some of the factors listed for the spam activity scoring are;

  1. the number of posts by the user who posted the URL
  2. the post time sequence of the user who posted the URL, etc.

 

And the posts can be analyzed for;

  1. "exchange"
  2. "look for+{partner|site|link}"
  3. "reciprocal link"
  4. "{add|submit}+{link|site}"
  5. "backlink"
  6. "three way"
  7. or "link partner"
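
To make that keyword list a bit more concrete, here is a rough Python sketch of how such patterns might be matched against a single forum post. The regexes are only my reading of the list above (I am assuming "+" means "followed by" and "{a|b}" means "a or b"), and the names LINK_TRADE_PATTERNS, URL_PATTERN and find_link_trade_signals are made up for illustration; none of this comes from the filing itself.

```python
import re

# Hypothetical regex versions of the keyword patterns listed above.
# Assumption: "+" means "followed by" and "{a|b}" means "a or b".
LINK_TRADE_PATTERNS = {
    "exchange": r"\bexchange\b",
    "look for + partner/site/link": r"\blook(?:ing)?\s+for\b.*?\b(?:partner|site|link)s?\b",
    "reciprocal link": r"\breciprocal\s+links?\b",
    "add/submit + link/site": r"\b(?:add|submit)\b.*?\b(?:link|site)s?\b",
    "backlink": r"\bbacklinks?\b",
    "three way": r"\bthree[\s-]way\b",
    "link partner": r"\blink\s+partners?\b",
}

# Deliberately simple URL matcher, good enough for a sketch.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def find_link_trade_signals(post_text):
    """Return (names of matched patterns, URLs found) for one forum post."""
    text = post_text.lower()
    matched = [
        name
        for name, pattern in LINK_TRADE_PATTERNS.items()
        if re.search(pattern, text)
    ]
    urls = URL_PATTERN.findall(post_text)
    return matched, urls


if __name__ == "__main__":
    post = "Looking for link partners! Reciprocal link exchange wanted: http://example.com/links"
    signals, urls = find_link_trade_signals(post)
    print(signals)
    print(urls)
```

A post that trips one or more patterns and also carries a URL would be a candidate for the seed set; how the hits are actually weighted is not something I can tell you from the filing.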

 

Essentially, a seed set would be used to identify known offenders or suspicious link profiles, and the system would then lurk on various SEO forums looking for activity from those offenders. At that point the link graph can be analyzed and other spammers identified.
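
The "propagate one or more levels from a seed set" idea could look something like the sketch below. It is a toy under my own assumptions: the link graph is a plain dict mapping each site to the set of sites it links to, "suspicious" simply means "reciprocally linked with something already suspect", and the function names are hypothetical. The real system presumably weighs far more than that.

```python
from collections import deque


def reciprocal_partners(site, link_graph):
    """Sites that `site` links to and that also link back to `site`."""
    outlinks = link_graph.get(site, set())
    return {other for other in outlinks if site in link_graph.get(other, set())}


def propagate_suspects(seed_sites, link_graph, max_levels=1):
    """Breadth-first walk over reciprocal links, starting from the seed set.

    Returns a dict mapping each suspect site to the level at which it was
    reached (0 = URL posted on a forum, 1 = reciprocal partner of a seed, ...).
    """
    levels = {site: 0 for site in seed_sites}
    queue = deque(seed_sites)
    while queue:
        site = queue.popleft()
        if levels[site] >= max_levels:
            continue  # do not expand beyond the requested number of levels
        for partner in reciprocal_partners(site, link_graph):
            if partner not in levels:
                levels[partner] = levels[site] + 1
                queue.append(partner)
    return levels


if __name__ == "__main__":
    # Toy link graph: a <-> b and b <-> d are reciprocal, a -> c is one-way.
    link_graph = {
        "a.example": {"b.example", "c.example"},
        "b.example": {"a.example", "d.example"},
        "c.example": set(),
        "d.example": {"b.example"},
    }
    print(propagate_suspects(["a.example"], link_graph, max_levels=2))
```

Note that d.example never shows up on a forum in this toy graph, yet it still gets picked up two levels out from the seed, which is exactly the "penalized by association" scenario.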

 

 

Smacking your competitors

What is VERY problematic right away is that it does sort of open the door to some false-positives in the form of those that might use this information to engage in link spamming their competitors.  There is no discussion of dealing with these problems in the patent so I am not entirely clear as to how effective this method could be, especially after they filed a patent and made it public? (tin foil note; maybe it’s corporate espionage and they’re trying to really mess with Google’s PageRank – lol)

Sure they limit it some with application of penalties “on at least some of the suspicious link exchange URLs” and that the penalty would be applied based on “the link information associated with the respective suspicious link exchange URL” (meaning mining the profile for excessive recips) – but there do seem to be some problems with it…

They speak of;

“(…) identifying a thread of user posts on search engine optimization forums contained in the one or more suspect spam websites; and downloading the identified thread of posts.”

And

“(…) analyzing the content of the webpage comprises detecting keywords indicating link exchange, links sale and recommendation exchange.”

Catch that part? The system is looking for not only recips but potential link sales/placement as well. Once more, this type of system is not all that surprising (at least not to any old forum hound), and we’d have to imagine that Google (who truly dislikes paid links) would be using such an approach as well.

 

Web Spam – a brief history

 

As part of the filing they put out a fairly reasonable, but short, history of web spam – which is worth putting here;

“Web spamming techniques have also evolved in time. The first generation spam involved keyword stuffing when ranking was dependent on document similarity. The second generation spam involved link farms when ranking was largely dependent on site popularity. The third generation spam uses mutual link exchange through "mutual admiration societies" when ranking is largely dependent on page reputation. In general, the third-generation Web spamming is harder to detect than the previous generations.

Link spamming techniques, which include buying/selling links, exchanging links, and constructing link farms, are a major category of the commonly used spam techniques. Link spamming refers to the cases where spammers set up structures of interconnected pages in order to boost their rankings in link structure-based ranking system such as PageRank. Since link analysis is a crucial factor for commercial search engines, link spam is among the most popular and harmful techniques for search engines nowadays.”

They then go on to discuss the traditional problems with various link spam detection methods such as TrustRank (better for determining authority), BadRank (better at finding link farms) and SpamRank. Alas, they ultimately submit that the “link spam problem has yet to be solved”… like we didn’t know already.

SEO forum spying process

 

Getting adversarial

Ok, we already knew that SEOs aren’t criminals, they’re the enemy. That’s nothing new. It is important to note that this is specifically targeted at SEOs and their communities. Furthermore, I highly doubt this is a unique thought and it should serve as a warning to ALL search geeks and webmasters alike. I’ve already told you that reciprocal links are useless and we’ve shown ways search engines detect paid links – so really… if you’re using them, find other ways would you?

Much like we covered with Yahoo’s patent on excessive reciprocal links, they also discuss 3-4 way linking schemes. They are merely using SEO boards as a starting point from which the profiles (link graph) can be further analyzed for other potential linking anomalies (almost a TrustRank type of approach). The penalties don’t seem to be domain wide though; it is more geared towards finding links and removing the value from them.

Oh and don’t even bother to be sneaky, they covered that too…

“There are also many "hidden" spammers in these forums. These hidden spammers may behave very cautiously and artfully and do not explicitly post URLs of their own sites. Instead, they may do link-exchanges with the sites whose URLs are explicitly posted by other spammers, all without explicitly posting their own URLs on an SEO forums.”

Thus you can see why they’d want to look at the link graph of the posted URLs. So you may think you’re safe by not posting your URL, by being a spam lurker yourself, but no-dice according to Bada-Bing… (or so the story goes).

Obvious link exchanges

Why I don’t give a rat’s ass

Well that’s simple… I don’t care for reciprocal link programs. Once upon a time it was very tough to get the word out and attract links. These days we can effectively use social media for link building and these schemes aren’t that attractive. One thing that is important in this filing is that it is a layer of link spam detection. If you are innocently linking back and forth you need not fear. This system would be used in concert with other approaches. If you were flagged as potential link spam the engine would look through forum data to see if there is supporting evidence.

That fact is about the only one that gives this approach any form of credence. As a stand alone method it really doesn’t make a lot of sense. There is too great a chance for false positives, and now that it’s public, the potential for people to link spam their competition.
For the record, I still have no idea why they’d bother even patenting the system really…
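
For the code-minded, the "layer" idea might be sketched as below. This is purely my interpretation and not anything spelled out in the filing: I am assuming a link only loses value when some other link-graph check has already flagged one of its endpoints AND the forum mining backs that up, and that "losing value" means a weight of 0.0 instead of 1.0. The function names and both weights are hypothetical.

```python
def should_devalue(link, graph_flagged_sites, forum_suspect_sites):
    """Decide whether a single (source, target) link should lose its value.

    Assumption: forum data is only supporting evidence, so both layers
    must agree before anything happens to the link.
    """
    source, target = link
    flagged_by_graph = source in graph_flagged_sites or target in graph_flagged_sites
    backed_by_forum = source in forum_suspect_sites or target in forum_suspect_sites
    return flagged_by_graph and backed_by_forum


def devalue_links(links, graph_flagged_sites, forum_suspect_sites):
    """Return a per-link weight: 0.0 for devalued links, 1.0 otherwise."""
    return {
        link: 0.0 if should_devalue(link, graph_flagged_sites, forum_suspect_sites) else 1.0
        for link in links
    }


if __name__ == "__main__":
    links = [("a.example", "b.example"), ("c.example", "d.example")]
    graph_flagged = {"a.example", "c.example"}   # flagged by some other detection layer
    forum_suspects = {"b.example"}               # URL turned up in forum mining
    print(devalue_links(links, graph_flagged, forum_suspects))
```

In this toy run only the a/b link is devalued; c.example was flagged by the other layer but nothing on the forums supports it, so its link keeps full weight.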

Something you may want to consider, though: if they are mining SEO forums for this type of data, what else might they be doing? Many of the old forum hounds would often warn that “the search engines are watching us” and this, if it does anything, shows they certainly are… Oh crap… probably blogs too… Ok, that’s it… I quit!

/end transmission

 

Give it a try -

Just for fun I whipped together a Google Custom Search Engine to search a bunch of popular SEO forums… give it a whirl – see if you can find a few reciprocal exchanges;

 HINT; I suggest searching for something like this (copy and paste below query);

look for AND partner OR site OR link AND exchange

 

Comments  

 
+2 # Tom 2009-08-10 11:35
A seriously kickass article. And much truth to it, too.
 
+1 # double spam 2009-08-10 12:05
:evil: Time to find all of your competitors' links, place ads to "sell" them on forums and they all get nailed!?
 
0 # Craig Parker 2009-08-10 16:43
Good Read, the patent stuff is always fun to know but when it's something like this it's just magic.

I know you stated it and so did others but it does so feel like they only registered it so we would talk about it, big hole in that plan though.

Tactical, astute SEO/SEM people read these patent/science blogs, spammers play on those crap forums, do they really think most people who practice out in the open 3 way linking are even going to hear about this let alone understand it?

I think not.

As you (and the dude above me) mentioned the biggest problem is negative SEO coming out of this, it's like they make negative SEO easier every month :sad:
 
0 # Dave 2009-08-10 19:58
@DoubleSpam - well it wouldn't be entirely that easy with such a system, as there are implementations of it being used as additional data. Meaning they look for other signs by analyzing the actual website's link graph. In short, you'd also need to hack into the competitor's site and leave the recips.... sooooo


Now, offer to buy links, get said links pointing to competitor site... but even then they would appear to only devalue the links... not tank an entire site - so it likely wouldn't work either...

@Craig well yea, it does seem an odd sys to patent. As for abuse, I'd imagine what I stated above would be the case... that it is but ONE signal and they could simply devalue the links - this would make it a waste of time for the webmaster or the competitor that tried to link spam...

At the end of the day any SEO that was using boards for recips is a dork...lol... hell, doing them at all can be a waste IMO...
 
0 # Craig Parker 2009-08-11 03:40
Yeah accepted this itself is only one signal and unlikely to trigger anything but it's not like it's the only one out there that has looked a little open to abuse.

I'm not going to list them but imagine utilising 3 or 4 of these alongside the fact you know someone is doing SEO to a specific site anyway I think it would be pretty easy to make it look way worse than it is.

Hopefully Google are smarter than that, i'm sure they are but it's always a concern.
 
0 # Dave 2009-08-11 10:07
Well I tend to believe if we can see ways to fck with them, then they usually do as well. I am sure there are safeguards in place... yea, the potential is always there, but one would hope they've considered them too. As they discussed in this patent, existing methods left them lacking and thus they created this particular 'layer'.

For me really, it highlights the fact that we're not generally the search engineers best friend and peeps might just want to be more careful about how they do things...
 
0 # Jon Henshaw 2009-08-12 14:18
This article and the comments are like a howto on how to screwxor your competitors. It sucks that algos could be so crude that it would be so easy to create a bunch of false positives. I'm just glad this is for Bing and not Google.

As always, a very interesting read!
 
0 # Dave 2009-08-12 14:58
Now now Jon... no myths started here :silly:

As I said a few times, more of a layer really. If you tried to spam a competitor;

A. You'd have to actually get recips on the site
B. Or find a competitor engaging in it and create posts in their name...

BUT

If you noticed, the number of posts a user has plays in - so it's unlikely you could simply create an account and drop...

AND

All that would happen in this scenario is devaluation of said links (and potentially other recips on the site)

And of course...

It's a link spam detection LAYER not standalone... this all makes it not only tough to do but also not really worth it for the most part...

I think we're safe from espionage at this point.. more likely they'd simply report yer ass...lol... (if ye had excessive or PAID ones)....

It really seems no more than an additional tool in the belt more than anything...
 
0 # Rhunters 2009-08-12 19:52
Yeay! another conspiracy theory! Sign me up! it is good when something comes along to break up the boredom!

You should see what is being said in some of the forums!

The search gestapo is after us!

PISSML

Absolutely fantastic article, the search bar was a GAG!

great job!
 
0 # Jason Capshaw 2009-08-13 13:40
Maybe Microsoft released it as a warning, a scare tactic to get SEOs afraid to look for links in these forums.

Its sort of like the curfew idea, if no one is out in a bad neighborhood, there will be less crime.
 
0 # Nick Stamoulis 2009-08-14 13:35
Link exchanges are not a form of building business. Especially when done with the sole purpose of rankings. This is their way to start cleaning things up in Bing.
 
0 # Inder@SeoNext 2009-08-17 05:32
Getting to know the patent stuff is always beneficial. Ya, I agree with Jason, seems more or less like a warning from Microsoft to SEOs.