SEO Dojo - training for search geeks   the Community for search engine optimization geeks

Are the search engines spying on SEOs?
Monday, 10 August 2009 09:37

Finding link spam via search marketing Forums

OK, sure… the title is a bit egregious, but in many ways so is the patent that came out from the folks at Microsoft last week. As I was perusing my feed reader on Thursday I noticed a patent that DEFINITELY caught my attention (and I even laughed a bunch too..);

Forum Mining for Suspicious Link Spam Sites Detection - Microsoft - Filed; Feb. 06 2008 – Assigned; Aug. 06 2009

Enemy sent by Gates

And yes… it is exactly what it seems to be. Why SEO forums? Well….that’s because webmasters use them;

“To conveniently and efficiently exchange link trade information, spammers usually log onto SEO forums to communicate with each other for trading links, including link exchange, link sale, and recommendation link exchange. These forums are increasingly more popular. Spammers post requests for "link exchange", "buy & sell link", and "recommendation exchange" in these forums, along with the URLs of their websites, and other interested spammers may reply the requests and provide the URLs of their websites.”

The first thing that really stuck out was, “Why in the world would they even bother to patent such a system?” – As long time readers of the trail know, we’ve covered a great deal of spam detection systems and so we’re well aware that search engines are always on the hunt for link spam… but to actually patent such a system? Very odd…

I mean seriously, I am sure all the major engines have related programs; so why a patent? Because there are a bunch of us that cover them? Naw… Bill covered the patent and put it well in the comments with;

“I’m not filled with enough hubris to think that they did solely because I might blog about it.”

Either way…. It’s such a fun ride that I thought we’d cover it anyways…

Search Engines; the ultimate forum lurkers

 

Right away they discuss ‘protecting rankings’ and targeting ‘SEO Forums’;

An anti-spam technique for protecting search engine ranking is based on mining search engine optimization (SEO) forums. The anti-spam technique collects webpages such as SEO forum posts from a list of suspect spam websites, and extracts suspicious link exchange URLs and corresponding link formation from the collected webpages.

And move onto the application of penalties for suspect URLs….

A search engine ranking penalty is then applied to the suspicious link exchange URLs. The penalty is at least partially determined by the link information associated with the respective suspicious link exchange URL. To detect more suspicious link exchange URLs, the technique may propagate one or more levels from a seed set of suspicious link exchange URLs generated by mining SEO forums.

That last part is interesting as they move from the forum, to suspect URLs and then analyze the link profiles of those sites to possibly find other reciprocal manipulations. That means if you’re doing recips with a webmaster that is dumb enough to post them on an SEO board you might be penalized by association.
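
To make that propagation step a bit more concrete, here is a minimal sketch. It assumes the link graph is available as a plain dict mapping each site to the set of sites it links to. The function name `propagate_suspects`, the `levels` parameter and the sample domains are all made up for illustration; the filing describes the idea, not this code.

```python
def propagate_suspects(link_graph, seed, levels=1):
    """Expand a seed set of suspect sites along reciprocal links.

    link_graph: dict mapping site -> set of sites it links to (assumed format).
    seed: sites whose URLs were found in forum link-exchange posts.
    levels: how many hops to propagate from the seed set.
    """
    suspects = set(seed)
    frontier = set(seed)
    for _ in range(levels):
        next_frontier = set()
        for site in frontier:
            for target in link_graph.get(site, set()):
                # Only follow links that are returned (a reciprocal pair).
                if site in link_graph.get(target, set()) and target not in suspects:
                    next_frontier.add(target)
        suspects |= next_frontier
        frontier = next_frontier
    return suspects


# Hypothetical link graph: a.example is the site that was posted on a forum.
graph = {
    "a.example": {"b.example", "news.example"},
    "b.example": {"a.example", "c.example"},
    "c.example": {"b.example"},
    "news.example": set(),
}
one_hop = propagate_suspects(graph, {"a.example"}, levels=1)
two_hops = propagate_suspects(graph, {"a.example"}, levels=2)
```

In this toy graph, c.example never touches a forum and never links to a.example, yet it is picked up at the second level through its recip with b.example. One-way links (like the one to news.example) are not followed.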

Reciprocal link detection

The thinking is that it would be easier to mine SEO forums for reciprocal link exchanges than to actively seek out spammy link profiles and analyze the link graphs…

Some of the factors listed for the spam activity scoring are;

  1. the number of posts made by the user who posted the URL
  2. the post time sequence of the user who posted the URL, etc.
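
Here is a rough sketch of how those two factors could be pulled out of a pile of forum posts. The `Post` record and its fields are assumptions for illustration. The filing does not say how the factors are weighted, so this only collects them and does not score anything.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    user: str
    url: str          # URL mentioned in the post
    timestamp: int    # seconds since some epoch (assumed representation)


def posting_features(posts):
    """Return, per URL, the posting users' total post counts and the ordered post times."""
    # Factor 1: how many posts each user has made overall.
    posts_per_user = defaultdict(int)
    for p in posts:
        posts_per_user[p.user] += 1

    features = defaultdict(lambda: {"user_post_counts": {}, "post_times": []})
    for p in posts:
        entry = features[p.url]
        entry["user_post_counts"][p.user] = posts_per_user[p.user]
        # Factor 2: the time sequence of posts mentioning this URL.
        entry["post_times"].append(p.timestamp)
    for entry in features.values():
        entry["post_times"].sort()
    return dict(features)


# Made-up sample data.
sample = [
    Post("alice", "a.example", 300),
    Post("alice", "a.example", 100),
    Post("bob", "a.example", 200),
    Post("alice", "b.example", 400),
]
feats = posting_features(sample)
```

A real system would presumably feed something like `feats` into a scoring step; what that step looks like is not spelled out in the patent.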

 

And the posts can be analyzed for;

  1. "exchange"
  2. "look for+{partner|site|link}"
  3. "reciprocal link"
  4. "{add|submit}+{link|site}"
  5. "backlink"
  6. "three way"
  7. or "link partner"
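
For the curious, the keyword list above translates fairly naturally into regular expressions. This is only one possible reading: treating "+" as "followed later in the post by" is my assumption, as are the plural forms and the "looking for" variant. The names `LINK_TRADE_PATTERNS`, `matched_patterns` and `looks_like_link_trade` are invented for the sketch.

```python
import re

# One possible reading of the patent's keyword list as regular expressions.
# Treating "+" as "followed later in the post by" is an assumption,
# as are the optional plurals and the "looking for" variant.
LINK_TRADE_PATTERNS = [
    r"\bexchange\b",
    r"\blook(?:ing)?\s+for\b.*\b(?:partner|site|link)s?\b",
    r"\breciprocal\s+links?\b",
    r"\b(?:add|submit)\b.*\b(?:link|site)s?\b",
    r"\bbacklinks?\b",
    r"\bthree[\s-]way\b",
    r"\blink\s+partners?\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE | re.DOTALL) for p in LINK_TRADE_PATTERNS]


def matched_patterns(post_text):
    """Return the patterns from LINK_TRADE_PATTERNS that occur in a forum post."""
    return [p.pattern for p in _COMPILED if p.search(post_text)]


def looks_like_link_trade(post_text):
    """True if the post contains at least one link-trade keyword pattern."""
    return bool(matched_patterns(post_text))
```

Calling `looks_like_link_trade("Looking for link partners in the travel niche")` flags the post, while an ordinary question about title tags passes through untouched. Patterns this loose will obviously catch plenty of innocent posts too, which is part of the false-positive worry.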

 

Essentially, a seed set would be used to identify known offenders or suspicious link profiles, and the system would then lurk in various SEO forums looking for activity from those offenders. At that point the link graph can be analyzed and other spammers identified.

 

 

Smacking your competitors

What is VERY problematic right away is that it does sort of open the door to false positives, since someone could use this information to link spam their competitors. There is no discussion of dealing with these problems in the patent, so I am not entirely clear as to how effective this method could be, especially now that they’ve filed a patent and made it public. (tin foil note; maybe it’s corporate espionage and they’re trying to really mess with Google’s PageRank – lol)

Sure they limit it some with application of penalties “on at least some of the suspicious link exchange URLs” and that the penalty would be applied based on “the link information associated with the respective suspicious link exchange URL” (meaning mining the profile for excessive recips) – but there do seem to be some problems with it…
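
One way to picture a penalty that depends on a site's link information is to scale it by how much of the site's outbound linking is reciprocal. This is purely an illustrative assumption: the filing gives no formula, and the function names, the `max_penalty` parameter and the sample graph below are invented.

```python
def reciprocal_share(link_graph, site):
    """Fraction of a site's outbound links that are linked back (0.0 if it has none)."""
    outbound = link_graph.get(site, set())
    if not outbound:
        return 0.0
    returned = {t for t in outbound if site in link_graph.get(t, set())}
    return len(returned) / len(outbound)


def ranking_penalty(link_graph, site, max_penalty=1.0):
    """Illustrative penalty: grows linearly with the reciprocal share, up to max_penalty."""
    return max_penalty * reciprocal_share(link_graph, site)


# Hypothetical link graph: three of a.example's four outbound links are returned.
graph = {
    "a.example": {"b.example", "c.example", "d.example", "news.example"},
    "b.example": {"a.example"},
    "c.example": {"a.example"},
    "d.example": {"a.example"},
    "news.example": set(),
}
```

Under this toy rule a site with a handful of honest back-and-forth links among many one-way links would barely register, which fits the "excessive recips" reading.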

They speak of;

“(…) identifying a thread of user posts on search engine optimization forums contained in the one or more suspect spam websites; and downloading the identified thread of posts.”

And

“(…) analyzing the content of the webpage comprises detecting keywords indicating link exchange, links sale and recommendation exchange.”

Catch that part? The system is looking for not only recips but potential link sales/placement as well. Once more, this type of system is not all that surprising and we’d have to imagine that Google (who truly dislikes paid links) would be using such an approach as well. (not really surprising tho’ for any old forum hound).

 

Web Spam – a brief history

 

As part of the filing they put out a fairly reasonable, but short, history of web spam – which is worth putting here;

“Web spamming techniques have also evolved in time. The first generation spam involved keyword stuffing when ranking was dependent on document similarity. The second generation spam involved link farms when ranking was largely dependent on site popularity. The third generation spam uses mutual link exchange through "mutual admiration societies" when ranking is largely dependent on page reputation. In general, the third-generation Web spamming is harder to detect than the previous generations.

Link spamming techniques, which include buying/selling links, exchanging links, and constructing link farms, are a major category of the commonly used spam techniques. Link spamming refers to the cases where spammers set up structures of interconnected pages in order to boost their rankings in link structure-based ranking system such as PageRank. Since link analysis is a crucial factor for commercial search engines, link spam is among the most popular and harmful techniques for search engines nowadays.”

They then go on to discuss the problems, traditionally, with various link spam detection methods such as TrustRank (better for determining authority), BadRank (better at finding link farms) and SpamRank. Alas they ultimately submit that the “link spam problem has yet to be solved.”…. like we didn’t know already.

SEO forum spying process

 

Getting adversarial

Ok, we already knew that SEOs aren’t criminals, they’re the enemy. That’s nothing new. It is important to note that this is specifically targeted at SEOs and their communities. Furthermore, I highly doubt this is a unique thought and it should serve as a warning to ALL search geeks and webmasters alike. I’ve already told you that reciprocal links are useless and we’ve shown ways search engines detect paid links – so really… if you’re using them, find other ways would you?

Much like we covered with Yahoo’s patent on excessive reciprocal links, they also discuss 3-4 way linking schemes. They are merely using SEO boards as a starting point from which the profiles (link graph) can be further analyzed for other potential linking anomalies (almost a TrustRank type of approach). The penalties don’t seem to be domain wide though; it is more geared towards finding links and removing the value from them.
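
As a sketch of what spotting a three-way scheme might look like, the snippet below hunts for loops of the form a -> b -> c -> a in a link graph and then flags only the links among the loop members, not the sites as a whole. The dict-of-sets graph format, the function names and the sample domains are assumptions; nothing here is taken from the filing itself.

```python
def three_way_exchanges(link_graph):
    """Find triples of sites linking in a loop: a -> b -> c -> a.

    link_graph: dict mapping site -> set of sites it links to (assumed format).
    Each loop is reported once, as a sorted tuple of its three sites.
    """
    loops = set()
    for a, a_targets in link_graph.items():
        for b in a_targets:
            if b == a:
                continue
            for c in link_graph.get(b, set()):
                if c in (a, b):
                    continue
                if a in link_graph.get(c, set()):
                    loops.add(tuple(sorted((a, b, c))))
    return loops


def links_to_devalue(link_graph):
    """All existing links between the members of each three-way loop.

    Only these edges are flagged; other links on the same sites are left alone.
    """
    flagged = set()
    for a, b, c in three_way_exchanges(link_graph):
        for src, dst in ((a, b), (b, c), (c, a), (b, a), (c, b), (a, c)):
            if dst in link_graph.get(src, set()):
                flagged.add((src, dst))
    return flagged


# Hypothetical link graph with one three-way loop and one ordinary link.
graph = {
    "a.example": {"b.example", "news.example"},
    "b.example": {"c.example"},
    "c.example": {"a.example"},
    "news.example": set(),
}
```

Here the link from a.example to news.example survives untouched, which matches the impression that the value is removed from specific links rather than from the whole domain.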

Oh and don’t even bother to be sneaky, they covered that too…

“There are also many "hidden" spammers in these forums. These hidden spammers may behave very cautiously and artfully and do not explicitly post URLs of their own sites. Instead, they may do link-exchanges with the sites whose URLs are explicitly posted by other spammers, all without explicitly posting their own URLs on an SEO forums.”

Thus you can see why they’d want to look at the link graph of the posted URLs. So you may think you’re safe by not posting your URL, by being a spam lurker yourself, but no-dice according to Bada-Bing… (or so the story goes).

Obvious link exchanges

Why I don’t give a rat’s ass

Well that’s simple… I don’t care for reciprocal link programs. Once upon a time it was very tough to get the word out and attract links. These days we can effectively use social media for link building and these schemes aren’t that attractive. One thing that is important in this filing is that it is a layer of link spam detection. If you are innocently linking back and forth you need not fear. This system would be used in concert with other approaches. If you were flagged as potential link spam the engine would look through forum data to see if there is supporting evidence.

That fact is about the only one that gives this approach any form of credence. As a stand alone method it really doesn’t make a lot of sense. There is too great a chance for false positives, and now that it’s public, the potential for people to link spam their competition.
For the record, I still have no idea why they’d bother even patenting the system really…

Something you may want to consider though: if they are mining SEO forums for this type of data, what else might they be doing? Many of the old forum hounds would often warn that “the search engines are watching us”, and this, if it does anything, shows they certainly are… Oh crap.. probably blogs too… Ok, that’s it… I quit!

/end transmission

 

Give it a try -

Just for fun I whipped together a Google Custom Search Engine to search a bunch of popular SEO forums… give it a whirl – see if you can find a few reciprocal exchanges;

 HINT; I suggest searching for something like this (copy and paste below query);

look for AND partner OR site OR link AND exchange

Comments (12)
  • Tom  - :-)
    A seriously kickass article. And much truth to it, too.
  • double spam
    :evil: Time to find all of your competitors links, place ads to "sell" them on forum's and they all get nailed!?
  • Craig Parker
    Good Read, the patent stuff is always fun to know but when it's something like this it's just magic.

    I know you stated it and so did others but it does so feel like they only registered it so we would talk about it, big hole in that plan though.

    Tactical, astute SEO/SEM people read these patent/science blogs, spammers play on those crap forums, do they really think most people who practice out in the open 3 way linking are even going to hear about this let alone understand it?

    I think not.

    As you (and the dude above me) mentioned the biggest problem is negative SEO coming out of this, it's like they make negative SEO easier every month :(
  • Dave
    @DoubleSpam - well it wouldn't be entirely that easy with such a system, as there are implementations of it being used as additional data. Meaning they look for other signs by analyzing the actual website's link graph. In short, you'd also need to hack in to the competitor's site and leave the recips.... sooooo


    Now, offer to buy links, get said links pointing to competitor site... but even then they would appear to only devalue the links... not tank an entire site - so it likely wouldn't work either...

    @Craig well yea, it does seem an odd sys to patent. As for abuse, I'd imagine what I stated above would be the case... that it is but ONE signal and they could simply devalue the links - this would make it a waste of time for the webmaster or the competitor that tried to link spam...

    At the end of the day any SEO that was using boards for recips is a dork...lol... hell, doing them at all can be a waste IMO...
  • Craig Parker
    Yeah accepted this itself is only one signal and unlikely to trigger anything but it's not like it's the only one out there that has looked a little open to abuse.

    I'm not going to list them but imagine utilising 3 or 4 of these alongside the fact you know someone is doing SEO to a specific site anyway I think it would be pretty easy to make it look way worse than it is.

    Hopefully Google are smarter than that, i'm sure they are but it's always a concern.
  • Dave
    Well I tend to believe if we can see ways to fck with them, then they usually do as well. I am sure there are safeguards in place... yea, the potential is always there, but one would hope they've considered them too. As they discussed in this patent, existing methods left them lacking and thus they created this particular 'layer'.

    For me really, it highlights the fact that we're not generally the search engineers best friend and peeps might just want to be more careful about how they do things...
  • Jon Henshaw  - Howto on Fuxoring Your Competitors
    This article and the comments are like a howto on how to screwxor your competitors. It sucks that algos could be so crude that it would be so easy to create a bunch of false positives. I'm just glad this is for Bing and not Google.

    As always, a very interesting read!
  • Dave
    Now now Jon... no myths started here :silly:

    As I said a few times, more of a layer really. If you tried to spam a competitor;

    A. You'd have to actually get recips on the site
    B. Or find a competitor engaging in it and create posts in their name...

    BUT

    If you noticed, the number of posts a user has plays in - so it's unlikely you could simply create an account and drop...

    AND

    All that would happen in this scenario is devaluation of said links (and potentially other recips on the site)

    And of course...

    It's a link spam detection LAYER not standalone... this all makes it not only tough to do but also not really worth it for the most part...

    I think we're safe from espionage at this point.. more likely they'd simply report yer ass...lol... (if ye had excessive or PAID ones)....

    It really seems no more than an additional tool in the belt more than anything...
  • Rhunters  - Owner
    Yeay! another conspiracy theory! Sign me up! it is good when something comes along to break up the boredom!

    You should see what is being said in some of the forums!

    The search gestapo is after us!

    PISSML

    Absolutely fantastic article, the search bar was a GAG!

    great job!
  • Jason Capshaw  - Maybe A Scare Tactic
    Maybe Microsoft released it as a warning, a scare tactic to get SEOs afraid to look for links in these forums.

    Its sort of like the curfew idea, if no one is out in a bad neighborhood, there will be less crime.
  • Nick Stamoulis
    Link exchanges are not a form of building business. Especially when done with the sole purpose of rankings. This is their way to start cleaning things up in Bing.
  • Inder@SeoNext
    Getting to know the patent stuff is always beneficial. Ya, I agree with Jason, seems more or less like a warning from Microsoft to SEOs.