SEO Blog - Internet marketing news and views  

Understanding linking intent; the spam connection

Written by David Harry   
Wednesday, 29 July 2009 08:27

How search engines consider link relevance

A good friend of mine sent me an interesting question a while back, “Dear Dave, can you give me some pointers on how search engines determine link intent?”

Well sheeeeit… it’s hard to say what constitutes ‘intent’ when it comes to various aspects of linking and search engines. It really isn't that straightforward, and it is 'query intent', not link intent, that tends to be the most researched area for search peeps.

And as far as patents/papers go, I doubt there is really much out there specifically on intent beyond papers such as Recognizing Nepotistic Links on the Web or Detecting Nepotistic Links by Language Model Disagreement. There is more in CJ’s post on detecting paid links – there are a whack of papers at the end of that one.

Ultimately though…

It’s all about the spam

Most such valuations (of intent) would happen in the link spam world, as this is where links are evaluated. From that point on, what actually counts as ‘intent’ is defined by the engines themselves. What I mean is that an algorithm can merely look for elements common to those manipulating the index via links; it is up to the engineers to decide what to do with pages/sites above a given threshold.

To get some ideas we can look at this Google patent; Document scoring based on link based criteria ...which has stuff such as;

Authority entities – one way the engines can bypass the need to assess value/intent (and the potential for link spam) is simply to trust authority domains…

“It may be possible for search engine (125) to make exceptions for documents that are determined to be authoritative in some respect, such as government documents, web directories (e.g., Yahoo), and documents that have shown a relatively steady and high rank over time. For example, if an unusual spike in the number or rate of increase of links to an authoritative document occurs, then search engine 125 may consider such a document not to be spam and, thus, allow a relatively high or even no threshold for (growth of) its rank (over time).”

 

Temporal factors – essentially, anomalies in link velocity and decay can be used as a flag for closer inspection – in this case the ‘intent’ being to artificially inflate a link profile.

“A typical, "legitimate" document attracts back links slowly. A large spike in the quantity of back links may signal a topical phenomenon (e.g., the CDC web site may develop many links quickly after an outbreak, such as SARS), or signal attempts to spam a search engine” From; Historical ranking factors or more specifically; Spam detection using temporal factors.

You can also look at this Microsoft patent on temporal link spam detection; Do link spammers leave footprints
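To make the link velocity idea concrete, here is a minimal sketch of spike detection. It is not taken from either patent: the function name, the four-week window and the 5x factor are all assumptions picked for illustration. It compares each week's new backlinks against the median of the weeks just before it.

```python
from statistics import median


def flag_link_spikes(weekly_new_links, window=4, factor=5.0):
    """Return the indexes of weeks whose new-link count is an outlier.

    A week is flagged when its count exceeds `factor` times the median
    of the preceding `window` weeks. Both values are illustrative
    assumptions, not numbers from any patent.
    """
    flagged = []
    for i in range(window, len(weekly_new_links)):
        baseline = median(weekly_new_links[i - window:i])
        # Treat a baseline below 1 as 1 so tiny profiles are not
        # flagged for going from zero links to a handful.
        if weekly_new_links[i] > factor * max(baseline, 1):
            flagged.append(i)
    return flagged


# Slow, steady growth with one sudden burst in week 5.
weekly = [3, 4, 2, 5, 4, 60, 5, 3]
print(flag_link_spikes(weekly))
```

A flag like this only says "look closer". As the patent quote notes, the same spike could be a topical phenomenon rather than spam.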

 

Anchor text anomalies – A document that has a non-natural rate of growth often has spikes of new backlinks with similar/identical link text associated with it. Documents that show such spikes over time can have the links capped or otherwise devalued.

“One reason for such spikiness may be the addition of a large number of identical anchors from many documents. Another possibility may be the addition of deliberately different anchors from a lot of documents.” Also from; the Link builders guide to historical ranking factors.
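As a rough illustration of the "identical anchors" case, the sketch below measures how much of a batch of new backlinks shares a single anchor text. The function name and the sample data are hypothetical, and what share counts as "too high" is something only the engines know.

```python
from collections import Counter


def anchor_concentration(anchors):
    """Return (most common anchor text, its share of all anchors).

    Anchors are compared case-insensitively with surrounding
    whitespace stripped. An empty list returns (None, 0.0).
    """
    if not anchors:
        return None, 0.0
    counts = Counter(a.strip().lower() for a in anchors)
    text, hits = counts.most_common(1)[0]
    return text, hits / len(anchors)


# Hypothetical batch of anchors on newly discovered backlinks.
new_anchors = ["Cheap Widgets"] * 8 + ["Acme Inc", "click here"]
print(anchor_concentration(new_anchors))
```

Note this only covers the first possibility in the quote. Deliberately varied anchors would need a different check.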

 

Document ranking – search engines may also look at historical ranking levels. This additional signal can be used to detect link intent when combined with other factors;

“(…) search engine 125 may monitor the ranks of documents over time to detect sudden spikes in the ranks of the documents. A spike may indicate either a topical phenomenon (e.g., a hot topic) or an attempt to spam search engine 125 by, for example, trading or purchasing links. Search engine 125 may take measures to prevent spam attempts by, for example, employing hysteresis to allow a rank to grow at a certain rate. In another implementation, the rank for a given document may be allowed a certain maximum threshold of growth over a predefined window of time.”
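The "maximum threshold of growth over a predefined window" idea can be sketched as a simple cap. This is a toy model under stated assumptions: the scores, the per-period windows and the `max_growth` value are all made up, and the patent does not say how any real engine implements it.

```python
def cap_rank_growth(raw_scores, max_growth=0.25):
    """Limit how fast a score may rise from one period to the next.

    `raw_scores` holds the uncapped score for each period. Each capped
    score may exceed the previous capped score by at most `max_growth`
    (0.25 means 25%). Drops are passed through unchanged.
    """
    if not raw_scores:
        return []
    capped = [raw_scores[0]]
    for raw in raw_scores[1:]:
        ceiling = capped[-1] * (1 + max_growth)
        capped.append(min(raw, ceiling))
    return capped


# A score that quadruples overnight is only allowed to climb gradually.
print(cap_rank_growth([10.0, 11.0, 40.0, 40.0], max_growth=0.5))
```

With a cap like this a bought-link spike still helps, just slowly, which gives the other signals time to catch up with it.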

They also discuss doorway domains and name server freshness… that could be used to assess legitimacy of the inbound link… and ultimately, intent (in concurrence with other signals).

And that’s just a few ways they can look for anomalies. What is important is that there is no singular approach to assessing the intent of a given link per se. It is more about looking at median scoring of a variety of factors, which can either trigger an algorithmic devaluation or raise a flag for closer (human) inspection. From there, obviously, human judgement becomes involved (see the Google quality rater document that came out a while back).
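Here is a minimal sketch of that "variety of factors plus thresholds" idea. Everything in it is an assumption for illustration: the signal names, the 0-to-1 scale, the two threshold values, and the choice that the higher threshold triggers the algorithmic devaluation while the lower one sends the page to a human.

```python
from statistics import median


def triage(signals, review_at=0.5, devalue_at=0.8):
    """Combine per-signal spam scores (0.0 to 1.0) into one decision.

    Takes the median of the scores so a single noisy signal cannot
    decide the outcome on its own. Threshold values are illustrative.
    """
    if not signals:
        return "ok"
    score = median(signals.values())
    if score >= devalue_at:
        return "devalue"
    if score >= review_at:
        return "human_review"
    return "ok"


# Hypothetical scores for one page's link profile.
profile = {"velocity": 0.9, "anchor_text": 0.6, "reciprocity": 0.1}
print(triage(profile))
```

The point of the median is that one high score (say, a velocity spike from a genuine news event) is not enough on its own. Several factors have to line up.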

Another interesting Google patent is; Method for detecting link spam in hyperlinked databases

 

What else?

 … some random thoughts…

 

Document relevance – this can also be used to assess the intent of links in a profile, as those above a given threshold can often mean there is hanky panky afoot. This speaks to intent as it can show the intent to manipulate.

 

Excessive reciprocals – another area is recips, or even 2-, 3- and 4-way links. If they establish that such link building approaches are in play, this can also trigger a closer look. By defining a ratio and set of thresholds, pages with a high level of reciprocation can be identified. Some recent stuff I covered from Yahoo sheds light on that end, which is based on the patent; Identifying excessively reciprocal links among web entities
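The "ratio" part can be sketched in a few lines. This is an illustrative take on simple two-way reciprocity only, not the method from the Yahoo patent. The graph format (site name mapped to the set of sites it links to) and the function name are assumptions.

```python
def reciprocity_ratio(site, outlinks):
    """Share of a site's outbound links that are linked back.

    `outlinks` maps each site to the set of sites it links to.
    Returns 0.0 for a site with no outbound links.
    """
    targets = outlinks.get(site, set())
    if not targets:
        return 0.0
    reciprocated = sum(
        1 for target in targets if site in outlinks.get(target, set())
    )
    return reciprocated / len(targets)


# Hypothetical link graph: three of a's four outlinks are returned.
graph = {
    "a": {"b", "c", "d", "e"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a"},
    "e": set(),
}
print(reciprocity_ratio("a", graph))
```

Catching 3-way and 4-way schemes would mean looking for short cycles in the graph rather than direct link-backs, which this sketch does not attempt.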

 

TrustRank/Harmonic Rank – we can infer that these concepts can also be used to identify web spam and thus linking intent.

“(…to) demote those hits whose effective mass renders them likely to be artificially boosted by link-based spam. The determination of the effective mass for a given web document relies on a combination of techniques that in part assess the discrepancy between the link-based popularity (e.g., PageRank) and the trustworthiness (e.g., TrustRank) of a given web document.”

From; Link-based spam detection - More on Harmonic rank in; Yahoo’s Harmonic Rank
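One simple way to picture the "discrepancy" between popularity and trust is as the fraction of a page's popularity that is not backed by trust. To be clear, this is an assumed formula for illustration and not the effective mass calculation from the patent. It also assumes both scores are on the same scale.

```python
def untrusted_share(popularity, trust):
    """Fraction of a page's link popularity not backed by trust.

    `popularity` stands in for a PageRank-style score and `trust` for
    a TrustRank-style score on the same scale (an assumption). The
    result is clamped to the range 0.0 to 1.0.
    """
    if popularity <= 0:
        return 0.0
    share = (popularity - trust) / popularity
    return min(1.0, max(0.0, share))


# Hypothetical page: lots of link popularity, little of it trusted.
print(untrusted_share(popularity=8.0, trust=2.0))
```

A page scoring close to 1.0 is popular mostly thanks to links from sources the engine does not trust, which is the kind of hit the quote talks about demoting.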

Host level spam detection – this is another area somewhat related to Trust/Harmonic Rank type approaches. Once more, there is an inherent ‘intent’ that they are trying to establish, this time based on where a site/page lives, as well as touching on TrustRank type concepts.

 

Page Segmentation – in this instance the links can be assessed by location. Let’s say there are some links in the sidebar/footer that have the text ‘Advertisers’ or ‘Sponsors’ etc… this wouldn’t be too difficult for an algorithm to detect and report back as potential manipulations… thus the ‘intent’ being to game the engines with paid link spam…  This method can also be used in context with other approaches mentioned already…

For more see; Page segmentation and link building and the SEO implications of page segmentation
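The sidebar/footer example is easy to sketch once a page has already been split into blocks. The segmentation itself is the hard part and is assumed to have happened upstream. The block format, region names and hint words below are all hypothetical.

```python
PAID_HINTS = ("advertiser", "sponsor")
BOILERPLATE_REGIONS = {"sidebar", "footer"}


def flag_paid_link_blocks(blocks):
    """Return links sitting in sidebar/footer blocks with paid-looking headings.

    Each block is a dict with 'region', 'heading' and 'links' keys,
    a made-up format standing in for real page segmentation output.
    """
    flagged = []
    for block in blocks:
        heading = block["heading"].lower()
        in_boilerplate = block["region"] in BOILERPLATE_REGIONS
        looks_paid = any(hint in heading for hint in PAID_HINTS)
        if in_boilerplate and looks_paid:
            flagged.extend(block["links"])
    return flagged


# Hypothetical segmented page.
page = [
    {"region": "content", "heading": "Our sponsors explained", "links": ["x.example"]},
    {"region": "sidebar", "heading": "Sponsors", "links": ["y.example", "z.example"]},
    {"region": "footer", "heading": "About", "links": ["about.example"]},
]
print(flag_paid_link_blocks(page))
```

Links in the main content are left alone even when the heading mentions sponsors, since location is the signal being illustrated here.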

 

And that's about what I have on the technical front… and if there ain’t enough here… you could always have a look at the AIRWeb proceedings. Here’s some papers from the 2009 edition;

 

The human element

Once more, assessing intent is often not so much an algorithmic activity as a human one. For starters, humans set the training documents and thresholds for link spam; secondly, humans review ‘complaints’ from the public. I can say that the recent case I was on about over at Greenpeace most certainly did have some ‘human’ interactions that changed the approach taken.

As you may notice, the original campaign was altered after the post was written… so there was a human element on my part as well as the engines/authors. Intent often comes down to what I call ‘plausible deniability’. This means you can’t state that links are in any way a payment or trade for anything of value. Sure, it can be implied, but one can’t overtly state it… ya know?

At the end of the day, intent will be more of a subjective assessment, whether made algorithmically, via human assessment or a combination of both. By programming thresholds the search engines can assign intent or send it up the line for human evaluation. There is no real ‘intent algorithm’ so to speak that I know of… merely ways of identifying link spam, which is the embodiment of malicious intent on the part of the webmaster/optimizer.

I know these may not be ‘clear cut’ statements of how search engines look at ‘link intent’, but they really aren’t as concerned with ‘intent’ outside of the spam dept. By utilizing some of the methods mentioned above, in accordance with subjective judgements (TOS and guidelines), they assign intent accordingly…


For all those that were ever curious... I hope that helped :0)

 

 

Comments  

 
+1 # Wiep 2009-07-29 08:41
Great stuff, Dave! Interesting question, btw :P
 
 
0 # Dave 2009-07-29 08:54
hehe.... yea, can I write long-ass responses or what mate? Time for me to start that SOSG support group!!
 
 
0 # David Leonhardt 2009-07-30 08:05
This post should be required reading for those people who want x number of links of x PageRank in x number of months and then stop building the links. The search engines just are not that stupid.
 
 
0 # Gideon Rubin 2009-07-31 10:48
Good article Dave. I like the way you gave sources and stuck with the facts vs your assumptions.
 
 
0 # Gabriella 2009-07-31 10:50
Well the word is in. Intent has become the latest word I have been curious about. My interest would lie in the combination of both. The human assessment and the subjective. I wonder how that will play out.

Do you think Bing is going to add their own take on any of this "do's & dont's" that Google is loved/hated for? Or are you still of the mindset that Bing isn't in the same ball park... yet?

As usual Dave you are very thorough, informational, and yes opinionated. But that's what we respect about you. :whistle:
 
 
0 # Dave 2009-07-31 11:16
Hi David....hows things? That's actually an important area for sure - it is odd that we don't see much written about link velocity out there. I am always telling peeps that the battle is ongoing and one needs to be constantly building links as old ones decay....
 
 
0 # Dave 2009-07-31 12:04
Gabs - well to be honest I am a big fan of the work Bing does in IR research and they are certainly a force to be reckoned with.

Furthermore, they are doing a great job with the webmaster blog addressing SEO issues - Google gets to play cop because they have the market share, if they ever lose some, maybe they'll play nicer 2...lol
 
 
+1 # JR 2009-08-02 13:12
Excellent post and research, thanks!

I'm all for Bing as well, and I hope that they even out the playing field (though we all know it's a long shot), monopolies are never good, and it sucks that the big G can wipe us out in one fell swoop!
 
 
0 # Victor Bargoff 2009-08-03 23:10
I wonder if anybody has an opinion on Google SERPs
manipulation.

See:
http://tinyurl.com/lzcyej

I noticed recently, the site: search number of articles
indexed jumped by up to 1000 times on my sites.

It started steadily declining since the beginning of May,
2009. Doing a daily check on a number of articles
indexed, all of a sudden, I noticed index jumping to
almost 1000 times. The more I tested various things,
the more I saw a consistency of behaviour.
 
 
0 # dom 2009-08-04 00:24
Great post, I only wish i had it a month ago. I had a very important blog go from google page 1 to page 17 overnight.

I am now more sure why this happened.. Competition is wicked out there. I did a check of links on my blog and found out I had over 5000 backlinks on Yahoo, and over 2000 on Google all in a 30 day period. I believe my competition did a spam link building campaign on me, and made my blog look like it was link spamming.

In essence it was link spam, just not done by me. What can I do to fix this? Help anyone, do I go through the back link list and delete spam links?

thx/guys
 
 
0 # purposeinc - dk 2009-08-04 02:37
Followed over here from twitter from aussie webmaster. Very nicely written. Could not have said it better. In other words, do what matt says. :-) :P
 
 
0 # Radek Karban 2009-08-04 07:23
Interesting article, and I'd add one more point that has a huge impact on discovering advertisement backlinks.
Every link should be placed for a longer time period, so don't use random short-lived links; they're useless. Also, don't add links to articles several months later; do it when the article is published, because search engines compare content over time, and links that appear in it later look like strange behavior.
 
 
0 # Nishan Khednah 2009-08-04 08:47
Great article dude. Informative and easy to read and take in. Thanks
 
 
0 # Leadgenix 2009-08-04 11:24
This is incredibly helpful information! There's been a lot written about links but this is really in a league of its own, thanks for the great tips!
 
 
0 # Arnie 2009-08-04 11:36
totally agree with David L..."This post should be required reading for those people who want x number of links of x PageRank in x number of months and then stop building the links. The search engines just are not that stupid."

It's still a dumb strategy even if you continue for years.
 
 
0 # Dave 2009-08-04 11:36
@dom - it is REALLY hard to say what the problem may be to be honest. Generally speaking it is VERY hard to link spam a competitor. There are usually other factors involved when such a loss occurs. Feel free to get in touch and I can hopefully find some time to look into it deeper.

@purposeinc yea, Frank is a good guy and I appreciate the pimperatti action!! And hey, Matt is a pretty good dude as well.. He'd know a thing or ten about spam so sure, worth listening to. As would many others in the space. Try these vids on for size; http://www.huomah.com/Search-Engines/Search-Engine-Optimization/Listening-to-the-Spam-Assassins.html

@Radek well U make some interesting points, but I wouldn't be TOO worried about adding links to older content. What's important to understand is that a COMBINATION of items is what can get you profiled as potential spam. Consider this a best practices post; one wants to avoid satisfying too many of the above elements. In moderation, you should be safe (theoretically)

@Nishan - thanks my fellow search warrior - glad U enjoyed
 
 
0 # Terry Van Horne 2009-08-08 12:56
Dave, great piece. I think you covered pretty well especially the fact that these things start with a filter and end with a human making a decision.
@dom we've done some research in another group on negative SEO linking and it is pretty tough to do that. If you weren't buying links I'd say it's pretty hard to impossible to do what you were thinking was done. If you had bought links even then "it's not a given" it depends.. sorry... sworn to keep some stuff in the room so I can't elaborate.
 
 
0 # Dave 2009-08-08 13:07
Hey Terry... good to see U as always! I think it is a VERY important distinction for folks to understand that 'turning the dials' is a very human element when it comes to valuations and many folks forget that. Heck, even the algos are written by humans... When they test precision and recall in IR it is always human input which leads to how the algos are ultimately used.

As for link spamming competitors, I'd also agree that it would be next to impossible to implement as many of the spam approaches seem to be tied together...meaning that in isolation the chance of getting spanked is slim. They tend to 'flag' things for closer inspection many times and one would need to satisfy more than one element to be deemed spam.

REALLY funny that you dropped in when U did as I was researching a post I am working on today and found a convo we had in the comments here on the trail back in Jan. - then poof... here U were...lol

Have a great weekend my fellow Fire Horse!!
 
 
-2 # ccc 2009-09-01 06:48
wedding invitations (www.vponsale.com/invitations/)
wedding invitation (www.vponsale.com/invitations/)
 
