SEO Blog - Internet marketing news and views  

Understanding linking intent: the spam connection

Written by David Harry   
Wednesday, 29 July 2009 08:27

How search engines consider link relevance

A good friend of mine sent me an interesting question a while back, “Dear Dave, can you give me some pointers on how search engines determine link intent?”

Well sheeeeit… it’s hard to say what constitutes ‘intent’ when it comes to various aspects of linking and search engines. It really isn’t that straightforward, and understanding ‘query intent’ tends to be the most researched area for search peeps.

And as far as patents/papers go, I doubt there is much out there specifically on intent beyond a few papers such as Recognizing Nepotistic Links on the Web or Detecting Nepotistic Links by Language Model Disagreement, and there is more in CJ’s post on detecting paid links – there are a whack of papers at the end of that one.

Ultimately though…

It’s all about the spam

Most such valuations (of intent) would happen in the link spam world, as this is where links are evaluated. From that point on, what actually constitutes ‘intent’ is defined by the engines themselves. What I mean is that an algorithm can merely look for elements common to those manipulating the index via links; it is up to the engineers to decide what to do with pages/sites above a given threshold.

To get some ideas we can look at this Google patent, Document scoring based on link based criteria, which has stuff such as:

Authority entities – one way to bypass the need to assess value/intent and the potential for link spam is to trust authority domains…

“It may be possible for search engine (125) to make exceptions for documents that are determined to be authoritative in some respect, such as government documents, web directories (e.g., Yahoo), and documents that have shown a relatively steady and high rank over time. For example, if an unusual spike in the number or rate of increase of links to an authoritative document occurs, then search engine 125 may consider such a document not to be spam and, thus, allow a relatively high or even no threshold for (growth of) its rank (over time).”


Temporal factors – essentially, link velocity and decay anomalies can be used as a flag for closer inspection – in this case the ‘intent’ being to artificially inflate a link profile.

“A typical, "legitimate" document attracts back links slowly. A large spike in the quantity of back links may signal a topical phenomenon (e.g., the CDC web site may develop many links quickly after an outbreak, such as SARS), or signal attempts to spam a search engine.” From: Historical ranking factors, or more specifically, Spam detection using temporal factors.

You can also look at this Microsoft patent on temporal link spam detection: Do link spammers leave footprints
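To make the temporal idea concrete, here is a minimal sketch of spike detection on daily new-backlink counts. This is a hypothetical illustration, not the patent’s method: the function name, the z-score approach, and the threshold are all assumptions, and a real system would also model decay, seasonality, and topical events like a news outbreak.

```python
from statistics import mean, stdev

def flag_link_velocity_spike(daily_new_links, z_threshold=2.5):
    """Flag days whose new-backlink count sits far above the
    historical norm (a crude stand-in for 'link velocity anomaly')."""
    if len(daily_new_links) < 2:
        return []
    mu = mean(daily_new_links)
    sigma = stdev(daily_new_links)
    if sigma == 0:
        return []
    return [i for i, n in enumerate(daily_new_links)
            if (n - mu) / sigma > z_threshold]

# A steady profile with one sudden burst of new links on day 6
history = [3, 4, 2, 5, 3, 4, 120, 5, 3]
print(flag_link_velocity_spike(history))  # [6] – the burst day is flagged
```

A flagged day would not be ‘spam’ on its own; it is merely the trigger for the closer inspection the patent describes.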


Anchor text anomalies – a document with an unnatural rate of growth often shows spikes of new backlinks with similar/identical link text. Documents that show such spikes over time can have those links capped or otherwise devalued.

“One reason for such spikiness may be the addition of a large number of identical anchors from many documents. Another possibility may be the addition of deliberately different anchors from a lot of documents.” Also from: the Link builder’s guide to historical ranking factors.
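A trivial proxy for that ‘identical anchors’ signal is how concentrated a link profile’s anchor text is. The sketch below is an assumption of mine, not anything from the patent: natural profiles tend to mix branded, URL, and generic anchors, so a single phrase dominating the profile is the kind of thing worth flagging.

```python
from collections import Counter

def anchor_text_concentration(anchors):
    """Return the most common anchor text and the fraction of all
    backlinks that share it (a crude anchor-diversity check)."""
    counts = Counter(a.strip().lower() for a in anchors)
    top_anchor, top_count = counts.most_common(1)[0]
    return top_anchor, top_count / len(anchors)

anchors = (["cheap widgets"] * 80
           + ["Acme Inc", "acme.com", "this site"] * 5
           + ["click here"] * 5)
print(anchor_text_concentration(anchors))  # ('cheap widgets', 0.8)
```

Eighty percent of links carrying one commercial phrase is exactly the sort of spikiness the quote is getting at.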


Document ranking – search engines may also look at historical ranking levels. This additional signal can be used to detect link intent when combined with other factors:

“(…) search engine 125 may monitor the ranks of documents over time to detect sudden spikes in the ranks of the documents. A spike may indicate either a topical phenomenon (e.g., a hot topic) or an attempt to spam search engine 125 by, for example, trading or purchasing links. Search engine 125 may take measures to prevent spam attempts by, for example, employing hysteresis to allow a rank to grow at a certain rate. In another implementation, the rank for a given document may be allowed a certain maximum threshold of growth over a predefined window of time.”
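The hysteresis idea in that quote is easy to picture: let a rank rise, but only so fast per update window. The following is a minimal sketch under assumed parameter values (the 25% cap is illustrative; the patent does not give numbers).

```python
def cap_rank_growth(prev_rank, raw_rank, max_growth=0.25):
    """Apply hysteresis to rank growth: the new rank may exceed the
    previous one by at most max_growth per update window. Rank drops
    pass through unchanged."""
    ceiling = prev_rank * (1 + max_growth)
    return min(raw_rank, ceiling)

print(cap_rank_growth(100.0, 400.0))  # 125.0 – a sudden 4x spike is capped
print(cap_rank_growth(100.0, 110.0))  # 110.0 – normal growth passes through
```

The effect is that a burst of traded or purchased links yields little immediate benefit, which is the deterrent the patent describes.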

They also discuss doorway domains and name server freshness… which could be used to assess the legitimacy of an inbound link… and ultimately, intent (in co-occurrence with other signals).

And that’s just a few of the ways they can look for anomalies. What is important is that there is no singular approach to assessing the intent of a given link per se. It is more about looking at median scoring across a variety of factors, which can either trigger an algorithmic devaluation or raise a flag for closer (human) inspection. From there, obviously, human judgement becomes involved (see the Google quality rater document that came out a while back).
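That ‘median of many factors, then route the result’ idea can be sketched in a few lines. Everything here is assumed for illustration – the thresholds, the median combiner, and the three-way routing are mine, not Google’s:

```python
from statistics import median

def triage(signal_scores, devalue_at=0.8, review_at=0.5):
    """Combine per-signal spam scores (each 0..1) via the median and
    route the page: auto-devalue above one threshold, queue for human
    review above a lower one, otherwise leave it alone."""
    score = median(signal_scores)
    if score >= devalue_at:
        return "devalue"
    if score >= review_at:
        return "human_review"
    return "ok"

print(triage([0.9, 0.85, 0.95]))  # devalue
print(triage([0.6, 0.4, 0.7]))    # human_review
print(triage([0.1, 0.2, 0.05]))   # ok
```

Using the median rather than the mean means one noisy signal can’t push a clean page over a threshold on its own, which fits the ‘no singular approach’ point above.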

Another interesting Google patent is: Method for detecting link spam in hyperlinked databases


What else?

 … some random thoughts…


Document relevance – relevance can also be used to assess the intent of links in a profile, as a share of off-topic links above a given threshold can often mean there is hanky panky afoot. This speaks to intent as it can show an intent to manipulate.


Excessive reciprocals – another area is recips, or even 2-3-4 way links. If the engines establish that such link building approaches are in play, this can also trigger a closer look. By defining a ratio and a set of thresholds, pages with a high level of reciprocation can be identified. Some recent stuff I covered from Yahoo sheds light on that end.

Which is based on the patent: Identifying excessively reciprocal links among web entities
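The ‘ratio and set of thresholds’ part is simple to demonstrate. This is a page-level toy of my own (the actual patent works at the web-entity level, and my 0.5 cutoff is an arbitrary assumption): count what fraction of a page’s outbound link partners also link back.

```python
def reciprocity_ratio(outlinks, inlinks):
    """Fraction of a page's outbound link targets that also appear
    among its inbound link sources (a crude reciprocation measure)."""
    out, inc = set(outlinks), set(inlinks)
    if not out:
        return 0.0
    return len(out & inc) / len(out)

out_links = ["a.com", "b.com", "c.com", "d.com"]
in_links = ["a.com", "b.com", "c.com", "x.com"]
ratio = reciprocity_ratio(out_links, in_links)
print(ratio, ratio > 0.5)  # 0.75 True – worth a closer look
```

Note this only catches straight recips; the 3- and 4-way rings mentioned above need graph-cycle detection, which is where the patent’s entity-level approach comes in.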


TrustRank/Harmonic Rank – we can also infer that these concepts can be used to identify web spam and thus linking intent.

“(…to) demote those hits whose effective mass renders them likely to be artificially boosted by link-based spam. The determination of the effective mass for a given web document relies on a combination of techniques that in part assess the discrepancy between the link-based popularity (e.g., PageRank) and the trustworthiness (e.g., TrustRank) of a given web document.”

From: Link-based spam detection. More on Harmonic Rank in: Yahoo’s Harmonic Rank
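The discrepancy idea in that quote – popularity not backed by trust – can be written as a one-liner. This is a simplified sketch in the spirit of the ‘spam mass’ estimate from the TrustRank literature, not the patent’s exact formula:

```python
def spam_mass(pagerank, trustrank):
    """Estimated share of a page's link-based popularity that is NOT
    explained by its trust score. High values suggest the popularity
    may be artificially boosted."""
    if pagerank <= 0:
        return 0.0
    return max(0.0, (pagerank - trustrank) / pagerank)

print(spam_mass(0.008, 0.001))  # most popularity unexplained by trust
print(spam_mass(0.008, 0.007))  # popularity largely backed by trust
```

A page with high PageRank but low TrustRank has a large ‘effective mass’, and per the quote those are the hits that get demoted.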

Host level spam detection – this is another area somewhat related to Trust/Harmonic Rank type approaches. Once more, there is an inherent ‘intent’ trying to be established, this time based on where a site/page lives, as well as touching on TrustRank type concepts.


Page segmentation – in this instance links can be assessed by their location on the page. Let’s say there are some links in the sidebar/footer under the text ‘Advertisers’ or ‘Sponsors’ etc… it wouldn’t be too difficult for an algorithm to detect these and report them back as potential manipulations… the ‘intent’ being to game the engines with paid link spam… This method can also be used in concert with the other approaches mentioned already…

For more see: Page segmentation and link building, and the SEO implications of page segmentation
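Here is a toy of the ‘links under a Sponsors label’ case. Real segmentation engines work on the rendered layout and DOM blocks, so everything below – the block naming, the regex, the function – is an assumption purely for illustration:

```python
import re

# Labels that commonly mark paid placements (illustrative list)
PAID_LABELS = re.compile(r"\b(sponsors?|advertisers?|partners?)\b", re.I)

def flag_paid_blocks(html_blocks):
    """Given (block_name, block_html) pairs from a page segmenter,
    flag link-bearing blocks whose text suggests paid placement."""
    flagged = []
    for name, html in html_blocks:
        if "<a " in html and PAID_LABELS.search(html):
            flagged.append(name)
    return flagged

blocks = [
    ("content", '<p>Read our <a href="/guide">guide</a>.</p>'),
    ("footer", '<h4>Sponsors</h4><a href="http://example.com">Example</a>'),
]
print(flag_paid_blocks(blocks))  # ['footer']
```

Even this crude version shows why footer/sidebar links with commercial labels are low-hanging fruit for an algorithm, as the paragraph above suggests.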


And that’s about what I have on the technical front… and if there ain’t enough here… you could always have a look at the AIRWeb proceedings, such as the 2009 edition.


The human element

Once more, assessing intent is often not so much an algorithmic activity as a human one (as humans set the training documents and thresholds). For starters, humans set the thresholds for link spam, and secondly, humans review ‘complaints’ from the public. I can say that the recent case I was on about over at Greenpeace most certainly did have some ‘human’ interactions that changed the approach taken.

As you may notice, the original campaign was altered after the post was written… so there was a human element on my part as well as on the part of the engines/authors. Intent often comes down to what I call ‘plausible deniability’. This means you can’t state that links are in any way a payment or trade for anything of value. Sure, it can be implied, but one can’t overtly state it… ya know?

At the end of the day, intent will be more of a subjective assessment, made algorithmically, via human review, or a combination of both. By programming thresholds the search engines can assign intent or send it up the line for human evaluation. There is no real ‘intent algorithm’ so to speak, that I know of… merely ways of identifying link spam, which is the embodiment of malicious intent on the part of the webmaster/optimizer.

I know these may not be the ‘clear cut’ statements of how search engines look at ‘link intent’ you were after, but the engines really aren’t that concerned with ‘intent’ outside of the spam dept. By utilizing some of the methods mentioned above, in accordance with subjective judgements (TOS and guidelines), they assign intent accordingly…

For all those that were ever curious... I hope that helped :0)


