How search engines consider link relevance
A good friend of mine sent me an interesting question a while back: "Dear Dave, can you give me some pointers on how search engines determine link intent?"
Well sheeeeit... it's hard to say what constitutes intent when it comes to the various aspects of linking and search engines. It really isn't that straightforward, and understanding 'query intent' tends to be the most researched area for search peeps.
And as far as patents/papers go, I doubt there is really much out there specifically on intent beyond some papers such as Recognizing Nepotistic Links on the Web or Detecting Nepotistic Links by Language Model Disagreement. There is more in CJ's post on detecting paid links; there are a whack of papers at the end of that one.
Ultimately though...
It's all about the spam
Most such valuations (intent) would be in the link spam world, as this is where links are evaluated. From that point on, what actually counts as intent is defined by the engines themselves. What I mean is that an algorithm can merely look for elements common to those manipulating the index via links; it is up to the engineers to decide what to do with pages/sites above a given threshold.
To get some ideas we can look at this Google patent; Document scoring based on link based criteria ...which has stuff such as;
Authority entities - one way to bypass the need to assess value/intent (and the potential for link spam) is that they will simply trust authority domains;
It may be possible for search engine (125) to make exceptions for documents that are determined to be authoritative in some respect, such as government documents, web directories (e.g., Yahoo), and documents that have shown a relatively steady and high rank over time. For example, if an unusual spike in the number or rate of increase of links to an authoritative document occurs, then search engine 125 may consider such a document not to be spam and, thus, allow a relatively high or even no threshold for (growth of) its rank (over time).
Temporal factors - essentially, link velocity and decay anomalies can be used as a flag for closer inspection; in this case the intent being to artificially inflate a link profile.
A typical, "legitimate" document attracts back links slowly. A large spike in the quantity of back links may signal a topical phenomenon (e.g., the CDC web site may develop many links quickly after an outbreak, such as SARS), or signal attempts to spam a search engine. From; Historical ranking factors, or more specifically; Spam detection using temporal factors.
You can also look at this Microsoft patent on temporal link spam detection; Do link spammers leave footprints
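To make the link velocity idea a bit more concrete, here is a minimal sketch of how a spike in new backlinks could be flagged. This is not taken from either patent; the function name, the weekly buckets, and both thresholds are made-up assumptions for illustration only.

```python
from statistics import median


def flag_link_spikes(weekly_new_links, ratio_threshold=5.0, min_links=20):
    """Return the indexes of weeks whose new-link count looks anomalous.

    A week is flagged when its count is at least `min_links` and at least
    `ratio_threshold` times the median of all earlier weeks.
    Both thresholds are hypothetical values, not anything an engine has published.
    """
    flagged = []
    for week, count in enumerate(weekly_new_links):
        history = weekly_new_links[:week]
        if not history:
            continue  # nothing to compare the first week against
        baseline = max(median(history), 1)  # avoid a zero baseline
        if count >= min_links and count / baseline >= ratio_threshold:
            flagged.append(week)
    return flagged


# Hypothetical profile: a slow trickle of links, then one big burst.
print(flag_link_spikes([3, 4, 2, 5, 3, 60, 4, 3]))
```

Keep in mind that a flag like this says nothing about intent on its own. As the quote above notes, the same spike could be a topical phenomenon, so it would only make sense as a trigger for a closer look.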
Anchor text anomalies - a document that has a non-natural rate of growth often has spikes of new backlinks with similar/identical link text associated with it. Documents that show such spikes over time can have the links capped or otherwise devalued.
One reason for such spikiness may be the addition of a large number of identical anchors from many documents. Another possibility may be the addition of deliberately different anchors from a lot of documents. Also from; the Link builders guide to historical ranking factors.
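As a rough illustration of the identical-anchor case, one could measure how much of a batch of new links shares the same anchor text. The helper names and cut-off values here are my own assumptions, and this sketch does not try to cover the "deliberately different anchors" case mentioned in the quote.

```python
from collections import Counter


def anchor_concentration(anchors):
    """Share of links using the single most common anchor text (0.0 to 1.0)."""
    normalized = [a.strip().lower() for a in anchors if a.strip()]
    if not normalized:
        return 0.0
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)


def looks_unnatural(anchors, max_share=0.6, min_links=10):
    """Flag a batch of new links dominated by one anchor text.

    max_share and min_links are hypothetical thresholds for illustration.
    """
    return len(anchors) >= min_links and anchor_concentration(anchors) > max_share


# Hypothetical batch of anchors picked up during a link spike.
spike = ["cheap widgets"] * 9 + ["Acme Inc"]
print(anchor_concentration(spike), looks_unnatural(spike))
```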
Document ranking - search engines may also look at historical ranking levels. This additional signal can be used to detect link intent when combined with other factors;
(...) search engine 125 may monitor the ranks of documents over time to detect sudden spikes in the ranks of the documents. A spike may indicate either a topical phenomenon (e.g., a hot topic) or an attempt to spam search engine 125 by, for example, trading or purchasing links. Search engine 125 may take measures to prevent spam attempts by, for example, employing hysteresis to allow a rank to grow at a certain rate. In another implementation, the rank for a given document may be allowed a certain maximum threshold of growth over a predefined window of time.
They also discuss doorway domains and name server freshness... that could be used to assess the legitimacy of the inbound link... and ultimately, intent (in concurrence with other signals).
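The "maximum threshold of growth over a predefined window" bit from the patent quote can be sketched in a few lines. This is only my reading of the idea; the function name, the per-window scores, and the growth limit are assumptions, and the patent does not spell out an actual formula.

```python
def cap_rank_growth(raw_scores, max_growth=0.10):
    """Limit how fast a score may rise from one time window to the next.

    raw_scores: scores computed for consecutive time windows.
    max_growth: largest allowed fractional increase per window (assumed value).
    Drops pass through unchanged; only increases are rate-limited.
    """
    if not raw_scores:
        return []
    allowed = [raw_scores[0]]
    for raw in raw_scores[1:]:
        ceiling = allowed[-1] * (1 + max_growth)
        allowed.append(min(raw, ceiling))
    return allowed


# Hypothetical document whose raw score triples overnight;
# with a 50% cap per window it only catches up gradually.
print(cap_rank_growth([10, 10, 30, 30], max_growth=0.5))
```

The point of the design is that a legitimately hot document still gets there eventually, while a bought burst of links doesn't pay off right away.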
And that's just a few ways they can look for anomalies. What is important is that there is no singular approach to assessing the intent of a given link per se. It is more about looking at median scoring of a variety of factors, which can either trigger an algorithmic devaluation or raise a flag for closer (human) inspection. From there, obviously, human judgement becomes involved (see the Google quality rater document that came out a while back).
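To picture how several signals might feed one decision, here is a small sketch that combines per-signal scores and maps the result to "devalue", "send to a human" or "leave alone". I've used a plain weighted average for the combination, which is my own choice rather than anything documented; the signal names, weights and cut-offs are all hypothetical.

```python
def triage(signals, weights, devalue_at=0.8, review_at=0.5):
    """Combine per-signal scores (each 0.0 to 1.0) into one weighted score
    and map it to an action. Weights and cut-offs are illustrative guesses."""
    if not signals:
        return 0.0, "ok"
    total_weight = sum(weights[name] for name in signals)
    score = sum(signals[name] * weights[name] for name in signals) / total_weight
    if score >= devalue_at:
        action = "devalue"
    elif score >= review_at:
        action = "human_review"
    else:
        action = "ok"
    return score, action


# Hypothetical profile: velocity and anchor text look off, reciprocity is fine.
weights = {"velocity": 1.0, "anchor": 1.0, "reciprocal": 1.0}
signals = {"velocity": 1.0, "anchor": 1.0, "reciprocal": 0.0}
print(triage(signals, weights))
```

Notice that no single signal is enough here to cross the devaluation line; it takes several of them lining up.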
Another interesting Google patent is; Method for detecting link spam in hyperlinked databases
What else?
some random thoughts
Document relevance - this can also be used to assess the intent of links in a profile, as those above a given threshold can often mean there is hanky panky afoot. This speaks to intent as it can show the intent to manipulate.
Excessive reciprocals - another area is recips, or even 2-3-4 way links. If they establish that such link building approaches are in play, this can also trigger a closer look. By defining a ratio and a set of thresholds, pages with a high level of reciprocation can be identified. Some recent stuff I covered from Yahoo sheds light on that end, which is based on the patent; Identifying excessively reciprocal links among web entities
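The "ratio and set of thresholds" idea could look something like the sketch below. It only handles plain two-way reciprocation (not the 3 and 4 way variety), and the function names, hostnames and threshold values are assumptions of mine, not details from the Yahoo patent.

```python
def reciprocal_ratio(outlinks, inlinks):
    """Fraction of a site's outbound link targets that also link back to it."""
    out_set, in_set = set(outlinks), set(inlinks)
    if not out_set:
        return 0.0
    return len(out_set & in_set) / len(out_set)


def excessively_reciprocal(outlinks, inlinks, max_ratio=0.5, min_links=10):
    """Flag a site whose outbound links are mostly returned.

    max_ratio and min_links are hypothetical thresholds for illustration.
    """
    enough_links = len(set(outlinks)) >= min_links
    return enough_links and reciprocal_ratio(outlinks, inlinks) > max_ratio


# Hypothetical site: links out to ten hosts, eight of which link back.
outlinks = [f"site{i}.example" for i in range(10)]
inlinks = outlinks[:8] + ["other.example"]
print(reciprocal_ratio(outlinks, inlinks), excessively_reciprocal(outlinks, inlinks))
```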
TrustRank/Harmonic Rank - we can also infer that these concepts can be used to identify web spam and thus linking intent.
(... to) demote those hits whose effective mass renders them likely to be artificially boosted by link-based spam. The determination of the effective mass for a given web document relies on a combination of techniques that in part assess the discrepancy between the link-based popularity (e.g., PageRank) and the trustworthiness (e.g., TrustRank) of a given web document.
From; Link-based spam detection - More on Harmonic rank in; Yahoo's Harmonic Rank
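The quote doesn't give a formula for effective mass, so the following is only a guess at the general shape of a "popularity versus trust" discrepancy measure. It assumes both scores are already on the same scale, and the names, sample values and cut-off are all hypothetical.

```python
def relative_mass(pagerank, trust_score):
    """Share of a page's link popularity that is not backed by trust.

    Assumes both scores are on the same scale; that normalisation is an
    assumption of this sketch, not something the patent text states.
    """
    if pagerank <= 0:
        return 0.0
    return max(pagerank - trust_score, 0.0) / pagerank


def demote_candidates(pages, max_mass=0.7):
    """pages: dict of url -> (pagerank, trust_score).

    Returns the URLs whose relative mass exceeds the (made-up) cut-off.
    """
    return sorted(
        url for url, (pr, trust) in pages.items()
        if relative_mass(pr, trust) > max_mass
    )


# Hypothetical scores: both pages are popular, only one is trusted.
pages = {"a.example": (0.8, 0.7), "b.example": (0.9, 0.1)}
print(demote_candidates(pages))
```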
Host level spam detection - this is another area somewhat related to Trust/Harmonic Rank type approaches. Once more, there is an inherent intent trying to be established, this time based on where a site/page lives as well as touching on TrustRank type concepts;
Page segmentation - in this instance the links can be assessed by location. Let's say there are some links in the sidebar/footer that have the text 'Advertisers' or 'Sponsors' etc... this wouldn't be too difficult for an algorithm to detect and report back as potential manipulation... thus the intent being to game the engines with paid link spam.
This method can also be used in combination with other approaches mentioned already.
For more see; Page segmentation and link building and the SEO implications of page segmentation
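To show how simple the sidebar/footer case could be, here is a toy sketch using Python's standard `html.parser`. It treats any `div` whose id or class mentions "footer" or "sidebar" as a boilerplate segment and reports the links in segments whose text mentions sponsors or advertisers. Real page segmentation is far more involved; the hint words, class name and sample markup are all assumptions.

```python
from html.parser import HTMLParser

BOILERPLATE_HINTS = ("footer", "sidebar")   # assumed id/class naming
PAID_HINTS = ("sponsor", "advertiser")      # assumed label wording


class SegmentLinkFinder(HTMLParser):
    """Collect text and links found inside footer/sidebar style <div> blocks."""

    def __init__(self):
        super().__init__()
        self.div_stack = []   # True for each open <div> that looks like boilerplate
        self.segments = []    # finished boilerplate blocks
        self.current = None   # block being read, or None while in main content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            label = f"{attrs.get('id') or ''} {attrs.get('class') or ''}".lower()
            is_boilerplate = any(hint in label for hint in BOILERPLATE_HINTS)
            if is_boilerplate and self.current is None:
                self.current = {"text": [], "links": []}
            self.div_stack.append(is_boilerplate)
        elif tag == "a" and self.current is not None and attrs.get("href"):
            self.current["links"].append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()
            # Close the segment once no boilerplate <div> remains open.
            if self.current is not None and not any(self.div_stack):
                self.segments.append(self.current)
                self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.current["text"].append(data.lower())


def suspect_links(html):
    """Return hrefs from boilerplate segments whose text hints at paid placement."""
    finder = SegmentLinkFinder()
    finder.feed(html)
    finder.close()
    flagged = []
    for segment in finder.segments:
        text = " ".join(segment["text"])
        if any(hint in text for hint in PAID_HINTS):
            flagged.extend(segment["links"])
    return flagged


# Hypothetical page: one editorial link, one sidebar "sponsor" link, one footer link.
page = """
<div id="content"><p>Read <a href="http://a.example/">this study</a>.</p></div>
<div class="sidebar"><h3>Our Sponsors</h3>
  <a href="http://b.example/">cheap widgets</a></div>
<div id="footer"><a href="/about">About</a></div>
"""
print(suspect_links(page))
```

Only the sidebar link gets reported here; the editorial link and the plain footer link are left alone, which is the whole point of looking at location plus labelling rather than at links in general.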
And that's about what I have on the technical front... and if there ain't enough here... you could always have a look at the AIRWeb proceedings. Here's some papers from the 2009 edition;
The human element
Once more, assessing intent is often not as much an algorithmic activity as a human one (as humans set the training documents and thresholds). For starters, humans set the thresholds for link spam and secondly, humans review complaints from the public. I can say that the recent case I was on about over at Greenpeace most certainly did have some human interactions that changed the approach taken.
As you may notice, the original campaign was altered after the post was written... so there was a human element on my part as well as the engines/authors. Intent often comes down to what I call 'plausible deniability'. This means you can't state that links are in any way a payment or trade for anything of value. Sure, it can be implied, but one can't overtly state it... ya know?
At the end of the day, intent will be more of a subjective assessment, made algorithmically, via human assessment, or a combination of both. By programming thresholds the search engines can assign intent or send it up the line for human evaluation. There is no real 'intent algorithm', so to speak, that I know of... merely ways of identifying link spam, which is the embodiment of malicious intent on the part of the webmaster/optimizer.
I know these may not be clear cut statements of how search engines look at link intent, but they really aren't as concerned with intent outside of the spam dept. By utilizing some of the methods mentioned above, in accordance with subjective judgements (TOS and guidelines), they assign intent accordingly.
For all those that were ever curious... I hope that helped :0)
Comments
Do you think Bing is going to add their own take on any of this "do's & dont's" that Google is loved/hated for? Or are you still of the mindset that Bing isn't in the same ball park... yet?
As usual Dave you are very thorough, informational, and yes opinionated. But that's what we respect about you. :whistle:
Furthermore, they are doing a great job with the webmaster blog addressing SEO issues - Google gets to play cop because they have the market share, if they ever lose some, maybe they'll play nicer 2...lol
I'm all for Bing as well, and I hope that they even out the playing field (though we all know it's a long shot), monopolies are never good, and it sucks that the big G can wipe us out in one fell swoop!
manipulation.
See:
http://tinyurl.com/lzcyej
I noticed recently, the site: search number of articles indexed jumped by up to 1000 times on my sites. It started steadily declining since the beginning of May, 2009. Doing a daily check on a number of articles indexed, all of a sudden, I noticed index jumping to almost 1000 times. The more I tested various things, the more I saw a consistency of behaviour.
I am now more sure why this happened.. Competition is wicked out there. I did a check of links on my blog and found out I had over 5000 backlinks on Yahoo, and over 2000 on Google, all in a 30 day period. I believe my competition did a spam link building campaign on me, and made my blog look like it was link spamming.
In essence it was link spam, just not done by me. What can I do to fix this? Help anyone, do I go through the back link list and delete spam links?
thx/guys
Every link should be placed for a longer time period, so don't use some random links, it's useless. Also, don't put links in articles after several months, do it when it's published, because search engines compare content over time and when links appear in it later, it's strange behavior.
It's still a dumb strategy even if you continue for years.
@purposeinc yea, Frank is a good guy and I appreciate the pimperatti action!! And hey, Matt is a pretty good dude as well.. He'd know a thing or ten about spam so sure, worth listening to. As would many others in the space. Try these vids on for size; http://www.huomah.com/Search-Engines/Search-Engine-Optimization/Listening-to-the-Spam-Assassins.html
@Radek well U make some interesting points, but I wouldn't be TOO worried about adding links to older content. What's important to understand is that a COMBINATION of items is what can get you profiled as potential spam. Consider this a best practices post; one wants to avoid satisfying too many of the above elements. In moderation, you should be safe (theoretically)
@Nishan - thanks my fellow search warrior - glad U enjoyed
@dom we've done some research in another group on negative SEO linking and it is pretty tough to do that. If you weren't buying links I'd say it's pretty hard to impossible to do what you were thinking was done. If you had bought links even then "it's not a given" it depends.. sorry... sworn to keep some stuff in the room so I can't elaborate.
As for link spamming competitors, I'd also agree that it would be next to impossible to implement as many of the spam approaches seem to be tied together...meaning that in isolation the chance of getting spanked is tough. They tend to 'flag' things for closer inspection many times and one would need to satisfy more than one element to be deemed spam.
REALLY funny that you dropped in when U did as I was researching a post I am working on today and found a convo we had in the comments here on the trail back in Jan. - then poof... here U were...lol
Have a great weekend my fellow Fire Horse!!