Microsoft on link spam: using temporal tracking
A recent Microsoft search patent describes a system that detects spam websites by looking at changes in the link information of a given page or set of pages over time. We recently covered some potential ways of going about this, with some analysis of Google, in "Historical ranking factors for link builders"; be sure to give that a read as well if you're in the mood to saunter down some related paths. This time we have:
Detecting web spam from changes to links of websites
Filed December 14, 2006; Published June 19, 2008
As we know from the last excursion, link activity over time can reveal potential spam signals for search engines to use. The system finds them by looking at a variety of features of the associated link information. As with many such analyses, it uses a probabilistic model to judge what does and does not count as a spammy link profile.
The analysis is also not limited to inbound links; it covers outbound links as well (because a link profile is more than just inbounds, right?). Web spam creates two main problems for a search engine: the obvious loss of meaningful search results, and the bandwidth and spidering resources wasted crawling and indexing spammy sites. So catching it means yummier SERPs and is good for the bottom line as well!
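To make "changes in link information over time" a bit more concrete, here's a minimal sketch in Python of the kind of features one could compute from two crawl snapshots of a page, for both inbound and outbound links. The `LinkSnapshot` record, the feature names and the Jaccard-based churn measure are my own hypothetical choices for illustration. The patent summary above doesn't spell out which features it actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSnapshot:
    """Hypothetical record of one crawl of a page (illustrative, not from the patent)."""
    day: int              # crawl time, in days since some reference point
    in_links: frozenset   # URLs that link to the page
    out_links: frozenset  # URLs the page links to


def jaccard_churn(old: frozenset, new: frozenset) -> float:
    """1 - Jaccard similarity: 0.0 means identical link sets, 1.0 means fully replaced."""
    union = old | new
    if not union:
        return 0.0
    return 1.0 - len(old & new) / len(union)


def change_features(prev: LinkSnapshot, curr: LinkSnapshot) -> dict:
    """Illustrative temporal features for inbound and outbound links between two crawls."""
    days = curr.day - prev.day
    if days <= 0:
        raise ValueError("snapshots must be in chronological order")
    feats = {}
    for side in ("in", "out"):
        old = getattr(prev, f"{side}_links")
        new = getattr(curr, f"{side}_links")
        feats[f"{side}_added_per_day"] = len(new - old) / days    # link growth rate
        feats[f"{side}_removed_per_day"] = len(old - new) / days  # link loss rate
        feats[f"{side}_churn"] = jaccard_churn(old, new)          # how much the set turned over
    return feats
```

A page that suddenly picks up a pile of new inbound links between crawls would show a high `in_added_per_day` and a high `in_churn`, which is the sort of temporal footprint a model could then weigh up.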
Link spam temporal footprints
"Spamming" in general refers to a deliberate action taken to unjustifiably increase the popularity or importance of a web page or web site. In the case of link spamming, a spammer can manipulate links to unjustifiably increase the importance of a web page. For example, a spammer may increase a web page's hub score by adding out links to the spammer's web page.
Other examples of tactics link spammers may use include:
- a spammer may create a copy of an existing link directory to quickly build a very large out link structure.
- a spammer may provide a web page of useful information with hidden links to spam web pages.
- many web sites, such as blogs and web directories, allow visitors to post links. Spammers can post links to their spam web pages to directly or indirectly increase the importance of the spam web pages.
- a group of spammers may set up a link exchange mechanism in which their web sites point to each other to increase the importance of the web pages of the spammers' web sites.
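Link exchanges like the one in that last bullet leave a fairly obvious footprint: sites that link back and forth to each other. As a minimal sketch (hypothetical function names and toy site names, not the patent's stated method), reciprocated links can be picked out of a map from each site to the sites it links to:

```python
def reciprocal_pairs(site_links: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return every unordered pair of sites that link to each other, as a sorted tuple."""
    pairs = set()
    for src, targets in site_links.items():
        for dst in targets:
            if dst != src and src in site_links.get(dst, set()):
                pairs.add(tuple(sorted((src, dst))))
    return pairs


def reciprocity(site: str, site_links: dict[str, set[str]]) -> float:
    """Share of a site's out-links whose target links back (0.0 if it has no out-links)."""
    targets = site_links.get(site, set()) - {site}
    if not targets:
        return 0.0
    returned = sum(1 for t in targets if site in site_links.get(t, set()))
    return returned / len(targets)


if __name__ == "__main__":
    # Toy data: a, b and c form a closed exchange ring; d links in but gets nothing back.
    links = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
    print(sorted(reciprocal_pairs(links)))
    print(reciprocity("a", links), reciprocity("d", links))
```

Plenty of legit sites swap a few links, so a high reciprocity score on its own proves nothing. It's one more feature to hand to a probabilistic model rather than a verdict.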
By looking at the link profiles of known spam sites, the search engine can build a template of their linking activity, enabling further algorithmic seek-and-destroy adaptations.
As with many probabilistic systems, a set of training documents or websites can be used to train the model's valuation of a spammy link profile. These can come from sites that were manually reviewed and judged to be spam. They become the base set used to teach the algorithm(s) what to look for when crawling.
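The post doesn't say which probabilistic model Microsoft uses, so here's a stand-in: a tiny logistic regression, fitted by batch gradient descent in plain Python, that turns a site's feature vector into a spam probability. The two features (imagine inbound-link churn and share of reciprocated links) and the six labelled examples are made-up toy values standing in for manually reviewed sites. `train_logistic` and `spam_probability` are hypothetical names.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X: list[list[float]], y: list[int],
                   lr: float = 0.5, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit weights and bias by batch gradient descent on log-loss (y: 1 = spam, 0 = not spam)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this labelled site.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def spam_probability(x: list[float], w: list[float], b: float) -> float:
    """Probability that a site with features x is spam under the fitted model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Toy "manually reviewed" training set: [inbound churn, reciprocity] -> label.
    X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.1, 0.1], [0.2, 0.0], [0.0, 0.2]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_logistic(X, y)
    print(spam_probability([0.85, 0.85], w, b), spam_probability([0.05, 0.05], w, b))
```

Logistic regression is just a simple choice that outputs a probability directly. Any classifier trained on the manually reviewed set would play the same role of learning what a spammy link profile "looks like".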