Tuesday, 24 June 2008
Microsoft on link spam; using temporal tracking
A recent Microsoft search patent describes a system that detects spam websites by looking at how the link information on a given page, or set of pages, changes over time. We recently covered some potential ways of going about this, with some analysis of Google, in ‘Historical ranking factors for link builders’. Be sure to give that a read as well if you’re in the mood to saunter down some related journeys. This time we have:
Detecting web spam from changes to links of websites
Filed: December 14, 2006; Published: June 19, 2008
As we know from the last excursion, link activity over time can unlock potential spam signals for search engines to use. This can be done by looking at a variety of features of the associated link information. As with many such analyses, the system uses a probabilistic model to judge what is and is not a spammy link profile. Nor is it limited to inbound links; it looks at outbound links as well (because a link profile is more than just inbounds, right?).
Web spam creates two main problems for a search engine: the obvious one, a lack of meaningful search results, and the bandwidth and spidering resources wasted crawling and indexing spammy sites. So fighting it means some yummy SERPs and is good for the bottom line as well!!
Link spam temporal footprints
“‘Spamming’ in general refers to a deliberate action taken to unjustifiably increase the popularity or importance of a web page or web site. In the case of link spamming, a spammer can manipulate links to unjustifiably increase the importance of a web page. For example, a spammer may increase a web page's hub score by adding out links to the spammer's web page.”
Some other tactics link spammers may use include:
- create a copy of an existing link directory to quickly create a very large out link structure.
- a spammer may provide a web page of useful information with hidden links to spam web pages.
- many web sites, such as blogs and web directories, allow visitors to post links. Spammers can post links to their spam web pages to directly or indirectly increase the importance of the spam web pages.
- a group of spammers may set up a link exchange mechanism in which their web sites point to each other to increase the importance of the web pages of the spammers' web sites.
By looking at the link profiles of spam sites, the search engine can build a template of their linking activity to enable further algorithmic seek-and-destroy adaptations. As with many probabilistic systems, a set of training documents/websites can be used to learn what a spammy link profile looks like. These can come from sites that received a manual review and were deemed to be spam websites. They become the base set used to teach the algorithm(s) what to look for when crawling.
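To make the training-set idea a little more concrete, here is a minimal Python sketch of how temporal link features might feed a probabilistic classifier. The patent excerpt above does not spell out its features or model, so everything here is an assumption: `LinkSnapshot`, the growth and churn features, and plain logistic regression (swapped in as a simple stand-in for the patent's probabilistic model) are all hypothetical, and the two "reviewed" sites are toy data, not real crawl results.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSnapshot:
    """Hypothetical record of the links seen for one site on one crawl."""
    in_links: frozenset
    out_links: frozenset


def growth(old: frozenset, new: frozenset) -> float:
    """Log ratio of link counts between two crawls (+1 smoothing avoids log(0))."""
    return math.log((len(new) + 1) / (len(old) + 1))


def churn(old: frozenset, new: frozenset) -> float:
    """Share of links that changed between two crawls: 1 - Jaccard similarity."""
    union = old | new
    if not union:
        return 0.0
    return 1.0 - len(old & new) / len(union)


def temporal_features(snapshots: list) -> list:
    """Average in/out-link growth and churn over consecutive crawl pairs."""
    pairs = list(zip(snapshots, snapshots[1:]))
    totals = [0.0, 0.0, 0.0, 0.0]
    for a, b in pairs:
        totals[0] += growth(a.in_links, b.in_links)
        totals[1] += growth(a.out_links, b.out_links)
        totals[2] += churn(a.in_links, b.in_links)
        totals[3] += churn(a.out_links, b.out_links)
    return [t / len(pairs) for t in totals] if pairs else totals


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    w, b, m = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def spam_probability(w, b, features) -> float:
    """Model's estimated probability that a link history is spammy."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


def links(prefix: str, n: int) -> frozenset:
    """Toy helper: n synthetic link targets named prefix0, prefix1, ..."""
    return frozenset(f"{prefix}{i}" for i in range(n))


# Toy training set (labels as if from manual review): a stable site, and one
# whose out-links explode across crawls, e.g. after copying a link directory.
stable = [LinkSnapshot(links("in", 10), links("out", 5)) for _ in range(3)]
spammy = [LinkSnapshot(links("in", 10), links("out", k)) for k in (5, 200, 2000)]

X = [temporal_features(stable), temporal_features(spammy)]
w, b = train_logistic(X, [0, 1])
print(spam_probability(w, b, temporal_features(spammy)))  # above 0.5
print(spam_probability(w, b, temporal_features(stable)))  # below 0.5
```

The design choice worth noting is that every feature compares consecutive crawls rather than a single snapshot, which is the point of temporal tracking: a directory copied wholesale shows up as a sudden jump in out-link count and churn that a one-off link count would miss.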
Wednesday, 23 April 2008
Microsoft’s take on user behavioural data
A search-related patent released by Microsoft the other day touches on a popular theme of late (with me at least): user behaviour analysis. In simplest terms, they look at various interactions with the search results (SERPs) and listing pages to try to determine the relevance of a set of results. This has been a common theme among the Big 3, as noted by these recent posts:
- Google confirms using query analysis – use of past queries in the regular index
- Google on User Performance Metrics – trio of patents on the subject
- Yahoo Personalized PageRank – user behaviour and PageRank
- Yahoo on Personalized Networks – user annotations used to populate networks
...you get the idea...
While concepts using user behaviour for AdServing have been around for a while, there is increasing interest in ways to harness it within the main SERPs. That is, beyond mere personalized search implications, though they work best there.
The Microsoft approach
The patent: Search system using user behaviour data – Filed: December 2003; Published: April 22, 2008
Thursday, 17 April 2008
It’s all about the Buddy System…
Recently I have been engaging a few of my industry cohorts on the topic of social search engines and the future of search in general. In the coming weeks I will be publishing some of these discussions, which for me point toward some hybrid of algorithmic, performance-metric-based and human-enhanced ranking signals. One of the problems facing those who believe in a purely social-powered search engine is that over time, once the novelty wears off, many users will be less inclined to be actively involved, and so the few (in the form of power users) would be creating indexes/SERPs for the many. One way of dealing with this (and with spam) would be a personalized (trusted) network search approach… But how do you make creating networks and accessing their ranking signals easier? Yahoo seems to have a plan.
In a patent I came across yesterday, it would seem the folks at Yahoo! are looking to address such shortcomings in the personalized/social search sphere:
Systems and methods for establishing or maintaining a personalized trusted social network
Friday, 11 April 2008
Personalized Search and User Performance Metrics - back in the news
Recently Danny (Sullivan) mentioned Google and the use of performance metrics, which I thought worth discussing. In his keynote, wonderfully covered by Kalena, he talked about his vision of Search 4.0 and how personalized search finds its way into the mix. Also of note, the other day Danny reported on the keynote interview and Marissa Mayer’s talk about ‘Previous Query’ ranking signals coming to regular index search (not merely AdServing). Personalized search is built on the concept that your actions can teach a search engine to better predict your likes and dislikes. Taken on a larger scale, it stands to reason that ranking signals for the regular index can also be mined.
Now what is all this Gobble-D-gook and why do I care? Because it is important.
Thursday, 03 April 2008
It’s all about getting along with your neighbours
A while ago the fine folks at Yahoo! had a patent granted relating to Personalized PageRank (or User Sensitive PageRank). At the time it was not only interesting but just fun from a ‘my PageRank is better than yours’ perspective. Today I ran into another Yahoo! patent that makes mention of said Personalized PageRank and another potential candidate for the lexicon: HarmonyRank. It is sort of like, how well do you play with others? ;0)
System and method for characterizing a web page using multiple anchor sets of web pages – Filed: October 2006; Published: April 3, 2008
“An improved method for characterizing a web page using multiple anchor sets of web pages.”
Now, as we have all learned from our search masters, have a look around, right? OK, it seems these fellows have ALL also worked on Anchor-based Proximity Measures (PDF), 2007, which discusses both Personalized PageRank (PPR) and Harmonic Rank (HR; I prefer HarmonyRank)... One author, Andrew Tomkins, also worked on the Personalized PageRank patent. Also of interest, Amruta Joshi (also an author on this one) did a PPT on ‘Keyword generation for search engine advertising’ while at Stanford. ...at a glance...
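Since Personalized PageRank carries much of the weight in this post, a minimal Python sketch of its standard power-iteration form may help. It is not taken from the patent: it assumes PPR means ordinary PageRank whose random jumps land only on a chosen anchor set, and the `characterize` helper, which scores one page against several anchor sets to build a profile, is a hypothetical reading of the patent title. HarmonyRank (Harmonic Rank) is not sketched, since its definition is not given here.

```python
def personalized_pagerank(graph, anchors, alpha=0.15, iters=100, tol=1e-12):
    """Power iteration for PageRank with teleports restricted to `anchors`.

    graph: dict mapping each page to the list of pages it links to.
    alpha: probability of jumping back to a (uniformly chosen) anchor page.
    """
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    anchors = set(anchors)
    if not anchors or not anchors <= nodes:
        raise ValueError("anchors must be a non-empty subset of the graph's pages")
    teleport = {n: (1.0 / len(anchors) if n in anchors else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: alpha * teleport[n] for n in nodes}
        for u in nodes:
            outs = graph.get(u, [])
            # Dangling pages (no out-links) hand their mass back to the anchors.
            targets = outs if outs else list(anchors)
            share = (1 - alpha) * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def characterize(graph, page, anchor_sets, alpha=0.15):
    """Hypothetical: describe `page` by its PPR score under each anchor set."""
    return {name: personalized_pagerank(graph, anchors, alpha)[page]
            for name, anchors in anchor_sets.items()}


# Toy graph: a three-page cycle and a separate two-page cluster.
web = {"a": ["b"], "b": ["c"], "c": ["a"], "x": ["y"], "y": ["x"]}
ppr = personalized_pagerank(web, {"a"})
# Pages x and y score 0.0: nothing reachable from anchor a links to them.
print({page: round(score, 3) for page, score in sorted(ppr.items())})
print(characterize(web, "b", {"near_a": {"a"}, "near_x": {"x"}}))
```

Raising `alpha` keeps the random surfer closer to the anchors, so scores become more local; one plausible reading of "how well do you play with others" is exactly this kind of neighbourhood-relative score, though the patent may define it differently.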