Do you know where to find prime link real estate?
Brother Bill expertly covered (yet another) patent on page segmentation the other day, this time from Yahoo. It is certainly an area that SEOs might want to be paying more attention to these days. Considering that each of the big three has dabbled in it (to varying degrees) over the last ½ decade, there is every reason to believe that something might be here (the old "where there is smoke..." adage).
Page segmentation has been widely developed for use in OCR applications to better understand text/image relationships. More recently, we've seen this area of expertise brought to the online world. One aspect that should be of particular interest to those in the SEO world is the link-related ramifications.
Consider that Google, the king of link-reliant search engines, would certainly find a huge benefit in this approach for valuing links (or even for indexation decisions).
An interesting example is the recent update to Google Blog Search that fixed the issue of Google indexing blog roll links (problematic when using the link: command). As soon as that news broke I thought, "How'd they do that, I wonder?" Obviously page segmentation came to mind.
Benefits of page segmentation
Now, as fastidious little detectives, the first thing we need to look for is motive. Why would a search engine want to do this? A few of the potential benefits include:
- Crawling/indexing resources: once the segments of a site's template have been prioritized, crawling certain layers less often, or discarding them, would save on computing and storage costs.
- Topic drifting: some pages have more than one topic, and search engines can struggle to properly index/categorize them.
- False positives: it can also better deal with problems that arise when a page contains links/citations to other topics not directly related to the content of the page. This would help improve the quality of the results by avoiding such false positives.
- Spam detection: obviously, understanding boilerplate and other inherently spammy elements would be an important part of the value and attraction.
- Paid links: it could also go a long way toward fighting link spam and even paid links. By devaluing segments or passing them over altogether, it makes the practice less enticing and handicaps the activity.
These are but a few ways that page segmentation can be a useful tool for search engines. There are more, but I just wanted to get you up to speed (for more see last post on it). What we're concerned with this time is how it has the potential to change the world of links as we know it...
Links in the chain
What is most important is how these types of approaches would affect link valuations and, by association, link building campaigns. There are a few distinct areas link builders might want to consider that can affect the strength of a given link location:
- Segment indexation: one of the benefits of page segmentation is deciding which parts of the page should or should not be indexed. For example, we could suppose that a side panel segment titled "Our Sponsors" may be passed over altogether when the page is indexed. As you might imagine, having a link in a part of the page that the search engine doesn't index could be problematic.
- Topical diversity: imagine a page that has multiple topics and multiple link types going to that page. If the search engine can establish which links point to which segments, some segments may be of more value than others.
- Segment location: if search engines can understand the (boilerplate) template of a page as well as its common segments, it would be easy to refine the values. We could find that some areas of the page are valued more than others, in a kind of PageRank dampening.
It should be noted that most of the patents/papers on the topic have not gone far into the whole link valuation area, but it seems quite intuitive that this would be a valuable tool. This means there would likely be a premium on the editorial/content areas of a given web page.
One paper that does go into it is Microsoft's Block-level Link Analysis (PDF), where they discuss a Block Level PageRank (BLPR) approach.
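To make the idea a little more concrete, here is a minimal sketch of block-weighted link propagation. It is loosely in the spirit of BLPR, not the paper's actual formulation: the function name, the input layout, and the block importance numbers are all assumptions made up for illustration. Each page is described as a list of blocks, each block carries an importance score, and a link passes on value in proportion to the importance of the block it sits in.

```python
from collections import defaultdict


def block_level_pagerank(pages, damping=0.85, iterations=50):
    """Toy block-weighted PageRank (illustrative only).

    pages maps a page name to a list of (block_importance, [target_pages]).
    The importance values are hypothetical inputs; a real system would have
    to estimate them from layout and content.
    """
    # Collect every page that appears as a source or as a link target.
    names = sorted(
        set(pages)
        | {t for blocks in pages.values() for _, targets in blocks for t in targets}
    )
    n = len(names)

    # Turn blocks into page-to-page transition weights: a block's importance
    # is split evenly among the pages that block links to.
    transitions = {}
    for page in names:
        weights = defaultdict(float)
        for importance, targets in pages.get(page, []):
            if not targets:
                continue
            share = importance / len(targets)
            for target in targets:
                weights[target] += share
        total = sum(weights.values())
        transitions[page] = (
            {t: w / total for t, w in weights.items()} if total > 0 else {}
        )

    # Standard power iteration; pages with no outgoing weight spread evenly.
    rank = {p: 1.0 / n for p in names}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in names if not transitions[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in names}
        for page, outs in transitions.items():
            for target, weight in outs.items():
                new_rank[target] += damping * rank[page] * weight
        rank = new_rank
    return rank


# A hub page with a main content block (importance 0.9) linking to "article"
# and a sponsor side panel (importance 0.1) linking to "sponsor".
example = {
    "hub": [(0.9, ["article"]), (0.1, ["sponsor"])],
    "article": [(1.0, ["hub"])],
    "sponsor": [(1.0, ["hub"])],
}
ranks = block_level_pagerank(example)
```

In this toy graph "article" ends up with a higher score than "sponsor", even though each gets exactly one link from the same page, simply because of where on the page the link lives.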
Some possible areas where a search engine might dampen the value of links are:
- Header/footer links: easily definable in most templates.
- Navigation links: internal site links in segments.
- Blog rolls: as mentioned before, this could already be in play.
- Advertiser/supporter side bar links.
- Forum signatures: recurring template elements.
- Blog comments: sorry mister comment spammer :0(
- Social user pages: profile pages, etc.; common elements.
- Directory/link pages: very easy boilerplate to identify.
As you can imagine, this would put a premium on links within the content of a page, with varying degrees of devaluation for other placements.
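As a rough illustration of that kind of placement-based dampening, the sketch below labels each link on a page by the nearest enclosing template-like element and attaches a weight to it. Everything here is an assumption for the sake of example: the weights are invented numbers, not anything a search engine has published, and leaning on tags such as nav or footer is a stand-in for the far richer visual and DOM analysis described in the patents and papers.

```python
from html.parser import HTMLParser

# Elements treated as template-like segments (an assumption for this sketch).
SEGMENT_TAGS = {"header", "nav", "aside", "footer"}

# Hypothetical dampening factors; the numbers are purely illustrative.
SEGMENT_WEIGHTS = {
    "content": 1.0,
    "header": 0.3,
    "nav": 0.3,
    "aside": 0.2,
    "footer": 0.1,
}


class LinkSegmenter(HTMLParser):
    """Label each link with the nearest enclosing segment element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open segment tags, innermost last
        self.links = []  # (href, segment) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in SEGMENT_TAGS:
            self.stack.append(tag)
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Links outside any known segment count as editorial content.
                segment = self.stack[-1] if self.stack else "content"
                self.links.append((href, segment))

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()


def weighted_links(html):
    """Return (href, segment, weight) for every link found in the markup."""
    parser = LinkSegmenter()
    parser.feed(html)
    return [(href, seg, SEGMENT_WEIGHTS[seg]) for href, seg in parser.links]


page = (
    '<nav><a href="/about">About</a></nav>'
    '<p>Read <a href="/post">this post</a></p>'
    '<footer><a href="/ads">Ads</a></footer>'
)
for href, segment, weight in weighted_links(page):
    print(href, segment, weight)
```

The in-content link keeps its full weight while the navigation and footer links are discounted, which is the whole "premium on editorial links" argument in a few lines.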
Time to link ahead
This is all interesting, but is it real? There certainly does seem to be something going on with link valuations over the last few years, but we can't be sure it has anything to do with segmentation. Still, I can't shake the feeling that this is an important discipline to be aware of. Another interesting tidbit that I came across, which further denotes Google's interest in some form of segmentation, was a comment in a post about HTMM:
"...shifting topics within a document, and in so doing, provides a topic segmentation within the document." (Google Research, 2007)
And they (partially) sponsored the ICDAR2009 Page Segmentation Competition.
Anyway, there is enough logic and anecdotal evidence to at least do some testing and see what we find. It is likely a better route for SEOs these days than real-time social search, if you ask me. This is certainly something that we'll try and get some of the Dojo warriors working on (I'll report back on that).
For now, it is definitely an area to watch and to consider when future-proofing your link building efforts. And really, aren't the editorial links what one is after anyway? And why is that? How does Google know which links are which?
...Maybe they're further ahead on this than we know... time to break out some fashionable tin foil, I'd say... because where there is smoke... well, you know the rest.
If Block Level PageRank (BLPR) isn't in your lexicon, maybe it's time it was.
Microsoft Granted Patent on Vision-Based Document Segmentation (VIPS) - SEO by the Sea
Google and Document Segmentation Indexing for Local Search - SEO by the Sea
Page segmentation; ignore at your own peril - fantomaster
VIPS: a Vision-based Page Segmentation Algorithm : Microsoft : Nov. 1, 2003 : Deng Cai, Shipeng Yu, Ji-Rong Wen, Wei-Ying Ma
Block Based Web Search : Microsoft Research : 2004 : Deng Cai, Shipeng Yu, Ji-Rong Wen, Wei-Ying Ma
Block-level Link Analysis : Microsoft : 2004 : Deng Cai, Xiaofei He, Ji-Rong Wen, Wei-Ying Ma
Learning Block Importance Models for Web Pages : Microsoft : 2004
Page-level Template Detection via Isotonic Smoothing : Yahoo : May 2007 : Deepayan Chakrabarti, Ravi Kumar, Kunal Punera
Also somewhat related is the WebTables project, which is similar to Google Sets but looks for data or layout elements via HTML, etc. (hat tip to Bill on that angle):
Uncovering the Relational Web : Google
WebTables: Exploring the Power of Tables on the Web : Google
Vision-based document segmentation : Microsoft : filed July 28, 2003 and awarded Sept. 23, 2008 : Wen; Ji-Rong (Beijing, CN), Yu; Shipeng (Beijing, CN), Cai; Deng (Beijing, CN), Ma; Wei-Ying (Beijing, CN) (also covered on SEO by the Sea)
Method and system for calculating importance of a block within a display page : Microsoft : filed Apr. 2004 and assigned April 2008 : Ma; Wei-Ying (Beijing, CN), Wen; Ji-Rong (Beijing, CN), Song; Ruihua (Beijing, CN), Liu; Haifeng (Toronto, CA) (great coverage from Bill as always)
Method and system for identifying object information : Microsoft : filed April 2005 and assigned June 2008 : Wen; Ji-Rong (Beijing, CN), Ma; Wei-Ying (Beijing, CN), Nie; Zaiqing (Beijing, CN)
Retrieval of structured documents : Microsoft : filed Mar. 2006 and awarded Sept. 23, 2008 : Wen; Ji-Rong (Beijing, CN), Cui; Hang (National University of Singapore, SG)
System and method for detecting a web page template : Yahoo : filed May 2007 and assigned Nov. 2008 : Chakrabarti; Deepayan (Mountain View, CA), Punera; Kunal (Austin, TX), Ravikumar; Shanmugasundaram (Berkeley, CA)
System and method for smoothing hierarchical data using isotonic regression : Yahoo : filed May 2007 and assigned Nov. 2008 : Chakrabarti; Deepayan (Mountain View, CA), Punera; Kunal (Austin, TX), Ravikumar; Shanmugasundaram (Berkeley, CA)
Techniques for approximating the visual layout of a web page and determining the portion of the page containing the significant content : Yahoo : filed August 2006 and assigned Feb. 2008 : Kesari; Anandsudhakar
Document segmentation based on visual gaps : Google : filed Dec. 30, 2004 and assigned Sept. 2008 : Daniel Egnor
Systems and methods for analyzing boilerplate : Google : filed March 2004 and assigned Feb. 2008 : Stephen R. Lawrence