Take a ride on the Relevance Train
If Content is King, then Relevance is the Queen. We all know who really runs the Kingdom, right?
One of the more interesting areas, for me at least, in the indexing and retrieval game is the never-ending quest for more relevance. Whether it is studying the semantics or theme of a given document/web page or employing user metrics to create a probabilistic model, understanding how search engines look at relevance is as important as it is interesting. What seems painfully obvious these days is that the ever-popular methodologies based on link relationships (backlinks) or simple keyword-density-centric approaches in your SEO programs just aren't everything they once were.
There has been a distinct undertaking to understand relevance beyond simple link popularity approaches (such as PageRank). The link-centric approach has contributed to a TON of the Link Spam out there, and it has turned into a full-blown war over tactics such as Paid Links within the SEO and Webmaster communities. While it's true that Web Spam is not the main reason other methods for greater relevance are being explored, those methods can certainly help not only in improving relevance but in combating Web Spam as well.
Getting onboard the search relevance train
Search engineers seem to have been looking at more than a few ways to establish greater relevance in search query results. Here are a few that are of interest:
Semantic Relationships: these methods involve looking at the semantically related information on a given web page for a given search query. By counting the related terms and phrases in a given document, search engines can further establish which documents in a result set may be best suited to a given query. This can include areas such as:
- Semantic relationship of a given web page to the overall site theme.
- Count of semantically related terms or concepts on the target document.
- Anchor text relevance from incoming links to a page.
- Concept relevance of the page where an incoming link resides.
The main idea is that a term such as German Shepherd has a completely different meaning than the sum of its parts, German and Shepherd. By looking at the surrounding terms and page themes (dogs, pets, canine, training, police dog), a search engine can begin to decide what the probable meaning or connotation of these two words is when they are used together in a phrase.
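To make the German Shepherd idea a little more concrete, here is a minimal Python sketch that guesses which sense of a phrase a page is using by counting sense-specific context terms on the page. The two sense vocabularies are invented for illustration; real systems would learn such associations from data (for example with the topic models mentioned below) rather than from hand-written lists, and this is not any search engine's actual method.

```python
import re
from collections import Counter

# Hypothetical context vocabularies for two senses of "German Shepherd".
# Made up for this example; a real system would learn them from a corpus.
SENSES = {
    "dog breed": {"dog", "dogs", "pet", "pets", "canine", "puppy", "training", "police"},
    "german herder": {"sheep", "flock", "herding", "farm", "germany", "bavaria", "pasture"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def guess_sense(text, senses=SENSES):
    """Return the sense whose context terms occur most often in the text, plus all scores."""
    counts = Counter(tokenize(text))
    scores = {sense: sum(counts[t] for t in terms) for sense, terms in senses.items()}
    best = max(scores, key=scores.get)
    return best, scores

page = ("Our German Shepherd puppy training classes cover obedience for "
        "police dogs and family pets. Every canine is welcome.")
print(guess_sense(page))  # ('dog breed', {'dog breed': 6, 'german herder': 0})
```

Simple counting is enough to show the principle; the probabilistic models named below do the same job with learned term-topic weights instead of fixed lists.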
Some areas worth looking into that include these concepts are Phrase Based Indexing and Retrieval, Probabilistic Latent Semantic Indexing, Latent Dirichlet Allocation and Hidden Topic Markov Models. In addition, Phrase Based Indexing and Retrieval can even be used to refine personalized search and to detect Web Spam, further establishing relevance. These concepts can go a long way toward understanding how search engines deal with semantics.
User Performance Metrics: another area where relevance can be further refined is actually tracking what users do when presented with a given set of results for a query. What searchers do and do not do can start to build a profile of relevance, not only for a specific user (personalized search) but also across a larger aggregate of users.
- Query revisions from the displayed results
- Bounce rates from the chosen item in a set of results
Search engines can refine query results based on performance data collected through a variety of avenues, from IP tracking to logged-in users. By watching how people refine a search query, or which chosen results have the best bounce rates, a search engine can further define the concepts and relevance used in future document scoring.
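As a rough illustration of the user-metric idea, the sketch below aggregates a hypothetical click log (query, clicked URL, dwell time in seconds) into per-result click and bounce counts, then reorders results by the number of clicks that did not bounce. The log format, the example URLs and the 10-second bounce threshold are all assumptions made for the example; how any engine actually defines or weights these signals is not public.

```python
from collections import defaultdict

# Assumption: a click counts as a "bounce" if the searcher returns within 10 seconds.
BOUNCE_SECONDS = 10

def click_stats(log, query):
    """Return {url: (clicks, bounces)} for one query from (query, url, dwell_seconds) records."""
    stats = defaultdict(lambda: [0, 0])
    for q, url, dwell in log:
        if q != query:
            continue
        stats[url][0] += 1
        if dwell < BOUNCE_SECONDS:
            stats[url][1] += 1
    return {url: (c, b) for url, (c, b) in stats.items()}

def bounce_rate(clicks, bounces):
    """Fraction of clicks that bounced (0.0 when there were no clicks)."""
    return bounces / clicks if clicks else 0.0

def rerank(results, log, query):
    """Order results by satisfied clicks (clicks minus bounces), best first.

    sorted() is stable even with reverse=True, so ties keep their original order.
    """
    stats = click_stats(log, query)
    def satisfied(url):
        clicks, bounces = stats.get(url, (0, 0))
        return clicks - bounces
    return sorted(results, key=satisfied, reverse=True)

# Hypothetical log records: (query, clicked URL, dwell time in seconds).
log = [
    ("german shepherd", "a.example/breed", 120),
    ("german shepherd", "a.example/breed", 95),
    ("german shepherd", "b.example/shepherds", 3),
    ("german shepherd", "b.example/shepherds", 5),
    ("german shepherd", "b.example/shepherds", 200),
    ("picnic basket", "c.example/baskets", 60),
]
results = ["b.example/shepherds", "a.example/breed", "d.example/other"]

for url, (c, b) in click_stats(log, "german shepherd").items():
    print(url, c, round(bounce_rate(c, b), 2))
print(rerank(results, log, "german shepherd"))
# ['a.example/breed', 'b.example/shepherds', 'd.example/other']
```

Note that the result with the most raw clicks (b.example) drops below the one whose visitors actually stayed, which is the kind of signal the paragraph above describes.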
Off hand, a couple of recent patents that delve into these areas are "Method and apparatus for learning a probabilistic generative model for text" and "Ranking documents based on large data sets".
Duplicate content: another less obvious way to improve the relevance and quality of results is to disregard duplicate content in the retrieval process. Obviously, search results that contain a ton of duplicate entries aren't going to create much in the way of end-user confidence. A higher degree of relevance can be achieved by establishing which pages are the more authoritative versions in a given set of query results.
Some situations that could cause a page to be re-ranked are:
- Websites with identical pages
- Scraped content
- Heavily distributed articles
- Canonical issues (multiple page naming conventions)
- Printer-friendly pages
- Boilerplate pages/sites
Interesting reading in this area: "Detecting duplicate and near-duplicate files", "Detecting query-specific duplicate documents" and "Methods and apparatus for estimating similarity".
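A common, publicly documented way to detect near-duplicates is to break each page into overlapping word "shingles" and compare the resulting sets with Jaccard similarity. The sketch below does that for a handful of hypothetical pages, including a printer-friendly copy. The shingle size of 4 and the 0.8 threshold are illustrative assumptions, and this is not the method claimed in the patents listed above; at web scale you would estimate the similarity with MinHash signatures rather than compare every pair.

```python
import re

def shingles(text, k=4):
    """Return the set of k-word shingles (runs of k consecutive words) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Size of the intersection over size of the union (1.0 when both sets are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicates(docs, threshold=0.8, k=4):
    """Return (id, id) pairs whose shingle sets have Jaccard similarity >= threshold."""
    sigs = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    ids = sorted(sigs)
    pairs = []
    for i, x in enumerate(ids):
        for y in ids[i + 1:]:
            if jaccard(sigs[x], sigs[y]) >= threshold:
                pairs.append((x, y))
    return pairs

original = ("Our wicker picnic basket set includes plates cups cutlery "
            "and a cooler for four people")
docs = {
    "page": original,
    "print": "Printer friendly: " + original,  # printer-friendly copy of the same page
    "other": "Metal picnic baskets are durable and easy to clean after a day at the park",
}
print(near_duplicates(docs))  # [('page', 'print')]
```

The printer-friendly copy shares 12 of its 14 shingles with the original (similarity of about 0.86), so it is flagged, while the unrelated page shares none.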
Authority: relevance is often further refined by looking at the authority scoring for documents in a given set of query results. Sometimes it could be authority for a given topic or concept; other times it can be relevance established through a localized authority site (using local search, for example). Authority scoring can affect relevance for a given term, business location or category associated with the search query.
For some recent goodiness, Bill Slawski had a great post about "Determining Search Authority Pages", based on the patents "Authoritative Document Identification" and "Propagating useful information among related web pages, such as web pages of a website".
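One well-known way to score authority from links alone is Kleinberg's HITS algorithm, in which good hubs link to good authorities and good authorities are linked from good hubs. The sketch below is a plain HITS implementation run on a tiny hypothetical link graph; it is offered only to show the general idea of authority scoring, not as the method described in the patents above.

```python
import math

def hits(links, iterations=50):
    """Compute HITS hub and authority scores for a small link graph.

    links maps each page to the list of pages it links to. Both score vectors
    are L2-normalised after each update so they stay bounded.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 0.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link graph for a handful of dog-related sites.
links = {
    "dogblog": ["breedclub", "vetsite"],
    "petforum": ["breedclub", "vetsite"],
    "kennel": ["breedclub"],
    "breedclub": [],
}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # breedclub
```

Because every hub links to "breedclub", it ends up with the highest authority score; a topic-specific variant would run the same iteration only over pages related to the query.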
All of these methods seek to refine relevance through probabilistic models, as the search engines, short of mind reading, can never truly tell what a specific searcher's actual intent is. Further to that, people tend to use search engines differently, and trying to establish a catch-all solution simply doesn't work. So in the end, the search engines are stuck trying to guess what you are after from past user data and a stronger system of indexing and classifying documents for future query results.
Making relevance work for you
So what does this mean to you, the webmaster or SEO provider? It means that simply creating a web page with your keywords all over it and building links from the far reaches of the web, relevant or not, won't necessarily guarantee a prime ranking as we go forward in the world of search engine optimization. It also likely means more grief for keyword stuffers, link spammers, content generators and scrapers. The more layers of relevance filtering and re-ranking there are, the harder it will be for simplistic, redundant pages to rank for a given search query.
Here are some considerations for your SEO activities to maximize their effectiveness from the standpoint of maintaining relevance in the eyes of the search engines:
Site structure and page naming conventions: be sure to have Search Engine Friendly URLs that can not only be easily parsed by a search engine but also add to the relevance train for the target page of your SEO campaign. Let's use my favourite blue widget example:
A directory structure such as this gives you plenty of room to add secondary and related terms not only into the URL but into the overall relevancy train, since you can have interim pages further tighten up the targeting, such as:
This now gives us 3 pages (not including the home page) to gradually tighten the relevancy of the eventual target page (blue-widgets.html). Having these pages in our structure offers us many opportunities to strengthen the overall theme and relevance of our actual target page.
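A small helper can keep naming consistent when building a themed directory structure like this. The sketch turns an ordered list of increasingly specific terms into one URL path per level, with the last term as the target page. The category names in the usage example are hypothetical (not a structure taken from this post), and the `.html` extension simply mirrors the `blue-widgets.html` target mentioned above.

```python
import re
import unicodedata

def slugify(term):
    """Turn a term such as 'Blue Widgets' into a URL-friendly slug such as 'blue-widgets'."""
    # Decompose accented characters and drop anything that is not plain ASCII.
    ascii_term = unicodedata.normalize("NFKD", term).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_term.lower()).strip("-")

def themed_paths(levels, extension=".html"):
    """Return one URL path per level of a themed hierarchy, broadest first.

    Every level except the last becomes a directory; the last becomes the target page.
    """
    slugs = [slugify(t) for t in levels]
    paths = ["/" + "/".join(slugs[:i]) + "/" for i in range(1, len(slugs))]
    paths.append("/" + "/".join(slugs) + extension)
    return paths

# Hypothetical hierarchy: broad category, narrower category, target term.
for path in themed_paths(["Widgets", "Garden Widgets", "Blue Widgets"]):
    print(path)
```

This prints `/widgets/`, `/widgets/garden-widgets/` and `/widgets/garden-widgets/blue-widgets.html`: two interim pages plus the target, each carrying its own related term in the URL.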
On-Page Optimization: what is important with your on-page work is that you forget about the days of Keyword Density and start looking to bolster the content with more related terms, phrases and even concepts. If I were targeting a term such as Picnic Basket, I would look at some related phrases and concepts to use on the target page and any pages leading to it (as per the site structure). Some examples for page/content creation could be:
vintage picnic basket, metal picnic basket, wicker picnic basket, new picnic basket, picnic basket set, find picnic gift baskets, buy antique picnic baskets, picnic basket for home and garden, picnic basket sets, wholesale picnic basket, gourmet picnic basket, picnic accessories, picnic adventure, and so on.
I would give the content creation team (or article marketing team) a mega list of related concepts and themes to work with for content creation and distribution.
Further to that would be other concepts related to actual brand names, geographic location, activities and so on, depending on how granular I wanted to take it. The main idea is to build up related concepts to the core term that we are focussing on with the SEO program.
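When handing a related-phrase list to the content team, it can help to check which phrases a draft already covers. This minimal sketch does whole-word phrase matching on normalised text; the draft sentence is hypothetical, and the code only measures coverage without suggesting any target density, in keeping with the advice above to move away from keyword density.

```python
import re

def normalise(text):
    """Lowercase text and reduce it to single-space-separated word tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def phrase_coverage(page_text, related_phrases):
    """Split related phrases into those found on the page and those still missing.

    Matching is on whole words, so 'picnic basket set' does not match 'picnic basket sets'.
    """
    padded = f" {normalise(page_text)} "
    found, missing = [], []
    for phrase in related_phrases:
        if f" {normalise(phrase)} " in padded:
            found.append(phrase)
        else:
            missing.append(phrase)
    return found, missing

# A few phrases from the list above, checked against a hypothetical draft.
related = ["wicker picnic basket", "picnic basket set", "gourmet picnic basket", "picnic accessories"]
draft = "Our wicker picnic basket comes as a picnic basket set with gourmet treats."
found, missing = phrase_coverage(draft, related)
print("covered:", found)
print("missing:", missing)
```

Here the draft covers "wicker picnic basket" and "picnic basket set", leaving "gourmet picnic basket" and "picnic accessories" as candidates for the next revision.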
Titles: also be sure to consider the above concepts when creating the TITLE tags for the target page and any others in the relevance chain, as per the site structure and page naming conventions above. I like to describe it as theme building, as that seems a more sensible way to think of it.
At this point I would also mention the inherent risks involved with duplicate content. As we established, it is another level of refinement toward greater relevance; it is a filter and not generally a penalty. It may as well be one, though, if you are the one who gets filtered out of the search results. There are two easily avoidable areas for this:
- Content Distribution: try to avoid using the same content for distribution that you use on your site. The main risk is that a site with more authority than yours (and equal or greater relevance) picks up your content and ultimately ranks it over you for a given term.
- Product Descriptions: if you are using the copy-and-paste method from your distributor for your product descriptions, the mother ship, or an older distributor using the same descriptions, will often supersede you. Create your own (better targeted) descriptions.
Link Profile Development: when working on your link building, be sure to evaluate the site, the page and the link text whenever possible. Certainly, many times we don't have total control over who links to us, where on their site they link from and what link text they use, but a concerted effort at building relevancy where possible will pay off in the end.
- Site themes: try to use not only websites that are perfectly targeted for the term you're after but also sites in related vertical markets. If I am promoting a website about German Shepherds, I would not simply look at sites about German Shepherds but also at sites about Dogs, Pets, Pet Supplies, Dog Videos and so on. Through my backlinks, the search engine begins to establish that my page is not about Germany, nor is it about shepherds; it is about the canine variety.
- Page concept: as above, try to vary the conceptually related themes of the pages your backlinks come from. Obviously, links from pages about real estate aren't going to be worth much in the targeting process. Further to that, the same theme concepts from the site themes point also apply. A certain amount of variance should actually help establish a deeper relevance for your page.
- Link Text: once again, I would want a certain number of links with my actual target term, but I would further develop the relevance factors of the link profile by having some semantically and conceptually related backlinks as well.
- Reciprocal Links: there has also been much ado over the last year about non-relevant reciprocal links. Some search engines frown upon them while others simply devalue them, so these days I would always ensure there is relevance in any reciprocal linking you participate in. In my opinion it is also better for the visitors to your site, since links going out from your site to non-relevant places cause confusion.
Much like keyword stuffing and keyword density, hammering away at the same link text on a zillion sites regardless of relevance will not produce efficient results. You will need far fewer backlinks than usual to get ranked for a given term in your SEO program: by tightening up the overall relevance of the link profile, you save work on the total number of backlinks needed in most cases.
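To see how varied a link profile actually is, you could bucket the anchor texts from a backlink export into exact-match, related and other. The sketch below assumes you already have the anchor texts as plain strings and a hand-picked list of related terms (both hypothetical here). Related matching is a crude substring test, so "dog" would also match "hotdog", and the function only reports shares; this post does not prescribe any ideal ratio.

```python
from collections import Counter

def anchor_profile(anchors, target, related_terms):
    """Return the share of backlink anchor texts that are exact, related or other.

    anchors: anchor-text strings from a (hypothetical) backlink export.
    target: the exact target term; related_terms: semantically related terms.
    An anchor counts as 'related' if it contains any related term as a substring.
    """
    target = target.lower()
    related = [t.lower() for t in related_terms]
    counts = Counter()
    for text in anchors:
        t = text.lower().strip()
        if t == target:
            counts["exact"] += 1
        elif any(r in t for r in related):
            counts["related"] += 1
        else:
            counts["other"] += 1
    total = len(anchors) or 1  # avoid dividing by zero for an empty export
    return {k: counts[k] / total for k in ("exact", "related", "other")}

anchors = ["german shepherd", "German Shepherd", "german shepherd puppies",
           "dog training tips", "click here"]
profile = anchor_profile(anchors, "german shepherd", ["puppies", "dog", "canine"])
print(profile)  # {'exact': 0.4, 'related': 0.4, 'other': 0.2}
```

A profile dominated by the "exact" bucket is the "same link text on a zillion sites" pattern described above; growing the "related" share is one way to track the semantic variety the Link Text point recommends.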
Just Rolling Along
For anyone who spends any time whatsoever studying search, one thing is clear: there are many layers to the indexing/retrieval and ranking/re-ranking process. There is certainly a common element that runs through many of the methodologies: relevance. Search engines are constantly striving to get better at it. It stands to reason that by paying attention to the concepts of achieving relevance, we can build our websites in such a way as to maximize the effect of our efforts. Understanding relevance in SEO can be a great addition to the toolbox.
I have read in my travels that the concepts I have related here, the link relevancy in particular, are a pile of patootie. There are those who don't believe in some type of semantic (no, not the LSI bandwagon) or phrase analysis in search engines. What they have neglected to understand is that there are other scoring mechanisms at play. No one said any single factor was the only or primary aspect of ranking (we all know links are still huge, right?). That view is short-sighted. I have found that when you use all of these concepts in a strong profile of relevance, it most certainly is effective.
As always, there is no magic bullet. Each search engine is going to give more weight to certain factors in the scoring (ranking) process. The main point is to understand some of the various methods search folks use to try to leverage relevance; through this you will establish good habits in content/site creation and even in linking programs.
ADDED Nov. 19, 2007: A good read I came upon is the Economics Of Managing Relevance.