SEO Blog - Internet marketing news and views  

How to Identify the Best Keywords for Your Niche

Written by David Harry   
Monday, 21 September 2009 12:14

(the following is a post from WordStream's - Tom Demers)

One of the most difficult aspects of marketing online effectively is getting started. Whether you're starting out as an affiliate, are new to an industry, or are just launching a product, it's extremely difficult both to predict the impact and effectiveness of a new campaign and to determine how best to attack a niche.

This post will aim to help a bit with both.

First, some required reading for those trying to get a new site off the ground, determine which affiliate niche is best for them, or launching a new product line or section of their site:

  • The Affiliate Newbies Guide to Finding Niches - Great post by Gab on identifying niches for affiliate marketing (as with a lot of strong content pieces, you can apply a lot of the theory and methodology to other things: like determining which "niche within a niche" is best to target).
  • How to Predict the ROI on SEO - More from Gab, this time from his own blog. This is a great post on determining what sort of return you can expect from an SEO endeavor.
  • What is the Value of a Number 1 Google Ranking - Outstanding in-depth article on evaluating the value of a number one ranking by SEO Book's Aaron Wall. This is another example of something that can be tweaked and pushed out into uses other than just evaluating a number one ranking.

These are some great resources for identifying which niche you would want to spend your time in, and for evaluating what you'd get from that niche. But what about once you've chosen a niche? What's next?

Read more... [How to Identify the Best Keywords for Your Niche]

What you need to know about phrase based optimization

Written by David Harry   
Thursday, 27 August 2009 14:24

We’ve almost got the whole collection…woo hoo!

Ok sure, there are those that called me a total SOSG for getting excited about the patent awarded to Google yesterday, but there is goodness in it for every SEO, really. Granted, it may be a bit geeky, but it is still something that needs more consideration in the SEO space... You see, my friends, it has always been an odd thing that people in the business can go on and on about Google and LSI (total DUH…) when I have seen nothing in Google's research papers or patents that relates to it…

Understanding Phrase Based IR

On the other hand, we know of at least nine patents on another semantic analysis approach, namely Phrase Based IR.


Read more... [What you need to know about phrase based optimization]

Optimize for the niche

Written by David Harry   
Wednesday, 26 August 2009 13:28

Why generalizations in SEO targeting just don't work

(the following is a guest post from - Jon Stephenson)


"The essence of focusing on a niche is finding the place the customer is going to be when they want or need your product, and being there to offer it to them."

The industry of SEO has grown and changed in many ways over the last few years, and it's only continuing to change at a faster and faster pace. One of the areas where SEO has grown and changed the most is in the niche vertical markets. No longer is it good enough to optimize for "shoes" (244,000,000 results); you now must optimize for "women's Nike running shoes" (829,000 results). Another example is my current SEO target of "blinds" (17,800,000 results) vs. "faux wood blinds" (828,000 results). This isn't a bad thing: it allows users to get to the relevant information they want faster, and allows the optimizer to build better content for pages that focus on a specific topic.

Niche targeting for SEO

We, the search marketers, have trained our customers to be more specific in how they search. When I started in search I could optimize a page for about 10 different keywords and be happy; it would rank well for all 10 keywords with no problem. Now I don't optimize for anything less than a three-word phrase, and at most two very closely related phrases per page. The landing page that I work to have indexed gets right to the point and leads the user straight to the content they are searching for.

I then work to guide them quickly to the goal for that page. If I don't make it fast and obvious, even ranking in the top spot won't bring in the conversions I want; users will just bounce back to Google and find another listing.


Read more... [Optimize for the niche]

Are the search engines spying on SEOs?

Written by David Harry   
Monday, 10 August 2009 09:37

Finding link spam via search marketing Forums

OK, sure… the title is a bit egregious, but in many ways so is the patent that came out from the folks at Microsoft last week. As I was perusing my feed reader on Thursday I noticed a patent that DEFINITELY caught my attention (and I even laughed a bunch too..);

Forum Mining for Suspicious Link Spam Sites Detection - Microsoft - Filed: Feb. 06, 2008 – Assigned: Aug. 06, 2009

Enemy sent by Gates

Read more... [Are the search engines spying on SEOs?]

Context-Aware Query Classification

Written by Terry Van Horne   
Tuesday, 04 August 2009 08:42

Modeling Google's Florida Update and Universal Search

(the following is a guest post from Terry Van Horne)

A number of papers have come out of the SIGIR conference, and for me one of the most interesting is the paper from the Microsoft research team using Conditional Random Fields (CRF) and contextual analysis (pdf) to determine query classification. The paper discusses some interesting uses for click data and query "labeling". I found this paper especially intriguing in that as I read through it I kept thinking I was reading a Google paper about Universal Search.

Once I do an overview of the paper, I'll write a bit about what I believe is similar and in use at Google. I will base my conjecture... er... thesis on the belief that the Florida update added a taxonomy to the algorithm for the purpose of classifying queries. Further, the current Universal Search algorithm is just the next step in refining those query classifications, and it was done in a similar fashion to what the Microsoft paper describes. These are observations as indicated by the current blended/Universal Search SERPs.

Universal Search

I'm going to first go through the highlights of the paper and try to give you the elevator version. I'm not as interested in formulas or IR geek stuff because, well, it doesn't really help me do what I do. The thing to keep in mind about papers is that they are research: they indicate problems that someone is trying to solve, and the methods that can be used to solve them. As a webmaster I want to be part of that solution by aiding the search engines when I'm building a site. Note aiding, not manipulating... there's a difference.

The Microsoft team chose some interesting features of CRF and there is no use in me explaining when they summed it up pretty well...

To answer these questions, we propose to use the Conditional Random Field model (CRF for short) to help incorporate the search context information. We have several motivations for using this model. First, CRF is a sequential learning model which is particularly suitable for capturing the context information of queries. Second, the CRF model does not need any prior knowledge for the type of conditional distribution.

Finally, compared with Hidden Markov Models, the CRF model is more flexible to incorporate richer features, such as the knowledge of an external Web directory.

...the advantages of using CRF

"When we use the CRF to model a search context, one of the most important parts is to choose the effective feature functions. In this section, we introduce the features used for building a CRF model of the search context for QC. In general, the features can be divided into two categories. The features that do not rely on the context information are called local features, and those that are dependent on context information are called contextual features."

Three local features are query, pseudo feedback and implicit feedback. I'll try and describe the problem that the research is trying to solve in layman's terms.


[Image: Taxonomies]


The query

The query is the starting point for refining the association between query term and classification. The reason the query needs the contextual element was summed up well in the paper:

The available training data are usually with limited size and could not cover a sufficient set of query terms that are useful for reflecting the association between queries and category labels.

Long story short, there are not enough query terms associated with category labels to be useful, so the association has to be refined by adding pseudo and implicit feedback.



Pseudo feedback

Pseudo feedback is basically using a taxonomy from the web, for example ODP or Yahoo!, where the directory category results are mapped to a category in a target taxonomy and a Confidence score calculated. I will admit the math behind the Confidence score took me a bit to understand, so here is an excerpt that, IMO, explains it pretty well:

Finally, we calculate a general label confidence score: GConf(ct, qt) = M(ct, qt) / M,
where M(ct, qt) means the number of returned related search results of qt whose category labels are ct after mapping. Intuitively, the GConf score reflects the confidence that qt is labeled as ct gained from pseudo feedback; the larger the score, the higher the confidence.

In a nutshell, a confidence score is attached to the query term based on how many search results matched the category term from the outside/target taxonomy. IMO, this does not have to be a web directory; in fact, for some queries an offline taxonomy might be a better choice, for example the occupational/professional categories used by human resource organizations.
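To make the arithmetic concrete, here is a minimal Python sketch of the GConf score from the excerpt above. The query, the label set, and the counts are invented illustration data, not from the paper:

```python
# Minimal sketch of the GConf pseudo-feedback score: the fraction of
# the M returned directory results whose mapped category label matches
# the candidate label ct. Labels below are made-up illustration data.

def gconf(mapped_labels, target_label):
    """GConf(ct, qt) = M(ct, qt) / M over the mapped result labels."""
    m = len(mapped_labels)
    if m == 0:
        return 0.0
    matches = sum(1 for label in mapped_labels if label == target_label)
    return matches / m

# e.g. 10 directory results for a query, 7 of which map to "Sports"
labels = ["Sports"] * 7 + ["Shopping"] * 3
print(gconf(labels, "Sports"))  # 0.7
```

The larger the fraction of results mapping to a label, the higher the confidence that the query belongs to it, exactly the intuition in the quoted passage.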



Implicit (user) Feedback:

An interesting aspect of this research was the use of logs from the web directory to refine the confidence score with user data. Using logs reduced the chance of spam and makes this useful for a "compiled confidence score". Where user data like this gets tricky is that some people will inflate clicks once it becomes common knowledge among marketers.

Direct Hit tried to use click data many years back; let's just say there were/are lots of rumors as to who may have been gaming it with scripting. That said, for the purposes of smoothing the category terms this seems a reasonable solution. For those with the low forehead: this stuff is basically compiled, so clicking on links in ODP is not going to affect your Google rankings! There, I just saved some forum mod a few hours of time.

One debate within the SEO community is the use of bounce rate data in rankings. Take the scenario above, where "historical click data" is used to smooth the category terms: I can see how especially "seasonal" bounce rates could appear to be affecting the rankings of a group of similar sites, but I would say it is a myth that individual site bounce rates are the cause. IMO, in the scenario outlined above you would see similar sites drop at the same time, as they would be in the same category term and affected equally.

IMO, Google Universal Search has this characteristic. To do it in the way it is talked about in the community would be, well... very expensive in computing terms. Just an opinion, and not worth debating.

The third element of the local features is to build a Vector Space Model. Here's a quote from the paper on its purpose. This is extremely boring stuff, but obviously interesting to any IR professional:

(...) we build a Vector Space Model (VSM) [25] for each category from its document collection and make the cosine similarity between the term vector of ct and the term vector of ut as CConf(ct, ut). The snippets of the web pages are used for generating the term vectors....If a user does not click on any URL for qt, or qt is the current query to be classified, this score cannot be calculated.

In layman's terms, they do some fancy math and build a model that has one flaw: if the user didn't click any sites in the Web results, or the query term is the current query being classified, the final confidence score cannot be calculated.
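As a rough illustration of the cosine-similarity step in the quote, here is a toy sketch. It uses plain term counts as weights, which is a simplification; the paper's exact weighting and snippet handling are not reproduced, and the example terms are invented:

```python
# Toy illustration of the CConf idea: cosine similarity between a
# category's term vector and the term vector built from a clicked
# result's snippet. Term weights here are raw counts for simplicity.
import math
from collections import Counter

def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(vec_a[t] * vec_b[t] for t in set(vec_a) & set(vec_b))
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # the "no clicks" case: the score cannot be computed
    return dot / (norm_a * norm_b)

category_vec = Counter("running shoes trail marathon".split())
snippet_vec = Counter("trail running shoes review".split())
print(cosine(category_vec, snippet_vec))  # 0.75
```

The zero-norm guard mirrors the flaw the paper notes: with no clicked URL there is no snippet vector, so no similarity score can be produced.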


The last part of the experiment is implementing the contextual features. Below are the quotes that, I felt, best sum them up.

Since there is no existing approach for query classification that takes into account the context information, we design a naive context-aware approach as the second baseline to further evaluate the modeling power of CRF in this problem.

Yada... Yada... no one has used the context of the search to smooth the query classification.

To use the context information, we consider some features that can reflect the association between adjacent category labels.

and further...

To reduce the bias of training data, besides considering the feature of direct association between adjacent labels, we also consider the structure of the taxonomy. Intuitively, the association between two sibling categories is stronger than that of two non-sibling categories.

and finally...

...the bridging classifier introduced by Shen et al in [27]. The idea of this approach is training a classifier on an intermediate taxonomy and then bridging the queries and the target taxonomy in the online step of QC. Experiments in [27] show this approach outperforms the winning approach in KDD Cup'05.

Let's take the query classifications we compiled, compare them to the search query, and refine the query classification with each new search query. To some degree, this is real-time personalization.
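The sibling intuition quoted above ("the association between two sibling categories is stronger than that of two non-sibling categories") can be sketched in a few lines. The slash-delimited paths and the weight values are a hypothetical encoding for illustration, not the paper's actual feature weights:

```python
# Toy sketch of the taxonomy-structure contextual feature: category
# labels that share a parent (siblings) get a stronger association
# than non-siblings. Paths and weights are invented for illustration.

def association_bonus(path_a, path_b):
    """Hypothetical association weight between two category paths."""
    parent_a = path_a.rsplit("/", 1)[0]
    parent_b = path_b.rsplit("/", 1)[0]
    if path_a == path_b:
        return 1.0   # same category label on adjacent queries
    if parent_a == parent_b:
        return 0.5   # siblings: stronger association
    return 0.1       # non-siblings: weaker association

print(association_bonus("Sports/Football", "Sports/Tennis"))  # 0.5
```

In a real CRF these would be feature functions over adjacent labels in the session, with weights learned from training data rather than hand-set.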


The Microsoft team implemented a method that surprised me: they used three "human labelers" to test whether search context affected the ability to classify queries.

A query's final label is voted by the three labelers. Since each query is associated with context information (except for the beginning queries of sessions) and real user clicks which can help determine the meaning or intent of the query, the consistency among the labelers is quite high. For more than 90% queries, the three labelers give the same labels. This is very different from the general query classification problem.
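The voting step from the excerpt is simple enough to sketch directly; the example labels below are invented:

```python
# Sketch of the three-labeler majority vote described in the excerpt:
# each query's final label is the one most labelers agreed on.
from collections import Counter

def majority_label(labels):
    """Return the label given by the most labelers."""
    return Counter(labels).most_common(1)[0][0]

print(majority_label(["Travel", "Travel", "Shopping"]))  # Travel
```

With three labelers a two-of-three agreement always produces a winner, which is why the paper can report agreement rates rather than tie-breaking rules.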


Outcome: This stuff works!


One of the benefits of having some interest in the real "science of SEO", or IR, is not only that it fills the time I'd otherwise spend watching the idiot box; it gives me insight into what I should be doing as an SEO. If Universal Search isn't screaming "I like video, pictures, news, local locations and product feeds" at you, then there's a seat waiting for you at the "home for washed up SEOs".

In fact, text relevancy has less value than at any time I can remember. The number of blue slots on the page for text/link manipulators is limited and has decreased by over 40% since I semi-retired in 2004. That in turn means links are less valuable, and real marketing is where the game is being played. Mike Grehan has been alluding to this for over a year.

Product/shopping, News/blogs/articles, Local Search/Maps, video and pictures are a lot more about managing content development and marketing strategy than manipulating Search engine algos and chasing links. IMO, if you gotta' chase/beg links... you likely don't deserve them. My bias is out there for you all to know and chide me, so... let the fun begin.

So first, let me warn you: this is pure conjecture. I'm not an engineer at Google, and at the very most, if I'm right, it was part luck and part watching Google and SE SERPs for many years. The paper I attempted to walk you through is, IMO, a model of the Florida update and possibly of how Universal/blended search evolved.

User Sessions


Evolution of Universal Search

To some degree there are strong ties between the ODP (owned by AOL) and Google. It has long been discussed in some circles that the Florida update added a taxonomy, or topic/classification, to queries on Google. I saw it in anomalies, which, when you find them, are like having a red flag on the results. I knew Google was updating the index whenever these showed up in one of my "legacy SERPs".

A legacy SERP is one I have watched forever; I know every site in it, and when new ones show up they get closely reviewed, because that is always a clue as to what is working content-wise, or what the "in vogue spam method" is. To be truthful, sometimes it's a fine line between the two.

Query classification, or QC, is an interesting concept in that doing it algorithmically is difficult. Kind of like hidden text: it seems easy enough to stop, but at what cost? Too many false positives.

Query classification makes it much harder to obtain top positions just through link text, because all the keywords on the page are mapped to a classification and a classification to a query; hence, it becomes pretty tough for pages to rank when they don't have the term on the page. Sure, it happens, but are they terms that matter? It also takes a massive amount of links to do it. Feasible, yes; plausible... sure, if you got shit for brains and are bored stiff!

[Image: Classifications]


The Human Touch

The human labeling in the Microsoft research was one piece of the experiment that I didn't see the point of at first. Then I realized it was there to smooth the web/target taxonomy query classifications.

The methodology was also interesting in that labelers would act like a person engaged in searching. The research used old Excite data; if memory serves me right, Excite used to make data available for research on its site.

Google was on a hiring frenzy for a while, and there were questions about what all those people were doing... labeling? Was Google employing labelers? The other day I sat in on a seminar with my old amigo Marshall D Simmonds, Greg Boser, and Todd Malicoat. They came to much the same conclusion as I had: thousands of people were QC-labeling Google SERPs. Where the Google method differs, I think, is that they also wanted a variety of media that can't really be ranked algorithmically, so labelers were also slotting media.

Refining the QC provided by the web taxonomy is, IMO, what Google Universal/Blended/Personalized Search is all about. When I look at a Google SERP, there are not ten links; there are ten slots, sometimes more. As a marketer and SEO, I now have to consider media type and brand message when I am planning a marketing strategy for an audience, keyword term, or promotion I want to position in the results. SEO is as much about PR and branding as it is about links and on-page optimizing. That will be reflected in the "socialization of SERPs" by social media.

Other resources;

Query type classification for web document retrieval
A unified and discriminative model for query refinement
Hourly analysis of a very large topically categorized web query log
Automatic Web Query Classification Using Labeled and Unlabeled Training Data
Improving Automatic Query Classification via Semi-supervised Learning
Context-Aware Query Suggestion by Mining Click-Through and Session Data
Towards Context-Aware Search by Learning A Very Large Variable Length Hidden Markov Model from Search Logs


About the author: Terry is an old school SEO geek who works out of International Website Builders and the founder of - You can also hook up with him on Twitter. I'd like to REALLY send a huge thanks out to Terry for not only producing a search geek worthy post, but also for being a smart and open minded guy. We didn't meet under the best of circumstances, but our love of search marketing pulled us together in the end - I look forward to many years of great chats together!

