Another chat with Bill Slawski
Where is the future of Search headed? Will algorithmic, link-based search engines continue to rule the roost? Or will the Web 2.0 world produce a social search engine that takes the world by storm? What about a mix of both?
These are some of the questions that have been bandied about with a few of my friends in the SEO and SMM worlds over the last few months. As you might imagine, the views are as varied as the individuals I talk to. Many of my mates in the social media marketing world tend to believe that people are the answer to search relevance and the death of spam. I personally can't see a straight social search approach ultimately providing the most relevant results. Granted, it is hard to say, as there have been no widely accepted or widely used social search engines, and the premise requires a larger data set than we have seen so far. So for me the jury is still out.
For me, search engines are always taking in implied or passive data, which in many ways is a form of social search engineering (behavioural targeting at the very least).
Talking to the experts
I don't really do a lot of interviews here, but I do enjoy bringing conversations public, and over the next few weeks I shall be asking a few friends to talk about this with me here on the Trail. First up: technical search geek and fellow algo-holic Bill Slawski (we last chatted here).
Meta search madness
Dave; I notice that many search engines positioning themselves as social are actually incorporating some type of meta-search element. While I have been a fan of some meta-search engines over the years, I don't find them to be ultimately a social search engine but an amalgam. Have you noticed this? Are you a fan of meta/alternate search engines?
Bill; I'm not a fan of meta search engines, but I do find social search engines interesting.
What we consider traditional web search, with a reliance on links indicating the popularity of pages, might be more social than we often give it credit for, in that it focuses upon the social element of linking. That view doesn't even consider the notion that today's major search engines may be considering the browsing and searching behaviours of people who use the Web.
There are a lot of tradeoffs in what a search engine attempts to do when it returns results based upon a query from a searcher. These are some examples:
Precision vs. Recall -- A search engine should return as many documents as it can in response to a search, while also returning the most precise results. The more pages returned in a set of search results, the less precise and relevant those results become. A search engine will try to rank the most precise results first.
Relevance vs. Importance -- The most relevant result for a query might not be the most important one. For example, a single page with just the query terms on it might be the most relevant document a search engine could return to a searcher, but chances are that document wouldn't be the most important one it should show. You may see the terms query dependent and query independent used to describe ranking factors. A query dependent ranking factor is one that is relevant to the terms being searched for, while a query independent ranking factor attempts to look at the quality and importance of a potential search result. A search engine will try to return results to a searcher that are a balance of relevant and important.
Matching keywords vs. Matching intent -- When someone performs a search, they usually have some task in mind that they want to accomplish, whether it's finding information, taking some kind of action, or arriving at some destination. But many words have more than one meaning, and instead of showing a searcher the pages that rank highest in terms of both relevance and importance, a search engine might make efforts to provide a diversity of results, acknowledging that a searcher might be looking for a page that doesn't use the most popular meanings of the terms in their query.
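The precision and recall tradeoff described above can be made concrete with a small calculation. This is only an illustrative sketch: the document ids, the ranked list, and the set of relevant pages are made-up data, and `precision_recall` is a hypothetical helper, not something any search engine exposes.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a collection of returned document ids."""
    returned_set = set(returned)
    relevant_set = set(relevant)
    hits = len(returned_set & relevant_set)
    # Precision: what share of the returned results are relevant.
    precision = hits / len(returned_set) if returned_set else 0.0
    # Recall: what share of all relevant documents were returned.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical data: four relevant pages and one engine's ranked results.
relevant = {"a", "b", "c", "d"}
ranked = ["a", "b", "x", "c", "y", "z", "d", "w"]

for k in (2, 4, 8):
    p, r = precision_recall(ranked[:k], relevant)
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```

With this data, showing more results raises recall (0.50, 0.75, 1.00) while precision falls (1.00, 0.75, 0.50), which is why ranking the most precise results first matters.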
Meta search engines attempt to blend results from more than one search engine, with their own unique algorithms that look to give extra weight to results that appear in more than one source. But they do this without understanding or probing more deeply into the balances and tradeoffs that the other search engines incorporate into the results that they display to searchers.
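As a rough illustration of the blending idea, here is a minimal sketch of one way a meta search engine might merge ranked lists and reward pages that appear in more than one source. The scoring rule (reciprocal of position plus a flat `overlap_bonus`) is an assumption chosen for simplicity, not the algorithm of any particular meta search engine, and the page names are made up.

```python
from collections import defaultdict


def blend(result_lists, overlap_bonus=0.5):
    """Merge ranked lists; pages found by several engines get extra weight."""
    scores = defaultdict(float)
    sources = defaultdict(int)
    for results in result_lists:
        for position, url in enumerate(results):
            scores[url] += 1.0 / (position + 1)  # higher rank, bigger score
            sources[url] += 1
    for url in scores:
        # Assumed rule: a flat bonus for each additional engine listing the page.
        scores[url] += overlap_bonus * (sources[url] - 1)
    # Sort by score (descending), breaking ties alphabetically.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical result lists from three engines.
engine_a = ["p1", "p2", "p3"]
engine_b = ["p3", "p1", "p4"]
engine_c = ["p3", "p5", "p1"]

merged = blend([engine_a, engine_b, engine_c])
print(merged)  # ['p3', 'p1', 'p2', 'p5', 'p4']
```

Note that the sketch only sees positions. It has no view of why each engine ranked a page where it did, which is the limitation Bill points to.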
What is it? Defining a social search engine
Dave; Based on that - how would you define a social search engine and have you played with any you liked? I mean, everyone's fav Wiki says;
A social search engine is a type of search engine that determines the relevance of search results by considering the interactions or contributions of users.
Social search takes many forms, ranging from simple shared bookmarks or tagging of content with descriptive labels to more sophisticated approaches that combine human intelligence with computer algorithms.
This does imply that the end users merely have an effect on the outcome (search results) and that algorithmic methods can also play a role. What to your mind are the makings of a social search engine?
Bill; A social search engine is one which uses people instead of programs to discover new pages or objects (i.e., audio, animations, images, videos) on the web, and associate them with meanings through labels, tags, annotations, stated interests of people choosing those pages and objects, and other indications of meaning or relevance. Importance might be decided by things such as numbers of votes and comments, reviews and ratings, frequency of votes and annotations.
So, instead of crawling programs, we have people finding pages and objects on the Web. Instead of query dependent ranking factors being used to determine relevance, we have the associations that people make through such things as tags and annotations with the things they find. Instead of query independent ranking factors determining how important results are, we look at the behaviour of people, and how important they might find what is discovered.
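To make that substitution concrete, here is a toy sketch in which tags applied by people stand in for query dependent relevance and votes stand in for query independent importance. Everything in it is hypothetical: the `Item` structure, the URLs, and especially the scoring formula, which is just one simple way to combine the two signals.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    url: str
    tags: dict = field(default_factory=dict)  # tag -> number of people who applied it
    votes: int = 0


def social_search(query, items):
    """Rank items: tag matches give relevance, votes give importance."""
    terms = query.lower().split()
    ranked = []
    for item in items:
        relevance = sum(item.tags.get(term, 0) for term in terms)
        if relevance == 0:
            continue  # nobody tagged this item with a query term
        # Assumed combination: relevance scaled by (1 + votes).
        ranked.append((relevance * (1 + item.votes), item.url))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in ranked]


# Hypothetical items discovered and tagged by people.
items = [
    Item("example.com/sunset-photo", {"sunset": 5, "beach": 2}, votes=10),
    Item("example.com/beach-guide", {"beach": 8, "travel": 3}, votes=4),
    Item("example.com/recipes", {"cooking": 6}, votes=50),
]

print(social_search("beach sunset", items))
print(social_search("beach", items))
```

The heavily voted recipes page never appears for either query because no one tagged it with a matching term, which shows how dependent such a system is on the tags people happen to choose.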
Many whitepapers that discuss how search engines work use the term nodes to describe pages or objects found on the Web. They also refer to the links between pages as edges or ties. If you look at the Wikipedia entry for social networks, you'll see people being referred to as nodes and the connections between people as ties.
One potential benefit of a social network is that people who share similar interests in some areas may share similar interests in other related areas, much like the way that a book seller may recommend some books to some people looking at specific books on the site based upon their past history of viewing, reviewing, rating and purchasing, and what others with similar interests might have looked at when looking at those books.
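The book seller comparison above is essentially a recommendation based on overlapping interests. Below is a minimal sketch of that idea using Jaccard similarity between users' sets of liked items. The user names, book ids, and the choice of Jaccard similarity are all assumptions for illustration; real recommendation systems use far richer signals.

```python
def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes, top_n=3):
    """Suggest items liked by users whose tastes overlap with `user`."""
    mine = likes[user]
    scores = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        similarity = jaccard(mine, theirs)
        if similarity == 0:
            continue  # no shared interests, so no evidence to go on
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:top_n]


# Hypothetical viewing or purchase history.
likes = {
    "ann": {"book_a", "book_b", "book_c"},
    "bob": {"book_a", "book_b", "book_d"},
    "cat": {"book_c", "book_e"},
    "dan": {"book_f"},
}

print(recommend("ann", likes))  # ['book_d', 'book_e']
```

Bob shares two books with Ann, so his other book ranks first. Dan shares nothing with her, so his book is never suggested.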
A social network also can provide people with the opportunity to find other people who may share similar interests, or different perspectives that might be interesting, and to build relationships with people.
A social search engine may include automated ways of discovering information on the web, and ranking pages, but the more automated the search engine, the less social it may become.
The social search engines that I like most are Stumbleupon and Flickr.
The drawbacks of social search
Dave; what immediate issues do you see with social search engines? At least ones that are largely, or entirely, based on user input?
Off the top of my head I came up with:
- Algo lite; I have found that many of these types of search engines are algorithmically light, not only in their rating systems but also in their ability to judge the relevance of an entry to search queries beyond its individual tags and so forth. In my playing around, I found too heavy a dependence on tags and votes, and that diversity over a range of terms for a page/entry was weak. A better combination of algos and social seems needed.
- Ease of use; the larger masses tend to like things that are simple and easy to use. To me, most people don't want an active role in search. This means that past the gee-whiz phase, the bulk of the results would come from the few editing for the many. The power users would begin to give a skewed editorial slant to the process.
- Spam-ability; once again nothing stops web spammers from infesting them and at a certain point the above points come into play once more as people are turned off by having to be an active part of the process. They want the search engine to do the quality control.
- Gangs of NJ; obviously we also have the problem of voting gangs. Social search engines, should they ever gain popularity, would be open to all kinds of shenanigans that would likely need to be addressed editorially as well as algorithmically.
What are the drawbacks in your mind?
Bill; I think you've defined some of the major drawbacks of social search engines quite well, so I'm going to keep this answer short.
When people tag and annotate and comment and otherwise mark pages and videos and images, they often do so in a manner that indicates some personal relationship between them and the page or object that they are marking. For example, the tag "toread" doesn't describe the thing being tagged, but rather a future action that the tagger may take.
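One naive way a tag-driven engine might cope with this is to separate personal tags from descriptive ones before using them for relevance. The sketch below assumes a hand-made stoplist. Apart from "toread", which comes from the example above, the listed tags are hypothetical, and a real system would more likely need to learn such a list from usage patterns.

```python
# Hypothetical stoplist of tags that describe the tagger's plans, not the content.
PERSONAL_TAGS = {"toread", "todo", "later", "mine"}


def descriptive_tags(tags):
    """Keep only tags that plausibly describe the tagged page or object."""
    return [tag for tag in tags if tag.lower() not in PERSONAL_TAGS]


print(descriptive_tags(["python", "toread", "tutorial", "Later"]))
# ['python', 'tutorial']
```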
Social ranking signals
Dave; Ok, so what types of signals could one have for measuring social search outside of actual user interactions; do any kind of buzz metrics hold any value, if only as a weak signal in the rankings? Some have suggested to me that various social space benchmarks could make good ranking signals. Do you feel there are any passive signals worth valuing?
Bill; It's really difficult to look at small pieces of a social network to learn information about the whole. If you have access to data from a network in a form that allows you to visualize it in meaningful ways, you might be able to learn about relationships between users, trends in interests, and evolution of the network over time.
A paper from a couple of years ago from Ben Shneiderman describes how data visualization can be used within an enterprise computing network in Discovering Business Intelligence Using Treemap Visualizations. See, for instance, Figure 7, which shows the 100 most popular songs at iTunes, grouped by genre.
Digg is also experimenting with data visualization, and an MIT Technology Review article tells us that their ability to visualize data has influenced their interface, the diversity of topics that appear on the front page of Digg, and future feature development. With a Digg API available, a number of possible applications can be developed, like the Digg WordWeb.
It is possible to use a social network without data visualization to learn about people and their interests through a more human analysis, as described in Analytics - When What Becomes How. It's something that I recommend that a lot of people consider.
PageRank = popularity?
Dave; Back to the current world of search; if the core of many of the top engines' algorithms is gauging popularity/value through backlinks, does that really account for end users who are active web surfers, not webmasters? Does a PageRank, nodal random walk truly capture popularity? To me some form of analytics/performance metric could be considered.
Bill; One major idea behind PageRank is that it is a kind of peer review, based upon back links to pages, with links from more important pages carrying more weight than links from less important pages. The measure isn't so much one of popularity as it is an estimate of importance based upon the citations that you might see in academic journals. Since not all links are treated equally, it really isn't intended to be a measure of popularity.
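The point that not all links are treated equally can be seen in a small power-iteration sketch of the basic PageRank idea. This is a simplified textbook-style version, not Google's production algorithm. The damping value of 0.85, the handling of pages with no outgoing links, and the tiny link graph are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline from the random jump.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" each have exactly one inbound link,
# but "a" is linked from a well-linked hub and "b" from an unlinked page.
links = {
    "x1": ["hub"], "x2": ["hub"], "x3": ["hub"],
    "hub": ["a"],
    "lonely": ["b"],
    "a": [], "b": [],
}

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

A raw count of inbound links would score a and b identically. Here a ends up ahead because its single link comes from a more important page, which is the peer-review flavour Bill describes.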
One of the early PageRank papers, The PageRank Citation Ranking: Bringing Order to the Web, mentions some problems in using PageRank to estimate actual traffic to pages, in a section titled Estimating Web Traffic. At least one experiment was performed to see how well PageRank compared to actual usage by people, and the authors provide a couple of examples of where it doesn't match. They also mention that the relationship between PageRank and actual usage data is an area worth exploring further, and tell us that "It may be possible to use usage data as a start vector for PageRank, and then iterate PageRank a few times."
Google is collecting a large amount of data about how people interact with the Web, including browsing, searching, bookmarking pages, and other activities. It's possible that information could be used along with, or in place of, PageRank. The PageRank Citation paper also describes a personalized PageRank that could use information such as bookmarks and choice of homepage.
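A hedged sketch of the personalized idea: instead of the random jump landing on every page with equal probability, it is biased toward pages a particular user has shown interest in, such as bookmarks. The `preferences` weights, the page names, and the treatment of dangling pages below are assumptions made for illustration. Nothing here reflects how Google or Yahoo actually implement personalization.

```python
def personalized_pagerank(links, preferences, damping=0.85, iterations=50):
    """PageRank whose random jump is biased by per-page preference weights."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    total = sum(preferences.get(page, 0.0) for page in pages)
    if total <= 0:
        raise ValueError("preferences must give positive weight to some page")
    # Jump distribution: where the surfer lands when not following a link.
    jump = {page: preferences.get(page, 0.0) / total for page in pages}
    rank = dict(jump)  # start from the preference data as well
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) * jump[page] for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: redistribute according to the jump distribution.
                for target in pages:
                    new_rank[target] += damping * rank[page] * jump[target]
        rank = new_rank
    return rank


# Hypothetical graph in which "sports" and "cooking" are linked identically.
links = {
    "news": ["sports", "cooking"],
    "sports": ["news"],
    "cooking": ["news"],
}

everyone = personalized_pagerank(links, {"news": 1, "sports": 1, "cooking": 1})
sports_fan = personalized_pagerank(links, {"sports": 1})  # e.g. a bookmarked page

print(everyone)
print(sports_fan)
```

With uniform preferences the two pages score the same. With a single bookmark on the sports page, that page pulls ahead for this user even though the link graph is unchanged.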
Some type of analytics/performance metric could possibly capture popularity, and even importance. Using that performance information coupled with the backlink information used in PageRank (as described in the paper) might be a good approach because it combines information about how people link and publish to the web with information about how people travel around the Web.
(Dave also see recent coverage on Yahoo's Personalized PageRank and HarmonicRank patents)
Is Personalized Search, social?
Dave; Whether it is Google's personalized search or Yahoo's Personalized PageRank, do you see these as the algorithmic equivalent to a pure social search approach? Does it meet our definition? Does the user have to explicitly interact with a search engine to be giving a signal?
Bill; I think there's an element to personalized search from either search engine that has both beginning to act more like recommendation systems than the straightforward keyword matching that we often expect from a search engine. One aspect of personalization that we might see develop to go with these recommendations is implicit profiles developed around sites, queries, and groupings of interests of individuals. So, people who like baseball in Cincinnati, who are searching for queries that involve mortgage information, tend to be most interested in a certain grouping of sites that contain that information, based upon the previous user activity of other people who like baseball in Cincinnati.
That user information could include sites discovered by the searcher through their browsing, bookmarking, and searching activity, through their rankings of sites and businesses, the alerts they set up, the places they choose to leave comments upon, the information that they provide about their interests in social networks, and the relationship choices (friend, acquaintance, contact, don't know this person) they make (if available) in Orkut or Flickr or MyBlogLog or other places. There's no doubt that elements of how personalized search works can have a social side, though most information that is used is implicit information, not explicitly collected for purposes of receiving personalized search results.
(Dave also see recent patent on Trusted Networks)
Dave; Looking beyond the personalized approaches, search engines are increasingly better at behavioural targeting (search and ad serving) as well as user performance metrics. Do you feel a more passive data collection and application works better for understanding the social nature of the web and equally combating web spam?
Bill; I think at this point, one of the issues that search engines face is that they have so much data that it's difficult to figure out what to do with a lot of it. Rather than going into a lot of depth here, I think that people might be better served by watching a presentation from Yahoo's Usama Fayyad on the subject (skip past the first eight minutes of introduction). The kind of data mining that may make behavioral targeting more effective may also be helpful in identifying unusual behaviors and patterns that can help identify web spam.
Dave; Do you see a Google (or other engine) offering users a limited active role in the SERPs, such as a thumbs-up/down mechanism (beyond the toolbar one) on a personalized Google account for your network of friends? I know they were playing with it in the testing ground; do you think it is a viable approach for extending the search offering into social networks?
Bill; Stumbleupon might be ahead of the major commercial search engines in this area with their Search Reviews feature which shows you information about sites that your friends have given a positive vote in searches on Google, Yahoo, live.com, Ask, and AOL. Users still have to vote sites up and down in Stumbleupon for their friends to see the results at those search engines, but I like the way they present information about votes. I suspect that other search engines will try to emulate what Stumbleupon provides in some manner.
Look into the future
Dave; Ok, if you had that Yahoo future searching happening (not the Google MATE), where do you see this headed? It is certain not to go away as the momentum of the social sphere expands. Will some maverick figure out how to best combine all the elements, or will one of the major search players break the code to using passive and active personalizations within current algorithmic schemes?
Bill; The Yahoo Future Search that you describe is from a patent application from Yahoo involving a decision making tool that can take information from different sources, such as news, and look at future scheduled events, organize them, and provide more information to make decisions with.
It would be really useful in answering this question to have a tool like that, but there are things that it can't help us with, such as the possibility that Microsoft might initiate a hostile takeover proceeding to acquire Yahoo.
Out of all of the major commercial search engines, Yahoo probably has the largest collection of user generated content and user data, not only from information about searching and browsing history, but also through many of the services they provide, from Flickr to Yahoo Trip Planner, from Geocities to MyBlogLog, and much more. I don't think that Yahoo Everything lists everything that they offer.
Google has a lot of momentum moving forward with personalized search, and a series of around 50 related search patent filings (many still unpublished) that would incorporate personalization with desktop search in a number of ways. If they added some elements of social search to that, I wouldn't be surprised.
I suspect that we might be surprised by a startup being developed by a couple of students from someplace like a dorm room at Stanford, or perhaps by people who learned a lot about search while working at a Google or Yahoo.
Dave; Once again Bill, thanks for the chat and letting others play fly on the wall. Hope to do so again in the future.
As for you, my fair reader, be sure to get hooked up with Bill, my favorite SEO Guru. The next stop on this trail will likely be some perspective and insights into social search engines from some folks in the social media marketing community. Until then, stay tuned.
More Reading on Social and Alternate Search -
A Comparison; Social v. Algorithmic Search Alt Search Engines
Social Search Guide; 40 plus engines - Mashable
Are Social powered search engines the future of Search? Search Engine Journal
Social Search; the next step? Search Engine Roundtable
AfterVote tops Social Search Engines - ShoeMoney
Wikia looking to top Google - zDnet
Mahalo; Search Engine becomes social network - Mashable
Social Search; personalized results based on your network Technology Review
Alternative Search Engines;
Alternate Search Engines.com
Top 100 Alternate search engines Read Write Web
Top 10 Alternate Search engines - LifeHacker