SEO Blog - Internet marketing news and views  

Microsoft search patents 2008

Written by David Harry   
Friday, 19 December 2008 13:51

Cleaning out my closet part III

Whew… one last round of patents and that’s it for my overflow. Last but not least is Microsoft. While they may have a small market share and are often ignored by search geeks, they really do have some of the more interesting offerings patent-wise.

Among the more interesting tidbits here are patents relating to the use of Latent Semantic Analysis (LSA) and a few publications on dealing with web spam. Actually, my fav is the one describing a system to spam themselves for testing purposes (I wonder if they point that bad boy at Google?).

I must say, there are a few here we might just dig into a little deeper in the new year. As with Google and Yahoo, page segmentation seems like an area we should all look at more closely.

Microsoft patents 2008

And now I give you: MS patents of interest 2008

General Indexing and Retrieval

Ranking method using hyperlinks in blogs

A method for static ranking of web documents is disclosed. Search engines are typically configured such that search results having a higher PageRank® score are listed first. A modified scoring technique is provided whereby the score includes a reset vector that is biased toward web pages linked to blogs. This requires identifying web pages as either blogs or non-blogs.
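To make the reset vector idea a bit more concrete, here is a minimal sketch of my own (not code from the patent). It assumes "linked to blogs" means pages that blogs link to, that the blog/non-blog labels already exist, and that the bias is a simple multiplier. The names `biased_pagerank` and `blog_bias` are hypothetical.

```python
def biased_pagerank(links, blogs, damping=0.85, blog_bias=3.0, iterations=50):
    """Power-iteration PageRank with a non-uniform reset vector.

    links: dict mapping each page to the list of pages it links to.
    blogs: set of pages already classified as blogs (classifier not shown).
    blog_bias: assumed multiplier on the reset probability of pages
               that receive a link from a blog.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})

    # Build the reset vector: pages linked from blogs get extra weight.
    linked_from_blog = {t for src in blogs for t in links.get(src, [])}
    raw = {p: (blog_bias if p in linked_from_blog else 1.0) for p in pages}
    total = sum(raw.values())
    reset = {p: raw[p] / total for p in pages}

    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * reset[p] for p in pages}
        for src in pages:
            outs = links.get(src, [])
            if outs:
                share = damping * rank[src] / len(outs)
                for target in outs:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank according to the reset vector.
                for p in pages:
                    new_rank[p] += damping * rank[src] * reset[p]
        rank = new_rank
    return rank


example_links = {"blog": ["a"], "site": ["b"], "a": [], "b": []}
example_ranks = biased_pagerank(example_links, blogs={"blog"})
```

In this toy graph, pages "a" and "b" each have exactly one inbound link, so any difference in their scores comes from the biased reset vector alone.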

Using link structure for suggesting related queries

An approach is provided for determining related queries for a given search query based on the linking structure of electronic documents within a document set. Document titles are used to represent potential search queries and links between the electronic documents are used to determine relationships between the potential search queries. As such, the document set may be represented as a directed graph in which document titles (which represent potential search queries) are nodes and links are edges between the nodes. When a particular search query is received, a corresponding node is identified and related queries are determined by identifying other nodes having connections with that node.
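The title graph described here is simple enough to sketch. This is only an illustration under my own assumptions: queries match titles by exact lowercase comparison, and "connections" means direct links in either direction. The function names are hypothetical.

```python
def build_title_graph(documents):
    """documents: dict of document title -> list of titles it links to."""
    graph = {}
    for title, linked_titles in documents.items():
        source = title.lower()
        graph.setdefault(source, set())
        for linked in linked_titles:
            target = linked.lower()
            graph.setdefault(target, set())  # link targets are nodes too
            graph[source].add(target)
    return graph


def related_queries(graph, query):
    """Return titles directly connected to the node matching the query."""
    node = query.lower()
    if node not in graph:
        return []
    outgoing = graph[node]
    incoming = {src for src, targets in graph.items() if node in targets}
    return sorted((outgoing | incoming) - {node})


docs = {
    "Link Building": ["Anchor Text", "PageRank"],
    "PageRank": ["Link Building"],
    "Anchor Text": [],
}
title_graph = build_title_graph(docs)
```

A real system would presumably weight edges or walk more than one hop; this version only looks at immediate neighbours.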

Finding Related Entities For Search Queries

Architecture for finding related entities for web search queries. An extraction component takes a document as input and outputs all the mentions (or occurrences) of named entities such as names of people, organizations, locations, and products in the document, as well as entity metadata. An indexing component takes a document identifier (docID) and the set of mentions of named entities, and stores and indexes the information for retrieval. A document-based search component takes a keyword query and returns the docIDs of the top documents matching the query. A retrieval component takes a docID as input, accesses the information stored by the indexing component and returns the set of mentions of named entities in the document. This information is then passed to an entity scoring and thresholding component that computes an aggregate score of each entity and selects the entities to return to the user.
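The last stage of that pipeline (scoring and thresholding) could look roughly like the sketch below. The abstract does not say how the aggregate score is computed, so the reciprocal-rank weighting and the `min_score` cut-off are my assumptions, and entity extraction is taken as already done.

```python
from collections import Counter


def related_entities(top_doc_ids, mentions_by_doc, min_score=1.0):
    """Aggregate entity mentions over the top documents for a query.

    top_doc_ids: docIDs ordered best first, as returned by keyword search.
    mentions_by_doc: docID -> list of entity names mentioned in that document.
    min_score: assumed threshold below which entities are dropped.
    """
    scores = Counter()
    for position, doc_id in enumerate(top_doc_ids):
        weight = 1.0 / (position + 1)  # assumed: higher-ranked docs count more
        for entity in mentions_by_doc.get(doc_id, []):
            scores[entity] += weight
    kept = [(entity, score) for entity, score in scores.items() if score >= min_score]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


top_docs = ["d1", "d2", "d3"]
mentions = {
    "d1": ["Microsoft", "Bill Gates"],
    "d2": ["Microsoft"],
    "d3": ["Yahoo", "Microsoft"],
}
entities = related_entities(top_docs, mentions)
```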

Term suggestion for multi-sense query

Systems and methods for related term suggestion are described. In one aspect, term clusters are generated as a function of calculated similarity of term vectors, each term vector having been generated from search results associated with a set of high frequency of occurrence (FOO) historical queries previously submitted to a search engine. Responsive to receiving a term/phrase from an entity, the term/phrase is evaluated in view of terms/phrases in the term clusters to identify one or more related term suggestions.
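For a rough feel of clustering by term-vector similarity, here is a small sketch. It assumes the term vectors have already been built from search results, uses cosine similarity with a greedy single-pass clustering, and puts each term in only one cluster. All of that is my simplification, not the patent's method.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_terms(term_vectors, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for term in sorted(term_vectors):
        for cluster in clusters:
            if cosine(term_vectors[term], term_vectors[cluster[0]]) >= threshold:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters


def suggest(clusters, term):
    """Return the other terms that share a cluster with the given term."""
    for cluster in clusters:
        if term in cluster:
            return [other for other in cluster if other != term]
    return []


vectors = {
    "jaguar car": {"car": 2, "dealer": 1},
    "jaguar dealer": {"car": 1, "dealer": 2},
    "jaguar cat": {"cat": 2, "zoo": 1},
}
term_clusters = cluster_terms(vectors)
```

The "jaguar" example shows the multi-sense angle: the car sense and the animal sense end up in separate clusters.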

Supervised rank aggregation based on rankings

A method and system for rank aggregation of entities based on supervised learning is provided. A rank aggregation system provides an order-based aggregation of rankings of entities by learning weights within an optimization framework for combining the rankings of the entities using labeled training data and the ordering of the individual rankings. The rank aggregation system is provided with multiple rankings of entities. The rank aggregation system is also provided with training data that indicates the relative ranking of pairs of entities. The rank aggregation system then learns weights for each of the ranking sources by attempting to optimize the difference between the relative rankings of pairs of entities using the weights and the relative rankings of pairs of entities of the training data.

Pseudo-Anchor Text Extraction for Vertical Search

A search method uses pseudo-anchor text associated with search objects to improve search performance. The pseudo-anchor text may be extracted in combination with an identifier of the search objects (such as a pseudo-URL) from a digital corpus such as a collection of documents. Pseudo-anchor texts for each object are preferably extracted from candidate anchor blocks using a machine learning based approach. The pseudo-anchor texts are made available for searching and used to help rank the objects in a search result to improve search performance. The method may be used in vertical search of objects such as published articles, products and images that lack explicit URL and anchor text information.


Page Segmentation

Vision-based document segmentation

Vision-based document segmentation identifies one or more portions of semantic content of a document. The one or more portions are identified by identifying a plurality of visual blocks in the document, and detecting one or more separators between the visual blocks of the plurality of visual blocks. A content structure for the document is constructed based at least in part on the plurality of visual blocks and the one or more separators, and the content structure identifies the one or more portions of semantic content of the document. The content structure obtained using the vision-based document segmentation can optionally be used during document retrieval.

Retrieval of structured documents

This disclosure relates to performing a query for a search term of a database containing a plurality of structured documents. Those structured documents that do not include the search term are ferreted or filtered out during an initial search. Matched structured documents which are those structured documents that do contain the search term are evaluated by ranking the individual elements based on how well each individual element matches the search term, and indicating to the user the ranking of the individual elements wherein the individual elements can be accessed by the user.
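The filter-then-rank-elements flow can be sketched in a few lines. I am assuming each structured document is just a mapping of element names to text, and that "how well an element matches" is a plain term count. Both are stand-ins for whatever the patent actually uses.

```python
def search_structured(documents, term):
    """Filter out documents without the term, then rank elements in the rest.

    documents: doc name -> dict of element name -> element text.
    Returns a list of (doc name, [(element name, match count), ...]).
    """
    term = term.lower()
    results = []
    for doc_name, elements in documents.items():
        counts = {
            name: text.lower().split().count(term)
            for name, text in elements.items()
        }
        if not any(counts.values()):
            continue  # initial filter: this document never mentions the term
        ranked = sorted(
            ((name, count) for name, count in counts.items() if count > 0),
            key=lambda pair: (-pair[1], pair[0]),
        )
        results.append((doc_name, ranked))
    return results


structured_docs = {
    "a.xml": {"title": "search patents", "body": "search engines rank search results"},
    "b.xml": {"title": "weather", "body": "rain"},
}
matches = search_structured(structured_docs, "search")
```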



Diverse Topic Phrase Extraction (using LSA)

Systems and methods for implementing diverse topic phrase extraction are disclosed. According to one implementation, multiple word candidate phrases are extracted from a corpus and weighed. One or more documents are re-weighed to identify less obvious candidate topics using latent semantic analysis (LSA). Phrase diversification is then used to remove redundancy and select informative and distinct topic phrases.

Synonym and similar word page search (more semantics)

A search tool enables users to search for synonyms of, and/or syntactically similar words to search terms that they enter. In at least some embodiments, the search tool is implemented in the context of a web browser for searching web pages. In some embodiments, search terms can be distinctly, visually highlighted on a page, such as a web page, to allow the user to easily identify words that have been found through the search. In at least some embodiments, color coding can be used to uniquely identify exact matches, synonyms and/or syntactically similar words that are identified on a page.




Using search trails to provide enhanced search interaction (behavioural)

It has been found that user navigation that follows search engine interactions provides implicit endorsement of resources (such as web resources) that are preferred by users, and which may be particularly valuable for exploratory search tasks. Thus, a combination of past searching and browsing user behavior is analyzed to identify additional information that augments search results delivered by a search engine. The additional information may include a display of hyperlinks to locations which are derived from the past searching and browsing user behavior, given a specific input query. The additional information may be provided to supplement web search results.
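One way to picture the search trail idea: log which pages people visit after a query, then surface the most common ones as extra links. The sketch below assumes trails are already extracted as (query, visited pages) pairs and ranks by a simple count. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_trail_index(trails):
    """trails: iterable of (query, [pages visited after the query])."""
    index = defaultdict(Counter)
    for query, pages in trails:
        for page in set(pages):  # count each page once per trail
            index[query.lower()][page] += 1
    return index


def trail_suggestions(index, query, top_n=3):
    """Return the pages most often reached after this query."""
    counts = index.get(query.lower(), Counter())
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [page for page, _ in ranked[:top_n]]


logged_trails = [
    ("seo patents", ["a.example", "b.example"]),
    ("SEO patents", ["b.example", "c.example"]),
    ("seo patents", ["b.example"]),
]
trail_index = build_trail_index(logged_trails)
```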

Personalization of web page search rankings

Methods and systems are provided for efficiently computing personalized rankings of web pages or other interconnected objects. The personalized rankings are produced by efficiently computing an approximation matrix to an ideal personalized page ranking matrix. The methods and systems provided herein can be used to produce search results with particular relevance to an individual searcher.

Click-through log mining (related peeps from above LSA)

Click-through log mining is described. Raw search click-through log data is processed to generate ordered query keywords, utilizing an algorithm to expand user-submitted keywords to include high frequency user queries, managing the keywords for a keyword expansion file, analyzing the algorithm performance on a bidding criteria, and identifying related phrases with similar page-click behaviors for advertisements.

Accounting for behavioral variability in web search

The concept of variability pertains to whether users exhibit consistent search interaction patterns, for example, in terms of interaction flow or information targeted. Methods are provided for analyzing variability, and then adapting search-related functionality (e.g., processes and/or interfaces) to account for variability characteristics, for example, to account for predictable search interaction behavior.

Automated analysis of user search behaviour

Automated analysis of user search behavior is provided. Data on user searches is maintained in a user search database. Relevance factors are determined for each search result included in a given search session where the relevance factors provide an indication of user satisfaction with particular search results included in the session. The relevance factors for each search result are analyzed by a relevance classification module for classifying each search result in terms of its relevance to an associated search query. The result of the relevance classification may assign a relevance classification and associated confidence level to each analyzed search result as to whether the search result is acceptable, unacceptable or partially acceptable relative to the search query that resulted in the search result. Relevance classifications for each analyzed search result may be stored for future use, for example, for diagnostic analysis of the operation of a given search mechanism.


Spam detection

Search Ranger System and Double-Funnel Model for Search Spam Analyses and Browser Protection (cloaking)

An exemplary method for defeating server-side click-through cloaking includes retrieving a search results page to set a browser variable, inserting a link to a page into the search results page and clicking through to the page using the inserted link. An exemplary method for investigating client-side cloaking includes providing script associated with a suspected spam URL, modifying the script to de-obfuscate the script and executing the modified script to reveal cloaking logic associated with the script. Other methods, systems, etc., are also disclosed.

Web spam page classification using query-dependent data

A web spam page classifier is described that identifies web spam pages based on features of a search query and web page pair. The features can be extracted from training instances and a training algorithm can be employed to develop the classifier. Pages identified as web spam pages can be demoted and/or removed from a relevancy ranked list.

Graph structure and web spam detection

A SPAM detection system is provided. The system includes a graph clustering component to analyze web data. A link analysis component can be associated with the graph clustering component to facilitate SPAM detection in accordance with the web data.

Spam score propagation for web spam detection

A SPAM detection system is provided. The system includes a graph clustering component to analyze web data. A link analysis component can be associated with the graph clustering component to facilitate SPAM detection in accordance with the web data.

Active spam testing system (spamming themselves?)

A method and system for introducing spam into a search engine for testing purposes is provided. An active spam testing system receives from a tester a specification of spam that is to be introduced into the search engine for testing purposes. The testing system may then generate auxiliary data structures for storing indications of the spam that is to be introduced. A search engine has original data structures that may include a content index and a link data structure. The testing system stores the indications of the spam in the auxiliary data structures so that use of the search engine for non-testing purposes is not affected. When the search engine is used for testing purposes, the search engine generates search results based on a combination of the original data structures and the auxiliary data structures.
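The "spam yourself safely" trick boils down to keeping the injected spam in a separate structure and only merging it in for test queries. Here is a toy sketch of that separation. The class and method names are mine, and a real engine would of course have link data and ranking on top.

```python
class SpamTestIndex:
    """Toy content index with a separate auxiliary store for test spam."""

    def __init__(self, original):
        self.original = original  # term -> set of URLs (live data, never modified)
        self.auxiliary = {}       # term -> set of injected spam URLs

    def inject_spam(self, term, url):
        """Record test spam without touching the original index."""
        self.auxiliary.setdefault(term, set()).add(url)

    def search(self, term, testing=False):
        hits = set(self.original.get(term, set()))
        if testing:
            # Test queries see the combination of both structures.
            hits |= self.auxiliary.get(term, set())
        return sorted(hits)


live_index = {"cheap pills": {"pharmacy.example"}}
engine = SpamTestIndex(live_index)
engine.inject_spam("cheap pills", "spam.example")
```

Regular searches behave exactly as before the injection, which is the whole point of the auxiliary data structures.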


Historical factors

Keyword usage score based on frequency impulse and frequency weight (and covered by Bill)

A method and system for assessing keyword usage based on frequency of usage of the keywords during various periods is provided. A keyword usage measurement system is provided with the frequency of keywords during various periods. The measurement system then calculates a recent usage score for a keyword by combining a frequency impulse score for the keyword with a frequency weight for the keyword. The frequency impulse score for a keyword indicates whether a recent change in the frequency of the keyword has occurred. The frequency weight for a keyword indicates a recent measure of the frequency of the keyword.
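The abstract gives no formulas, so take the following purely as a guess at the shape of the calculation: an impulse term that fires when recent usage jumps above the earlier baseline, multiplied by a weight reflecting recent frequency. The specific formulas and the `recent_periods` parameter are my assumptions.

```python
def recent_usage_score(frequencies, recent_periods=2):
    """frequencies: keyword counts per period, oldest first.

    Returns impulse * weight, where both parts are assumed definitions:
    impulse = relative jump of recent usage over the earlier baseline,
    weight  = average frequency over the recent periods.
    """
    if len(frequencies) <= recent_periods:
        raise ValueError("need more periods than recent_periods")
    earlier = frequencies[:-recent_periods]
    recent = frequencies[-recent_periods:]
    baseline = sum(earlier) / len(earlier)
    recent_mean = sum(recent) / len(recent)
    # Floor at zero so a drop in usage does not produce a negative score.
    impulse = max(0.0, (recent_mean - baseline) / (baseline + 1.0))
    weight = recent_mean
    return impulse * weight


flat_score = recent_usage_score([10, 10, 10, 10])
rising_score = recent_usage_score([10, 10, 30, 50])
```

Under these assumptions a keyword with steady usage scores zero no matter how popular it is, while a keyword that suddenly takes off scores high.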

Calculating importance of documents factoring historical importance

A method and system for determining temporal importance of documents having links between documents based on a temporal analysis of the links is provided. A temporal ranking system collects link information or snapshots indicating the links between documents at various snapshot times. The temporal ranking system calculates a current temporal importance of a document by factoring in the current importance of the document derived from the current snapshot (i.e., with the latest snapshot time) and the historical importance of the document derived from the past snapshots. To calculate the current temporal importance of a web page, the temporal ranking system aggregates the importance of the web page for each snapshot.
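How the per-snapshot scores get aggregated is not spelled out, so here is one plausible reading as a sketch: a weighted average in which older snapshots count for less. The exponential `decay` factor is my assumption, and the per-snapshot importance values (say, a PageRank-style score) are taken as given.

```python
def temporal_importance(snapshot_scores, decay=0.5):
    """Combine per-snapshot importance into one temporal importance value.

    snapshot_scores: importance of a page per snapshot, oldest first,
                     with the current snapshot last.
    decay: assumed factor by which each older snapshot's weight shrinks.
    """
    total = 0.0
    weight_sum = 0.0
    for age, score in enumerate(reversed(snapshot_scores)):
        weight = decay ** age  # age 0 is the current snapshot
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


newly_popular = temporal_importance([0.0, 0.0, 1.0])
formerly_popular = temporal_importance([1.0, 0.0, 0.0])
```

With this weighting, a page that is important right now outranks one that was important only in the oldest snapshot, but history still contributes.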


Getting social

Differential pricing based on social network standing

The claimed subject matter provides a system and/or method that effectuates and facilitates the generation and provision of differential pricing policies based at least in part on the relative social network standing that a user might have with a potential purchaser. The disclosed system can include a component that receives data associated with a user, a good, or a service that the user lists for sale or barter in an online market place. The component can determine, based at least in part on the particular good or service, a differential pricing policy that can be associated therewith. The differential pricing policy can then be utilized to selectively provide differentiated prices to a purchaser based on a relative social network standing that can be established between the purchaser and the user.


Geographic related

Search query dominant location detection

A system and method for location-specific searching. The invention correctly identifies explicit and implicit locations in a search query, and provides an appropriate dominant location. Top search results are obtained and analyzed to determine which terms in the query often appear in combination, and the query is tokenized based on the analysis. An explicit location indicating a location intent is most likely treated as an individual token, and the explicit location is treated as the dominant location of the query. In the case of a false positive, wherein the explicit location in a query is not the location intent, the explicit location is likely to be present with other terms that provide context. A token will likely include these terms together. The explicit location will therefore not be used to generate location-specific results in the case of a false positive.

Detecting a user's location, local intent and travel intent from search queries

A search query history for a user is analyzed to determine a home location of the user. Subsequent search queries are analyzed to discern whether the search query contains local intent, meaning that the search query requests information having an area of geographic relevance. In cases where a search query has local intent, the area of geographic relevance for that search query is compared to the home location of the user to determine whether the search query suggests an intent to travel.
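A stripped-down sketch of the home location and travel intent logic follows. It assumes locations have already been extracted from queries as plain names, picks the most frequent one as "home", and compares by name only. The patent talks about areas of geographic relevance, which this ignores.

```python
from collections import Counter


def home_location(query_locations):
    """query_locations: locations found in past queries (None if no location)."""
    counts = Counter(loc for loc in query_locations if loc)
    return counts.most_common(1)[0][0] if counts else None


def classify_query(query_location, home):
    """Label a new query given its extracted location and the user's home."""
    if query_location is None:
        return "no local intent"
    if home is None or query_location == home:
        return "local intent"
    return "local intent, possible travel intent"


history = ["Toronto", "Toronto", None, "Seattle", "Toronto"]
home = home_location(history)
```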


MS does TV too?

Media content search results ranked by popularity (TV content)

Media content search results ranked by popularity is described. In embodiment(s), a search request for television media content can be initiated by a viewer, and television media content that is relevant to the search request can be identified. The relevant television media content can then be ranked based on a popularity rating such that the relevant television media content can be displayed in an ordered list that is ordered by popularity rankings.


Next we’ll look at patents that were covered here on the trail and thoughts for 2009.

Have a great weekend!!

