SEO Blog - Internet marketing news and views  

Tale of the two PageRank Patents

Written by David Harry   
Tuesday, 11 September 2007 17:29

OK, Bill Slawski put a shout out for some backup with a couple of patents relating to PageRank. I had a few minutes to spare and decided to jump in. Apparently Google put up 2 patents on PageRank and Ol' Bill was wondering if there were any major differences, so why not?

See Bill's original Post for more -  New Stanford PageRank Patent

Documents (the patents under discussion): Original Patent - Follow Up Patent

Let's see what we can find...

I guess we can simply start from the beginning, Oui? Right away, the ‘Abstract’ wording has been modified from:

“A method is presented for scoring documents stored in a network. The method includes identifying links from linking documents to linked documents in the network and determining an importance of the identified links. The method further includes weighting the identified links based on the determined importance and scoring the linked documents based on the weighted links.”

TO

“A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database. The rank assigned to a document is calculated from the ranks of documents citing it. In addition, the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document. ”

Nothing terribly exciting, though the concept of ‘the probability that a browser through the database will randomly jump to the document’ is an interesting addition.


Moving along….

The only difference in the ‘Cited References’ is the addition of the earlier patent to the list.

The ‘Claims’ section was all but bare in the original patent, with only one claim that states:

‘A computer implemented method for scoring documents, at least some of the documents containing links to other ones of the documents, the method comprising: determining a probability that a searcher will access each of the documents after following a number of the links; and scoring each of the documents based on the determined probability.’

The second patent has 14 points that more closely define the parameters. It’s pretty standard fare though, such as:

“the importance rank of each of the backlinked web page documents is weighted in dependence upon the total number of links in the backlinked web page document”


There are a few references to ‘probability’ that seem to support the changes noted in the ‘Abstract’ above:

“wherein the matrix A is chosen so that an importance rank of a web page document is calculated, in part, from a constant .alpha. representing the probability that a surfer will randomly jump to the web page document.”

Outside of that nothing really jumps out at me.


Background of the invention -

Early on nothing was changed except this;

The well known idea of citation counting is a simple method for determining the importance of a document by counting its number of citations, or backlinks. The citation rank r(A) of a document which has n backlink pages is simply

Was changed to

The well known idea of citation counting is a simple method for determining the importance of a document by counting its number of citations, or backlinks. The citation rank r(A) of a document which has n backlink pages is simply r(A)=n.

Seems merely like an omission that was corrected.
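
For anyone who wants to see how simple citation counting really is, here is a minimal sketch of r(A)=n in Python. The function name and the tiny link list are my own made-up illustration, not anything taken from either patent.

```python
from collections import Counter

def citation_rank(links):
    """Citation counting: r(A) = n, the number of distinct pages linking to A.

    `links` is a hypothetical list of (source, target) pairs.
    """
    counts = Counter()
    # set() drops repeated links so each backlink page counts once
    for source, target in set(links):
        counts[target] += 1
    return counts

# Made-up example: B and C both link to A, A links to C
links = [("B", "A"), ("C", "A"), ("A", "C"), ("B", "A")]
ranks = citation_rank(links)
```

Every backlink counts the same here, no matter who it comes from, which is exactly the weakness PageRank was meant to address.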


Summary -

There are considerable changes in this area of the patent. The new additions are mostly computational additions furthering what was added in the ‘Claims’ section. From what I can tell they are looking to protect the patent with a tighter definition.

This paragraph was removed in the second incarnation;

One aspect of the present invention is directed to taking advantage of the linked structure of a database to assign a rank to each document in the database, where the document rank is a measure of the importance of a document. Rather than determining relevance only from the intrinsic content of a document, or from the anchor text of backlinks to the document, a method consistent with the invention determines importance from the extrinsic relationships between documents. Intuitively, a document should be important (regardless of its content) if it is highly cited by other documents. Not all citations, however, are necessarily of equal significance. A citation from an important document is more important than a citation from a relatively unimportant document. Thus, the importance of a page, and hence the rank assigned to it, should depend not just on the number of citations it has, but on the importance of the citing documents as well. This implies a recursive definition of rank: the rank of a document is a function of the ranks of the documents which cite it. The ranks of documents may be calculated by an iterative procedure on a linked database.

There is a small addition to the probability aspect with;

In addition, the importance rank of a node is calculated, in part, from a constant .alpha. representing the probability that a surfer will randomly jump to the node. The importance rank of a node can also be calculated, in part, from a measure of distances between the node and backlink nodes of the node. The initial N-dimensional vector p.sub.0 may be selected to represent a uniform probability distribution, or a non-uniform probability distribution which gives weight to a predetermined set of nodes.

Take from that what you will, nothing is jumping out at me really.
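
To make the p.sub.0 bit a little more concrete, here is a rough sketch of iterating a vector against a link matrix, started once from a uniform distribution and once from a lopsided one. The matrix construction, the three-page web and the value alpha=0.15 are all my own assumptions for illustration; the patent only says .alpha. is a constant in [0,1].

```python
def build_matrix(forward_links, pages, alpha):
    """Column-stochastic matrix: random jump (alpha/N) plus link following."""
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[alpha / n] * n for _ in range(n)]
    for page, targets in forward_links.items():
        j = index[page]
        for target in targets:
            # each forward link gets an equal share of the remaining weight
            matrix[index[target]][j] += (1 - alpha) / len(targets)
    return matrix

def power_iterate(matrix, p0, steps=100):
    """Repeatedly multiply the vector by the matrix, starting from p0."""
    p = list(p0)
    for _ in range(steps):
        p = [sum(row[j] * p[j] for j in range(len(p))) for row in matrix]
    return p

# Hypothetical three-page web where every page has at least one forward link
pages = ["A", "B", "C"]
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
A = build_matrix(web, pages, alpha=0.15)

from_uniform = power_iterate(A, [1 / 3, 1 / 3, 1 / 3])  # uniform p0
from_biased = power_iterate(A, [1.0, 0.0, 0.0])         # all weight on page A
```

In this toy case both starting vectors settle on practically the same ranks, so the choice of p.sub.0 affects where the iteration starts rather than where it ends up. Whether that holds for how the patent intends a non-uniform distribution to be used is beyond what this sketch shows.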

 


Detailed Description -

Once again not much has really changed here except for the 4th paragraph, which has been modified with some of the new computational models introduced. This seems to be a common theme at this point. Where the original;

where B.sub.1, . . . , B.sub.n are the backlink pages of A, r(B.sub.1), . . . , r(B.sub.n) are their ranks, .vertline.B.sub.1.vertline., . . . , .vertline.B.sub.n.vertline. are their numbers of forward links, and .alpha. is a constant in the interval [0,1], and N is the total number of pages in the web.

..type of model has been replaced with

$$r(A) = \frac{\alpha}{N} + (1-\alpha)\left(\frac{r(B_1)}{|B_1|} + \cdots + \frac{r(B_n)}{|B_n|}\right)$$

where B.sub.1, . . . , B.sub.n are the backlink pages of A, r(B.sub.1), . . . , r(B.sub.n) are their ranks, |B.sub.1|, . . . , |B.sub.n| are their numbers of forward links, and .alpha. is a constant in the interval [0,1], and N is the total number of pages in the web.

After that there are some more minor computational additions/modifications, but nothing earth shaking that I can see.
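
If it helps to see that 4th paragraph in action, here is a minimal sketch of the rank being computed page by page: each page gets alpha/N from the random jump, plus (1 - alpha) times the rank of each backlink page divided by that page's number of forward links. The function name, the alpha=0.15 default, the iteration count and the toy web are my assumptions, and pages with no forward links are simply skipped, which is a simplification on my part.

```python
def pagerank(forward_links, alpha=0.15, iterations=50):
    """Iterate r(A) = alpha/N + (1 - alpha) * sum(r(B_i) / |B_i|).

    `forward_links` maps each page to the list of pages it links to.
    Every linked page is assumed to appear as a key as well.
    """
    pages = list(forward_links)
    n_pages = len(pages)
    rank = {page: 1.0 / n_pages for page in pages}  # start uniform
    for _ in range(iterations):
        # random-jump share that every page receives
        new_rank = {page: alpha / n_pages for page in pages}
        for page, targets in forward_links.items():
            if not targets:
                continue  # simplification: pages without forward links pass nothing on
            share = (1 - alpha) * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page web
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = pagerank(web)
```

In this toy web, C ends up ahead of B because it collects a share from A and everything B passes on, while B only gets half of what A passes on.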


This was also removed from the original version;

The present method of determining the rank of a document can also be used to enhance the display of documents. In particular, each link in a document can be annotated with an icon, text, or other indicator of the rank of the document that each link points to. Anyone viewing the document can then easily see the relative importance of various links in the document.


..and that’s all folks.

So, in the end I can’t see this being more than a tightening up of the definitions and additions of some minor edits. I don’t see any major changes to the original patent that would warrant much attention from where I am sitting.

Happy Trails Bill….. All is as it was in Google Land.

 

Comments  

 
0 # Aaron 2007-09-11 18:51
Welcome back! Long time no see!
 
 
0 # The Gypsy 2007-09-11 19:40
Why thank you brother MadHat…. While my other companies kept me busy, much of it was simply a breather from the wacky world of search, publicly at least. Alas, ‘Twilight Zone’ SEO and other silliness has sent me back ranting. What can U do? I get so passionate about this stuff.

See if I can’t come up with something worthy of Friday tea time….
 
 
0 # waveshoppe 2007-09-11 20:08
Ok Dave, what the hell is ‘Twilight Zone’ SEO?
 
 
0 # The Gypsy 2007-09-11 20:16
Aahhh.. that's the useless bits like TBPR, Meta-KW tags, sandboxes, right down to the completely scammy 'submit your site to a zillion search engines' crap that we know and love so well.

I said the other day I felt like I was in the Twilight Zone and it kinda stuck on me.... Basically the stuff from my SEO Myths and Bad SEO posts....

he he....
 
 
0 # Bill 2007-09-12 06:13
I was surprised to see this update, but it looked more like a cleaning up exercise than an exposition of unknown and new attributes to pagerank.

The idea in the original of showing the pagerank of pages in results sets would have been interesting, but I wonder what effect it would have had if ever adopted by a search engine like Google.

Thanks very much for comparing the two, Dave.
 
 
0 # The Gypsy 2007-09-12 07:07
Yeah, I was curious if there was a 'fast one' but it is mostly late additions and clarifications to protect future claims from what I saw.

Enjoy the time off... leaving the cable behind could be a sign.... enjoy it.

Dave
 
 
0 # f-lops-y 2007-09-12 20:09
Pity about the masses of smoke I eagerly cut my way through there, but this

“wherein the matrix A is chosen so that an importance rank of a web page document is calculated, in part, from a constant .alpha. representing the probability that a surfer will randomly jump to the web page document.”

..was interesting. Appreciate your digging around there Dave, and thanks Bill for putting it out there first of all.
 
