If you are into SEO and patient enough to read all this through, the results are pretty explosive.
I have a way to predict the PageRank® offered by Google's Toolbar with a correlation of around 0.8 (R2=0.65)*. At the moment my working theory is that I may be able to predict PageRank® exactly nearly 7 times out of 10, and should be able to predict within one PageRank® point over 95 times out of 100. I don't think anyone has published this data before, so consider this post the unofficial "Google PageRank® is blown" post, with the caveat that other SEOs will need to verify and maybe improve on this research.
Firstly a bit of background. I cannot use the actual PageRank® algorithm in any research, even though - with all the link data of MajesticSEO at my disposal - I might be able to recreate it. In the UK, you cannot patent a mathematical formula, but if we took the algorithm and then used it in a commercial product that was available in the USA, I might have some legal issues to contend with from Google. So this research does not use the PageRank® algorithm in any way. What it does do is correlate numbers generated in my test with the Google PageRank® generated by Google's toolbar for the same URL, to see if my numbers bear any resemblance whatsoever to Google's scale. I am calling my numbers UV, for "URL Value" or, if you prefer, "Ultra Violet" because it lays the formula bare.
My (current) definition of "UValue" (UV)
"UValue is the number of referring domains that link to the home page URL (not the whole domain) which have, themselves, got links from more than one referring domain."
Now this definition is pretty convenient, because if you look at the way MajesticSEO defines ACRank, this means every link that has an ACRank of 3 or more is a candidate - except that you also need to count only one link for each referring domain. Fortunately, MajesticSEO lets me find this in the standard reports. Here's how:
1. Choose the URL you want to see the UValue for and put it into MajesticSEO.com
2. Buy the standard report for this URL.
3. From the report's domain overview, select URL > Backlinks, checking that the filter is set to return the best backlink only per domain.
4. Scroll through the list (or download CSV) to find out how many in this list are ACRank 3+ and use this number as the UValue. (Screenshot)
Please note that UV is NOT ACRank. It is a very different number, running from zero to many thousands depending on the URL.
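For anyone who downloads the CSV rather than scrolling, the counting in step 4 could be sketched roughly as below. This is only an illustration: the column names `RefDomain` and `ACRank` and the sample rows are my assumptions, not the actual MajesticSEO export format, so adjust them to whatever headers your report really has. The sketch also keeps only the best backlink per referring domain itself, in case the filter in step 3 was not applied.

```python
import csv
import io

def uvalue_from_csv(csv_text, domain_column="RefDomain",
                    acrank_column="ACRank", min_acrank=3):
    """Count referring domains whose best backlink has ACRank >= min_acrank.

    The column names are hypothetical placeholders, not the real export headers.
    """
    best_by_domain = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        domain = row[domain_column].strip().lower()
        acrank = int(row[acrank_column])
        # Keep only the strongest backlink seen for each referring domain.
        if acrank > best_by_domain.get(domain, -1):
            best_by_domain[domain] = acrank
    return sum(1 for acrank in best_by_domain.values() if acrank >= min_acrank)

# Made-up rows purely to show the counting rule.
sample = """RefDomain,ACRank
example.com,5
example.com,2
example.org,3
example.net,1
"""

print(uvalue_from_csv(sample))  # 2
```

Here example.com and example.org each count once, and example.net is dropped because its best link is below ACRank 3.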
In a vague attempt to recall some of the scientific credibility required for these things since I did my Maths degree over 20 years ago, I have a hypothesis. Luckily I also have a business partner who did his degree even longer back, but went on to get a doctorate and lectured at Iowa State and later at Cranfield University. Between us we hope not to overclaim.
My hypothesis is that UV has a correlation with PageRank® (and therefore might be used as a predictor of PageRank®) when looking at home pages. I could expand the theory to inner pages, but for now decided to simply concentrate on home page research.
Now this is not too difficult to test. It is simply two sets of paired numbers, and it turns out that you really only need 25-30 pairs to test this hypothesis. We could repeat with hundreds or thousands of pairs, but the correlation itself would be unlikely to change much; what would change is the degree of certainty that there is (or isn't) a correlation. I took all the home page listings from two directory categories in Dmoz as a sample - one from a "mobile phone" category and one from an "underwater photography" category. Using genuinely random domains might be a way to improve on the test for anyone wishing to try to emulate the research.
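For anyone wanting to run the same kind of test on their own pairs, a minimal sketch of the correlation calculation follows. It uses the standard Pearson formula; for a straight-line fit with a single predictor, R2 is simply the square of that correlation, which is why a correlation of around 0.8 and an R2 of around 0.65 go together. The numbers in the example are invented for illustration and are not my 42 sites.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

# Invented (UValue, toolbar PageRank) pairs, NOT data from this study.
uvalues = [50, 120, 300, 800, 1800]
pageranks = [2, 3, 4, 5, 6]

r = pearson_r(uvalues, pageranks)
# For a one-predictor linear fit, R2 equals r squared.
print(f"r = {r:.3f}, R2 = {r * r:.3f}")
```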
After eliminating sites which did not generate a 200 response code, this gave me a list of 42 sites ranging from PageRank® 1 to PageRank® 6. The UV runs from 46 to 1819.
Using a linear regression, we see some good evidence of a decent fit - with an R2 of 0.62 (a perfect fit would have an R2 of 1):
But having a Doctor helping me with the stats means we can do better. We looked at the same data again and used Excel's wizardry to fit a Binomial distribution trend line to the data. Excel was able to do this with an R2 value of 0.68. The graph below shows the data points.
However - even though this mathematically fits, I do not like the feel of the chart, to be honest. I think that, looking at the results, I can make a more intuitively logical prediction by mapping the UValue range onto a prediction of PageRank® as follows:
IF UValue is Greater than | Then PR is predicted to be
Below 30 | Not Known
The table above predicted the correct PageRank® in my test 69% of the time (29 times out of 42) and was within one PageRank® point over 95% of the time (40 out of 42).
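If you try to replicate this, the two hit rates above can be worked out as in the sketch below. The function name `score_predictions` and the short example lists are my own illustration, not part of the original test; only the 29, 40 and 42 figures come from my results.

```python
def score_predictions(predicted, actual):
    """Return (exact hit rate, within-one hit rate) for paired PageRank values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of the same length")
    n = len(actual)
    exact = sum(1 for p, a in zip(predicted, actual) if p == a)
    within_one = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= 1)
    return exact / n, within_one / n

# Invented example: 2 of 4 exact, 3 of 4 within one point.
print(score_predictions([3, 4, 5, 2], [3, 5, 5, 4]))  # (0.5, 0.75)

# The figures reported in this post, as percentages.
print(round(29 / 42 * 100, 1))  # 69.0
print(round(40 / 42 * 100, 1))  # 95.2
```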
I would love others to try to replicate this research and maybe make some modifications to improve it.
*PageRank is a concept owned in the US by Stanford University and Google and opinions expressed are my own.