2009.08.20
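The formula used in the PageRank lookup changed recently. The old hash value no longer retrieves the correct PR, and the request returns an error message instead.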

Most of the PR-fetching scripts found online build a URL, send it to Google, and then parse the response. The URL looks like this:
http://www.google.com/search?client=navclient-auto&ch=6-9223293982612316911&features=Rank&q=info:roga.tw
It can be broken down into a few parts:
$site = 'http://www.example.com'; /* your URL */
$info = 'info:' . urldecode($site);
$checksum = $this->checksum($this->strord($info));
$url = "http://www.google.com/search?client=navclient-auto&ch=6{$checksum}&features=Rank&q={$info}";
Then fetch $url with curl, fopen, or a similar method.
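The fetch-and-parse step can be sketched roughly as follows. This is not the code from the scripts mentioned above: fetch_body() and parse_rank_response() are hypothetical names, and the response format (a line such as Rank_1:1:5, with the rank in the last field) is an assumption that this post does not confirm.

```php
<?php
// Hypothetical helper: pull the rank out of a response body.
// ASSUMPTION: a successful response contains a line like "Rank_1:1:5".
function parse_rank_response(string $body): ?int
{
    if (preg_match('/^Rank_\d+:\d+:(\d+)\s*$/m', $body, $m)) {
        return (int) $m[1];
    }
    return null; // error page or unexpected payload
}

// Hypothetical helper: fetch a URL with the curl extension.
function fetch_body(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

A caller would pass the result of fetch_body($url) to parse_rank_response() and treat null as "no rank available", which is also what an error page would produce.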
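The most important part is how the checksum hash value is computed. The old method is shown below. It contains several magic numbers: the &ch=6 in the $url string above, which is followed by $checksum, and the $init = 0xE6359A60 used when computing the checksum in the code below.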
/**
 * Pagerank checksum hash emulator
 */
function checksum ($url, $length = null, $init = 0xE6359A60)
{
    if (is_null($length))
    {
        $length = sizeof($url);
    }
    $a = $b = 0x9E3779B9;
    $c = $init;
    $k = 0;
    $len = $length;
    // Consume the input 12 bytes at a time
    while ($len >= 12)
    {
        $a += ( $url[$k+0] + ( $url[$k+1] << 8 ) + ( $url[$k+2] << 16 ) + ( $url[$k+3] << 24 ));
        $b += ( $url[$k+4] + ( $url[$k+5] << 8 ) + ( $url[$k+6] << 16 ) + ( $url[$k+7] << 24 ));
        $c += ( $url[$k+8] + ( $url[$k+9] << 8 ) + ( $url[$k+10] << 16 ) + ( $url[$k+11] << 24 ));
        $mix = $this->mix($a,$b,$c);
        $a = $mix[0]; $b = $mix[1]; $c = $mix[2];
        $k += 12;
        $len -= 12;
    }
    $c += $length;
    // Fold in the remaining bytes; every case intentionally falls through
    switch ($len)
    {
        case 11: $c += ($url[$k + 10] << 24);
        case 10: $c += ($url[$k + 9] << 16);
        case 9: $c += ($url[$k + 8] << 8);
        case 8: $b += ($url[$k + 7] << 24);
        case 7: $b += ($url[$k + 6] << 16);
        case 6: $b += ($url[$k + 5] << 8);
        case 5: $b += ($url[$k + 4]);
        case 4: $a += ($url[$k + 3] << 24);
        case 3: $a += ($url[$k + 2] << 16);
        case 2: $a += ($url[$k + 1] << 8);
        case 1: $a += ($url[$k + 0]);
    }
    $mix = $this->mix($a, $b, $c);
    return $mix[2];
}
/**
 * Converts number to int 32
 * (Required for pagerank hash)
 */
function to_int_32 (&$x)
{
    $z = hexdec('80000000'); // 2^31
    $y = (int) $x;
    // When $x is below -2^31, cast the negated value and negate it back
    if ($y == -$z && $x < -$z)
    {
        $y = (int) ((-1) * $x);
        $y = (-1) * $y;
    }
    $x = $y;
}
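/**
 * Fills in zeros on a number
 * (Required for pagerank hash)
 */
function zero_fill ($a, $b)
{
    $z = hexdec('80000000'); // sign bit of a 32-bit value
    if ($z & $a)
    {
        // Sign bit set: shift once, clear the propagated sign bit,
        // put the original top bit one place down, then shift the rest
        $a = ($a >> 1);
        $a &= (~$z);
        $a |= 0x40000000;
        $a = ($a >> ($b - 1));
    }
    else
    {
        $a = ($a >> $b);
    }
    return $a;
}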
/**
 * Pagerank hash prerequisites
 */
function mix($a, $b, $c)
{
    $a -= $b; $a -= $c; $this->to_int_32($a); $a = (int)($a ^ ($this->zero_fill($c,13)));
    $b -= $c; $b -= $a; $this->to_int_32($b); $b = (int)($b ^ ($a<<8));
    $c -= $a; $c -= $b; $this->to_int_32($c); $c = (int)($c ^ ($this->zero_fill($b,13)));
    $a -= $b; $a -= $c; $this->to_int_32($a); $a = (int)($a ^ ($this->zero_fill($c,12)));
    $b -= $c; $b -= $a; $this->to_int_32($b); $b = (int)($b ^ ($a<<16));
    $c -= $a; $c -= $b; $this->to_int_32($c); $c = (int)($c ^ ($this->zero_fill($b,5)));
    $a -= $b; $a -= $c; $this->to_int_32($a); $a = (int)($a ^ ($this->zero_fill($c,3)));
    $b -= $c; $b -= $a; $this->to_int_32($b); $b = (int)($b ^ ($a<<10));
    $c -= $a; $c -= $b; $this->to_int_32($c); $c = (int)($c ^ ($this->zero_fill($b,15)));
    return array($a,$b,$c);
}
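/**
 * ASCII conversion of a string
 */
function strord($string)
{
    $result = array(); // defined even for an empty string
    for ($i = 0; $i < strlen($string); $i++)
    {
        // square brackets: the {} string offset syntax was removed in PHP 8
        $result[$i] = ord($string[$i]);
    }
    return $result;
}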
/**
 * Number formatting for use with pagerank hash
 */
function format_number ($number='', $divchar = ',', $divat = 3)
{
    $decimals = '';
    $formatted = '';
    if (strstr($number, '.'))
    {
        $pieces = explode('.', $number);
        $number = $pieces[0];
        $decimals = '.' . $pieces[1];
    }
    else
    {
        $number = (string) $number;
    }
    if (strlen($number) <= $divat)
        return $number;
    $j = 0;
    for ($i = strlen($number) - 1; $i >= 0; $i--)
    {
        if ($j == $divat)
        {
            $formatted = $divchar . $formatted;
            $j = 0;
        }
        $formatted = $number[$i] . $formatted;
        $j++;
    }
    return $formatted . $decimals;
}
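The to_int_32() and zero_fill() helpers above are there because PHP integers are platform-sized and the >> operator propagates the sign bit, while the hash expects 32-bit wraparound and zero-filling shifts. As a rough illustration of the same two ideas (not a drop-in replacement for the code above), here is a sketch that assumes a 64-bit PHP build; wrap_int32() and shift_right_zero_fill() are hypothetical names.

```php
<?php
// Sketch only. ASSUMPTION: 64-bit PHP (PHP_INT_SIZE === 8).

// Reduce an integer to the signed 32-bit range, as a 32-bit register would.
function wrap_int32(int $x): int
{
    $x &= 0xFFFFFFFF; // keep the low 32 bits
    return $x >= 0x80000000 ? $x - 0x100000000 : $x;
}

// Logical (zero-filling) right shift of a 32-bit value.
function shift_right_zero_fill(int $x, int $bits): int
{
    return ($x & 0xFFFFFFFF) >> $bits;
}

var_dump(wrap_int32(0x7FFFFFFF + 1));       // int(-2147483648)
var_dump(shift_right_zero_fill(-1, 28));    // int(15)
```

Masking with 0xFFFFFFFF first is what makes both helpers short: once the value is a non-negative number below 2^32, the ordinary >> operator cannot drag a sign bit along.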
I later found a new Google PageRank checksum algorithm at http://wpcn.googlecode.com. It was written for WordPress (WP), and it works after a little rewriting.
$info = urlencode("info:".$site);
$checksum = $this->CheckHash($this->HashURL($site));
$url = "http://www.google.com/search?client=navclient-auto&ch=$checksum&features=Rank&q=$info";
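Building the query string by hand makes it easy to drop a parameter name or forget the encoding. A minimal sketch of the same URL assembled with http_build_query() follows; build_rank_url() is a hypothetical helper, and the checksum passed in the usage line is a placeholder, not a real value for that site.

```php
<?php
// Hypothetical helper, not part of the original plugin code.
function build_rank_url(string $site, string $checksum): string
{
    $params = [
        'client'   => 'navclient-auto',
        'ch'       => $checksum,
        'features' => 'Rank',
        'q'        => 'info:' . $site, // http_build_query() URL-encodes the value
    ];
    return 'http://www.google.com/search?' . http_build_query($params, '', '&');
}

// Placeholder checksum, for illustration only.
echo build_rank_url('roga.tw', '7612'), "\n";
```

The separator is passed explicitly as '&' so the result does not depend on the arg_separator.output setting.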
The key part, computing $checksum, works as follows:
// convert a string to a 32-bit integer
function StrToNum($Str, $Check, $Magic) {
    $Int32Unit = 4294967296; // 2^32
    $length = strlen($Str);
    for ($i = 0; $i < $length; $i++) {
        $Check *= $Magic;
        // If the float is beyond the boundaries of integer (usually +/- 2.15e+9 = 2^31),
        // the result of converting to integer is undefined
        // refer to http://www.php.net/manual/en/language.types.integer.php
        if ($Check >= $Int32Unit) {
            $Check = ($Check - $Int32Unit * (int) ($Check / $Int32Unit));
            // if the check is less than -2^31
            $Check = ($Check < -2147483648) ? ($Check + $Int32Unit) : $Check;
        }
        // square brackets: the {} string offset syntax was removed in PHP 8
        $Check += ord($Str[$i]);
    }
    return $Check;
}
// generate a hash for a url
function HashURL($String) {
    $Check1 = $this->StrToNum($String, 0x1505, 0x21);
    $Check2 = $this->StrToNum($String, 0, 0x1003F);
    $Check1 >>= 2;
    $Check1 = (($Check1 >> 4) & 0x3FFFFC0) | ($Check1 & 0x3F);
    $Check1 = (($Check1 >> 4) & 0x3FFC00) | ($Check1 & 0x3FF);
    $Check1 = (($Check1 >> 4) & 0x3C000) | ($Check1 & 0x3FFF);
    $T1 = (((($Check1 & 0x3C0) << 4) | ($Check1 & 0x3C)) << 2) | ($Check2 & 0xF0F);
    $T2 = (((($Check1 & 0xFFFFC000) << 4) | ($Check1 & 0x3C00)) << 0xA) | ($Check2 & 0xF0F0000);
    return ($T1 | $T2);
}
// generate a checksum for the hash string
function CheckHash($Hashnum) {
    $CheckByte = 0;
    $Flag = 0;
    $HashStr = sprintf('%u', $Hashnum);
    $length = strlen($HashStr);
    // walk the decimal digits from right to left
    for ($i = $length - 1; $i >= 0; $i--) {
        // square brackets: the {} string offset syntax was removed in PHP 8
        $Re = $HashStr[$i];
        if (1 === ($Flag % 2)) {
            $Re += $Re;
            $Re = (int)($Re / 10) + ($Re % 10);
        }
        $CheckByte += $Re;
        $Flag++;
    }
    $CheckByte %= 10;
    if (0 !== $CheckByte) {
        $CheckByte = 10 - $CheckByte;
        if (1 === ($Flag % 2)) {
            if (1 === ($CheckByte % 2)) {
                $CheckByte += 9;
            }
            $CheckByte >>= 1;
        }
    }
    return '7'.$CheckByte.$HashStr;
}
This version has magic numbers too, such as the leading '7' in the return value of CheckHash(). As before, fetch $url with curl, fopen, or a similar method.
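The check digit that CheckHash() places between the '7' and the hash can be traced by hand on short inputs. It resembles a Luhn-style digit: every second digit from the right is doubled and folded, and the sum is then adjusted. The sketch below restates only that step for experimenting; check_digit() is a hypothetical name, and the inputs are toy digit strings, not real hashes.

```php
<?php
// Restatement of the check-digit step for tracing by hand; a sketch only.
function check_digit(string $digits): int
{
    $sum = 0;
    $len = strlen($digits);
    for ($pos = 0; $pos < $len; $pos++) {
        // $pos counts positions from the rightmost digit
        $d = (int) $digits[$len - 1 - $pos];
        if ($pos % 2 === 1) {
            $d *= 2;
            $d = intdiv($d, 10) + $d % 10; // fold two-digit results: 14 -> 1 + 4
        }
        $sum += $d;
    }
    $check = $sum % 10;
    if ($check !== 0) {
        $check = 10 - $check;
        if ($len % 2 === 1) { // odd number of digits
            if ($check % 2 === 1) {
                $check += 9;
            }
            $check >>= 1;
        }
    }
    return $check;
}

// Toy input "12": digits sum to 2 + 2 = 4, so the check digit is 10 - 4 = 6.
echo '7' . check_digit('12') . '12', "\n"; // prints 7612
```

For the toy input "123" the same steps give a sum of 8, a first result of 2, and, because the length is odd, a final digit of 1.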
Sigh. I still don't understand why Google doesn't provide an API for retrieving PR.

Right? Wouldn't it be nice if they just offered a little tool where you send in a URL and it spits back a number? Q_Q
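roga, sorry to bother you here. I'm a sitestates user, and since posting on the message board over there keeps failing, I'm asking for help here.
Since the day before yesterday my logins have kept failing. After I enter my username and password and press submit, the page behaves as if nothing happened: the username and password fields are simply blank again, waiting for input.
I tried the forgot-password request and logged in with the new account details the system gave me, but the result was the same ><
P.S. Is there a way to delete old messages I left on my own message board? It's a small privacy concern @@
Hello, please try clearing your browser's cookies and then try again. Thank you.
roga,
I found out it was a browser problem: everything worked once I downloaded GreenBrowser.
If anyone else on Vista can't open the site in IE, like me,
and clearing cookies doesn't help,
you can tell them to try downloading this browser.
Thank you ^^
I'm very sorry about that! Thank you ^^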