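Google recently changed the formula used for PageRank lookups, so the old hash value no longer retrieves the correct PR and the request returns an error message.
Most PR lookup scripts found online work the same way: they build a URL, send it to Google, and parse the value that comes back. The URL looks like this: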
http://www.google.com/search?client=navclient-auto&ch=6-9223293982612316911&features=Rank&q=info:roga.tw
It can be broken down into a few parts:
```php
$site = 'http://www.example.com'; /* your URL */

$info = 'info:' . urldecode($site);
$checksum = $this->checksum($this->strord($info));

$url = "http://www.google.com/search?client=navclient-auto&ch=6{$checksum}&features=Rank&q={$info}";
```
Then fetch `$url` with curl, fopen, or a similar method.
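As a rough sketch of that last step, the snippet below fetches the URL with PHP's curl extension and pulls the rank out of the body. The helper names `fetchBody()` and `parseRank()` are made up for illustration, and the `Rank_1:1:N` body format is an assumption about how this endpoint responds, so treat this as a starting point rather than a reference implementation.

```php
<?php
// Hypothetical helper: download a URL and return the body, or null on failure.
function fetchBody(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $body = curl_exec($ch);                         // false on failure
    return is_string($body) ? $body : null;
}

// Hypothetical helper: extract N from a body assumed to look like "Rank_1:1:N".
function parseRank(string $body): ?int
{
    if (preg_match('/^Rank_\d+:\d+:(\d+)$/', trim($body), $m)) {
        return (int) $m[1];
    }
    return null; // anything else (error page, empty body) counts as "no rank"
}
```

With a `$url` built as above, a call could look like `$rank = parseRank(fetchBody($url) ?? '');`, which yields `null` whenever the request fails or the body is not in the assumed format.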
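The most important part is how the checksum hash value is computed. The old method is shown below. It contains several magic numbers: the `&ch=6` in the `$url` string above, which is followed directly by `$checksum`, and the `$init = 0xE6359A60` used when computing the checksum in the code below.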
```php
/**
 * Pagerank checksum hash emulator
 */
function checksum($url, $length = null, $init = 0xE6359A60)
{
    if (is_null($length)) {
        $length = sizeof($url);
    }
    $a = $b = 0x9E3779B9;
    $c = $init;
    $k = 0;
    $len = $length;
    while ($len >= 12) {
        $a += ($url[$k + 0] + ($url[$k + 1] << 8) + ($url[$k + 2] << 16) + ($url[$k + 3] << 24));
        $b += ($url[$k + 4] + ($url[$k + 5] << 8) + ($url[$k + 6] << 16) + ($url[$k + 7] << 24));
        $c += ($url[$k + 8] + ($url[$k + 9] << 8) + ($url[$k + 10] << 16) + ($url[$k + 11] << 24));
        $mix = $this->mix($a, $b, $c);
        $a = $mix[0];
        $b = $mix[1];
        $c = $mix[2];
        $k += 12;
        $len -= 12;
    }
    $c += $length;
    // Every case falls through on purpose so the remaining bytes accumulate.
    switch ($len) {
        case 11: $c += ($url[$k + 10] << 24);
        case 10: $c += ($url[$k + 9] << 16);
        case 9:  $c += ($url[$k + 8] << 8);
        case 8:  $b += ($url[$k + 7] << 24);
        case 7:  $b += ($url[$k + 6] << 16);
        case 6:  $b += ($url[$k + 5] << 8);
        case 5:  $b += ($url[$k + 4]);
        case 4:  $a += ($url[$k + 3] << 24);
        case 3:  $a += ($url[$k + 2] << 16);
        case 2:  $a += ($url[$k + 1] << 8);
        case 1:  $a += ($url[$k + 0]);
    }
    $mix = $this->mix($a, $b, $c);
    return $mix[2];
}

/**
 * Converts number to int 32
 * (Required for pagerank hash)
 */
function to_int_32(&$x)
{
    $z = hexdec('80000000');
    $y = (int) $x;
    if ($y == -$z && $x < -$z) {
        $y = (int) ((-1) * $x);
        $y = (-1) * $y;
    }
    $x = $y;
}

/**
 * Fills in zeros on a number
 * (Required for pagerank hash)
 */
function zero_fill($a, $b)
{
    $z = hexdec('80000000');
    if ($z & $a) {
        $a = ($a >> 1);
        $a &= (~$z);
        $a |= 0x40000000;
        $a = ($a >> ($b - 1));
    } else {
        $a = ($a >> $b);
    }
    return $a;
}

/**
 * Pagerank hash prerequisites
 */
function mix($a, $b, $c)
{
    $a -= $b; $a -= $c; $this->to_int_32($a); $a = (int) ($a ^ ($this->zero_fill($c, 13)));
    $b -= $c; $b -= $a; $this->to_int_32($b); $b = (int) ($b ^ ($a << 8));
    $c -= $a; $c -= $b; $this->to_int_32($c); $c = (int) ($c ^ ($this->zero_fill($b, 13)));
    $a -= $b; $a -= $c; $this->to_int_32($a); $a = (int) ($a ^ ($this->zero_fill($c, 12)));
    $b -= $c; $b -= $a; $this->to_int_32($b); $b = (int) ($b ^ ($a << 16));
    $c -= $a; $c -= $b; $this->to_int_32($c); $c = (int) ($c ^ ($this->zero_fill($b, 5)));
    $a -= $b; $a -= $c; $this->to_int_32($a); $a = (int) ($a ^ ($this->zero_fill($c, 3)));
    $b -= $c; $b -= $a; $this->to_int_32($b); $b = (int) ($b ^ ($a << 10));
    $c -= $a; $c -= $b; $this->to_int_32($c); $c = (int) ($c ^ ($this->zero_fill($b, 15)));
    return array($a, $b, $c);
}

/**
 * ASCII conversion of a string
 */
function strord($string)
{
    $result = array();
    for ($i = 0; $i < strlen($string); $i++) {
        // Square brackets: the {} string-offset syntax no longer works in PHP 8.
        $result[$i] = ord($string[$i]);
    }
    return $result;
}

/**
 * Number formatting for use with pagerank hash
 */
function format_number($number = '', $divchar = ',', $divat = 3)
{
    $decimals = '';
    $formatted = '';
    if (strstr($number, '.')) {
        $pieces = explode('.', $number);
        $number = $pieces[0];
        $decimals = '.' . $pieces[1];
    } else {
        $number = (string) $number;
    }
    if (strlen($number) <= $divat) {
        return $number;
    }
    $j = 0;
    for ($i = strlen($number) - 1; $i >= 0; $i--) {
        if ($j == $divat) {
            $formatted = $divchar . $formatted;
            $j = 0;
        }
        $formatted = $number[$i] . $formatted;
        $j++;
    }
    return $formatted . $decimals;
}
```
I later found a new Google PageRank checksum algorithm at http://wpcn.googlecode.com. It was written for WP, but it works after some light rewriting.
```php
$info = urlencode("info:" . $site);

$checksum = $this->CheckHash($this->HashURL($site));

$url = "http://www.google.com/search?client=navclient-auto&ch=$checksum&features=Rank&q=$info";
```
The all-important `$checksum` is computed as follows:
```php
// Convert a string to a 32-bit integer
function StrToNum($Str, $Check, $Magic)
{
    $Int32Unit = 4294967296; // 2^32

    $length = strlen($Str);
    for ($i = 0; $i < $length; $i++) {
        $Check *= $Magic;
        // If the float is beyond the boundaries of integer (usually +/- 2.15e+9 = 2^31),
        // the result of converting to integer is undefined;
        // refer to http://www.php.net/manual/en/language.types.integer.php
        if ($Check >= $Int32Unit) {
            $Check = ($Check - $Int32Unit * (int) ($Check / $Int32Unit));
            // if the check is less than -2^31
            $Check = ($Check < -2147483648) ? ($Check + $Int32Unit) : $Check;
        }
        // Square brackets: the {} string-offset syntax no longer works in PHP 8.
        $Check += ord($Str[$i]);
    }
    return $Check;
}

// Generate a hash for a URL
function HashURL($String)
{
    $Check1 = $this->StrToNum($String, 0x1505, 0x21);
    $Check2 = $this->StrToNum($String, 0, 0x1003F);

    $Check1 >>= 2;
    $Check1 = (($Check1 >> 4) & 0x3FFFFC0) | ($Check1 & 0x3F);
    $Check1 = (($Check1 >> 4) & 0x3FFC00) | ($Check1 & 0x3FF);
    $Check1 = (($Check1 >> 4) & 0x3C000) | ($Check1 & 0x3FFF);

    $T1 = (((($Check1 & 0x3C0) << 4) | ($Check1 & 0x3C)) << 2) | ($Check2 & 0xF0F);
    $T2 = (((($Check1 & 0xFFFFC000) << 4) | ($Check1 & 0x3C00)) << 0xA) | ($Check2 & 0xF0F0000);

    return ($T1 | $T2);
}

// Generate a checksum for the hash string
function CheckHash($Hashnum)
{
    $CheckByte = 0;
    $Flag = 0;

    $HashStr = sprintf('%u', $Hashnum);
    $length = strlen($HashStr);

    // Walk the digits from right to left, doubling every second one.
    for ($i = $length - 1; $i >= 0; $i--) {
        $Re = (int) $HashStr[$i];
        if (1 === ($Flag % 2)) {
            $Re += $Re;
            $Re = (int) ($Re / 10) + ($Re % 10);
        }
        $CheckByte += $Re;
        $Flag++;
    }

    $CheckByte %= 10;
    if (0 !== $CheckByte) {
        $CheckByte = 10 - $CheckByte;
        if (1 === ($Flag % 2)) {
            if (1 === ($CheckByte % 2)) {
                $CheckByte += 9;
            }
            $CheckByte >>= 1;
        }
    }

    return '7' . $CheckByte . $HashStr;
}
```
This version has its own magic numbers too, such as the return value of `CheckHash()`. As before, fetch `$url` with curl, fopen, or a similar method.
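Some of these magic numbers are at least recognizable: `0x1505` and `0x21` are 5381 and 33, the seed and multiplier of the classic djb2 string hash, and `0x1003F` is 65599, the multiplier used by the sdbm hash. The sketch below restates the `StrToNum()` loop as a standalone function to make that visible. It assumes 64-bit PHP, where the product fits in a native integer and can be reduced with a bit mask. The name `strToNum32()` is made up for this example and is not part of the original code.

```php
<?php
// Illustrative restatement of StrToNum(); assumes 64-bit PHP (PHP_INT_SIZE === 8).
function strToNum32(string $str, int $check, int $magic): int
{
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        // Multiply, then keep only the low 32 bits of the product.
        $check = ($check * $magic) & 0xFFFFFFFF;
        // Add the byte value afterwards, as the original loop does.
        $check += ord($str[$i]);
    }
    return $check;
}

$site = 'http://www.example.com';
$check1 = strToNum32($site, 0x1505, 0x21); // djb2-style: seed 5381, multiplier 33
$check2 = strToNum32($site, 0, 0x1003F);   // sdbm-style: seed 0, multiplier 65599
printf("%u %u\n", $check1, $check2);
```

Because the byte is added after the reduction, the final value can exceed 2^32 by a small amount, which matches the original loop. The bit shuffling in `HashURL()` and the leading `'7'` in `CheckHash()` remain unexplained constants.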
Sigh. I still don't understand why Google doesn't offer an API for retrieving PR.
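roga,
I found out it was a browser problem: once I downloaded GreenBrowser, everything worked.
If anyone else is on Vista with IE and can't open the site,
and clearing cookies doesn't help either,
you can tell them to try downloading this browser.
Thank you ^^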
So sorry about that!~ And thank you ^^
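roga, sorry to bother you here. I'm a sitestates user, and since posting on the message board there keeps failing, I'm asking for help here instead.
Since the day before yesterday I haven't been able to log in. After I enter my username and password and press submit, the page acts as if nothing happened: the username and password fields are blank again, waiting for input.
I tried the forgot-password request and logged in with the new account the system gave me, but the same thing happens ><
PS: is there any way to delete comments I left on the message board in the past? It's a small privacy issue @@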
Hello, please try clearing your browser's cookies and then try again. Thank you 🙂
Right? Wouldn't it be nice if they just offered a little tool where you send in a URL and it spits back a number? Q_Q