Consistent Hashing: A PHP Implementation

Posted by nongnong on 2019-12-06

This article is a repost.

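Consistent hashing was proposed at MIT in 1997 as a distributed hash table (DHT) algorithm. Its design goal was to solve the hot spot problem on the Internet, a motivation very similar to that of CARP (Cache Array Routing Protocol). Consistent hashing fixes the problems caused by the simple hashing that CARP uses, which made distributed hash tables practical in P2P environments.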

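Consistent hashing defines four criteria for judging how good a hash algorithm is in a dynamically changing cache environment:

1. Balance: the hash results should be spread as evenly as possible across all caches, so that all cache space gets used. Many hash algorithms satisfy this condition.

2. Monotonicity: suppose some content has already been assigned to caches by hashing, and new caches are then added to the system. The hash should guarantee that previously assigned content maps either to its original cache or to one of the new caches, and never to a different cache in the old set.

3. Spread: in a distributed environment, a client may not see every cache, only a subset. When clients hash content onto caches, different clients may see different sets of caches and therefore compute inconsistent results, so the same content ends up mapped to different caches by different clients. This should clearly be avoided, because storing the same content in several caches lowers storage efficiency. Spread is defined as the severity of this situation. A good hash algorithm should avoid such inconsistency as far as possible, that is, keep spread low.

4. Load: load looks at the spread problem from the other side. Since different clients may map the same content to different caches, a particular cache may also be assigned different content by different clients. Like spread, this should be avoided, so a good hash algorithm should keep the load on each cache as low as possible.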

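In a distributed cluster, adding and removing machines, and having a failed machine drop out of the cluster automatically, are the most basic cluster-management operations. With the common hash(object) % N scheme, adding or removing a machine changes N, so most existing data can no longer be found on the machine it now maps to, which seriously violates monotonicity. The following explains how consistent hashing is designed.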
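To make the problem with hash(object) % N concrete, here is a minimal sketch that counts how many keys change machines when a cluster grows from 5 to 6 nodes. The helper names `moduloTarget` and `countMovedKeys` and the `object-N` key format are illustrative assumptions, not part of the article's code, and the sketch assumes 64-bit PHP.

```php
<?php
// Hypothetical helper, not part of the article's code: pick a server index as hash(key) % N.
// Assumes 64-bit PHP, where crc32() always returns a non-negative integer.
function moduloTarget(string $key, int $serverCount): int
{
    return crc32($key) % $serverCount;
}

// Count how many sample keys land on a different server index
// when the number of servers changes from $before to $after.
function countMovedKeys(int $keyCount, int $before, int $after): int
{
    $moved = 0;
    for ($i = 0; $i < $keyCount; $i++) {
        $key = "object-$i";
        if (moduloTarget($key, $before) !== moduloTarget($key, $after)) {
            $moved++;
        }
    }
    return $moved;
}

$moved = countMovedKeys(10000, 5, 6);
printf("%d of 10000 keys moved (%.1f%%)\n", $moved, $moved / 100);
```

A key keeps its index only when its hash gives the same remainder modulo 5 and modulo 6, which for a uniform hash happens for about one key in six, so roughly five sixths of the keys should move.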

Ring hash space
A common hash function maps each key into a space of 2^32 buckets, that is, the integer range 0 to 2^32 - 1. Joining the two ends of this range turns it into a closed ring.
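The ring can be sketched in a few lines: place each node at the position given by its hash, then assign a key to the first node found clockwise from the key's own position, wrapping around at the end. This is a simplified illustration with one position per node; `buildRing`, `locate`, and the `node-*` and `object-N` names are assumptions made for the example, and 64-bit PHP is assumed.

```php
<?php
// Hypothetical sketch, not the article's Flexihash class: a ring with one position per node.
// Assumes 64-bit PHP, where crc32() returns an integer in 0 .. 2^32 - 1.
function buildRing(array $nodes): array
{
    $ring = [];
    foreach ($nodes as $node) {
        $ring[crc32($node)] = $node; // position on the ring => node
    }
    ksort($ring); // ascending positions, i.e. clockwise order
    return $ring;
}

// Return the first node past the key's position, wrapping to the start of the ring.
function locate(array $ring, string $key): string
{
    if (empty($ring)) {
        throw new InvalidArgumentException('The ring has no nodes');
    }
    $position = crc32($key);
    foreach ($ring as $nodePosition => $node) {
        if ($nodePosition > $position) {
            return $node;
        }
    }
    return reset($ring); // went past the last position: wrap around
}

$before = buildRing(['node-a', 'node-b', 'node-c']);
$after  = buildRing(['node-a', 'node-b', 'node-c', 'node-d']);

$moved = 0;          // keys whose node changed after adding node-d
$movedElsewhere = 0; // keys that moved to some node other than node-d
for ($i = 0; $i < 1000; $i++) {
    $key = "object-$i";
    $old = locate($before, $key);
    $new = locate($after, $key);
    if ($old !== $new) {
        $moved++;
        if ($new !== 'node-d') {
            $movedElsewhere++;
        }
    }
}
printf("%d keys moved, %d of them to a node other than node-d\n", $moved, $movedElsewhere);
```

Barring a hash collision between node names, the second count should be 0: adding a node only takes keys away from its clockwise successor, which is the monotonicity property described earlier.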

PHP implementation of consistent hashing

<?php

class Flexihash
{   
    /**
     * The number of positions to hash each target to.
     *
     * @var int
     * @comment Number of virtual nodes per target; mitigates uneven node distribution
     */
    private $_replicas = 64;

    /**
     * The hash algorithm, encapsulated in a Flexihash_Hasher implementation.
     * @var object Flexihash_Hasher
     * @comment Hash method to use: md5 or crc32
     */
    private $_hasher;

    /**
     * Internal counter for current number of targets.
     * @var int
     * @comment Target counter
     */
    private $_targetCount = 0;

    /**
     * Internal map of positions (hash outputs) to targets
     * @var array { position => target, ... }
     * @comment Position-to-target map; lookup uses it to find the target for a position
     */
    private $_positionToTarget = array();

    /**
     * Internal map of targets to lists of positions that target is hashed to.
     * @var array { target => [ position, position, ... ], ... }
     * @comment Target-to-positions map; used when removing a target
     */
    private $_targetToPositions = array();

    /**
     * Whether the internal map of positions to targets is already sorted.
     * @var boolean
     * @comment Whether the positions are already sorted
     */
    private $_positionToTargetSorted = false;

    /**
     * Constructor
     * @param object $hasher Flexihash_Hasher
     * @param int $replicas Amount of positions to hash each target to.
     * @comment Constructor; sets the hash method and the number of virtual nodes. More virtual nodes give a more even distribution but slower lookups
     */
    public function __construct(?Flexihash_Hasher $hasher = null, $replicas = null)
    {
        $this->_hasher = $hasher ? $hasher : new Flexihash_Crc32Hasher();
        if (!empty($replicas)) $this->_replicas = $replicas;
    }

    /**
     * Add a target.
     * @param string $target
     * @chainable
     * @comment Adds a target, hashing it to multiple virtual positions according to the replica count
     */
    public function addTarget($target)
    {
        if (isset($this->_targetToPositions[$target]))
        {
            throw new Flexihash_Exception("Target '$target' already exists.");
        }

        $this->_targetToPositions[$target] = array();

        // hash the target into multiple positions
        for ($i = 0; $i < $this->_replicas; $i++)
        {
            $position = $this->_hasher->hash($target . $i);
            $this->_positionToTarget[$position] = $target; // lookup
            $this->_targetToPositions[$target] []= $position; // target removal
        }

        $this->_positionToTargetSorted = false;
        $this->_targetCount++;

        return $this;
    }

    /**
     * Add a list of targets.
     * @param array $targets
     * @chainable
     */
    public function addTargets($targets)
    {
        foreach ($targets as $target)
        {
            $this->addTarget($target);
        }

        return $this;
    }

    /**
     * Remove a target.
     * @param string $target
     * @chainable
     */
    public function removeTarget($target)
    {
        if (!isset($this->_targetToPositions[$target]))
        {
            throw new Flexihash_Exception("Target '$target' does not exist.");
        }

        foreach ($this->_targetToPositions[$target] as $position)
        {
            unset($this->_positionToTarget[$position]);
        }

        unset($this->_targetToPositions[$target]);

        $this->_targetCount--;

        return $this;
    }

    /**
     * A list of all potential targets
     * @return array
     */
    public function getAllTargets()
    {
        return array_keys($this->_targetToPositions);
    }

    /**
     * Both internal maps (positions to targets, and targets to positions), e.g. for debugging
     * @return array
     */
    public function getAll()
    {
        return array(
            "targets"=>$this->_positionToTarget, 
            "positions"=>$this->_targetToPositions);
    }

    /**
     * Looks up the target for the given resource.
     * @param string $resource
     * @return string
     */
    public function lookup($resource)
    {
        $targets = $this->lookupList($resource, 1);
        if (empty($targets)) throw new Flexihash_Exception('No targets exist');
        return $targets[0]; // index 0 is the target nearest to the resource's position
    }

    /**
     * Get a list of targets for the resource, in order of precedence.
     * Up to $requestedCount targets are returned, less if there are fewer in total.
     *
     * @param string $resource
     * @param int $requestedCount The length of the list to return
     * @return array List of targets
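     * @comment Finds the targets for the given resource.
     *          Returns an empty array if there are no targets, or the sole target if there is only one.
     *          Otherwise hashes the resource, sorts all positions, and walks the sorted positions
     *          to find the first one past the resource's position.
     *          If none is found, wraps around to the first sorted position (forming a ring).
     *          Returns the targets found.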
     */
    public function lookupList($resource, $requestedCount)
    {
        if (!$requestedCount)
            throw new Flexihash_Exception('Invalid count requested');

        // handle no targets
        if (empty($this->_positionToTarget))
            return array();

        // optimize single target
        if ($this->_targetCount == 1)
            return array_unique(array_values($this->_positionToTarget));

        // hash resource to a position
        $resourcePosition = $this->_hasher->hash($resource);

        $results = array();
        $collect = false;

        $this->_sortPositionTargets();

        // search values above the resourcePosition
        foreach ($this->_positionToTarget as $key => $value)
        {
            // start collecting targets after passing resource position
            if (!$collect && $key > $resourcePosition)
            {
                $collect = true;
            }

            // only collect the first instance of any target
            if ($collect && !in_array($value, $results))
            {
                $results []= $value;
            }
            // return when enough results, or list exhausted
            if (count($results) == $requestedCount || count($results) == $this->_targetCount)
            {
                return $results;
            }
        }

        // loop to start - search values below the resourcePosition
        foreach ($this->_positionToTarget as $key => $value)
        {
            if (!in_array($value, $results))
            {
                $results []= $value;
            }

            // return when enough results, or list exhausted
            if (count($results) == $requestedCount || count($results) == $this->_targetCount)
            {
                return $results;
            }
        }

        // return results after iterating through both "parts"
        return $results;
    }

    public function __toString()
    {
        return sprintf(
            '%s{targets:[%s]}',
            get_class($this),
            implode(',', $this->getAllTargets())
        );
    }

    // ----------------------------------------
    // private methods

    /**
     * Sorts the internal mapping (positions to targets) by position
     */
    private function _sortPositionTargets()
    {
        // sort by key (position) if not already
        if (!$this->_positionToTargetSorted)
        {
            ksort($this->_positionToTarget, SORT_REGULAR);
            $this->_positionToTargetSorted = true;
        }
    }

}

/**
 * Hashes given values into a sortable fixed size address space.
 *
 * @author Paul Annesley
 * @package Flexihash
 * @licence http://www.opensource.org/licenses/mit-license.php
 */
interface Flexihash_Hasher
{

    /**
     * Hashes the given string into a 32bit address space.
     *
     * Note that the output may be more than 32bits of raw data, for example
     * hexadecimal characters representing a 32bit value.
     *
     * The data must have 0xFFFFFFFF possible values, and be sortable by
     * PHP sort functions using SORT_REGULAR.
     *
     * @param string
     * @return mixed A sortable format with 0xFFFFFFFF possible values
     */
    public function hash($string);

}

/**
 * Uses CRC32 to hash a value into a signed 32bit int address space.
 * Under 32bit PHP this (safely) overflows into negative ints.
 *
 * @author Paul Annesley
 * @package Flexihash
 * @licence http://www.opensource.org/licenses/mit-license.php
 */
class Flexihash_Crc32Hasher
    implements Flexihash_Hasher
{

    /* (non-phpdoc)
     * @see Flexihash_Hasher::hash()
     */
    public function hash($string)
    {
        return crc32($string);
    }

}

/**
 * Uses MD5 to hash a value into a 32bit address space, represented as 8 hexadecimal characters.
 *
 * @author Paul Annesley
 * @package Flexihash
 * @licence http://www.opensource.org/licenses/mit-license.php
 */
class Flexihash_Md5Hasher
    implements Flexihash_Hasher
{

    /* (non-phpdoc)
     * @see Flexihash_Hasher::hash()
     */
    public function hash($string)
    {
        return substr(md5($string), 0, 8); // 8 hexits = 32bit

        // 4 bytes of binary md5 data could also be used, but
        // performance seems to be the same.
    }

}

/**
 * An exception thrown by Flexihash.
 *
 * @author Paul Annesley
 * @package Flexihash
 * @licence http://www.opensource.org/licenses/mit-license.php
 */
class Flexihash_Exception extends Exception
{
}

$hash = new Flexihash();
$targets=array(
    "192.168.1.1:11011",
    "192.168.1.1:11012",
    "192.168.1.1:11013",
    "192.168.1.1:11014",
    "192.168.1.1:11015",
);
$hash->addTargets($targets);
for ($i=0; $i < 25; $i++) {
    $resource = sprintf("format %d",$i);
    var_dump($resource." --> ".$hash->lookup($resource));
}
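
The `$_replicas` setting above hashes each target to several virtual positions so that keys spread more evenly. The following standalone sketch compares how many sample keys each node receives with 1 replica versus 64. It reuses the `$target . $i` position scheme from `addTarget()`, but `buildReplicatedRing`, `countKeysPerNode`, and the `object-N` keys are hypothetical names for illustration, and 64-bit PHP is assumed.

```php
<?php
// Hypothetical sketch, not the Flexihash class itself. Assumes 64-bit PHP.
// Build a ring where each node is hashed to $replicas positions (virtual nodes).
function buildReplicatedRing(array $nodes, int $replicas): array
{
    $ring = [];
    foreach ($nodes as $node) {
        for ($i = 0; $i < $replicas; $i++) {
            $ring[crc32($node . $i)] = $node; // same "$target . $i" scheme as addTarget()
        }
    }
    ksort($ring); // ascending positions, i.e. clockwise order
    return $ring;
}

// Count how many of $keyCount sample keys land on each node.
function countKeysPerNode(array $ring, int $keyCount): array
{
    $positions = array_keys($ring);
    $counts = array_fill_keys(array_unique(array_values($ring)), 0);
    for ($k = 0; $k < $keyCount; $k++) {
        $hash = crc32("object-$k");
        $owner = $ring[$positions[0]]; // default: wrap around to the first position
        foreach ($positions as $position) {
            if ($position > $hash) {
                $owner = $ring[$position];
                break;
            }
        }
        $counts[$owner]++;
    }
    return $counts;
}

$nodes = ['192.168.1.1:11011', '192.168.1.1:11012', '192.168.1.1:11013'];
foreach ([1, 64] as $replicas) {
    $counts = countKeysPerNode(buildReplicatedRing($nodes, $replicas), 3000);
    printf("replicas=%d: %s\n", $replicas, json_encode($counts));
}
```

With more replicas the per-node counts should usually come out closer to each other, at the cost of a longer position list to sort and scan on each lookup.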
This work is licensed under a CC license; reposts must credit the author and link to this article.
