Redis Internals Revisited 04: Data Structures - Hash Table (dict)

Posted by 九卷 on 2022-02-28

Original URL: https://www.cnblogs.com/jiujuan/p/15944061.html

Redis data structures

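Introduction to hash functions

A hash function, also called a hashing algorithm, "compresses" data into a digest, sometimes called a "fingerprint". The result is smaller than the input and has a fixed size and format.

A hash function scrambles and mixes the input data to produce a new value, the hash value.

A familiar example is MD5, which is often applied to user login passwords. Although this is commonly described as encryption, MD5 is a hash function: the result cannot be decrypted back into the input.

value = hash_function(input_data), where the computed value has a fixed size.

md5("hashmd5") = 46BD4AA9F79D359530D3D873BAC6F3DC, a 32-character hexadecimal MD5 value (128 bits).

A 16-character MD5 form also exists; it is typically the middle 16 characters of the 32-character value.

Can two different inputs produce the same hash value?

Yes. This is called a hash collision.

A good hash function therefore matters: it should make collisions as rare as possible.

Commonly used hash algorithms: MD5, SHA-1, SHA-256, SHA-512, and so on.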
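
To make the fixed-size property concrete, here is a minimal sketch using the classic djb2 string hash. It is an illustrative stand-in, not MD5 and not a cryptographic hash, and the name `djb2_hash` exists only for this example. Whatever the input length, the result is always a 32-bit value, so some distinct inputs must share a hash value.

```c
#include <stdint.h>
#include <stddef.h>

/* djb2: hash = hash * 33 + c, starting from 5381.
 * Illustrative only; not a cryptographic hash like MD5 or SHA-256. */
uint32_t djb2_hash(const char *data, size_t len)
{
    uint32_t hash = 5381;
    for (size_t i = 0; i < len; i++)
        hash = hash * 33u + (unsigned char)data[i];
    return hash; /* always 32 bits, however long the input is */
}
```

With this function, `djb2_hash("aB", 2)` and `djb2_hash("b!", 2)` return the same value, because 33 * 'a' + 'B' equals 33 * 'b' + '!'. That is a collision in the sense described above.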

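Introduction to hash tables

Hash tables go by many names, such as hashtable, hashmap, symbol table, and map. The names differ, but the underlying data structure is largely the same.

A map represents a mapping relationship; it usually stores key:value pairs.

In a hash table, the key is run through a hash function to decide where the entry is stored, and the key is associated with its value.

A hash table is a composite structure: an array plus linked lists.

For a key:value pair, the array slot is chosen by taking the key's hash value modulo the array length, and the linked list at that slot stores the key:value pair itself.

As shown in the figure below:

Why does the figure show 2 key:val pairs together in one slot?

That is a hash collision. Separate chaining (a linked list per slot) is used to resolve it.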
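
The array-plus-linked-list layout can be sketched in a few lines of C. This is a toy illustration, not Redis code: the names `table`, `node`, `table_put`, `table_get` and the fixed length `NBUCKETS` are all made up for the example, and keys are borrowed pointers rather than copies.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 8  /* hypothetical fixed array length for the sketch */

typedef struct node {
    const char *key;     /* borrowed pointer; the caller keeps the string alive */
    int value;
    struct node *next;   /* next pair stored in the same slot */
} node;

typedef struct {
    node *buckets[NBUCKETS]; /* array part: one list head per slot */
} table;

static unsigned long hash_str(const char *s)
{
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert or update a key. Returns 0 on success, -1 if allocation fails. */
int table_put(table *t, const char *key, int value)
{
    unsigned long idx = hash_str(key) % NBUCKETS;  /* hash modulo array length */
    for (node *n = t->buckets[idx]; n; n = n->next)
        if (strcmp(n->key, key) == 0) { n->value = value; return 0; }

    node *n = malloc(sizeof(*n));
    if (!n) return -1;
    n->key = key;
    n->value = value;
    n->next = t->buckets[idx];   /* a colliding key is linked in, not overwritten */
    t->buckets[idx] = n;
    return 0;
}

/* Returns a pointer to the stored value, or NULL if the key is absent. */
int *table_get(table *t, const char *key)
{
    unsigned long idx = hash_str(key) % NBUCKETS;
    for (node *n = t->buckets[idx]; n; n = n->next)
        if (strcmp(n->key, key) == 0) return &n->value;
    return NULL;
}
```

With 8 slots, the keys "a" and "i" land in the same slot in this sketch, so both end up on one list and both remain retrievable.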

Hash tables and the dict in Redis

1. Hash table structure definitions

Hash table: dictht

In Redis 3.0 the hash table is called dictht. Its definition:

// https://github.com/redis/redis/blob/3.0/src/dict.h#L69

/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
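typedef struct dictht { // hash table
    dictEntry **table; // bucket array; each element is a pointer to a dictEntry
    unsigned long size; // size of the hash table, i.e. the length of the table array
    unsigned long sizemask; // mask used to compute the index; always size-1
    unsigned long used; // number of nodes (key-value pairs) currently stored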
} dictht;
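
Why keep a sizemask equal to size-1? When the size is a power of two, a bitwise AND with the mask gives the same index as a modulo, and it is cheaper. A minimal sketch of that idea follows; the function name `bucket_index` is hypothetical and not part of Redis.

```c
#include <assert.h>

/* Map a hash value to a bucket index using a size-1 mask. */
unsigned long bucket_index(unsigned long hash, unsigned long size)
{
    /* The mask trick is only valid when size is a power of two. */
    assert(size != 0 && (size & (size - 1)) == 0);
    unsigned long sizemask = size - 1;
    return hash & sizemask;   /* same result as hash % size */
}
```

For example, with size 8 the mask is 7 (binary 111), so a hash of 13 (binary 1101) maps to index 5, which is also 13 % 8.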

Hash table node: dictEntry

A hash table node is elsewhere called a bucket, a node, and so on; the meaning is the same.

How is dictEntry, the node type used by the Redis 3.0 dictht above, defined? The code:

// https://github.com/redis/redis/blob/3.0/src/dict.h#L47
typedef struct dictEntry {
    void *key;  // the key
    union { // the value
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; // next node in the same bucket; chaining resolves hash collisions
} dictEntry;

The key field stores the key of the pair and the v field stores the value. v can be a pointer, a uint64_t integer, an int64_t integer, or a double.
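
The point of the union is that small numeric values can live inside the entry itself, with no separate allocation. The sketch below repeats the struct layout quoted above and adds three setter helpers; the helper names are hypothetical and only illustrate the idea.

```c
#include <stdint.h>
#include <stddef.h>

/* Local copy of the dictEntry layout quoted above. */
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

/* Hypothetical helpers: each stores a value directly in the entry.
 * All union members share the same storage, so only the member
 * written last holds a meaningful value. */
void entry_set_signed(dictEntry *e, int64_t n) { e->v.s64 = n; }
void entry_set_double(dictEntry *e, double x)  { e->v.d = x; }
void entry_set_ptr(dictEntry *e, void *p)      { e->v.val = p; }
```

The caller has to remember which member it wrote; the entry itself carries no tag saying which of the four interpretations is current.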

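The relationship between dictEntry nodes and the dictht hash table is shown in the figure below:

next: points to the next hash table node; chaining is used to resolve hash collisions.

Hash collisions:

The next field in the dictEntry struct above exists to handle keys that hash to the same index.

Colliding entries are linked together: each entry's next pointer records the following one.

Hash algorithms

Redis uses several hash functions to compute hash values.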

dictIntHashFunction: hash function for integer keys
```
unsigned int dictIntHashFunction(unsigned int key)
```
dictGenHashFunction: the MurmurHash2 algorithm by Austin Appleby, used to hash strings
```
unsigned int dictGenHashFunction(const void *key, int len)
```

dictGenCaseHashFunction: a case-insensitive hash function based on the djb hash algorithm
```
/* And a case insensitive hash function (based on djb hash) */
unsigned int dictGenCaseHashFunction(const unsigned char *buf, int len)
```
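
Only the signature is quoted above, so here is a minimal sketch of how a case-insensitive djb-style hash can work: lowercase each byte before mixing it in. The function name `case_insensitive_hash` and the starting value 5381 (the conventional djb seed) are assumptions made for this example.

```c
#include <ctype.h>

/* djb-style hash that lowercases every byte first,
 * so "Redis" and "REDIS" produce the same hash value. */
unsigned int case_insensitive_hash(const unsigned char *buf, int len)
{
    unsigned int hash = 5381;  /* assumed seed for this sketch */
    while (len--)
        hash = ((hash << 5) + hash) + (unsigned int)tolower(*buf++); /* hash * 33 + c */
    return hash;
}
```

A case-sensitive variant would simply drop the tolower() call.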

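2. The dictionary dict

Dictionary: dict

As covered above, Redis represents a hash table with dictht. To actually use the hash table, however, Redis defines another data structure, the dictionary dict.

Why define a separate dict structure?

To make the hash table convenient to operate on while it is being resized (rehash). For this purpose dict holds 2 hash tables, ht[2].

The dict structure, defined in dict.h: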

typedef struct dict {
    dictType *type; // pointer to a dictType struct holding type-specific functions (see below)
    void *privdata; // private data passed as an argument to the dictType functions
    dictht ht[2]; // hash tables; ht[2] means there are 2 tables
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */ // rehash progress marker
    int iterators; /* number of iterators currently running */
} dict;

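*type: points to a set of functions that operate on a particular kind of key-value pair. Redis installs different type-specific functions for dictionaries used for different purposes.

ht[2]: holds 2 dictht hash tables. Why 2? Normally only ht[0] is used; ht[1] is used during a rehash.

rehashidx: related to rehash; it records the current rehash progress. When no rehash is in progress, rehashidx = -1.

The dictType structure, defined in dict.h: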

typedef struct dictType {
    unsigned int (*hashFunction)(const void *key); // computes the hash value of a key
    void *(*keyDup)(void *privdata, const void *key); // copies a key
    void *(*valDup)(void *privdata, const void *obj); // copies a value
    int (*keyCompare)(void *privdata, const void *key1, const void *key2); // compares two keys
    void (*keyDestructor)(void *privdata, void *key); // destroys a key
    void (*valDestructor)(void *privdata, void *obj); // destroys a value
} dictType;
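
To see how such a table of function pointers might be filled in, here is a sketch for a dictionary whose keys are heap-copied C strings. The struct layout repeats the one quoted above; every `cstr_*` function and the `cstrDictType` variable are hypothetical names for this example and do not come from the Redis source.

```c
#include <stdlib.h>
#include <string.h>

/* Local copy of the dictType layout quoted above. */
typedef struct dictType {
    unsigned int (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

static unsigned int cstr_hash(const void *key)
{
    const unsigned char *s = key;
    unsigned int h = 5381;
    while (*s) h = h * 33 + *s++;
    return h;
}

static void *cstr_dup(void *privdata, const void *key)
{
    (void)privdata;                 /* unused in this sketch */
    size_t n = strlen(key) + 1;
    char *copy = malloc(n);
    if (copy) memcpy(copy, key, n);
    return copy;
}

static int cstr_compare(void *privdata, const void *a, const void *b)
{
    (void)privdata;
    return strcmp(a, b) == 0;       /* non-zero means the keys are equal */
}

static void cstr_free(void *privdata, void *key)
{
    (void)privdata;
    free(key);
}

/* Keys are copied and freed by the dict; values are stored as-is (NULL hooks). */
dictType cstrDictType = {
    cstr_hash, cstr_dup, NULL, cstr_compare, cstr_free, NULL
};
```

Leaving valDup and valDestructor as NULL models a dictionary that neither copies nor frees its values, so ownership of the values stays with the caller.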

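Diagram of the dict:

3. rehash

a. What is rehash?

Growing or shrinking the capacity of the hash table.

b. Why is rehash needed?

As the amount of data in a hash table keeps growing while its capacity stays fixed, it becomes more likely that 2 or more keys are assigned to the same index of the table array, which is a collision.
Collisions can be handled with separate chaining, but long chains slow down lookups. To keep the hash table fast, collisions should be kept rare, which means expanding the table; when most slots sit empty, shrinking it saves memory.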

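Hash tables have the notion of a load factor:

load factor = number of key-value pairs stored in the hash table (slots in use) / size of the hash table

load_factor = ht[0].used / ht[0].size

The load factor measures how full the hash table is: the fewer key-value pairs it holds relative to its size, the smaller the load factor.

When the load factor crosses a threshold, the hash table is resized to keep its capacity within a reasonable range:

- Expand the hash table
- Shrink the hash table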

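c. When do expansion and shrinking happen?

Expansion conditions

Either of the following conditions triggers an expansion of the hash table:
1. The server is not currently running the BGSAVE or BGREWRITEAOF command, and the load factor is >= 1
2. The server is currently running the BGSAVE or BGREWRITEAOF command, and the load factor is > 5

Shrink condition
1. The load factor of the hash table is < 0.1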
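
The conditions above can be written down as a small decision helper. This is a sketch of the rules exactly as stated, using floating-point load factors; the function names and the `child_running` flag (true while BGSAVE or BGREWRITEAOF is in progress) are hypothetical, and an empty table (size 0) is left out because it is initialized separately.

```c
#include <stdbool.h>

/* Load factor as defined above: used / size. */
double load_factor(unsigned long used, unsigned long size)
{
    return size ? (double)used / (double)size : 0.0;
}

/* Expansion rule: >= 1 normally, > 5 while a background save/rewrite runs. */
bool should_expand(unsigned long used, unsigned long size, bool child_running)
{
    double lf = load_factor(used, size);
    return child_running ? lf > 5.0 : lf >= 1.0;
}

/* Shrink rule: load factor below 0.1. */
bool should_shrink(unsigned long used, unsigned long size)
{
    return size > 0 && load_factor(used, size) < 0.1;
}
```

For example, a table of size 4 holding 4 entries expands when no background job is running, but the same table waits until it holds more than 20 entries while one is running. A table of size 16 holding a single entry (load factor 0.0625) qualifies for shrinking.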

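d. How are expansion and shrinking carried out?

In other words, what are the steps?

1. Allocate memory for ht[1] of the dictionary. The size depends on the operation and on the number of key-value pairs currently in ht[0]:
- For an expansion, the size of ht[1] is the first power of two (2^n) greater than or equal to ht[0].used * 2
- For a shrink, the size of ht[1] is the first power of two (2^n) greater than or equal to ht[0].used
2. Recompute the hash value and index of every key in ht[0] and place each key-value pair at the corresponding position in ht[1]
3. Once all key-value pairs have moved from ht[0] to ht[1], free ht[0], make ht[1] the new ht[0], and create a new empty hash table at ht[1]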
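
The migrate-and-swap steps can be sketched as follows. This is a simplified illustration that moves one bucket per call, in the spirit of the incremental rehashing mentioned in the dictht comment. The types `entry`, `htable`, `mini_dict` and the function `rehash_step` are hypothetical, ht[1] is assumed to be allocated already, and each key is used directly as its own hash value to keep the code short.

```c
#include <stdlib.h>

typedef struct entry {
    unsigned long key;          /* the key doubles as its own hash value here */
    struct entry *next;
} entry;

typedef struct {
    entry **table;
    unsigned long size, sizemask, used;
} htable;

typedef struct {
    htable ht[2];
    long rehashidx;             /* -1 when no rehash is in progress */
} mini_dict;

/* Migrate one non-empty bucket of ht[0] into ht[1].
 * Returns 1 while work remains, 0 once the rehash is complete
 * (or if none was in progress). */
int rehash_step(mini_dict *d)
{
    if (d->rehashidx == -1) return 0;

    if (d->ht[0].used == 0) {
        /* Everything has moved: free the old array, ht[1] becomes ht[0],
         * and ht[1] is reset to an empty table. */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1] = (htable){0};
        d->rehashidx = -1;
        return 0;
    }

    /* Buckets below rehashidx are already empty; skip to the next used one. */
    while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;

    entry *e = d->ht[0].table[d->rehashidx];
    while (e) {
        entry *next = e->next;
        unsigned long idx = e->key & d->ht[1].sizemask; /* index in the new table */
        e->next = d->ht[1].table[idx];
        d->ht[1].table[idx] = e;
        d->ht[0].used--;
        d->ht[1].used++;
        e = next;
    }
    d->ht[0].table[d->rehashidx] = NULL;
    d->rehashidx++;
    return 1;
}
```

Calling rehash_step in a loop until it returns 0 performs the whole migration; calling it a little at a time spreads the cost over many operations. Two keys that shared a bucket in the small table can end up in different buckets of the larger one, which is exactly what expansion is for.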

A brief look at the expansion code:

_dictExpandIfNeeded:

// https://github.com/redis/redis/blob/3.0/src/dict.c#L923

/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
    /* Incremental rehashing already in progress. Return. */
    if (dictIsRehashing(d)) return DICT_OK; // a rehash is already in progress: return

    /* If the hash table is empty expand it to the initial size. */
    // if ht[0] is empty, create and initialize ht[0], then return
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
    /* Expand the dict when ht[0].used/ht[0].size >= 1 and either
       dict_can_resize == 1 or ht[0].used/ht[0].size > 5 */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        return dictExpand(d, d->ht[0].used*2);
    }
    return DICT_OK;
}

// https://github.com/redis/redis/blob/3.0/src/dict.c#L58
static int dict_can_resize = 1;
static unsigned int dict_force_resize_ratio = 5;
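
One detail in the condition is easy to miss: used and size are integers, so used/size is an integer division. The sketch below mirrors the test with plain parameters instead of dict fields; the function name `needs_expand` and its parameters are hypothetical.

```c
/* Mirrors the expansion test in _dictExpandIfNeeded using plain arguments.
 * can_resize plays the role of dict_can_resize, force_ratio that of
 * dict_force_resize_ratio. Returns 1 if the table should be expanded. */
int needs_expand(unsigned long used, unsigned long size,
                 int can_resize, unsigned int force_ratio)
{
    if (size == 0) return 1;   /* empty table: expand to the initial size */
    /* used / size truncates: 21 / 4 is 5, not 5.25 */
    return used >= size && (can_resize || used / size > force_ratio);
}
```

With resizing disabled and a ratio of 5, a table of size 4 holding 21 entries is not expanded, because 21 / 4 truncates to 5. Expansion is forced only once the table holds 24 entries, where 24 / 4 is 6.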

dictExpand:

// https://github.com/redis/redis/blob/3.0/src/dict.c#L204
/* Expand or create the hash table */
int dictExpand(dict *d, unsigned long size)
{
    dictht n; /* the new hash table */
    unsigned long realsize = _dictNextPower(size); // size of the new table, for expansion or shrinking

    /* the size is invalid if it is smaller than the number of
     * elements already inside the hash table */
    // return an error if a rehash is in progress or the requested size is smaller than the number of elements in use
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    /* Rehashing to the same table size is not useful. */
    if (realsize == d->ht[0].size) return DICT_ERR;

    /* Allocate the new hash table and initialize all pointers to NULL */
    n.size = realsize;
    n.sizemask = realsize-1;
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    n.used = 0;

    /* Is this the first initialization? If so it's not really a rehashing
     * we just set the first hash table so that it can accept keys. */
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    /* Prepare a second hash table for incremental rehashing */
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}
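
dictExpand rounds the requested size with _dictNextPower, whose body is not quoted here. The sketch below is an assumption about what such a helper has to do, based on the sizing rule described earlier (the first power of two greater than or equal to the request, starting from the initial size of 4). The names `next_power` and `HT_INITIAL_SIZE` are hypothetical.

```c
#include <limits.h>

#define HT_INITIAL_SIZE 4   /* same value as DICT_HT_INITIAL_SIZE */

/* Smallest power of two that is >= size, never below HT_INITIAL_SIZE. */
unsigned long next_power(unsigned long size)
{
    unsigned long i = HT_INITIAL_SIZE;

    if (size >= LONG_MAX) return LONG_MAX;  /* guard against overflow while doubling */
    while (i < size)
        i *= 2;
    return i;
}
```

Worked through with ht[0].used = 5: an expansion requests 5 * 2 = 10 and gets a table of 16 slots, while a shrink requests 5 and gets a table of 8 slots. Any request of 4 or less yields 4.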

Shrinking:

dictResize

// https://github.com/redis/redis/blob/3.0/src/dict.c#L192
int dictResize(dict *d)
{
    int minimal;

    // dict_can_resize is set to 1 at https://github.com/redis/redis/blob/3.0/src/dict.c#L58; if it is 0, return without doing anything further
    // likewise return without resizing if dictIsRehashing() reports a rehash in progress
    if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;
    minimal = d->ht[0].used; // number of entries in use in ht[0]
    if (minimal < DICT_HT_INITIAL_SIZE) // never go below DICT_HT_INITIAL_SIZE = 4
        minimal = DICT_HT_INITIAL_SIZE;
    return dictExpand(d, minimal); // dictExpand does the actual resize of the dict
}

// https://github.com/redis/redis/blob/3.0/src/dict.h#L100
/* This is the initial size of every hash table */
#define DICT_HT_INITIAL_SIZE     4
