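Redis Data Structures and Object Encoding

Posted by buttercup on 2020-10-31

Original URL: https://www.cnblogs.com/buttercup/p/13853114.html

Data structure implementations

Most readers will already be familiar with the Redis data types:

string: strings (can hold text, integers, or bitmaps)
list: lists (can act as a linear list, stack, deque, or blocking queue)
hash: hash tables
set: sets
zset: sorted sets

To squeeze out as much performance as possible, the Redis author gave each data type several implementations, each suited to a particular workload.
Take string, the most commonly used type: it has three underlying implementations, int, embstr, and raw.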

127.0.0.1:6379> SET counter 1
OK
127.0.0.1:6379> OBJECT ENCODING counter
"int"
127.0.0.1:6379> SET name "Tom"
OK
127.0.0.1:6379> OBJECT ENCODING name
"embstr"
127.0.0.1:6379> SETBIT bits 1 1
(integer) 0
127.0.0.1:6379> OBJECT ENCODING bits
"raw"

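These underlying implementations are called encodings in Redis. The sections below walk through each of them.

string

Every key in Redis is a string, and these strings are implemented by a data structure called the Simple Dynamic String (SDS).

    typedef char *sds; // SDS string pointer; points at sdshdr.buf

    struct sdshdr? { // SDS header; [?] can be 8, 16, 32, or 64
        uint?_t len;          // used space: the actual length of the string
        uint?_t alloc;        // allocated space, excluding the header and the trailing '\0'
        unsigned char flags;  // type tag giving the actual width of len and alloc; reachable as sds[-1]
        char buf[];           // character array holding the '\0'-terminated string, compatible with ordinary C strings
    };

The memory layout is: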

+-------+---------+-----------+-------+
|  len  |  alloc  |   flags   |  buf  |
+-------+---------+-----------+-------+
                   ^--sds[-1]  ^--sds

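Compared with a plain C string, SDS has the following advantages:

Efficient: the used length is stored, so getting the string length is O(1)
Safe: the allocated size is stored, so the free space (alloc - len) is known and buffer overflows on write can be avoided
Memory friendly: with the space information available, SDS can preallocate space when a string grows and release space lazily when it shrinks, reducing the number of allocations without leaking memory
Binary safe: the length is explicit rather than implied by '\0', so the content may be arbitrary bytes, not just ASCII text
Compatible with C strings: buf is still '\0'-terminated, so parts of the C standard library can be reused instead of rewritten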
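
The properties above can be illustrated with a small sketch. This is not the Redis source: `mini_sdshdr`, `mini_sdsnewlen`, `mini_sdslen`, `mini_sdsavail`, and `mini_sdsfree` are hypothetical names, and only an 8-bit header is modelled. It shows how a header placed directly before `buf` gives O(1) length and binary safety while the pointer stays usable as a C string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified header modelled on sdshdr8; not the Redis source. */
struct mini_sdshdr {
    uint8_t len;          /* bytes in use */
    uint8_t alloc;        /* bytes allocated for buf, excluding the '\0' */
    unsigned char flags;  /* type tag; unused in this sketch */
    char buf[];           /* payload followed by '\0' */
};

typedef char *mini_sds;   /* points at buf, so it can be passed to C string APIs */

/* Step back from the buf pointer to the header stored just before it. */
struct mini_sdshdr *mini_sdshdr_of(const char *s) {
    return (struct mini_sdshdr *)(s - offsetof(struct mini_sdshdr, buf));
}

/* Create a string from initlen bytes of init; the bytes may include '\0'. */
mini_sds mini_sdsnewlen(const void *init, size_t initlen) {
    assert(initlen <= UINT8_MAX);  /* an 8-bit header cannot describe more */
    struct mini_sdshdr *sh = malloc(sizeof(*sh) + initlen + 1);
    if (sh == NULL) return NULL;
    sh->len = (uint8_t)initlen;
    sh->alloc = (uint8_t)initlen;
    sh->flags = 0;
    if (initlen > 0) memcpy(sh->buf, init, initlen);
    sh->buf[initlen] = '\0';
    return sh->buf;
}

/* O(1): the length is read from the header instead of scanning for '\0'. */
size_t mini_sdslen(const char *s) {
    return mini_sdshdr_of(s)->len;
}

/* Free space left in buf before a reallocation would be needed. */
size_t mini_sdsavail(const char *s) {
    const struct mini_sdshdr *sh = mini_sdshdr_of(s);
    return (size_t)(sh->alloc - sh->len);
}

void mini_sdsfree(mini_sds s) {
    if (s != NULL) free(mini_sdshdr_of(s));
}
```

Creating a string from the five bytes `"ab\0cd"` gives `mini_sdslen` of 5, whereas `strlen` on the same pointer stops at the embedded `'\0'` and reports 2. That difference is what binary safety means here.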

list

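One of the underlying implementations of list in Redis is a doubly linked list, which supports sequential access and efficient insertion and removal of elements.

    typedef struct listNode {
        struct listNode *prev; // previous node
        struct listNode *next; // next node
        void *value;           // node value
    } listNode;

    typedef struct list {
        listNode *head; // head node
        listNode *tail; // tail node
        unsigned long len;     // list length
        void *(*dup) (void *ptr); // copies a node value
        void (*free) (void *ptr); // frees a node value
        int (*match) (void *ptr, void *key); // compares a node value against a key
    } list;

Function pointers provide dynamic binding here: different dup, free, and match functions are supplied depending on the value type, which gives polymorphism.

This data structure has the following properties:

Length-aware: getting the list length is O(1)
Double-ended: both forward and backward traversal are supported, and reaching the previous or next node is O(1)
Acyclic: there is no sentinel node; when the list is empty, head and tail are both NULL
Polymorphic: polymorphism through function pointers makes the structure reusable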
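
As a rough illustration of the polymorphism point, the sketch below searches a list either with a caller-supplied match function or, when none is set, by pointer identity. The names `mini_list`, `mini_list_push_tail`, `mini_list_search`, and `match_cstring` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct mini_node {
    struct mini_node *prev, *next;
    void *value;
} mini_node;

typedef struct mini_list {
    mini_node *head, *tail;
    unsigned long len;
    int (*match)(void *ptr, void *key);  /* optional value comparator */
} mini_list;

/* Append a value at the tail in O(1); returns the new node or NULL. */
mini_node *mini_list_push_tail(mini_list *l, void *value) {
    assert(l != NULL);
    mini_node *n = malloc(sizeof(*n));
    if (n == NULL) return NULL;
    n->value = value;
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail != NULL) l->tail->next = n;
    else l->head = n;
    l->tail = n;
    l->len++;
    return n;
}

/* Find the first node equal to key: use match when set, else compare pointers. */
mini_node *mini_list_search(mini_list *l, void *key) {
    for (mini_node *n = l->head; n != NULL; n = n->next) {
        int equal = l->match ? l->match(n->value, key) : (n->value == key);
        if (equal) return n;
    }
    return NULL;
}

/* One possible match function, for lists whose values are C strings. */
int match_cstring(void *ptr, void *key) {
    return strcmp((const char *)ptr, (const char *)key) == 0;
}
```

The same list code finds a string by content once `match` is set to `match_cstring`. With `match` left as NULL, only the exact pointer that was stored is found.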

dict

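Redis uses dict to store key-value pairs; one of its underlying implementations is a hash table.

    typedef struct dictEntry {
        void* key;  // key
        union {     // value: a pointer, an unsigned 64-bit integer, a signed 64-bit integer, or a double
            void *val;
            uint64_t u64;
            int64_t s64;
            double d;
        } v;
        struct dictEntry *next; // next entry in the same bucket
    } dictEntry;

    typedef struct dictht {
        dictEntry **table;      // bucket array; each element is a singly linked list
        unsigned long size;     // size of the bucket array
        unsigned long sizemask; // hash mask used to compute the index
        unsigned long used;     // number of entries stored
    } dictht;

    typedef struct dictType {
        unsigned int (*hashFunction) (const void *key);             // hash function, computes the hash value
        int (*keyCompare)(void *privdata, const void *key1, const void *key2); // compares two keys
        void *(*keyDup)(void *privdata, const void *key);           // copies a key
        void *(*valDup)(void *privdata, const void *obj);           // copies a value
        void (*keyDestructor)(void *privdata, void *key);           // destroys a key
        void (*valDestructor)(void *privdata, void *obj);           // destroys a value
    } dictType;

    typedef struct dict {
        dictType *type;     // type-specific functions, used for polymorphism
        void *privdata;     // private data, used for polymorphism
        dictht ht[2];       // hash tables; ht[0] is the live table, ht[1] is used during rehash
        long rehashidx;     // rehash index; -1 when no rehash is in progress
    } dict;

This data structure has the following properties:

Hash algorithm: older versions use MurmurHash2 as the hash function (Redis 4.0 and later use SipHash); locating a bucket is O(1)
Collision resolution: separate chaining; a new entry is placed at the head of the chain, which is O(1)
Rehashing: every rehash is completed in 3 steps

Step 1: allocate space for dict.ht[1]; its size is a power of 2
Step 2: rehash all key-value pairs in dict.ht[0] into dict.ht[1]
Step 3: free dict.ht[0] and replace dict.ht[0] with dict.ht[1]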

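Some details of rehash

Amortizing the cost

To reduce pauses, step 2 is carried out incrementally over many calls: the work of rehashing key-value pairs is spread evenly across the add, delete, lookup, and update operations on the dict. During this time dict.rehashidx records the index into dictht.table of dict.ht[0] up to which the rehash has been completed:
- Each time one bucket has been rehashed, the dict.rehashidx counter is incremented by 1
- When the rehash is complete, dict.rehashidx is set to -1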
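
The incremental scheme can be sketched as follows. This is a simplified model rather than the code in dict.c: the types `entry`, `table`, and `mini_dict` and the functions `mini_insert`, `mini_start_rehash`, and `mini_rehash` are hypothetical, keys are plain integers that serve as their own hash, and the new table is assumed to be twice the size of the old one.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct entry {
    unsigned long key;       /* the key doubles as its own hash in this sketch */
    struct entry *next;
} entry;

typedef struct table {
    entry **buckets;
    unsigned long size;      /* always a power of two */
    unsigned long used;
} table;

typedef struct mini_dict {
    table ht[2];
    long rehashidx;          /* -1 when no rehash is in progress */
} mini_dict;

/* Insert at the head of the bucket chain (no duplicate check in this sketch). */
int mini_insert(table *t, unsigned long key) {
    entry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    unsigned long idx = key & (t->size - 1);
    e->key = key;
    e->next = t->buckets[idx];
    t->buckets[idx] = e;
    t->used++;
    return 0;
}

/* Step 1: allocate ht[1] (assumed here to be twice as large) and arm rehashidx. */
int mini_start_rehash(mini_dict *d) {
    assert(d->rehashidx == -1);
    unsigned long size = d->ht[0].size * 2;
    entry **b = calloc(size, sizeof(*b));
    if (b == NULL) return -1;
    d->ht[1] = (table){b, size, 0};
    d->rehashidx = 0;
    return 0;
}

/* Step 2, in slices: migrate up to n non-empty buckets from ht[0] to ht[1].
 * Step 3 runs once ht[0] is empty. Returns 1 if work remains, else 0. */
int mini_rehash(mini_dict *d, int n) {
    if (d->rehashidx == -1) return 0;
    while (n-- > 0 && d->ht[0].used > 0) {
        /* used > 0 guarantees a non-empty bucket at or after rehashidx. */
        while (d->ht[0].buckets[d->rehashidx] == NULL) d->rehashidx++;
        entry *e = d->ht[0].buckets[d->rehashidx];
        while (e != NULL) {
            entry *next = e->next;
            unsigned long idx = e->key & (d->ht[1].size - 1);
            e->next = d->ht[1].buckets[idx];
            d->ht[1].buckets[idx] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].buckets[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    if (d->ht[0].used == 0) {
        free(d->ht[0].buckets);
        d->ht[0] = d->ht[1];
        d->ht[1] = (table){NULL, 0, 0};
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

In a real dict each add, delete, lookup, or update would call something like `mini_rehash(d, 1)` first, so no single command pays for the whole migration.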
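
Trigger conditions

The current load factor is computed as load_factor = ht[0].used / ht[0].size.

Shrink: when load_factor < 0.1, a rehash is performed to reclaim unused space
Expand:
1. BGSAVE or BGREWRITEAOF is not running: rehash when load_factor >= 1
2. BGSAVE or BGREWRITEAOF is running: rehash when load_factor >= 5

Most operating systems use copy-on-write to make child processes cheaper:

Parent and child share the same memory until one of them modifies data; only then is the affected memory actually copied, which keeps the two processes isolated

While BGSAVE or BGREWRITEAOF is running, Redis has a forked child process. Raising the load_factor threshold during that time avoids unnecessary memory writes while the child exists, which saves memory.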
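
The trigger rules above fit in one small function. The sketch below only encodes the thresholds stated in the text; `decide_resize` and `resize_action` are hypothetical names, and whether a shrink is also deferred while a child process exists is not covered by the text, so the sketch does not model it.

```c
#include <assert.h>
#include <stdbool.h>

/* Thresholds taken from the text above. */
#define SHRINK_BELOW 0.1
#define EXPAND_AT 1.0
#define EXPAND_AT_WITH_CHILD 5.0

typedef enum { RESIZE_NONE, RESIZE_SHRINK, RESIZE_EXPAND } resize_action;

/* Decide what to do for a table holding `used` entries in `size` buckets.
 * child_running is true while a BGSAVE or BGREWRITEAOF child exists. */
resize_action decide_resize(unsigned long used, unsigned long size,
                            bool child_running) {
    assert(size > 0);
    double load_factor = (double)used / (double)size;
    double expand_at = child_running ? EXPAND_AT_WITH_CHILD : EXPAND_AT;
    if (load_factor >= expand_at) return RESIZE_EXPAND;
    if (load_factor < SHRINK_BELOW) return RESIZE_SHRINK;
    return RESIZE_NONE;
}
```

For example, a full table with 4 entries in 4 buckets expands normally, but is left alone while a child process is running until it reaches 20 entries.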

skiplist

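A skip list is an ordered data structure that achieves fast access by maintaining pointers at multiple levels, a classic space-for-time trade-off.
Its lookup efficiency is close to that of a balanced tree, but it is cheaper to maintain and simpler to implement.

    typedef struct zskiplistNode {
        sds ele;                        // member
        double score;                   // score
        struct zskiplistNode *backward; // backward pointer
        struct zskiplistLevel {
            struct zskiplistNode *forward;  // forward pointer
            unsigned long span;             // span: distance between this node and the forward node
        } level[];
    } zskiplistNode;

    typedef struct zskiplist {
        struct zskiplistNode *header, *tail;// head and tail pointers
        unsigned long length;               // number of nodes
        int level;                          // highest level currently in use
    } zskiplist;

This data structure has the following properties:

Lookup: average lookup time is O(logN), worst case O(N), and range queries are supported
Probabilistic: each time a node is created, a random height between 1 and 32 is drawn according to a power law, so taller nodes are progressively less likely
Rank: while searching for a node, the spans of all links followed along the way are summed, which gives the target node's rank in the list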
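
The power-law height can be sketched as repeated coin flips. The function name `random_level` and the promotion probability 0.25 are assumptions made for this example; the text only states the range 1 to 32.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LEVEL 32   /* upper bound given in the text */
#define LEVEL_P 0.25   /* assumed probability of climbing one more level */

/* Draw a node height in [1, MAX_LEVEL]; height k has probability
 * proportional to LEVEL_P^(k-1), so tall nodes are rare. */
int random_level(void) {
    int level = 1;
    while (level < MAX_LEVEL &&
           (double)rand() / ((double)RAND_MAX + 1.0) < LEVEL_P) {
        level++;
    }
    assert(level >= 1 && level <= MAX_LEVEL);
    return level;
}
```

With this probability roughly three quarters of all nodes have height 1, which keeps the extra pointer overhead small while still giving the upper levels enough nodes to skip over long stretches of the list.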

intset

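A sorted set of integers with compact storage; adding an element is O(N).

    typedef struct intset {
        uint32_t encoding;  // encoding: indicates the actual element type
        uint32_t length;    // number of elements
        int8_t contents[];  // element array; the actual element type may be int16_t, int32_t, or int64_t
    } intset;

This data structure has the following properties:

Sorted: elements are kept in ascending order, so binary search finds one in O(logN)
Upgrade: when a new element is added whose type is wider than the current encoding, the set must be upgraded:

Step 1: grow the element array according to the new element's type
Step 2: convert all existing elements to the new type
Step 3: add the new element to the array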
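
The three upgrade steps can be sketched as below. This is a simplified model, not intset.c: `mini_intset` keeps its elements in a separate allocation, stores them in host byte order, and all function names are hypothetical. The key observation is that a value too wide for the current encoding is either smaller or larger than every existing element, so it always lands at one end of the array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct mini_intset {
    uint32_t encoding;  /* element width in bytes: 2, 4 or 8 */
    uint32_t length;    /* number of elements */
    int8_t *contents;   /* sorted elements; a separate allocation in this sketch */
} mini_intset;

/* Smallest element width that can hold v. */
uint32_t width_for(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return 8;
    if (v < INT16_MIN || v > INT16_MAX) return 4;
    return 2;
}

/* Read element pos, interpreting the array with element width enc. */
int64_t get_at(const mini_intset *is, uint32_t pos, uint32_t enc) {
    const int8_t *p = is->contents + (size_t)pos * enc;
    if (enc == 8) { int64_t v; memcpy(&v, p, sizeof v); return v; }
    if (enc == 4) { int32_t v; memcpy(&v, p, sizeof v); return v; }
    int16_t v; memcpy(&v, p, sizeof v); return v;
}

/* Write element pos using the set's current encoding. */
void set_at(mini_intset *is, uint32_t pos, int64_t value) {
    int8_t *p = is->contents + (size_t)pos * is->encoding;
    if (is->encoding == 8) { memcpy(p, &value, sizeof value); }
    else if (is->encoding == 4) { int32_t v = (int32_t)value; memcpy(p, &v, sizeof v); }
    else { int16_t v = (int16_t)value; memcpy(p, &v, sizeof v); }
}

/* Add a value that does not fit the current encoding. Returns 0 on success. */
int mini_upgrade_and_add(mini_intset *is, int64_t value) {
    uint32_t old_enc = is->encoding, new_enc = width_for(value);
    assert(new_enc > old_enc);
    /* Step 1: grow the array to hold length + 1 elements of the new width. */
    int8_t *grown = realloc(is->contents, (size_t)(is->length + 1) * new_enc);
    if (grown == NULL) return -1;
    is->contents = grown;
    is->encoding = new_enc;
    /* A negative out-of-range value is the new minimum, otherwise the new maximum. */
    uint32_t prepend = value < 0 ? 1u : 0u;
    /* Step 2: widen existing elements back to front so none is overwritten early. */
    for (uint32_t i = is->length; i-- > 0; )
        set_at(is, i + prepend, get_at(is, i, old_enc));
    /* Step 3: store the new element at the front or the back. */
    set_at(is, prepend ? 0 : is->length, value);
    is->length++;
    return 0;
}
```

Starting from the 16-bit set {1, 2, 3}, adding 65536 widens every element to 32 bits and appends the new value at the end.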

ziplist

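The ziplist was developed to save memory; it is a sequential data structure stored in one contiguous block of memory.
A ziplist can contain any number of entries, and each entry holds either a byte array or an integer.
Redis does not define an explicit struct for the ziplist itself; it only provides a descriptor struct, zlentry, for working with the data.

    typedef struct zlentry {
        unsigned int prevrawlensize;// number of bytes used to record the previous entry's length
        unsigned int prevrawlen;    // length of the previous entry
        unsigned int lensize;       // number of bytes used to record this entry's type/length
        unsigned int len;           // number of bytes actually used to store the data
        unsigned int headersize;    // prevrawlensize + lensize
        unsigned char encoding;     // indicates the actual encoding of the entry's data
        unsigned char *p;           // points to the start of the entry
    } zlentry;

The actual memory layout is: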

+----------+---------+---------+--------+-----+--------+--------+
|  zlbytes |  zltail |  zllen  | entry1 | ... | entryN |  zlend |
+----------+---------+---------+--------+-----+--------+--------+
<--------------------------- zlbytes --------------------------->
                                               ^--zltail
                                <------- zllen ------->

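zlbytes : number of bytes occupied by the ziplist (uint32_t)
zltail : offset of the last entry, so the tail address is known without traversal, which makes reverse traversal easy (uint32_t)
zllen : number of entries; once the count reaches 65535 the field saturates and the real count has to be obtained by traversal (uint16_t)
entryX : list entries, of variable length
zlend : end marker, the special value 0xFF (uint8_t)

The memory layout of an entry is: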

+-------------------+----------+---------+
| prev_entry_length | encoding | content |
+-------------------+----------+---------+

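prev_entry_length : length of the previous entry; together with the current entry's start address it gives the start address of the previous entry (variable width: 1 byte or 5 bytes)
encoding : type and length of the data stored in the entry (variable width: 1 byte, 2 bytes, or 5 bytes)
content : the entry's data, either an integer or a byte array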
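
The variable-width prev_entry_length field can be sketched as follows. The function names are hypothetical, and the details beyond "1 byte or 5 bytes" are stated from memory of the ziplist format rather than from the text: lengths below 254 use a single byte, and larger lengths use the marker byte 0xFE followed by a 4-byte little-endian length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIG_PREVLEN 254  /* first byte of the 5-byte form; 255 is the zlend marker */

/* Write the previous-entry length into p; returns the bytes used (1 or 5). */
size_t encode_prevlen(unsigned char *p, uint32_t len) {
    assert(p != NULL);
    if (len < BIG_PREVLEN) {
        p[0] = (unsigned char)len;
        return 1;
    }
    p[0] = BIG_PREVLEN;
    for (int i = 0; i < 4; i++)
        p[1 + i] = (unsigned char)((len >> (8 * i)) & 0xFF);  /* little-endian */
    return 5;
}

/* Read the field back; returns the bytes consumed (1 or 5). */
size_t decode_prevlen(const unsigned char *p, uint32_t *len) {
    if (p[0] < BIG_PREVLEN) {
        *len = p[0];
        return 1;
    }
    *len = 0;
    for (int i = 0; i < 4; i++)
        *len |= (uint32_t)p[1 + i] << (8 * i);
    return 5;
}
```

The jump from 1 to 5 bytes when a neighbour grows past 253 bytes is what makes the following entry grow as well, which is the origin of the cascading updates described in this section.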

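This data structure has the following properties:

Compact: one contiguous block of memory with no internal fragmentation; an update triggers a realloc and a memory copy, O(N) on average
Reverse traversal: the list can be walked from tail to head
Cascading updates: updating one entry may change the width its successor needs for prev_entry_length, and therefore the successor's own length, which can ripple through the entries that follow; the worst case for an update is O(N²)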

quicklist

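In earlier versions of Redis, list had two underlying implementations:

When the elements of a list object are all short and few in number, a ziplist is used to store them
When the elements are long or numerous, the list switches to a doubly linked list, linkedlist

Each has its strengths and weaknesses:

ziplist is compact in memory and fast to access, but slow to update, and with larger data it can cause a lot of memory copying
linkedlist is efficient to modify, but costs extra memory per node, and with many nodes it produces a lot of memory fragmentation

To combine the strengths of both, since Redis 3.2 the underlying implementation of list has been the quicklist.
A quicklist is a hybrid of linkedlist and ziplist: it consists of multiple nodes that are not contiguous in memory, and each node is itself a ziplist.

    typedef struct quicklistNode {
        struct quicklistNode *prev;  // previous node
        struct quicklistNode *next;  // next node
        unsigned char *zl;           // data pointer to a ziplist, or to a quicklistLZF when compressed
        unsigned int sz;             // size of the ziplist in bytes (uncompressed)
        unsigned int count : 16;     // number of entries in the ziplist
        unsigned int encoding : 2;   // encoding: 1 means ziplist, 2 means quicklistLZF
        unsigned int container : 2;  // container type: 1 means NONE, 2 means ZIPLIST
        unsigned int recompress : 1;         // 1 means the node was temporarily decompressed for access
        unsigned int attempted_compress : 1; // used only for testing
        unsigned int extra : 10;             // reserved
    } quicklistNode;

    typedef struct quicklistLZF {
        unsigned int sz;    // length of the compressed data
        char compressed[];  // compressed data
    } quicklistLZF;

    typedef struct quicklist {
        quicklistNode *head;        // head of the list
        quicklistNode *tail;        // tail of the list
        unsigned long count;        // total number of entries
        unsigned long len;          // number of ziplist nodes
        int fill : 16;              // ziplist size limit: no ziplist node may exceed this value (entry count or memory footprint)
        unsigned int compress : 16; // compression depth: number of ziplist nodes at each end left uncompressed; 0 means no node is compressed
    } quicklist;

This data structure has the following properties:

Seamless: it combines the strengths of linkedlist and ziplist, so there is no need to switch between two structures
Middle compression: when the list is used as a queue, data in the middle is accessed rarely, so it can optionally be compressed to reduce memory usage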

robj

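To support dynamic encoding, Redis builds an object system.
Before executing a command, Redis can check the object's type to decide whether the command is applicable.
The system also shares memory through reference counting and records when each object was last accessed, which informs the memory eviction policy.

    typedef struct redisObject {
        unsigned type:4;        // type: the object's logical type, e.g. set
        unsigned encoding:4;    // encoding: the underlying data structure, e.g. intset / ziplist
        unsigned lru:24;        /* LRU time (relative to the global lru_clock) or
                                 * LFU data (8 bits of access frequency, 16 bits of access time). */
        int refcount;           // reference count
        void *ptr;              // data pointer to the concrete data structure
    } robj;

This data structure has the following properties:

Efficient: objects of the same type can use different underlying implementations, so each workload gets the one that suits it best
Saves memory: string objects that hold integer values can be shared, with the reference count tracking how many users an object has, which avoids copies
Idle time: the object system records each object's access time, so the LRU algorithm can evict rarely used objects first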

Encoding formats

string type

A string may use one of these encodings:

OBJ_ENCODING_INT int: an integer that fits in a long
OBJ_ENCODING_RAW raw: an sds string
OBJ_ENCODING_EMBSTR embstr: an embedded string (a string of at most 44 bytes)

127.0.0.1:6379> SET str "1234567890 1234567890 1234567890 1234567890"
OK
127.0.0.1:6379> STRLEN str
(integer) 43
127.0.0.1:6379> OBJECT ENCODING str
"embstr"
127.0.0.1:6379> APPEND str _
(integer) 44
127.0.0.1:6379> OBJECT ENCODING str
"raw"

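The switch to raw in this transcript is caused by APPEND rather than by the length: an embstr object is read-only, so modifying it in place converts it to raw, whereas a 44-byte value written directly with SET is still embstr.

The point of the embstr encoding is to reduce the number of memory allocations for short strings. In the words of the Redis author: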

REDIS_ENCODING_EMBSTR_SIZE_LIMIT set to 39.
The new value is the limit for the robj + SDS header + string + null-term to stay inside the 64 bytes Jemalloc arena in 64 bits systems.

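The value 39 in that message reflects the older 8-byte SDS header; with the 3-byte sdshdr8 the same 64-byte budget works out to 64 - 16 - 3 - 1 = 44 bytes.

Comparing the two memory layouts shows:

embstr is one contiguous block of memory and needs only 1 allocation
raw is not contiguous and needs 2 allocations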


<------------------------------------------ Jemalloc arena (64 bytes)  ---------------------------------------------->
+-------------------------------------------------------------------------------+---------------------+--------------+
|                             redisObject (16 bytes)                            |  sdshdr8 (3 bytes)  |   45 bytes   |
+--------------------+---------------------------------+-------+----------+-----+-----+-------+-------+---------+----+
| type(REDIS_STRING) | encoding(REDIS_ENCODING_EMBSTR) |  lru  | refcount | ptr | len | alloc | flags |   buf   | \0 |
+--------------------+---------------------------------+-------+----------+-----+-----+-------+-------+---------+----+


+--------------------+
|    redisObject     |
+--------------------+
|        type        |
|    REDIS_STRING    |
+--------------------+
|      encoding      |
| REDIS_ENCODING_RAW |
+--------------------+      +---------+
|         ptr        | ---> | sdshdr? |
+--------------------+      +---------+
                            |   len   |
                            +---------+
                            |  alloc  |
                            +---------+
                            |  flags  |
                            +---------++---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
                            |   buf   || T | h | e | r | e |   | i | s |   | n | o |   | c | e | r | t | a |...|
                            +---------++---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
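
The size budget behind the embstr limit is simple arithmetic, sketched below. `embstr_limit` is a hypothetical helper; the 16-byte object header and the 64-byte allocation size come from the diagram and the quote above, and the SDS header sizes passed in are assumptions about the two SDS generations (8 bytes for the old header, 3 bytes for sdshdr8).

```c
#include <assert.h>
#include <stddef.h>

#define ALLOC_CLASS 64   /* allocation size targeted by the quote */
#define ROBJ_BYTES 16    /* redisObject on a 64-bit build, as drawn above */

/* Longest payload that fits next to the object header in one 64-byte block. */
size_t embstr_limit(size_t sds_header_bytes) {
    assert(ROBJ_BYTES + sds_header_bytes + 1 <= ALLOC_CLASS);
    return ALLOC_CLASS - ROBJ_BYTES - sds_header_bytes - 1;  /* 1 for '\0' */
}
```

With a 3-byte header the result is 44, and with the older 8-byte header it is 39, which matches the number in the quoted message.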

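list type

The default encoding of a list is OBJ_ENCODING_QUICKLIST quicklist. Two options tune it:

list-max-ziplist-size: size limit of the ziplist in each quicklist node
list-compress-depth: number of nodes at each end of the quicklist that are left uncompressed

hash type

A hash is encoded as either OBJ_ENCODING_ZIPLIST ziplist or OBJ_ENCODING_HT hashtable. Two options decide which:

hash-max-ziplist-value: ziplist is used while every field name and value is at most this many bytes long (default 64)
hash-max-ziplist-entries: ziplist is used while the hash has at most this many fields (default 512)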
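
The two options can be read as a single predicate. The sketch below is an illustration with hypothetical names (`hash_limits`, `choose_hash_encoding`), not the Redis implementation. It treats values exactly at a limit as still small enough for ziplist, which is the boundary behaviour the transcripts in this section show (64 bytes and 512 fields stay ziplist, 65 bytes or 513 fields become hashtable).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { HASH_ENC_ZIPLIST, HASH_ENC_HASHTABLE } hash_enc;

typedef struct hash_limits {
    size_t max_ziplist_value;    /* mirrors hash-max-ziplist-value */
    size_t max_ziplist_entries;  /* mirrors hash-max-ziplist-entries */
} hash_limits;

/* entries: number of fields in the hash.
 * longest_item: byte length of the longest field name or value. */
hash_enc choose_hash_encoding(const hash_limits *lim, size_t entries,
                              size_t longest_item) {
    assert(lim != NULL);
    if (entries > lim->max_ziplist_entries) return HASH_ENC_HASHTABLE;
    if (longest_item > lim->max_ziplist_value) return HASH_ENC_HASHTABLE;
    return HASH_ENC_ZIPLIST;
}
```

With the defaults `{64, 512}`, a hash of 512 fields whose longest item is 64 bytes is still a ziplist; one more field or one more byte tips it over to hashtable.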

When a value or a field name is longer than 64 bytes (the value case comes first, then the field name):

127.0.0.1:6379> HSET table x 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
(integer) 0
127.0.0.1:6379> OBJECT ENCODING table
"ziplist"
127.0.0.1:6379> HSET table x 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
(integer) 0
127.0.0.1:6379> OBJECT ENCODING table
"hashtable"
127.0.0.1:6379> DEL table
(integer) 1
127.0.0.1:6379> HSET table xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 'x'
(integer) 1
127.0.0.1:6379> OBJECT ENCODING table
"ziplist"
127.0.0.1:6379> HSET table xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 'x'
(integer) 1
127.0.0.1:6379> OBJECT ENCODING table
"hashtable"

When the number of fields exceeds 512:

127.0.0.1:6379> EVAL "for i=1,512 do redis.call('HSET', KEYS[1], i, i) end" 1 numbers
(nil)
127.0.0.1:6379> HLEN numbers
(integer) 512
127.0.0.1:6379> OBJECT ENCODING numbers
"ziplist"
127.0.0.1:6379> DEL numbers
(integer) 1
127.0.0.1:6379> EVAL "for i=1,513 do redis.call('HSET', KEYS[1], i, i) end" 1 numbers
(nil)
127.0.0.1:6379> HLEN numbers
(integer) 513
127.0.0.1:6379> OBJECT ENCODING numbers
"hashtable"

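set type

A set is encoded as either OBJ_ENCODING_INTSET intset or OBJ_ENCODING_HT hashtable. Two conditions decide which:

intset is considered only while every element of the set is an integer; otherwise hashtable is the only choice
set-max-intset-entries: intset is used while the set has at most this many elements (default 512)

When the set contains a non-integer element: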

127.0.0.1:6379> SADD set 1 2
(integer) 2
127.0.0.1:6379> OBJECT ENCODING set
"intset"
127.0.0.1:6379> SADD set "ABC"
(integer) 1
127.0.0.1:6379> OBJECT ENCODING set
"hashtable"

When the number of elements exceeds 512:

127.0.0.1:6379> EVAL "for i=1,512 do redis.call('SADD', KEYS[1], i, i) end" 1 numbers
(nil)
127.0.0.1:6379> SCARD numbers
(integer) 512
127.0.0.1:6379> OBJECT ENCODING numbers
"intset"
127.0.0.1:6379> DEL numbers
(integer) 1
127.0.0.1:6379> EVAL "for i=1,513 do redis.call('SADD', KEYS[1], i, i) end" 1 numbers
(nil)
127.0.0.1:6379> SCARD numbers
(integer) 513
127.0.0.1:6379> OBJECT ENCODING numbers
"hashtable"

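zset type

A zset is encoded as either OBJ_ENCODING_ZIPLIST ziplist or OBJ_ENCODING_SKIPLIST skiplist.

With the ziplist encoding, each element occupies two adjacent entries: the first holds the member and the second holds its score. Entries are ordered by score, from lowest to highest: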

+----------------------+
|     redisObject      |
+----------------------+
|         type         |
|      REDIS_ZSET      |
+----------------------+
|       encoding       |
| OBJ_ENCODING_ZIPLIST |
+----------------------+      +----------+----------+---------+--------------------+-------------------+-----+-----------------------+--------------------+-------+
|          ptr         | ---> |  zlbytes |  zltail  |  zllen  | entry 1 (member 1) | entry 2 (score 1) | ... | entry 2N-1 (member N) | entry 2N (score N) | zlend |
+----------------------+      +----------+----------+---------+--------------------+-------------------+-----+-----------------------+--------------------+-------+
                                                               >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> score increase >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

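With the skiplist encoding, a structure named zset is used:

    typedef struct zset {
        dict *dict;      // maps member -> score, used to look up the score of a given member
        zskiplist *zsl;  // holds all elements ordered by score, supporting range operations
    } zset; // dict and zsl share the members and scores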


+----------------------+                                       +--------+     +------------+    +---------+
|     redisObject      |                                   +-->| dictht |     |  StringObj | -> |  long   |
+----------------------+                       +-------+   |   +--------+     +------------+    +---------+
|         type         |                   +-->| dict  |   |   | table  | --> |  StringObj | -> |  long   |
|      REDIS_ZSET      |                   |   +-------+   |   +--------+     +------------+    +---------+
+----------------------+                   |   | ht[0] | --+                  |  StringObj | -> |  long   |
|       encoding       |      +--------+   |   +-------+      +-----+         +------------+    +---------+
| OBJ_ENCODING_SKIPLIST|      |  zset  |   |                  | L32 | -> NULL
+----------------------+      +--------+   |                  +-----+
|          ptr         | ---> |  dict  | --+                  | ... |
+----------------------+      +--------+       +--------+     +-----+    +-----------+                     +-----------+
                              |  zsl   | --->  | header | --> | L4  | -> |     L4    | ------------------> |     L4    | -> NULL
                              +--------+       +--------+     +-----+    +-----------+                     +-----------+
                                               | tail   |     | L3  | -> |     L3    | ------------------> |     L3    | -> NULL
                                               +--------+     +-----+    +-----------+    +-----------+    +-----------+
                                               | level  |     | L2  | -> |     L2    | -> |     L2    | -> |     L2    | -> NULL
                                               +--------+     +-----+    +-----------+    +-----------+    +-----------+
                                               | length |     | L1  | -> |     L1    | -> |     L1    | -> |     L1    | -> NULL
                                               +--------+     +-----+    +-----------+    +-----------+    +-----------+
                                                                 NULL <- |     BW    | <- |     BW    | <- |     BW    |
                                                                         +-----------+    +-----------+    +-----------+
                                                                         | StringObj |    | StringObj |    | StringObj |
                                                                         +-----------+    +-----------+    +-----------+
                                                                         |    long   |    |    long   |    |    long   |
                                                                         +-----------+    +-----------+    +-----------+

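Two options decide which encoding a zset uses:

zset-max-ziplist-value: ziplist is used while every member is at most this many bytes long (default 64)
zset-max-ziplist-entries: ziplist is used while the zset has at most this many elements (default 128)

Overall structure of Redis

Each database is a redisDb struct:

    typedef struct redisDb {
        dict *dict;                 /* the database keyspace */
        dict *expires;              /* keys that have an expiry set */
        dict *blocking_keys;        /* keys that clients are blocked on (BLPOP) */
        dict *ready_keys;           /* blocked keys that have received data (PUSH) */
        dict *watched_keys;         /* keys WATCHed for a transaction */
        int id;                     /* database ID */
        long long avg_ttl;          /* average TTL, just for stats */
        unsigned long expires_cursor; /* cursor of the expiry scan */
        list *defrag_later;         /* list of keys to defragment later */
    } redisDb;

All Redis databases are stored in the redisServer.db array, and redisServer.dbnum holds the number of databases. A simplified memory layout looks roughly like this: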

+-------------+
| redisServer |
+-------------+    +------------+------+-------------+
|     db      | -> | redisDb[0] | .... | redisDb[15] |
+-------------+    +------------+------+-------------+
|    dbnum    |      |
|     16      |      |
+-------------+      |  +---------+                         +------------+
                     +->| redisDb |                     +-> | ListObject |
                        +---------+    +------------+   |   +------------+
                        |  dict   | -> |  StringObj | --+
                        +---------+    +------------+       +------------+
                        | expires |    |  StringObj | ----> | HashObject |
                        +---------+    +------------+       +------------+
                              |        |  StringObj | --+
                              |        +------------+   |   +------------+
                              |                         +-> | StringObj  |
                              |                             +------------+
                              |
                              |       +------------+    +-------------+
                              +---->  |  StringObj | -> |    long     |
                                      +------------+    +-------------+
                                      |  StringObj | -> |    long     |
                                      +------------+    +-------------+

That covers the Redis encodings. Later posts will look at other Redis internals. Thanks for reading.
