What the Redis SWAPDB Command Does Behind the Scenes

Published by 羅西的思考 on 2021-05-27

Original URL: https://www.cnblogs.com/rossiXYZ/p/14802130.html

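0x00 Abstract

Be careful before adopting a new feature. Besides testing it extensively, it is worth reading the relevant source code, when you can, to see how it works internally.

In this article we read the source to see whether the Redis SWAPDB command is reliable.

0x01 SWAPDB basics

1.1 Command description

Available since: >= 4.0.0

The command swaps two databases on the same Redis server, so that every client connected to one database immediately sees the data of the other.

After SWAPDB runs, connected clients see the new data without issuing SELECT again.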

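1.2 Demo

redis> set mystring 0 # first set the key to 0 in db 0
OK
redis> select 1 # then switch to db 1
OK
redis[1]> set mystring 1 # set it to 1
OK
redis[1]> swapdb 0 1     # swap the data of db0 and db1
OK
redis[1]> get mystring   # on the db1 connection, read what used to be db0's data
"0"

Now let's look at the source to see what Redis actually does behind the scenes, and whether this feature affects everyday workloads.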

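0x02 Pre-checks

The entry point of SWAPDB is swapdbCommand.

As the code shows, swapdbCommand performs a few checks up front:

In cluster mode, the swap is not allowed;
It parses the two DB indexes, and gives up if either one fails to parse;

Only then does it call dbSwapDatabases to perform the swap: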

/* SWAPDB db1 db2 */
void swapdbCommand(client *c) {
    long id1, id2;

    /* Not allowed in cluster mode: we have just DB 0 there. */
    if (server.cluster_enabled) {
        addReplyError(c,"SWAPDB is not allowed in cluster mode");
        return;
    }

    /* Get the two DBs indexes. */
    if (getLongFromObjectOrReply(c, c->argv[1], &id1,
        "invalid first DB index") != C_OK)
        return;

    if (getLongFromObjectOrReply(c, c->argv[2], &id2,
        "invalid second DB index") != C_OK)
        return;

    /* Swap... */
    if (dbSwapDatabases(id1,id2) == C_ERR) {
        addReplyError(c,"DB index is out of range");
        return;
    } else {
        RedisModuleSwapDbInfo si = {REDISMODULE_SWAPDBINFO_VERSION,id1,id2};
        moduleFireServerEvent(REDISMODULE_EVENT_SWAPDB,0,&si);
        server.dirty++;
        addReply(c,shared.ok);
    }
}

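0x03 The actual swap

dbSwapDatabases does the real work.

The first half of the function is surprisingly simple: it just swaps a few fields of db1 and db2!

The second half shows that there is a bit more to it, with real consequences for clients:

It signals keys as ready for clients already connected to these databases: some clients are blocked in B[LR]POP, and after the swap the key they are waiting on may already exist;
It notifies clients that WATCH keys in these databases that the data they watch can no longer be trusted, so their transactions must fail;

The code is as follows: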

int dbSwapDatabases(long id1, long id2) {
    if (id1 < 0 || id1 >= server.dbnum ||
        id2 < 0 || id2 >= server.dbnum) return C_ERR;
    if (id1 == id2) return C_OK;
    redisDb aux = server.db[id1];
    redisDb *db1 = &server.db[id1], *db2 = &server.db[id2];

    /* Swap hash tables. Note that we don't swap blocking_keys,
     * ready_keys and watched_keys, since we want clients to
     * remain in the same DB they were. */
    db1->dict = db2->dict;
    db1->expires = db2->expires;
    db1->avg_ttl = db2->avg_ttl;
    db1->expires_cursor = db2->expires_cursor;

    db2->dict = aux.dict;
    db2->expires = aux.expires;
    db2->avg_ttl = aux.avg_ttl;
    db2->expires_cursor = aux.expires_cursor;

    /* Now we need to handle clients blocked on lists: as an effect
     * of swapping the two DBs, a client that was waiting for list
     * X in a given DB, may now actually be unblocked if X happens
     * to exist in the new version of the DB, after the swap.
     *
     * However normally we only do this check for efficiency reasons
     * in dbAdd() when a list is created. So here we need to rescan
     * the list of clients blocked on lists and signal lists as ready
     * if needed.
     *
     * Also the swapdb should make transaction fail if there is any
     * client watching keys */
    scanDatabaseForReadyLists(db1);
    touchAllWatchedKeysInDb(db1, db2);
    scanDatabaseForReadyLists(db2);
    touchAllWatchedKeysInDb(db2, db1);
    return C_OK;
}
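The field-level swap above can be pictured with a small toy model. This is only a sketch: ToyDb and toy_swap are hypothetical names, and plain strings stand in for the real dict and expires tables. It illustrates the point of the source comment: the data moves, while the client-side bookkeeping (here `blocked_clients` and `id`) stays attached to the same database slot.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for redisDb (hypothetical, not the real Redis struct). */
typedef struct ToyDb {
    const char *dict;       /* stands in for the keyspace */
    const char *expires;    /* stands in for the expires table */
    int blocked_clients;    /* stands in for blocking_keys */
    int id;                 /* database index */
} ToyDb;

/* Swap only the data-carrying fields, in the style of dbSwapDatabases(). */
void toy_swap(ToyDb *db1, ToyDb *db2) {
    ToyDb aux = *db1;       /* full copy, like `redisDb aux = server.db[id1]` */
    db1->dict = db2->dict;
    db1->expires = db2->expires;
    db2->dict = aux.dict;
    db2->expires = aux.expires;
    /* id and blocked_clients are deliberately left untouched */
}

/* Small self-check; returns 0 when every assertion holds. */
int toy_swap_demo(void) {
    ToyDb a = {"data-of-db0", "ttl-of-db0", 2, 0};
    ToyDb b = {"data-of-db1", "ttl-of-db1", 0, 1};
    toy_swap(&a, &b);
    assert(strcmp(a.dict, "data-of-db1") == 0);
    assert(strcmp(b.dict, "data-of-db0") == 0);
    assert(a.id == 0 && a.blocked_clients == 2);  /* clients stay put */
    assert(b.id == 1 && b.blocked_clients == 0);
    return 0;
}
```

Calling toy_swap_demo() from a main function runs the checks. Because only pointers are exchanged, the cost of the swap itself does not depend on how many keys the databases hold.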

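3.1 Signaling clients that keys are ready

Some clients are blocked in B[LR]POP, and after the swap the key they are waiting on may already exist.

So the first step is to notify the clients of both databases: iterate over the keys that clients of this database are blocked on and try to look up each key's value; if a value is found, signal that the key is ready.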

/* Helper function for dbSwapDatabases(): scans the list of keys that have
 * one or more blocked clients for B[LR]POP or other blocking commands
 * and signal the keys as ready if they are of the right type. See the comment
 * where the function is used for more info. */
void scanDatabaseForReadyLists(redisDb *db) {
    dictEntry *de;
    dictIterator *di = dictGetSafeIterator(db->blocking_keys);
    while((de = dictNext(di)) != NULL) {
        robj *key = dictGetKey(de);
        robj *value = lookupKey(db,key,LOOKUP_NOTOUCH);
        if (value) signalKeyAsReady(db, key, value->type);
    }
    dictReleaseIterator(di);
}
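The scan can be sketched on a toy structure, assuming a flat array of key names instead of the real dict. ScanDb, scandb_has and scan_ready_keys are hypothetical names, and the type check that signalKeyAsReady receives in the real code is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SCAN_MAX 8

/* Hypothetical toy database: the keys it holds after the swap, plus the
 * keys that clients were already blocked on before the swap. */
typedef struct ScanDb {
    const char *keys[SCAN_MAX];
    size_t nkeys;
    const char *blocking[SCAN_MAX];
    size_t nblocking;
} ScanDb;

static int scandb_has(const ScanDb *db, const char *key) {
    for (size_t i = 0; i < db->nkeys; i++)
        if (strcmp(db->keys[i], key) == 0) return 1;
    return 0;
}

/* Collect every blocked-on key that now exists; returns how many. */
size_t scan_ready_keys(const ScanDb *db, const char *ready[SCAN_MAX]) {
    size_t n = 0;
    for (size_t i = 0; i < db->nblocking; i++)
        if (scandb_has(db, db->blocking[i]))
            ready[n++] = db->blocking[i];
    return n;
}

/* Small self-check; returns 0 when every assertion holds. */
int scan_demo(void) {
    /* The swapped-in data contains "jobs" and "cfg"; clients are blocked
     * on "jobs" and "mail". Only "jobs" becomes ready. */
    ScanDb db = { {"jobs", "cfg"}, 2, {"jobs", "mail"}, 2 };
    const char *ready[SCAN_MAX];
    size_t n = scan_ready_keys(&db, ready);
    assert(n == 1);
    assert(strcmp(ready[0], "jobs") == 0);
    return 0;
}
```

The work is proportional to the number of keys that have blocked clients, not to the size of the keyspace, which matches the shape of the loop over db->blocking_keys in the source.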

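3.2 Notifying clients that use WATCH

Here the clients that WATCH keys are told that the data in this database can no longer be trusted, so they have to deal with it.

As the code shows, it iterates over the watched keys, finds the clients watching each key, and adds CLIENT_DIRTY_CAS to the flags of those clients.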

/* Set CLIENT_DIRTY_CAS to all clients of DB when DB is dirty.
 * It may happen in the following situations:
 * FLUSHDB, FLUSHALL, SWAPDB
 *
 * replaced_with: for SWAPDB, the WATCH should be invalidated if
 * the key exists in either of them, and skipped only if it
 * doesn't exist in both. */
void touchAllWatchedKeysInDb(redisDb *emptied, redisDb *replaced_with) {
    listIter li;
    listNode *ln;
    dictEntry *de;

    if (dictSize(emptied->watched_keys) == 0) return;

    dictIterator *di = dictGetSafeIterator(emptied->watched_keys);
    while((de = dictNext(di)) != NULL) {
        robj *key = dictGetKey(de);
        list *clients = dictGetVal(de);
        if (!clients) continue;
        listRewind(clients,&li);
        while((ln = listNext(&li))) {
            client *c = listNodeValue(ln);
            if (dictFind(emptied->dict, key->ptr)) {
                c->flags |= CLIENT_DIRTY_CAS;
            } else if (replaced_with && dictFind(replaced_with->dict, key->ptr)) {
                c->flags |= CLIENT_DIRTY_CAS;
            }
        }
    }
    dictReleaseIterator(di);
}
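The rule stated in the comment (invalidate the WATCH if the key exists in either database, skip it only if it exists in neither) can be sketched as a small predicate. WatchDb, watchdb_has and watch_is_invalidated are hypothetical names on a toy keyspace, not Redis APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy keyspace: just an array of key names. */
typedef struct WatchDb {
    const char *keys[4];
    size_t nkeys;
} WatchDb;

static int watchdb_has(const WatchDb *db, const char *key) {
    for (size_t i = 0; i < db->nkeys; i++)
        if (strcmp(db->keys[i], key) == 0) return 1;
    return 0;
}

/* Returns 1 when a WATCH on `key` must be invalidated.
 * `replaced_with` may be NULL, as for FLUSHDB/FLUSHALL. */
int watch_is_invalidated(const WatchDb *emptied, const WatchDb *replaced_with,
                         const char *key) {
    if (watchdb_has(emptied, key)) return 1;
    if (replaced_with && watchdb_has(replaced_with, key)) return 1;
    return 0;
}

/* Small self-check; returns 0 when every assertion holds. */
int watch_demo(void) {
    WatchDb db0 = { {"number"}, 1 };
    WatchDb db1 = { {"other"}, 1 };
    assert(watch_is_invalidated(&db0, &db1, "number") == 1); /* in db0 */
    assert(watch_is_invalidated(&db1, &db0, "number") == 1); /* in the other DB */
    assert(watch_is_invalidated(&db0, &db1, "missing") == 0); /* in neither */
    assert(watch_is_invalidated(&db0, NULL, "other") == 0);   /* flush case */
    return 0;
}
```

A key that is absent from both databases reads as "missing" before and after the swap, so a transaction watching it has nothing to be invalidated for.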

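The WATCH mechanism needs some explanation here.

0x04 The WATCH mechanism

4.1 The WATCH command

The Redis WATCH command monitors one or more keys. If any of these keys is modified by another command before the transaction executes, the transaction is aborted.

Syntax
The basic syntax of the Redis WATCH command is:
WATCH key [key …]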

Verification:

First open two Redis clients, client 1 and client 2.

1. In client 1, set a value.

redis 127.0.0.1:6379> set number 10
OK

2. In client 1, WATCH this key.

redis 127.0.0.1:6379> watch number
OK

3. In client 1, open a transaction and modify the value.

redis 127.0.0.1:6379> multi
OK
redis 127.0.0.1:6379> set number 100
QUEUED
redis 127.0.0.1:6379> get number
QUEUED
redis 127.0.0.1:6379>

Do not run EXEC yet.

4. In client 2, modify the value.

redis 127.0.0.1:6379> set number 500
OK

5. In client 1, run EXEC.

redis 127.0.0.1:6379> exec
(nil)
redis 127.0.0.1:6379> get number
"500"

EXEC returns nil: the transaction was not executed, and client 1 now reads the value written by client 2.

The flow is as follows:

Redis Client 1          Redis Server              Redis Client 2
      +                       +                        +
      |                       |                        |
      |                       |                        |
      |                       |                        |
      v                       |                        |
set number 10 +-------------> |                        |
      +                       v                        |
      |                  number = 10                   |
      |                       +                        |
      |                       |                        |
      v        start watch    |                        |
watch number +--------------> |                        |
      +                       |                        |
      |                       |                        |
      |                       |                        |
      v     begin transaction |                        |
    multi    ---------------> |                        |
      +                       |                        |
      |                       |                        |
      |                       |                        |
      v                       |                        |
set number 100                |                        |
      +                       |                        |
      |                       |                        |
      |                       |                        |
      v                       |                        v
  get number                  +<---------------+  set number 500
      +                       v                        +
      |                  number = 500                  |
      |                       +                        |
      v      exec will fail   |                        |
    exec +----------------->  |                        |
      +                       |                        |
      | nil                   |                        |
      |                       |                        |
      v                       |                        |
                              v                        |
  get number <---------+ number = 500                  |
      +                       +                        |
      |                       |                        |
      +                       v                        +

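4.2 How it works

4.2.1 Redis transactions

Redis guarantees that the queued commands of a transaction are either all run or not run at all. If the client disconnects before sending EXEC, Redis discards the transaction queue and none of the commands run. Once the client has sent EXEC, all the commands are run even if the client disconnects afterwards, because Redis has already recorded every command to execute.

In addition, a Redis transaction guarantees that its commands run in order without commands from other clients being interleaved. Suppose client A needs to run several commands while client B sends one: without a transaction, B's command may run in between A's commands. A transaction prevents that.

4.2.2 No rollback needed

WATCH + MULTI in Redis is effectively an optimistic lock.

If a command is rejected while it is being queued (for example, an unknown command or a wrong number of arguments), EXEC refuses to run the whole transaction. Unlike MySQL transactions, however, a Redis transaction never rolls back during execution: if a command fails at run time, the remaining commands still run and the results of the commands already executed are kept.

With the optimistic lock provided by WATCH, if a watched key has been modified by the time EXEC arrives, none of the commands between MULTI and EXEC run, so there is nothing to roll back.

4.2.3 Reporting failure

When clients A and B run the same code at the same time, their transactions are still executed serially. Suppose A's EXEC runs before B's: A is first removed from the list of clients watching the key, and when A's transaction modifies the key, every client still in that list is flagged CLIENT_DIRTY_CAS. When B's EXEC runs afterwards, the server finds B flagged CLIENT_DIRTY_CAS, aborts the transaction, and returns a failure (a nil reply).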
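The A-then-B scenario can be sketched with a toy model of the dirty flag. CasClient, cas_touch_key and cas_exec are hypothetical names; each toy client watches at most one key, and a transaction is reduced to a single write to that key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy client: one watched key and one flag. */
typedef struct CasClient {
    const char *watched;   /* watched key, or NULL if none */
    int dirty_cas;         /* plays the role of CLIENT_DIRTY_CAS */
} CasClient;

/* A write to `key` flags every client that is still watching it. */
void cas_touch_key(CasClient *clients, int n, const char *key) {
    for (int i = 0; i < n; i++)
        if (clients[i].watched && strcmp(clients[i].watched, key) == 0)
            clients[i].dirty_cas = 1;
}

/* EXEC for client `self`, whose transaction writes `key`.
 * Returns 1 if the transaction ran, 0 if it was aborted. */
int cas_exec(CasClient *clients, int n, int self, const char *key) {
    CasClient *c = &clients[self];
    int aborted = c->dirty_cas;
    c->watched = NULL;     /* the watch is dropped in both outcomes */
    c->dirty_cas = 0;
    if (aborted) return 0;
    cas_touch_key(clients, n, key);  /* the queued write touches the key */
    return 1;
}

/* Small self-check; returns 0 when every assertion holds. */
int cas_demo(void) {
    CasClient cl[2] = { {"number", 0}, {"number", 0} };  /* A and B */
    assert(cas_exec(cl, 2, 0, "number") == 1);  /* A runs first and wins */
    assert(cl[1].dirty_cas == 1);               /* B is now flagged */
    assert(cas_exec(cl, 2, 1, "number") == 0);  /* B's EXEC is aborted */
    assert(cl[1].dirty_cas == 0);               /* flag cleared afterwards */
    return 0;
}
```

The loser is not blocked and nothing is undone; it simply gets a failed EXEC and is expected to WATCH and retry, which is what makes the scheme optimistic.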

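4.3 WATCH source code

4.3.1 Adding a watch

watchCommand adds watched keys to a client. Each key ends up as a watchedKey entry in the client's watched_keys list, and the client is also added to the client list kept for that key in the database's watched_keys dictionary.

/* WATCH command */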
void watchCommand(client *c) {
    int j;
 
    if (c->flags & CLIENT_MULTI) {
        addReplyError(c,"WATCH inside MULTI is not allowed");
        return;
    }
    for (j = 1; j < c->argc; j++)
        watchForKey(c,c->argv[j]);
    
    addReply(c,shared.ok);
}
 
typedef struct watchedKey {
    robj *key;
    redisDb *db;
} watchedKey;
 
/* Watch a single key */
void watchForKey(client *c, robj *key) {
    list *clients = NULL;
    listIter li;
    listNode *ln;
    watchedKey *wk;
 
    /* Check whether the key is already watched; if so, return right away */
    // Set up an iterator over the client's list
    listRewind(c->watched_keys,&li);
    // Walk the keys this client already watches
    while((ln = listNext(&li))) {
        wk = listNodeValue(ln);
        // Same DB and same key: nothing to do
        if (wk->db == c->db && equalStringObjects(key,wk->key))
            return; /* Key already watched */
    }
    /* Not watched yet, carry on */
    // Fetch the list of clients watching this key from the DB's hash table
    clients = dictFetchValue(c->db->watched_keys,key);
    // If there is none yet, create a list to hold them
    if (!clients) {
        clients = listCreate();
        dictAdd(c->db->watched_keys,key,clients);
        incrRefCount(key);
    }
    // Append the current client to the tail of that list
    listAddNodeTail(clients,c);
    /* Also record the key in the client's own watched_keys list */
    wk = zmalloc(sizeof(*wk));
    wk->key = key;
    wk->db = c->db;
    incrRefCount(key);
    listAddNodeTail(c->watched_keys,wk);
}

In short, a client uses watched_keys to keep track of the keys it watches:

+----------------------+
| client               |
|                      |       +------------+     +-------------+
|                      |       | wk         |     | wk          |
|      watched_keys +--------> |      key 1 | ... |       key n |
|                      |       |      db  1 |     |       db  n |
+----------------------+       +------------+     +-------------+
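The duplicate check at the top of watchForKey compares both the database and the key name. A minimal sketch of that client-side list, assuming a fixed-size array instead of the Redis list type (WkEntry, WkClient and wk_watch are hypothetical names):

```c
#include <assert.h>
#include <string.h>

#define WK_MAX 8

/* Hypothetical counterpart of watchedKey: a (db, key) pair. */
typedef struct WkEntry {
    int db;
    const char *key;
} WkEntry;

typedef struct WkClient {
    WkEntry watched[WK_MAX];
    int count;
} WkClient;

/* Returns 1 if the pair was added, 0 if it was already watched
 * (or if the toy array is full). */
int wk_watch(WkClient *c, int db, const char *key) {
    for (int i = 0; i < c->count; i++)
        if (c->watched[i].db == db && strcmp(c->watched[i].key, key) == 0)
            return 0;                  /* key already watched in this DB */
    if (c->count == WK_MAX) return 0;
    c->watched[c->count].db = db;
    c->watched[c->count].key = key;
    c->count++;
    return 1;
}

/* Small self-check; returns 0 when every assertion holds. */
int wk_demo(void) {
    WkClient c;
    c.count = 0;
    assert(wk_watch(&c, 0, "number") == 1);
    assert(wk_watch(&c, 0, "number") == 0);  /* duplicate in the same DB */
    assert(wk_watch(&c, 1, "number") == 1);  /* same name, another DB */
    assert(c.count == 2);
    return 0;
}
```

Keeping the database next to the key matters for SWAPDB: the same key name in two databases is two independent watches.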

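4.3.2 Executing the commands

Specifically:

Before the queued commands are run, if the client's flags already include CLIENT_DIRTY_CAS, the transaction is aborted and none of the queued commands run;
Otherwise the queued commands are run one by one. In both cases discardTransaction is called at the end; it resets the transaction state and clears CLIENT_MULTI, CLIENT_DIRTY_CAS and CLIENT_DIRTY_EXEC from the client's flags.

The code is as follows:

/* EXEC command */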
void execCommand(client *c) {
    int j;
    robj **orig_argv;
    int orig_argc;
    struct redisCommand *orig_cmd;
    int must_propagate = 0; /* Need to propagate MULTI/EXEC to AOF / slaves? */
    int was_master = server.masterhost == NULL;
	
    // EXEC is only valid after MULTI
    if (!(c->flags & CLIENT_MULTI)) {
        addReplyError(c,"EXEC without MULTI");
        return;
    }
	
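    /*
     * Key point:
     * check the client flags. Either of the following two flags aborts the
     * transaction, and none of the queued commands are run:
     * 1. CLIENT_DIRTY_CAS  => a key watched by this client was touched
     * 2. CLIENT_DIRTY_EXEC => the client queued an invalid command
     *    (for example, one that does not exist)
     */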
    if (c->flags & (CLIENT_DIRTY_CAS|CLIENT_DIRTY_EXEC)) {
        addReply(c, c->flags & CLIENT_DIRTY_EXEC ? shared.execaborterr :
                                                  shared.nullmultibulk);
        discardTransaction(c);
        goto handle_monitor;
    }
 
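    /* Run the queued commands */
    // First drop this client's watched keys, both from the client's own list
    // and from the per-key client lists in the database's hash table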
    unwatchAllKeys(c); /* Unwatch ASAP otherwise we'll waste CPU cycles */
    orig_argv = c->argv;
    orig_argc = c->argc;
    orig_cmd = c->cmd;
    addReplyMultiBulkLen(c,c->mstate.count);
    // Run each queued command in order
    for (j = 0; j < c->mstate.count; j++) {
        c->argc = c->mstate.commands[j].argc;
        c->argv = c->mstate.commands[j].argv;
        c->cmd = c->mstate.commands[j].cmd;
 
        /* ACL permissions are also checked at the time of execution in case
         * they were changed after the commands were queued. */
        int acl_errpos;
        int acl_retval = ACLCheckCommandPerm(c,&acl_errpos);
        if (acl_retval == ACL_OK && c->cmd->proc == publishCommand)
            acl_retval = ACLCheckPubsubPerm(c,1,1,0,&acl_errpos);
        if (acl_retval != ACL_OK) {
            char *reason;
            switch (acl_retval) {
            case ACL_DENIED_CMD:
                reason = "no permission to execute the command or subcommand";
                break;
            case ACL_DENIED_KEY:
                reason = "no permission to touch the specified keys";
                break;
            case ACL_DENIED_CHANNEL:
                reason = "no permission to publish to the specified channel";
                break;
            default:
                reason = "no permission";
                break;
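            }
            /* NOTE: as excerpted here, `reason` is computed but never used;
             * presumably the full source sends it back to the client as an
             * error reply at this point. */
        } else {
            // This is where the queued command is actually invoked.
            // A write command signals the keys it modifies (for commands such
            // as SET, even when the stored value ends up unchanged), which
            // flags every client watching those keys with CLIENT_DIRTY_CAS.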
            call(c,server.loading ? CMD_CALL_NONE : CMD_CALL_FULL);
            serverAssert((c->flags & CLIENT_BLOCKED) == 0);
        }

        /* Commands may alter argc/argv, restore mstate. */
        c->mstate.commands[j].argc = c->argc;
        c->mstate.commands[j].argv = c->argv;
        c->mstate.commands[j].cmd = c->cmd;
    }
    
    c->argv = orig_argv;
    c->argc = orig_argc;
    c->cmd = orig_cmd;
    discardTransaction(c);
 
handle_monitor:
    /* Send EXEC to clients waiting data from MONITOR. We do it here
     * since the natural order of commands execution is actually:
     * MULTI, EXEC, ... commands inside transaction ...
     * Instead EXEC is flagged as CMD_SKIP_MONITOR in the command
     * table, and we do it here with correct ordering. */
    if (listLength(server.monitors) && !server.loading)
        replicationFeedMonitors(c,server.monitors,c->db->id,c->argv,c->argc);
}
 
/* Discard the current transaction state */
void discardTransaction(client *c) {
    freeClientMultiState(c);
    initClientMultiState(c);
    c->flags &= ~(CLIENT_MULTI|CLIENT_DIRTY_CAS|CLIENT_DIRTY_EXEC);
    unwatchAllKeys(c);
}

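The logic is shown in the figure below:

The client watches a set of keys;
When a watched key is modified (or its database is hit by FLUSHDB, FLUSHALL or SWAPDB), the server adds CLIENT_DIRTY_CAS to the flags of every client watching it;
When the client sends EXEC, the server finds the flag set and does not run the queued commands;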

+-------------------+
| client            |
|                   |       +-------------+     +--------------+
|                   |  1    | wk          |     | wk           |
|   watched_keys +--------> |      key 1  | ... |       key n  |
|                   |       |      db  1  |     |       db  n  |
|            ^      |       +-------------+     +--------------+
|            |      |
|            | 3    |                                      +----------------------+
|            |      |                                      | Redis DB             |
|            |      |                                      |                      |
|            +      |  2 set CLIENT_DIRTY_CAS on key write |                      |
|          flags <--------------------------------------------+ touchWatchedKey() |
|                   |                                      |                      |
+-------------------+                                      +----------------------+