ApsaraDB for Redis之記憶體去哪兒了（一）資料過期與逐出策略

weixin_33858249發表於2017-11-24

Redis記憶體

背景

Redis作為一個高效能的記憶體NoSQL資料庫，其容量受到最大記憶體限制的限制。

使用者在使用阿里雲Redis時，除了對效能，穩定性有很高的要求外，對記憶體佔用也比較敏感。在使用過程中，有些使用者會覺得自己的線上例項記憶體佔用比自己預想的要大。

事實上，例項中的記憶體除了儲存原始的鍵值對所需的開銷外，還有一些執行時產生的額外記憶體，包括：

垃圾資料和過期Key所佔空間
字典漸進式Rehash導致未及時刪除的空間
Redis管理資料，包括底層資料結構開銷，客戶端資訊，讀寫緩衝區等
主從複製，bgsave時的額外開銷
其它

本系列文章主要分析這些在Redis中產生的原因，帶來的影響和規避的方式。

本文主要分析第一項Redis過期策略對記憶體的影響。

Redis過期資料清理策略

過期資料清理時機

為了防止一次性清理大量過期Key導致Redis服務受影響，Redis只在空閒時清理過期Key。

具體Redis逐出過期Key的時機為:

訪問Key時，會判斷Key是否過期，逐出過期Key;

robj lookupKeyRead(redisDb db, robj *key) {
```
robj *val;
expireIfNeeded(db,key);
val = lookupKey(db,key);
...
return val;
```
}

CPU空閒時在定期serverCron任務中，逐出部分過期Key;

aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL)

int serverCron(struct aeEventLoop eventLoop, long long id, void clientData) {

...
databasesCron();
...

}

void databasesCron(void) {

/* Expire keys by random sampling. Not required for slaves
 + as master will synthesize DELs for us. */
if (server.active_expire_enabled && server.masterhost == NULL)
    activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
...

}

每次事件迴圈執行的時候，逐出部分過期Key;

void aeMain(aeEventLoop *eventLoop) {

eventLoop->stop = 0;
while (!eventLoop->stop) {
    if (eventLoop->beforesleep != NULL)
        eventLoop->beforesleep(eventLoop);
    aeProcessEvents(eventLoop, AE_ALL_EVENTS);
}

}

void beforeSleep(struct aeEventLoop *eventLoop) {

...
/* Run a fast expire cycle (the called function will return
 - ASAP if a fast cycle is not needed). */
if (server.active_expire_enabled && server.masterhost == NULL)
    activeExpireCycle(ACTIVE_EXPIRE_CYCLE_FAST);
...

}

過期資料清理演算法

Redis過期Key清理的機制對清理的頻率和最大時間都有限制，在儘量不影響正常服務的情況下，進行過期Key的清理，以達到長時間服務的效能最優.

Redis會週期性的隨機測試一批設定了過期時間的key並進行處理。測試到的已過期的key將被刪除。具體的演算法如下:

Redis配置項hz定義了serverCron任務的執行週期，預設為10，即CPU空閒時每秒執行10次;
每次過期key清理的時間不超過CPU時間的25%，即若hz=1，則一次清理時間最大為250ms，若hz=10，則一次清理時間最大為25ms;
清理時依次遍歷所有的db;
從db中隨機取20個key，判斷是否過期，若過期，則逐出;
若有5個以上key過期，則重複步驟4，否則遍歷下一個db;
在清理過程中，若達到了25%CPU時間，退出清理過程;

這是一個基於概率的簡單演算法，基本的假設是抽出的樣本能夠代表整個key空間，redis持續清理過期的資料直至將要過期的key的百分比降到了25%以下。這也意味著在長期來看任何給定的時刻已經過期但仍佔據著記憶體空間的key的量最多為每秒的寫操作量除以4.

由於演算法採用的隨機取key判斷是否過期的方式，故幾乎不可能清理完所有的過期Key;
調高hz引數可以提升清理的頻率，過期key可以更及時的被刪除，但hz太高會增加CPU時間的消耗;Redis作者關於hz引數的一些討論

程式碼分析如下:

void activeExpireCycle(int type) {
    ...
    /* We can use at max ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC percentage of CPU time
     * per iteration. Since this function gets called with a frequency of
     * server.hz times per second, the following is the max amount of
     * microseconds we can spend in this function. */
    // 最多允許25%的CPU時間用於過期Key清理
    // 若hz=1，則一次activeExpireCycle最多隻能執行250ms
    // 若hz=10，則一次activeExpireCycle最多隻能執行25ms
    timelimit = 1000000*ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC/server.hz/100;
    ...
    // 遍歷所有db
    for (j = 0; j < dbs_per_call; j++) {
        int expired;
        redisDb *db = server.db+(current_db % server.dbnum);

        /* Increment the DB now so we are sure if we run out of time
         * in the current DB we'll restart from the next. This allows to
         * distribute the time evenly across DBs. */
        current_db++;

        /* Continue to expire if at the end of the cycle more than 25%
         * of the keys were expired. */
        do {
            ...
            // 一次取20個Key，判斷是否過期
            if (num > ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP)
                num = ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP;

            while (num--) {
                dictEntry *de;
                long long ttl;

                if ((de = dictGetRandomKey(db->expires)) == NULL) break;
                ttl = dictGetSignedIntegerVal(de)-now;
                if (activeExpireCycleTryExpire(db,de,now)) expired++;
            }

            if ((iteration & 0xf) == 0) { /* check once every 16 iterations. */
                long long elapsed = ustime()-start;
                latencyAddSampleIfNeeded("expire-cycle",elapsed/1000);
                if (elapsed > timelimit) timelimit_exit = 1;
            }
            if (timelimit_exit) return;
            /* We don't repeat the cycle if there are less than 25% of keys
             * found expired in the current DB. */
            // 若有5個以上過期Key，則繼續直至時間超過25%的CPU時間
            // 若沒有5個過期Key，則跳過。
        } while (expired > ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP/4);
    }
}

Redis資料逐出策略

資料逐出時機

// 執行命令
int processCommand(redisClient *c) {
        ...
        /* Handle the maxmemory directive.
        **
        First we try to free some memory if possible (if there are volatile
        * keys in the dataset). If there are not the only thing we can do
        * is returning an error. */
        if (server.maxmemory) {
            int retval = freeMemoryIfNeeded();
            ...
    }
    ...
}

資料逐出演算法

在逐出演算法中，根據使用者設定的逐出策略，選出待逐出的key，直到當前記憶體小於最大記憶體值為主.

可選逐出策略如下：

volatile-lru：從已設定過期時間的資料集（server.db[i].expires）中挑選最近最少使用的資料淘汰
volatile-ttl：從已設定過期時間的資料集（server.db[i].expires）中挑選將要過期的數據淘汰
volatile-random：從已設定過期時間的資料集（server.db[i].expires）中任意選擇資料淘汰
allkeys-lru：從資料集（server.db[i].dict）中挑選最近最少使用的資料淘汰
allkeys-random：從資料集（server.db[i].dict）中任意選擇資料淘汰
no-enviction（驅逐）：禁止驅逐資料

具體程式碼如下

int freeMemoryIfNeeded() {
    ...
    // 計算mem_used
    mem_used = zmalloc_used_memory();
    ...

    /* Check if we are over the memory limit. */
    if (mem_used <= server.maxmemory) return REDIS_OK;

    // 如果禁止逐出，返回錯誤
    if (server.maxmemory_policy == REDIS_MAXMEMORY_NO_EVICTION)
        return REDIS_ERR; /* We need to free memory, but policy forbids. */

    mem_freed = 0;
    mem_tofree = mem_used - server.maxmemory;
    long long start = ustime();
    latencyStartMonitor(latency);
    while (mem_freed < mem_tofree) {
        int j, k, keys_freed = 0;

        for (j = 0; j < server.dbnum; j++) {
            // 根據逐出策略的不同，選出待逐出的資料
            long bestval = 0; /* just to prevent warning */
            sds bestkey = NULL;
            struct dictEntry *de;
            redisDb *db = server.db+j;
            dict *dict;

            if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||
                server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM)
            {
                dict = server.db[j].dict;
            } else {
                dict = server.db[j].expires;
            }
            if (dictSize(dict) == 0) continue;

            /* volatile-random and allkeys-random policy */
            if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM ||
                server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_RANDOM)
            {
                de = dictGetRandomKey(dict);
                bestkey = dictGetKey(de);
            }

            /* volatile-lru and allkeys-lru policy */
            else if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||
                server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)
            {
                for (k = 0; k < server.maxmemory_samples; k++) {
                    sds thiskey;
                    long thisval;
                    robj *o;

                    de = dictGetRandomKey(dict);
                    thiskey = dictGetKey(de);
                    /* When policy is volatile-lru we need an additional lookup
                     * to locate the real key, as dict is set to db->expires. */
                    if (server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)
                        de = dictFind(db->dict, thiskey);
                    o = dictGetVal(de);
                    thisval = estimateObjectIdleTime(o);

                    /* Higher idle time is better candidate for deletion */
                    if (bestkey == NULL || thisval > bestval) {
                        bestkey = thiskey;
                        bestval = thisval;
                    }
                }
            }

            /* volatile-ttl */
            else if (server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_TTL) {
                for (k = 0; k < server.maxmemory_samples; k++) {
                    sds thiskey;
                    long thisval;

                    de = dictGetRandomKey(dict);
                    thiskey = dictGetKey(de);
                    thisval = (long) dictGetVal(de);

                    /* Expire sooner (minor expire unix timestamp) is better
                     * candidate for deletion */
                    if (bestkey == NULL || thisval < bestval) {
                        bestkey = thiskey;
                        bestval = thisval;
                    }
                }
            }

            /* Finally remove the selected key. */
            // 逐出挑選出的資料
            if (bestkey) {
                ...
                delta = (long long) zmalloc_used_memory();
                dbDelete(db,keyobj);
                delta -= (long long) zmalloc_used_memory();
                mem_freed += delta;
                ...
            }
        }
        ...
    }
    ...
    return REDIS_OK;
}