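A Step-by-Step Look at How Redis Implements Its LRU Policy

Published 2016-10-27

When Redis is used as a cache, the memory eviction policy determines how efficiently Redis uses its memory. In most cases we choose LRU (Least Recently Used) as the eviction policy. This article walks through how Redis implements its LRU policy, starting from the basics.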

First, a quick refresher: what is LRU? (The following is quoted from Wikipedia.)

Discards the least recently used items first. This algorithm requires keeping track of what was used when, which is expensive if one wants to make sure the algorithm always discards the least recently used item. General implementations of this technique require keeping “age bits” for cache-lines and track the “Least Recently Used” cache-line based on age-bits. In such an implementation, every time a cache-line is used, the age of all other cache-lines changes.

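In short, each eviction discards the element that was used least recently. A typical implementation tags every in-memory element with "age bits" that record how long it has been since the element was last accessed, so that when LRU eviction runs, the elements that have gone unaccessed the longest are discarded.

To make the rest of the article easier to follow, let us first implement a simple LRU cache. (The problem comes from LeetCode; I have reimplemented it here in Python.)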

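The cache must support the following two operations:
1. get(key): if the key exists, move its element to the head of the LRU list and return its value (values are always positive); otherwise return -1.
2. set(key, value): if the key already exists, update its value and move the element to the head of the LRU list; otherwise insert a new key with that value. If inserting the key would exceed the cache capacity, first delete the least recently used key according to the LRU policy.

Analysis
We store the elements (key-value pairs) in a doubly linked list and use a hash table to map each key to its list node. This lets us look up a key in O(1) time while also benefiting from the cheap node insertion and removal of a doubly linked list, so both get and set complete in O(1).
Implementation (Python)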

class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.pre = None
        self.next = None


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}     # key -> Node; created per instance so caches do not share state
        self.head = None  # most recently used node
        self.end = None   # least recently used node

    def get(self, key):
        if key in self.map:
            node = self.map[key]
            self.remove(node)
            self.setHead(node)
            return node.value
        else:
            return -1

    def getAllKeys(self):
        # Print every (key, value) pair from the head (newest) to the end (oldest).
        tmpNode = self.head
        while tmpNode:
            print(tmpNode.key, tmpNode.value)
            tmpNode = tmpNode.next

    def remove(self, n):
        # Unlink node n from the list, fixing head/end if n was at either edge.
        if n.pre:
            n.pre.next = n.next
        else:
            self.head = n.next

        if n.next:
            n.next.pre = n.pre
        else:
            self.end = n.pre

    def setHead(self, n):
        # Insert node n at the head of the list.
        n.next = self.head
        n.pre = None

        if self.head:
            self.head.pre = n

        self.head = n

        if not self.end:
            self.end = self.head

    def set(self, key, value):
        if key in self.map:
            oldNode = self.map[key]
            oldNode.value = value
            self.remove(oldNode)
            self.setHead(oldNode)
        else:
            node = Node(key, value)
            if len(self.map) >= self.capacity:
                # Cache is full: evict the least recently used node first.
                self.map.pop(self.end.key)
                self.remove(self.end)
            self.setHead(node)
            self.map[key] = node


def main():
    cache = LRUCache(100)

    # d->c->b->a
    cache.set('a', '1')
    cache.set('b', '2')
    cache.set('c', 3)
    cache.set('d', 4)

    # Walk the LRU list
    cache.getAllKeys()

    # Change ('a','1') to ('a',5), which moves the node from the end of the LRU list to the head.
    cache.set('a', 5)
    # The LRU list becomes a->d->c->b

    cache.getAllKeys()
    # Access the node with key='c', which moves it to the head of the LRU list.
    cache.get('c')
    # The LRU list becomes c->a->d->b
    cache.getAllKeys()


if __name__ == '__main__':
    main()
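The same head/tail behavior can also be sketched more compactly with Python's collections.OrderedDict, which is itself backed by a hash table plus a doubly linked list. This is only an illustrative alternative, not the article's implementation: the class name OrderedDictLRUCache and its keys() helper are hypothetical, and the sketch assumes a capacity of at least 1.

```python
from collections import OrderedDict


class OrderedDictLRUCache:
    """LRU cache sketch: the most recently used key sits at the front."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed to be >= 1
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return -1
        self.items.move_to_end(key, last=False)   # move the key to the head
        return self.items[key]

    def set(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            self.items.popitem(last=True)         # drop the tail (least recently used)
        self.items[key] = value
        self.items.move_to_end(key, last=False)   # new or updated key goes to the head

    def keys(self):
        return list(self.items)


cache = OrderedDictLRUCache(3)
for k, v in (("a", 1), ("b", 2), ("c", 3)):
    cache.set(k, v)
cache.get("a")        # "a" becomes the most recently used key
cache.set("d", 4)     # the cache is full, so the tail key "b" is evicted
print(cache.keys())   # ['d', 'a', 'c']
```

A small capacity is used here on purpose so that the eviction path, which the capacity-100 example never reaches, is actually exercised.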

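With this short introduction and implementation, we now have a basic understanding of LRU. Next, let us look at how the LRU algorithm is implemented inside Redis and under what circumstances it can cause problems. Internally, Redis keeps the information about a running server in the global structure struct redisServer, for example: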

struct redisServer {
    pid_t pid; /* Main process pid. */
    char *configfile; /* Absolute config file path, or NULL */
    ...
    unsigned lruclock:LRU_BITS; /* Clock for LRU eviction */
    ...
};

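redisServer holds the basic information of a running Redis server (PID, configuration file path, the serverCron frequency hz, and so on), along with information about loadable modules, networking, RDB/AOF, logging, replication, and more.

Note the field lruclock:LRU_BITS in the structure above. It stores the server's LRU clock, which is global to the whole server. The clock is refreshed every 100 ms: it is updated by the periodic task, whose frequency is controlled by hz, and with the default hz=10 the task runs every 1000 ms / 10 = 100 ms.

Next, let us look at how the LRU clock is implemented: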

server.lruclock = getLRUClock();

The getLRUClock function is defined as follows:

#define LRU_CLOCK_RESOLUTION 1000 /* LRU clock resolution in ms */
#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1<<LRU_BITS)-1) /* Max value of obj->lru */

/* Return the LRU clock, based on the clock resolution. This is a time
 * in a reduced-bits format that can be used to set and check the
 * object->lru field of redisObject structures. */
unsigned int getLRUClock(void) {
    return (mstime()/LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX;
}

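The clock has a resolution of one second and 24 bits, so lruclock can count up to (2**24-1)/3600/24 = 194 days; after that it wraps around and starts again. server.lruclock is the global LRU clock of the Redis server, and every redisObject also carries its own LRU timestamp in its lru field. By comparing an object's lru value against the global server.lruclock, Redis can tell how long that redisObject has been idle and decide whether it can be evicted.

The object that stores the value of a Redis key: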
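The clock arithmetic can be reproduced in a few lines of Python. This is a sketch rather than Redis code: the function names lru_clock and estimate_idle_ms are hypothetical, and the wrap-around branch reflects my understanding of how an idle time can still be estimated after the 24-bit clock has wrapped once.

```python
LRU_BITS = 24
LRU_CLOCK_MAX = (1 << LRU_BITS) - 1   # largest value the 24-bit field can hold
LRU_CLOCK_RESOLUTION = 1000           # milliseconds per clock tick


def lru_clock(now_ms):
    """Reduce a millisecond timestamp to a 24-bit clock with 1-second ticks."""
    return (now_ms // LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX


def estimate_idle_ms(server_clock, object_lru):
    """Idle time of an object in ms, tolerating a single wrap of the clock."""
    if server_clock >= object_lru:
        ticks = server_clock - object_lru
    else:
        # The global clock wrapped around after the object was last touched.
        ticks = server_clock + (LRU_CLOCK_MAX - object_lru)
    return ticks * LRU_CLOCK_RESOLUTION


print(LRU_CLOCK_MAX / 3600 / 24)   # roughly 194.18 days before the clock wraps
print(estimate_idle_ms(10, 4))     # 6000
```

Because only one wrap can be detected, an object left untouched for longer than about 194 days would appear younger than it really is.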

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to server.lruclock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits decreas time). */
    int refcount;
    void *ptr;
} robj;

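So when is lru updated? Every access to a key refreshes its lru field. In terms of the simple cache above, this is equivalent to moving the key to the head of the LRU list, which keeps it from being evicted. This part is implemented as follows: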

/* Low level key lookup API, not actually called directly from commands
 * implementations that should instead rely on lookupKeyRead(),
 * lookupKeyWrite() and lookupKeyReadWithFlags(). */
robj *lookupKey( redisDb *db, robj *key, int flags )
{
	dictEntry *de = dictFind( db->dict, key->ptr );
	if ( de )
	{
		robj *val = dictGetVal( de );


/* Update the access time for the ageing algorithm.
 * Don't do it if we have a saving child, as this will trigger
 * a copy on write madness. */
		if ( server.rdb_child_pid == -1 &&
		     server.aof_child_pid == -1 &&
		     !(flags & LOOKUP_NOTOUCH) )
		{
			if ( server.maxmemory_policy & MAXMEMORY_FLAG_LFU )
			{
				unsigned long	ldt	= val->lru >> 8;
				unsigned long	counter = LFULogIncr( val->lru & 255 );
				val->lru = (ldt << 8) | counter;
			} else {
				val->lru = LRU_CLOCK();
			}
		}
		return(val);
	} else {
		return(NULL);
	}
}
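The LFU branch above reuses the same 24-bit lru field, split into two parts as described in the redisObject comment: the upper 16 bits hold a time value and the lower 8 bits hold a frequency counter. The sketch below only illustrates that bit layout; the helper names pack_lfu and unpack_lfu are hypothetical, and the probabilistic counter update done by LFULogIncr is deliberately not reproduced.

```python
def unpack_lfu(lru_field):
    """Split the 24-bit field: upper 16 bits = time part, lower 8 bits = counter."""
    return lru_field >> 8, lru_field & 255


def pack_lfu(ldt, counter):
    """Recombine the two parts, mirroring `(ldt << 8) | counter` in lookupKey."""
    return ((ldt & 0xFFFF) << 8) | (counter & 255)


field = pack_lfu(ldt=0x1234, counter=5)
ldt, counter = unpack_lfu(field)
print(hex(field), hex(ldt), counter)   # 0x123405 0x1234 5
```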

Next, let us analyze how eviction of keys is implemented and which policies are available:

# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory
# is reached. You can select among five behaviors:
#
# volatile-lru -> Evict using approximated LRU among the keys with an expire set.
# allkeys-lru -> Evict any key using approximated LRU.
# volatile-lfu -> Evict using approximated LFU among the keys with an expire set.
# allkeys-lfu -> Evict any key using approximated LFU.
# volatile-random -> Remove a random key among the ones with an expire set.
# allkeys-random -> Remove a random key, any key.
# volatile-ttl -> Remove the key with the nearest expire time (minor TTL)
# noeviction -> Don't evict anything, just return an error on write operations.
#
# LRU means Least Recently Used
# LFU means Least Frequently Used
#
# Both LRU, LFU and volatile-ttl are implemented using approximated
# randomized algorithms.

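We will not discuss the LFU and TTL eviction algorithms or the noeviction case here; we only look at how eviction is implemented in the LRU scenarios. (LFU and TTL will be analyzed in detail in the next article.)
Scenarios in which keys are reclaimed under LRU: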

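1. Active eviction.

1.1 The periodic task serverCron regularly cleans up expired keys.
2. Passive eviction.

2.1 On each write, if memory turns out to be insufficient, freeMemoryIfNeeded is called to release some memory.
2.2 On each access to a key, if the key is found to be expired, the memory associated with it is released immediately.

Let us first analyze the active eviction scenario:
Every 1000/hz ms, serverCron calls databasesCron to detect and remove expired keys.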

void databasesCron(void) {
    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    if (server.active_expire_enabled && server.masterhost == NULL)
        activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
    ...
}

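Active eviction is implemented by activeExpireCycle, whose logic is as follows:
1. Visit at most 16 DBs. [Defined by the macro CRON_DBS_PER_CALL, default 16.]

2. Randomly pick 20 keys that have an expire time set. [Defined by the macro ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP, default 20.]

3. If a key has expired, release the memory associated with it, or put it into the expiration queue.

4. If the time spent exceeds the allowed limit, 25 ms by default (timelimit = 1000000*ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC/server.hz/100 microseconds, with ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC=25 and server.hz defaulting to 10), the eviction pass ends and returns; otherwise go to step 5.

5. If more than 5 of the sampled keys in this DB (ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP/4=5) had actually expired, go back to step 2; otherwise select the next DB and go to step 2 again.

6. Once all DBs have been visited, the pass ends.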
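The six steps above can be simulated in Python. This is a simplified sketch under stated assumptions, not the Redis implementation: each database is modeled as a plain dict mapping a key to its expiry timestamp, the function name active_expire_cycle is hypothetical, and expired keys are simply deleted rather than queued.

```python
import random
import time

LOOKUPS_PER_LOOP = 20   # keys sampled per pass (step 2)
DBS_PER_CALL = 16       # databases visited per call (step 1)
SLOW_TIME_PERC = 25     # share of each cron period the cycle may use (step 4)


def active_expire_cycle(dbs, now, hz=10):
    """dbs: list of dicts mapping key -> expiry timestamp. Returns the number of keys removed."""
    time_limit = SLOW_TIME_PERC / hz / 100        # seconds; 0.025 s at hz=10
    start = time.monotonic()
    removed = 0
    for db in dbs[:DBS_PER_CALL]:
        while db:
            sample = random.sample(list(db), min(LOOKUPS_PER_LOOP, len(db)))
            expired = [k for k in sample if db[k] <= now]
            for k in expired:
                del db[k]                         # step 3: release expired keys
            removed += len(expired)
            if time.monotonic() - start > time_limit:
                return removed                    # step 4: time budget used up
            if len(expired) <= LOOKUPS_PER_LOOP // 4:
                break                             # step 5: few expired keys, next DB
    return removed


fresh = {f"k{i}": 100 for i in range(50)}         # nothing has expired at now=1
print(active_expire_cycle([fresh], now=1), len(fresh))   # 0 50
```

The loop keeps sampling the same database only while more than a quarter of each sample turns out to be expired, which is why a large batch of keys expiring together makes a single pass run longer.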

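[Figure: flowchart of activeExpireCycle]
Note: the label in the figure reading "at least 5% of the keys have actually expired" should read "at least 25% of the keys have actually expired". iteration++ is incremented once for each of the 20 keys visited.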

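Passive eviction: memory is insufficient, so freeMemoryIfNeeded is called to release memory

This step is implemented as follows:

The memory eviction logic in the processCommand function: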


/* Handle the maxmemory directive.
 *
 * First we try to free some memory if possible (if there are volatile
 * keys in the dataset). If there are not the only thing we can do
 * is returning an error. */
if ( server.maxmemory )
{
	int retval = freeMemoryIfNeeded();


/* freeMemoryIfNeeded may flush slave output buffers. This may result
 * into a slave, that may be the active client, to be freed. */
	if ( server.current_client == NULL )
		return(C_ERR);


/* It was impossible to free enough memory, and the command the client
 * is trying to execute is denied during OOM conditions? Error. */
	if ( (c->cmd->flags & CMD_DENYOOM) && retval == C_ERR )
	{
		flagTransaction( c );
		addReply( c, shared.oomerr );
		return(C_OK);
	}
}

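Before each command is executed, freeMemoryIfNeeded is called to check memory usage and release memory as needed. If memory is still insufficient afterwards and the command is flagged CMD_DENYOOM, an OOM error is returned directly to the requesting client.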

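The detailed steps are as follows:
1. Get the amount of memory currently used by the Redis server, mem_reported.

2. If mem_reported < server.maxmemory, return OK. Otherwise set mem_used = mem_reported and go to step 3.

3. Iterate over all slaves of this Redis instance and subtract from mem_used the client output buffers occupied by the slaves.

4. If AOF is enabled, subtract from mem_used the space occupied by the AOF buffers: sdslen(server.aof_buf)+aofRewriteBufferSize().

5. If mem_used < server.maxmemory, return OK. Otherwise go to step 6.

6. If the memory policy is noeviction, return an error. Otherwise go to step 7.

7. Under an LRU policy: with volatile-lru, randomly sample maxmemory_samples keys (default 5) from the set of keys that have an expire time and evict the one with the largest idle time; with allkeys-lru, sample maxmemory_samples keys (default 5) from the whole dataset and evict the one with the largest idle time.

8. If memory usage still exceeds server.maxmemory after freeing, keep evicting until the remaining usage drops below server.maxmemory.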
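Steps 7 and 8 describe an approximated LRU: instead of keeping a fully ordered list, a few keys are sampled and the idlest of them is evicted, repeatedly. The sketch below illustrates that idea only. The function name evict_until_under_limit and the (size, idle_seconds) bookkeeping are assumptions made for the example, and the slave output buffer and AOF buffer adjustments from steps 3 and 4 are left out.

```python
import random


def evict_until_under_limit(store, used, max_memory, samples=5, rng=random):
    """store: key -> (size, idle_seconds). Evicts keys until used < max_memory.

    Returns (evicted_keys, used). Raises MemoryError if nothing is left to evict.
    """
    evicted = []
    while used >= max_memory:
        if not store:
            raise MemoryError("OOM: nothing left to evict")
        candidates = rng.sample(list(store), min(samples, len(store)))
        victim = max(candidates, key=lambda k: store[k][1])   # largest idle time wins
        size, _idle = store.pop(victim)
        used -= size
        evicted.append(victim)
    return evicted, used


store = {"a": (10, 100), "b": (10, 5), "c": (10, 50)}
evicted, used = evict_until_under_limit(store, used=30, max_memory=25, samples=3)
print(evicted, used)   # ['a'] 20
```

With a sample size smaller than the dataset, the evicted key is only the idlest of the sample, not necessarily the idlest overall. That is the trade-off of the approximation: a larger maxmemory_samples gets closer to true LRU at a higher cost per eviction.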

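Passive eviction: on each access to a key, if the key is found to be expired, its memory is released immediately:
Every access to a key calls expireIfNeeded to check whether the key has expired. If it has, the key is released and null is returned; otherwise the key's value is returned.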
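This expire-on-access behavior can be sketched as follows. The class LazyExpiringStore and its methods are hypothetical names used for illustration; only the idea of checking the expiry on every read is taken from the description above. An injectable clock makes the example deterministic.

```python
import time


class LazyExpiringStore:
    """Key-value store that checks a key's expiry only when the key is read."""

    def __init__(self, clock=time.time):
        self.data = {}       # key -> value
        self.expires = {}    # key -> absolute expiry time in seconds
        self.clock = clock

    def set(self, key, value, ttl=None):
        self.data[key] = value
        if ttl is None:
            self.expires.pop(key, None)
        else:
            self.expires[key] = self.clock() + ttl

    def _expire_if_needed(self, key):
        deadline = self.expires.get(key)
        if deadline is not None and deadline <= self.clock():
            del self.data[key]       # release the expired key right away
            del self.expires[key]
            return True
        return False

    def get(self, key):
        self._expire_if_needed(key)
        return self.data.get(key)    # None if the key is missing or has just expired


now = [0.0]
store = LazyExpiringStore(clock=lambda: now[0])
store.set("k", "v", ttl=10)
print(store.get("k"))   # v
now[0] = 11.0
print(store.get("k"))   # None
```

A key that is never read again would stay in memory forever under this scheme alone, which is why it is paired with the periodic active pass.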

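Summary
1. Redis, when used as a cache, usually evicts data with an LRU policy. If too many keys expire at the same time, the active expiration pass started by Redis can take a long time (up to 250 ms), which means the application may see timeouts of 250 ms or more.

timelimit = 1000000*ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC/server.hz/100 (microseconds)
ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC = 25
server.hz >= 1 (default 10)
timelimit <= 250 ms

2. If memory usage is too high, Redis runs short of memory and triggers passive eviction, which can cause application requests to time out.

3. Tuning the hz parameter sensibly controls how often active eviction runs, which effectively mitigates the timeouts described above that are caused by too many keys expiring at once.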
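The effect of hz on the time budget can be checked directly from the timelimit formula. The helper name expire_time_limit_ms is hypothetical; the constant and the formula are the ones quoted above.

```python
ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC = 25


def expire_time_limit_ms(hz):
    """Per-call time budget of the slow expire cycle, in milliseconds."""
    limit_us = 1000000 * ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC // hz // 100
    return limit_us / 1000


# Columns: hz, time budget per call (ms), interval between calls (ms)
for hz in (1, 10, 100):
    print(hz, expire_time_limit_ms(hz), 1000 / hz)
# 1 250.0 1000.0
# 10 25.0 100.0
# 100 2.5 10.0
```

The budget is always 25% of the cron interval, so raising hz does not reduce the total share of time spent on expiration; it splits the work into shorter, more frequent passes, which lowers the worst-case pause seen by a single request.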
