Best Practices for a Redis-Based Leaderboard with Tens of Millions of Users

Posted by 藍鯨人 on 2017-12-25

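Preface

Redis is an open-source, in-memory data structure store that can be used as a database, a cache, and a message broker. It supports several data structures, such as strings, hashes, lists, sets, and sorted sets. It has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and it provides high availability through Redis Sentinel and automatic partitioning through Redis Cluster.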

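Given the data structures Redis supports, it can be applied in many scenarios. The most common ones today fall into the following categories:

Distributed session storage for web services

Message queues

Leaderboards

Counters

Publish/subscribe

This article explains how I handle a leaderboard for tens of millions of users and how I persist the leaderboard's user data.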

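Leaderboard types

Different business requirements call for different rules for generating a leaderboard. Here we focus on two kinds: real-time boards and historical boards.

Choosing a Redis data structure for the leaderboard

Here we use the Redis Sorted Set data structure to hold the leaderboard data. The official documentation defines it as follows: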

Redis Sorted sets —
Sorted sets are a data type which is similar to a mix between a Set and a Hash. Like sets, sorted sets are composed of unique, non-repeating string elements, so in some sense a sorted set is a set as well.

However while elements inside sets are not ordered, every element in a sorted set is associated with a floating point value, called the score (this is why the type is also similar to a hash, since every element is mapped to a value).

Moreover, elements in a sorted sets are taken in order (so they are not ordered on request, order is a peculiarity of the data structure used to represent sorted sets). They are ordered according to the following rule:

An element with a higher score ranks higher. When two elements have the same score, they are ordered by comparing their member strings lexicographically. This is the same comparison that zslInsert performs in the source shown later in this article.
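To make the ordering rule concrete, here is a minimal PHP sketch of the comparison (score first, then the member string byte by byte). The function name compareEntries and the array layout are my own assumptions for illustration; Redis does this internally in C, not through any PHP API.

```php
<?php
// Illustrative comparator; the entry layout ['member' => ..., 'score' => ...]
// is an assumption made for this sketch only.

function compareEntries(array $a, array $b): int
{
    if ($a['score'] != $b['score']) {
        return $a['score'] <=> $b['score'];
    }
    // Equal scores: fall back to a byte-wise comparison of the member strings.
    return strcmp($a['member'], $b['member']);
}

$entries = [
    ['member' => 'bob',   'score' => 30.0],
    ['member' => 'carol', 'score' => 10.0],
    ['member' => 'alice', 'score' => 10.0],
];

usort($entries, 'compareEntries');
$ascending  = array_column($entries, 'member'); // the order ZRANGE would use
$descending = array_reverse($ascending);        // the order ZREVRANGE would use
```

For a leaderboard this means that users with the same score are not ranked by who reached the score first; if that matters, the tie-break has to be encoded into the score or the member name.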

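Adding, removing, or updating a member of a Sorted Set is a very fast operation: its time complexity is logarithmic in the number of members in the set. Because members are kept in sorted positions, even accessing members in the middle of the set remains very efficient.

Under the hood, a Sorted Set is implemented with a skip list (paired with a hash table, as the zset struct below shows), so both insertion and deletion are efficient.

In the Redis source code, zset is defined as follows (server.h):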

/* ZSETs use a specialized version of Skiplists */
typedef struct zskiplistNode {
    sds ele;
    double score;
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned int span;
    } level[];
} zskiplistNode;

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;

typedef struct zset {
    dict *dict;
    zskiplist *zsl;
} zset;

The source of the function that inserts a node (t_zset.c):

/* Insert a new node in the skiplist. Assumes the element does not already
 * exist (up to the caller to enforce that). The skiplist takes ownership
 * of the passed SDS string `ele`. */
zskiplistNode *zslInsert(zskiplist *zsl, double score, sds ele) {
    zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
    unsigned int rank[ZSKIPLIST_MAXLEVEL];
    int i, level;

    serverAssert(!isnan(score));
    x = zsl->header;
    for (i = zsl->level-1; i >= 0; i--) {
        /* store rank that is crossed to reach the insert position */
        rank[i] = i == (zsl->level-1) ? 0 : rank[i+1];
        while (x->level[i].forward &&
                (x->level[i].forward->score < score ||
                    (x->level[i].forward->score == score &&
                    sdscmp(x->level[i].forward->ele,ele) < 0)))
        {
            rank[i] += x->level[i].span;
            x = x->level[i].forward;
        }
        update[i] = x;
    }
    /* we assume the element is not already inside, since we allow duplicated
     * scores, reinserting the same element should never happen since the
     * caller of zslInsert() should test in the hash table if the element is
     * already inside or not. */
    level = zslRandomLevel();
    if (level > zsl->level) {
        for (i = zsl->level; i < level; i++) {
            rank[i] = 0;
            update[i] = zsl->header;
            update[i]->level[i].span = zsl->length;
        }
        zsl->level = level;
    }
    x = zslCreateNode(level,score,ele);
    for (i = 0; i < level; i++) {
        x->level[i].forward = update[i]->level[i].forward;
        update[i]->level[i].forward = x;

        /* update span covered by update[i] as x is inserted here */
        x->level[i].span = update[i]->level[i].span - (rank[0] - rank[i]);
        update[i]->level[i].span = (rank[0] - rank[i]) + 1;
    }

    /* increment span for untouched levels */
    for (i = level; i < zsl->level; i++) {
        update[i]->level[i].span++;
    }

    x->backward = (update[0] == zsl->header) ? NULL : update[0];
    if (x->level[0].forward)
        x->level[0].forward->backward = x;
    else
        zsl->tail = x;
    zsl->length++;
    return x;
}
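zslInsert calls zslRandomLevel() to decide how many levels the new node gets, but that function is not shown above. The following PHP model is a sketch of the usual skip-list approach: keep promoting the node with a fixed probability until a coin flip fails or a cap is reached. The promotion probability of 0.25 and the cap of 32 are assumptions about the Redis defaults, and randomLevel is a hypothetical name.

```php
<?php
// Hypothetical PHP model of skip-list level selection; Redis implements
// the real thing in C as zslRandomLevel().
// Assumed defaults: promotion probability 0.25 and a level cap of 32.

function randomLevel(float $p = 0.25, int $maxLevel = 32): int
{
    $level = 1;
    // Each extra level is granted with probability $p, so high levels are rare.
    while ($level < $maxLevel && mt_rand() / mt_getrandmax() < $p) {
        $level++;
    }
    return $level;
}

// Rough check: count how many of 100000 simulated nodes land on each level.
$counts = [];
for ($i = 0; $i < 100000; $i++) {
    $level = randomLevel();
    $counts[$level] = ($counts[$level] ?? 0) + 1;
}
ksort($counts);
print_r($counts);
```

Because each level holds only a fraction of the nodes of the level below it, a search that starts at the top level skips over most nodes, which is where the logarithmic cost of insertion and lookup comes from.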

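Putting it into practice

Our server-side language is PHP, so the code below uses PHP syntax:

1. Recording data

    $rank_zset_key = $rank_prefix_key.date('Y-m-d').':zset';    // key of the current day's leaderboard
    $redis->zIncrBy($rank_zset_key, 1, $user_id);    // increment the user's counter by 1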

2. Daily leaderboard

    $rank_zset_key = $rank_prefix_key.date('Y-m-d').':zset';
    // K_RANK_SIZE is the length of the leaderboard; the stop index is inclusive, hence the - 1
    $values = $redis->zRevRange($rank_zset_key, 0, self::K_RANK_SIZE - 1, true);

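3. Monthly leaderboard

    $sta_days = [
      'prefix_key_2017-12-21:zset',
      'prefix_key_2017-12-22:zset',
      'prefix_key_2017-12-23:zset',
      // ... each entry is a different day's key
    ];
    $lastmonth_store_key = 'monthly_rank_key';
    // Merge the daily leaderboards into one Sorted Set; by default the scores of each member are summed.
    // zUnionStore is the current phpredis name; older releases exposed the same call as zUnion.
    $this->redis->zUnionStore($lastmonth_store_key, $sta_days);
    $values = $this->redis->zRevRange($lastmonth_store_key, 0, self::K_RANK_SIZE - 1, true); // fetch the monthly leaderboard

    // Persist to the DB
    $db_rank = [];
    foreach ($values as $user_id => $score){
       $db_rank[$user_id] = $score;
    }
    // ......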
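Writing out the list of daily keys by hand is error-prone, so it can be generated instead. The sketch below assumes the key layout used in this article (prefix, date, ':zset'); the helper name dailyRankKeys and the 'prefix_key_' prefix are hypothetical.

```php
<?php
// Sketch: build one daily leaderboard key per day of a given month.
// dailyRankKeys() is a hypothetical helper, not part of phpredis.

function dailyRankKeys(string $prefix, string $month): array
{
    $start = new DateTimeImmutable($month . '-01');
    $end   = $start->modify('first day of next month'); // excluded by DatePeriod

    $keys = [];
    foreach (new DatePeriod($start, new DateInterval('P1D'), $end) as $day) {
        $keys[] = $prefix . $day->format('Y-m-d') . ':zset';
    }
    return $keys;
}

$keys = dailyRankKeys('prefix_key_', '2017-12');

// With a connected phpredis client the result can be passed straight on, e.g.:
// $redis->zUnionStore('monthly_rank_key', $keys);
```

Days for which no key exists are harmless here: Redis treats a missing key as an empty set when computing the union.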

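Note, however, that in a Redis Cluster all the keys being unioned must map to the same hash slot; otherwise the command fails with the error: CROSSSLOT Keys in request don't hash to the same slot. See: https://redis.io/commands/cluster-keyslot

For this purpose Redis provides hash tags, written with {}: when a key contains {...}, only the string inside the braces is used to compute the hash slot. For example, the two sets {user}:aaa and {user}:bbb are guaranteed to land in the same slot, so ZUNIONSTORE can compute their union. Keep this in mind whenever historical Redis data has to be merged in a cluster environment.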
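The slot calculation itself is small enough to reproduce. The following PHP sketch follows the rule described in the Redis Cluster specification: CRC16 (XMODEM variant) of the key, or of the hash tag when a non-empty one is present, modulo 16384. The function names crc16 and keySlot are my own; on a live cluster, CLUSTER KEYSLOT gives the authoritative answer.

```php
<?php
// Sketch of the Redis Cluster key-to-slot mapping: CRC16 (XMODEM) mod 16384.
// crc16() and keySlot() are illustrative names, not part of phpredis.

function crc16(string $data): int
{
    $crc = 0;
    $len = strlen($data);
    for ($i = 0; $i < $len; $i++) {
        $crc ^= ord($data[$i]) << 8;
        for ($bit = 0; $bit < 8; $bit++) {
            $crc = ($crc & 0x8000)
                ? (($crc << 1) ^ 0x1021) & 0xFFFF
                : ($crc << 1) & 0xFFFF;
        }
    }
    return $crc;
}

function keySlot(string $key): int
{
    $open = strpos($key, '{');
    if ($open !== false) {
        $close = strpos($key, '}', $open + 1);
        // Only a non-empty tag counts; otherwise the whole key is hashed.
        if ($close !== false && $close > $open + 1) {
            $key = substr($key, $open + 1, $close - $open - 1);
        }
    }
    return crc16($key) % 16384;
}

// Both keys share the tag "user", so they map to the same slot.
var_dump(keySlot('{user}:aaa') === keySlot('{user}:bbb'));
```

For the daily leaderboards this suggests putting the shared part of the key name inside the braces, for example a prefix such as {rank}:2017-12-21:zset, so that every daily key and the destination key of the union end up in one slot. The trade-off is that all of those keys then live on a single node.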

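Other considerations

When designing a Redis-based leaderboard, we also need to consider the following:

Number of leaderboard keys: keep the number of keys under control. Set an explicit expiration time on every key that can have one; if space really does run short, the data can be persisted to the DB instead.
Memory footprint: estimate how much memory the leaderboard data occupies in Redis, so that leaderboard data that is no longer needed does not hold memory for long.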
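The expiration point can be sketched as follows: compute an absolute expiry time for each daily key when it is written, so that old boards disappear on their own. The helper name rankKeyExpireAt and the 35-day retention window are assumptions for illustration; the right window depends on how far back the historical boards need to reach.

```php
<?php
// Sketch only: rankKeyExpireAt() and the 35-day retention window are
// assumptions made for this example.

function rankKeyExpireAt(string $day, int $retainDays): int
{
    // Midnight at the start of $day, plus the day itself and $retainDays more days.
    $expire = (new DateTimeImmutable($day))->modify('+' . ($retainDays + 1) . ' days');
    return $expire->getTimestamp();
}

$expireAt = rankKeyExpireAt('2017-12-25', 35);
echo date('Y-m-d', $expireAt), "\n";

// With a connected phpredis client this would be applied right after the write, e.g.:
// $redis->expireAt($rank_zset_key, rankKeyExpireAt(date('Y-m-d'), 35));
```

A window slightly longer than a month keeps every daily key alive until the monthly union has been computed and persisted.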

References

http://redis.cn/

https://redis.io/commands/cluster-keyslot
