A 48x performance boost! Optimizing bulk data writes to Redis with Python

Posted by 花菜 on 2020-09-14

Background

  • While recently testing with large data sets, I needed to write a large volume of data into Redis

1. The original version: calling hset directly, which is very inefficient

Writing 300,000 records this way took 365 seconds. There are two problems with this approach:

  • When writing multiple fields to the same key, hmset should be used instead of hset
  • A pipeline can also be used to avoid a round trip to the Redis server for every single command, which cuts network I/O dramatically


import datetime
import json
import time

import redis


def get_conn():
    r = redis.Redis(host='localhost', port=6379, decode_responses=True)
    return r


def test_set_redis():
    conn = get_conn()
    machineId = 43696000000000
    device_no = 88800000
    work_in = time.time()
    source = "1"
    factory_no = "factory"
    today = datetime.date.today()
    oneday = datetime.timedelta(days=1)
    tomorrow = str(today + oneday).replace("-", "")
    afterTomorrow = str(today + oneday + oneday).replace("-", "")
    todayZero = int(time.mktime(today.timetuple()))
    today = str(today).replace("-", "")
    for i in range(300000):
        upAxisId = "uxi" + str(device_no)
        axisVarietyId = "axi" + str(device_no)
        varietyId = "vi" + str(device_no)
        axisNum = "axn" + str(device_no)
        # Placeholder payloads: the exact contents of axisInfo and the
        # fbfcalue* dicts are not shown in the original post.
        axisInfo = {"upAxisId": upAxisId, "axisVarietyId": axisVarietyId,
                    "varietyId": varietyId, "axisNum": axisNum}
        fbfcalue = fbfcalue2 = fbfcalue3 = {"workIn": work_in, "todayZero": todayZero}
        try:
            conn.hset('mykey_prefix' + str(device_no), "machineId", str(machineId))
            conn.hset('mykey_prefix' + str(device_no), "machineNum", str(machineId))
            conn.hset('mykey_prefix' + str(device_no), "factoryId", factory_no)
            conn.hset('mykey_prefix' + str(device_no), "groupId", "group_id")
            conn.hset('mykey_prefix' + str(device_no), "groupName", "groupName11")
            conn.hset('mykey_prefix' + str(device_no), "workshopId", "workshopId11")
            conn.hset('mykey_prefix' + str(device_no), "workshopName", "workshopName11")
            conn.hset('mykey_prefix' + str(device_no), "source", source)
            conn.hset('mykey_prefix' + str(device_no), "errorTimeLimit", str(20))
            conn.expire('mykey_prefix' + str(device_no), 864000)  # 10-day TTL
            conn.hset('mykey_prefix' + str(device_no), "axisInfo", json.dumps(axisInfo))
            conn.hset('mykey_another_prefix:' + today, str(machineId), json.dumps(fbfcalue))
            conn.hset('mykey_another_prefix:' + tomorrow, str(machineId), json.dumps(fbfcalue2))
            conn.hset('mykey_another_prefix:' + afterTomorrow, str(machineId), json.dumps(fbfcalue3))
            conn.hset('mykey_another_prefix1:' + today, str(machineId), json.dumps(fbfcalue))
            conn.hset('mykey_another_prefix1:' + tomorrow, str(machineId), json.dumps(fbfcalue2))
            conn.hset('mykey_another_prefix1:' + afterTomorrow, str(machineId), json.dumps(fbfcalue3))

            conn.expire('mykey_another_prefix:' + today, 259200)  # 3-day TTL
            conn.expire('mykey_another_prefix:' + tomorrow, 259200)
            conn.expire('mykey_another_prefix:' + afterTomorrow, 259200)
            conn.expire('mykey_another_prefix1:' + today, 259200)
            conn.expire('mykey_another_prefix1:' + tomorrow, 259200)
            conn.expire('mykey_another_prefix1:' + afterTomorrow, 259200)

            conn.hset('fy:be:de:ma', str(device_no), str(machineId))
            conn.expire('fy:be:de:ma', 864000)
            machineId = int(machineId) + 1
            device_no = int(device_no) + 1
        except Exception as e:
            print("hset failed, error:", e)

2. Use a pipeline instead of sending a request for every single key

The change is very simple and only needs two small modifications, sketched below.
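A minimal sketch of the pipelined version, assuming the same keys and loop structure as section 1 (only a few representative fields are shown): commands accumulate in a client-side buffer, and execute() sends everything buffered in a single round trip.

def test_set_redis_pipeline():
    conn = get_conn()
    pipe = conn.pipeline()  # commands are buffered client-side
    machineId = 43696000000000
    device_no = 88800000
    for i in range(300000):
        key = 'mykey_prefix' + str(device_no)
        pipe.hset(key, "machineId", str(machineId))
        pipe.hset(key, "factoryId", "factory")
        pipe.hset(key, "source", "1")
        pipe.expire(key, 864000)  # 10-day TTL
        if i % 1000 == 0:
            pipe.execute()  # one round trip flushes everything buffered so far
        machineId += 1
        device_no += 1
    pipe.execute()  # flush the remainder

Flushing every 1,000 iterations keeps the client-side buffer bounded; at this data size, a single execute() at the very end would also work.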

The pipeline made a dramatic difference: the run time dropped from 365 seconds to 126 seconds, shaving off 239 seconds, nearly 4 minutes, in one go!


3. Use pipeline + hmset

Assemble each key's fields and values into a dict, then write them all at once with hmset, as in the sketch below.

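A minimal sketch under the same assumptions as above: all the fields for one key go into a single dict and are written with one hmset call, still inside the pipeline.

def test_set_redis_pipeline_hmset():
    conn = get_conn()
    pipe = conn.pipeline()
    machineId = 43696000000000
    device_no = 88800000
    for i in range(300000):
        key = 'mykey_prefix' + str(device_no)
        fields = {
            "machineId": str(machineId),
            "machineNum": str(machineId),
            "factoryId": "factory",
            "source": "1",
        }
        pipe.hmset(key, fields)  # one command for all fields of this key
        pipe.expire(key, 864000)
        if i % 1000 == 0:
            pipe.execute()
        machineId += 1
        device_no += 1
    pipe.execute()

Note that redis-py 3.5+ deprecates hmset; hset(key, mapping=fields) is the equivalent modern call.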

With hmset the time was compressed again, from 126 seconds down to 98, saving another 28 seconds, nearly half a minute.


To compress the time even further, I reimplemented the whole thing in Go, and the performance is impressive.

From Python's 98 seconds down to 7.5 seconds, a full 13x improvement, and 48x faster than the original 365 seconds!


func setDevice() {
	// rdb, ctx1, devices, deviceInfoKey, fystTodayKey, fymstTodayKey,
	// shiftToday and countFail are defined elsewhere in the program.
	var deviceNo string
	var deviceInfo map[string]interface{}
	// Get a Redis pipeline
	pipe := rdb.Pipeline()
	// Flush whatever is still buffered when the function returns
	defer pipe.Exec(ctx1)

	for i := 0; i < len(devices); i++ {
		device := devices[i]
		// Each device is a single-entry map: deviceNo -> deviceInfo
		for k, v := range device {
			deviceNo = k
			deviceInfo = v
		}

		deviceKey := fmt.Sprintf("%s:%s", deviceInfoKey, deviceNo)

		machineId := deviceInfo["machineId"].(string)
		// Set the shift-schedule info
		shiftInfo, _ := json.Marshal(shiftToday)
		pipe.HSetNX(ctx1, fystTodayKey, machineId, shiftInfo)
		pipe.Expire(ctx1, fystTodayKey, time.Hour*24)
		pipe.HSetNX(ctx1, fymstTodayKey, machineId, shiftInfo)
		pipe.Expire(ctx1, fymstTodayKey, time.Hour*24)

		// HMSet instead of HSet: write the whole map in one command
		pipe.HMSet(ctx1, deviceKey, deviceInfo)
		pipe.Expire(ctx1, deviceKey, time.Hour*72)
		// Flush the pipeline every 1000 devices
		if i%1000 == 0 && i >= 1000 {
			failCmd, err1 := pipe.Exec(ctx1)
			log.Printf("setting collector %d\n", i)
			if err1 != nil {
				// On error, count every command in this batch as failed
				countFail += len(failCmd)
			}
		}
	}
}

4. Summary

  • For bulk writes, using a pipeline improves performance dramatically
  • For fields and values under the same key, hmset instead of hset also gives a solid boost
  • When operating on large volumes of data, replacing Python with Go is a great choice
