Ceph Pitfall Diary: rgw_dynamic_resharding

Posted by Luxf0 on 2020-10-30

Original URL: https://www.cnblogs.com/luxf0/p/13900440.html

1. Background

References:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/object_gateway_guide_for_ubuntu/administration_cli#configuring-bucket-index-sharding
https://ceph.com/community/new-luminous-rgw-dynamic-bucket-sharding/

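1.1 Problem description

During performance testing, throughput collapsed intermittently, and for stretches of time writes could not complete at all (latency exceeded 100 seconds).

1.2 Troubleshooting

The RGW log showed that too many objects had been written to a single bucket, which triggered automatic resharding of the bucket index.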

[root@node113 ~]# cat /var/log/ceph/ceph-client.rgw.node113.7480.log | grep reshard
2020-09-16 04:51:50.239505 7fe71d0a7700  0 RGWReshardLock::lock failed to acquire lock on reshard.0000000009 ret=-16
2020-09-16 06:11:56.304955 7fe71d0a7700  0 RGWReshardLock::lock failed to acquire lock on reshard.0000000013 ret=-16
2020-09-16 06:41:58.919390 7fe71d0a7700  0 RGWReshardLock::lock failed to acquire lock on reshard.0000000004 ret=-16
2020-09-16 08:02:00.619906 7fe71d0a7700  0 RGWReshardLock::lock failed to acquire lock on reshard.0000000002 ret=-16
2020-09-16 08:22:01.038502 7fe71d0a7700  0 RGWReshardLock::lock failed to acquire lock on reshard.0000000012 ret=-16
2020-09-16 08:31:58.229956 7fe71d0a7700  0 RGWReshardLock::lock failed to acquire lock on reshard.0000000000 ret=-16
2020-09-16 08:52:06.020018 7fe71d0a7700  0 RGWReshardLock::lock failed to acquire lock on reshard.0000000006 ret=-16
2020-09-16 09:22:12.882771 7fe71d0a7700  0 RGWReshardLock::lock failed to acquire lock on reshard.0000000000 ret=-16
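To see which reshard log objects are contended most often, the grep above can be extended into a small tally. This is a minimal sketch; the function name `count_reshard_lock_failures` is made up for illustration, and it assumes the log line layout shown above. On Linux, `ret=-16` corresponds to `-EBUSY`, which suggests the lock was already held by another reshard worker at that moment.

```shell
# count_reshard_lock_failures LOGFILE
# Prints "<count> <reshard log object>" lines, most frequent first.
count_reshard_lock_failures() {
    grep 'RGWReshardLock::lock failed' "$1" \
        | awk '{
              # pick the field that looks like reshard.0000000009
              for (i = 1; i <= NF; i++)
                  if ($i ~ /^reshard\.[0-9]+$/) n[$i]++
          }
          END { for (k in n) print n[k], k }' \
        | sort -k1,1nr -k2,2
}

# Usage (path taken from the transcript above):
# count_reshard_lock_failures /var/log/ceph/ceph-client.rgw.node113.7480.log
```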

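Checking the RGW configuration: each index shard holds at most 100,000 objects, and each bucket is currently created with 8 shards. Once more than 800,000 objects (8 x 100,000) are written to a bucket, automatic resharding is triggered.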
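The threshold arithmetic can be written down as a one-line helper. This is only a sketch of the multiplication described above; `reshard_threshold` is a hypothetical name, and the 100,000 default mirrors the `rgw_max_objs_per_shard` value reported by this cluster.

```shell
# reshard_threshold NUM_SHARDS [MAX_OBJS_PER_SHARD]
# Object count above which dynamic resharding is expected to be triggered.
reshard_threshold() {
    num_shards=$1
    max_objs_per_shard=${2:-100000}   # rgw_max_objs_per_shard on this cluster
    echo $(( num_shards * max_objs_per_shard ))
}

reshard_threshold 8   # prints 800000
```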

[root@node111 ~]# ceph --show-config | grep rgw_dynamic_resharding
rgw_dynamic_resharding = true
[root@node111 ~]# ceph --show-config | grep rgw_max_objs_per_shard
rgw_max_objs_per_shard = 100000
[root@node111 ~]# ceph --show-config | grep rgw_override_bucket_index_max_shards
rgw_override_bucket_index_max_shards = 8

[root@node111 ~]# radosgw-admin bucket limit check
"user_id": "lifecycle01",
        "buckets": [
            {
                "bucket": "cosbench-test-pool11",
                "tenant": "",
                "num_objects": 31389791,
                "num_shards": 370,
                "objects_per_shard": 84837,
                "fill_status": "OK"
            },
            {
                "bucket": "cycle-1",
                "tenant": "",
                "num_objects": 999,
                "num_shards": 8,
                "objects_per_shard": 124,
                "fill_status": "OK"
            },
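The `objects_per_shard` figures in this output can be reproduced by integer division of `num_objects` by `num_shards`. The sketch below does that and also prints the ratio to the per-shard limit as a percentage. The helper name `shard_fill` is invented for illustration, and the percentage is an assumption about how fill level relates to `rgw_max_objs_per_shard`, not a statement of how radosgw-admin computes `fill_status` internally.

```shell
# shard_fill NUM_OBJECTS NUM_SHARDS [MAX_OBJS_PER_SHARD]
# Prints "<objects_per_shard> <percent of per-shard limit>".
shard_fill() {
    awk -v o="$1" -v s="$2" -v m="${3:-100000}" 'BEGIN {
        per = int(o / s)                       # integer objects per shard
        printf "%d %.1f%%\n", per, 100 * per / m
    }'
}

shard_fill 31389791 370   # cosbench-test-pool11: prints "84837 84.8%"
shard_fill 999 8          # cycle-1: prints "124 0.1%"
```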

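Parameter description

rgw_dynamic_resharding
[A new parameter officially introduced in the Luminous release](https://ceph.com/community/new-luminous-rgw-dynamic-bucket-sharding/). It is enabled by default. When a bucket's fill_status reaches OVER 100.000000% (that is, objects_per_shard > rgw_max_objs_per_shard), the bucket is resharded dynamically: new shards are created and the index entries are rebalanced across them.

This parameter has a fatal flaw: while resharding is in progress, the bucket cannot be read from or written to, because the index metadata is being redistributed and consistency must be guaranteed. The more data the bucket holds, the longer the resharding takes.

rgw_override_bucket_index_max_shards
The number of index shards created for each bucket. The default is 0 and the maximum is 7877 (so a single bucket can hold at most 7877 x 100,000 objects).
The shard count is calculated as number of objects expected in a bucket / 100,000. If a bucket is expected to hold 3,000,000 objects, set the shard count to 30 (3,000,000 / 100,000).
The example cluster uses 8, so every new bucket is created with 8 shards. When the number of objects per shard exceeds 100,000, resharding creates additional shards and rebalances the index across all of them.

rgw_max_objs_per_shard
The maximum number of objects per shard. The default is 100,000.

rgw_reshard_thread_interval
The interval at which the automatic resharding thread scans for work. The default is ten minutes.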
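The sizing rule for rgw_override_bucket_index_max_shards (expected objects / 100,000) can be sketched as a small helper. Rounding up is a choice made here so that counts that are not an exact multiple of 100,000 still fit; the source only gives the plain division. `shards_for` is a hypothetical name.

```shell
# shards_for EXPECTED_OBJECTS [MAX_OBJS_PER_SHARD]
# Suggested shard count, rounded up to the next whole shard.
shards_for() {
    expected=$1
    per_shard=${2:-100000}   # default of rgw_max_objs_per_shard
    # integer ceiling: (a + b - 1) / b
    echo $(( (expected + per_shard - 1) / per_shard ))
}

shards_for 3000000   # prints 30
```

The result should stay at or below the stated maximum of 7877.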

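1.3 About sharding

Index objects

RGW maintains an index for each bucket that holds the metadata of every object in the bucket. RGW has no sufficiently efficient way to enumerate objects on its own, so the bucket index is involved in writing, modifying, and listing objects (reading an object is not affected).
The bucket index has other uses as well, such as maintaining the log for versioned objects, bucket quota metadata, and the log for multi-zone synchronization.

By default each bucket has only one index object. An oversized index object causes the following problems, which is why the number of objects a bucket can store is limited:
-- Reliability: in extreme cases, slow data recovery can cause the OSD process to go down
-- Performance: every write to the same bucket has to modify the single index object, so writes are serialized on it

Bucket sharding

Since the Hammer release, bucket index sharding has been available to address the problem of storing large amounts of data in a single bucket. The bucket's index data can be spread across multiple RADOS objects, and the number of objects a bucket can hold grows with the number of index shards.
However, this only takes effect for newly created buckets, so the shard count has to be planned in advance according to how much data the bucket will eventually hold. When the objects written to a bucket exceed what its shards can carry, write performance collapses, and the shard count then has to be changed manually to accommodate more objects.

Dynamic bucket sharding

Since the Luminous release, dynamic bucket sharding has been available. As the number of stored objects grows, the radosgw process automatically detects buckets that need resharding and schedules them to be resharded.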
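For the manual change, `radosgw-admin bucket reshard` takes the bucket name and the target shard count. The sketch below only prints the command so it can be reviewed before being run on a live cluster; `manual_reshard_cmd` is a hypothetical helper, and the target of 400 shards is an arbitrary illustrative value, not a recommendation from the source.

```shell
# manual_reshard_cmd BUCKET NUM_SHARDS
# Prints (does not execute) the manual reshard command for review.
manual_reshard_cmd() {
    printf 'radosgw-admin bucket reshard --bucket=%s --num-shards=%s\n' "$1" "$2"
}

manual_reshard_cmd cosbench-test-pool11 400
# To actually run it after review:
#   eval "$(manual_reshard_cmd cosbench-test-pool11 400)"
# Progress can then be checked with:
#   radosgw-admin reshard status --bucket=cosbench-test-pool11
```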

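2. Solutions

There are two cases.

2.1 The final number of objects per bucket is known

Disable dynamic resharding (to avoid a performance collapse while shards are being adjusted) and set the shard count according to the final number of objects per bucket.

In this example the final number of objects per bucket is 3,000,000, so the shard count is set to 30.
Append the parameters to the [global] section of /etc/ceph/ceph.conf: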
[root@node45 ~]# cat /etc/ceph/ceph.conf 
[global]
rgw_dynamic_resharding = false
rgw_override_bucket_index_max_shards = 30

Restart the RGW service:
[root@node45 ~]# systemctl restart ceph-radosgw.target

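2.2 The final number of objects per bucket is unknown

Leave dynamic resharding enabled and set an approximate shard count.

Append the parameter to the [global] section of /etc/ceph/ceph.conf: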
[root@node45 ~]# cat /etc/ceph/ceph.conf 
[global]
rgw_override_bucket_index_max_shards = 8

Restart the RGW service:
[root@node45 ~]# systemctl restart ceph-radosgw.target
