Troubleshooting a Ceph cluster reporting "mds cluster is degraded"
Ceph cluster version:
ceph -v
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
Check the service status with ceph -w:
mds cluster is degraded
monmap e1: 3 mons at {ceph-6-11=172.16.6.11:6789/0,ceph-6-12=172.16.6.12:6789/0,ceph-6-13=172.16.6.13:6789/0}
        election epoch 454, quorum 0,1,2 ceph-6-11,ceph-6-12,ceph-6-13
fsmap e1928: 1/1/1 up {0=ceph-6-13=up:rejoin}, 2 up:standby
osdmap e4107: 90 osds: 90 up, 90 in
        flags sortbitwise,require_jewel_osds
pgmap v24380658: 5120 pgs, 4 pools, 14837 GB data, 5031 kobjects
        44476 GB used, 120 TB / 163 TB avail
        5120 active+clean
Service log:
fault with nothing to send, going to standby
2017-05-08 00:21:32.423571 7fb859159700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2017-05-08 00:21:32.423578 7fb859159700  1 mds.beacon.ceph-6-12 _send skipping beacon, heartbeat map not healthy
2017-05-08 00:21:33.006114 7fb85e264700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2017-05-08 00:21:34.902990 7fb858958700 -1 mds.ceph-6-12 *** got signal Terminated ***
2017-05-08 00:21:36.423632 7fb859159700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2017-05-08 00:21:36.423640 7fb859159700  1 mds.beacon.ceph-6-12 _send skipping beacon, heartbeat map not healthy
2017-05-08 00:21:36.904448 7fb85c260700  1 mds.0.1929 rejoin_joint_start
2017-05-08 00:21:36.906440 7fb85995a700  1 heartbeat_map reset_timeout 'MDSRank' had timed out after 15
2017-05-08 00:21:36.906502 7fb858958700  1 mds.ceph-6-12 suicide. wanted state up:rejoin
2017-05-08 00:21:37.906842 7fb858958700  1 mds.0.1929 shutdown: shutting down rank 0
2017-05-08 01:04:36.411123 7f2886f60180  0 set uid:gid to 167:167 (ceph:ceph)
2017-05-08 01:04:36.411140 7f2886f60180  0 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185), process ceph-mds, pid 1132028
2017-05-08 01:04:36.411734 7f2886f60180  0 pidfile_write: ignore empty --pid-file
2017-05-08 01:04:37.291720 7f2880f40700  1 mds.ceph-6-12 handle_mds_map standby
2017-05-08 01:04:44.618574 7f2880f40700  1 mds.0.1955 handle_mds_map i am now mds.0.1955
2017-05-08 01:04:44.618588 7f2880f40700  1 mds.0.1955 handle_mds_map state change up:boot --> up:replay
2017-05-08 01:04:44.618602 7f2880f40700  1 mds.0.1955 replay_start
2017-05-08 01:04:44.618627 7f2880f40700  1 mds.0.1955 recovery set is
Symptoms:
At this point the directory where CephFS was mounted could still be entered and its contents listed, but no new files could be created.
Troubleshooting and resolution:
Reference issues:
http://tracker.ceph.com/issues/19118
http://tracker.ceph.com/issues/18730
Looking into these issues showed that this is a bug in the newer release. We had recently upgraded the cluster from 10.2.5 to 10.2.7, less than a week before the failure.
Root cause analysis: when CephFS holds a large amount of data, the MDS nodes have to synchronize state and exchange data with each other, and the MDS beacon monitoring uses a default timeout of 15 seconds; if no beacon is received within 15 seconds, the node is kicked out of the cluster. Because the default timeout is short, an MDS under heavy load that responds slowly is treated as failed and removed from the cluster; shortly afterwards the heartbeat finds the node alive again and re-adds it, only for it to be kicked out once more, and the cycle repeats. While this is happening the cluster reports "mds cluster is degraded" and the service log reports "heartbeat_map is_healthy 'MDSRank' had timed out after 15".
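To observe the flapping while it is happening, the MDS map can be watched from any node with an admin keyring. This is only an illustrative sketch; the comment text describes typical output and is not taken from this incident:
ceph mds stat            # shows the current MDS map, e.g. a rank cycling through up:replay / up:rejoin
ceph -w | grep -i mds    # follow the cluster log; repeated MDS failover / rejoin messages indicate the flapping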
Solutions:
Solution 1:
This is an emergency workaround: keep one MDS node running and temporarily stop the MDS service on all the others. With a single MDS working alone there is no heartbeat monitoring between MDS daemons, so the problem is avoided. Once this step is done, follow Solution 2 to fix the issue permanently.
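A minimal sketch of this workaround, assuming the MDS daemons run on ceph-6-11, ceph-6-12 and ceph-6-13 (as in the outputs above), that ceph-6-13 is the one kept running, and that each systemd instance is named after its host as in the restart commands later in this post:
# on the MDS nodes being taken offline temporarily:
systemctl stop ceph-mds@ceph-6-11.service
systemctl stop ceph-mds@ceph-6-12.service
# only the MDS on ceph-6-13 keeps running; with a single MDS there is no
# inter-MDS heartbeat left to time out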
Solution 2: increase the timeout threshold to 300 seconds. The parameter is described below.
Apply this on all MDS nodes (a loop sketch for doing so across hosts is shown at the end of Method 1 below).
mds beacon grace
    Description: how long without a beacon message before the MDS is considered laggy (and possibly replaced).
    Type: Float
    Default: 15
Reference:
http://docs.ceph.org.cn/cephfs/mds-config-ref/
How to change the parameter:
Method 1: write the parameter into the ceph configuration file.
Check the value before the change:
[root@jp33e514-6-10 ~]# ceph --admin-daemon /var/run/ceph/ceph-mds.jp33e514-6-10.asok config show | grep mds | grep beacon_grace
"mds_beacon_grace": "15",
Add the configuration parameter:
[root@jp33e514-6-11 ceph]# more ceph.conf
[global]
... (omitted) ...
public_network = 172.17.6.0/24
cluster_network = 172.17.6.0/24
filestore_xattr_use_omap = true
osd_crush_chooseleaf_type = 1
mds_beacon_grace = 300      <<== the setting added here
mds_cache_size = 2000000
Restart the service:
systemctl restart ceph-mds@jp33e514-6-10.service
systemctl status ceph-mds@jp33e514-6-10.service
Confirm the configuration has taken effect:
ceph --admin-daemon /var/run/ceph/ceph-mds.jp33e514-6-10.asok config show | grep mds | grep beacon_grace
"mds_beacon_grace": "300",
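Since the change has to be repeated on every MDS node, a small loop can save time. This is only a sketch under the assumptions of passwordless SSH, an updated ceph.conf on the admin node, and the example hostnames used above; adjust both to your environment:
for h in jp33e514-6-10 jp33e514-6-11 jp33e514-6-12; do
    scp /etc/ceph/ceph.conf "$h":/etc/ceph/ceph.conf     # distribute the updated configuration
    ssh "$h" "systemctl restart ceph-mds@$h.service"     # restart the MDS on that host
done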
Method 2: change the cluster parameter directly with a command:
Check the current value:
[root@ceph-6-11 ~]# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-6-11.asok config show | grep mds | grep beacon_grace
"mds_beacon_grace": "15",
Change it online with config set:
[root@ceph-6-11 ~]# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-6-11.asok config set mds_beacon_grace 300
{ "success": "mds_beacon_grace = '300' (unchangeable) " }
Verify:
[root@ceph-6-11 ~]# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-6-11.asok config show | grep mds | grep beacon_grace
"mds_beacon_grace": "300",    # <<=== the parameter has been updated
Once the parameter change is in place, all of the MDS nodes that were stopped can be started again. Now, if any one active MDS node in the cluster is stopped, its state is synchronized to the other nodes, another MDS takes over serving requests, and CephFS access is unaffected.
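A quick way to confirm the failover behaviour described above (a hedged sketch; hostnames follow the earlier examples and should be adapted):
systemctl stop ceph-mds@ceph-6-13.service     # stop the currently active MDS
ceph mds stat                                 # a standby should take over and reach up:active
ceph -s                                       # cluster health should recover once the takeover completes
systemctl start ceph-mds@ceph-6-13.service    # bring the stopped daemon back as a standby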