Fixing node OSD services that fail to come back after a Ceph cluster restart, as seen with ceph osd status and ceph -s

Posted by 一往無前,未來可期 on 2020-11-06

Preface

  • In a lab environment running a Ceph cluster (integrated with a multi-node OpenStack deployment), the OSD services ran into problems after the cluster was restarted; the fix is documented below.
  • For the Ceph + OpenStack multi-node deployment itself, see my other blog post: https://blog.csdn.net/CN_TangZheng/article/details/104745364

1: The error

[root@ct ~]# ceph -s	'//check the Ceph cluster status'
  cluster:
    id:     8c9d2d27-492b-48a4-beb6-7de453cf45d6
    health: HEALTH_WARN	'//health check reports WARN'
            1 osds down	
            1 host (1 osds) down	
            Reduced data availability: 192 pgs inactive
            Degraded data redundancy: 812/1218 objects degraded (66.667%), 116 pgs degraded, 192 pgs undersized
            clock skew detected on mon.c1, mon.c2
 
  services:
    mon: 3 daemons, quorum ct,c1,c2
    mgr: ct(active), standbys: c1, c2
    osd: 3 osds: 1 up, 2 in	'//two of the three OSDs are down'
 
  data:
    pools:   3 pools, 192 pgs
    objects: 406  objects, 1.8 GiB
    usage:   2.8 GiB used, 1021 GiB / 1024 GiB avail
    pgs:     100.000% pgs not active
             812/1218 objects degraded (66.667%)
             116 undersized+degraded+peered
             76  undersized+peered
 
[root@ct ~]# ceph osd status	'//check the OSD status: the OSDs on the two compute nodes are not healthy'
+----+------+-------+-------+--------+---------+--------+---------+----------------+
| id | host |  used | avail | wr ops | wr data | rd ops | rd data |     state      |
+----+------+-------+-------+--------+---------+--------+---------+----------------+
| 0  |  ct  | 2837M | 1021G |    0   |     0   |    0   |     0   |   exists,up    |
| 1  |      |    0  |    0  |    0   |     0   |    0   |     0   |     exists     |
| 2  |      |    0  |    0  |    0   |     0   |    0   |     0   | autoout,exists |
+----+------+-------+-------+--------+---------+--------+---------+----------------+
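  • Before touching OpenStack it is worth confirming on the nodes themselves whether the OSD daemons are actually running. A minimal diagnostic sketch (the OSD ID 1 and host c1 match this cluster, as the final output below confirms; adjust them to your own layout):
[root@ct ~]# ceph osd tree	'//shows which OSD belongs to which host and whether it is up or down'
[root@c1 ceph]# systemctl status ceph-osd@1	'//state of the osd.1 daemon on compute node c1'
[root@c1 ceph]# journalctl -u ceph-osd@1 -n 50	'//last 50 log lines in case the daemon failed to start'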

1.1: The fix

  • Neutron's Open vSwitch service turned out to be down
[root@ct ~]# source keystonerc_admin 
[root@ct ~(keystone_admin)]# openstack network agent list	'//after some digging, the Open vSwitch and L3 agents turned out to be down'
+--------------------------------------+----------------------+------+-------------------+-------+-------+---------------------------+
| ID                                   | Agent Type           | Host | Availability Zone | Alive | State | Binary                    |
+--------------------------------------+----------------------+------+-------------------+-------+-------+---------------------------+
| 12dd5b51-1344-4c29-8974-e5d8e0e65d2e | Open vSwitch agent   | c1   | None              | XXX   | UP    | neutron-openvswitch-agent |
| 20829a10-4a26-4317-8175-4534ac0b01e1 | Open vSwitch agent   | c2   | None              | XXX   | UP    | neutron-openvswitch-agent |
| 25c121ec-b761-4e7b-bfbf-9601993ebb54 | Metadata agent       | ct   | None              | :-)   | UP    | neutron-metadata-agent    |
| 47c878ee-93f0-4960-baa1-1cc92476ed2a | DHCP agent           | ct   | nova              | :-)   | UP    | neutron-dhcp-agent        |
| 57647383-7106-46b6-971f-2398457e5179 | Loadbalancerv2 agent | ct   | None              | :-)   | UP    | neutron-lbaasv2-agent     |
| 92d49052-0b4f-467c-a92c-1743d891043f | Metering agent       | ct   | None              | :-)   | UP    | neutron-metering-agent    |
| c2f7791c-96ed-472b-abda-509a3ff125b5 | L3 agent             | ct   | nova              | XXX   | UP    | neutron-l3-agent          |
| e48269d8-e4f1-424b-bc3e-4c0d13757e8a | Open vSwitch agent   | ct   | None              | :-)   | UP    | neutron-openvswitch-agent |
+--------------------------------------+----------------------+------+-------------------+-------+-------+---------------------------+
  • Restart the L3 agent on the controller node
[root@ct ~(keystone_admin)]# systemctl start neutron-l3-agent
  • Restart the Open vSwitch agent on the compute nodes
[root@c1 ceph]# systemctl restart neutron-openvswitch-agent
[root@c2 ceph]# systemctl restart neutron-openvswitch-agent
  • After the restarts, run openstack network agent list again and confirm that every agent is alive; a filtered check is sketched below
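  • A quick way to recheck just the agents that were down (assuming the --agent-type filter of the standard python-openstackclient is available; the Alive column should now show :-) instead of XXX):
[root@ct ~(keystone_admin)]# openstack network agent list --agent-type open-vswitch	'//only the Open vSwitch agents'
[root@ct ~(keystone_admin)]# openstack network agent list --agent-type l3	'//only the L3 agent'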
  • Then log in to the compute nodes and restart the OSDs
[root@c1 ceph]# systemctl restart  ceph-osd.target
[root@c2 ceph]# systemctl restart  ceph-osd.target
[root@c1 ceph]# systemctl restart  ceph-mgr.target
[root@c2 ceph]# systemctl restart  ceph-mgr.target
'//after restarting the OSD services, check the cluster state again with ceph -s; if the mgr service on a compute node is not running, restart it as well'
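  • If an OSD stays marked autoout after its daemon is back (as osd.2 did in the status table above), it can be marked in again, and enabling the systemd targets keeps the daemons from being left stopped after the next reboot. A hedged sketch, to be adapted to your own OSD IDs:
[root@ct ~]# ceph osd in osd.2	'//undo the automatic out-marking once the daemon is running again'
[root@c1 ceph]# systemctl enable ceph-osd.target ceph-mgr.target	'//make sure the daemons are started at boot'
[root@c2 ceph]# systemctl enable ceph-osd.target ceph-mgr.target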

1.2: Checking again, the problem is resolved

[root@ct ~(keystone_admin)]# ceph -s
  cluster:
    id:     8c9d2d27-492b-48a4-beb6-7de453cf45d6
    health: HEALTH_OK	'//health check is OK'
 
  services:	'//all of the services below are back to normal'
    mon: 3 daemons, quorum ct,c1,c2
    mgr: ct(active), standbys: c2, c1
    osd: 3 osds: 3 up, 3 in
 
  data:
    pools:   3 pools, 192 pgs
    objects: 406  objects, 1.8 GiB
    usage:   8.3 GiB used, 3.0 TiB / 3.0 TiB avail
    pgs:     192 active+clean
 
  io:
    client:   1.5 KiB/s rd, 1 op/s rd, 0 op/s wr
[root@ct ~(keystone_admin)]# ceph osd status	'//the OSD states are all fine as well'
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| id | host |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| 0  |  ct  | 2837M | 1021G |    0   |     0   |    0   |     0   | exists,up |
| 1  |  c1  | 2837M | 1021G |    0   |     0   |    0   |     0   | exists,up |
| 2  |  c2  | 2837M | 1021G |    0   |     0   |    0   |     0   | exists,up |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
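  • The first ceph -s output also warned about clock skew on mon.c1 and mon.c2. That warning is unrelated to the OSD problem, but if it comes back it is usually a time-sync issue; assuming the nodes run chronyd, something like the following can be checked on c1 and c2:
[root@c1 ceph]# chronyc sources -v	'//verify the node is syncing against a time source'
[root@c1 ceph]# systemctl restart chronyd	'//restart time synchronisation if the offset persists'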


