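Ceph PG unfound: a detailed walkthrough of the fix
Introduction: While checking a Ceph cluster today, I found a PG with an unfound (lost) object. This article walks through how I resolved it.
1. Check the cluster status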
[root@k8snode001 ~]# ceph health detail
HEALTH_ERR 1/973013 objects unfound (0.000%); 17 scrub errors; Possible data damage: 1 pg recovery_unfound, 8 pgs inconsistent, 1 pg repair; Degraded data redundancy: 1/2919039 objects degraded (0.000%), 1 pg degraded
OBJECT_UNFOUND 1/973013 objects unfound (0.000%)
    pg 2.2b has 1 unfound objects
OSD_SCRUB_ERRORS 17 scrub errors
PG_DAMAGED Possible data damage: 1 pg recovery_unfound, 8 pgs inconsistent, 1 pg repair
    pg 2.2b is active+recovery_unfound+degraded, acting [14,22,4], 1 unfound
    pg 2.44 is active+clean+inconsistent, acting [14,8,21]
    pg 2.73 is active+clean+inconsistent, acting [25,14,8]
    pg 2.80 is active+clean+scrubbing+deep+inconsistent+repair, acting [4,8,14]
    pg 2.83 is active+clean+inconsistent, acting [14,13,6]
    pg 2.ae is active+clean+inconsistent, acting [14,3,2]
    pg 2.c4 is active+clean+inconsistent, acting [8,21,14]
    pg 2.da is active+clean+inconsistent, acting [23,14,15]
    pg 2.fa is active+clean+inconsistent, acting [14,23,25]
PG_DEGRADED Degraded data redundancy: 1/2919039 objects degraded (0.000%), 1 pg degraded
    pg 2.2b is active+recovery_unfound+degraded, acting [14,22,4], 1 unfound
The output shows the PG with the unfound object: pg 2.2b is active+recovery_unfound+degraded, acting [14,22,4], 1 unfound
Next, look at pg 2.2b in more detail.
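On a cluster with many PGs, it is easier to extract the affected PG IDs from the health report than to read through it. The sketch below is a minimal bash filter. It assumes the plain-text ceph health detail layout shown above, with one "pg <id> is <state>, acting [...]" entry per PG. The function name list_unfound_pgs is hypothetical.

```shell
#!/usr/bin/env bash
# list_unfound_pgs (hypothetical helper): reads `ceph health detail` text on
# stdin and prints each PG whose state contains recovery_unfound, once.
# Assumes the "pg <id> is <state>, acting [...]" layout shown in the article.
list_unfound_pgs() {
  # [^,]* keeps each match inside one PG entry (the state ends at a comma).
  grep -oE 'pg [0-9]+\.[0-9a-f]+ is [^,]*recovery_unfound' |
    awk '{ print $2 }' |
    sort -u   # the same PG appears under both PG_DAMAGED and PG_DEGRADED
}

# Usage on a live cluster:
#   ceph health detail | list_unfound_pgs
```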
[root@k8snode001 ~]# ceph pg dump_json pools |grep 2.2b
dumped all
2.2b 2487 1 1 0 1 9533198403 3048 3048 active+recovery_unfound+degraded 2020-07-23 08:56:07.669903 10373'5448370 10373:7312614 [14,22,4] 14 [14,22,4] 14 10371'5437258 2020-07-23 08:56:06.637012 10371'5437258 2020-07-23 08:56:06.637012 0
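Read the row against the pg dump column order (PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND ...). One of the PG's 2487 objects is missing on the primary, degraded, and unfound. The acting set still lists three OSDs.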
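Reading these counters by position is error-prone, so a small filter can label them. This is a sketch under one assumption: the first six columns follow the order PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND, which matches the row above. Other releases may add or reorder columns, so check the header on your own cluster first. The helper name pg_object_counts is hypothetical.

```shell
#!/usr/bin/env bash
# pg_object_counts (hypothetical helper): labels the leading counters of one
# `ceph pg dump` row. ASSUMPTION: the first six columns are PG_STAT OBJECTS
# MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND, as in the row above.
pg_object_counts() {
  awk '{ printf "pg=%s objects=%s missing_on_primary=%s degraded=%s misplaced=%s unfound=%s\n",
         $1, $2, $3, $4, $5, $6 }'
}

# Usage:
#   ceph pg dump pgs 2>/dev/null | awk '$1 == "2.2b"' | pg_object_counts
```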
2. Check the PG map
[root@k8snode001 ~]# ceph pg map 2.2b
osdmap e10373 pg 2.2b (2.2b) -> up [14,22,4] acting [14,22,4]
The PG map shows that pg 2.2b is placed on OSDs [14,22,4].
3. Check the pool status
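When the next step is to inspect the OSDs that serve the PG, it helps to get the acting set as a plain list. The sketch below is a minimal parser for the ceph pg map line shown above. The helper name pg_acting_osds is hypothetical. For pg 2.2b, the first ID (14) matches the acting primary that the pg dump row reports.

```shell
#!/usr/bin/env bash
# pg_acting_osds (hypothetical helper): prints the acting set from a
# `ceph pg map <pgid>` line as space-separated OSD ids.
pg_acting_osds() {
  # Capture the digits/commas between "acting [" and "]", then split on commas.
  sed -n 's/.*acting \[\([0-9,]*\)].*/\1/p' | tr ',' ' '
}

# Usage:
#   ceph pg map 2.2b | pg_acting_osds
```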
[root@k8snode001 ~]# ceph osd pool stats k8s-1
pool k8s-1 id 2
  1/1955664 objects degraded (0.000%)
  1/651888 objects unfound (0.000%)
  client io 271 KiB/s wr, 0 op/s rd, 52 op/s wr

[root@k8snode001 ~]# ceph osd pool ls detail|grep k8s-1
pool 2 'k8s-1' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 88 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
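The pool is replicated with size 3 and min_size 1. With min_size 1, a PG keeps serving I/O even when only one replica is available. Writes made in that state exist on a single OSD until recovery catches up, which is one way an object can end up unfound.

The sketch below pulls the two values out of a ceph osd pool ls detail line. It assumes the "size N min_size M" tokens shown above. The helper name pool_replication is hypothetical.

```shell
#!/usr/bin/env bash
# pool_replication (hypothetical helper): extracts size and min_size from one
# `ceph osd pool ls detail` line by scanning for the keyword tokens.
pool_replication() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "size")     size  = $(i + 1)
      if ($i == "min_size") minsz = $(i + 1)
    }
    printf "size=%s min_size=%s\n", size, minsz
  }'
}

# Usage:
#   ceph osd pool ls detail | grep "'k8s-1'" | pool_replication
```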
4. Try to recover the lost object in pg 2.2b
[root@k8snode001 ~]# ceph pg repair 2.2b
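ceph pg repair is the command Ceph documents for PGs that scrub has flagged as inconsistent. Besides 2.2b, the health report above lists eight such PGs: 2.44, 2.73, 2.80, 2.83, 2.ae, 2.c4, 2.da and 2.fa.

The sketch below turns that list into repair commands. It only prints them, so they can be reviewed before anything runs. It assumes the same health detail layout as earlier. The helper name repair_commands is hypothetical.

```shell
#!/usr/bin/env bash
# repair_commands (hypothetical helper): reads `ceph health detail` text on
# stdin and prints one `ceph pg repair` command per inconsistent PG.
# Nothing is executed; the output is meant to be reviewed first.
repair_commands() {
  grep -oE 'pg [0-9]+\.[0-9a-f]+ is [^,]*inconsistent' |
    awk '{ print "ceph pg repair " $2 }' |
    sort -u
}

# Usage:
#   ceph health detail | repair_commands
```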
If the repair keeps failing, query the stuck PG for details and focus on recovery_state:
[root@k8snode001 ~]# ceph pg 2.2b query
{
    ......
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "enter_time": "2020-07-21 14:17:05.855923",
            "might_have_unfound": [],
            "recovery_progress": {
                "backfill_targets": [],
                "waiting_on_backfill": [],
                "last_backfill_started": "MIN",
                "backfill_info": {
                    "begin": "MIN",
                    "end": "MIN",
                    "objects": []
                },
                "peer_backfill_info": [],
                "backfills_in_flight": [],
                "recovering": [],
                "pg_backend": {
                    "pull_from_peer": [],
                    "pushing": []
                }
            },
            "scrub": {
                "scrubber.epoch_start": "10370",
                "scrubber.active": false,
                "scrubber.state": "INACTIVE",
                "scrubber.start": "MIN",
                "scrubber.end": "MIN",
                "scrubber.max_end": "MIN",
                "scrubber.subset_last_update": "0'0",
                "scrubber.deep": false,
                "scrubber.waiting_on_whom": []
            }
        },
        {
            "name": "Started",
            "enter_time": "2020-07-21 14:17:04.814061"
        }
    ],
    "agent_state": {}
}
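Within recovery_state, the field to read first is might_have_unfound. Per the Ceph troubleshooting documentation, it lists OSDs that might still hold the unfound object, each with a status such as "already probed" or "querying". Here it is empty.

The sketch below extracts that array from ceph pg <pgid> query output without needing jq. It assumes the array holds only flat objects, so the first "]" closes it. The helper name might_have_unfound is hypothetical.

```shell
#!/usr/bin/env bash
# might_have_unfound (hypothetical helper): reads `ceph pg <pgid> query` JSON
# on stdin and prints every "might_have_unfound" array it contains.
# ASSUMPTION: the array holds flat objects only (no nested brackets).
might_have_unfound() {
  tr -d '\n' |   # join pretty-printed JSON into a single line
    tr -s ' ' |  # squeeze the indentation down to single spaces
    grep -oE '"might_have_unfound": ?\[[^]]*]'
}

# Usage:
#   ceph pg 2.2b query | might_have_unfound
```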
If repair cannot fix it, there are two options: revert the unfound object to an older version, or delete it outright.
5. Solutions
Revert to an older version:
[root@k8snode001 ~]# ceph pg 2.2b mark_unfound_lost revert

Delete the object outright:
[root@k8snode001 ~]# ceph pg 2.2b mark_unfound_lost delete
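The two modes behave differently. Per the Ceph documentation:
- revert rolls the object back to a previous version, or forgets it entirely if it was a new object.
- delete forgets the object altogether.
- revert is not available for erasure-coded pools. This pool is replicated, so both modes apply.

Either way, the missing data is given up, so a typo in the PG ID is costly. The sketch below is a hypothetical guard that validates the arguments and prints the command instead of executing it. The name mark_lost_cmd and the PG ID pattern are assumptions.

```shell
#!/usr/bin/env bash
# mark_lost_cmd (hypothetical helper): validates a PG id and mode, then
# prints the matching mark_unfound_lost command instead of running it.
mark_lost_cmd() {
  local pgid=$1 mode=$2
  case $mode in
    revert|delete) ;;
    *) echo "mode must be revert or delete" >&2; return 1 ;;
  esac
  # ASSUMPTION: PG ids look like <pool number>.<hex seed>, e.g. 2.2b
  if ! [[ $pgid =~ ^[0-9]+\.[0-9a-f]+$ ]]; then
    echo "invalid pg id: $pgid" >&2
    return 1
  fi
  echo "ceph pg $pgid mark_unfound_lost $mode"
}

# Usage:
#   mark_lost_cmd 2.2b revert   # review the printed line, then run it by hand
```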
6. Verification
I chose to delete the object. The cluster then recovered the PG, and after a short wait its state changed to active+clean:
[root@k8snode001 ~]# ceph pg 2.2b query
{
    "state": "active+clean",
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "epoch": 11069,
    "up": [
        12,
        22,
        4
    ],
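Instead of re-running the query by hand, the wait can be scripted. The sketch below assumes that the first "state" key in the query output is the PG's current state, as in the output above. It treats only an exact active+clean as done. The helper name pg_is_clean is hypothetical.

```shell
#!/usr/bin/env bash
# pg_is_clean (hypothetical helper): succeeds when the first "state" key in
# `ceph pg <pgid> query` output is exactly "active+clean".
pg_is_clean() {
  # Keys like "scrubber.state" or "agent_state" do not match '"state":'.
  grep -oE '"state": ?"[^"]*"' | head -n 1 | grep -q '"active+clean"$'
}

# Usage: poll every 10 seconds until the PG is clean
#   until ceph pg 2.2b query | pg_is_clean; do sleep 10; done
```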
Check the cluster status again:
[root@k8snode001 ~]# ceph health detail
HEALTH_OK
Source: ITPUB Blog, http://blog.itpub.net/69955379/viewspace-2757372/. Please credit the source when reposting; unauthorized reposting may result in legal action.