Deploying a Ceph cluster offline with ceph-deploy, plus an error-troubleshooting FAQ

Published by vfanCloud on 2022-02-28

Deploying the Ceph cluster with ceph-deploy

Environment

Hostname    IP address     OS         Roles                                 Notes
ceph-node1  10.153.204.13  CentOS7.6  mon, osd, mds, mgr, rgw, ceph-deploy  chronyd time sync (server)
ceph-node2  10.130.22.45   CentOS7.6  mon, osd, mds, mgr, rgw               chronyd time sync
ceph-node3  10.153.204.28  CentOS7.3  mon, osd                              chronyd time sync

This environment consists of three machines. NTP must be synchronized before starting; node1 is the ceph-deploy admin node, and each machine provides four partitions to be used as OSD disks.

Ceph components

Name  Role
osd   Short for Object Storage Device; it stores, replicates, rebalances, and recovers data. OSDs heartbeat one another and report state changes to the Ceph Monitor.
mon   Short for Monitor; it watches over the Ceph cluster, maintains its health state, and keeps the cluster's maps (OSD Map, Monitor Map, PG Map, CRUSH Map), collectively called the Cluster Map; from these maps and an object id, the final storage location of data is computed.
mgr   Short for Manager; it tracks runtime metrics and the current state of the cluster, including storage utilization, performance metrics, and system load.
mds   Short for MetaData Server; it stores the metadata of the file-system service and is only enabled when CephFS is used; object storage and block devices do not need it.
rgw   Short for radosgw, a gateway based on the popular RESTful protocol and the entry point to Ceph object storage; it need not be installed unless object storage is enabled.

Every component should be made highly available:
1. The more osd daemons there are, the higher the availability at a given replica count.
2. mon: usually three, for quorum-based high availability.
3. mgr: usually two, for high availability.
4. mds: usually two active/standby pairs, for high availability.
5. rgw: usually two, for high availability.

Ceph release naming

The first Ceph release was 0.1, back in January 2008. The version scheme stayed unchanged for years, until 0.94.1 (the first Hammer point release) shipped in April 2015; to avoid 0.99 (as well as 0.100 or 1.00), a new scheme was adopted:

  • x.0.z - development releases (for early testers and the brave)
  • x.1.z - release candidates (for test clusters, power users)
  • x.2.z - stable, bug-fix releases (for users)

This guide uses ceph version 15.2.9 with ceph-deploy 2.0.1.

Preparation before installing Ceph

1. Upgrade the kernel to a 4.x or newer release

I upgraded to 4.17 here; the detailed steps are omitted.
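
For reference, a minimal sketch of one common upgrade path on CentOS 7 via ELRepo (this assumes internet access or a local mirror of ELRepo; the kernel you get is whatever kernel-ml currently ships, not necessarily 4.17):

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum -y install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
yum -y --enablerepo=elrepo-kernel install kernel-ml
grub2-set-default 0                            ## assumes the new kernel is the first menu entry
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot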

2. Turn off firewalld, iptables, and SELinux
## firewall
systemctl stop firewalld.service 
systemctl disable firewalld.service

## selinux
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
3. Time synchronization with chronyd

node1 acts as the time server here; the other nodes are its clients.

[on the server (node1)]
vim /etc/chrony.conf
...
## the key lines
server 10.153.204.13 iburst #the time source to follow
allow 10.0.0.0/8 #serve time to clients in this range
...

[on the clients]
vim /etc/chrony.conf
...
server 10.153.204.13 iburst #point at node1
...

## then restart the service and check its status
systemctl enable chronyd
systemctl restart chronyd
timedatectl
chronyc sources -v
4. Write a temporary hosts file on the ceph-deploy node
# cat /etc/hosts
10.153.204.13  ceph-node1
10.130.22.45 ceph-node2
10.153.204.28 ceph-node3

5. Create a regular user with sudo rights, and set up passwordless SSH from the ceph-deploy node to the other nodes
## create the cephadmin user on every machine via ansible
ansible all -m shell -a 'groupadd -r -g 2022 cephadmin && useradd -r -m -s /bin/bash -u 2022 -g 2022 cephadmin && echo cephadmin:123456 | chpasswd'

## grant passwordless sudo
ansible node -m shell -a 'echo "cephadmin    ALL=(ALL)    NOPASSWD:ALL" >> /etc/sudoers'

## set up passwordless SSH
su - cephadmin
ssh-keygen 
ssh-copy-id ceph-node2
ssh-copy-id ceph-node3
6. Prepare the OSD disks. Ideally each OSD gets a whole disk; resources are tight in this environment, so here each OSD gets one partition
[root@ceph-node1 ~]$ lsblk 
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0 931.5G  0 disk 
├─nvme0n1p5 259:7    0   100G  0 part 
├─nvme0n1p3 259:5    0   100G  0 part 
├─nvme0n1p6 259:8    0   100G  0 part 
├─nvme0n1p4 259:6    0   100G  0 part 

All OSD machines have the same disk layout. Partition only; do not create LVM volumes or format filesystems yet.
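
For reference, a hypothetical way to carve such partitions out of a blank data disk with parted (device name and sizes are assumptions; do not run this against a disk that already holds data, since mklabel wipes it):

parted -s /dev/nvme0n1 mklabel gpt
parted -s /dev/nvme0n1 mkpart osd1 0% 25%      ## four equal slices
parted -s /dev/nvme0n1 mkpart osd2 25% 50%
parted -s /dev/nvme0n1 mkpart osd3 50% 75%
parted -s /dev/nvme0n1 mkpart osd4 75% 100%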

7. For machines without internet access, build a local Ceph yum repository

(1) On a machine with internet access, run this script; adjust the version and URLs to your needs

#!/usr/bin/env bash

URL_REPO=https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-15.2.9/el7/x86_64/
URL_REPODATA=https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-15.2.9/el7/x86_64/repodata/

## download every rpm listed in the directory index
function get_repo()
{
    test -d ceph_repo || mkdir ceph_repo
    cd ceph_repo || exit 1

    for i in $(curl -s "$URL_REPO" | awk -F '"' '{print $4}' | grep rpm); do
        curl -O "$URL_REPO/$i"
    done
    cd ..
}

## download the repodata indexes so the mirror works as a ready-made yum repo
function get_repodata()
{
    test -d ceph_repo/repodata || mkdir -p ceph_repo/repodata
    cd ceph_repo/repodata || exit 1

    for i in $(curl -s "$URL_REPODATA" | awk -F '"' '{print $4}' | grep xml); do
        curl -O "$URL_REPODATA/$i"
    done
    cd ../..
}

## bash functions are invoked by name, without parentheses
if [ "$1" == 'repo' ]; then
    get_repo
elif [ "$1" == 'repodata' ]; then
    get_repodata
elif [ "$1" == 'all' ]; then
    get_repo
    get_repodata
else
    echo 'Pass one of [ repo | repodata | all ]'
fi
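
Hypothetical usage of the script above (the filename is made up), followed by shipping the mirror to the internal server:

bash get_ceph_repo.sh all
tar czf ceph_repo.tar.gz ceph_repo
scp ceph_repo.tar.gz cephadmin@ceph-node1:/home/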

(2) Upload the mirror to the internal server, then install and configure nginx

yum -y install nginx

## Mainly change the fields below; root must be the directory that contains ceph_repo (replace /home/ceph with your real path).
vim /etc/nginx/nginx.conf
    server {
        listen       8080;
        listen       [::]:8080;
        server_name  _;
        root         /home/ceph;

        # Load configuration files for the default server block.
        include /etc/nginx/default.d/*.conf;

        location / {
           autoindex on;
        }

    }

systemctl start nginx 

(3) Configure the yum repo -- on every node

cat > /etc/yum.repos.d/ceph-http.repo << EOF
[local-ceph]
name=local-ceph
baseurl=http://ceph-node1:8080/ceph_repo
gpgcheck=0
enabled=1
[noarch-ceph]
name=noarch-ceph
baseurl=http://ceph-node1:8080/noarch_repo
gpgcheck=0
enabled=1
EOF

Then rebuild the yum cache:

yum makecache

## check that the new repos took effect
yum list | grep ceph 

Installing ceph-deploy

1. Check the available ceph-deploy versions

# yum list ceph-deploy --showduplicates
Loaded plugins: fastestmirror, langpacks, priorities
Loading mirror speeds from cached hostfile
Available Packages
ceph-deploy.noarch                                     1.5.25-1.el7                                     epel       
ceph-deploy.noarch                                     1.5.29-0                                         noarch-ceph
ceph-deploy.noarch                                     1.5.30-0                                         noarch-ceph
ceph-deploy.noarch                                     1.5.31-0                                         noarch-ceph
ceph-deploy.noarch                                     1.5.32-0                                         noarch-ceph
ceph-deploy.noarch                                     1.5.33-0                                         noarch-ceph
ceph-deploy.noarch                                     1.5.34-0                                         noarch-ceph
ceph-deploy.noarch                                     1.5.35-0                                         noarch-ceph
ceph-deploy.noarch                                     1.5.36-0                                         noarch-ceph
ceph-deploy.noarch                                     1.5.37-0                                         noarch-ceph
ceph-deploy.noarch                                     1.5.38-0                                         noarch-ceph
ceph-deploy.noarch                                     1.5.39-0                                         noarch-ceph
ceph-deploy.noarch                                     2.0.0-0                                          noarch-ceph
ceph-deploy.noarch                                     2.0.1-0                                          noarch-ceph

The first attempt used version 1.5.38, but it errors out when initializing the OSDs; 2.0.1 is what finally worked. Newer builds can be found on the Tsinghua or Aliyun mirrors: https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-15.2.9/el7/noarch/ and https://mirrors.aliyun.com/ceph

2. Install ceph-deploy

## python dependencies that ceph needs; install them along the way
yum -y install ceph-common python-pkg-resources python-setuptools python2-subprocess32

## install ceph-deploy
yum -y install ceph-deploy-2.0.1

## once installed, check the help output
ceph-deploy --help

If it fails with "ImportError: No module named pkg_resources", installing the python-setuptools package fixes it.

Ceph cluster initialization and deployment

1. Initialize the mon server (initialize one first; the others are added later)

## Before initializing, it is best to install the mon package on every mon node up front. The deployment will pull it in automatically later anyway; installing ahead of time surfaces problems early
yum -y install ceph-mon 

(1) Initialize the configuration file, specifying the public and cluster network ranges; this generates ceph.conf

$ ceph-deploy new --cluster-network 10.0.0.0/8 --public-network 10.0.0.0/8 ceph-node1

[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadmin/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.25): /bin/ceph-deploy new --cluster-network 10.0.0.0/8 --public-network 10.0.0.0/8 ceph-node1
[ceph_deploy.new][DEBUG ] Creating new cluster named ceph
[ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
[ceph_deploy][ERROR ] Traceback (most recent call last):
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/util/decorators.py", line 69, in newfunc
[ceph_deploy][ERROR ]     return f(*a, **kw)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/cli.py", line 162, in _main
[ceph_deploy][ERROR ]     return args.func(args)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/new.py", line 141, in new
[ceph_deploy][ERROR ]     ssh_copy_keys(host, args.username)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/new.py", line 35, in ssh_copy_keys
[ceph_deploy][ERROR ]     if ssh.can_connect_passwordless(hostname):
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/util/ssh.py", line 15, in can_connect_passwordless
[ceph_deploy][ERROR ]     if not remoto.connection.needs_ssh(hostname):
[ceph_deploy][ERROR ] AttributeError: 'module' object has no attribute 'needs_ssh'
[ceph_deploy][ERROR ] 

This problem is tied to the ceph-deploy version; adding the "--no-ssh-copykey" flag works around it:

$ ceph-deploy new --cluster-network 10.0.0.0/8 --public-network 10.0.0.0/8 ceph-node1 --no-ssh-copykey

[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadmin/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.38): /bin/ceph-deploy new --cluster-network 10.0.0.0/8 --public-network 10.0.0.0/8 ceph-node1 --no-ssh-copykey
[ceph_deploy.new][DEBUG ] Creating new cluster named ceph
[ceph-node1][DEBUG ] connection detected need for sudo
[ceph-node1][DEBUG ] connected to host: ceph-node1 
[ceph-node1][DEBUG ] detect platform information from remote host
[ceph-node1][DEBUG ] detect machine type
[ceph-node1][DEBUG ] find the location of an executable
[ceph-node1][INFO  ] Running command: sudo /usr/sbin/ip link show
[ceph-node1][INFO  ] Running command: sudo /usr/sbin/ip addr show
[ceph-node1][DEBUG ] IP addresses found: [u'192.168.42.1', u'10.153.204.13', u'10.233.64.0', u'10.233.64.1', u'169.254.25.10']
[ceph_deploy.new][DEBUG ] Resolving host ceph-node1
[ceph_deploy.new][DEBUG ] Monitor ceph-node1 at 10.153.204.13
[ceph_deploy.new][DEBUG ] Monitor initial members are ['ceph-node1']
[ceph_deploy.new][DEBUG ] Monitor addrs are [u'10.153.204.13']
[ceph_deploy.new][DEBUG ] Creating a random mon key...
[ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...

If your ceph-deploy version is around 1.5.25, the best fix is to upgrade ceph-deploy to 2.0.1 and rerun the command; a minimal upgrade sketch follows.
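
A minimal upgrade sketch, assuming the local noarch repo built in step 7 carries the 2.0.1 package:

sudo yum -y remove ceph-deploy
sudo yum -y install ceph-deploy-2.0.1
ceph-deploy --version        ## should now print 2.0.1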

(2) Initialize the mon node

ceph-deploy mon create-initial

This errored:

[ceph-node1][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node1.asok mon_status
[ceph-node1][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory

This is most likely because I had deployed before and the old environment was not removed cleanly. Delete the leftovers thoroughly, then run it again:

## clean up
rm -rf /etc/ceph/* /var/lib/ceph/* /var/log/ceph/* /var/run/ceph/*
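
Alternatively, ceph-deploy ships its own cleanup subcommands that cover roughly the same ground (shown as a sketch; note that purge also uninstalls the ceph packages themselves):

ceph-deploy purge ceph-node1        ## uninstall ceph packages from the host
ceph-deploy purgedata ceph-node1    ## wipe /var/lib/ceph and /etc/ceph on the host
ceph-deploy forgetkeys              ## drop the keyrings from the local working directory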

Run it again, and it succeeds:

[ceph-node1][INFO  ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-ceph-node1/keyring auth get client.bootstrap-rgw
[ceph_deploy.gatherkeys][INFO  ] Storing ceph.client.admin.keyring
[ceph_deploy.gatherkeys][INFO  ] Storing ceph.bootstrap-mds.keyring
[ceph_deploy.gatherkeys][INFO  ] Storing ceph.bootstrap-mgr.keyring
[ceph_deploy.gatherkeys][INFO  ] keyring 'ceph.mon.keyring' already exists
[ceph_deploy.gatherkeys][INFO  ] Storing ceph.bootstrap-osd.keyring
[ceph_deploy.gatherkeys][INFO  ] Storing ceph.bootstrap-rgw.keyring
[ceph_deploy.gatherkeys][INFO  ] Destroy temp directory /tmp/tmps6CzLR

Verify that mon started successfully

# ps -ef | grep ceph-mon 
ceph     23737     1  0 16:22 ?        00:00:00 /usr/bin/ceph-mon -f --cluster ceph --id ceph-node1 --setuser ceph --setgroup ceph

mon initialization is complete.

With mon initialized, the cluster state can be inspected; several admin hosts can be set up.

Push the cluster configuration file and the admin user's key to /etc/ceph/ on the target machines, which can then manage the cluster:

ceph-deploy admin ceph-node1 ceph-node2 ceph-node3

$ ll -h /etc/ceph/
total 8.0K
-rw------- 1 root root 151 Feb 12 16:35 ceph.client.admin.keyring
-rw-r--r-- 1 root root 265 Feb 12 16:35 ceph.conf
-rw------- 1 root root   0 Feb 12 16:22 tmppE21x5

## check the cluster state
$ sudo ceph -s 
  cluster:
    id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum ceph-node1 (age 14m)
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     

There is only one mon for now.

2. Add the mgr service

(1) Install the mgr package on every mgr node

yum -y install ceph-mgr 

(2) Add the mgr to the cluster

$ ceph-deploy mgr create ceph-node1
[ceph-node1][INFO  ] Running command: sudo ceph --cluster ceph --name client.bootstrap-mgr --keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring auth get-or-create mgr.ceph-node1 mon allow profile mgr osd allow * mds allow * -o /var/lib/ceph/mgr/ceph-ceph-node1/keyring
[ceph-node1][INFO  ] Running command: sudo systemctl enable ceph-mgr@ceph-node1
[ceph-node1][WARNIN] Created symlink from /etc/systemd/system/ceph-mgr.target.wants/ceph-mgr@ceph-node1.service to /usr/lib/systemd/system/ceph-mgr@.service.
[ceph-node1][INFO  ] Running command: sudo systemctl start ceph-mgr@ceph-node1
[ceph-node1][INFO  ] Running command: sudo systemctl enable ceph.target

## check the cluster state again
# ceph -s 
  cluster:
    id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
    health: HEALTH_WARN
            Module 'restful' has failed dependency: No module named 'pecan'
            OSD count 0 < osd_pool_default_size 3
 
  services:
    mon: 1 daemons, quorum ceph-node1 (age 46m)
    mgr: ceph-node1(active, since 100s)
    osd: 0 osds: 0 up, 0 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     

Three warnings show up here:

  • Module 'restful' has failed dependency: No module named 'pecan'

  • Module 'restful' has failed dependency: No module named 'werkzeug'
    The mgr machine is missing the pecan and werkzeug Python modules. On a machine with internet access, download the packages with pip3, upload them, and install (see the sketch after this list). Package indexes: https://pypi.tuna.tsinghua.edu.cn/simple/; https://pypi.org/simple.

  • OSD count 0 < osd_pool_default_size 3:
    The default replica count per object is 3; this warning only says there are fewer than three OSDs and can be ignored for now.
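
A sketch of the offline module fix (the package set and paths are assumptions; the modules must land in the Python 3 environment the mgr daemon uses):

## on a machine with internet access
pip3 download pecan werkzeug -d ./mgr-deps

## copy ./mgr-deps to each mgr node, then install without touching the network
pip3 install --no-index --find-links ./mgr-deps pecan werkzeug
systemctl restart ceph-mgr.target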

3. Initialize the OSDs

## first list the disks available on the target host
$ ceph-deploy disk list ceph-node1
This errors out:
[ceph_deploy][ERROR ] ExecutableNotFound: Could not locate executable 'ceph-disk' make sure it is installed and available on ceph-node1

The official docs (https://docs.ceph.com/en/pacific/ceph-volume/) show that ceph-disk was deprecated as of Ceph version 13.0.0 in favor of ceph-volume, and indeed the installed binaries include only ceph-volume, no ceph-disk:

# locate ceph- |grep bin
/usr/bin/ceph-authtool
/usr/bin/ceph-bluestore-tool
/usr/bin/ceph-clsinfo
/usr/bin/ceph-conf
/usr/bin/ceph-crash
/usr/bin/ceph-dencoder
/usr/bin/ceph-deploy
/usr/bin/ceph-kvstore-tool
/usr/bin/ceph-mds
/usr/bin/ceph-mgr
/usr/bin/ceph-mon
/usr/bin/ceph-monstore-tool
/usr/bin/ceph-objectstore-tool
/usr/bin/ceph-osd
/usr/bin/ceph-osdomap-tool
/usr/bin/ceph-post-file
/usr/bin/ceph-rbdnamer
/usr/bin/ceph-run
/usr/bin/ceph-syn
/usr/sbin/ceph-create-keys
/usr/sbin/ceph-volume
/usr/sbin/ceph-volume-systemd

So my ceph-deploy version did not match the Ceph version being deployed; switch ceph-deploy to 2.0.1.

$ ceph-deploy --version 
2.0.1

$ ceph-deploy disk list ceph-node1
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadmin/.cephdeploy.conf
[ceph-node1][DEBUG ] connection detected need for sudo
[ceph-node1][DEBUG ] connected to host: ceph-node1 
[ceph-node1][DEBUG ] detect platform information from remote host
[ceph-node1][DEBUG ] detect machine type
[ceph-node1][DEBUG ] find the location of an executable
[ceph-node1][INFO  ] Running command: sudo fdisk -l
[ceph-node1][INFO  ] Disk /dev/nvme1n1: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
[ceph-node1][INFO  ] Disk /dev/nvme0n1: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
[ceph-node1][INFO  ] Disk /dev/mapper/data-ceph--data1: 107.4 GB, 107374182400 bytes, 209715200 sectors
[ceph-node1][INFO  ] Disk /dev/mapper/data-ceph--data2: 107.4 GB, 107374182400 bytes, 209715200 sectors
[ceph-node1][INFO  ] Disk /dev/mapper/data-ceph--data3: 107.4 GB, 107374182400 bytes, 209715200 sectors
[ceph-node1][INFO  ] Disk /dev/mapper/data-ceph--data4: 107.4 GB, 107374182400 bytes, 209715200 sectors

Initialize the node machines:

## initializing the nodes in effect installs ceph, ceph-radosgw, and some related base components
$ ceph-deploy install --no-adjust-repos --nogpgcheck ceph-node1 ceph-node2 ceph-node3
    - --no-adjust-repos: do not push this machine's repo file to the targets, since the repos were already configured by hand
    - --nogpgcheck: skip the yum key check

Install the osd service:

## run this on every machine that will host OSDs
yum -y install ceph-osd ceph-common

Wipe the data on every disk that the nodes will turn into OSDs:

## one example; the other disks need the same treatment
$ ceph-deploy disk zap ceph-node1 /dev/nvme0n1p3 /dev/nvme0n1p4 /dev/nvme0n1p5 /dev/nvme0n1p6
[ceph-node1][WARNIN] --> Zapping: /dev/nvme0n1p3
[ceph-node1][WARNIN] Running command: /bin/dd if=/dev/zero of=/dev/nvme0n1p3 bs=1M count=10 conv=fsync
[ceph-node1][WARNIN]  stderr: 10+0 records in
[ceph-node1][WARNIN] 10+0 records out
[ceph-node1][WARNIN] 10485760 bytes (10 MB) copied
[ceph-node1][WARNIN]  stderr: , 0.0221962 s, 472 MB/s
[ceph-node1][WARNIN] --> Zapping successful for: <Partition: /dev/nvme0n1p3>

Create the OSDs:

## one example; the other disks need the same treatment
$ ceph-deploy osd create ceph-node1 --data /dev/nvme0n1p3
$ ceph-deploy osd create ceph-node1 --data /dev/nvme0n1p4
...

OSDs are numbered in creation order: the first gets id 0, and so on; a loop like the sketch below saves retyping.
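
A hypothetical loop over all nodes and partitions (valid here because every node uses the same four partitions; adjust to your layout):

for node in ceph-node1 ceph-node2 ceph-node3; do
    for part in /dev/nvme0n1p3 /dev/nvme0n1p4 /dev/nvme0n1p5 /dev/nvme0n1p6; do
        ceph-deploy osd create $node --data $part
    done
done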

Check the osd processes:

## processes on ceph-node1
# ps -ef | grep ceph-osd
ceph       61629       1  0 17:16 ?        00:00:01 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
ceph       62896       1  0 17:17 ?        00:00:01 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
ceph       63569       1  0 17:18 ?        00:00:01 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
ceph       64519       1  0 17:18 ?        00:00:01 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph

## processes on ceph-node2
# ps -ef | grep osd 
ceph       64649       1  0 17:27 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph
ceph       65423       1  0 17:27 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
ceph       66082       1  0 17:28 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 6 --setuser ceph --setgroup ceph
ceph       66701       1  0 17:28 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 7 --setuser ceph --setgroup ceph

## processes on ceph-node3
# ps -ef | grep osd 
ceph       30549       1  0 11:30 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 8 --setuser ceph --setgroup ceph
ceph       31270       1  0 11:30 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 9 --setuser ceph --setgroup ceph
ceph       32220       1  1 11:31 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 10 --setuser ceph --setgroup ceph
ceph       32931       1  1 11:31 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 11 --setuser ceph --setgroup ceph

The OSD ids run from 0 to 11: 12 disks in total.

With the osd services up, check the cluster state again:

$ ceph -s 
  cluster:
    id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum ceph-node1 (age 19h)
    mgr: ceph-node1(active, since 19h)
    osd: 12 osds: 12 up (since 99s), 12 in (since 99s)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 0 B
    usage:   12 GiB used, 1.2 TiB / 1.2 TiB avail
    pgs:     1 active+clean
    
## one pool exists by default; the system creates it automatically when OSDs are added
$ ceph osd lspools 
1 device_health_metrics

$ ceph df 
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
ssd    1.2 TiB  1.2 TiB  9.7 MiB    12 GiB       1.00
TOTAL  1.2 TiB  1.2 TiB  9.7 MiB    12 GiB       1.00
 
--- POOLS ---
POOL                   ID  PGS  STORED  OBJECTS  USED  %USED  MAX AVAIL
device_health_metrics   1    1     0 B        2   0 B      0    376 GiB

At this point the basic Ceph cluster is built and the rbd functionality is ready for use.
To enable object storage and the filesystem as well, rgw, mds, and CephFS still need to be deployed. The mon, mgr, and other components are not yet highly available, so scale out these critical components first.

4. Scale out the ceph-mon nodes

(1) Install the ceph-mon components on the target machines

# yum -y install ceph-mon ceph-common

(2) Add the mon machines

$ ceph-deploy mon add ceph-node2
$ ceph-deploy mon add ceph-node3

(3) Check the cluster state

$ ceph -s 
  cluster:
    id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 10m)
    mgr: ceph-node1(active, since 25h)
    osd: 12 osds: 12 up (since 6h), 12 in (since 6h)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 0 B
    usage:   12 GiB used, 1.2 TiB / 1.2 TiB avail
    pgs:     1 active+clean

## this command shows detailed mon information and status
$ ceph quorum_status --format json-pretty

mon now runs on 3 nodes.

5. Scale out the ceph-mgr nodes

(1) Install the ceph-mgr components on the target machine

# yum -y install ceph-mgr ceph-common

(2) Add the mgr machine

$ ceph-deploy mgr create ceph-node2

(3) Verify the cluster state

$ ceph -s 
  cluster:
    id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 24m)
    mgr: ceph-node1(active, since 25h), standbys: ceph-node2
    osd: 12 osds: 12 up (since 6h), 12 in (since 6h)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 0 B
    usage:   12 GiB used, 1.2 TiB / 1.2 TiB avail
    pgs:     1 active+clean

mgr high availability is active/standby, whereas mon forms a quorum and elects a leader. A quick way to watch the standby take over is sketched below.
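
An optional failover check (harmless in a test cluster, but it does restart the mgr): fail the active mgr by name and the standby should take over.

ceph mgr fail ceph-node1
ceph -s | grep mgr        ## ceph-node2 should now show as active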

6. Add mds (the metadata service) and CephFS for filesystem functionality

The mds service stores its data separately: to run, it needs two dedicated pools, one for CephFS metadata and one for the actual data. The metadata pool holds things such as directory structure, file names, and sizes; the data pool holds the actual file contents.

(1) Deploy the mds services

$ ceph-deploy mds create ceph-node1
$ ceph-deploy mds create ceph-node2

Check the ceph state

$ ceph -s 
  cluster:
    id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 18h)
    mgr: ceph-node1(active, since 43h), standbys: ceph-node2
    mds:  2 up:standby
    osd: 12 osds: 12 up (since 24h), 12 in (since 24h)
 
  task status:
 
  data:
    pools:   1 pools, 1 pgs
    objects: 3 objects, 0 B
    usage:   12 GiB used, 1.2 TiB / 1.2 TiB avail
    pgs:     1 active+clean

The mds daemons have now joined the cluster, but both sit in standby, because mds must first be given its metadata and data storage pools:

## first create the metadata pool and the data pool; the trailing numbers are their pg and pgp counts
$ ceph osd pool create cephfs-metedata 32 32 
pool 'cephfs-metedata' created
$ ceph osd pool create cephfs-data 64 64 
pool 'cephfs-data' created

$ ceph osd lspools 
1 device_health_metrics
2 cephfs-metedata
3 cephfs-data

(2) Create the cephfs filesystem

$ ceph fs new mycephfs cephfs-metedata cephfs-data
new fs with metadata pool 2 and data pool 3

## creation syntax
ceph fs new <fs_name> <metadata> <data> [--force] [--allow-dangerous-metadata-overlay]
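
Two commands that confirm the new filesystem and its pool wiring (output omitted here):

ceph fs ls
ceph fs status mycephfs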

Check the cluster state again

$ ceph -s 
  cluster:
    id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 18h)
    mgr: ceph-node1(active, since 43h), standbys: ceph-node2
    mds: mycephfs:1 {0=ceph-node2=up:active} 1 up:standby
    osd: 12 osds: 12 up (since 24h), 12 in (since 24h)
 
  task status:
 
  data:
    pools:   3 pools, 97 pgs
    objects: 25 objects, 2.2 KiB
    usage:   12 GiB used, 1.2 TiB / 1.2 TiB avail
    pgs:     97 active+clean

$ ceph mds stat
mycephfs:1 {0=ceph-node2=up:active} 1 up:standby

The cephfs functionality is now in place.

7. Add the rgw component for object storage

rgw exposes a REST interface; clients interact with it over HTTP to create, delete, modify, and query data. Usually several rgw instances run for high availability, with a load balancer in front of them distributing requests.

(1) Install the rgw component

# yum -y install ceph-radosgw

(2) Deploy rgw

$ ceph-deploy --overwrite-conf rgw create ceph-node1
$ ceph-deploy --overwrite-conf rgw create ceph-node2
[ceph-node1][INFO  ] Running command: sudo systemctl start ceph-radosgw@rgw.ceph-node1
[ceph-node1][INFO  ] Running command: sudo systemctl enable ceph.target
[ceph_deploy.rgw][INFO  ] The Ceph Object Gateway (RGW) is now running on host ceph-node1 and default port 7480

Check the cluster state

$ ceph -s 
  cluster:
    id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 24h)
    mgr: ceph-node1(active, since 2d), standbys: ceph-node2
    mds: mycephfs:1 {0=ceph-node2=up:active} 1 up:standby
    osd: 12 osds: 12 up (since 30h), 12 in (since 30h)
    rgw: 2 daemons active (ceph-node1, ceph-node2)
 
  task status:
 
  data:
    pools:   7 pools, 201 pgs
    objects: 212 objects, 6.9 KiB
    usage:   12 GiB used, 1.2 TiB / 1.2 TiB avail
    pgs:     201 active+clean
 
  io:
    client:   35 KiB/s rd, 0 B/s wr, 34 op/s rd, 23 op/s wr
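
A simple smoke test for rgw, assuming the default port 7480: an anonymous request should return a small ListAllMyBucketsResult XML document.

curl http://ceph-node1:7480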

Follow-up

  1. Because only one mon node was initialized at the start, ceph.conf still listed a single mon_host, which defeats high availability; regenerate the cluster information and push the rewritten ceph.conf to every node:
$ ceph-deploy --overwrite-conf config push ceph-node1 ceph-node2 ceph-node3
$ cat /etc/ceph/ceph.conf 
[global]
fsid = 537175bb-51de-4cc4-9ee3-b5ba8842bff2
public_network = 10.0.0.0/8
cluster_network = 10.0.0.0/8
mon_initial_members = ceph-node1
mon_host = 10.153.204.13,10.130.22.45,10.153.204.28
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
  2. Ceph cluster stop/start order

Before any restart, tell the cluster not to mark OSDs out, so that nodes are not kicked out of the cluster when their services stop; once a node is kicked out, Ceph automatically starts rebalancing data:

## set noout
$ ceph osd set noout
noout is set

## unset noout
$ ceph osd unset noout
noout is unset

Stop order:

  1. Set noout before stopping any service;
  2. Stop the storage clients so reads and writes cease;
  3. If RGW is in use, stop the RGW service;
  4. Stop the CephFS metadata service;
  5. Stop the Ceph OSD service;
  6. Stop the Ceph Manager service;
  7. Stop the Ceph Monitor service.

Start order:

  1. Start the Ceph Monitor service;
  2. Start the Ceph Manager service;
  3. Start the Ceph OSD service;
  4. Start the CephFS metadata service;
  5. Start the RGW service;
  6. Start the storage clients;
  7. Finally, unset noout.
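
Expressed as commands, the stop sequence might look like the sketch below; Ceph groups each daemon type under a systemd target, so run these on every node hosting the respective service:

ceph osd set noout
systemctl stop ceph-radosgw.target
systemctl stop ceph-mds.target
systemctl stop ceph-osd.target
systemctl stop ceph-mgr.target
systemctl stop ceph-mon.target
## start again in the reverse order, then: ceph osd unset noout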

Summary

A complete, highly available Ceph cluster is now up. This covered only the build; the next article walks through using rbd, cephfs, and the object storage functionality in detail.

FAQ: common Ceph operations tasks

1. OSD decommission procedure

(1) If the OSD machine is still running normally (a planned removal rather than a failure), first set the OSD's weight to 0 and wait for all of its data to migrate away; it will stop accepting new data.

$ ceph osd crush reweight osd.8 0
reweighted item id 8 name 'osd.8' to 0 in crush map

If there is a lot of data, it is best to lower the weight gradually, e.g. 0.7 -> 0.4 -> 0.1 -> 0, to keep the cluster as stable as possible; a sketch of this follows.
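
A hypothetical gradual drain; the health check here is crude, so adjust it to your environment:

for w in 0.7 0.4 0.1 0; do
    ceph osd crush reweight osd.8 $w
    ## wait until recovery/backfill traffic settles before the next step
    while ceph -s | grep -Eq 'recovery|backfill'; do sleep 60; done
done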

(2) Stop the osd process

# systemctl stop ceph-osd@8.service

Stopping the osd process tells the cluster that this OSD is gone and no longer serving. Because it already carries no weight, this neither affects the overall data placement nor triggers any data migration.

(3) Mark the node out

$ ceph osd out osd.8

This tells the mon that the node can no longer serve and that its data should be recovered onto other OSDs; but since the reweight was already done, no data migration actually happens.

(4) Remove the node from the CRUSH map

$ ceph osd crush remove osd.8
removed item id 8 name 'osd.8' from crush map

Deleting it from CRUSH tells the cluster to drop this node completely and recompute the CRUSH placement; because of the earlier reweight, its crush weight is already 0.

(5) Delete the osd node

$ ceph osd rm osd.8
removed osd.8

This deletes the node's record from the cluster.

(6) Delete the node's auth entry

$ ceph auth del osd.8
updated

If this auth entry is not deleted, the OSD id stays occupied and is never released.

(7) Finally, check the cluster state

$ ceph -s 
  cluster:
    id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
    health: HEALTH_WARN
            Degraded data redundancy: 152/813 objects degraded (18.696%), 43 pgs degraded, 141 pgs undersized
 
  services:
    mon: 2 daemons, quorum ceph-node1,ceph-node2 (age 111s)
    mgr: ceph-node1(active, since 11d), standbys: ceph-node2
    mds: mycephfs:1 {0=ceph-node2=up:active} 1 up:standby
    osd: 8 osds: 8 up (since 3d), 8 in (since 3d); 124 remapped pgs
    rgw: 2 daemons active (ceph-node1, ceph-node2)
 
  task status:
 
  data:
    pools:   8 pools, 265 pgs
    objects: 271 objects, 14 MiB
    usage:   8.1 GiB used, 792 GiB / 800 GiB avail
    pgs:     152/813 objects degraded (18.696%)
             114/813 objects misplaced (14.022%)
             111 active+clean+remapped
             98  active+undersized
             43  active+undersized+degraded
             13  active+clean

Because all the osd services on ceph-node3 were removed, 8 OSDs remain; and since those 8 OSDs sit on just two hosts, many PGs are not active+clean. As soon as OSDs on a new machine come online, the PGs will redistribute automatically.

2. mon decommission procedure

(1) Check the mon state

$ ceph mon stat  
e3: 3 mons at {ceph-node2=[v2:10.130.22.45:3300/0,v1:10.130.22.45:6789/0],ceph-node1=[v2:10.153.204.13:3300/0,v1:10.153.204.13:6789/0],ceph-node3=[v2:10.153.204.28:3300/0,v1:10.153.204.28:6789/0]}, election epoch 48, leader 0 ceph-node1, quorum 0,1 ceph-node1,ceph-node2

(2) Stop the mon

systemctl stop ceph-mon@ceph-node3

(3) Remove the mon

$ ceph mon remove ceph-node3
removing mon.ceph-node3 at [v2:10.153.204.28:3300/0,v1:10.153.204.28:6789/0], there will be 2 monitors

(4) Remove the node's address from the mon_host field in ceph.conf

$ cat ceph.conf
[global]
fsid = 537175bb-51de-4cc4-9ee3-b5ba8842bff2
public_network = 10.0.0.0/8
cluster_network = 10.0.0.0/8
mon_initial_members = ceph-node1
mon_host = 10.153.204.13,10.130.22.45
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

(5) Check the cluster state again

$ ceph -s 
  cluster:
    id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
    health: HEALTH_WARN
            Degraded data redundancy: 152/813 objects degraded (18.696%), 43 pgs degraded, 141 pgs undersized
 
  services:
    mon: 2 daemons, quorum ceph-node1,ceph-node2 (age 111s)
    mgr: ceph-node1(active, since 11d), standbys: ceph-node2
    mds: mycephfs:1 {0=ceph-node2=up:active} 1 up:standby
    osd: 8 osds: 8 up (since 3d), 8 in (since 3d); 124 remapped pgs
    rgw: 2 daemons active (ceph-node1, ceph-node2)
 
  task status:
 
  data:
    pools:   8 pools, 265 pgs
    objects: 271 objects, 14 MiB
    usage:   8.1 GiB used, 792 GiB / 800 GiB avail
    pgs:     152/813 objects degraded (18.696%)
             114/813 objects misplaced (14.022%)
             111 active+clean+remapped
             98  active+undersized
             43  active+undersized+degraded
             13  active+clean
