How OpenStack integrates with Ceph?

Elbarco, published on 2020-01-13

What's Ceph

Ceph uniquely delivers object, block, and file storage in one unified system.

According to the official description, Ceph is a unified system that provides object storage, block storage, and file system services, with good performance, reliability, and scalability.

Ceph originated in a paper by Sage Weil (Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, Carlos Maltzahn: Ceph: A Scalable, High-Performance Distributed File System. OSDI 2006: 307-320). It was later contributed to the open-source community and, after years of development, is now supported by many cloud computing vendors and widely deployed.

Ceph 101

Whether you want to provide object storage or block storage services in a cloud platform, or deploy a Ceph file system, every Ceph storage cluster deployment requires Ceph Nodes, a network (such as a storage network), and the Ceph storage cluster itself.

A Ceph storage cluster consists of the following components (or services); a few commands for inspecting them in a running cluster follow the list:

  • Ceph Monitor - ceph-mon maintains maps of the cluster state, including the monitor map, manager map, OSD map, and CRUSH map. These maps are critical cluster-state data used for coordination between Ceph daemons; the monitors also handle authentication between daemons and clients. At least three monitors are normally required for redundancy and high availability.
  • Ceph Manager - ceph-mgr keeps track of runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. The manager also hosts Python modules that manage and expose cluster information, including the Ceph Dashboard and a REST API. At least two managers are normally required for high availability.
  • Ceph OSD (Object Storage Daemon) - ceph-osd stores data, handles data replication, recovery, and rebalancing, and provides monitoring information to the Ceph monitors and managers by checking the heartbeats of other Ceph OSD daemons. At least three Ceph OSDs are normally required for redundancy and high availability.
  • Ceph Metadata Server (required only when running Ceph File System clients) - ceph-mds stores metadata on behalf of the Ceph file system (Ceph block devices and Ceph object storage do not use MDS). The metadata servers let POSIX file system users run basic commands such as ls and find without placing that load on the Ceph storage cluster.
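
If you already have a running cluster, a few standard ceph commands show these daemons at a glance (output omitted here, since it depends entirely on the deployment):

ceph -s          # overall status: monitor quorum, managers, OSD and PG counts
ceph mon stat    # monitor quorum details
ceph osd tree    # OSD layout across hosts
ceph mds stat    # metadata server state (only meaningful when CephFS is deployed)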

Ceph stores data as objects in logical storage pools. Using the CRUSH algorithm, Ceph computes which placement group should hold an object, and then which Ceph OSD should store that placement group. The CRUSH algorithm is what lets a Ceph storage cluster scale, rebalance, and recover dynamically.
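
As a rough illustration of the first step of that mapping, the minimal Python sketch below hashes an object name into one of pg_num placement groups. This is not Ceph's real code (Ceph uses the rjenkins hash and a stable modulo), but it conveys the idea that placement is computed rather than looked up:

import hashlib

def object_to_pg(object_name, pg_num):
    # Hash the object name and take it modulo the number of placement groups.
    digest = hashlib.sha1(object_name.encode('utf-8')).hexdigest()
    return int(digest, 16) % pg_num

# e.g. an RBD data object similar to the ones shown later in this article
print(object_to_pg('rbd_data.e756a86b8b4567.0000000000000000', 128))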

The Ceph Storage Cluster is the foundation of every Ceph deployment. Built on RADOS, it consists of two kinds of daemons: Ceph OSD daemons, which store data as objects on storage nodes, and Ceph Monitors, which maintain a master copy of the cluster map. Ceph's file system, object storage, and block storage all read data from and write data to the Ceph storage cluster.

The main concepts in Ceph storage are:

  • Pools - Ceph stores data in pools, which are a logical concept. A pool defines the number of placement groups, the number of replicas, and the CRUSH rule. To store data in a pool you must supply user credentials that are authorized to access it; for further operations, see the Pools documentation.
  • Placement Groups - Ceph maps objects to placement groups, which are shards or fragments of a logical object pool that place objects onto OSDs as a group. Placement groups were introduced to distribute and locate data more efficiently.
  • CRUSH Maps - CRUSH is the data-distribution algorithm Ceph uses; similar in spirit to consistent hashing, it places data where it is expected to be.

Ceph installation

Installing and deploying a Ceph cluster is generally a fairly involved process; refer to the official installation guide.

Architecture

This section gives a rough overview of Ceph's architecture and its basic components.

[Figure: Ceph architecture]

It consists of several components (or service interfaces); a client-side entry point for each is sketched after the list:

  • RADOS - as the diagram shows, RADOS is the foundation of Ceph: a reliable, autonomic, distributed object store whose intelligent storage nodes are self-healing and self-managing. See RADOS - A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.
  • Librados - the library on top of RADOS that lets applications access RADOS directly, with bindings currently available for C, C++, Java, Python, Ruby, and PHP.
  • RADOSGW - the RADOS gateway, abbreviated RADOSGW or RGW, is Ceph's object storage service; its RESTful API is compatible with S3 and Swift.
  • RBD - the RADOS block device is Ceph's block storage service, supporting resizing, thin provisioning, snapshots, and cloning. Ceph supports both kernel objects (KO) and a QEMU hypervisor that uses librbd directly, avoiding kernel overhead for virtualized systems.
  • CephFS - the Ceph file system provides a POSIX-compatible file system that can be used with a kernel mount or as a file system in user space (FUSE).
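
Each of these interfaces also has a client-side tool. As a quick sketch (assuming the pool names used later in this article, a CephFS mount point of /mnt/cephfs, and an RGW actually being deployed):

rados -p images ls        # librados: list the raw objects in a pool
rbd -p images ls          # RBD: list the block images in a pool
radosgw-admin user list   # RGW: object-storage administration
ceph-fuse /mnt/cephfs     # CephFS: mount the file system via FUSE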

RBD interfaces and tools

As described above, the block storage service exposes an API for Ceph clients to interact with RBD through librbd, which in turn sits on top of librados. The main interfaces and tools are the following:

cephx

cephx is enabled by default and provides user authentication and authorization. When accessing Ceph with the rbd command you therefore need to supply a user name or ID along with a keyring file, as in the following commands:

rbd --id {user-ID} --keyring=/path/to/secret [commands]
rbd --name {username} --keyring=/path/to/secret [commands]

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --keyring=/etc/ceph/ceph.client.glance.keyring ls images
01b9995a-e212-42f7-b11f-1beda23a24b8

If you also want to specify the pool, use --pool <pool_name> or -p <pool_name>:

[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images --keyring=/etc/ceph/ceph.client.glance.keyring ls
01b9995a-e212-42f7-b11f-1beda23a24b8

If an ID is specified, the keyring file can be omitted; by default the keyring for that user is looked up in the standard directory:

[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images ls
01b9995a-e212-42f7-b11f-1beda23a24b8

pool

  • Initialize a block storage pool
rbd pool init <pool-name>
  • Create a user
ceph auth get-or-create client.{ID} mon 'profile rbd' osd 'profile {profile name} [pool={pool-name}][, profile ...]' mgr 'profile rbd [pool={pool-name}]'

image

  • Create an image
rbd create --size {megabytes} {pool-name}/{image-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance create --size 1024 images/f-test
  • List images, and list images that have been moved to the trash:
rbd ls
rbd ls {poolname}
rbd trash ls 
rbd trash ls {poolname}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images ls
f-test
  • Show image information
rbd info {image-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images info f-test
rbd image 'f-test':
	size 1GiB in 256 objects
	order 22 (4MiB objects)
	block_name_prefix: rbd_data.e756a86b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	flags:
	create_timestamp: Thu Dec  5 14:36:47 2019
  • Resize an image
rbd resize --size 2048 foo (to increase)
rbd resize --size 2048 foo --allow-shrink (to decrease)

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images resize --size 2048 f-test
Resizing image: 100% complete...done.
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images info f-test
rbd image 'f-test':
	size 2GiB in 512 objects
	order 22 (4MiB objects)
	block_name_prefix: rbd_data.e756a86b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	flags:
	create_timestamp: Thu Dec  5 14:36:47 2019
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images resize --size 1024 f-test --allow-shrink
Resizing image: 100% complete...done.
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images info f-test
rbd image 'f-test':
	size 1GiB in 256 objects
	order 22 (4MiB objects)
	block_name_prefix: rbd_data.e756a86b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	flags:
	create_timestamp: Thu Dec  5 14:36:47 2019
  • Delete an image, or move it to the trash
rbd rm {image-name}
rbd rm {pool-name}/{image-name}
# move to trash
rbd trash mv {pool-name}/{image-name}
# permanently remove an image from the trash
rbd trash rm {pool-name}/{image-id}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images trash mv f-test
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images trash ls
e756a86b8b4567 f-test
  • Restore an image from the trash
rbd trash restore {image-id}
rbd trash restore {pool-name}/{image-id}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images trash restore e756a86b8b4567
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images trash ls
[root@gd02-control-11e115e64e13 ~]#

snapshot

  • Create a snapshot of an image
rbd snap create {pool-name}/{image-name}@{snap-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images snap create f-test@f-test-snap

Because the pool was already specified with --pool on the command line, the image arguments in these examples do not carry a pool-name prefix.

  • List an image's snapshots
rbd snap ls {pool-name}/{image-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images snap ls f-test
SNAPID NAME        SIZE TIMESTAMP
    67 f-test-snap 1GiB Thu Dec  5 15:24:17 2019
  • Roll back to a snapshot. Rolling back overwrites the current image with the data from the snapshot, and the time it takes grows with the image size. It is therefore usually faster to clone a new image from the snapshot than to roll the image back.
rbd snap rollback {pool-name}/{image-name}@{snap-name}
  • Delete a snapshot
rbd snap rm {pool-name}/{image-name}@{snap-name}
  • Delete all snapshots of an image
rbd snap purge {pool-name}/{image-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images snap purge f-test
Removing all snapshots: 100% complete...done.

For more on image layering, see rbd snapshot layering; copy-on-write deserves a dedicated write-up later, so let's continue with snapshots here.

  • Protect/unprotect a snapshot. A snapshot has to be protected because cloning needs to access the parent snapshot: if a user accidentally deletes the parent, the clones break. To avoid data loss, the snapshot must be protected before it is cloned:
rbd snap protect {pool-name}/{image-name}@{snapshot-name}
rbd snap unprotect {pool-name}/{image-name}@{snapshot-name}
  • Clone a snapshot. Note that the clone must specify the pool name of the target (child) image:
rbd clone {pool-name}/{parent-image}@{snap-name} {pool-name}/{child-image-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images clone f-test@f-test-snap images/f-test-child
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images ls
f-test
f-test-child
  • List a snapshot's children
rbd children {pool-name}/{image-name}@{snapshot-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance -p images children f-test@f-test-snap
images/f-test-child
  • Flatten a cloned image. A cloned image keeps a reference to its parent snapshot. Removing that reference from the child is called flattening: it copies the data from the parent snapshot into the clone, and the time it takes grows with the size of the snapshot. To delete a snapshot, its child images must be flattened first.
rbd flatten {pool-name}/{image-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images info f-test-child
rbd image 'f-test-child':
	size 1GiB in 256 objects
	order 22 (4MiB objects)
	block_name_prefix: rbd_data.e766cf6b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	flags:
	create_timestamp: Thu Dec  5 16:32:29 2019
	parent: images/f-test@f-test-snap
	overlap: 1GiB
[root@gd02-control-11e115e64e13 ~]# rbd --id glance -p images flatten f-test-child
Image flatten: 100% complete...done.
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images info f-test-child
rbd image 'f-test-child':
	size 1GiB in 256 objects
	order 22 (4MiB objects)
	block_name_prefix: rbd_data.e766cf6b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	flags:
	create_timestamp: Thu Dec  5 16:32:29 2019

Integration

In OpenStack, Ceph's block storage service is consumed by having libvirt configure QEMU to use librbd. The structure looks like this:

[Figure: libvirt configures QEMU to use librbd]
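
On the compute node, this wiring eventually shows up in the guest's libvirt domain XML as a network disk served over the rbd protocol. The sketch below is only illustrative; the volume name, monitor address, and secret UUID are placeholders rather than values from this environment:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <auth username='cinder'>
    <!-- UUID of the libvirt secret holding the client.cinder key (registered with virsh, see the Configuration section) -->
    <secret type='ceph' uuid='457eb676-33da-42ec-9a8c-9293d545c337'/>
  </auth>
  <source protocol='rbd' name='volumes/volume-3f6a7b2c'>
    <host name='192.168.0.10' port='6789'/>
  </source>
  <target dev='vdb' bus='virtio'/>
</disk>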

OpenStack uses Ceph block storage in three places:

  • The image service, Glance, which manages VM images; images are immutable.
  • The volume service, Cinder, which manages the block devices of VMs.
  • Guest disks, i.e. the VM's own block devices, including the system disk, config drive, and ephemeral disks. By default these live on the compute node's local disk under /var/lib/nova/instances/<uuid>; previously, the only way to keep guest disks in Ceph was to boot the VM from a volume and let Cinder manage the device. Now, by setting images_type=rbd, Nova can store guest disks directly in Ceph, which makes live migration and evacuation very easy.

Let's take a look at how Nova, Cinder, and Glance integrate with Ceph.

Configuration

Create pools

By default Ceph's block device service uses the pool named rbd; it is recommended to create a dedicated pool for each service instead, for example:

ceph osd pool create volumes
ceph osd pool create images
ceph osd pool create backups
ceph osd pool create vms

For more options, such as configuring the number of placement groups, refer to the official pool documentation.
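
If you want to control the number of placement groups at creation time, the create command also takes pg_num and pgp_num arguments; 128 below is only a placeholder and should be sized according to your OSD count:

ceph osd pool create {pool-name} {pg-num} {pgp-num}
# e.g. ceph osd pool create vms 128 128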

A newly created pool must be initialized before it can be used (you do not necessarily need all four pools below; follow your actual production deployment):

rbd pool init volumes
rbd pool init images
rbd pool init backups
rbd pool init vms

To see which pools exist in the cluster, run ceph osd lspools on a Ceph admin node:

[root@hb02-other-10e114e194e61 ~]# ll /etc/ceph/
total 12
-rw------- 1 ceph ceph  71 Mar 15  2019 ceph.client.admin.keyring
-rw-r--r-- 1 root root 722 Mar 15  2019 ceph.conf
-rw-r--r-- 1 root root  92 Jan 31  2019 rbdmap
[root@hb02-other-10e114e194e61 ~]# ceph osd lspools
1 images,2 backups,3 volumes,4 .rgw.root,5 default.rgw.control,6 default.rgw.meta,7 default.rgw.log,8 default.rgw.buckets.index,9 default.rgw.buckets.data,

Configure OpenStack Ceph clients

The nodes running glance-api, cinder-volume, cinder-backup, and nova-compute can all be regarded as Ceph clients, so each of them needs the ceph.conf configuration file:

ssh {your-openstack-server} sudo tee /etc/ceph/ceph.conf </etc/ceph/ceph.conf

For example, the ceph.conf of one compute node in our environment:

[global]
fsid = a7849998-270b-40d0-93e8-6d1106a5b799
public_network = ****
cluster_network = ****
mon_initial_members =****
mon_host = ****
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

rbd_cache = false
# BEGIN ANSIBLE MANAGED BLOCK
[client]
rbd cache = false
rbd cache writethrough until flush = false
cache size =  67108864
rbd cache max dirty = 0
rbd cache max dirty age = 0
admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
log file = /var/log/qemu/qemu-guest-$pid.log
rbd concurrent management ops = 20
# END ANSIBLE MANAGED BLOCK

Note that we have not enabled the RBD cache.

Install the required packages, namely the Python bindings and the command-line tools:

sudo yum install python-rbd
sudo yum install ceph-common

Set up client authentication:

ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images' mgr 'profile rbd pool=images'
ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd-read-only pool=images' mgr 'profile rbd pool=volumes'
ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups' mgr 'profile rbd pool=backups'

Copy the keyring files of client.glance, client.cinder, and client.cinder-backup to the appropriate nodes:

# ceph auth get-or-create client.glance | ssh {your-glance-api-server} sudo tee /etc/ceph/ceph.client.glance.keyring
# ssh {your-glance-api-server} sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
# ceph auth get-or-create client.cinder | ssh {your-volume-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
# ssh {your-cinder-volume-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
# ceph auth get-or-create client.cinder-backup | ssh {your-cinder-backup-server} sudo tee /etc/ceph/ceph.client.cinder-backup.keyring
# ssh {your-cinder-backup-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder-backup.keyring

Copy the cinder keyring to the nova-compute nodes as well:

ceph auth get-or-create client.cinder | ssh {your-nova-compute-server} sudo tee /etc/ceph/ceph.client.cinder.keyring

The key of the client.cinder user also has to be stored in libvirt, because the libvirt process needs it to access the block devices served by Cinder:

ceph auth get-key client.cinder | ssh {your-compute-node} tee client.cinder.key
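
The upstream guide then registers this key as a libvirt secret on the compute node, roughly as follows. The UUID below is only an example produced by uuidgen; it is the value that rbd_secret_uuid in cinder.conf (and, if set, in nova.conf's [libvirt] section) must reference:

uuidgen
457eb676-33da-42ec-9a8c-9293d545c337

cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
  <usage type='ceph'>
    <name>client.cinder secret</name>
  </usage>
</secret>
EOF
sudo virsh secret-define --file secret.xml
sudo virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 --base64 $(cat client.cinder.key)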

To integrate Ceph with Nova, the following options then need to be configured:

# /etc/nova/nova.conf

[libvirt]
images_type = rbd
images_rbd_pool = volumes
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
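
The article only shows the Nova side. For completeness, here is a minimal sketch of the matching Glance and Cinder options, following the upstream Ceph-with-OpenStack guide; the pool and user names are the ones created above, the secret UUID is the one registered with virsh, and everything should be adjusted to your actual deployment:

# /etc/glance/glance-api.conf
[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_chunk_size = 8

# /etc/cinder/cinder.conf
[DEFAULT]
enabled_backends = ceph

[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5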

Code

The sections above only showed how to integrate Ceph with OpenStack at the configuration level; now let's look at the code and how Nova interacts with Ceph.

First of all, OpenStack performs these operations through the Python module of librbd.

Connect to RADOS and open an IO context:

import rados
import rbd

cluster = rados.Rados(rados_id='cinder', conffile='/etc/ceph/ceph.conf')
cluster.conf_set('key','AQBkfUldodI2NBAAJcf7VnwebGWc1YNH0Njisg==')
cluster.connect()
ioctx = cluster.open_ioctx('volumes')

Instantiate an RBD object and create an image:

rbd_inst = rbd.RBD()
size = 1 * 1024**3  # 1 GiB
rbd_inst.create(ioctx, 'f-test-librbd', size)

To perform I/O on the image, for example writing 600 bytes, note that the data must be a byte string (str in Python 2, bytes in Python 3), not unicode:

image = rbd.Image(ioctx, 'f-test-librbd')
data = 'foo' * 200
image.write(data, 0)

Finally, close the image, the IO context, and the RADOS connection:

image.close()
ioctx.close()
cluster.shutdown()

The complete script:

import rados
import rbd

cluster = rados.Rados(rados_id='cinder', conffile='/etc/ceph/ceph.conf')
cluster.conf_set('key','AQBkfUldodI2NBAAJcf7VnwebGWc1YNH0Njisg==')
cluster.connect()
ioctx = cluster.open_ioctx('volumes')

rbd_inst = rbd.RBD()
size = 1 * 1024**3  # 1 GiB
rbd_inst.create(ioctx, 'f-test-librbd', size)

image = rbd.Image(ioctx, 'f-test-librbd')
data = 'foo' * 200
image.write(data, 0)

image.close()
ioctx.close()
cluster.shutdown()

After running it, check the image with the rbd command:

[root@gd02-compute-11e115e64e11 fan]# rbd --id cinder -p volumes info f-test-librbd
2019-12-05 20:03:10.064 7fdcbbfa8b00 -1 asok(0x5633c89f3290) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.cinder.4171166.94780409205944.asok': (2) No such file or directory
rbd image 'f-test-librbd':
	size 1 GiB in 256 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: e76d524885b8e9
	block_name_prefix: rbd_data.e76d524885b8e9
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features:
	flags:
	create_timestamp: Thu Dec  5 19:59:46 2019
	access_timestamp: Thu Dec  5 19:59:46 2019
	modify_timestamp: Thu Dec  5 19:59:46 2019

To be safe, each close call can be wrapped in a finally block:

cluster = rados.Rados(conffile='my_ceph_conf')
try:
    cluster.connect()
    ioctx = cluster.open_ioctx('my_pool')
    try:
        rbd_inst = rbd.RBD()
        size = 4 * 1024**3  # 4 GiB
        rbd_inst.create(ioctx, 'myimage', size)
        image = rbd.Image(ioctx, 'myimage')
        try:
            data = 'foo' * 200
            image.write(data, 0)
        finally:
            image.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

In addition, the Rados, Ioctx, and Image classes can be used as context managers, which close themselves automatically, for example:

with rados.Rados(conffile='my_ceph.conf') as cluster:
    with cluster.open_ioctx('mypool') as ioctx:
        rbd_inst = rbd.RBD()
        size = 4 * 1024**3  # 4 GiB
        rbd_inst.create(ioctx, 'myimage', size)
        with rbd.Image(ioctx, 'myimage') as image:
            data = 'foo' * 200
            image.write(data, 0)

API Reference

Nova

In Nova, the file nova/nova/virt/libvirt/storage/rbd_utils.py wraps a number of common rbd helpers, such as RBDDriver and the RADOSClient used to connect to Ceph:

class RbdProxy(object):
    """A wrapper around rbd.RBD class instance to avoid blocking of process.

    Offloads all calls to rbd.RBD class methods to native OS threads, so that
    we do not block the whole process while executing the librbd code.

    """

    def __init__(self):
        self._rbd = tpool.Proxy(rbd.RBD())

    def __getattr__(self, attr):
        return getattr(self._rbd, attr)


class RBDVolumeProxy(object):
    """Context manager for dealing with an existing rbd volume.

    This handles connecting to rados and opening an ioctx automatically, and
    otherwise acts like a librbd Image object.

    The underlying librados client and ioctx can be accessed as the attributes
    'client' and 'ioctx'.
    """
    def __init__(self, driver, name, pool=None, snapshot=None,
                 read_only=False):
        client, ioctx = driver._connect_to_rados(pool)
        try:
            self.volume = tpool.Proxy(rbd.Image(ioctx, name,
                                                snapshot=snapshot,
                                                read_only=read_only))
        except rbd.ImageNotFound:
            with excutils.save_and_reraise_exception():
                LOG.debug("rbd image %s does not exist", name)
                driver._disconnect_from_rados(client, ioctx)
        except rbd.Error:
            with excutils.save_and_reraise_exception():
                LOG.exception(_("error opening rbd image %s"), name)
                driver._disconnect_from_rados(client, ioctx)

        self.driver = driver
        self.client = client
        self.ioctx = ioctx

    def __enter__(self):
        return self

    def __exit__(self, type_, value, traceback):
        try:
            self.volume.close()
        finally:
            self.driver._disconnect_from_rados(self.client, self.ioctx)

    def __getattr__(self, attrib):
        return getattr(self.volume, attrib)


class RBDDriver(object):

    def __init__(self, pool, ceph_conf, rbd_user, rbd_key=None):
        self.pool = pool
        # NOTE(angdraug): rados.Rados fails to connect if ceph_conf is None:
        # https://github.com/ceph/ceph/pull/1787
        self.ceph_conf = ceph_conf or ''
        self.rbd_user = rbd_user or None
        self.rbd_key = rbd_key or None
        if rbd is None:
            raise RuntimeError(_('rbd python libraries not found'))

    # Connect to RADOS and return the client and ioctx
    def _connect_to_rados(self, pool=None):
        client = rados.Rados(rados_id=self.rbd_user,
                                  conffile=self.ceph_conf)
        if self.rbd_key:
            client.conf_set('key', self.rbd_key)
        try:
            client.connect()
            pool_to_open = pool or self.pool
            # NOTE(luogangyi): open_ioctx >= 10.1.0 could handle unicode
            # arguments perfectly as part of Python 3 support.
            # Therefore, when we turn to Python 3, it's safe to remove
            # str() conversion.
            ioctx = client.open_ioctx(str(pool_to_open))
            return client, ioctx
        except rados.Error:
            # shutdown cannot raise an exception
            client.shutdown()
            raise
    #...

class RADOSClient(object):
    """Context manager to simplify error handling for connecting to ceph."""
    def __init__(self, driver, pool=None):
        self.driver = driver
        self.cluster, self.ioctx = driver._connect_to_rados(pool)

    def __enter__(self):
        return self

    def __exit__(self, type_, value, traceback):
        self.driver._disconnect_from_rados(self.cluster, self.ioctx)

    @property
    def features(self):
        features = self.cluster.conf_get('rbd_default_features')
        if ((features is None) or (int(features) == 0)):
            features = rbd.RBD_FEATURE_LAYERING
        return int(features)

It also wraps rbd.RBD() and rbd.Image() objects to perform volume and image operations, for example:

# RbdProxy wraps the rbd.RBD() object for volume operations. nova.virt.libvirt.storage.rbd_utils.RBDDriver#cleanup_volumes
    def cleanup_volumes(self, filter_fn):
        with RADOSClient(self, self.pool) as client:
            volumes = RbdProxy().list(client.ioctx)
            for volume in filter(filter_fn, volumes):
                self._destroy_volume(client, volume)

# RBDVolumeProxy actually wraps an rbd.Image() object to create snapshots. nova.virt.libvirt.storage.rbd_utils.RBDDriver#create_snap
    def create_snap(self, volume, name, pool=None, protect=False):
        """Create a snapshot of an RBD volume.

        :volume: Name of RBD object
        :name: Name of snapshot
        :pool: Name of pool
        :protect: Set the snapshot to "protected"
        """
        LOG.debug('creating snapshot(%(snap)s) on rbd image(%(img)s)',
                  {'snap': name, 'img': volume})
        with RBDVolumeProxy(self, str(volume), pool=pool) as vol:
            vol.create_snap(name)
            if protect and not vol.is_protected_snap(name):
                vol.protect_snap(name)

Reference

  1. Ceph official documentation
  2. 理解Ceph (Understanding Ceph)
  3. Ceph from scratch
