external-resizer source code analysis / PVC expansion analysis

Posted by 良凱爾 on 2021-07-18

Based on tag v0.5.0

https://github.com/kubernetes-csi/external-resizer/releases/tag/v0.5.0

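Volume expansion process

Volume expansion takes two steps, one on the controller side and one on the node side. The controller-side expansion (triggered by external-resizer) runs first, followed by the node-side expansion (triggered by kubelet); only then is the expansion complete. When volumeMode is Block, the node-side expansion is not needed.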

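What controller-side expansion does

It expands the underlying storage. For ceph rbd, for example, the rbd image in the ceph cluster is resized.

What node-side expansion does

It performs the corresponding operations on the node where the pod runs, so that the node sees the new size. For a ceph rbd volume with a file system, for example, the file system resize command is run on the node.

Some storage types, such as cephfs, need no node-side expansion.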

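Volume expansion at a glance

(1) Change pvc.Spec.Resources.Requests.storage to trigger the expansion.

(2) Controller-side expansion: external-resizer watches PVC objects. When it finds that pvc.Spec.Resources.Requests.storage is larger than pvc.Status.Capacity.storage, it calls the csi plugin's ControllerExpandVolume method to expand the underlying storage, then updates pv.Spec.Capacity.storage.

(3) Node-side expansion: kubelet finds that pv.Spec.Capacity.storage is larger than pvc.Status.Capacity.storage, so it calls the csi node-side expansion to resize the file system on the node. On success, kubelet updates pvc.Status.Capacity.storage.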
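The three steps above can be sketched as a small simulation. This is only an illustration: the volume type and both functions are hypothetical stand-ins that track sizes in bytes, and they do not call any Kubernetes or CSI API.

```go
package main

import "fmt"

// volume is a simplified, hypothetical stand-in for a bound PVC/PV pair.
// Sizes are in bytes; the comments name the Kubernetes field each one mirrors.
type volume struct {
	pvcRequest  int64 // pvc.Spec.Resources.Requests.storage
	pvCapacity  int64 // pv.Spec.Capacity.storage
	pvcCapacity int64 // pvc.Status.Capacity.storage
}

// controllerExpand models step (2): the backend is grown and the new size is
// recorded on the PV. It reports whether any work was done.
func controllerExpand(v *volume) bool {
	if v.pvcRequest <= v.pvcCapacity || v.pvCapacity >= v.pvcRequest {
		return false
	}
	v.pvCapacity = v.pvcRequest
	return true
}

// nodeExpand models step (3): the file system is grown and the new size is
// recorded in the PVC status. It reports whether any work was done.
func nodeExpand(v *volume) bool {
	if v.pvCapacity <= v.pvcCapacity {
		return false
	}
	v.pvcCapacity = v.pvCapacity
	return true
}

func main() {
	const gi = int64(1) << 30
	v := &volume{pvcRequest: gi, pvCapacity: gi, pvcCapacity: gi}

	v.pvcRequest = 2 * gi // step (1): the user edits the PVC
	fmt.Println("controller expanded:", controllerExpand(v))
	fmt.Println("node expanded:", nodeExpand(v))
	fmt.Printf("request=%d pv=%d status=%d (GiB)\n",
		v.pvcRequest/gi, v.pvCapacity/gi, v.pvcCapacity/gi)
}
```

Running both functions a second time does nothing, which mirrors how each side only acts while the sizes it compares still differ.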

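Volume expansion in detail

The detailed process is analyzed below, using ceph rbd as the example.

(1) Edit the PVC object and change the requested storage size (pvc.spec.resources.requests.storage).

(2) After the change, external-resizer receives the update event for the PVC, finds that pvc.Spec.Resources.Requests.storage is larger than pvc.Status.Capacity.storage, and calls the ceph-csi component to perform the controller-side expansion.

(3) ceph-csi calls the ceph cluster to expand the underlying storage.

(4) Once the underlying storage has been expanded, external-resizer sets .Spec.Capacity.storage of the PV object to the new size (see util.UpdatePVCapacity below).

(5) During reconcile(), kubelet's volume manager finds that pv.Spec.Capacity.storage is larger than pvc.Status.Capacity.storage and calls ceph-csi to perform the node-side expansion.

(6) ceph-csi resizes the file system of the volume on the node.

(7) When that finishes, kubelet sets pvc.Status.Capacity.storage to the new size.

This article analyzes the controller-side expansion. The node-side expansion was covered earlier; see the article on the kubelet PVC expansion code.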

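Controller-side expansion

When pvc.Spec.Resources.Requests.storage is larger than pvc.Status.Capacity.storage, the controller-side (external-resizer) expansion logic is triggered.

The controller-side (external-resizer) expansion consists of:
(1) calling the csi plugin's ControllerExpandVolume method to expand the storage;
(2) setting .spec.capacity.storage of the PV object to the new size;
(3) when the driver reports that a node-side resize is required, updating .Status.Conditions of the PVC object with a condition of type "FileSystemResizePending" and status "True", which indicates that the controller-side expansion is done and that kubelet will carry out the node-side expansion next.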

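Source code analysis

Run

Main logic: start as many goroutines as there are workers. Each one runs ctrl.syncPVCs in a loop to handle PVC change events, pick out the PVCs that need expanding, and trigger the expansion.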

// pkg/controller/controller.go

// Run starts the controller.
func (ctrl *resizeController) Run(
	workers int, ctx context.Context) {
	defer ctrl.claimQueue.ShutDown()

	klog.Infof("Starting external resizer %s", ctrl.name)
	defer klog.Infof("Shutting down external resizer %s", ctrl.name)

	stopCh := ctx.Done()

	if !cache.WaitForCacheSync(stopCh, ctrl.pvSynced, ctrl.pvcSynced) {
		klog.Errorf("Cannot sync pv/pvc caches")
		return
	}

	for i := 0; i < workers; i++ {
		go wait.Until(ctrl.syncPVCs, 0, stopCh)
	}

	<-stopCh
}

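1. syncPVCs

Main logic: take a PVC key off the queue and call ctrl.syncPVC. If the call fails, the key is put back on the queue with rate limiting so that it is retried later; otherwise its retry history is forgotten.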

// syncPVCs is the main worker.
func (ctrl *resizeController) syncPVCs() {
	key, quit := ctrl.claimQueue.Get()
	if quit {
		return
	}
	defer ctrl.claimQueue.Done(key)

	if err := ctrl.syncPVC(key.(string)); err != nil {
		// Put PVC back to the queue so that we can retry later.
		ctrl.claimQueue.AddRateLimited(key)
	} else {
		ctrl.claimQueue.Forget(key)
	}
}

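1.1 syncPVC

Decides whether an expansion is needed and performs it.

Main logic:
(1) get the PVC object;
(2) call ctrl.pvcNeedResize to decide, from the PVC alone, whether an expansion is needed;
(3) get the PV object;
(4) call ctrl.pvNeedResize to decide, by comparing the PVC with the PV, whether an expansion is needed;
(5) if so, call ctrl.resizePVC to perform the expansion.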

// syncPVC checks if a pvc requests resizing, and execute the resize operation if requested.
func (ctrl *resizeController) syncPVC(key string) error {
	klog.V(4).Infof("Started PVC processing %q", key)

	namespace, name, err := cache.SplitMetaNamespaceKey(key)
	if err != nil {
		klog.Errorf("Split meta namespace key of pvc %s failed: %v", key, err)
		return err
	}

	pvc, err := ctrl.pvcLister.PersistentVolumeClaims(namespace).Get(name)
	if err != nil {
		if k8serrors.IsNotFound(err) {
			klog.V(3).Infof("PVC %s/%s is deleted, no need to process it", namespace, name)
			return nil
		}
		klog.Errorf("Get PVC %s/%s failed: %v", namespace, name, err)
		return err
	}

	if !ctrl.pvcNeedResize(pvc) {
		klog.V(4).Infof("No need to resize PVC %q", util.PVCKey(pvc))
		return nil
	}

	pv, err := ctrl.pvLister.Get(pvc.Spec.VolumeName)
	if err != nil {
		if k8serrors.IsNotFound(err) {
			klog.V(3).Infof("PV %s is deleted, no need to process it", pvc.Spec.VolumeName)
			return nil
		}
		klog.Errorf("Get PV %q of pvc %q failed: %v", pvc.Spec.VolumeName, util.PVCKey(pvc), err)
		return err
	}

	if !ctrl.pvNeedResize(pvc, pv) {
		klog.V(4).Infof("No need to resize PV %q", pv.Name)
		return nil
	}

	return ctrl.resizePVC(pvc, pv)
}

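The pvcNeedResize and pvNeedResize methods are analyzed first.

pvcNeedResize

Returns true, meaning the PVC qualifies for expansion, when pvc.Status.Phase is Bound, pvc.Spec.VolumeName is set, and pvc.Spec.Resources.Requests.storage is larger than pvc.Status.Capacity.storage.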

// pvcNeedResize returns true if a pvc requests a resize operation.
func (ctrl *resizeController) pvcNeedResize(pvc *v1.PersistentVolumeClaim) bool {
	// Only Bound pvc can be expanded.
	if pvc.Status.Phase != v1.ClaimBound {
		return false
	}
	if pvc.Spec.VolumeName == "" {
		return false
	}
	actualSize := pvc.Status.Capacity[v1.ResourceStorage]
	requestSize := pvc.Spec.Resources.Requests[v1.ResourceStorage]
	return requestSize.Cmp(actualSize) > 0
}
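
pvNeedResize

The method first returns false if the resizer does not support the PV or if the PV is not bound to this PVC. It then compares pv.Spec.Capacity.storage with pvc.Spec.Resources.Requests.storage. If the PV capacity is greater than or equal to the request and the PVC's .Status.Conditions contains a condition of type "FileSystemResizePending" with status "True", the controller-side expansion is already done and the method returns false. If the PV capacity is greater than or equal to the request but that condition is absent, the method still returns true, so that the resize call is repeated and the controller learns whether a file system resize is required. If the PV capacity is smaller than the request, the controller-side expansion has not been done yet and the method returns true.

Expansion has a controller side and a node side: the controller-side expansion (triggered by external-resizer) runs first, then the node-side expansion (triggered by kubelet), and only then is the expansion complete.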

// pvNeedResize returns true if a pv supports and also requests resize.
func (ctrl *resizeController) pvNeedResize(pvc *v1.PersistentVolumeClaim, pv *v1.PersistentVolume) bool {
	if !ctrl.resizer.CanSupport(pv, pvc) {
		klog.V(4).Infof("Resizer %q doesn't support PV %q", ctrl.name, pv.Name)
		return false
	}

	if (pv.Spec.ClaimRef == nil) || (pvc.Namespace != pv.Spec.ClaimRef.Namespace) || (pvc.UID != pv.Spec.ClaimRef.UID) {
		klog.V(4).Infof("persistent volume is not bound to PVC being updated: %s", util.PVCKey(pvc))
		return false
	}

	pvSize := pv.Spec.Capacity[v1.ResourceStorage]
	requestSize := pvc.Spec.Resources.Requests[v1.ResourceStorage]
	if pvSize.Cmp(requestSize) >= 0 {
		// If PV size is equal or bigger than request size, that means we have already resized PV.
		// In this case we need to check PVC's condition.
		// 1. If PVC in PersistentVolumeClaimResizing condition, we should continue to perform the
		//    resizing operation as we need to know if file system resize if required. (What's more,
		//    we hope the driver can find that the actual size already matched the request size and do nothing).
		// 2. If PVC in PersistentVolumeClaimFileSystemResizePending condition, we need to
		//    do nothing as kubelet will finish file system resizing and mark resize as finished.
		if util.HasFileSystemResizePendingCondition(pvc) {
			// This is case 2.
			return false
		}
		// This is case 1.
		return true
	}

	// PV size is smaller than request size, we need to resize the volume.
	return true
}
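
The two checks combine into a small decision table. The sketch below flattens the fields they read into a hypothetical claim struct with sizes in bytes; it leaves out the CanSupport and ClaimRef checks and is not the controller's own code.

```go
package main

import "fmt"

// claim is a hypothetical, flattened view of the fields the two checks read.
type claim struct {
	bound           bool   // pvc.Status.Phase == Bound
	volumeName      string // pvc.Spec.VolumeName
	requestBytes    int64  // pvc.Spec.Resources.Requests.storage
	statusBytes     int64  // pvc.Status.Capacity.storage
	pvBytes         int64  // pv.Spec.Capacity.storage
	fsResizePending bool   // FileSystemResizePending condition is True
}

// pvcNeedResize mirrors the PVC-only check.
func pvcNeedResize(c claim) bool {
	return c.bound && c.volumeName != "" && c.requestBytes > c.statusBytes
}

// pvNeedResize mirrors the size and condition logic of the PV check.
func pvNeedResize(c claim) bool {
	if c.pvBytes >= c.requestBytes && c.fsResizePending {
		return false // controller side is done, kubelet takes over
	}
	return true
}

// needControllerResize is true when syncPVC would reach resizePVC.
func needControllerResize(c claim) bool {
	return pvcNeedResize(c) && pvNeedResize(c)
}

func main() {
	base := claim{bound: true, volumeName: "pv-1", requestBytes: 2, statusBytes: 1, pvBytes: 1}

	waiting := base
	waiting.pvBytes, waiting.fsResizePending = 2, true

	interrupted := base
	interrupted.pvBytes = 2 // PV already updated, condition not yet set

	fmt.Println("fresh request:      ", needControllerResize(base))
	fmt.Println("waiting for kubelet:", needControllerResize(waiting))
	fmt.Println("interrupted resize: ", needControllerResize(interrupted))
}
```

The third case corresponds to "case 1" in the comments of pvNeedResize: the PV already has the new size, but the controller repeats the resize call because the PVC has not yet been marked.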

util.HasFileSystemResizePendingCondition returns true once the controller-side expansion is complete. It checks pvc.Status.Conditions for a condition of type "FileSystemResizePending" with status "True".

const (
	// PersistentVolumeClaimFileSystemResizePending - controller resize is finished and a file system resize is pending on node
	PersistentVolumeClaimFileSystemResizePending PersistentVolumeClaimConditionType = "FileSystemResizePending"
)


// HasFileSystemResizePendingCondition returns true if a pvc has a FileSystemResizePending condition.
// This means the controller side resize operation is finished, and kubelet side operation is in progress.
func HasFileSystemResizePendingCondition(pvc *v1.PersistentVolumeClaim) bool {
	for _, condition := range pvc.Status.Conditions {
		if condition.Type == v1.PersistentVolumeClaimFileSystemResizePending && condition.Status == v1.ConditionTrue {
			return true
		}
	}
	return false
}

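1.1.1 resizePVC

This method drives the expansion.

Main logic:
(1) call ctrl.markPVCResizeInProgress to add a condition of type "Resizing" with status "True" to the PVC's .Status.Conditions, indicating that the PVC is being resized;
(2) call ctrl.resizeVolume to perform the expansion;
(3) when the expansion succeeds and a file system resize is required, call ctrl.markPVCAsFSResizeRequired to set a condition of type "FileSystemResizePending" with status "True" in the PVC's .Status.Conditions, indicating that the controller-side expansion is complete; when no file system resize is required, call ctrl.markPVCResizeFinished to mark the resize as finished.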

// resizePVC will:
// 1. Mark pvc as resizing.
// 2. Resize the volume and the pv object.
// 3. Mark pvc as resizing finished(no error, no need to resize fs), need resizing fs or resize failed.
func (ctrl *resizeController) resizePVC(pvc *v1.PersistentVolumeClaim, pv *v1.PersistentVolume) error {
	if updatedPVC, err := ctrl.markPVCResizeInProgress(pvc); err != nil {
		klog.Errorf("Mark pvc %q as resizing failed: %v", util.PVCKey(pvc), err)
		return err
	} else if updatedPVC != nil {
		pvc = updatedPVC
	}

	// Record an event to indicate that external resizer is resizing this volume.
	ctrl.eventRecorder.Event(pvc, v1.EventTypeNormal, util.VolumeResizing,
		fmt.Sprintf("External resizer is resizing volume %s", pv.Name))

	err := func() error {
		newSize, fsResizeRequired, err := ctrl.resizeVolume(pvc, pv)
		if err != nil {
			return err
		}

		if fsResizeRequired {
			// Resize volume succeeded and need to resize file system by kubelet, mark it as file system resizing required.
			return ctrl.markPVCAsFSResizeRequired(pvc)
		}
		// Resize volume succeeded and no need to resize file system by kubelet, mark it as resizing finished.
		return ctrl.markPVCResizeFinished(pvc, newSize)
	}()

	if err != nil {
		// Record an event to indicate that resize operation is failed.
		ctrl.eventRecorder.Eventf(pvc, v1.EventTypeWarning, util.VolumeResizeFailed, err.Error())
	}

	return err
}

resizeVolume

Main logic:
(1) call ctrl.resizer.Resize to expand the storage;
(2) call util.UpdatePVCapacity to update .spec.capacity.storage of the PV.

// resizeVolume resize the volume to request size, and update PV's capacity if succeeded.
func (ctrl *resizeController) resizeVolume(
	pvc *v1.PersistentVolumeClaim,
	pv *v1.PersistentVolume) (resource.Quantity, bool, error) {
	requestSize := pvc.Spec.Resources.Requests[v1.ResourceStorage]

	newSize, fsResizeRequired, err := ctrl.resizer.Resize(pv, requestSize)

	if err != nil {
		klog.Errorf("Resize volume %q by resizer %q failed: %v", pv.Name, ctrl.name, err)
		return newSize, fsResizeRequired, fmt.Errorf("resize volume %s failed: %v", pv.Name, err)
	}
	klog.V(4).Infof("Resize volume succeeded for volume %q, start to update PV's capacity", pv.Name)

	if err := util.UpdatePVCapacity(pv, newSize, ctrl.kubeClient); err != nil {
		klog.Errorf("Update capacity of PV %q to %s failed: %v", pv.Name, newSize.String(), err)
		return newSize, fsResizeRequired, err
	}
	klog.V(4).Infof("Update capacity of PV %q to %s succeeded", pv.Name, newSize.String())

	return newSize, fsResizeRequired, nil
}

ctrl.resizer.Resize: builds the request and calls r.client.Expand to expand the storage (which in turn calls the csi plugin's ControllerExpandVolume method).

// Resize resizes the persistence volume given request size
// It supports both CSI volume and migrated in-tree volume
func (r *csiResizer) Resize(pv *v1.PersistentVolume, requestSize resource.Quantity) (resource.Quantity, bool, error) {
	oldSize := pv.Spec.Capacity[v1.ResourceStorage]

	var volumeID string
	var source *v1.CSIPersistentVolumeSource
	var pvSpec v1.PersistentVolumeSpec
	if pv.Spec.CSI != nil {
		// handle CSI volume
		source = pv.Spec.CSI
		volumeID = source.VolumeHandle
		pvSpec = pv.Spec
	} else {
		if csitranslationlib.IsMigratedCSIDriverByName(r.name) {
			// handle migrated in-tree volume
			csiPV, err := csitranslationlib.TranslateInTreePVToCSI(pv)
			if err != nil {
				return oldSize, false, fmt.Errorf("failed to translate persistent volume: %v", err)
			}
			source = csiPV.Spec.CSI
			pvSpec = csiPV.Spec
			volumeID = source.VolumeHandle
		} else {
			// non-migrated in-tree volume
			return oldSize, false, fmt.Errorf("volume %v is not migrated to CSI", pv.Name)
		}
	}

	if len(volumeID) == 0 {
		return oldSize, false, errors.New("empty volume handle")
	}

	var secrets map[string]string
	secreRef := source.ControllerExpandSecretRef
	if secreRef != nil {
		var err error
		secrets, err = getCredentials(r.k8sClient, secreRef)
		if err != nil {
			return oldSize, false, err
		}
	}

	secrets[pvCephMountPathKey] = pv.Annotations[pvCephMountPathKey]

	capability, err := GetVolumeCapabilities(pvSpec)
	if err != nil {
		return oldSize, false, fmt.Errorf("failed to get capabilities of volume %s with %v", pv.Name, err)
	}

	ctx, cancel := timeoutCtx(r.timeout)
	defer cancel()
	newSizeBytes, nodeResizeRequired, err := r.client.Expand(ctx, volumeID, requestSize.Value(), secrets, capability)
	if err != nil {
		return oldSize, nodeResizeRequired, err
	}

	return *resource.NewQuantity(newSizeBytes, resource.BinarySI), nodeResizeRequired, err
}

// pkg/csi/client.go
func (c *client) Expand(
	ctx context.Context,
	volumeID string,
	requestBytes int64,
	secrets map[string]string,
	capability *csi.VolumeCapability) (int64, bool, error) {
	req := &csi.ControllerExpandVolumeRequest{
		Secrets:          secrets,
		VolumeId:         volumeID,
		CapacityRange:    &csi.CapacityRange{RequiredBytes: requestBytes},
		VolumeCapability: capability,
	}
	resp, err := c.ctrlClient.ControllerExpandVolume(ctx, req)
	if err != nil {
		return 0, false, err
	}
	return resp.CapacityBytes, resp.NodeExpansionRequired, nil
}

util.UpdatePVCapacity: sets .spec.capacity.storage of the PV object to the new size.

// UpdatePVCapacity updates PV capacity with requested size.
func UpdatePVCapacity(pv *v1.PersistentVolume, newCapacity resource.Quantity, kubeClient kubernetes.Interface) error {
	newPV := pv.DeepCopy()
	newPV.Spec.Capacity[v1.ResourceStorage] = newCapacity
	patchBytes, err := getPatchData(pv, newPV)
	if err != nil {
		return fmt.Errorf("can't update capacity of PV %s as generate path data failed: %v", pv.Name, err)
	}
	_, updateErr := kubeClient.CoreV1().PersistentVolumes().Patch(pv.Name, types.StrategicMergePatchType, patchBytes)
	if updateErr != nil {
		return fmt.Errorf("update capacity of PV %s failed: %v", pv.Name, updateErr)
	}
	return nil
}
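
To make the patch step concrete, here is a sketch of a patch body that changes only the PV capacity. getPatchData is not shown in the excerpt, so this does not reproduce it; capacityPatch is a hypothetical helper that builds the JSON by hand with the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// capacityPatch builds a JSON document that touches only
// spec.capacity.storage. newSize is a quantity string such as "2Gi".
func capacityPatch(newSize string) ([]byte, error) {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"capacity": map[string]string{"storage": newSize},
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := capacityPatch("2Gi")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // prints {"spec":{"capacity":{"storage":"2Gi"}}}
}
```

Sending a patch rather than the whole object means that unrelated PV fields changed by other components in the meantime are not overwritten.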

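This concludes the analysis of the expansion logic in external-resizer.

Summary

Volume expansion takes two steps, one on the controller side and one on the node side. The controller-side expansion (triggered by external-resizer) runs first, followed by the node-side expansion (triggered by kubelet); only then is the expansion complete. When volumeMode is Block, the node-side expansion is not needed.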

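What controller-side expansion does

It expands the underlying storage. For ceph rbd, for example, the rbd image in the ceph cluster is resized.

What node-side expansion does

It performs the corresponding operations on the node where the pod runs, so that the node sees the new size. For a ceph rbd volume with a file system, for example, the file system resize command is run on the node.

Some storage types, such as cephfs, need no node-side expansion.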

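Volume expansion at a glance

(1) Change pvc.Spec.Resources.Requests.storage to trigger the expansion.

(2) Controller-side expansion: external-resizer watches PVC objects. When it finds that pvc.Spec.Resources.Requests.storage is larger than pvc.Status.Capacity.storage, it calls the csi plugin's ControllerExpandVolume method to expand the underlying storage, then updates pv.Spec.Capacity.storage.

(3) Node-side expansion: kubelet finds that pv.Spec.Capacity.storage is larger than pvc.Status.Capacity.storage, so it calls the csi node-side expansion to resize the file system on the node. On success, kubelet updates pvc.Status.Capacity.storage.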

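Controller-side expansion process

The main controller-side (external-resizer) expansion operations are:
(1) calling the csi plugin's ControllerExpandVolume method to expand the storage;
(2) setting .spec.capacity.storage of the PV object to the new size;
(3) when the driver reports that a node-side resize is required, updating .Status.Conditions of the PVC object with a condition of type "FileSystemResizePending" and status "True", which indicates that the controller-side expansion is done and that kubelet will carry out the node-side expansion next.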

Node-side expansion

See the article on the kubelet PVC expansion code.
