A Workaround for Restoring CSI Mount Information

Problem Description

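I previously built a CSI plugin for Huawei OBS; its basic mechanism is shown in the figure below. The CSI plugin Pod mounts the host's /var/lib/kubelet/pods directory. When a workload Pod that mounts a PVC is created, the CSI plugin starts an s3fs process, which connects to the remote S3 service and mounts the bucket (that is, the PVC) under the corresponding Pod directory in /var/lib/kubelet/pods (typically /var/lib/kubelet/pods//volumes/kubernetes.io~csi//mount). kubelet then mounts that directory into the workload Pod.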

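The plugin has one problem: when the CSI plugin restarts, it loses the internal s3fs processes that hold the connections to the S3 service. The mounted directory inside the workload Pod therefore stops working, and accessing /var/lib/kubelet/pods//volumes/kubernetes.io~csi//mount fails with "Transport endpoint is not connected". To make the workload Pod usable again you have to restart it, which is not an elegant fix.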
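The failure described above can be detected programmatically, because "Transport endpoint is not connected" is the message for the ENOTCONN errno. The following is a minimal sketch, not the plugin's actual code; the function name `is_stale_fuse_mount` and its injectable `stat` parameter are assumptions made for illustration.

```python
import errno
import os


def is_stale_fuse_mount(path, stat=os.stat):
    """Return True if accessing `path` fails with ENOTCONN.

    ENOTCONN is the errno behind "Transport endpoint is not connected",
    which a FUSE mount point reports once its userspace process
    (s3fs in this article) has gone away.
    """
    try:
        stat(path)
    except OSError as exc:
        # Any other error (missing path, permissions) is not a stale mount.
        return exc.errno == errno.ENOTCONN
    return False
```

A recovery routine could run this check on each candidate mount path and only remount the ones that report a stale endpoint.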

Approach

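To fix the "Transport endpoint is not connected" problem, the s3fs process must first be restored. Restoring it depends on several pieces of data: the PVC name, the Pod UID, the address of the S3 service, and the AK/SK used to access it. There are two ways to make this data available:

1. While the CSI plugin is running normally, save the metadata to the S3 service, in effect using the S3 service as a metadata store. With this approach the metadata may be deleted by mistake, or may drift out of sync with the actual state of the system.
2. After the CSI plugin starts, use client-go to fetch the relevant data from the cluster dynamically.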

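I chose the second approach. The steps are:

1. List the PVCs in all namespaces (allPvcs).
2. From allPvcs, select the PVCs whose metadata.annotations entry volume.beta.kubernetes.io/storage-provisioner matches the provisioner of the target storageclass (targetPvcs).
3. Find the Pods that mount those targetPvcs (targetPods).
4. Read the uid of each of the targetPods (targetUid).
5. Build the mount path /var/lib/kubelet/pods/<targetUid>/volumes/kubernetes.io~csi/<targetPvc-name>/mount.
6. Read spec.storageClassName from the targetPvcs to find the storageclass responsible for each PVC (targetStorageclass).
7. Read the relevant settings from the parameters of targetStorageclass, most importantly the secret that stores the AK/SK for the S3 service (targetSecret).
8. Read the AK/SK from targetSecret.
9. Perform the mount.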
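The selection and path-building steps above can be sketched as pure functions over objects shaped like the JSON the Kubernetes API returns. This is an illustrative sketch rather than the plugin's implementation (which uses client-go); the function names are hypothetical, and the path uses the PVC name as the article does. On a real cluster it is worth checking whether kubelet names that directory after the PVC or after the bound PV.

```python
PROVISIONER_ANNOTATION = "volume.beta.kubernetes.io/storage-provisioner"
KUBELET_PODS_DIR = "/var/lib/kubelet/pods"


def select_target_pvcs(all_pvcs, provisioner):
    """Keep the PVCs whose provisioner annotation names our CSI driver."""
    return [
        pvc for pvc in all_pvcs
        if (pvc["metadata"].get("annotations") or {}).get(PROVISIONER_ANNOTATION)
        == provisioner
    ]


def mount_targets(target_pvcs, all_pods):
    """Yield (pod_uid, pvc_name, mount_path) for each Pod volume backed by a target PVC."""
    wanted = {
        (pvc["metadata"]["namespace"], pvc["metadata"]["name"])
        for pvc in target_pvcs
    }
    for pod in all_pods:
        meta = pod["metadata"]
        for volume in pod["spec"].get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            # A PVC can only be mounted by Pods in its own namespace.
            if claim and (meta["namespace"], claim["claimName"]) in wanted:
                path = (
                    f"{KUBELET_PODS_DIR}/{meta['uid']}"
                    f"/volumes/kubernetes.io~csi/{claim['claimName']}/mount"
                )
                yield meta["uid"], claim["claimName"], path
```

Keeping these steps free of API calls makes them easy to test against canned data before wiring them to a real client.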

The main purpose of these steps is to work out the mount path and the access information for the S3 service.
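For the credential half of that lookup, a hedged sketch follows. The article does not say which StorageClass parameter keys or Secret fields the plugin uses, so `secretName`, `secretNamespace`, `accessKey` and `secretKey` below are purely hypothetical defaults. What is certain is that values under a Secret's `data` field are base64-encoded and must be decoded before use.

```python
import base64


def secret_ref(storage_class, name_key="secretName", namespace_key="secretNamespace"):
    """Return (namespace, name) of the Secret referenced by a StorageClass.

    The parameter key names are assumptions; adjust them to the driver in use.
    """
    params = storage_class.get("parameters") or {}
    return params[namespace_key], params[name_key]


def decode_credentials(secret, ak_field="accessKey", sk_field="secretKey"):
    """Decode the AK/SK pair from a Secret's base64-encoded `data` map."""
    data = secret["data"]
    access_key = base64.b64decode(data[ak_field]).decode("utf-8")
    secret_key = base64.b64decode(data[sk_field]).decode("utf-8")
    return access_key, secret_key
```

The decoded pair can then be handed to whatever starts s3fs, for example by writing it to a credentials file with restrictive permissions.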

Implementation

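After the code was finished, testing showed that /var/lib/kubelet/pods/<targetUid>/volumes/kubernetes.io~csi/<targetPvc-name>/mount was mounted successfully: entering that directory on the host showed the contents of the bucket. Inside the workload container, however, the directory was still not mounted.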

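One point worth noting: the mount broke because the CSI plugin was restarted abnormally, so the standard unmounting flow (a call to the NodeUnpublishVolume method) never ran. The stale mount point must therefore be unmounted with umount before it is mounted again.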
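The unmount-then-mount order can be sketched as below. This is an assumption-laden illustration: the article does not list the s3fs options the plugin passes, so the sketch uses only `url` and `passwd_file`, two standard s3fs options, and the function names and the injectable `run` parameter are hypothetical.

```python
import subprocess


def remount_commands(bucket, mount_path, endpoint, passwd_file):
    """Build the two commands: clear the stale mount, then start s3fs again."""
    return [
        ["umount", mount_path],
        ["s3fs", bucket, mount_path,
         "-o", f"url={endpoint}",
         "-o", f"passwd_file={passwd_file}"],
    ]


def remount(bucket, mount_path, endpoint, passwd_file, run=subprocess.run):
    """Unmount the stale endpoint, then mount the bucket at the same path."""
    umount_cmd, mount_cmd = remount_commands(bucket, mount_path, endpoint, passwd_file)
    # umount may fail if nothing is mounted there any more; that is not fatal.
    run(umount_cmd, check=False)
    # A failed s3fs start should surface as an exception.
    run(mount_cmd, check=True)
```

Separating command construction from execution keeps the ordering testable without touching real mount points.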

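The workload container was not remounted because the recovery flow never triggers kubelet to run umount/mount and mount the PVC into the workload container again. In principle the fix is the same as for the plugin-side path /var/lib/kubelet/pods/<targetUid>/volumes/kubernetes.io~csi/<targetPvc-name>/mount: run umount, then mount again. Doing so, however, requires knowing the host path to which the Pod's mount is mapped, and that is awkward because the path depends on the CRI in use. Going down this road is difficult. As the CSI Volume Plugins in Kubernetes Design Doc describes, under normal circumstances this work is done by kubelet: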

The volume manager component of kubelet, notices a mounted CSI volume, referenced by a pod that has been deleted or terminated, so it calls the in-tree CSI volume plugin’s UnmountDevice method which is a no-op and returns immediately.

Next kubelet calls the in-tree CSI volume plugin’s unmount (teardown) method, which causes the in-tree volume plugin to issue a NodeUnpublishVolume call via the registered unix domain socket to the local CSI driver. If this call fails from any reason, kubelet re-tries the call periodically.

Upon successful completion of the NodeUnpublishVolume call the specified path is unmounted from the pod container.

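So how can the container be made to mount the volume again?

It is enough to trigger kubelet's mount action again, and a livenessProbe can do that. When the mount is broken, the probe command fails; once it has failed failureThreshold times in a row, kubelet restarts the container and sets up its volume mounts again. The configuration is as follows: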

apiVersion: v1
kind: Pod
metadata:
  name: csi-s3-test-nginx
  namespace: default
spec:
  containers:
   - name: csi-s3-test-nginx
     image: nginx
     livenessProbe:
       failureThreshold: 3
       initialDelaySeconds: 20
       periodSeconds: 5
       timeoutSeconds: 5
       exec:
         command:
         - ls
         - /var/lib/www/html
     volumeMounts:
       - mountPath: /var/lib/www/html
         name: webroot
       - mountPath: /var/lib/www/html2
         name: webroot2
  volumes:
   - name: webroot
     persistentVolumeClaim:
       claimName: csi-s3-pvc
       readOnly: false
   - name: webroot2
     persistentVolumeClaim:
       claimName: csi-s3-pvc2
       readOnly: false

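With this approach, is it still necessary to restore the s3fs process as described earlier? Yes. The restart only triggers kubelet's action; it does not make the CSI plugin mount the bucket again. Restoring the s3fs process and configuring the livenessProbe are therefore both required.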
