Original article: https://fuckcloudnative.io/posts/how-to-back-up-all-of-your-grafana-dashboards/
The Grafana in our k8s cluster currently uses Ceph as its persistent storage. Once I delete and recreate the Grafana Deployment, all of the previous data is gone, because the rebuilt PV maps to a brand-new location on the backend storage. Fortunately, I really did go and recreate it on a whim without taking a backup first... "fortunately", my foot.
After spending 250 minutes rebuilding the dashboards, I couldn't calm down for a long while, and some choice words nearly slipped out.
1. The Low-Tech Solution
If this keeps up I'll really lose my mind, and I wasn't going to put up with it, so I immediately hit Google and dug into the various tricks for backing up Grafana. It turns out most backup solutions are shell scripts that call the Grafana API to export the various configurations, and most of those scripts are collected in a single gist.
I picked out a few of the more useful ones; feel free to choose others yourself.
Export script
#!/bin/bash
# Usage:
#
# export_grafana_dashboards.sh https://admin:REDACTED@grafana.dedevsecops.com
create_slug () {
    echo "$1" | iconv -t ascii//TRANSLIT | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+|-+$//g' | tr 'A-Z' 'a-z'
}
full_url=$1
username=$(echo "${full_url}" | cut -d/ -f 3 | cut -d: -f 1)
base_url=$(echo "${full_url}" | cut -d@ -f 2)
folder=$(create_slug "${username}-${base_url}")
mkdir "${folder}"
for db_uid in $(curl -s "${full_url}/api/search" | jq -r .[].uid); do
    db_json=$(curl -s "${full_url}/api/dashboards/uid/${db_uid}")
    db_slug=$(echo "${db_json}" | jq -r .meta.slug)
    db_title=$(echo "${db_json}" | jq -r .dashboard.title)
    filename="${folder}/${db_slug}.json"
    echo "Exporting \"${db_title}\" to \"${filename}\"..."
    echo "${db_json}" | jq -r . > "${filename}"
done
echo "Done"
This script is pretty basic: it simply dumps every dashboard's JSON and keeps no folder information, so if you restore Grafana from the configs it exports, every dashboard lands in Grafana's General folder. Not ideal.
Import script
grafana-dashboard-importer.sh
#!/bin/bash
#
# add the "-x" option to the shebang line if you want a more verbose output
#
#
OPTSPEC=":hp:t:k:"
show_help() {
cat << EOF
Usage: $0 [-p PATH] [-t TARGET_HOST] [-k API_KEY]
Script to import dashboards into Grafana
-p Required. Root path containing JSON exports of the dashboards you want imported.
-t Required. The full URL of the target host
-k Required. The API key to use on the target host
-h Display this help and exit.
EOF
}
###### Check script invocation options ######
while getopts "$OPTSPEC" optchar; do
    case "$optchar" in
        h)
            show_help
            exit
            ;;
        p)
            DASH_DIR="$OPTARG";;
        t)
            HOST="$OPTARG";;
        k)
            KEY="$OPTARG";;
        \?)
            echo "Invalid option: -$OPTARG" >&2
            exit 1
            ;;
        :)
            echo "Option -$OPTARG requires an argument." >&2
            exit 1
            ;;
    esac
done
if [ -z "$DASH_DIR" ] || [ -z "$HOST" ] || [ -z "$KEY" ]; then
    show_help
    exit 1
fi
# set some colors for status OK, FAIL and titles
SETCOLOR_SUCCESS="echo -en \\033[0;32m"
SETCOLOR_FAILURE="echo -en \\033[1;31m"
SETCOLOR_NORMAL="echo -en \\033[0;39m"
SETCOLOR_TITLE_PURPLE="echo -en \\033[0;35m" # purple
# usage log "string to log" "color option"
function log_success() {
    if [ $# -lt 1 ]; then
        ${SETCOLOR_FAILURE}
        echo "Not enough arguments for log function! Expecting 1 argument got $#"
        exit 1
    fi
    timestamp=$(date "+%Y-%m-%d %H:%M:%S %Z")
    ${SETCOLOR_SUCCESS}
    printf "[%s] $1\n" "$timestamp"
    ${SETCOLOR_NORMAL}
}
function log_failure() {
    if [ $# -lt 1 ]; then
        ${SETCOLOR_FAILURE}
        echo "Not enough arguments for log function! Expecting 1 argument got $#"
        exit 1
    fi
    timestamp=$(date "+%Y-%m-%d %H:%M:%S %Z")
    ${SETCOLOR_FAILURE}
    printf "[%s] $1\n" "$timestamp"
    ${SETCOLOR_NORMAL}
}
function log_title() {
    if [ $# -lt 1 ]; then
        ${SETCOLOR_FAILURE}
        log_failure "Not enough arguments for log function! Expecting 1 argument got $#"
        exit 1
    fi
    ${SETCOLOR_TITLE_PURPLE}
    printf "|-------------------------------------------------------------------------|\n"
    printf "|%s|\n" "$1";
    printf "|-------------------------------------------------------------------------|\n"
    ${SETCOLOR_NORMAL}
}
if [ -d "$DASH_DIR" ]; then
    DASH_LIST=$(find "$DASH_DIR" -mindepth 1 -name \*.json)
    if [ -z "$DASH_LIST" ]; then
        log_title "----------------- $DASH_DIR contains no JSON files! -----------------"
        log_failure "Directory $DASH_DIR does not appear to contain any JSON files for import. Check your path and try again."
        exit 1
    else
        FILESTOTAL=$(echo "$DASH_LIST" | wc -l)
        log_title "----------------- Starting import of $FILESTOTAL dashboards -----------------"
    fi
else
    log_title "----------------- $DASH_DIR directory not found! -----------------"
    log_failure "Directory $DASH_DIR does not exist. Check your path and try again."
    exit 1
fi
NUMSUCCESS=0
NUMFAILURE=0
COUNTER=0
for DASH_FILE in $DASH_LIST; do
    COUNTER=$((COUNTER + 1))
    echo "Import $COUNTER/$FILESTOTAL: $DASH_FILE..."
    RESULT=$(cat "$DASH_FILE" | jq '. * {overwrite: true, dashboard: {id: null}}' | curl -s -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $KEY" "$HOST"/api/dashboards/db -d @-)
    if [[ "$RESULT" == *"success"* ]]; then
        log_success "$RESULT"
        NUMSUCCESS=$((NUMSUCCESS + 1))
    else
        log_failure "$RESULT"
        NUMFAILURE=$((NUMFAILURE + 1))
    fi
done
log_title "Import complete. $NUMSUCCESS dashboards were successfully imported. $NUMFAILURE dashboard imports failed.";
log_title "------------------------------ FINISHED ---------------------------------";
The import script requires that Grafana is already running on the target machine, and it needs an admin API key. Log in to the Grafana web UI, open the API Keys page, and create a new key with the Admin role and whatever expiration time suits you.
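If you prefer the command line, the same key can be created through Grafana's HTTP API. This is only a sketch; the admin credentials and host are placeholders, and secondsToLive is optional:
$ curl -s -X POST -H "Content-Type: application/json" \
    -u admin:<password> \
    -d '{"name": "backup-key", "role": "Admin", "secondsToLive": 86400}' \
    http://<grafana_svc_ip>:<grafana_svc_port>/api/auth/keys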
To import:
$ ./grafana-dashboard-importer.sh -t http://<grafana_svc_ip>:<grafana_svc_port> -k <api_key> -p <backup folder>
The -p flag points at the directory containing the previously exported JSON files.
The pain point of this approach is that it only backs up dashboards and nothing else (data sources, users, API keys, and so on), and it doesn't tie dashboards to their folders, i.e. folders aren't backed up at all. Below is a far more complete backup-and-restore solution that covers every kind of configuration. It's delightful.
2. The Advanced Solution
Someone has already built a more advanced solution: ysde's grafana-backup-tool (https://github.com/ysde/grafana-backup-tool, the repo cloned in the Dockerfile below). The tool can back up the following kinds of configuration:
- Folders
- Dashboards
- Data sources
- Grafana alert channels
- Organizations
- Users
Usage is simple: just run it in a container. I wasn't entirely happy with the Dockerfile the author provides, though, so I tweaked it a bit:
FROM alpine:latest
LABEL maintainer="grafana-backup-tool Docker Maintainers https://fuckcloudnative.io"
ENV ARCHIVE_FILE ""
RUN echo "@edge http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories; \
    apk --no-cache add python3 py3-pip py3-cffi py3-cryptography ca-certificates bash git; \
    git clone https://github.com/ysde/grafana-backup-tool /opt/grafana-backup-tool; \
    cd /opt/grafana-backup-tool; \
    pip3 --no-cache-dir install .; \
    chown -R 1337:1337 /opt/grafana-backup-tool
WORKDIR /opt/grafana-backup-tool
USER 1337
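Before handing the build off to CI, the image can be sanity-checked locally; the tag and directory layout here are just assumptions:
# Assumes the Dockerfile above is saved as grafana-backup-tool/Dockerfile
$ docker build -t grafana-backup-tool:dev grafana-backup-tool/
# Running grafana-backup with no arguments should print its usage text
$ docker run --rm grafana-backup-tool:dev grafana-backup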
A Dockerfile alone isn't enough; the image still has to be built and pushed to docker.io through CI/CD. Don't ask what I use for that; of course I freeload on GitHub Actions. The workflow looks like this:
#=================================================
# https://github.com/yangchuansheng/docker-image
# Description: Build and push grafana-backup-tool Docker image
# License: MIT
# Author: Ryan
# Blog: https://fuckcloudnative.io
#=================================================
name: Build and push grafana-backup-tool Docker image

# Controls when the action will run. Triggers the workflow on push or pull request
# events but only for the master branch
on:
  push:
    branches: [ master ]
    paths:
      - 'grafana-backup-tool/Dockerfile'
      - '.github/workflows/grafana-backup-tool.yml'
  pull_request:
    branches: [ master ]
    paths:
      - 'grafana-backup-tool/Dockerfile'
  #watch:
  #  types: started

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest
    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v2
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Login to GitHub Package Registry
        env:
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GHCR_TOKEN }}
        run: echo ${{ env.password }} | docker login ghcr.io -u ${{ env.username }} --password-stdin
      # Runs a single command using the runners shell
      - name: Build and push Docker images to docker.io and ghcr.io
        uses: docker/build-push-action@v2
        with:
          file: 'grafana-backup-tool/Dockerfile'
          platforms: linux/386,linux/amd64,linux/arm/v6,linux/arm/v7,linux/arm64,linux/ppc64le,linux/s390x
          context: grafana-backup-tool
          push: true
          tags: |
            yangchuansheng/grafana-backup-tool:latest
            ghcr.io/yangchuansheng/grafana-backup-tool:latest
      #- name: Update repo description
      #  uses: peter-evans/dockerhub-description@v2
      #  env:
      #    DOCKERHUB_USERNAME: ${{ secrets.DOCKER_USERNAME }}
      #    DOCKERHUB_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
      #    DOCKERHUB_REPOSITORY: yangchuansheng/grafana-backup-tool
      #    README_FILEPATH: grafana-backup-tool/readme.md
I'm not going to walk through the workflow here; anyone with a bit of background should be able to read it, and if not, I can always write a separate post later (more easy content for me~). What it does is build the image for every CPU architecture and push it to both docker.io and ghcr.io. Seriously satisfying, isn't it?
You can also just watch my repo: https://github.com/yangchuansheng/docker-image
Once the image is built, you can run a container to do the backup and restore. If you want to do it inside the cluster, use a Deployment or a Job; if you want to do it locally or outside the k8s cluster, docker run works, and so does docker-compose, no objections from me. But let me show you an even slicker way, one you may never want to come back from.
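For reference, the plain docker run variant would look roughly like this, reusing the environment variables and mount path from the Deployment manifest further down (the token and URL are placeholders):
$ docker run --rm -it \
    -e GRAFANA_TOKEN=<api_key> \
    -e GRAFANA_URL=http://<grafana_ip>:<grafana_port> \
    -e VERIFY_SSL=False \
    -v "$(pwd)/backup:/opt/grafana-backup-tool" \
    yangchuansheng/grafana-backup-tool:latest grafana-backup save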
First, install Podman locally or on a machine outside the cluster. On Windows 10 you can install it through WSL; on Linux no explanation is needed; on macOS, see my earlier post: Using Podman on macOS.
With Podman installed, the fun begins. Watch closely.
Start by writing a Deployment manifest (what, a Deployment? Yes, you read that right):
grafana-backup-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-backup
  labels:
    app: grafana-backup
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-backup
  template:
    metadata:
      labels:
        app: grafana-backup
    spec:
      containers:
      - name: grafana-backup
        image: yangchuansheng/grafana-backup-tool:latest
        imagePullPolicy: IfNotPresent
        command: ["/bin/bash"]
        tty: true
        stdin: true
        env:
        - name: GRAFANA_TOKEN
          value: "eyJr0NkFBeWV1QVpMNjNYWXA3UXNOM2JWMWdZOTB2ZFoiLCJuIjoiYWRtaW4iLCJpZCI6MX0="
        - name: GRAFANA_URL
          value: "http://<grafana_ip>:<grafana_port>"
        - name: GRAFANA_ADMIN_ACCOUNT
          value: "admin"
        - name: GRAFANA_ADMIN_PASSWORD
          value: "admin"
        - name: VERIFY_SSL
          value: "False"
        volumeMounts:
        - mountPath: /opt/grafana-backup-tool
          name: data
      volumes:
      - name: data
        hostPath:
          path: /mnt/manifest/grafana/backup
Adjust the environment variables to match your own environment; do not copy mine verbatim!
Don't be puzzled. The reason for preparing a Deployment manifest is that Podman can run containers directly from it, with the following command:
$ podman play kube grafana-backup-deployment.yaml
The first time I saw this I couldn't believe it either: it can do that? It can. To be clear, Podman merely translates the manifest and runs a container; it's not actually running a Deployment, since there's no controller behind it. Still, it's great.
Imagine taking a manifest from your k8s cluster and running it directly on your laptop or a test box: no more maintaining one YAML for the cluster and a separate one for docker-compose. One YAML everywhere. Convinced?
That docker-compose has fallen this far is honestly a little sad.
Sharp-eyed readers will have noticed that both the manifest and the Dockerfile above look a bit odd: the Dockerfile defines no CMD or ENTRYPOINT, and the Deployment sets the container command to bash. That's because in my earlier testing, containers started from this image had a problem: they got stuck in a loop, backing up over and over and filling the backup directory with a pile of archives. I haven't found a good fix yet, so for now I set the container command to bash and, once the container is running, exec into it to run the backup by hand:
$ podman pod ls
POD ID NAME STATUS CREATED # OF CONTAINERS INFRA ID
728aec216d66 grafana-backup-pod-0 Running 3 minutes ago 2 92aa0824fe7d
$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b523fa8e4819 yangchuansheng/grafana-backup-tool:latest /bin/bash 3 minutes ago Up 3 minutes ago grafana-backup-pod-0-grafana-backup
92aa0824fe7d k8s.gcr.io/pause:3.2 3 minutes ago Up 3 minutes ago 728aec216d66-infra
$ podman exec -it grafana-backup-pod-0-grafana-backup bash
bash-5.0$ grafana-backup save
...
...
########################################
backup folders at: _OUTPUT_/folders/202012111556
backup datasources at: _OUTPUT_/datasources/202012111556
backup dashboards at: _OUTPUT_/dashboards/202012111556
backup alert_channels at: _OUTPUT_/alert_channels/202012111556
backup organizations at: _OUTPUT_/organizations/202012111556
backup users at: _OUTPUT_/users/202012111556
created archive at: _OUTPUT_/202012111556.tar.gz
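Since /opt/grafana-backup-tool is a hostPath mount, the archive already lives on the host under /mnt/manifest/grafana/backup. Once you've grabbed it, the throwaway pod can be torn down (the pod name is the one podman generated above):
$ ls /mnt/manifest/grafana/backup/_OUTPUT_/
$ podman pod rm -f grafana-backup-pod-0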
By default every component is backed up, but you can also specify which components to back up:
$ grafana-backup save --components=<folders,dashboards,datasources,alert-channels,organizations,users>
For example, to back up only dashboards and folders:
$ grafana-backup save --components=folders,dashboards
Of course, you can also back up everything and pick which components to restore later:
$ grafana-backup restore --components=folders,dashboards
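Note that restore also has to be told which archive to read. Going by the tool's usage, the archive path is passed as an argument (which is presumably why the Dockerfile above declares an ARCHIVE_FILE variable); the timestamp here is just the one from the save output above:
$ grafana-backup restore _OUTPUT_/202012111556.tar.gz --components=folders,dashboards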
With that, there's no more fear of dashboards being modified or deleted.
One final caveat: the Grafana shipped with the Prometheus Operator project pre-imports a set of default dashboards via provisioning. That in itself is fine, but grafana-backup-tool cannot skip configurations that already exist; if the restore runs into one, it errors out and exits. Normally you'd just delete all dashboards in the Grafana web UI, but dashboards imported through provisioning can't be deleted, which is awkward.
Until the author fixes this bug, there are two workarounds.
The first is to strip all provisioning-related configuration from the Grafana Deployment before restoring, namely these bits:
volumeMounts:
- mountPath: /etc/grafana/provisioning/datasources
  name: grafana-datasources
  readOnly: false
- mountPath: /etc/grafana/provisioning/dashboards
  name: grafana-dashboards
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/apiserver
  name: grafana-dashboard-apiserver
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/cluster-total
  name: grafana-dashboard-cluster-total
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/controller-manager
  name: grafana-dashboard-controller-manager
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/k8s-resources-cluster
  name: grafana-dashboard-k8s-resources-cluster
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/k8s-resources-namespace
  name: grafana-dashboard-k8s-resources-namespace
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/k8s-resources-node
  name: grafana-dashboard-k8s-resources-node
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/k8s-resources-pod
  name: grafana-dashboard-k8s-resources-pod
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/k8s-resources-workload
  name: grafana-dashboard-k8s-resources-workload
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/k8s-resources-workloads-namespace
  name: grafana-dashboard-k8s-resources-workloads-namespace
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/kubelet
  name: grafana-dashboard-kubelet
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/namespace-by-pod
  name: grafana-dashboard-namespace-by-pod
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/namespace-by-workload
  name: grafana-dashboard-namespace-by-workload
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/node-cluster-rsrc-use
  name: grafana-dashboard-node-cluster-rsrc-use
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/node-rsrc-use
  name: grafana-dashboard-node-rsrc-use
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/nodes
  name: grafana-dashboard-nodes
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/persistentvolumesusage
  name: grafana-dashboard-persistentvolumesusage
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/pod-total
  name: grafana-dashboard-pod-total
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/prometheus-remote-write
  name: grafana-dashboard-prometheus-remote-write
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/prometheus
  name: grafana-dashboard-prometheus
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/proxy
  name: grafana-dashboard-proxy
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/scheduler
  name: grafana-dashboard-scheduler
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/statefulset
  name: grafana-dashboard-statefulset
  readOnly: false
- mountPath: /grafana-dashboard-definitions/0/workload-total
  name: grafana-dashboard-workload-total
  readOnly: false
...
...
volumes:
- name: grafana-datasources
  secret:
    secretName: grafana-datasources
- configMap:
    name: grafana-dashboards
  name: grafana-dashboards
- configMap:
    name: grafana-dashboard-apiserver
  name: grafana-dashboard-apiserver
- configMap:
    name: grafana-dashboard-cluster-total
  name: grafana-dashboard-cluster-total
- configMap:
    name: grafana-dashboard-controller-manager
  name: grafana-dashboard-controller-manager
- configMap:
    name: grafana-dashboard-k8s-resources-cluster
  name: grafana-dashboard-k8s-resources-cluster
- configMap:
    name: grafana-dashboard-k8s-resources-namespace
  name: grafana-dashboard-k8s-resources-namespace
- configMap:
    name: grafana-dashboard-k8s-resources-node
  name: grafana-dashboard-k8s-resources-node
- configMap:
    name: grafana-dashboard-k8s-resources-pod
  name: grafana-dashboard-k8s-resources-pod
- configMap:
    name: grafana-dashboard-k8s-resources-workload
  name: grafana-dashboard-k8s-resources-workload
- configMap:
    name: grafana-dashboard-k8s-resources-workloads-namespace
  name: grafana-dashboard-k8s-resources-workloads-namespace
- configMap:
    name: grafana-dashboard-kubelet
  name: grafana-dashboard-kubelet
- configMap:
    name: grafana-dashboard-namespace-by-pod
  name: grafana-dashboard-namespace-by-pod
- configMap:
    name: grafana-dashboard-namespace-by-workload
  name: grafana-dashboard-namespace-by-workload
- configMap:
    name: grafana-dashboard-node-cluster-rsrc-use
  name: grafana-dashboard-node-cluster-rsrc-use
- configMap:
    name: grafana-dashboard-node-rsrc-use
  name: grafana-dashboard-node-rsrc-use
- configMap:
    name: grafana-dashboard-nodes
  name: grafana-dashboard-nodes
- configMap:
    name: grafana-dashboard-persistentvolumesusage
  name: grafana-dashboard-persistentvolumesusage
- configMap:
    name: grafana-dashboard-pod-total
  name: grafana-dashboard-pod-total
- configMap:
    name: grafana-dashboard-prometheus-remote-write
  name: grafana-dashboard-prometheus-remote-write
- configMap:
    name: grafana-dashboard-prometheus
  name: grafana-dashboard-prometheus
- configMap:
    name: grafana-dashboard-proxy
  name: grafana-dashboard-proxy
- configMap:
    name: grafana-dashboard-scheduler
  name: grafana-dashboard-scheduler
- configMap:
    name: grafana-dashboard-statefulset
  name: grafana-dashboard-statefulset
- configMap:
    name: grafana-dashboard-workload-total
  name: grafana-dashboard-workload-total
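A quick way to apply this is to edit the Deployment in place and delete the mounts and volumes listed above; the namespace here assumes a standard kube-prometheus install:
$ kubectl -n monitoring edit deployment grafana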
The second is to delete the Grafana that ships with Prometheus Operator and deploy your own, via Helm or plain manifests, without provisioning.
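For the Helm route, a minimal sketch with the official chart could look like the following; the chart repo and values are assumptions to adapt to your environment:
$ helm repo add grafana https://grafana.github.io/helm-charts
$ helm install grafana grafana/grafana \
    --namespace monitoring \
    --set persistence.enabled=true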
If you don't want to delete the provisioning config and don't want to deploy Grafana yourself either, then the low-tech solution described earlier is your only option.