Building a Cloud-Native, Large-Scale Distributed Monitoring System (3): Thanos Deployment and Practice

Posted by imroc-github on 2020-04-19

Original article: https://gocn.vip/topics/10272

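Overview

The previous article, Thanos Architecture Explained, took a close look at the architecture and implementation principles of Thanos. Now let's get practical and walk through how to deploy and use it.

Deployment Options

This article focuses on deploying Thanos the cloud-native way, taking full advantage of Kubernetes resource scheduling and dynamic scaling. According to the official project, there are currently three ways to deploy Thanos on Kubernetes:

prometheus-operator: once prometheus-operator is installed in the cluster, Thanos can be deployed by creating CRD objects.
Community-contributed helm charts: there are many variants, all aiming at one-command deployment of Thanos with helm.
kube-thanos: the official Thanos open source project, containing jsonnet templates and yaml examples for deploying Thanos to Kubernetes.

This article deploys from the yaml examples provided by kube-thanos (examples/all/manifests). The prometheus-operator and community helm chart approaches add an extra layer of wrapping that hides many details, and neither was very mature at the time of writing. Plain Kubernetes yaml manifests are more direct and easier to customize. I also believe Thanos users are usually advanced users who need to understand Thanos thoroughly so that they can later adjust the architecture and configuration to real-world scenarios, and deploying with plain yaml lets us see the details.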

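Choosing a Setup

Sidecar or Receiver

Readers of the previous article will know that the official architecture diagram currently uses the Sidecar approach, and that Receiver is a component that has not been fully released yet. Generally speaking, the Sidecar approach is more mature: storage and computation of the most recent data (aggregation functions, for example) stay distributed across the Prometheus instances, which is more efficient and easier to scale.

With the Receiver approach, Prometheus pushes data through the remote write API to Receiver, which stores it centrally (and likewise cleans up expired data).

So which one should you choose? My suggestions:

If Query is far from the Sidecars, for example when the Sidecars are spread across several data centers, Query has to fetch data from all of them and queries become slow. In that case consider Receiver: push the data centrally to Receiver, deploy Receiver alongside Query, and let Query read the latest data directly from Receiver to improve query performance.
If your scenario only allows Prometheus to push data to a remote endpoint, consider Receiver. For example, IoT devices have no persistent storage and can only push data to a remote endpoint.

In all other scenarios, prefer the Sidecar approach.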

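Do You Need Ruler?

Ruler is an optional component. As a rule, prefer Prometheus's built-in rule feature (recording new metrics plus alerting) wherever possible. Rule evaluation needs recent Prometheus data, and evaluating rules locally against local data costs far less than a distributed solution such as Thanos Ruler and almost never goes wrong. Thanos Ruler, being distributed, is more error-prone.

If related data is spread across several Prometheus instances, for example when scraping of a large service is sharded so that each Prometheus scrapes only part of the endpoints, record rules (which generate new metrics) can still use Prometheus's built-in rule feature, with one more aggregation at query time (if that is acceptable). Alert rules, however, need Thanos Ruler: because the related data is spread over several Prometheus instances, evaluating an alert against a single instance's data is inaccurate and can cause false or missed alerts.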

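Do You Need Store Gateway and Compact?

Store is also an optional component, and it is the key to one of Thanos's main selling points: long-term data storage.

Deciding whether you need Store really means deciding whether you need long-term storage, for example to look at monitoring data from a month or two ago. If you do, Thanos can upload data to object storage. Thanos supports the following object stores: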

Google Cloud Storage
AWS/S3
Azure Storage Account
OpenStack Swift
Tencent COS
AliYun OSS

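In China, the most convenient choice is a public cloud object storage service such as Tencent Cloud COS or Alibaba Cloud OSS. If your services do not run on a public cloud, you can still reach object storage over the internal network through a dedicated line to the cloud provider, which is usually fast enough. If a public cloud object storage service is really not an option, you can install minio yourself to run an object storage service compatible with the AWS S3 API.

With object storage in place, several Thanos components need the object storage settings so that they can upload and read monitoring data. Every Thanos component except Query (Sidecar, Receiver, Ruler, Store Gateway, Compact) needs them, passed either inline with --objstore.config or as a file reference with --objstore.config-file. The configuration differs per object store; see the official documentation: https://thanos.io/storage.md

When object storage is used for long-term storage, you usually install not only Store Gateway but also Compact, which compacts and downsamples the data in object storage to speed up queries over large time ranges. Note that Compact does not reduce the space used in object storage; it increases it, because it adds copies of the data at coarser sampling intervals. A query over a large time range then automatically fetches the coarser data, which reduces the total amount of data read and speeds up the query (large time ranges do not need that much detail). When you zoom in (select a short slice of time), the finer data is fetched automatically, so the details of small time ranges are still visible.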

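Deployment in Practice

This section uses the latest Thanos release at the time of writing (v0.11.0 in the manifests below) with the Sidecar approach, presents the Kubernetes yaml definition of each component, and explains the important details. Decide which components you need based on your own requirements and the selection guidance in the previous section.

Preparing the Object Storage Configuration

To store data long term in object storage, first prepare the object storage configuration (thanos-objectstorage-secret.yaml). For example, with Tencent Cloud COS: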

apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
  namespace: thanos
type: Opaque
stringData:
  objectstorage.yaml: |
    type: COS
    config:
      bucket: "thanos"
      region: "ap-singapore"
      app_id: "12*******5"
      secret_key: "tsY***************************Edm"
      secret_id: "AKI******************************gEY"

Or, to use Alibaba Cloud OSS:

apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
  namespace: thanos
type: Opaque
stringData:
  objectstorage.yaml: |
    type: ALIYUNOSS
    config:
      endpoint: "oss-cn-hangzhou-internal.aliyuncs.com"
      bucket: "thanos"
      access_key_id: "LTA******************KBu"
      access_key_secret: "oki************************2HQ"

Note: sensitive values are masked.
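
For a self-hosted minio, which was mentioned earlier as the fallback when no public cloud object storage is available, the same Secret might look roughly like the sketch below. It uses the Thanos S3 client type; the endpoint address and the credential placeholders are assumptions for illustration and must be replaced with the values of your own minio installation.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
  namespace: thanos
type: Opaque
stringData:
  objectstorage.yaml: |
    type: S3
    config:
      bucket: "thanos"
      # Hypothetical in-cluster minio address; replace with your own.
      endpoint: "minio.minio.svc.cluster.local:9000"
      # Placeholders, not real credentials.
      access_key: "<minio-access-key>"
      secret_key: "<minio-secret-key>"
      # Assumes minio is served over plain HTTP inside the cluster.
      insecure: true
```

Because the Secret keeps the same name and key, the component manifests that mount objectstorage.yaml do not need to change.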

Adding the Sidecar to Prometheus

With the Sidecar approach, Prometheus needs a Thanos Sidecar next to it. Prepare prometheus.yaml:

kind: Service
apiVersion: v1
metadata:
  name: prometheus-headless
  namespace: thanos
  labels:
    app.kubernetes.io/name: prometheus
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/name: prometheus
  ports:
  - name: web
    protocol: TCP
    port: 9090
    targetPort: web
  - name: grpc
    port: 10901
    targetPort: grpc
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: thanos

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: thanos
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io
---

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: thanos
  labels:
    app.kubernetes.io/name: prometheus
spec:
  serviceName: prometheus-headless
  podManagementPolicy: Parallel
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  template:
    metadata:
      labels:
        app.kubernetes.io/name: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - prometheus
            topologyKey: kubernetes.io/hostname
      containers:
      - name: prometheus
        image: quay.io/prometheus/prometheus:v2.15.2
        args:
        - --config.file=/etc/prometheus/config_out/prometheus.yaml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention.time=10d
        - --web.route-prefix=/
        - --web.enable-lifecycle
        - --storage.tsdb.no-lockfile
        - --storage.tsdb.min-block-duration=2h
        - --storage.tsdb.max-block-duration=2h
        - --log.level=debug
        ports:
        - containerPort: 9090
          name: web
          protocol: TCP
        livenessProbe:
          failureThreshold: 6
          httpGet:
            path: /-/healthy
            port: web
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        readinessProbe:
          failureThreshold: 120
          httpGet:
            path: /-/ready
            port: web
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        volumeMounts:
        - mountPath: /etc/prometheus/config_out
          name: prometheus-config-out
          readOnly: true
        - mountPath: /prometheus
          name: prometheus-storage
        - mountPath: /etc/prometheus/rules
          name: prometheus-rules
      - name: thanos
        image: quay.io/thanos/thanos:v0.11.0
        args:
        - sidecar
        - --log.level=debug
        - --tsdb.path=/prometheus
        - --prometheus.url=http://127.0.0.1:9090
        - --objstore.config-file=/etc/thanos/objectstorage.yaml
        - --reloader.config-file=/etc/prometheus/config/prometheus.yaml.tmpl
        - --reloader.config-envsubst-file=/etc/prometheus/config_out/prometheus.yaml
        - --reloader.rule-dir=/etc/prometheus/rules/
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        ports:
        - name: http-sidecar
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        livenessProbe:
          httpGet:
            port: 10902
            path: /-/healthy
        readinessProbe:
          httpGet:
            port: 10902
            path: /-/ready
        volumeMounts:
        - name: prometheus-config-tmpl
          mountPath: /etc/prometheus/config
        - name: prometheus-config-out
          mountPath: /etc/prometheus/config_out
        - name: prometheus-rules
          mountPath: /etc/prometheus/rules
        - name: prometheus-storage
          mountPath: /prometheus
        - name: thanos-objectstorage
          subPath: objectstorage.yaml
          mountPath: /etc/thanos/objectstorage.yaml
      volumes:
      - name: prometheus-config-tmpl
        configMap:
          defaultMode: 420
          name: prometheus-config-tmpl
      - name: prometheus-config-out
        emptyDir: {}
      - name: prometheus-rules
        configMap:
          name: prometheus-rules
      - name: thanos-objectstorage
        secret:
          secretName: thanos-objectstorage
  volumeClaimTemplates:
  - metadata:
      name: prometheus-storage
      labels:
        app.kubernetes.io/name: prometheus
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 200Gi
      volumeMode: Filesystem

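Prometheus is deployed as a StatefulSet with a data disk mounted to store the most recent monitoring data.
Prometheus replicas do not depend on each other's startup order, so podManagementPolicy is set to Parallel to speed up startup.
Prometheus is bound to sufficient RBAC permissions so that Kubernetes service discovery (kubernetes_sd_configs) works when configured later.
A headless service is created for Prometheus so that Thanos Query can later discover the Sidecar gRPC endpoints dynamically through DNS SRV records (only a headless service makes DNS SRV return all endpoints correctly).
Two Prometheus replicas provide high availability.
Hard anti-affinity keeps the Prometheus replicas off the same node, which both spreads the load and avoids a single point of failure.
--storage.tsdb.retention.time sets how long Prometheus keeps data. The default is 15 days (the example sets 10d); adjust it to the data growth rate and disk size (growth depends on the scraped metrics, the number of target endpoints, and the scrape frequency).
The Sidecar uses --objstore.config-file to reference the object storage configuration file we just created and mounted, and uses it to upload data to object storage.
A quay.io/coreos/prometheus-config-reloader container is commonly attached to Prometheus to watch for configuration changes and reload dynamically, but thanos sidecar offers the same feature, so we use it directly. It also supports generating the configuration file from a template: --reloader.config-file points to the Prometheus configuration template and --reloader.config-envsubst-file to the path of the generated file. With /etc/prometheus/config_out/prometheus.yaml as that path, /etc/prometheus/config_out is an emptyDir mounted by both Prometheus and the Sidecar so they share the file, and Prometheus loads the generated file through --config.file. When the configuration is updated, the mounted file is updated as well and the Sidecar tells Prometheus to reload. The Sidecar and Prometheus also mount the same rules files; after a rules update the Sidecar only tells Prometheus to reload, and templates are not supported because rules do not need to be generated dynamically.

Next, prepare the Prometheus configuration (prometheus-config.yaml):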

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config-tmpl
  namespace: thanos
data:
  prometheus.yaml.tmpl: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
      external_labels:
        cluster: prometheus-ha
        prometheus_replica: $(POD_NAME)
    rule_files:
    - /etc/prometheus/rules/*rules.yaml
    scrape_configs:
    - job_name: cadvisor
      metrics_path: /metrics/cadvisor
      scrape_interval: 10s
      scrape_timeout: 10s
      scheme: https
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
---

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  labels:
    name: prometheus-rules
  namespace: thanos
data:
  alert-rules.yaml: |-
    groups:
    - name: k8s.rules
      rules:
      - expr: |
          sum(rate(container_cpu_usage_seconds_total{job="cadvisor", image!="", container!=""}[5m])) by (namespace)
        record: namespace:container_cpu_usage_seconds_total:sum_rate
      - expr: |
          sum(container_memory_usage_bytes{job="cadvisor", image!="", container!=""}) by (namespace)
        record: namespace:container_memory_usage_bytes:sum
      - expr: |
          sum by (namespace, pod, container) (
            rate(container_cpu_usage_seconds_total{job="cadvisor", image!="", container!=""}[5m])
          )
        record: namespace_pod_container:container_cpu_usage_seconds_total:sum_rate

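Prometheus configuration is not the focus of this article, so the example is a simple configuration that only scrapes the cadvisor container metrics exposed by kubelet.
Every metric scraped by a Prometheus instance gets the labels listed under external_labels. cluster usually identifies the cluster the Prometheus instance runs in. We add prometheus_replica to tell replicas of the same Prometheus apart: the data scraped by these replicas is nearly identical apart from the prometheus_replica value, which the Thanos Sidecar replaces with the Pod name and which Thanos uses to implement Prometheus high availability.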
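
To make the substitution concrete, here is a sketch of what the generated file should contain on the first replica. It assumes the StatefulSet above, whose Pods are named prometheus-0 and prometheus-1, and shows only the global section; the rest of the template is copied through unchanged.

```yaml
# Sketch of the rendered /etc/prometheus/config_out/prometheus.yaml on Pod prometheus-0.
# On prometheus-1 the only difference is prometheus_replica: prometheus-1.
global:
  scrape_interval: 5s
  evaluation_interval: 5s
  external_labels:
    cluster: prometheus-ha
    # $(POD_NAME) from the template, replaced by the Sidecar with the Pod name.
    prometheus_replica: prometheus-0
```

Two series that differ only in prometheus_replica are what Query later treats as duplicates of each other.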

Installing Query

Prepare thanos-query.yaml:

apiVersion: v1
kind: Service
metadata:
  name: thanos-query
  namespace: thanos
  labels:
    app.kubernetes.io/name: thanos-query
spec:
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 9090
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-query
---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  namespace: thanos
  labels:
    app.kubernetes.io/name: thanos-query
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-query
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - thanos-query
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - query
        - --log.level=debug
        - --query.auto-downsampling
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --query.partial-response
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=dnssrv+_grpc._tcp.prometheus-headless.thanos.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-rule.thanos.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-store.thanos.svc.cluster.local
        image: thanosio/thanos:v0.11.0
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          periodSeconds: 30
        name: thanos-query
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 9090
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
      terminationGracePeriodSeconds: 120

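Query is stateless, so it is deployed as a Deployment with a regular service; no headless service is needed.
Soft anti-affinity tries to keep Query replicas off the same node.
Multiple replicas make Query highly available.
--query.partial-response enables Partial Response, so that correct monitoring data is still returned when some backend Store APIs return errors or time out. If a backend Store API is highly available and one replica goes down, Query times out on that replica, but the surviving replicas still return correct results. If the backend that went down does not hold the data we need, the result is unaffected as well. In short, when every component is highly available it is hard to get a wrong result, so we can enable Partial Response with confidence.
--query.auto-downsampling downsamples automatically at query time to improve query efficiency.
--query.replica-label names the prometheus_replica external label we just configured on Prometheus. When Query pulls Prometheus data from the Sidecars it recognizes this label and deduplicates automatically, so losing one replica does not affect query results as long as at least one replica is healthy. This is what makes Prometheus highly available. Likewise, rule_replica is specified to make Ruler highly available.
--store specifies the addresses that implement the Store API (Sidecar, Ruler, Store Gateway, Receiver). Static addresses are usually not recommended; use service discovery instead. For components in the same cluster, DNS SRV records work, for example dnssrv+_grpc._tcp.prometheus-headless.thanos.svc.cluster.local, which is the headless service we just created for the Prometheus pods containing the Sidecar (service discovery only works correctly with a headless service) and which selects the tcp port named grpc. Other components can be added to --store in the same way. If some components run outside the cluster and their DNS SRV records cannot be resolved through cluster DNS, use file-based service discovery: pass --store.sd-files and list the other Store API addresses in a file (mounted from a ConfigMap). To add an address, just update the ConfigMap; Query does not need a restart.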
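
The file-based discovery described in the last note could be sketched as follows. The ConfigMap name, the file name, the mount path, and the addresses are all hypothetical; the file content follows the Prometheus file service discovery format (a list of targets groups), which is what Thanos expects for --store.sd-files.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-query-store-sd   # hypothetical name
  namespace: thanos
data:
  stores.yaml: |-
    - targets:
      # Hypothetical Store API (gRPC) addresses outside the cluster.
      - "10.0.0.11:10901"
      - "10.0.0.12:10901"
# Query would then get one extra argument, for example:
#   --store.sd-files=/etc/thanos/store-sd/*.yaml
# with this ConfigMap mounted as a directory at /etc/thanos/store-sd.
```

Mount the ConfigMap as a whole directory rather than with subPath: Kubernetes does not propagate ConfigMap updates to subPath mounts, so a subPath mount would defeat the point of updating addresses without a restart.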

Installing Store Gateway

Prepare thanos-store.yaml:

apiVersion: v1
kind: Service
metadata:
  name: thanos-store
  namespace: thanos
  labels:
    app.kubernetes.io/name: thanos-store
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: 10901
  - name: http
    port: 10902
    targetPort: 10902
  selector:
    app.kubernetes.io/name: thanos-store
---

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  namespace: thanos
  labels:
    app.kubernetes.io/name: thanos-store
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-store
  serviceName: thanos-store
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-store
    spec:
      containers:
      - args:
        - store
        - --log.level=debug
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config-file=/etc/thanos/objectstorage.yaml
        - --experimental.enable-index-header
        image: thanosio/thanos:v0.11.0
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-store
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
        - name: thanos-objectstorage
          subPath: objectstorage.yaml
          mountPath: /etc/thanos/objectstorage.yaml
      terminationGracePeriodSeconds: 120
      volumes:
      - name: thanos-objectstorage
        secret:
          secretName: thanos-objectstorage
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/name: thanos-store
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

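Store Gateway can in fact be somewhat stateless. It needs a little disk space to index the object storage and speed up queries, but that data is not critical and can be deleted; after deletion it pulls from object storage and rebuilds the index automatically. To avoid rebuilding the index on every restart, we deploy Store Gateway as a StatefulSet with a small disk (the index does not take much space).
A headless service is created here as well, so that Query can discover Store Gateway.
Two replicas make Store Gateway highly available.
Store Gateway also needs the object storage configuration to read data from object storage, so the configuration file is mounted.

Installing Ruler

Prepare the Ruler deployment manifest thanos-ruler.yaml: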

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-rule
  name: thanos-rule
  namespace: thanos
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-rule
---

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: thanos-rule
  name: thanos-rule
  namespace: thanos
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-rule
  serviceName: thanos-rule
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-rule
    spec:
      containers:
      - args:
        - rule
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --rule-file=/etc/thanos/rules/*rules.yaml
        - --objstore.config-file=/etc/thanos/objectstorage.yaml
        - --data-dir=/var/thanos/rule
        - --label=rule_replica="$(NAME)"
        - --alert.label-drop="rule_replica"
        - --query=dnssrv+_http._tcp.thanos-query.thanos.svc.cluster.local
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        image: thanosio/thanos:v0.11.0
        livenessProbe:
          failureThreshold: 24
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        name: thanos-rule
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 18
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/rule
          name: data
          readOnly: false
        - name: thanos-objectstorage
          subPath: objectstorage.yaml
          mountPath: /etc/thanos/objectstorage.yaml
        - name: thanos-rules
          mountPath: /etc/thanos/rules
      volumes:
      - name: thanos-objectstorage
        secret:
          secretName: thanos-objectstorage
      - name: thanos-rules
        configMap:
          name: thanos-rules
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/name: thanos-rule
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi

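Ruler is a stateful service, deployed as a StatefulSet with a disk mounted to store the new data computed from the rules.
A headless service is created here as well, so that Query can discover Ruler.
Two replicas are deployed, and --label=rule_replica= adds a rule_replica label to all data (matching the --query.replica-label configured on Query) to make Ruler highly available. --alert.label-drop is set to rule_replica so that the label is dropped when alerts fire and notifications are sent to AlertManager, which lets AlertManager deduplicate automatically and avoids duplicate alerts.
--query specifies the Query address. DNS SRV is used for service discovery here too, but the effect is the same as dns+thanos-query.thanos.svc.cluster.local:9090: either way requests go through Query's ClusterIP (VIP). Query is stateless, so Kubernetes can load-balance for us.
Ruler also needs the object storage configuration to upload the computed data to object storage, so the configuration file is mounted.
--rule-file points to the mounted rule files; Ruler generates data and fires alerts according to them.

Then prepare the Ruler configuration file thanos-ruler-config.yaml: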

apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-rules
  labels:
    name: thanos-rules
  namespace: thanos
data:
  record.rules.yaml: |-
    groups:
    - name: k8s.rules
      rules:
      - expr: |
          sum(rate(container_cpu_usage_seconds_total{job="cadvisor", image!="", container!=""}[5m])) by (namespace)
        record: namespace:container_cpu_usage_seconds_total:sum_rate
      - expr: |
          sum(container_memory_usage_bytes{job="cadvisor", image!="", container!=""}) by (namespace)
        record: namespace:container_memory_usage_bytes:sum
      - expr: |
          sum by (namespace, pod, container) (
            rate(container_cpu_usage_seconds_total{job="cadvisor", image!="", container!=""}[5m])
          )
        record: namespace_pod_container:container_cpu_usage_seconds_total:sum_rate

The configuration above is only an example; adapt it to your own situation. The format is largely compatible with the Prometheus rule format. See: https://thanos.io/components/rule.md/#configuring-rules
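
The example above only contains record rules, while the selection section argued that alert rules over sharded data are the main reason to run Ruler. An alert rule could be added as one more key in the same thanos-rules ConfigMap, roughly as sketched below. The file name, alert name, threshold, and duration are made up for illustration; the file name only needs to match the *rules.yaml glob passed to --rule-file.

```yaml
# Additional entry under "data:" of the thanos-rules ConfigMap (sketch).
  alert.rules.yaml: |-
    groups:
    - name: k8s.alerts
      rules:
      # Hypothetical alert: a namespace using more than 8 CPU cores for 10 minutes.
      - alert: NamespaceCpuUsageHigh
        expr: |
          sum(rate(container_cpu_usage_seconds_total{job="cadvisor", image!="", container!=""}[5m])) by (namespace) > 8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Namespace {{ $labels.namespace }} has used more than 8 CPU cores for 10 minutes"
```

For alerts to actually reach AlertManager, Ruler also has to be told where it is, which the manifest above does not do; thanos rule accepts --alertmanagers.url for that purpose, with the address depending on your own AlertManager deployment.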

Installing Compact

Prepare the Compact deployment manifest thanos-compact.yaml:

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-compact
  name: thanos-compact
  namespace: thanos
spec:
  ports:
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-compact
---

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: thanos-compact
  name: thanos-compact
  namespace: thanos
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-compact
  serviceName: thanos-compact
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-compact
    spec:
      containers:
      - args:
        - compact
        - --wait
        - --objstore.config-file=/etc/thanos/objectstorage.yaml
        - --data-dir=/var/thanos/compact
        - --debug.accept-malformed-index
        - --log.level=debug
        - --retention.resolution-raw=90d
        - --retention.resolution-5m=180d
        - --retention.resolution-1h=360d
        image: thanosio/thanos:v0.11.0
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-compact
        ports:
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/compact
          name: data
          readOnly: false
        - name: thanos-objectstorage
          subPath: objectstorage.yaml
          mountPath: /etc/thanos/objectstorage.yaml
      terminationGracePeriodSeconds: 120
      volumes:
      - name: thanos-objectstorage
        secret:
          secretName: thanos-objectstorage
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/name: thanos-compact
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi

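Compact can only run as a single replica: several replicas compacting and downsampling the same object storage data would conflict.
It is deployed as a StatefulSet so that a disk is created and mounted automatically. The disk holds temporary data, because Compact needs some disk space for intermediate data produced during processing.
--wait keeps Compact running and polling for new data to compact and downsample.
Compact also needs the object storage configuration, to read data from object storage and upload the compacted and downsampled data.
A regular service is created, mainly so that Prometheus can scrape metrics through Kubernetes endpoints service discovery (the services of the other components serve this purpose too).
--retention.resolution-raw sets how long raw data is kept, --retention.resolution-5m how long data downsampled to 5-minute intervals is kept, and --retention.resolution-1h how long data downsampled to 1-hour intervals is kept. Their level of detail decreases in that order, and so does the storage they use, so increasing retention in that order is usually recommended. Typically only fairly recent data gets zoomed into, while old data is viewed over large time ranges for a rough picture, so keep the coarser data longer.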
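
The note about scraping the components through their services could translate into a scrape job like the following, appended under scrape_configs in prometheus.yaml.tmpl. This is a sketch, not part of the original manifests: the job name and the extra target labels are arbitrary choices, and it relies on the convention used above that each Thanos service names its metrics port http.

```yaml
    # Additional entry under "scrape_configs:" (sketch).
    - job_name: thanos
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - thanos
      relabel_configs:
      # Keep only endpoint ports named "http", which serve /metrics on the Thanos components.
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        action: keep
        regex: http
      # Record which service and Pod each target belongs to.
      - source_labels: [__meta_kubernetes_service_name]
        target_label: service
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

The Sidecar's own HTTP port (10902) is not exposed by the prometheus-headless service above, so it would not be picked up by this job unless that port is added to the service under the name http.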

Installing Receiver

This component is experimental; use with caution. Prepare the Receiver deployment manifest thanos-receiver.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-receive-hashrings
  namespace: thanos
data:
  thanos-receive-hashrings.json: |
    [
      {
        "hashring": "soft-tenants",
        "endpoints":
        [
          "thanos-receive-0.thanos-receive.thanos.svc.cluster.local:10901",
          "thanos-receive-1.thanos-receive.thanos.svc.cluster.local:10901",
          "thanos-receive-2.thanos-receive.thanos.svc.cluster.local:10901"
        ]
      }
    ]
---

apiVersion: v1
kind: Service
metadata:
  name: thanos-receive
  namespace: thanos
  labels:
    kubernetes.io/name: thanos-receive
spec:
  ports:
  - name: http
    port: 10902
    protocol: TCP
    targetPort: 10902
  - name: remote-write
    port: 19291
    protocol: TCP
    targetPort: 19291
  - name: grpc
    port: 10901
    protocol: TCP
    targetPort: 10901
  selector:
    kubernetes.io/name: thanos-receive
  clusterIP: None
---

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    kubernetes.io/name: thanos-receive
  name: thanos-receive
  namespace: thanos
spec:
  replicas: 3
  selector:
    matchLabels:
      kubernetes.io/name: thanos-receive
  serviceName: thanos-receive
  template:
    metadata:
      labels:
        kubernetes.io/name: thanos-receive
    spec:
      containers:
      - args:
        - receive
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --remote-write.address=0.0.0.0:19291
        - --objstore.config-file=/etc/thanos/objectstorage.yaml
        - --tsdb.path=/var/thanos/receive
        - --tsdb.retention=12h
        - --label=receive_replica="$(NAME)"
        - --label=receive="true"
        - --receive.hashrings-file=/etc/thanos/thanos-receive-hashrings.json
        - --receive.local-endpoint=$(NAME).thanos-receive.thanos.svc.cluster.local:10901
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        image: thanosio/thanos:v0.11.0
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-receive
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        - containerPort: 19291
          name: remote-write
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 30
        resources:
          limits:
            cpu: "4"
            memory: 8Gi
          requests:
            cpu: "2"
            memory: 4Gi
        volumeMounts:
        - mountPath: /var/thanos/receive
          name: data
          readOnly: false
        - mountPath: /etc/thanos/thanos-receive-hashrings.json
          name: thanos-receive-hashrings
          subPath: thanos-receive-hashrings.json
        - mountPath: /etc/thanos/objectstorage.yaml
          name: thanos-objectstorage
          subPath: objectstorage.yaml
      terminationGracePeriodSeconds: 120
      volumes:
      - configMap:
          defaultMode: 420
          name: thanos-receive-hashrings
        name: thanos-receive-hashrings
      - name: thanos-objectstorage
        secret:
          secretName: thanos-objectstorage
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/name: thanos-receive
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 200Gi

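Three replicas are deployed with a hashring configured. --label=receive_replica adds the receive_replica label to the data (add it to Query's --query.replica-label as well) to make Receiver highly available.
Query must be pointed at the Receiver backend: --store=dnssrv+_grpc._tcp.thanos-receive.thanos.svc.cluster.local
Adjust the resource requests and limits to your own scale.
Adjust --tsdb.retention to control how long the most recent data is kept.
If you change the namespace, remember to update Receiver's --receive.local-endpoint as well; otherwise it will log errors furiously until it is OOMKilled.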
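
Put together, the Query changes required by the first two notes amount to two extra arguments in thanos-query.yaml. The fragment below is a sketch that shows only the relevant args of the Query container; everything else in the Deployment stays as before.

```yaml
      containers:
      - args:
        - query
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        # Deduplicate the data written by the Receiver replicas.
        - --query.replica-label=receive_replica
        # Discover the Receiver gRPC endpoints through the headless service.
        - --store=dnssrv+_grpc._tcp.thanos-receive.thanos.svc.cluster.local
```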

Because Receiver collects the data from Prometheus centrally, Prometheus no longer needs a Sidecar. Instead, add remote_write to the Prometheus configuration file so that Prometheus pushes data to Receiver:

remote_write:
- url: http://thanos-receive.thanos.svc.cluster.local:19291/api/v1/receive

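Using Query as the Data Source

Querying monitoring data requires a Prometheus data source address. Since Thanos provides the distributed layer and Query is its query entry point, the data source address must be the Query address. With Grafana, go to Configuration-Data Sources-Add data source, choose Prometheus, and enter the thanos query address: http://thanos-query.thanos.svc.cluster.local:9090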
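
If you would rather not click through the UI, the same data source can be declared with Grafana's data source provisioning. The sketch below assumes Grafana runs in the same cluster and reads provisioning files from its provisioning/datasources directory; the data source name is an arbitrary choice.

```yaml
# Grafana data source provisioning file (sketch), e.g. thanos.yaml
apiVersion: 1
datasources:
- name: Thanos              # arbitrary display name
  type: prometheus          # Query serves the Prometheus HTTP API
  access: proxy             # the Grafana backend performs the requests
  url: http://thanos-query.thanos.svc.cluster.local:9090
  isDefault: true
```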

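Summary

This article covered how to choose a Thanos deployment setup and explained in detail how to install each component. If you have read this series carefully, I believe you are now able to build and operate a large-scale monitoring system.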
