Troubleshooting a Kubernetes HPA That Cannot Get the http_requests Metric Through Prometheus

Posted by dudu on 2020-01-18

Original URL: https://www.cnblogs.com/dudu/p/12197646.html

After deploying kube-prometheus and k8s-prometheus-adapter (see my earlier post, "Notes on installing Prometheus on k8s"), I tried to deploy an HPA (Horizontal Pod Autoscaler) with the configuration file below, but it failed.

apiVersion: autoscaling/v2beta2 
kind: HorizontalPodAutoscaler
metadata: 
  name: blog-web
spec: 
  scaleTargetRef: 
    apiVersion: apps/v1 
    kind: Deployment 
    name: blog-web
  minReplicas: 2
  maxReplicas: 12 
  metrics: 
    - type: Pods
      pods:
        metric:
          name: http_requests
        target:
          type: AverageValue
          averageValue: 100

The error message was:

unable to get metric http_requests: unable to fetch metrics from custom metrics API: the server could not find the metric http_requests for pods

The following command lists the http_requests metrics (requests per second, i.e. QPS) that the custom.metrics.k8s.io API supports:

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/ | jq . | egrep 'pods/.*http_requests'
      "name": "pods/alertmanager_http_requests_in_flight",
      "name": "pods/prometheus_http_requests"

Only alertmanager_http_requests_in_flight and prometheus_http_requests are listed; the metric we need, one whose name starts with http_requests, is missing.
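The same check can be done programmatically instead of with egrep. The sketch below assumes the discovery response is an APIResourceList whose `resources` entries carry names of the form `pods/<metric>`; the sample payload is a hypothetical, trimmed-down copy of the output above, not a real cluster response.

```python
import json


def pod_metrics_matching(discovery_json, needle):
    """Return pod-scoped custom metric names that contain `needle`."""
    doc = json.loads(discovery_json)
    names = []
    for resource in doc.get("resources", []):
        # Resource names look like "<scope>/<metric>", e.g. "pods/http_requests".
        scope, _, metric = resource["name"].partition("/")
        if scope == "pods" and needle in metric:
            names.append(metric)
    return names


# Hypothetical sample, shaped like the kubectl output shown above.
SAMPLE = json.dumps({
    "kind": "APIResourceList",
    "resources": [
        {"name": "pods/alertmanager_http_requests_in_flight"},
        {"name": "pods/prometheus_http_requests"},
        {"name": "namespaces/prometheus_http_requests"},
    ],
})

found = pod_metrics_matching(SAMPLE, "http_requests")
print(found)
# Metrics that actually start with "http_requests": none at this stage.
print([m for m in found if m.startswith("http_requests")])
```

Filtering on the prefix rather than a substring makes the gap obvious: the adapter knows about Prometheus's own request metrics, but nothing from the application.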

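In the Prometheus console, the /service-discovery page did not list blog-web, the application we want to monitor. After some searching, I learned that a ServiceMonitor has to be deployed so that Prometheus can discover the Service to monitor.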

I added the following ServiceMonitor configuration file:

kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: blog-web-monitor
  labels:
    app: blog-web-monitor
spec:
  selector:
    matchLabels:
      app: blog-web
  endpoints: 
  - port: http

After deploying it, Prometheus still did not discover the Service. The Prometheus log showed the following error:

Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" at the cluster scope

I found the fix in a cnblogs post, "PrometheusOperator service auto-discovery: a Redis monitoring example": change prometheus-clusterRole.yaml to the following configuration:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get

Then redeploy it:

kubectl apply -f prometheus-clusterRole.yaml

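Note 1: If the Service is still not discovered after this change, force Prometheus to reload its configuration; see "How to make Prometheus discover a ServiceMonitor immediately after deploying it".
Note 2: Prometheus can also be configured to discover Services and Pods automatically; see the cnblogs posts "Configuring Prometheus auto-discovery and monitoring of pods and svc" and "PrometheusOperator service auto-discovery: a Redis monitoring example".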
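To see why the ClusterRole change removes the "cannot list resource pods" error, it helps to evaluate the rules by hand. The sketch below is a simplified model of RBAC rule matching, not the real authorizer: it only compares apiGroups, resources and verbs (with `*` as a wildcard) and ignores nonResourceURLs, resourceNames and role bindings. The function name `is_allowed` is made up for illustration.

```python
def is_allowed(rules, api_group, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `api_group`."""
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])
        if ((api_group in groups or "*" in groups)
                and (resource in resources or "*" in resources)
                and (verb in verbs or "*" in verbs)):
            return True
    return False


# The resource rules from the prometheus-k8s ClusterRole shown above.
RULES = [
    {
        "apiGroups": [""],
        "resources": ["nodes", "services", "endpoints", "pods", "nodes/proxy"],
        "verbs": ["get", "list", "watch"],
    },
    {
        "apiGroups": [""],
        "resources": ["configmaps", "nodes/metrics"],
        "verbs": ["get"],
    },
]

print(is_allowed(RULES, "", "pods", "list"))        # the request that was forbidden
print(is_allowed(RULES, "", "configmaps", "list"))  # only "get" is granted here
```

On a live cluster, `kubectl auth can-i list pods --as=system:serviceaccount:monitoring:prometheus-k8s` answers the same question against the real authorizer.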

There was still a problem, though: Prometheus had discovered the Service, but none of the Pods behind it.

production/blog-web-monitor/0 (0/19 active targets)

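Investigation showed that the ServiceMonitor and the Service did not line up: the Service was missing the label that the ServiceMonitor's matchLabels selects on, and the port in the ServiceMonitor did not match the name of any entry under ports in the Service. The corrected configuration is as follows.

service-blog-web.yaml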

apiVersion: v1
kind: Service
metadata:
  name: blog-web
  labels:
    app: blog-web
spec:
  type: NodePort
  selector:
    app: blog-web
  ports:
  - name: http-blog-web 
    nodePort: 30080
    port: 80
    targetPort: 80

servicemonitor-blog-web.yaml

kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: blog-web-monitor 
  labels:
    app: blog-web
spec:
  selector:
    matchLabels:
      app: blog-web
  endpoints: 
  - port: http-blog-web
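The two conditions that had to line up can be expressed as a small check. This is a sketch under assumptions: the manifests are plain dictionaries, only matchLabels is considered (no matchExpressions or namespaceSelector), and the "broken" Service is a hypothetical reconstruction, since the original Service file is not shown in this post.

```python
def selector_matches(match_labels, labels):
    """True if every matchLabels pair is present on the Service."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def diagnose(service, monitor):
    """List the reasons a ServiceMonitor would not pick up a Service."""
    problems = []
    labels = service["metadata"].get("labels", {})
    match_labels = monitor["spec"]["selector"].get("matchLabels", {})
    if not selector_matches(match_labels, labels):
        problems.append("Service labels do not satisfy selector.matchLabels")
    port_names = {p.get("name") for p in service["spec"].get("ports", [])}
    for endpoint in monitor["spec"].get("endpoints", []):
        if endpoint.get("port") not in port_names:
            problems.append(
                f"endpoint port {endpoint.get('port')!r} is not a named Service port"
            )
    return problems


# Hypothetical "before" state: no labels on the Service, unnamed port.
broken_service = {"metadata": {"name": "blog-web"},
                  "spec": {"ports": [{"port": 80, "targetPort": 80}]}}
broken_monitor = {"spec": {"selector": {"matchLabels": {"app": "blog-web"}},
                           "endpoints": [{"port": "http"}]}}

# The corrected pair from the two files above.
fixed_service = {"metadata": {"name": "blog-web", "labels": {"app": "blog-web"}},
                 "spec": {"ports": [{"name": "http-blog-web", "port": 80}]}}
fixed_monitor = {"spec": {"selector": {"matchLabels": {"app": "blog-web"}},
                          "endpoints": [{"port": "http-blog-web"}]}}

print(diagnose(broken_service, broken_monitor))
print(diagnose(fixed_service, fixed_monitor))
```

Note that the endpoint `port` field refers to the port's name in the Service, not to its number, which is why the Service port needs a `name` at all.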

With the corrected configuration deployed, the Pods were finally discovered:

production/blog-web-monitor/0 (0/5 up)

But all of these Pods were in the DOWN state.

Endpoint	                      State	 Scrape Duration	Error
http://192.168.107.233:80/metrics DOWN	 server returned HTTP status 400 Bad Request

A cnblogs post, "Demonstrating a canary release with Kubernetes", made it clear that the application itself has to expose metrics data for Prometheus to scrape:

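Applications on a stock Tomcat have no /metrics path, so Prometheus cannot get data in a format it recognizes, and /metrics is where the metric data comes from. A stock Tomcat therefore does not work, and even if a /metrics path exists, it still does not work when the response format does not follow the Prometheus specification.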
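To make "a format Prometheus recognizes" concrete, here is a minimal sketch of parsing one sample line of the Prometheus text exposition format: a metric name, an optional `{label="value",...}` set, and a numeric value. It is deliberately simplified (for example, label values containing `}` are not handled) and is not the parser Prometheus itself uses; `parse_sample` is a made-up helper name.

```python
import re

# name{label="value",...} value [timestamp]
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?$'
)
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value) for a sample line, or None if it is not one.

    Comment lines (# HELP, # TYPE) and anything that is not a sample,
    such as an HTML error page, also yield None.
    """
    match = SAMPLE_LINE.match(line.strip())
    if match is None:
        return None
    try:
        value = float(match.group("value"))
    except ValueError:
        return None
    labels = dict(LABEL.findall(match.group("labels") or ""))
    return match.group("name"), labels, value


good = 'http_request_duration_seconds_sum{code="200",method="GET",controller="",action=""} 0.0631272'
bad = "<html><body>400 Bad Request</body></html>"

print(parse_sample(good))
print(parse_sample(bad))
```

An endpoint that answers with an HTML page or an error status gives the scraper nothing it can turn into samples, which is what the DOWN targets above were reporting.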

Our application is built with ASP.NET Core, so we chose prometheus-net to expose metrics data for Prometheus to scrape.

Install the NuGet package:

dotnet add package prometheus-net.AspNetCore

Add the HttpMetrics middleware:

app.UseRouting();
app.UseHttpMetrics();

Add the MapMetrics route:

app.UseEndpoints(endpoints =>
{
    // Expose the collected metrics at /metrics
    endpoints.MapMetrics();
});

Once the following command confirms that monitoring data can be fetched from the /metrics path,

$ docker exec -t $(docker ps -f name=blog-web_blog-web -q | head -1) curl 127.0.0.1/metrics | grep http_request_duration_seconds_sum
http_request_duration_seconds_sum{code="200",method="GET",controller="AggSite",action="SiteHome"} 0.44973779999999997
http_request_duration_seconds_sum{code="200",method="GET",controller="",action=""} 0.0631272

the /targets page in the Prometheus console shows all Pods behind blog-web in the UP state.

production/blog-web-monitor/0 (5/5 up)

At this point, the custom metrics API returns some http_requests-related metrics.

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/ | jq . | egrep 'pods/http_requests'
      "name": "pods/http_requests_in_progress",
      "name": "pods/http_requests_received"

Here http_requests_received is the QPS (requests per second) metric. The following command requests its data from the custom metrics API:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_received" | jq .

The http_requests_received data for one of the Pods looks like this:

{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/%2A/http_requests_received"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "production",
        "name": "blog-web-65f7bdc996-8qp5c",
        "apiVersion": "/v1"
      },
      "metricName": "http_requests_received",
      "timestamp": "2020-01-18T14:35:34Z",
      "value": "133m",
      "selector": null
    }
  ]
}

Here 133m means 0.133: the m suffix denotes thousandths.
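The conversion can be sketched in a few lines. This is a minimal example that assumes only a handful of decimal suffixes (m, k, M, G); real Kubernetes quantities support more suffixes, including binary ones such as Ki and Mi, which are not handled here. `parse_quantity` is a made-up helper name.

```python
from decimal import Decimal

# Subset of decimal suffixes; "m" is milli, i.e. one thousandth.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(1000),
    "M": Decimal(1000) ** 2,
    "G": Decimal(1000) ** 3,
}


def parse_quantity(text):
    """Convert a quantity string such as "133m" into a Decimal."""
    suffix = text[-1:]
    if suffix in _SUFFIXES:
        return Decimal(text[:-1]) * _SUFFIXES[suffix]
    return Decimal(text)


print(parse_quantity("133m"))  # the value from the response above
print(parse_quantity("100"))   # the averageValue target used in the HPA
```

Decimal is used instead of float so that "133m" becomes exactly 0.133 rather than a binary approximation.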

The HPA configuration file can then scale on this metric:

apiVersion: autoscaling/v2beta2 
kind: HorizontalPodAutoscaler
metadata: 
  name: blog-web
spec: 
  scaleTargetRef: 
    apiVersion: apps/v1 
    kind: Deployment 
    name: blog-web
  minReplicas: 5
  maxReplicas: 12 
  metrics: 
  - type: Pods
    pods:
      metric:
        name: http_requests_received
      target:
        type: AverageValue
        averageValue: 100

Finally done!

# kubectl get hpa
NAME       REFERENCE             TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
blog-web   Deployment/blog-web   133m/100   5         12        5          4d
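The TARGETS column reads 133m/100: the current average is 0.133 requests per second per Pod against a target of 100. The Kubernetes documentation gives the scaling rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default) of 1. The sketch below applies that rule and then clamps to minReplicas and maxReplicas; it leaves out everything else the real controller does (unready Pods, missing metrics, stabilization windows), and the load figures other than 0.133 are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA calculation for an AverageValue target."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result never leaves the [minReplicas, maxReplicas] range.
    return max(min_replicas, min(max_replicas, desired))


# Values from the HPA above: 5 replicas, 0.133 vs. target 100, min 5, max 12.
print(desired_replicas(5, 0.133, 100, 5, 12))
# Hypothetical load levels, to show scaling up and the maxReplicas cap.
print(desired_replicas(5, 150, 100, 5, 12))
print(desired_replicas(5, 250, 100, 5, 12))
```

With traffic this low the raw result would be a single replica, so the deployment simply sits at minReplicas, which matches the REPLICAS column.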
