Container Cloud Platform Monitoring and Alerting System (Part 4): Integrating a Golang Application with Prometheus

Posted by 人艱不拆_zmc on 2023-03-30

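1. Overview

  Containers on the container cloud platform currently support only four metrics: CPU usage, memory usage, network inbound rate, and network outbound rate. To monitor an application's performance metrics, or to monitor its runtime state at a finer granularity, you need to either build Prometheus support into the application or deploy an Exporter that runs independently of it, and then have Prometheus Server scrape the metrics the application exposes.

  The Prometheus community provides a rich set of Exporter implementations. For common middleware and databases you can deploy a community Exporter directly. For our own business services, Prometheus support has to be built into the application. Prometheus client libraries are available for many languages, including but not limited to Golang, Java, Python, Ruby, Node.js, C++, .NET, and Rust (some are maintained by the Prometheus project, others by the community). Integration is straightforward: in most cases, importing the Prometheus package into the application is enough to monitor its runtime state and performance metrics.

  This article uses Golang as the example. It shows how to use the official Golang library to expose Golang runtime data, walks through a few other basic examples, and then uses the Prometheus monitoring service to scrape the metrics and display the data.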

2. Exposing Application Metrics

2.1 Installing the Prometheus Packages

Install the dependencies with the go get command, for example:

# The prometheus package is the core package of prometheus/client_golang
go get github.com/prometheus/client_golang/prometheus
# The promauto package provides metric constructors that register the metric automatically
go get github.com/prometheus/client_golang/prometheus/promauto
# The promhttp package provides HTTP server and client tooling
go get github.com/prometheus/client_golang/prometheus/promhttp

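2.2 Integrating a Go Application with Prometheus

Create a Golang project with the following structure:

[Figure: project structure]

2.2.1 Runtime Metrics

1) Prepare an HTTP endpoint, conventionally at the path /metrics. You can use the Handler function provided by prometheus/promhttp directly. The following simple example application exposes some default Golang metrics at http://localhost:8080/metrics (runtime metrics, process metrics, and build-related metrics):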

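package main

import (
        "log"
        "net/http"

        "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
        // Serve the metrics of the default registry at /metrics.
        http.Handle("/metrics", promhttp.Handler())
        // ListenAndServe only returns on failure, so log the error instead of dropping it.
        log.Fatal(http.ListenAndServe(":8080", nil))
}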

2) Run the following command to start the application:

go run main.go

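3) Request http://localhost:8080/metrics (for example with curl) to view the basic built-in metrics. Metrics prefixed with go_ describe the Go runtime, such as garbage collection time and the number of goroutines. They are specific to the Go client library; client libraries for other languages may expose runtime metrics for their own language. Metrics prefixed with promhttp_ are provided by the promhttp package and track the handling of metrics requests.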

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.16.12"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 645800
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 645800
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 4086
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 137
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 3.986816e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 645800
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.5011712e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.671168e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 2436
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.5011712e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.668288e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 0
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 2573
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 9600
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 46104
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 49152
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.473924e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.009306e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 425984
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 425984
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.2174608e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 8
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 0
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
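As a rough illustration of what the go_ metrics measure, the sketch below reads a few of the same quantities straight from the standard runtime package. It is not how the client library is implemented (newer versions may collect through other runtime APIs), and the struct and function names are made up for this example, so treat the mapping in the comments as approximate.

```go
package main

import (
	"fmt"
	"runtime"
)

// runtimeSnapshot holds a few values that correspond roughly to metrics
// in the scrape output above.
type runtimeSnapshot struct {
	Goroutines int    // compare with go_goroutines
	HeapAlloc  uint64 // compare with go_memstats_heap_alloc_bytes
	Mallocs    uint64 // compare with go_memstats_mallocs_total
	GoVersion  string // compare with the version label of go_info
}

// snapshot reads the current values from the Go runtime.
func snapshot() runtimeSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return runtimeSnapshot{
		Goroutines: runtime.NumGoroutine(),
		HeapAlloc:  m.HeapAlloc,
		Mallocs:    m.Mallocs,
		GoVersion:  runtime.Version(),
	}
}

func main() {
	s := snapshot()
	fmt.Printf("goroutines=%d heap_alloc_bytes=%d mallocs_total=%d version=%s\n",
		s.Goroutines, s.HeapAlloc, s.Mallocs, s.GoVersion)
}
```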

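All metrics are identified using the following format:

# HELP <metric name> <help text>    // HELP: describes the metric, that is, what it is and what it measures
# TYPE <metric name> <type>    // TYPE: the type of the metric
<metric name>{<label name>=<label value>, ...}  value    // the sample itself: <metric name>{label set} metric value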
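The sample line format can be sketched in a few lines of Go using only the standard library. The function formatSample below is a hypothetical helper written for this illustration, not part of the Prometheus client; in particular, its use of %q only approximates the escaping rules of the real exposition format.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// formatSample renders one sample line:
//   <metric name>{<label name>="<label value>",...} <value>
// Labels are sorted by name so the output is deterministic.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q adds the surrounding quotes and escapes special characters.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}

	var b strings.Builder
	b.WriteString(name)
	if len(pairs) > 0 {
		b.WriteString("{" + strings.Join(pairs, ",") + "}")
	}
	b.WriteString(" " + strconv.FormatFloat(value, 'g', -1, 64))
	return b.String()
}

func main() {
	fmt.Println("# HELP http_request_total The total number of processed events")
	fmt.Println("# TYPE http_request_total counter")
	fmt.Println(formatSample("http_request_total", nil, 5))
	fmt.Println(formatSample("promhttp_metric_handler_requests_total",
		map[string]string{"code": "200"}, 0))
}
```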

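2.3 Application-Level Metrics

1) The example above exposes only the basic built-in metrics. Application-level custom metrics have to be added separately. The following example exposes a counter named http_request_total that counts how many times the application is accessed; each access increments the counter by 1.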

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// 1. Define and register the metric (type, name, help text).
	//    promauto.NewCounter registers the custom metric for us.
	opsProcessed = promauto.NewCounter(prometheus.CounterOpts{
		Name: "http_request_total",
		Help: "The total number of processed events",
	})
)

// handleInterceptor takes a handler of type
//   type HandlerFunc func(ResponseWriter, *Request)
// and returns a new handler. Put your own logic in the returned function;
// h(w, r) then calls the user's own handler function.
func handleInterceptor(h http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// 2. Update the metric: each request routed to the "/" pattern adds 1.
		opsProcessed.Inc()
		h(w, r)
	}
}

func serviceHandler(writer http.ResponseWriter, request *http.Request) {
	if _, err := writer.Write([]byte("prometheus-client-practice hello world!")); err != nil {
		log.Printf("write response: %v", err)
	}
}

func main() {
	http.Handle("/metrics", promhttp.Handler())
	http.Handle("/", handleInterceptor(serviceHandler))
	// ListenAndServe only returns on failure, so log the error instead of dropping it.
	log.Fatal(http.ListenAndServe(":8080", nil))
}

By default, the promauto.NewCounter(...) function registers the metric for us:

// NewCounter works like the function of the same name in the prometheus package
// but it automatically registers the Counter with the
// prometheus.DefaultRegisterer. If the registration fails, NewCounter panics.
func NewCounter(opts prometheus.CounterOpts) prometheus.Counter {
	return With(prometheus.DefaultRegisterer).NewCounter(opts)
}

// NewCounter works like the function of the same name in the prometheus package
// but it automatically registers the Counter with the Factory's Registerer.
func (f Factory) NewCounter(opts prometheus.CounterOpts) prometheus.Counter {
	c := prometheus.NewCounter(opts)
	if f.r != nil {
		// Register the metric
		f.r.MustRegister(c)
	}
	return c
}

2) Run the following command to start the application:

go run main.go

3) Run the following command 5 times to access the application:

curl http://localhost:8080/

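4) Request http://localhost:8080/metrics again to view the exposed metrics. In addition to the basic built-in metrics from the first example, the output now contains our custom metric (http_request_total), including its help text, type information, metric name, and current value: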

......
# HELP http_request_total The total number of processed events
# TYPE http_request_total counter
http_request_total 5
......
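If you want to check the counter from a script or a smoke test rather than by eye, the scrape output is easy to scan. This is a minimal sketch using only the standard library; sampleValue is a hypothetical helper that handles only unlabeled samples, and the scrape text is a canned copy of the output above rather than a live HTTP response.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sampleValue scans text-format scrape output and returns the value of the
// first unlabeled sample whose name equals metric.
// Comment lines (# HELP, # TYPE) are skipped.
func sampleValue(scrape, metric string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(scrape))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != metric {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

// cannedScrape stands in for the body returned by /metrics.
const cannedScrape = `# HELP http_request_total The total number of processed events
# TYPE http_request_total counter
http_request_total 5
`

func main() {
	if v, ok := sampleValue(cannedScrape, "http_request_total"); ok {
		fmt.Println("http_request_total =", v)
	}
}
```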

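3. Scraping Application Metrics with Prometheus

The two examples above show how to expose application metrics with the Prometheus Golang library. The exposed metrics are plain text, however, so Prometheus Server is needed to scrape them, and Grafana may additionally be needed to visualize the data.

3.1 Packaging and Deploying the Application

1) A Golang application can generally use a Dockerfile of the following form (adjust as needed):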

# Build the manager binary
FROM golang:1.17.11 as builder

WORKDIR /workspace
# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum
RUN go env -w GO111MODULE=on
RUN go env -w GOPROXY=https://goproxy.cn,direct
# cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
RUN go mod download

# Copy the go source
COPY main.go main.go


# Build
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 GO111MODULE=on go build -a -o prometheus-client-practice main.go

# Use distroless as minimal base image to package the manager binary
# Refer to https://github.com/GoogleContainerTools/distroless for more details
FROM gcr.io/distroless/static:nonroot
WORKDIR /
COPY --from=builder /workspace/prometheus-client-practice .
USER nonroot:nonroot

ENTRYPOINT ["/prometheus-client-practice"]

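2) Build the application container image and push it to an image registry. This step is straightforward and is not covered here.

3) Define a Kubernetes resource according to the application type. Here we use a Deployment, for example: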

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-client-practice
  labels:
    app: prometheus-client-practice
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-client-practice
  template:
    metadata:
      labels:
        app: prometheus-client-practice
    spec:
      containers:
        - name: prometheus-client-practice
          image: monitor/prometheus-client-practice:0.0.1
          ports:
            - containerPort: 8080

4) A Kubernetes Service is also needed for service discovery and load balancing.

apiVersion: v1
kind: Service
metadata:
  name: prometheus-client-practice
  labels:
    app: prometheus-client-practice
spec:
  selector:
    app: prometheus-client-practice
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: 8080

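Note: The Service must carry a Label that identifies the application. The Label name does not have to be app, but a Label with a similar meaning must exist, because the ServiceMonitor resource is associated with the Service through the Service's Labels.

5) Submit these resource definitions to Kubernetes through the container cloud platform's graphical interface or directly with kubectl, then wait for them to be created.

3.2 Adding a Scrape Job

1) Add a ServiceMonitor so that Prometheus monitors the Service and scrapes its metrics.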

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-client-practice    # a unique name
  namespace: monitoring-system  # the namespace is fixed; do not change it
spec:
  endpoints:
  - interval: 30s
    # Name of the Port in the Service YAML that serves the Prometheus Exporter
    port: http
    # Path served by the Prometheus Exporter; defaults to /metrics if omitted
    path: /metrics
  # Namespace of the Service to monitor
  namespaceSelector:
    matchNames:
    - default
  # Label values of the Service to monitor, used to locate the target Service
  selector:
    matchLabels:
      app: prometheus-client-practice

Note: the value of port must be the value of spec/ports/name in the Service YAML.
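The way a matchLabels selector picks a Service can be sketched as a simple subset check: every key and value in the selector must appear in the Service's Labels, and extra Labels on the Service are ignored. The function below is a hypothetical illustration of that rule in plain Go, not code from Kubernetes or the Prometheus Operator, and the tier Label is invented for the example.

```go
package main

import "fmt"

// matchLabels reports whether every key/value pair in selector is also
// present in labels. Extra entries in labels do not affect the result.
func matchLabels(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"app": "prometheus-client-practice"}

	// A Service with the expected Label (plus an unrelated, made-up one).
	labeled := map[string]string{"app": "prometheus-client-practice", "tier": "demo"}
	// A Service without any Labels.
	unlabeled := map[string]string{}

	fmt.Println("labeled Service selected:", matchLabels(selector, labeled))
	fmt.Println("unlabeled Service selected:", matchLabels(selector, unlabeled))
}
```

This is why a Service without the expected Label never shows up as a scrape target, even when its ports are defined correctly.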

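2) Open the Prometheus UI and go to the Status -> Targets page. If the target for this application appears in the list with state UP, Prometheus Server is successfully scraping the application's metrics.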

4. Viewing Application Metrics

4.1 Viewing Application Metrics in the Prometheus UI

In the Prometheus UI, use a PromQL query such as http_request_total to look up the number of times the application has been accessed.
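The same query can also be issued programmatically through the Prometheus HTTP API's instant-query endpoint (/api/v1/query). The sketch below is one way to do it with the standard library only. It assumes a Prometheus Server reachable at http://localhost:9090, which is not stated in this article, and it handles only instant-vector results; the helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strconv"
	"time"
)

// queryResponse mirrors the parts of an instant-query response used here.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix timestamp, "value as string"]
		} `json:"result"`
	} `json:"data"`
}

// queryURL builds the instant-query URL for a PromQL expression.
func queryURL(base, expr string) string {
	return base + "/api/v1/query?" + url.Values{"query": {expr}}.Encode()
}

// firstValue extracts the value of the first series in the response body.
func firstValue(body []byte) (float64, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	if r.Status != "success" || len(r.Data.Result) == 0 {
		return 0, fmt.Errorf("no result (status %q)", r.Status)
	}
	s, ok := r.Data.Result[0].Value[1].(string)
	if !ok {
		return 0, fmt.Errorf("unexpected value type %T", r.Data.Result[0].Value[1])
	}
	return strconv.ParseFloat(s, 64)
}

func main() {
	// Assumed address; replace with the address of your Prometheus Server.
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(queryURL("http://localhost:9090", "http_request_total"))
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	v, err := firstValue(body)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println("http_request_total =", v)
}
```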

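4.2 Viewing Application Metrics in Grafana

The application metrics can also be viewed in Grafana.

Note: Dashboard templates can be found at https://grafana.com/grafana/dashboards/. The Dashboard used for the figure above has ID 240.

5. Summary

This article used two examples to show how to expose Golang metrics (basic built-in metrics and custom metrics) to the Prometheus monitoring service, and how to view the monitoring data in the Prometheus UI and Grafana.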
