Grafana Series (1): A Full-Stack Observability Demo Based on Grafana

Posted by 東風微鳴 on 2023-01-28

Original URL: https://www.cnblogs.com/east4ming/p/17070053.html

Reference:

https://github.com/grafana/intro-to-mlt

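This is the companion repository for a series of talks on the three pillars of observability in Grafana.

It takes the form of a self-contained Docker sandbox that includes all the components needed to run and experiment with the provided services on a local machine.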

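Grafana full-stack observability products

[Figure: Grafana full-stack observability]

Detailed observability transition diagram

[Figure: observability transition diagram]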

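Prerequisites

Overview

The demos in this series are based on the applications and code in this repository, which include:

- Docker Compose manifests for easy setup.
- A three-service application:
  - A service that requests data from a REST API server.
  - A REST API server that receives requests and uses a database to store/retrieve the data for those requests.
  - A Postgres database for storing/retrieving data.
- A Tempo instance for storing trace information.
- A Loki instance for storing log information.
- A Prometheus instance for storing metrics information.
- A Grafana instance for visualizing observability information.
- A Grafana Agent instance for receiving traces and generating metrics and logs from those traces.
- A Node Exporter instance for retrieving resource metrics from the local host.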

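Running the demo environment

Docker Compose downloads the required Docker images and then starts the demo environment. Data is emitted from the microservice application and stored in Loki, Tempo, and Prometheus. You can log in to the Grafana instance to visualize this data. To run the environment and log in:

Start a new command-line interface in your operating system and run: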
```
docker-compose up
```
Log in to the local Grafana instance at http://localhost:3000/. Note: this assumes that port 3000 is not already in use. If the port is not free, edit the docker-compose.yml file and change this line
```
- "3000:3000"
```
to some other free host port, for example:
```
- "3123:3000"
```
Open the MLT dashboard (MLT: Metrics/Logging/Tracing).
Use Grafana Explore to access the data sources.
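
Before changing the port mapping, it can help to check whether a host port is actually free. The snippet below is a minimal sketch using only the Python standard library. The helper names `is_port_free` and `first_free_port` are hypothetical and not part of the repository, and the check is a best-effort local bind test rather than a guarantee that Docker will be able to publish the port.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP bind to host:port succeeds.

    Hypothetical helper: a failed bind usually means another process
    already holds the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(candidates):
    """Return the first port in candidates that can be bound, or None."""
    for port in candidates:
        if is_port_free(port):
            return port
    return None


if __name__ == "__main__":
    # 3000 is Grafana's default port; 3123 is the alternative used in the text.
    print(first_free_port([3000, 3123]))
```

If the script prints a port other than 3000, use that value as the host side of the mapping, for example "3123:3000".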

Note:

Users in China can add a proxy to the services that need to be built, as follows (replace the example proxy address with your own):

  mythical-requester:
    build:
      context: ./source
      dockerfile: docker/Dockerfile
      args:
        HTTP_PROXY: http://192.168.2.9:7890
        HTTPS_PROXY: http://192.168.2.9:7890
        SERVICE: mythical-beasts-requester

  mythical-server:
    build:
      context: ./source
      dockerfile: docker/Dockerfile
      args:
        HTTP_PROXY: http://192.168.2.9:7890
        HTTPS_PROXY: http://192.168.2.9:7890              
        SERVICE: mythical-beasts-server

  prometheus:
    build: 
      context: ./prometheus
      args:
        HTTP_PROXY: http://192.168.2.9:7890
        HTTPS_PROXY: http://192.168.2.9:7890

Grafana

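Grafana is a visualization tool that lets you build dashboards from a variety of data sources. More information is available in the Grafana documentation.

The Grafana instance is described in the grafana section of the docker-compose.yml manifest.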

  # The Grafana dashboarding server.
  grafana:
    image: grafana/grafana
    volumes:
      - "./grafana/definitions:/var/lib/grafana/dashboards"
      - "./grafana/provisioning:/etc/grafana/provisioning"
    ports:
      - "3000:3000"
    environment:
      - GF_FEATURE_TOGGLES_ENABLE=tempoSearch,tempoServiceGraph

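It:

- Mounts two repository directories to provide pre-provisioned data sources (./grafana/provisioning/datasources.yaml) and pre-provisioned dashboards for correlating metrics, logs, and traces (./grafana/definitions/mlt.yaml).
- Exposes port 3000 for local login.
- Enables two Tempo features: span search and service graph support.

No custom configuration is used.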

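Reference:

Grafana Agent | Grafana Labs (grafana.com)

"It is commonly used as a tracing pipeline, offloading traces from the application and forwarding them to a storage backend. The Grafana Agent tracing stack is built using OpenTelemetry."

"Grafana Agent supports receiving traces in multiple formats: OTLP (OpenTelemetry), Jaeger, Zipkin, and OpenCensus."

Generate metrics from spans | Grafana Labs (grafana.com)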

Prometheus

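Prometheus is a backend store and service that scrapes (pulls) metrics data from a variety of sources. More information is available in the Prometheus documentation. In addition, Mimir provides long-term retention storage for Prometheus data; more information is available in the Mimir documentation.

The Prometheus instance is described in the prometheus section of the docker-compose.yml manifest.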

  prometheus:
    build: 
      context: ./prometheus
      args:
        HTTP_PROXY: http://192.168.2.9:7890
        HTTPS_PROXY: http://192.168.2.9:7890    
    ports:
      - "9090:9090"

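It is built from a modified Dockerfile in the prometheus directory. The build copies the configuration file into the new image and enables some features (including exemplar support, "--enable-feature=exemplar-storage") by modifying the command string used at startup. Prometheus exposes its main interface on port 9090.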

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

remote_read:
scrape_configs:
  # Scrape Prometheus' own metrics.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          group: 'prometheus'

  # Scrape from the Mythical Server service.
  - job_name: 'mythical-server'
    scrape_interval: 2s
    static_configs:
      - targets: ['mythical-server:4000']
        labels:
          group: 'mythical'

  # Scrape from the Mythical Requester service.
  - job_name: 'mythical-requester'
    scrape_interval: 2s
    static_configs:
      - targets: ['mythical-requester:4001']
        labels:
          group: 'mythical'

  # Scrape from the Node exporter, giving us resource usage.
  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['nodeexporter:9100']
        labels:
          group: 'resources'

  # Scrape from Grafana Agent, giving us metrics from traces it collects.
  - job_name: 'span-metrics'
    scrape_interval: 2s
    static_configs:
      - targets: ['agent:12348']
        labels:
          group: 'mythical'

  # Scrape Grafana Agent's own HTTP server on port 12345, giving us the agent's internal metrics.
  - job_name: 'agent-metrics'
    scrape_interval: 2s
    static_configs:
      - targets: ['agent:12345']
        labels:
          group: 'mythical'

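The configuration file (prometheus/prometheus.yml) defines several scrape jobs, including:

- Metrics from the Prometheus instance itself (job_name: 'prometheus').
- Metrics from the microservice application (job_name: 'mythical-server' and job_name: 'mythical-requester').
- Metrics from the installed Node Exporter instance (job_name: 'node').
- Metrics from Grafana Agent, derived from incoming trace data (job_name: 'span-metrics').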
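
In the configuration, most jobs set their own scrape_interval, while the 'prometheus' job does not and therefore inherits the global value. The sketch below illustrates that override rule with a hand-written Python dictionary that mirrors only the relevant fields of the file. The dictionary and the function name `effective_scrape_intervals` are assumptions made for illustration, not something the repository provides.

```python
# Simplified, hand-written mirror of prometheus/prometheus.yml.
# Assumption: only the fields needed for this illustration are included.
PROMETHEUS_CONFIG = {
    "global": {"scrape_interval": "15s"},
    "scrape_configs": [
        {"job_name": "prometheus"},
        {"job_name": "mythical-server", "scrape_interval": "2s"},
        {"job_name": "mythical-requester", "scrape_interval": "2s"},
        {"job_name": "node", "scrape_interval": "5s"},
        {"job_name": "span-metrics", "scrape_interval": "2s"},
        {"job_name": "agent-metrics", "scrape_interval": "2s"},
    ],
}


def effective_scrape_intervals(config):
    """Map each job name to the interval it is scraped at.

    A job-level scrape_interval overrides the global default; jobs
    without one inherit global.scrape_interval.
    """
    default = config["global"]["scrape_interval"]
    return {
        job["job_name"]: job.get("scrape_interval", default)
        for job in config["scrape_configs"]
    }


if __name__ == "__main__":
    for name, interval in effective_scrape_intervals(PROMETHEUS_CONFIG).items():
        print(f"{name}: {interval}")
```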

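References:

Exemplars storage | Prometheus Docs

"OpenMetrics introduces the ability for scrape targets to add exemplars to certain metrics. Exemplars are references to data outside of the MetricSet. A common use case are IDs of program traces."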
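
To make the idea of an exemplar more concrete, the sketch below pulls the exemplar out of a single OpenMetrics-style sample line, where the exemplar follows a `#` after the sample value. This is a simplified illustration and not a complete OpenMetrics parser. The metric name, the `traceID` label name, and the values in the sample line are made up for the example.

```python
import re

# Simplified pattern: "<sample> # {<exemplar labels>} <value> [<timestamp>]".
# Assumption: label values contain no '}' characters.
EXEMPLAR_RE = re.compile(
    r"^(?P<sample>.+?)\s+#\s+\{(?P<labels>[^}]*)\}"
    r"\s+(?P<value>\S+)(?:\s+(?P<ts>\S+))?$"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_exemplar(line):
    """Return the exemplar attached to a sample line, or None if there is none."""
    match = EXEMPLAR_RE.match(line.strip())
    if match is None:
        return None
    ts = match.group("ts")
    return {
        "sample": match.group("sample"),
        "labels": dict(LABEL_RE.findall(match.group("labels"))),
        "value": float(match.group("value")),
        "timestamp": float(ts) if ts is not None else None,
    }


if __name__ == "__main__":
    # Hypothetical sample line carrying a trace ID as its exemplar.
    line = (
        'http_request_duration_seconds_bucket{le="0.5"} 12 '
        '# {traceID="abc123"} 0.31 1674864000.0'
    )
    print(parse_exemplar(line))
```

The trace ID carried in the exemplar labels is what lets a dashboard jump from a point on a metrics graph to the matching trace.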

Loki

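Loki is a backend store for long-term log retention. More information is available in the Loki documentation.

The Loki instance is described in the loki section of the docker-compose.yml manifest.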

  loki:
    image: grafana/loki
    ports:
      - "3100:3100"

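This instance simply uses the latest available Loki image and exposes its interface on port 3100.

The microservice application sends its logs directly to the Loki instance in this environment via Loki's REST API.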
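
As a rough illustration of what sending a log line over the REST API involves, the sketch below builds the JSON body for Loki's push endpoint (/loki/api/v1/push) and posts it with the Python standard library. The function names and the `job: demo` label are hypothetical. The demo services use their own logging code, which is not shown in this article.

```python
import json
import time
import urllib.request


def build_push_payload(labels, line, timestamp_ns=None):
    """Build a Loki push body with one stream holding one log line.

    Loki expects each entry as [<unix time in nanoseconds, as a string>, <line>].
    """
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    return {
        "streams": [
            {"stream": dict(labels), "values": [[str(timestamp_ns), line]]}
        ]
    }


def push_log(base_url, labels, line):
    """POST one log line to Loki; requires a reachable Loki instance."""
    body = json.dumps(build_push_payload(labels, line)).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/loki/api/v1/push",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the payload; call push_log("http://localhost:3100", ...)
    # once the demo environment is running.
    print(json.dumps(build_push_payload({"job": "demo"}, "hello from the host")))
```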

Tempo

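Tempo is a backend store for long-term trace retention. More information is available in the Tempo documentation.

The Tempo instance is described in the tempo section of the docker-compose.yml manifest.

The Tempo service loads a configuration file (tempo/tempo.yaml) that initializes the service with some sensible defaults and allows it to receive traces in a variety of formats.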

  tempo:
    image: grafana/tempo:1.2.1
    ports:
      - "3200:3200"
      - "4317:4317"
      - "55680:55680"
      - "55681:55681"
      - "14250:14250"
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ./tempo/tempo.yaml:/etc/tempo.yaml

server:
  http_listen_port: 3200

distributor:
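  receivers:                           # This configuration will listen on all ports and protocols that Tempo is capable of.
    jaeger:                            # More configuration information can be found in the OpenTelemetry Collector
      protocols:                       # here: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
        thrift_http:                   #
        grpc:                          # For a production deployment you should only enable the receivers you need!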
        thrift_binary:
        thrift_compact:
    otlp:
      protocols:
        http:
        grpc:

ingester:
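  trace_idle_period: 10s               # The length of time after a trace has not received spans to consider it complete and flush it.
  max_block_bytes: 1_000_000           # Cut the head block when it hits this size or...
  max_block_duration: 5m               #   this much time passes.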

compactor:
  compaction:
    compaction_window: 1h              # Blocks in this time window will be compacted together.
    max_block_bytes: 100_000_000       # Maximum size of compacted blocks.
    block_retention: 1h
    compacted_block_retention: 10m

storage:
  trace:
    backend: local                     # The backend configuration to use.
    block:
      bloom_filter_false_positive: .05 # Lower values create larger filters but yield fewer false positives.
      index_downsample_bytes: 1000     # Number of bytes per index record.
      encoding: zstd                   # Block encoding/compression. Options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd
    wal:
      path: /tmp/tempo/wal             # Where to store the WAL locally.
      encoding: none                   # WAL encoding/compression. Options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd
    local:
      path: /tmp/tempo/blocks
    pool:
      max_workers: 100                 # The worker pool determines the number of parallel requests to the object store backend.
      queue_depth: 10000

search_enabled: true
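
The three ingester settings in this file describe two decisions: when a trace counts as finished, and when the current head block is cut. The sketch below is a conceptual model of those rules as the configuration comments describe them. It is not Tempo's implementation: the class and function names are hypothetical, and the exact boundary behavior (for example `>=` versus `>`) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngesterLimits:
    """Values copied from the ingester section of tempo/tempo.yaml."""

    trace_idle_period_s: float = 10.0    # trace_idle_period: 10s
    max_block_bytes: int = 1_000_000     # max_block_bytes: 1_000_000
    max_block_duration_s: float = 300.0  # max_block_duration: 5m


def trace_is_complete(seconds_since_last_span, limits):
    """A trace is treated as complete once it has been idle long enough."""
    return seconds_since_last_span >= limits.trace_idle_period_s


def should_cut_head_block(block_bytes, block_age_s, limits):
    """The head block is cut when it is large enough OR old enough."""
    return (
        block_bytes >= limits.max_block_bytes
        or block_age_s >= limits.max_block_duration_s
    )


if __name__ == "__main__":
    limits = IngesterLimits()
    print(trace_is_complete(12, limits))
    print(should_cut_head_block(200_000, 301, limits))
```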

Grafana Agent

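Grafana Agent is a locally installed agent that acts as:

- A Prometheus scraping service.
- A Tempo backend service receiver and trace span processor.
- A Promtail (Loki log receiver) instance.

[Figure: Span metrics overview]

Grafana Agent has remote-write capabilities that allow it to send metrics, logs, and trace data to backend stores (such as Mimir, Loki, and Tempo). More information is available in the Grafana Agent documentation.

Its main role in this environment is to receive trace spans from the microservice application and process them to extract metrics and log information before storing them in the final backend stores.

Its configuration file can be found at agent/config.yaml.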

  agent:
    image: grafana/agent:v0.24.0
    ports:
      - "12347:12345"
      - "12348:12348"
      - "6832:6832"
      - "55679:55679"
    volumes:
      - "${PWD}/agent/config.yaml:/etc/agent/agent.yaml"
    command: [
      "-config.file=/etc/agent/agent.yaml",
      "-server.http.address=0.0.0.0:12345",
    ]

server:
  log_level: debug

# Configure a log ingestion endpoint, used for the automatic logging feature.
logs:
    configs:
    - name: loki
      clients:
        - url: http://loki:3100/loki/api/v1/push
          external_labels:
            job: agent
    positions_directory: /tmp/positions

# Configure a Tempo instance to receive traces from the microservices.
traces:
  configs:
  - name: latencyEndpoint
    # Receive traces in Jaeger format on port 6832.
    receivers:
      jaeger:
        protocols:
          thrift_binary:
            endpoint: "0.0.0.0:6832"
    # Send batches of trace data to the Tempo instance.
    remote_write:
      - endpoint: tempo:55680
        insecure: true
    # Generate Prometheus metrics from incoming trace spans.
    spanmetrics:
      # Add the http.method and http.target span tags as labels on the metrics.
      dimensions:
        - name: http.method
        - name: http.target
      # Expose these metrics on port 12348.
      handler_endpoint: 0.0.0.0:12348
    # Automatically generate logs from incoming trace data.
    automatic_logging:
      # Use the logs instance defined at the start of the configuration file.
      backend: logs_instance
      logs_instance_name: loki
      # Log one line per root span (i.e. one line per trace).
      roots: true
      processes: false
      spans: false
      # Add the http.method, http.target, and http.status_code span tags to the log line, if present.
      span_attributes:
        - http.method
        - http.target
        - http.status_code
      # Force the trace ID to be set as `traceId`.
      overrides:
        trace_id_key: "traceId"
    # Enable service graphs.
    service_graphs:
      enabled: true
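
The spanmetrics and automatic_logging settings can be pictured as two small transformations over incoming spans. The sketch below is a conceptual illustration only. The span dictionaries, service names, and function names are hypothetical, and the real Grafana Agent emits more fields and uses its own output format.

```python
from collections import Counter

DIMENSIONS = ("http.method", "http.target")  # spanmetrics.dimensions
LOG_ATTRIBUTES = ("http.method", "http.target", "http.status_code")  # span_attributes
TRACE_ID_KEY = "traceId"  # overrides.trace_id_key


def span_call_counts(spans):
    """Count spans per (service, span name, dimension values)."""
    counts = Counter()
    for span in spans:
        attrs = span.get("attributes", {})
        key = (span["service"], span["name"]) + tuple(
            attrs.get(dimension, "") for dimension in DIMENSIONS
        )
        counts[key] += 1
    return counts


def root_span_log_lines(spans):
    """Build one key=value log line per root span (roots: true)."""
    lines = []
    for span in spans:
        if span.get("parent_id") is not None:
            continue  # spans: false, so non-root spans are not logged
        attrs = span.get("attributes", {})
        fields = [f"{TRACE_ID_KEY}={span['trace_id']}"]
        fields += [f"{name}={attrs[name]}" for name in LOG_ATTRIBUTES if name in attrs]
        lines.append(" ".join(fields))
    return lines


if __name__ == "__main__":
    # Hypothetical spans: one root span and one child span of the same trace.
    spans = [
        {"trace_id": "t1", "parent_id": None, "service": "mythical-requester",
         "name": "requester",
         "attributes": {"http.method": "GET", "http.target": "/unicorn",
                        "http.status_code": 200}},
        {"trace_id": "t1", "parent_id": "s1", "service": "mythical-server",
         "name": "GET /unicorn",
         "attributes": {"http.method": "GET", "http.target": "/unicorn"}},
    ]
    print(span_call_counts(spans))
    print(root_span_log_lines(spans))
```

Counting spans by a fixed set of dimensions is what turns traces into metrics that Prometheus can scrape, and keeping the trace ID in each log line is what allows a jump from a log entry to its trace.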

Glossary

English	Chinese	Notes
Exemplars	範例
Derived fields	衍生欄位
Metrics	度量
Logging	日誌
Tracing	跟蹤
observability	可觀察性
span search	跨度搜尋	Tempo feature; requires Grafana Agent
service graph	服務圖支援	Tempo feature; requires Grafana Agent
scrape	刮削	Prometheus term