Preface
As the number of components, targets, and metrics that Prometheus monitors keeps growing, so do its demands on compute, and its storage footprint grows with them.
In this situation, when you set out to optimize Prometheus performance and storage usage, the first thing that comes to mind is probably one of the Prometheus-compatible storage solutions, such as Thanos, VictoriaMetrics (VM), or Mimir. In practice, though, while centralized storage, long-term storage, downsampling, and compression can alleviate the problem to a degree, they treat the symptoms rather than the root cause.
- The root cause is that the number of series is simply too large.
- The real cure is to reduce the series count. There are two ways to do that:
  - Prometheus performance tuning: solving the high-cardinality problem
  - Based on actual usage, keep only the metrics that dashboards (Grafana Dashboards) and alerts (Prometheus rules) actually use.
This post focuses on the second approach: how do you trim Prometheus metrics and storage usage based on actual usage?
Approach
- Inventory all metric names currently stored in Prometheus;
- Inventory all metric names used for display, i.e. every metric referenced by Grafana dashboards;
- Inventory all metric names used for alerting, i.e. every metric referenced in the Prometheus rule configuration;
- (Optional) Inventory all metric names used for troubleshooting, i.e. metrics frequently queried in the Prometheus UI;
- Through relabeling, in metric_relabel_configs or write_relabel_configs, keep only the metrics from points 2-4, drastically reducing the number of series Prometheus has to store.
To put this idea into practice, you can use mimirtool, published by Grafana Labs.
Here is a before-and-after comparison from my own setup, to show how dramatic the effect can be:
- Before trimming: 270336 active series
- After trimming: 61055 active series
- Net effect: roughly a 4.4x reduction!
Grafana Mimirtool
Grafana Mimir is a long-term storage solution for Prometheus that uses object storage, evolved from Cortex. Officially it claims to support ingesting, storing, and querying hundreds of millions of series.
Grafana Mimirtool is a utility released with Mimir that can be used on its own.
Grafana Mimirtool can extract metrics from:
- Grafana Dashboards in a Grafana instance (via the Grafana API)
- Prometheus alerting and recording rules in a Mimir instance
- Grafana Dashboard JSON files
- Prometheus alerting and recording rule YAML files
Grafana Mimirtool can then compare these extracted metrics against the active series in a Prometheus (or Grafana Cloud Prometheus) instance and output a list of used metrics and a list of unused metrics.
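For orientation, these map onto four mimirtool analyze subcommands, each of which appears in the walkthrough below:
# extract metrics from a live Grafana instance via its API (step 1)
mimirtool analyze grafana
# extract metrics from dashboard JSON files on disk (step 1, optional)
mimirtool analyze dashboard
# extract metrics from Prometheus rule YAML files (step 2)
mimirtool analyze rule-file
# compare everything extracted against the active series in Prometheus (step 3)
mimirtool analyze prometheus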
Trimming Prometheus Metrics in Practice
Assumptions
Assume that:
- Prometheus was installed via kube-prometheus-stack
- Grafana is installed and serves as the display layer
- The corresponding alerting rules are already configured
- Beyond these, no other metrics need to be retained
Prerequisites
- Grafana Mimirtool: download the mimirtool build for your platform from the releases page and it is ready to use;
- A Grafana API token has been created;
- Prometheus is installed and configured.
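A minimal installation sketch for Linux amd64 follows; the version tag is an assumption (check the grafana/mimir releases page for the current one), as is the asset naming:
# Download the mimirtool binary from the grafana/mimir releases page
# (version and asset name are illustrative -- pick the latest release)
VERSION=2.11.0
curl -fLo mimirtool "https://github.com/grafana/mimir/releases/download/mimir-${VERSION}/mimirtool-linux-amd64"
chmod +x mimirtool
sudo mv mimirtool /usr/local/bin/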
Step 1: Analyze the Metrics Used by Grafana Dashboards
Via the Grafana API
Concretely:
# Analyze the metrics Grafana uses, via the Grafana API
# Prerequisite: an API key has been created in Grafana
mimirtool analyze grafana --address http://172.16.0.20:32651 --key=eyJrIjoiYjBWMGVoTHZTY3BnM3V5UzNVem9iWDBDSG5sdFRxRVoiLCJuIjoibWltaXJ0b29sIiwiaWQiOjF9
Note:
- http://172.16.0.20:32651 is the Grafana address
- --key=eyJr... is the Grafana API token, created in the Grafana UI (e.g. under Configuration → API keys)
The result is a file named metrics-in-grafana.json, outlined as follows:
{
"metricsUsed": [
":node_memory_MemAvailable_bytes:sum",
"alertmanager_alerts",
"alertmanager_alerts_invalid_total",
"alertmanager_alerts_received_total",
"alertmanager_notification_latency_seconds_bucket",
"alertmanager_notification_latency_seconds_count",
"alertmanager_notification_latency_seconds_sum",
"alertmanager_notifications_failed_total",
"alertmanager_notifications_total",
"cluster",
"cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits",
"cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests",
"cluster:namespace:pod_memory:active:kube_pod_container_resource_limits",
"cluster:namespace:pod_memory:active:kube_pod_container_resource_requests",
"cluster:node_cpu:ratio_rate5m",
"container_cpu_cfs_periods_total",
"container_cpu_cfs_throttled_periods_total",
"..."
],
"dashboards": [
{
"slug": "",
"uid": "alertmanager-overview",
"title": "Alertmanager / Overview",
"metrics": [
"alertmanager_alerts",
"alertmanager_alerts_invalid_total",
"alertmanager_alerts_received_total",
"alertmanager_notification_latency_seconds_bucket",
"alertmanager_notification_latency_seconds_count",
"alertmanager_notification_latency_seconds_sum",
"alertmanager_notifications_failed_total",
"alertmanager_notifications_total"
],
"parse_errors": null
},
{
"slug": "",
"uid": "c2f4e12cdf69feb95caa41a5a1b423d9",
"title": "etcd",
"metrics": [
"etcd_disk_backend_commit_duration_seconds_bucket",
"etcd_disk_wal_fsync_duration_seconds_bucket",
"etcd_mvcc_db_total_size_in_bytes",
"etcd_network_client_grpc_received_bytes_total",
"etcd_network_client_grpc_sent_bytes_total",
"etcd_network_peer_received_bytes_total",
"etcd_network_peer_sent_bytes_total",
"etcd_server_has_leader",
"etcd_server_leader_changes_seen_total",
"etcd_server_proposals_applied_total",
"etcd_server_proposals_committed_total",
"etcd_server_proposals_failed_total",
"etcd_server_proposals_pending",
"grpc_server_handled_total",
"grpc_server_started_total",
"process_resident_memory_bytes"
],
"parse_errors": null
},
{...}
]
}
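As a quick sanity check, jq can report how many distinct metric names the dashboards reference:
# count the metric names extracted from all dashboards
jq '.metricsUsed | length' metrics-in-grafana.json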
(Optional) Via Grafana Dashboard JSON files
If you cannot create a Grafana API token, you can still run the analysis as long as you have the Grafana dashboard JSON files, for example:
# Analyze the metrics Grafana uses, from dashboard JSON files
mimirtool analyze dashboard grafana_dashboards/blackboxexporter-probe.json
mimirtool analyze dashboard grafana_dashboards/es.json
The resulting JSON has the same structure as in the previous section, so it is not repeated here.
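analyze dashboard takes multiple file paths, so if you have a directory full of dashboard exports, a shell glob should cover them in one run:
mimirtool analyze dashboard grafana_dashboards/*.json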
Step 2: Analyze the Metrics Used by Prometheus Alerting and Recording Rules
Concretely:
# (Optional) copy the rule files in use to the local machine via kubectl cp
kubectl cp <prompod>:/etc/prometheus/rules/<releasename>-kube-prometheus-st-prometheus-rulefiles-0 -c prometheus ./kube-prometheus-stack/rulefiles/
# Analyze the metrics used by Prometheus rules (covering both recording rules and alerting rules) from the rule files
mimirtool analyze rule-file ./kube-prometheus-stack/rulefiles/*
The result, metrics-in-ruler.json, looks like this:
{
"metricsUsed": [
"ALERTS",
"aggregator_unavailable_apiservice",
"aggregator_unavailable_apiservice_total",
"apiserver_client_certificate_expiration_seconds_bucket",
"apiserver_client_certificate_expiration_seconds_count",
"apiserver_request_terminations_total",
"apiserver_request_total",
"blackbox_exporter_config_last_reload_successful",
"..."
],
"ruleGroups": [
{
"namspace": "default-monitor-kube-prometheus-st-kubernetes-apps-ae2b16e5-41d8-4069-9297-075c28c6969e",
"name": "kubernetes-apps",
"metrics": [
"kube_daemonset_status_current_number_scheduled",
"kube_daemonset_status_desired_number_scheduled",
"kube_daemonset_status_number_available",
"kube_daemonset_status_number_misscheduled",
"kube_daemonset_status_updated_number_scheduled",
"..."
],
"parse_errors": null
},
{
"namspace": "default-monitor-kube-prometheus-st-kubernetes-resources-ccb4a7bc-f2a0-4fe4-87f7-0b000468f18f",
"name": "kubernetes-resources",
"metrics": [
"container_cpu_cfs_periods_total",
"container_cpu_cfs_throttled_periods_total",
"kube_node_status_allocatable",
"kube_resourcequota",
"namespace_cpu:kube_pod_container_resource_requests:sum",
"namespace_memory:kube_pod_container_resource_requests:sum"
],
"parse_errors": null
},
{...}
]
}
Step 3: Identify the Metrics Not Being Used
Concretely:
# Overall analysis: what Prometheus scrapes vs. what is displayed (Grafana Dashboards) plus recorded and alerted on (rule files)
mimirtool analyze prometheus --address=http://172.16.0.20:30090/ --grafana-metrics-file="metrics-in-grafana.json" --ruler-metrics-file="metrics-in-ruler.json"
Note:
- --address=http://172.16.0.20:30090/ is the Prometheus address
- --grafana-metrics-file="metrics-in-grafana.json" is the JSON file produced in step 1
- --ruler-metrics-file="metrics-in-ruler.json" is the JSON file produced in step 2
The output, prometheus-metrics.json, looks like this:
{
"total_active_series": 270336,
"in_use_active_series": 61055,
"additional_active_series": 209281,
"in_use_metric_counts": [
{
"metric": "rest_client_request_duration_seconds_bucket",
"count": 8855,
"job_counts": [
{
"job": "kubelet",
"count": 4840
},
{
"job": "kube-controller-manager",
"count": 1958
},
{...}
]
},
{
"metric": "grpc_server_handled_total",
"count": 4394,
"job_counts": [
{
"job": "kube-etcd",
"count": 4386
},
{
"job": "default/kubernetes-ebao-ebaoops-pods",
"count": 8
}
]
},
{...}
],
"additional_metric_counts": [
{
"metric": "rest_client_rate_limiter_duration_seconds_bucket",
"count": 81917,
"job_counts": [
{
"job": "kubelet",
"count": 53966
},
{
"job": "kube-proxy",
"count": 23595
},
{
"job": "kube-scheduler",
"count": 2398
},
{
"job": "kube-controller-manager",
"count": 1958
}
]
},
{
"metric": "rest_client_rate_limiter_duration_seconds_count",
"count": 7447,
"job_counts": [
{
"job": "kubelet",
"count": 4906
},
{
"job": "kube-proxy",
"count": 2145
},
{
"job": "kube-scheduler",
"count": 218
},
{
"job": "kube-controller-manager",
"count": 178
}
]
},
{...}
]
}
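The output above suggests additional_metric_counts is sorted by series count, so jq gives a quick view of the biggest unused offenders; a small sketch:
# top 10 unused metrics by active series count
jq -r '.additional_metric_counts[:10][] | "\(.count)\t\(.metric)"' prometheus-metrics.json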
Step 4: keep Only the Metrics That Are Used
Configuring write_relabel_configs
If you use remote_write, simply configure a keep relabel rule in the write_relabel_configs section; quick and effective.
You can first use the jq command to produce the list of all metric names that need to be kept:
jq '.metricsUsed' metrics-in-grafana.json \
| tr -d '", ' \
| sed '1d;$d' \
| grep -v 'grafanacloud*' \
| paste -s -d '|' -
The output looks something like this:
instance:node_cpu_utilisation:rate1m|instance:node_load1_per_cpu:ratio|instance:node_memory_utilisation:ratio|instance:node_network_receive_bytes_excluding_lo:rate1m|instance:node_network_receive_drop_excluding_lo:rate1m|instance:node_network_transmit_bytes_excluding_lo:rate1m|instance:node_network_transmit_drop_excluding_lo:rate1m|instance:node_vmstat_pgmajfault:rate1m|instance_device:node_disk_io_time_seconds:rate1m|instance_device:node_disk_io_time_weighted_seconds:rate1m|node_cpu_seconds_total|node_disk_io_time_seconds_total|node_disk_read_bytes_total|node_disk_written_bytes_total|node_filesystem_avail_bytes|node_filesystem_size_bytes|node_load1|node_load15|node_load5|node_memory_Buffers_bytes|node_memory_Cached_bytes|node_memory_MemAvailable_bytes|node_memory_MemFree_bytes|node_memory_MemTotal_bytes|node_network_receive_bytes_total|node_network_transmit_bytes_total|node_uname_info|up
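Note that the pipeline above only covers the dashboard metrics. To also retain everything the rules need, a variant can merge both analysis files before building the regex; a sketch:
# merge dashboard and rule metrics into a single keep-regex
jq -r '.metricsUsed[]' metrics-in-grafana.json metrics-in-ruler.json \
    | sort -u \
    | grep -v '^grafanacloud' \
    | paste -s -d '|' -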
Then configure the keep relabel rule in the write_relabel_configs section:
remote_write:
  - url: <remote_write endpoint>
    basic_auth:
      username: <as needed>
      password: <as needed>
    write_relabel_configs:
      - source_labels: [__name__]
        regex: instance:node_cpu_utilisation:rate1m|instance:node_load1_per_cpu:ratio|instance:node_memory_utilisation:ratio|instance:node_network_receive_bytes_excluding_lo:rate1m|instance:node_network_receive_drop_excluding_lo:rate1m|instance:node_network_transmit_bytes_excluding_lo:rate1m|instance:node_network_transmit_drop_excluding_lo:rate1m|instance:node_vmstat_pgmajfault:rate1m|instance_device:node_disk_io_time_seconds:rate1m|instance_device:node_disk_io_time_weighted_seconds:rate1m|node_cpu_seconds_total|node_disk_io_time_seconds_total|node_disk_read_bytes_total|node_disk_written_bytes_total|node_filesystem_avail_bytes|node_filesystem_size_bytes|node_load1|node_load15|node_load5|node_memory_Buffers_bytes|node_memory_Cached_bytes|node_memory_MemAvailable_bytes|node_memory_MemFree_bytes|node_memory_MemTotal_bytes|node_network_receive_bytes_total|node_network_transmit_bytes_total|node_uname_info|up
        action: keep
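Note that Prometheus fully anchors relabel regexes, so the alternation above matches complete metric names only; there is no need to add ^ and $ yourself.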
Configuring metric_relabel_configs
If you are not using remote_write, then the only place to configure this is the metric_relabel_configs section.
Take the etcd job as an example (raw Prometheus configuration shown here; adjust as needed if you use the Prometheus Operator):
- job_name: serviceMonitor/default/monitor-kube-prometheus-st-kube-etcd/0
  honor_labels: false
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - kube-system
  scheme: https
  tls_config:
    insecure_skip_verify: true
    ca_file: /etc/prometheus/secrets/etcd-certs/ca.crt
    cert_file: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
    key_file: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
  relabel_configs:
    - source_labels:
        - job
      target_label: __tmp_prometheus_job_name
    - ...
  metric_relabel_configs:
    - source_labels: [__name__]
      regex: etcd_disk_backend_commit_duration_seconds_bucket|etcd_disk_wal_fsync_duration_seconds_bucket|etcd_mvcc_db_total_size_in_bytes|etcd_network_client_grpc_received_bytes_total|etcd_network_client_grpc_sent_bytes_total|etcd_network_peer_received_bytes_total|etcd_network_peer_sent_bytes_total|etcd_server_has_leader|etcd_server_leader_changes_seen_total|etcd_server_proposals_applied_total|etcd_server_proposals_committed_total|etcd_server_proposals_failed_total|etcd_server_proposals_pending|grpc_server_handled_total|grpc_server_started_total|process_resident_memory_bytes|etcd_http_failed_total|etcd_http_received_total|etcd_http_successful_duration_seconds_bucket|etcd_network_peer_round_trip_time_seconds_bucket|grpc_server_handling_seconds_bucket|up
      action: keep
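After editing, it is worth validating the configuration and reloading Prometheus. A sketch for a standalone Prometheus started with --web.enable-lifecycle (with the Prometheus Operator, the configuration is generated and reloaded for you):
# validate the edited configuration file
promtool check config prometheus.yml
# trigger a live reload (requires --web.enable-lifecycle)
curl -X POST http://172.16.0.20:30090/-/reload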
Using drop Instead of keep
Likewise, using drop instead of keep works too; a small sketch follows.
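As an illustrative sketch, instead of an allowlist you could drop just the heaviest unused metric family found in step 3; extend the regex as needed:
metric_relabel_configs:
  - source_labels: [__name__]
    # rest_client_rate_limiter_duration_seconds_* was the largest unused family above
    regex: rest_client_rate_limiter_duration_seconds_.*
    action: drop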
Summary
This post covered the motivation for trimming Prometheus metrics, showed how to use the mimirtool analyze commands to determine which metrics Grafana Dashboards and Prometheus Rules actually use, used analyze prometheus to compare the used and unused active series against display and alerting needs, and finally configured Prometheus to keep only the metrics in use.
In this exercise the trimming ratio came to roughly 4.4x, which is a very noticeable effect. Well worth a try.
References
- grafana/mimir: Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus. (github.com)
- Analyzing and reducing metrics usage with Grafana Mimirtool | Grafana Cloud documentation
Where three walk together, one can be my teacher; knowledge shared is for the good of all. Written by the Dongfeng Weiming technical blog, EWhisper.cn.