For more content like this, follow the WeChat official account: 後端技術小屋
Understanding ClickHouse Cluster Monitoring in One Article
As the saying goes, provisions go before the troops: before ClickHouse reaches production, we should have a monitoring plan in place, covering metric collection, alerting rules, and graphical dashboards. With comprehensive, effective monitoring, it is as if we had eyes and ears everywhere: we can detect any anomaly online in time and, when necessary, intervene early to head off production incidents.
The monitoring stack commonly used in the industry is built on the Prometheus + Grafana ecosystem. This article describes a monitoring setup composed of clickhouse-exporter (plus node-exporter), Prometheus, and Grafana.
The figure above is a schematic of the monitoring setup.
- clickhouse-server maintains four system tables that record internal metrics: system.metrics, system.asynchronous_metrics, system.events, and system.parts.
- clickhouse-exporter (https://github.com/ClickHouse/clickhouse_exporter) is an open-source component for collecting ClickHouse metrics: it periodically queries the system tables in clickhouse-server, converts the rows into monitoring metrics, and exposes them to Prometheus over an HTTP endpoint.
- node-exporter (https://github.com/prometheus/node_exporter) is an open-source component for collecting hardware and operating-system metrics.
- Prometheus periodically scrapes the metrics exposed by clickhouse-exporter, evaluates whether any alerting condition is triggered, and if so pushes the alert to Alertmanager.
- DBAs can watch the cluster's live state through Grafana dashboards.
- DBAs can configure notification channels in Alertmanager, such as email, WeChat Work, or phone calls.
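To make the exporter's role in this pipeline concrete, here is a minimal Python sketch of how rows read from system.metrics could be turned into Prometheus text-exposition lines. This is an illustration only: the real exporter is written in Go, also emits `# HELP`/`# TYPE` comments, and handles the events and asynchronous-metrics tables separately; the rows below are hypothetical.

```python
import re

def to_prometheus_lines(rows):
    """Convert (metric_name, value) rows from system.metrics into
    Prometheus text-exposition lines prefixed with 'clickhouse_'.
    Simplified sketch: no HELP/TYPE comments, no labels."""
    lines = []
    for name, value in rows:
        # CamelCase -> snake_case, e.g. 'TCPConnection' -> 'tcp_connection'
        snake = re.sub(r'(?<!^)(?=[A-Z][a-z])|(?<=[a-z0-9])(?=[A-Z])', '_', name).lower()
        lines.append(f"clickhouse_{snake} {value}")
    return lines

# e.g. to_prometheus_lines([("Query", 3)]) -> ["clickhouse_query 3"]
```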
1 Deployment and Configuration
1.1 clickhouse-server
Our production version is 20.3.8; deploying it per the official documentation is sufficient.
1.2 clickhouse-exporter
clickhouse-exporter is normally deployed on the same machine as clickhouse-server.
First, download the latest code and build it (Go must be installed):
git clone https://github.com/ClickHouse/clickhouse_exporter
cd clickhouse_exporter
go mod init
go mod vendor
go build
ls ./clickhouse_exporter
Then start it:
export CLICKHOUSE_USER="user"
export CLICKHOUSE_PASSWORD="password"
nohup ./clickhouse_exporter -scrape_uri=http://localhost:port/ >nohup.log 2>&1 &
Finally, check that metrics are being collected correctly:
> curl localhost:9116/metrics | head
# TYPE clickhouse_arena_alloc_bytes_total counter
clickhouse_arena_alloc_bytes_total 9.799096840192e+12
# HELP clickhouse_arena_alloc_chunks_total Number of ArenaAllocChunks total processed
# TYPE clickhouse_arena_alloc_chunks_total counter
clickhouse_arena_alloc_chunks_total 2.29782524e+08
# HELP clickhouse_background_move_pool_task Number of BackgroundMovePoolTask currently processed
# TYPE clickhouse_background_move_pool_task gauge
clickhouse_background_move_pool_task 0
# HELP clickhouse_background_pool_task Number of BackgroundPoolTask currently processed
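The output above is in the Prometheus text exposition format: `# HELP`/`# TYPE` comment lines followed by `name value` samples. As a rough illustration, the following Python sketch parses such output into a dict; label-carrying series are skipped for brevity, and a real consumer would use an official Prometheus client library instead.

```python
def parse_metrics(text):
    """Parse Prometheus text-exposition output (as returned by
    `curl localhost:9116/metrics`) into {metric_name: float_value}.
    Minimal sketch: ignores HELP/TYPE comments and labeled series."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue                      # skip comments and blanks
        name, _, value = line.partition(' ')
        if '{' in name:
            continue                      # skip labeled series in this sketch
        metrics[name] = float(value)
    return metrics

sample = """\
# TYPE clickhouse_background_move_pool_task gauge
clickhouse_background_move_pool_task 0
clickhouse_arena_alloc_chunks_total 2.29782524e+08
"""
parsed = parse_metrics(sample)  # {'clickhouse_background_move_pool_task': 0.0, ...}
```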
1.3 node-exporter
node-exporter must be deployed on the same machine as clickhouse-server.
First, download the latest code and build it:
git clone https://github.com/prometheus/node_exporter
cd node_exporter
make build
ls ./node_exporter
Then start it:
nohup ./node_exporter > nohup.log 2>&1 &
Finally, check that metrics are being collected correctly:
> curl localhost:9100/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 6.3563e-05
go_gc_duration_seconds{quantile="0.25"} 7.4746e-05
go_gc_duration_seconds{quantile="0.5"} 9.0556e-05
go_gc_duration_seconds{quantile="0.75"} 0.000110677
go_gc_duration_seconds{quantile="1"} 0.004362325
go_gc_duration_seconds_sum 28.451282046
go_gc_duration_seconds_count 223479
...
1.4 prometheus
Edit the Prometheus configuration file to add the Alertmanager address and the clickhouse-exporter targets.
An example prometheus.yml:
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
- ./rules/*.rules

# Scrape configurations
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'clickhouse'
  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.
  static_configs:
  - targets: ['clickhouseexporter1:9116', 'clickhouseexporter2:9116', ...]
An example *.rules file:
groups:
- name: qps_too_high
  rules:
  - alert: ClickHouseQpsTooHigh
    expr: rate(clickhouse_query_total[1m]) > 100
    for: 2m
    labels:
      job: clickhouse-server
      severity: critical
    annotations:
      summary: "clickhouse qps exceeds threshold"
      description: "clickhouse qps exceeded the threshold (100); current qps: {{ $value }}"
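The `expr` above uses PromQL's rate(), which turns a monotonically increasing counter such as clickhouse_query_total into a per-second rate over the lookback window, and `for: 2m` means the condition must hold continuously for two minutes before the alert fires. A rough Python equivalent of rate(), ignoring counter resets and Prometheus's range-boundary extrapolation, with made-up sample values:

```python
def prom_rate(samples):
    """Approximate PromQL rate(): per-second increase of a counter,
    given (timestamp_seconds, value) samples inside the range window.
    Simplified: no counter-reset handling, no extrapolation."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# clickhouse_query_total sampled over one minute (hypothetical values):
samples = [(0, 1000), (30, 7000), (60, 13000)]
qps = prom_rate(samples)  # (13000 - 1000) / 60 = 200 queries/s
```

At 200 qps the 100-qps threshold is exceeded, so if this persisted for two minutes the rule above would fire.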
Start Prometheus:
nohup ./prometheus --config.file=/path/to/config --storage.tsdb.path=/path/to/storage --web.external-url=prometheus --web.enable-admin-api --web.enable-lifecycle --log.level=warn >nohup.log 2>&1 &
Then open http://prometheus_ip:9090 in a browser to check Prometheus's status.
1.5 alertmanager
First, edit the configuration file. An example:
route:
  receiver: 'default'
  group_by: ['service', 'project']

receivers:
- name: "phone"
  webhook_configs:
  - url: <url>
- name: "wechat-work"
  webhook_configs:
  - url: <url>
- name: "email"
  webhook_configs:
  - url: <url>
Then start it:
nohup ./alertmanager --config.file=/path/to/config --log.level=warn >nohup.log 2>&1 &
1.6 grafana
Many ClickHouse dashboard templates already exist; https://grafana.com/grafana/dashboards/882 is a good one. Import it into a new Grafana dashboard and you get a clean ClickHouse cluster overview (some tweaking may be needed).
Installing the clickhouse datasource plugin is also recommended. With it you can configure a ClickHouse data source in Grafana and build panels directly with ClickHouse SQL; see https://grafana.com/grafana/plugins/vertamedia-clickhouse-datasource for details.
2 Important Metrics and Monitoring
As we have seen, both node-exporter and clickhouse-exporter expose a large number of metric types, several hundred in total. Our strategy is to focus on what matters: set alerting rules and build dashboards only for the important metrics.
Below are some metrics I consider particularly important.
2.1 System metrics
System metrics are collected by node-exporter.
Metric | Meaning | Alert rule | Rule description |
---|---|---|---|
node_cpu_seconds_total | Cumulative CPU time of the machine (seconds) | 100 * sum without (cpu) (rate(node_cpu_seconds_total{mode='user'}[5m])) / count without (cpu) (node_cpu_seconds_total{mode='user'}) > 80 | Alert when user-mode CPU utilization exceeds 80% |
node_filesystem_size_bytes/node_filesystem_avail_bytes | Capacity / available capacity of each filesystem partition | 100 * (node_filesystem_size_bytes{mountpoint="/data"} - node_filesystem_avail_bytes{mountpoint="/data"}) / node_filesystem_size_bytes{mountpoint="/data"} > 80 | Alert when the /data partition is more than 80% full |
node_load5 | 5-minute load average | node_load5 > 60 | Alert when the 5-minute load average exceeds 60 (tune the threshold to your hardware) |
node_disk_reads_completed_total | Cumulative completed disk read requests | rate(node_disk_reads_completed_total[5m]) > 200 | Alert when read IOPS exceed 200 |
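The CPU rule above looks dense, but it simply computes the per-core user-mode rate, sums across cores, and divides by the core count. A Python sketch of the same arithmetic, using two hypothetical scrapes of node_cpu_seconds_total{mode="user"} taken 5 minutes (300 s) apart:

```python
def user_cpu_percent(prev, curr, interval_seconds):
    """User-mode CPU utilisation (%) averaged over all cores, computed
    from two scrapes of the per-cpu counter node_cpu_seconds_total.
    Mirrors 100 * sum without(cpu)(rate(...)) / count without(cpu)(...)."""
    per_cpu_rates = [(curr[c] - prev[c]) / interval_seconds for c in curr]
    return 100.0 * sum(per_cpu_rates) / len(per_cpu_rates)

prev = {"cpu0": 100.0, "cpu1": 100.0}    # counter values at time t
curr = {"cpu0": 340.0, "cpu1": 340.0}    # counter values at time t + 300s
pct = user_cpu_percent(prev, curr, 300)  # each core spent 240s of 300s in user mode -> 80%
```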
2.2 clickhouse metrics
Metric | Meaning | Alert rule | Rule description |
---|---|---|---|
clickhouse_exporter_scrape_failures_total | Total number of times the exporter failed to scrape clickhouse-server | increase(clickhouse_exporter_scrape_failures_total[5m]) > 10 | Alert when scrape failures exceed the threshold; the ch server may be down |
promhttp_metric_handler_requests_total | Total /metrics requests served by the exporter, labeled by HTTP status code | increase(promhttp_metric_handler_requests_total{code="200"}[2m]) == 0 | Alert when there are no successful (200) scrapes within 2 minutes; the ch instance may be unavailable |
clickhouse_readonly_replica | Number of replicated tables currently in read-only state on the ch instance | clickhouse_readonly_replica > 5 | Alert when more than 5 tables are read-only; the connection between ch and ZooKeeper may be broken |
clickhouse_query_total | Total number of queries processed by the ch instance | rate(clickhouse_query_total[1m]) > 30 | Alert when a single instance's qps exceeds 30 |
clickhouse_query | Number of queries currently executing on the ch instance | clickhouse_query > 30 | Alert when concurrent queries on a single instance exceed the threshold |
clickhouse_tcp_connection | Number of TCP connections to ch | clickhouse_tcp_connection > XXX | (set per deployment) |
clickhouse_http_connection | Number of HTTP connections to ch | clickhouse_http_connection > XXX | (set per deployment) |
clickhouse_zoo_keeper_request | Number of in-flight ZooKeeper requests in ch | clickhouse_zoo_keeper_request > XXX | (set per deployment) |
clickhouse_replicas_max_queue_size | Maximum length of the replication queue in ch | clickhouse_replicas_max_queue_size > 100 | Alert when the replication queue length exceeds the threshold; replication is backing up |
2.3 Other useful SQL
In ClickHouse, every executed query is recorded in the system.query_log table, so we can monitor the cluster's query activity through it. Several SQL statements useful for monitoring are listed below; for convenience they can be added to a Grafana dashboard.
Recent queries
SELECT
event_time,
user,
query,
read_rows,
read_bytes,
result_rows,
result_bytes,
memory_usage,
exception
FROM clusterAllReplicas('cluster_name', system, query_log)
WHERE (event_date = today()) AND (event_time >= (now() - 60)) AND (is_initial_query = 1) AND (query NOT LIKE 'INSERT INTO%')
ORDER BY event_time DESC
LIMIT 100
Slow queries
SELECT
event_time,
user,
query,
read_rows,
read_bytes,
result_rows,
result_bytes,
memory_usage,
exception
FROM clusterAllReplicas('cluster_name', system, query_log)
WHERE (event_date = yesterday()) AND query_duration_ms > 30000 AND (is_initial_query = 1) AND (query NOT LIKE 'INSERT INTO%')
ORDER BY query_duration_ms DESC
LIMIT 100
Top 10 largest tables
SELECT
database,
table,
sum(bytes_on_disk) AS bytes_on_disk
FROM clusterAllReplicas('cluster_name', system, parts)
WHERE active AND (database != 'system')
GROUP BY
database,
table
ORDER BY bytes_on_disk DESC
LIMIT 10
Top 10 users by query volume
SELECT
user,
count(1) AS query_times,
sum(read_bytes) AS query_bytes,
sum(read_rows) AS query_rows
FROM clusterAllReplicas('cluster_name', system, query_log)
WHERE (event_date = yesterday()) AND (is_initial_query = 1) AND (query NOT LIKE 'INSERT INTO%')
GROUP BY user
ORDER BY query_times DESC
LIMIT 10
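The aggregation the SQL above performs is a plain group-by-and-sort; for readers who pull query_log rows over the HTTP interface and want to post-process them, here is an equivalent Python sketch. The dict-shaped rows are hypothetical, reusing the same column names.

```python
from collections import defaultdict

def top_users(query_log, n=10):
    """Mirror the 'top query users' SQL: group rows by user, count
    queries and sum read bytes/rows, then rank by query count."""
    agg = defaultdict(lambda: {"query_times": 0, "query_bytes": 0, "query_rows": 0})
    for row in query_log:
        a = agg[row["user"]]
        a["query_times"] += 1
        a["query_bytes"] += row["read_bytes"]
        a["query_rows"] += row["read_rows"]
    ranked = sorted(agg.items(), key=lambda kv: kv[1]["query_times"], reverse=True)
    return ranked[:n]

log = [
    {"user": "alice", "read_bytes": 10, "read_rows": 1},
    {"user": "alice", "read_bytes": 20, "read_rows": 2},
    {"user": "bob",   "read_bytes": 5,  "read_rows": 1},
]
ranking = top_users(log)  # alice first with 2 queries, 30 bytes read
```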