Prerequisites
- Install Ansible (installing it with `pip3 install ansible` is recommended)
Getting the Prometheus collection documentation
Find the prometheus-community open-source repository, https://github.com/prometheus-community/ansible, and follow its README to the documentation site at https://prometheus-community.github.io/ansible/branch/main/.
The community-maintained Ansible collection already ships roles for many commonly used components.
Open the page for the node_exporter role. The key variables of the role are:
Parameter | Type | Comments |
---|---|---|
`node_exporter_basic_auth_users` | dictionary | Dictionary of users and passwords for basic authentication. Passwords are automatically hashed with bcrypt. |
`node_exporter_binary_install_dir` | string | (Advanced) Directory to install the node_exporter binary. Default: `"/usr/local/bin"` |
`node_exporter_binary_url` | string | URL of the node_exporter binary `.tar.gz` file. Default: `"https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.{{ ansible_system \| lower }}-{{ _node_exporter_go_ansible_arch }}.tar.gz"` |
`node_exporter_checksums_url` | string | URL of the node_exporter checksums file. Default: `"https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/sha256sums.txt"` |
`node_exporter_config_dir` | string | Path to the directory holding the node_exporter configuration. Default: `"/etc/node_exporter"` |
`node_exporter_disabled_collectors` | list / elements=string | List of disabled collectors. node_exporter disables the collectors listed here. |
`node_exporter_enabled_collectors` | list / elements=string | List of dicts defining additionally enabled collectors and their configuration. These are added to the collectors enabled by default. Default: `["systemd", {"textfile": {"directory": "{{ node_exporter_textfile_dir }}"}}]` |
`node_exporter_http_server_config` | dictionary | Config for HTTP/2 support. Keys and values are the same as in the node_exporter docs. |
`node_exporter_local_cache_path` | string | Local path to stash the archive and its extraction. Default: `"/tmp/node_exporter-{{ ansible_system \| lower }}-{{ _node_exporter_go_ansible_arch }}/{{ node_exporter_version }}"` |
`node_exporter_system_group` | string | (Advanced) System group for node_exporter. Default: `"node-exp"` |
`node_exporter_system_user` | string | (Advanced) node_exporter user. Default: `"node-exp"` |
`node_exporter_textfile_dir` | string | Directory used by the textfile collector. To write metrics into this directory, users must be in the node-exp system group. Note: more information is in the TROUBLESHOOTING.md guide. Default: `"/var/lib/node_exporter"` |
`node_exporter_tls_server_config` | dictionary | Configuration for TLS authentication. Keys and values are the same as in the node_exporter docs. |
`node_exporter_version` | string | node_exporter package version. Also accepts `latest`. Default: `"1.8.2"` |
`node_exporter_web_disable_exporter_metrics` | boolean | Exclude metrics about the exporter itself (`promhttp_*`, `process_*`, `go_*`). Choices: `false` (default), `true` |
`node_exporter_web_listen_address` | string | Address on which node_exporter listens. Default: `"0.0.0.0:9100"` |
`node_exporter_web_telemetry_path` | string | Path under which metrics are exposed. Default: `"/metrics"` |
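The following vars fragment is a rough illustration of how several of these parameters fit together. The variable names come from the table above. The values are hypothetical placeholders, not recommendations: the disabled `mdadm` collector and the `prometheus` user with its password are examples only.

```yaml
# Hypothetical values for illustration; variable names are from the role's parameter table.
node_exporter_version: "1.8.2"                     # pin a release instead of "latest"
node_exporter_web_listen_address: "0.0.0.0:9100"   # address:port node_exporter binds to
node_exporter_web_disable_exporter_metrics: true   # drop promhttp_*, process_*, go_* metrics
node_exporter_disabled_collectors:
  - mdadm                                          # example collector to switch off
node_exporter_basic_auth_users:
  prometheus: "change-me"                          # plain text here; the role hashes it with bcrypt
```

If basic authentication is enabled this way, the Prometheus scrape job must also send matching credentials.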
Installing the collection
There are two ways to install the collection. Both are described below.
Method 1: Install from Ansible Galaxy
Search for prometheus collections at https://galaxy.ansible.com/. The result is the Ansible collection contributed by the prometheus-community project.
On the Ansible control node, install the prometheus.prometheus collection from Galaxy:
>_ ansible-galaxy collection install prometheus.prometheus:0.23.0
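You can also keep the same version pin in a requirements file, so the installation is reproducible across control nodes. This is a minimal sketch; the file name `requirements.yml` is a common convention, not a requirement.

```yaml
# requirements.yml: pins the collection version used in this article
collections:
  - name: prometheus.prometheus
    version: "0.23.0"
```

Install it with `ansible-galaxy collection install -r requirements.yml`.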
Method 2: Install from the GitHub source repository
>_ ansible-galaxy collection install git+https://github.com/prometheus-community/ansible.git
Cloning into '/root/.ansible/tmp/ansible-local-143093dqnngbq4/tmpmhi9qxg0/ansible9uq0qwat'...
remote: Enumerating objects: 774, done.
remote: Counting objects: 100% (774/774), done.
remote: Compressing objects: 100% (389/389), done.
remote: Total 774 (delta 302), reused 588 (delta 232), pack-reused 0 (from 0)
Receiving objects: 100% (774/774), 156.00 KiB | 1.53 MiB/s, done.
Resolving deltas: 100% (302/302), done.
Your branch is up to date with 'origin/main'.
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Installing 'prometheus.prometheus:0.23.1' to '/root/.ansible/collections/ansible_collections/prometheus/prometheus'
Created collection for prometheus.prometheus:0.23.1 at /root/.ansible/collections/ansible_collections/prometheus/prometheus
prometheus.prometheus:0.23.1 was installed successfully
'community.general:10.1.0' is already installed, skipping.
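The Git source can also be recorded in a requirements file, which is convenient when tracking a branch or tag. This sketch assumes you want the `main` branch that appears in the clone output above. Replace `version` with a tag or commit to pin a specific release.

```yaml
# requirements.yml: install the collection from the prometheus-community Git repository
collections:
  - name: https://github.com/prometheus-community/ansible.git
    type: git
    version: main   # branch, tag, or commit
```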
Listing the collections installed locally
```
>_ ansible-galaxy collection list

# /usr/lib/python3.9/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    3.4.0
ansible.netcommon             3.1.0
....

# /root/.ansible/collections/ansible_collections
Collection            Version
--------------------- -------
community.general     10.1.0
prometheus.prometheus 0.23.1
```
Besides the collections bundled with the system Ansible installation, the list contains the prometheus.prometheus 0.23.1 collection we just installed and its dependency, community.general 10.1.0.
Installing node_exporter
First, define the group of target nodes in the inventory.
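For reference, here is a minimal sketch of `hosts.yaml` with a single node in a group named `flink`, matching the host name in the output that follows. The address and the `root` login mirror the inventory shown at the end of this article. SSH key authentication is an assumption here.

```yaml
# hosts.yaml: minimal inventory sketch (assumes SSH key-based login)
flink:
  hosts:
    flink-1:
      ansible_host: 192.168.22.174
  vars:
    ansible_user: root
```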
The ping module is commonly used to test connectivity:
```
>_ ansible all -i hosts.yaml -m ping
flink-1 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
```
A `pong` reply means the connection is working.
Preparing the playbook
```
>_ cat install_node_exporter.yaml
- hosts: flink
  collections:
    - prometheus.prometheus
  tasks:
    - import_role:
        name: node_exporter
```
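The same play can also be written with the `roles` keyword and the role's fully qualified collection name, which removes the need for the separate `collections` search list. This sketch assumes an Ansible release that supports collections and FQCN role names.

```yaml
# install_node_exporter.yaml: equivalent play using the role's FQCN
- hosts: flink
  roles:
    - prometheus.prometheus.node_exporter
```

One difference is ordering. Roles listed under `roles` run before any `tasks` in the play, whereas `import_role` runs at its position in the task list.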
Dry-running the playbook
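If you are not sure the role's tasks will run correctly, add the `-C` (`--check`) option for a dry run. Check mode reports what would change without modifying any files on the target node: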
>_ ansible-playbook -i hosts.yaml install_node_exporter.yaml -C
PLAY [flink] *************************************************************************************************************************************************************************************
TASK [Gathering Facts] ***************************************************************************************************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus.node_exporter : Validating arguments against arg spec 'main' - Prometheus Node Exporter] *****************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus.node_exporter : Preflight] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/preflight.yml for flink-1
TASK [Common preflight] **************************************************************************************************************************************************************************
TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Check for deprecated skip_install variable] ****************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Check for deprecated binary_local_dir variable] ************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Check for deprecated archive_path variable] ****************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Assert usage of systemd as an init system] *****************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Install dependencies] **************************************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus._common : Gather package facts] **************************************************************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus._common : Naive assertion of proper listen address] ******************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus.node_exporter : Assert that used version supports listen address type] ***********************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus.node_exporter : Assert collectors are not both disabled and enabled at the same time] ********************************************************************************
TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert path are set] ***********************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Check existence of TLS cert file] ********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Check existence of TLS key file] *********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert are present] ************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Discover latest version] *****************************************************************************************************************************
skipping: [flink-1]
TASK [Install] ***********************************************************************************************************************************************************************************
TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Gather system user and group facts] ************************************************************************************************************************
ok: [flink-1] => (item=passwd)
ok: [flink-1] => (item=group)
TASK [prometheus.prometheus._common : Create system group node-exp] ******************************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Create system user node-exp] *******************************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Create localhost binary cache path] ************************************************************************************************************************
ok: [flink-1 -> localhost]
TASK [prometheus.prometheus._common : Get checksum list for node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus._common : Download node_exporter-1.8.2.linux-amd64.tar.gz] ***********************************************************************************************************
ok: [flink-1 -> localhost]
TASK [prometheus.prometheus._common : Unpack binary archive node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1 -> localhost]
TASK [prometheus.prometheus._common : Check existence of binary install dir] *********************************************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus._common : Make sure binary install dir exists] ***********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus._common : Propagate binaries] ****************************************************************************************************************************************
changed: [flink-1] => (item=node_exporter)
TASK [SELinux] ***********************************************************************************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Configure] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/configure.yml for flink-1
TASK [Configure] *********************************************************************************************************************************************************************************
TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Create systemd service unit node_exporter] *****************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Create config dir /etc/node_exporter] **********************************************************************************************************************
[WARNING]: failed to look up user node-exp. Create user up to this point in real play
[WARNING]: failed to look up group node-exp. Create group up to this point in real play
changed: [flink-1]
TASK [prometheus.prometheus._common : Install web config for node_exporter] **********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Create textfile collector dir] ***********************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus.node_exporter : Ensure Node Exporter is enabled on boot] *************************************************************************************************************
skipping: [flink-1]
RUNNING HANDLER [prometheus.prometheus.node_exporter : Restart node_exporter] ********************************************************************************************************************
skipping: [flink-1]
PLAY RECAP ***************************************************************************************************************************************************************************************
flink-1 : ok=26 changed=6 unreachable=0 failed=0 skipped=12 rescued=0 ignored=0
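In check mode, the two `[WARNING]` lines about node-exp are expected. The tasks that create the node-exp user and group only report `changed` without actually creating them, so the later task that sets ownership of /etc/node_exporter cannot look them up. The real run below does not print these warnings.
Running the playbook to install node_exporter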
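Note: if you are worried that the role might perform extra steps that affect the target node, run it with the `--step` option. Ansible then asks you to confirm each task before running it: answer `y` to run the task, `n` to skip it, or `c` to continue through all remaining tasks without further prompts.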
>_ ansible-playbook -i hosts.yaml install_node_exporter.yaml --step
PLAY [flink] *************************************************************************************************************************************************************************************
Perform task: TASK: Gathering Facts (N)o/(y)es/(c)ontinue: y
....
Here we simply run the installation directly:
>_ ansible-playbook -i hosts.yaml install_node_exporter.yaml
PLAY [flink] *************************************************************************************************************************************************************************************
TASK [Gathering Facts] ***************************************************************************************************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus.node_exporter : Validating arguments against arg spec 'main' - Prometheus Node Exporter] *****************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus.node_exporter : Preflight] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/preflight.yml for flink-1
TASK [Common preflight] **************************************************************************************************************************************************************************
TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Check for deprecated skip_install variable] ****************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Check for deprecated binary_local_dir variable] ************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Check for deprecated archive_path variable] ****************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Assert usage of systemd as an init system] *****************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Install dependencies] **************************************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus._common : Gather package facts] **************************************************************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus._common : Naive assertion of proper listen address] ******************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus.node_exporter : Assert that used version supports listen address type] ***********************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus.node_exporter : Assert collectors are not both disabled and enabled at the same time] ********************************************************************************
TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert path are set] ***********************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Check existence of TLS cert file] ********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Check existence of TLS key file] *********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert are present] ************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Discover latest version] *****************************************************************************************************************************
skipping: [flink-1]
TASK [Install] ***********************************************************************************************************************************************************************************
TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Gather system user and group facts] ************************************************************************************************************************
ok: [flink-1] => (item=passwd)
ok: [flink-1] => (item=group)
TASK [prometheus.prometheus._common : Create system group node-exp] ******************************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Create system user node-exp] *******************************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Create localhost binary cache path] ************************************************************************************************************************
ok: [flink-1 -> localhost]
TASK [prometheus.prometheus._common : Get checksum list for node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus._common : Download node_exporter-1.8.2.linux-amd64.tar.gz] ***********************************************************************************************************
ok: [flink-1 -> localhost]
TASK [prometheus.prometheus._common : Unpack binary archive node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1 -> localhost]
TASK [prometheus.prometheus._common : Check existence of binary install dir] *********************************************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus._common : Make sure binary install dir exists] ***********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus._common : Propagate binaries] ****************************************************************************************************************************************
changed: [flink-1] => (item=node_exporter)
TASK [SELinux] ***********************************************************************************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Configure] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/configure.yml for flink-1
TASK [Configure] *********************************************************************************************************************************************************************************
TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Create systemd service unit node_exporter] *****************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Create config dir /etc/node_exporter] **********************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Install web config for node_exporter] **********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Create textfile collector dir] ***********************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus.node_exporter : Ensure Node Exporter is enabled on boot] *************************************************************************************************************
changed: [flink-1]
RUNNING HANDLER [prometheus.prometheus.node_exporter : Restart node_exporter] ********************************************************************************************************************
changed: [flink-1]
PLAY RECAP ***************************************************************************************************************************************************************************************
flink-1 : ok=28 changed=8 unreachable=0 failed=0 skipped=10 rescued=0 ignored=0
Verifying the installation
On the target node, check the service:
```
>_ systemctl list-unit-files -t service | grep node_exporter
node_exporter.service enabled disabled
>_ systemctl status node_exporter.service
● node_exporter.service - Prometheus Node Exporter
     Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; preset: disabled)
     Active: active (running) since Tue 2024-12-10 01:00:12 CST; 7min ago
   Main PID: 5615 (node_exporter)
      Tasks: 6 (limit: 48928)
     Memory: 6.6M
        CPU: 141ms
     CGroup: /system.slice/node_exporter.service
             └─5615 /usr/local/bin/node_exporter --collector.systemd --collector.textfile --collector.textfile.directory=/var/lib/node_exporter --web.listen-address=0.0.0.0:9100 --web.telemetry-path=/metrics
```
The unit is enabled at boot and running. Its command line carries the `systemd` and `textfile` collector flags from the role defaults.
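Besides `systemctl`, you can confirm from the control node that the metrics endpoint responds. This minimal sketch assumes the following:

- node_exporter uses the default listen address `0.0.0.0:9100` and telemetry path `/metrics`.
- Port 9100 is reachable from the control node.
- Basic authentication and TLS are not enabled.

The file name `check_node_exporter.yaml` is hypothetical.

```yaml
# check_node_exporter.yaml: hypothetical verification play
- hosts: flink
  gather_facts: false
  tasks:
    - name: Fetch /metrics from node_exporter (request sent from the control node)
      ansible.builtin.uri:
        url: "http://{{ ansible_host | default(inventory_hostname) }}:9100/metrics"
        status_code: 200
      delegate_to: localhost
```

Run it with `ansible-playbook -i hosts.yaml check_node_exporter.yaml`. The task fails if the endpoint does not answer with HTTP 200.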
Changing the default node_exporter configuration
Back on the Ansible control node, the default settings of the node_exporter role are defined in the following file:
```
>_ cat ~/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/defaults/main.yml
---
node_exporter_version: 1.8.2
node_exporter_binary_url: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/\
  node_exporter-{{ node_exporter_version }}.{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}.tar.gz"
node_exporter_checksums_url: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/sha256sums.txt"
node_exporter_web_disable_exporter_metrics: false
node_exporter_web_listen_address: "0.0.0.0:9100"
node_exporter_web_telemetry_path: "/metrics"
node_exporter_textfile_dir: "/var/lib/node_exporter"
node_exporter_tls_server_config: {}
node_exporter_http_server_config: {}
node_exporter_basic_auth_users: {}
node_exporter_enabled_collectors:
  - systemd
  - textfile:
      directory: "{{ node_exporter_textfile_dir }}"
#  - filesystem:
#      ignored-mount-points: "^/(sys|proc|dev)($|/)"
#      ignored-fs-types: "^(sys|proc|auto)fs$"
node_exporter_disabled_collectors: []
node_exporter_binary_install_dir: "/usr/local/bin"
node_exporter_system_group: "node-exp"
node_exporter_system_user: "{{ node_exporter_system_group }}"
node_exporter_config_dir: "/etc/node_exporter"
# Local path to stash the archive and its extraction
node_exporter_local_cache_path: "/tmp/node_exporter-{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}/{{ node_exporter_version }}"
```
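These are the role's default variable values. Role defaults have the lowest variable precedence in Ansible, so any of them can be overridden.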
To change these values, set them in one of the following places:
Method 1: Specify `vars` in the playbook. These values override the role defaults.
```
>_ cat install_node_exporter.yaml
- hosts: flink
  collections:
    - prometheus.prometheus
  tasks:
    - import_role:
        name: node_exporter
      vars:
        node_exporter_enabled_collectors:
          - systemd
          - textfile:
              directory: "{{ node_exporter_textfile_dir }}"
          - filesystem:
              ignored-mount-points: "^/(sys|proc|dev)($|/)"
              ignored-fs-types: "^(sys|proc|auto)fs$"
```
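Besides setting `vars` on the `import_role` task, you can declare them at play level, where they apply to every task and role in the play. This sketch uses hypothetical values for two parameters from the role's parameter table.

```yaml
# install_node_exporter.yaml: play-level vars sketch (values are hypothetical)
- hosts: flink
  collections:
    - prometheus.prometheus
  vars:
    # play-level vars apply to every task and role in this play
    node_exporter_web_listen_address: "0.0.0.0:9100"
    node_exporter_disabled_collectors:
      - mdadm
  tasks:
    - import_role:
        name: node_exporter
```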
Method 2: Set the variables in the inventory file, either in a host group's `vars` or on an individual host:
```
>_ cat hosts.yaml
# game team test
flink:
  hosts:
    flink-1:
      ansible_host: 192.168.22.174
  vars:
    ansible_ssh_user: root
    ansible_ssh_password: "****"
    node_exporter_enabled_collectors:
      - systemd
      - textfile:
          directory: "{{ node_exporter_textfile_dir }}"
      - filesystem:
          ignored-mount-points: "^/(sys|proc|dev)($|/)"
          ignored-fs-types: "^(sys|proc|auto)fs$"
```
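To override a value for only one node, put it under that host instead of in the group's `vars`. Inventory host variables take precedence over inventory group variables. This sketch assumes, as a hypothetical choice, that flink-1 should listen only on its own address.

```yaml
# hosts.yaml: host-level override sketch
flink:
  hosts:
    flink-1:
      ansible_host: 192.168.22.174
      # applies to flink-1 only and wins over any group-level value
      node_exporter_web_listen_address: "192.168.22.174:9100"
  vars:
    ansible_user: root
```

Play-level or task-level `vars` (method 1) still take precedence over both of these inventory variables.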