Installing node-exporter with ansible-galaxy and the prometheus-community/ansible Community Collection

Posted by Professor哥 on 2024-12-10

Prerequisites

  • Install ansible (installing it with pip3 install ansible is recommended)

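Finding the prometheus collection documentation

Locate the prometheus-community open-source repository at https://github.com/prometheus-community/ansible, then follow its README to the documentation page at https://prometheus-community.github.io/ansible/branch/main/
There you can see that the community-maintained ansible collection already includes roles for many common components.

Open the introduction page of the node_exporter role. The key variables for this role are listed below: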

Parameter (type) / Comments

node_exporter_basic_auth_users (dictionary)
  Dictionary of users and password for basic authentication. Passwords are automatically hashed with bcrypt.

node_exporter_binary_install_dir (string, Advanced)
  Directory to install node_exporter binary
  Default: "/usr/local/bin"

node_exporter_binary_url (string)
  URL of the node exporter binaries .tar.gz file
  Default: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}.tar.gz"

node_exporter_checksums_url (string)
  URL of the node exporter checksums file
  Default: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/sha256sums.txt"

node_exporter_config_dir (string)
  Path to directory with node_exporter configuration
  Default: "/etc/node_exporter"

node_exporter_disabled_collectors (list / elements=string)
  List of disabled collectors.
  By default node_exporter disables collectors listed here.

node_exporter_enabled_collectors (list / elements=string)
  List of dicts defining additionally enabled collectors and their configuration.
  It adds collectors to those enabled by default.
  Default: ["systemd", {"textfile": {"directory": "{{ node_exporter_textfile_dir }}"}}]

node_exporter_http_server_config (dictionary)
  Config for HTTP/2 support.
  Keys and values are the same as in node_exporter docs.

node_exporter_local_cache_path (string)
  Local path to stash the archive and its extraction
  Default: "/tmp/node_exporter-{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}/{{ node_exporter_version }}"

node_exporter_system_group (string, Advanced)
  System group for node exporter
  Default: "node-exp"

node_exporter_system_user (string, Advanced)
  Node exporter user
  Default: "node-exp"

node_exporter_textfile_dir (string)
  Directory used by the Textfile Collector.
  To get permissions to write metrics in this directory, users must be in node-exp system group.
  Note: More information in TROUBLESHOOTING.md guide.
  Default: "/var/lib/node_exporter"

node_exporter_tls_server_config (dictionary)
  Configuration for TLS authentication.
  Keys and values are the same as in node_exporter docs.

node_exporter_version (string)
  Node exporter package version. Also accepts latest as parameter.
  Default: "1.8.2"

node_exporter_web_disable_exporter_metrics (boolean)
  Exclude metrics about the exporter itself (promhttp_*, process_*, go_*).
  Choices: false (default), true

node_exporter_web_listen_address (string)
  Address on which node exporter will listen
  Default: "0.0.0.0:9100"

node_exporter_web_telemetry_path (string)
  Path under which to expose metrics
  Default: "/metrics"
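
To illustrate how these parameters are set, here is a minimal sketch of a variables file that pins the version and turns on basic authentication and TLS. The user name, password, and certificate paths are placeholders invented for this example, and the cert_file/key_file keys are an assumption based on the node_exporter web configuration format. The certificate files would need to exist on the target node beforehand.

```yaml
# Hypothetical overrides for the node_exporter role; every value is a placeholder.
node_exporter_version: "1.8.2"
node_exporter_web_listen_address: "0.0.0.0:9100"

# Users for basic authentication; per the parameter list, the role hashes the password with bcrypt.
node_exporter_basic_auth_users:
  exampleuser: examplepassword

# Assumed keys, following the node_exporter web configuration file format.
node_exporter_tls_server_config:
  cert_file: /etc/node_exporter/tls.cert
  key_file: /etc/node_exporter/tls.key
```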

Installing the Collection

There are two ways to install it; we cover each in turn.

Method 1: install from the ansible-galaxy repository

Searching https://galaxy.ansible.com/ for the prometheus collection returns the set of Ansible Collections contributed by the prometheus-community.

On the ansible control node, install the prometheus.prometheus Collection from the galaxy repository:

>_ ansible-galaxy collection install prometheus.prometheus:0.23.0
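
For repeatable installs, the same version pin can also live in a requirements file. A minimal sketch, assuming a file named requirements.yml (the conventional name; this post itself does not use one):

```yaml
# requirements.yml: pins the collection to the version used in the command above
collections:
  - name: prometheus.prometheus
    version: "0.23.0"
```

It would then be installed with ansible-galaxy collection install -r requirements.yml.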

Method 2: install from the GitHub source repository

>_ ansible-galaxy collection install git+https://github.com/prometheus-community/ansible.git
Cloning into '/root/.ansible/tmp/ansible-local-143093dqnngbq4/tmpmhi9qxg0/ansible9uq0qwat'...
remote: Enumerating objects: 774, done.
remote: Counting objects: 100% (774/774), done.
remote: Compressing objects: 100% (389/389), done.
remote: Total 774 (delta 302), reused 588 (delta 232), pack-reused 0 (from 0)
Receiving objects: 100% (774/774), 156.00 KiB | 1.53 MiB/s, done.
Resolving deltas: 100% (302/302), done.
Your branch is up to date with 'origin/main'.
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Installing 'prometheus.prometheus:0.23.1' to '/root/.ansible/collections/ansible_collections/prometheus/prometheus'
Created collection for prometheus.prometheus:0.23.1 at /root/.ansible/collections/ansible_collections/prometheus/prometheus
prometheus.prometheus:0.23.1 was installed successfully
'community.general:10.1.0' is already installed, skipping.
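
The git source can likewise be recorded in a requirements file instead of being typed on the command line. A minimal sketch, assuming the file is named requirements.yml and that tracking the main branch is acceptable (a tag or commit could be given as the version instead):

```yaml
# requirements.yml: installs the collection straight from the GitHub repository
collections:
  - name: https://github.com/prometheus-community/ansible.git
    type: git
    version: main   # branch, tag, or commit to check out
```

As with the galaxy source, it would be installed with ansible-galaxy collection install -r requirements.yml.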

Listing the Collections installed locally

>_ ansible-galaxy collection list
# /usr/lib/python3.9/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    3.4.0  
ansible.netcommon             3.1.0  
....

# /root/.ansible/collections/ansible_collections
Collection            Version
--------------------- -------
community.general     10.1.0 
prometheus.prometheus 0.23.1 

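Besides the collections bundled with the system, the list shows the prometheus.prometheus 0.23.1 we just installed, along with its dependency community.general 10.1.0.

Installing node_exporter

Prepare the group of target nodes in the inventory.
The ping module is a common way to test connectivity: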

>_ ansible all -i hosts.yaml -m ping

flink-1 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}

A pong reply means the connection is working.
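
The same connectivity check can be written as a small playbook, which is handy when it should run as a first step before other plays. A minimal sketch; the file name check_connectivity.yaml is hypothetical:

```yaml
# check_connectivity.yaml (hypothetical file name)
- hosts: all
  gather_facts: false   # the ping module alone is enough to prove the connection works
  tasks:
    - name: Check that every node in the inventory is reachable
      ansible.builtin.ping:
```

It would be run with ansible-playbook -i hosts.yaml check_connectivity.yaml.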

Preparing the playbook

>_ cat install_node_exporter.yaml
- hosts: flink
  collections:
    - prometheus.prometheus
  tasks:
    - import_role: 
        name: node_exporter
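
An equivalent way to write this playbook is to reference the role by its fully qualified name, which removes the need for the collections keyword. A minimal sketch of that variant, not the form used in the rest of this post:

```yaml
# Same play, addressing the role as <namespace>.<collection>.<role>
- hosts: flink
  roles:
    - role: prometheus.prometheus.node_exporter
```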

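Trial-running the playbook tasks

If we are not sure whether the tasks will run correctly, we can pass -C (short for --check) to do a dry run of the install, which does not actually modify any files on the target nodes: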

>_ ansible-playbook -i hosts.yaml install_node_exporter.yaml -C

PLAY [flink] *************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus.node_exporter : Validating arguments against arg spec 'main' - Prometheus Node Exporter] *****************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus.node_exporter : Preflight] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/preflight.yml for flink-1

TASK [Common preflight] **************************************************************************************************************************************************************************

TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Check for deprecated skip_install variable] ****************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Check for deprecated binary_local_dir variable] ************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Check for deprecated archive_path variable] ****************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Assert usage of systemd as an init system] *****************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Install dependencies] **************************************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus._common : Gather package facts] **************************************************************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus._common : Naive assertion of proper listen address] ******************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus.node_exporter : Assert that used version supports listen address type] ***********************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus.node_exporter : Assert collectors are not both disabled and enabled at the same time] ********************************************************************************

TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert path are set] ***********************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Check existence of TLS cert file] ********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Check existence of TLS key file] *********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert are present] ************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Discover latest version] *****************************************************************************************************************************
skipping: [flink-1]

TASK [Install] ***********************************************************************************************************************************************************************************

TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Gather system user and group facts] ************************************************************************************************************************
ok: [flink-1] => (item=passwd)
ok: [flink-1] => (item=group)

TASK [prometheus.prometheus._common : Create system group node-exp] ******************************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Create system user node-exp] *******************************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Create localhost binary cache path] ************************************************************************************************************************
ok: [flink-1 -> localhost]

TASK [prometheus.prometheus._common : Get checksum list for node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus._common : Download node_exporter-1.8.2.linux-amd64.tar.gz] ***********************************************************************************************************
ok: [flink-1 -> localhost]

TASK [prometheus.prometheus._common : Unpack binary archive node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1 -> localhost]

TASK [prometheus.prometheus._common : Check existence of binary install dir] *********************************************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus._common : Make sure binary install dir exists] ***********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus._common : Propagate binaries] ****************************************************************************************************************************************
changed: [flink-1] => (item=node_exporter)

TASK [SELinux] ***********************************************************************************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Configure] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/configure.yml for flink-1

TASK [Configure] *********************************************************************************************************************************************************************************

TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Create systemd service unit node_exporter] *****************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Create config dir /etc/node_exporter] **********************************************************************************************************************
[WARNING]: failed to look up user node-exp. Create user up to this point in real play
[WARNING]: failed to look up group node-exp. Create group up to this point in real play
changed: [flink-1]

TASK [prometheus.prometheus._common : Install web config for node_exporter] **********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Create textfile collector dir] ***********************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus.node_exporter : Ensure Node Exporter is enabled on boot] *************************************************************************************************************
skipping: [flink-1]

RUNNING HANDLER [prometheus.prometheus.node_exporter : Restart node_exporter] ********************************************************************************************************************
skipping: [flink-1]

PLAY RECAP ***************************************************************************************************************************************************************************************
flink-1                    : ok=26   changed=6    unreachable=0    failed=0    skipped=12   rescued=0    ignored=0   
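
The two warnings in the dry run appear because check mode does not really create the node-exp user and group, so the later task cannot look them up. A task can find out whether it is running in a dry run through the ansible_check_mode variable. A minimal sketch, with an illustrative task name:

```yaml
- hosts: flink
  gather_facts: false
  tasks:
    - name: Report whether this run is a dry run
      ansible.builtin.debug:
        msg: "check mode is {{ 'on' if ansible_check_mode else 'off' }}"
```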

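Running the playbook to install node_exporter

Note: if you are worried that the role contains extra steps that could affect the target nodes, run the install with --step. Every task in the playbook then has to be confirmed manually by typing y or n before it executes.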

>_ ansible-playbook -i hosts.yaml install_node_exporter.yaml --step

PLAY [flink] *************************************************************************************************************************************************************************************
Perform task: TASK: Gathering Facts (N)o/(y)es/(c)ontinue: y
....

Here, we simply run the install directly:

>_ ansible-playbook -i hosts.yaml install_node_exporter.yaml       

PLAY [flink] *************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus.node_exporter : Validating arguments against arg spec 'main' - Prometheus Node Exporter] *****************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus.node_exporter : Preflight] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/preflight.yml for flink-1

TASK [Common preflight] **************************************************************************************************************************************************************************

TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Check for deprecated skip_install variable] ****************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Check for deprecated binary_local_dir variable] ************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Check for deprecated archive_path variable] ****************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Assert usage of systemd as an init system] *****************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Install dependencies] **************************************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus._common : Gather package facts] **************************************************************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus._common : Naive assertion of proper listen address] ******************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus.node_exporter : Assert that used version supports listen address type] ***********************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus.node_exporter : Assert collectors are not both disabled and enabled at the same time] ********************************************************************************

TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert path are set] ***********************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Check existence of TLS cert file] ********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Check existence of TLS key file] *********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert are present] ************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Discover latest version] *****************************************************************************************************************************
skipping: [flink-1]

TASK [Install] ***********************************************************************************************************************************************************************************

TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Gather system user and group facts] ************************************************************************************************************************
ok: [flink-1] => (item=passwd)
ok: [flink-1] => (item=group)

TASK [prometheus.prometheus._common : Create system group node-exp] ******************************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Create system user node-exp] *******************************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Create localhost binary cache path] ************************************************************************************************************************
ok: [flink-1 -> localhost]

TASK [prometheus.prometheus._common : Get checksum list for node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus._common : Download node_exporter-1.8.2.linux-amd64.tar.gz] ***********************************************************************************************************
ok: [flink-1 -> localhost]

TASK [prometheus.prometheus._common : Unpack binary archive node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1 -> localhost]

TASK [prometheus.prometheus._common : Check existence of binary install dir] *********************************************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus._common : Make sure binary install dir exists] ***********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus._common : Propagate binaries] ****************************************************************************************************************************************
changed: [flink-1] => (item=node_exporter)

TASK [SELinux] ***********************************************************************************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Configure] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/configure.yml for flink-1

TASK [Configure] *********************************************************************************************************************************************************************************

TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Create systemd service unit node_exporter] *****************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Create config dir /etc/node_exporter] **********************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Install web config for node_exporter] **********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Create textfile collector dir] ***********************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus.node_exporter : Ensure Node Exporter is enabled on boot] *************************************************************************************************************
changed: [flink-1]

RUNNING HANDLER [prometheus.prometheus.node_exporter : Restart node_exporter] ********************************************************************************************************************
changed: [flink-1]

PLAY RECAP ***************************************************************************************************************************************************************************************
flink-1                    : ok=28   changed=8    unreachable=0    failed=0    skipped=10   rescued=0    ignored=0  

Checking the result

On the target node, we list the services:

>_ systemctl list-unit-files -t service | grep node_exporter
node_exporter.service                      enabled         disabled

>_ systemctl status node_exporter.service 
● node_exporter.service - Prometheus Node Exporter
     Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; preset: disabled)
     Active: active (running) since Tue 2024-12-10 01:00:12 CST; 7min ago
   Main PID: 5615 (node_exporter)
      Tasks: 6 (limit: 48928)
     Memory: 6.6M
        CPU: 141ms
     CGroup: /system.slice/node_exporter.service
             └─5615 /usr/local/bin/node_exporter --collector.systemd --collector.textfile --collector.textfile.directory=/var/lib/node_exporter --web.listen-address=0.0.0.0:9100 --web.telemetry-path=/metrics
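
The check can also be automated from the control node. The sketch below is one possible approach, assuming the default listen address 0.0.0.0:9100 and telemetry path /metrics shown in the process arguments above; the registered variable name is illustrative:

```yaml
- hosts: flink
  gather_facts: false
  tasks:
    - name: Fetch the metrics endpoint on the target node itself
      ansible.builtin.uri:
        url: "http://127.0.0.1:9100/metrics"
        return_content: true
      register: metrics_response

    - name: Confirm the response looks like node_exporter output
      ansible.builtin.assert:
        that:
          - metrics_response.status == 200
          - "'node_' in metrics_response.content"
```

Because the uri task executes on the target node, 127.0.0.1 reaches the exporter even when port 9100 is not open to the control node.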

Changing the default node_exporter configuration

Back on the ansible control node, the default settings of the node_exporter role can be found in the following defaults file:

>_ cat ~/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/defaults/main.yml
---
node_exporter_version: 1.8.2
node_exporter_binary_url: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/\
                           node_exporter-{{ node_exporter_version }}.{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}.tar.gz"
node_exporter_checksums_url: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/sha256sums.txt"

node_exporter_web_disable_exporter_metrics: false
node_exporter_web_listen_address: "0.0.0.0:9100"
node_exporter_web_telemetry_path: "/metrics"

node_exporter_textfile_dir: "/var/lib/node_exporter"

node_exporter_tls_server_config: {}

node_exporter_http_server_config: {}

node_exporter_basic_auth_users: {}

node_exporter_enabled_collectors:
  - systemd
  - textfile:
      directory: "{{ node_exporter_textfile_dir }}"
#  - filesystem:
#      ignored-mount-points: "^/(sys|proc|dev)($|/)"
#      ignored-fs-types: "^(sys|proc|auto)fs$"

node_exporter_disabled_collectors: []

node_exporter_binary_install_dir: "/usr/local/bin"
node_exporter_system_group: "node-exp"
node_exporter_system_user: "{{ node_exporter_system_group }}"

node_exporter_config_dir: "/etc/node_exporter"
# Local path to stash the archive and its extraction
node_exporter_local_cache_path: "/tmp/node_exporter-{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}/{{ node_exporter_version }}"

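As shown, the default values of the node_exporter role's variables are defined here.

To change the install variables, set them in one of the following places:

Method 1: set vars in the playbook file; these variables override the role's default values

>_ cat install_node_exporter.yaml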
- hosts: flink
  collections:
    - prometheus.prometheus
  tasks:
    - import_role: 
        name: node_exporter
  vars:
    node_exporter_enabled_collectors:
      - systemd
      - textfile:
          directory: "{{ node_exporter_textfile_dir }}"
      - filesystem:
          ignored-mount-points: "^/(sys|proc|dev)($|/)"
          ignored-fs-types: "^(sys|proc|auto)fs$"
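
Variables can also be attached to the role import itself rather than to the whole play, which keeps the overrides next to the role they belong to. A minimal sketch; the chosen values are examples only:

```yaml
- hosts: flink
  tasks:
    - ansible.builtin.import_role:
        name: prometheus.prometheus.node_exporter
      vars:
        # Example overrides of role defaults
        node_exporter_web_listen_address: "0.0.0.0:9100"
        node_exporter_web_disable_exporter_metrics: true
```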

Method 2: in the inventory file, set vars on the host group, or on an individual node

>_ cat hosts.yaml                
# game team test
flink:
  hosts:
    flink-1:
      ansible_host: 192.168.22.174
  vars:
    ansible_ssh_user: root
    ansible_ssh_password: ****
    node_exporter_enabled_collectors:
      - systemd
      - textfile:
          directory: "{{ node_exporter_textfile_dir }}"
      - filesystem:
          ignored-mount-points: "^/(sys|proc|dev)($|/)"
          ignored-fs-types: "^(sys|proc|auto)fs$"
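
Group variables do not have to sit inside the inventory file. Ansible also loads them from a group_vars directory next to the inventory or playbook. A minimal sketch, assuming a file group_vars/flink.yml that is not part of the original setup; the values are illustrative:

```yaml
# group_vars/flink.yml (hypothetical file): applies to every host in the flink group
node_exporter_web_listen_address: "0.0.0.0:9100"
node_exporter_disabled_collectors:
  - mdadm   # example of a default-enabled collector to switch off
```

This keeps connection details in hosts.yaml and role settings in their own file.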
