Linux PSI--Pressure Stall Information

Posted by yooooooo on 2024-07-18

Original article: https://www.cnblogs.com/linhaostudy/p/18310233

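Starting with Android 11, Google's LMKD (low memory killer daemon) uses PSI as the basis of its process-killing policy. This article gives a brief introduction to PSI.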

Reposted from "Monitoring Server Resources with PSI (Pressure Stall Information)" by gameneedless, InfoQ Writing Community.

1. Overview

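When CPU, memory, or IO devices are heavily contended, the system experiences latency spikes and reduced throughput, and the kernel's OOM killer may be triggered. PSI (Pressure Stall Information) literally means the stalls in task execution caused by resource (CPU, memory, and IO) pressure. PSI quantifies the interruptions to task execution caused by scarce hardware resources by measuring how long tasks in the system spend waiting for those resources. We can use PSI as a metric of hardware resource pressure: the longer the stalls, the greater the pressure on the resource.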

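If you monitor PSI continuously and plot it over time, you can see how drops in throughput relate to resource shortages. This lets users take proactive measures before resources become tight, such as migrating tasks to other servers or killing low-priority tasks.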

This makes it possible to maximize hardware utilization without sacrificing workload health or risking major disruptions such as OOM kills.

2. The pressure file interface

Pressure information for CPU, memory, and IO is exported to the corresponding files under /proc/pressure/. You can query the pressure statistics with cat:

$ cat /proc/pressure/cpu 
some avg10=0.03 avg60=0.07 avg300=0.06 total=8723835

$ cat /proc/pressure/io 
some avg10=0.00 avg60=0.00 avg300=0.00 total=56385169
full avg10=0.00 avg60=0.00 avg300=0.00 total=54915860

$ cat /proc/pressure/memory 
some avg10=0.00 avg60=0.00 avg300=0.00 total=149158
full avg10=0.00 avg60=0.00 avg300=0.00 total=34054

Memory and IO show two lines of metrics, some and full, while CPU shows only some. The next section explains what some and full mean.
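To consume these files from a program, each line can be parsed with sscanf(). The following is a minimal sketch, assuming the line layout shown in the outputs above; struct psi_line and psi_parse_line() are hypothetical names, not a kernel or libc API. The avgN fields are stall percentages averaged over 10, 60, and 300 seconds, and total is the cumulative stall time in microseconds.

```c
#include <stdio.h>
#include <string.h>

/* One line of a PSI pressure file, e.g.
 * "some avg10=0.03 avg60=0.07 avg300=0.06 total=8723835" */
struct psi_line {
    char kind[5];                 /* "some" or "full" */
    double avg10, avg60, avg300;  /* stall percentage over 10s/60s/300s */
    unsigned long long total;     /* cumulative stall time in microseconds */
};

/* Parse one line. Returns 0 on success, -1 if the line does not match. */
int psi_parse_line(const char *line, struct psi_line *out)
{
    int n = sscanf(line, "%4s avg10=%lf avg60=%lf avg300=%lf total=%llu",
                   out->kind, &out->avg10, &out->avg60, &out->avg300,
                   &out->total);
    if (n != 5)
        return -1;
    /* Only the two documented line types are accepted. */
    if (strcmp(out->kind, "some") != 0 && strcmp(out->kind, "full") != 0)
        return -1;
    return 0;
}
```

A caller would typically read /proc/pressure/memory line by line with fgets() and pass each line to psi_parse_line().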

2.1 some and full

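The some metric gives the percentage of time during which one or more tasks were stalled waiting for a resource. In the example in the figure below, over the last 60 seconds task A ran without stalling, while task B spent 30 seconds of its run waiting for memory because memory was tight. The some value is therefore 30/60 = 50%.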



some indicates that a lack of resources stalled at least one task.

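The full metric gives the percentage of time during which all non-idle tasks were stalled waiting for the resource at the same time. In the example in the figure below, over the last 60 seconds task B waited 30 seconds for memory, and task A waited 10 seconds for memory, overlapping with task B's wait. During those 10 overlapping seconds both task A and task B were waiting for memory, so some is 50% and full is 10/60 ≈ 16.67%.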



full indicates a loss of total throughput: in this state every task is waiting for the resource, and CPU cycles are wasted.

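Note that some and full are computed from the wait time accumulated over the whole time window; that wait time may be one contiguous stretch or several separate intervals.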
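The arithmetic in the two examples above can be checked with a toy model. This is a minimal C sketch under stated assumptions: time is sampled at one-second granularity, and every task is non-idle for the whole window. The kernel itself tracks stall states per CPU at much finer granularity, so this only illustrates the definitions; struct stall and psi_some_full() are hypothetical names.

```c
#include <stdbool.h>

/* One stall of one task over [start, end) seconds. A task may appear
 * several times, so its waiting need not be contiguous. */
struct stall {
    int task, start, end;
};

/* Is `task` stalled during second t? */
static bool task_stalled(const struct stall *s, int ns, int task, int t)
{
    for (int k = 0; k < ns; k++)
        if (s[k].task == task && s[k].start <= t && t < s[k].end)
            return true;
    return false;
}

/* Toy model: for each second of the window, count how many of the
 * ntasks tasks are stalled, then turn the counts into percentages. */
void psi_some_full(const struct stall *s, int ns, int ntasks, int window,
                   double *some_pct, double *full_pct)
{
    int some = 0, full = 0;
    for (int t = 0; t < window; t++) {
        int n = 0;
        for (int task = 0; task < ntasks; task++)
            n += task_stalled(s, ns, task, t);
        if (n >= 1)
            some++;  /* at least one task stalled */
        if (n == ntasks)
            full++;  /* all tasks stalled simultaneously */
    }
    *some_pct = 100.0 * some / window;
    *full_pct = 100.0 * full / window;
}
```

For the second example, task B (task 1) stalls over [0, 30) and task A (task 0) over [10, 20); calling psi_some_full() with a 60-second window gives some = 50% and full ≈ 16.67%.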

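Once you understand some and full, it is clear why CPU has no full metric: tasks can never all be starved of CPU at the same time, because the CPU is always running some task. (Since kernel 5.13, /proc/pressure/cpu also prints a full line for compatibility, but at the system level its values are always zero.)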



3. PSI threshold monitoring

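Users can register triggers with PSI to be notified when resource pressure exceeds a custom threshold. A trigger specifies the maximum cumulative stall time within a given time window; for example, 100ms of cumulative stall time within any 500ms window generates a notification event.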

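How do you register a trigger with PSI? Open the PSI interface file for the resource under /proc/pressure/, write the desired threshold and time window to it, then wait for notification events on the open file descriptor with select(), poll(), or epoll. The data written to the PSI interface file has the format: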

<some|full> <stall threshold> <time window>

Both the threshold and the time window are in microseconds (us). The kernel accepts window sizes from 500ms to 10 seconds.
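Because a malformed trigger string is simply rejected by the write(), it can help to build and check it in one place. The following is a minimal sketch: psi_format_trigger() is a hypothetical helper, the 500ms to 10s window range comes from the text above, and rejecting a zero threshold or one larger than the window is an assumed sanity check (such a trigger could never fire) that is believed to mirror the kernel's own validation.

```c
#include <stdio.h>
#include <string.h>

/* Window limits from the text, in microseconds. */
#define PSI_WINDOW_MIN_US 500000ULL    /* 500 ms */
#define PSI_WINDOW_MAX_US 10000000ULL  /* 10 s */

/* Format a PSI trigger such as "some 500000 1000000" into buf.
 * Returns the string length, or -1 if kind is not "some"/"full", the
 * window is outside 500ms..10s, the threshold is 0 or larger than the
 * window, or buf is too small. */
int psi_format_trigger(char *buf, size_t len, const char *kind,
                       unsigned long long threshold_us,
                       unsigned long long window_us)
{
    if (strcmp(kind, "some") != 0 && strcmp(kind, "full") != 0)
        return -1;
    if (window_us < PSI_WINDOW_MIN_US || window_us > PSI_WINDOW_MAX_US)
        return -1;
    /* Assumed sanity check: a threshold above the window can never fire. */
    if (threshold_us == 0 || threshold_us > window_us)
        return -1;
    int n = snprintf(buf, len, "%s %llu %llu", kind, threshold_us, window_us);
    if (n < 0 || (size_t)n >= len)
        return -1;  /* snprintf failed or output was truncated */
    return n;
}
```

The trigger from the earlier example (100ms of stall in any 500ms window) would be built with psi_format_trigger(buf, sizeof buf, "some", 100000, 500000), producing "some 100000 500000".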

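For example, writing "some 500000 1000000" to /proc/pressure/io means that a notification event fires whenever, within any 1-second window, one or more processes have been stalled waiting for IO for longer than the 500ms threshold.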

A trigger is unregistered when the PSI interface file descriptor used to define it is closed.

Let's demonstrate how to use a trigger with an example:

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* Fire when tasks are stalled on IO for 500ms within any 1s window. */
    const char trig[] = "some 500000 1000000";
    struct pollfd fds;
    int n;

    /* Open read-write so that the trigger can be written to the file. */
    fds.fd = open("/proc/pressure/io", O_RDWR | O_NONBLOCK);
    if (fds.fd < 0) {
        fprintf(stderr, "/proc/pressure/io open error: %s\n",
                strerror(errno));
        return 1;
    }
    /* PSI reports a crossed threshold as POLLPRI. */
    fds.events = POLLPRI;

    /* Register the trigger; it stays active until fds.fd is closed. */
    if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
        fprintf(stderr, "/proc/pressure/io write error: %s\n",
                strerror(errno));
        return 1;
    }

    printf("waiting for events...\n");
    while (1) {
        /* Block until the trigger fires (timeout -1 means wait forever). */
        n = poll(&fds, 1, -1);
        if (n < 0) {
            fprintf(stderr, "poll error: %s\n", strerror(errno));
            return 1;
        }
        /* POLLERR means the monitored source has gone away. */
        if (fds.revents & POLLERR) {
            printf("got POLLERR, event source is gone\n");
            return 0;
        }
        if (fds.revents & POLLPRI) {
            printf("event triggered!\n");
        } else {
            fprintf(stderr, "unknown event received: 0x%x\n", fds.revents);
            return 1;
        }
    }

    return 0;
}
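The text mentions select(), poll(), and epoll as ways to wait for events. When several resources need watching at once, epoll lets a single thread wait on all of them. The following is a minimal sketch, assuming the same trigger string is used for every file and at most 8 files are watched; psi_arm() and psi_watch() are hypothetical helpers, not part of any library.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Open a PSI file and register one trigger on it. Returns the fd
 * (closing it later unregisters the trigger) or -1 on error. */
int psi_arm(const char *path, const char *trig)
{
    int fd = open(path, O_RDWR | O_NONBLOCK);
    if (fd < 0)
        return -1;
    if (write(fd, trig, strlen(trig) + 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Arm `trig` on up to 8 PSI files and wait with one epoll instance.
 * Returns the number of events seen once max_events have arrived,
 * or -1 on error (including a source going away). */
int psi_watch(const char *const paths[], int npaths, const char *trig,
              int max_events)
{
    int fds[8];
    int ep, armed = 0, seen = 0, ret = -1;

    if (npaths < 1 || npaths > 8)
        return -1;
    ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    for (; armed < npaths; armed++) {
        /* Remember the index so the event can be mapped back to a path. */
        struct epoll_event ev = { .events = EPOLLPRI,
                                  .data.u32 = (uint32_t)armed };
        int fd = psi_arm(paths[armed], trig);
        if (fd < 0)
            goto out;
        fds[armed] = fd;
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
            close(fd);
            goto out;
        }
    }

    while (seen < max_events) {
        struct epoll_event ev;
        if (epoll_wait(ep, &ev, 1, -1) < 0)
            goto out;
        if (ev.events & EPOLLERR)  /* event source is gone */
            goto out;
        if (ev.events & EPOLLPRI) {
            printf("pressure event on %s\n", paths[ev.data.u32]);
            seen++;
        }
    }
    ret = seen;
out:
    /* Closing each fd also unregisters its trigger. */
    for (int i = 0; i < armed; i++)
        close(fds[i]);
    close(ep);
    return ret;
}
```

For instance, psi_watch() could be given {"/proc/pressure/memory", "/proc/pressure/io"} with the trigger "some 500000 1000000" to report pressure on either resource from one loop.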

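Compile and run the program on the server. If the server is fairly idle, the program just keeps waiting for a notification that IO pressure has exceeded the threshold: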

$ sudo ./monitor 
waiting for events...

Let's put some IO pressure on the server by generating a 5GB file:

$ dd if=/dev/zero of=/home/mazhen/testfile bs=4096 count=1310720

Back in the window running the example program, we can see that event notifications have arrived:

$ sudo ./monitor
waiting for events...
event triggered!
event triggered!
event triggered!
event triggered!
event triggered!
...

4. PSI use cases

Facebook developed PSI to meet practical needs. One of them was avoiding triggering the kernel's OOM (Out-Of-Memory) killer.

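When an application allocates memory and there is not enough free memory, the kernel can free memory by reclaiming the page cache. If free memory is still insufficient, the kernel's OOM killer is triggered and picks a process to kill in order to free memory. This process is synchronous: the process requesting the allocation stays blocked the whole time, and the user has no control over which process the kernel chooses to kill. For this reason, Facebook developed oomd, a user-space OOM killer.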

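oomd uses PSI thresholds as triggers and takes a specified action once memory pressure rises to a certain level, preventing an eventual OOM. As a first line of defense, oomd keeps server workloads healthy and supports complex, customizable cleanup policies, which the kernel cannot do.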
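oomd's real rules are configurable and considerably more elaborate than anything shown here. The following only sketches the general shape of a PSI-driven user-space policy, assuming it samples the cumulative total field (microseconds of stall) at a fixed interval. psi_pressure_pct() and psi_should_act() are hypothetical names, and requiring several consecutive high samples is an illustrative choice for ignoring short spikes, not oomd's documented behavior.

```c
#include <stdbool.h>

/* Stall percentage between two samples of the cumulative "total"
 * field (microseconds of stall) taken interval_us apart. */
double psi_pressure_pct(unsigned long long total_prev,
                        unsigned long long total_now,
                        unsigned long long interval_us)
{
    if (interval_us == 0 || total_now < total_prev)
        return 0.0;  /* no interval, or counter went backwards */
    return 100.0 * (double)(total_now - total_prev) / (double)interval_us;
}

/* Hypothetical policy: act only after pressure has stayed above
 * threshold_pct for `needed` consecutive samples. *streak carries
 * the count of consecutive high samples between calls. */
bool psi_should_act(double pct, double threshold_pct, int needed, int *streak)
{
    if (pct > threshold_pct)
        (*streak)++;
    else
        *streak = 0;  /* any calm sample resets the streak */
    return *streak >= needed;
}
```

For example, if total for memory "full" grows by 250000us over a one-second sampling interval, psi_pressure_pct() reports 25%, and psi_should_act() would return true only if that level persisted for the configured number of samples.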

5. cgroup v2

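On a kernel built with CONFIG_CGROUPS=y and with the cgroup2 filesystem mounted, PSI is also tracked for the tasks in each cgroup. This reveals the real CPU, memory, and IO pressure inside a container, which enables finer-grained container scheduling that maximizes resource utilization while preserving task latency and throughput.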

Each subdirectory of the cgroup2 mount contains cpu.pressure, memory.pressure, and io.pressure files, in the same format as the files under /proc/pressure/.
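Reading a cgroup's pressure works the same way as reading /proc/pressure/, only the path changes. The following is a minimal sketch, assuming cgroup2 is mounted at /sys/fs/cgroup (a common default, but not guaranteed) and using "system.slice" purely as an illustrative group name; psi_cgroup_path() and psi_scan_avg10() are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Build "<cgroup_root>/<group>/<resource>.pressure", e.g.
 * "/sys/fs/cgroup/system.slice/memory.pressure".
 * Returns 0 on success, -1 if buf is too small. */
int psi_cgroup_path(char *buf, size_t len, const char *cgroup_root,
                    const char *group, const char *resource)
{
    int n = snprintf(buf, len, "%s/%s/%s.pressure",
                     cgroup_root, group, resource);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Scan an open pressure file for the "some" or "full" line and store
 * its avg10 value. Returns 0 if found, -1 otherwise. */
int psi_scan_avg10(FILE *f, const char *kind, double *avg10)
{
    char line[256], k[5];
    double a10;

    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "%4s avg10=%lf", k, &a10) == 2 &&
            strcmp(k, kind) == 0) {
            *avg10 = a10;
            return 0;
        }
    }
    return -1;
}
```

A monitor would build the path with psi_cgroup_path(), open it with fopen(), and call psi_scan_avg10(f, "full", &avg10) to see how often all tasks in that cgroup were stalled over the last 10 seconds.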
