Handling PostgreSQL OOM kills on Ubuntu 22.04
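The OOM mechanism kills processes that use a lot of memory and have a low priority, so that the operating system kernel itself keeps running normally. If the OOM killer is turned off, the kernel itself may crash instead.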
The Linux kernel uses the badness heuristic to select which process gets killed in out-of-memory conditions.
Two important parameters are involved:
oom_score
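Roughly speaking, oom_score = memory consumed / total memory × 1000. This is the badness score, and the process with the highest score is killed. The value shown in /proc/<pid>/oom_score already includes oom_score_adj, described below.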
The badness heuristic assigns a value to each candidate task ranging from 0 (never kill) to 1000 (always kill) to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500.
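The proportion described in the quoted documentation can be sketched as a small function. This is a simplified model that assumes "allowed memory" is a single page count. The helper name badness_score() is hypothetical. The real kernel sums resident pages, swap entries and page-table pages in mm/oom_kill.c instead of calling anything like it.

```python
def badness_score(used_pages: int, allowed_pages: int) -> int:
    """Share of the task's allowed memory in use, scaled to 0..1000.

    Hypothetical helper that follows the quoted description; it is not
    a kernel API and ignores how the kernel actually counts pages.
    """
    if allowed_pages <= 0:
        raise ValueError("allowed_pages must be positive")
    # Keep usage inside [0, allowed] so the score stays within 0..1000.
    used_pages = max(0, min(used_pages, allowed_pages))
    return used_pages * 1000 // allowed_pages


if __name__ == "__main__":
    allowed = 8_000_000  # example budget in pages (made-up number)
    print(badness_score(allowed, allowed))       # uses everything: 1000
    print(badness_score(allowed // 2, allowed))  # uses half: 500
```

The two calls reproduce the 1000 and 500 examples from the quoted text.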
oom_score_adj
The adjust score value is added to the badness score before it is used to determine which task to kill. Acceptable values range from -1000 to +1000. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, -1000, is equivalent to disabling oom killing entirely for that task since it will always report a badness score of 0.
Setting an adjust score value of +500, for example, is roughly equivalent to allowing the remainder of tasks sharing the same system, cpuset, mempolicy, or memory controller resources to use at least 50% more memory. A value of -500, on the other hand, would be roughly equivalent to discounting 50% of the task’s allowed memory from being considered as scoring against the task.
In practice, the smaller the value, the lower the chance that the process is killed. A process set to -1000 is never killed by the OOM killer.
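The adjustment can be sketched on the same 0..1000 scale. This is a simplified model of the quoted text: the result is clamped to 0..1000, and -1000 always yields 0. The helper name adjusted_score() is hypothetical and is not how the kernel stores the value. The -900 used by the PostgreSQL service in this article is included. Under this model, such a task would need a badness above 900 before it scored anything at all.

```python
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def adjusted_score(badness: int, adj: int) -> int:
    """Add oom_score_adj to a 0..1000 badness score, as the quoted docs describe.

    Simplified model (hypothetical helper, not a kernel API): the result is
    clamped to 0..1000 and -1000 means "never kill".
    """
    if not OOM_SCORE_ADJ_MIN <= adj <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    if adj == OOM_SCORE_ADJ_MIN:
        return 0  # the task is exempt from OOM killing
    return max(0, min(1000, badness + adj))


if __name__ == "__main__":
    # A task using half of its allowed memory (badness 500):
    for adj in (-1000, -900, -500, 0, 500):
        print(adj, adjusted_score(500, adj))
```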
The OOM killer exists to keep the operating system kernel running. Background reading:
https://www.oracle.com/technical-resources/articles/it-infrastructure/dev-oom-killer.html
The Linux kernel allocates memory upon the demand of the applications running on the system. Because many applications allocate their memory up front and often don't utilize the memory allocated, the kernel was designed with the ability to over-commit memory to make memory usage more efficient. This over-commit model allows the kernel to allocate more memory than it actually has physically available. If a process actually utilizes the memory it was allocated, the kernel then provides these resources to the application. When too many applications start utilizing the memory they were allocated, the over-commit model sometimes becomes problematic and the kernel must start killing processes in order to stay operational. The mechanism the kernel uses to recover memory on the system is referred to as the out-of-memory killer or OOM killer for short.
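How far the kernel may overcommit is governed by vm.overcommit_memory. In strict mode (vm.overcommit_memory = 2), the kernel refuses allocations beyond CommitLimit. The /proc/meminfo documentation defines CommitLimit as (RAM minus huge pages) × vm.overcommit_ratio / 100 + swap. In the default heuristic mode (0), the limit is only reported, not enforced.

The sketch below does that arithmetic with this server's free -m figures. It assumes the default ratio of 50 and no huge pages. The function name is hypothetical.

```python
def commit_limit_mib(ram_mib: int, swap_mib: int, overcommit_ratio: int = 50,
                     hugetlb_mib: int = 0) -> int:
    """CommitLimit in MiB: (RAM - huge pages) * ratio / 100 + swap."""
    return (ram_mib - hugetlb_mib) * overcommit_ratio // 100 + swap_mib


if __name__ == "__main__":
    # `free -m` on PGD001 (MiB): 32058 RAM, 4095 swap before the change.
    print(commit_limit_mib(32058, 4095))   # with the original 4GB swap
    print(commit_limit_mib(32058, 32767))  # with 32GB swap
```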
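To check whether the OOM killer has been disabled on the server, run sysctl -a |grep panic_on_oom. A value of vm.panic_on_oom = 0 (the default) means the OOM killer is active. To disable it, edit /etc/sysctl.conf (vim /etc/sysctl.conf), set vm.panic_on_oom = 1, and apply it with sysctl -p.

Note that 1 does not simply switch the OOM killer off. On a system-wide out-of-memory condition, the kernel panics instead of killing a process; with 2, it panics in every case. This is the crash risk mentioned at the top.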
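A small helper for interpreting that sysctl line. On the server itself, the same number can be read from /proc/sys/vm/panic_on_oom. The descriptions paraphrase the kernel's sysctl/vm documentation. The helper names parse_sysctl_line() and describe_panic_on_oom() are hypothetical.

```python
from __future__ import annotations

# Meaning of vm.panic_on_oom, paraphrased from the kernel's sysctl/vm documentation.
PANIC_ON_OOM = {
    0: "OOM killer enabled: the kernel kills a process to free memory",
    1: "kernel panics on OOM; an OOM limited by mempolicy/cpusets may still kill a process",
    2: "kernel panics on every OOM, constrained or not",
}


def parse_sysctl_line(line: str) -> tuple[str, int]:
    """Split a `sysctl -a` line such as 'vm.panic_on_oom = 0' into (key, value)."""
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"not a sysctl line: {line!r}")
    return key.strip(), int(value.strip())


def describe_panic_on_oom(value: int) -> str:
    """Human-readable meaning of a vm.panic_on_oom value."""
    return PANIC_ON_OOM.get(value, "unexpected value")


if __name__ == "__main__":
    key, value = parse_sysctl_line("vm.panic_on_oom = 0")
    print(key, "->", describe_panic_on_oom(value))
```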
An example of PostgreSQL being OOM-killed on Ubuntu
Memory and swap on the server:
root@PGD001:~# free -m
total used free shared buff/cache available
Mem: 32058 19815 2984 4956 9258 6822
Swap: 4095 12 4083
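If these figures need to be tracked over time, the free -m output can be parsed into a table. This is a minimal sketch that assumes the column layout shown above. parse_free() is a hypothetical helper, and the sample text is the output from this server.

```python
from __future__ import annotations

FREE_OUTPUT = """
               total        used        free      shared  buff/cache   available
Mem:           32058       19815        2984        4956        9258        6822
Swap:           4095          12        4083
"""


def parse_free(output: str) -> dict[str, dict[str, int]]:
    """Turn `free -m` output into {"Mem": {...}, "Swap": {...}} (values in MiB)."""
    lines = [line for line in output.strip("\n").splitlines() if line.strip()]
    header = lines[0].split()
    table = {}
    for line in lines[1:]:
        name, *values = line.split()
        # zip() stops early for the Swap row, which has only 3 columns.
        table[name.rstrip(":")] = dict(zip(header, map(int, values)))
    return table


if __name__ == "__main__":
    mem = parse_free(FREE_OUTPUT)
    avail_pct = 100 * mem["Mem"]["available"] / mem["Mem"]["total"]
    print(f"available: {avail_pct:.1f}% of RAM, swap total: {mem['Swap']['total']} MiB")
```

On this output, roughly a fifth of RAM was still available, with only 4GB of swap behind it.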
dmesg shows the following:
root@PGD001:~# dmesg -T |grep postgres
[Wed Nov 15 20:31:32 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=system-postgresql.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/mountdatadomaindir.service,task=bash,pid=2392236,uid=0
[Wed Nov 15 20:31:33 2023] Out of memory: Killed process 2627 (postgres) total-vm:37766764kB, anon-rss:24965976kB, file-rss:2476kB, shmem-rss:2224896kB, UID:115 pgtables:63852kB oom_score_adj:-900
[Wed Nov 15 20:31:36 2023] oom_reaper: reaped process 2627 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:2224896kB
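Note: anon-rss is the anonymous resident set size, i.e. resident memory that is not backed by a file (heap, stack and other private anonymous mappings).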
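To pull the numbers out of such "Killed process" lines (for example when scanning many of them in syslog), a regular expression is enough. This is a sketch that assumes exactly the field layout printed on this server. Other kernel versions may print different fields. parse_killed() is a hypothetical helper.

```python
import re

# Field layout of the kernel's "Out of memory: Killed process ..." message
# as it appears in this server's logs.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB, shmem-rss:(?P<shmem_rss>\d+)kB, "
    r"UID:(?P<uid>\d+) pgtables:(?P<pgtables>\d+)kB "
    r"oom_score_adj:(?P<oom_score_adj>-?\d+)"
)


def parse_killed(line: str):
    """Return the fields of an OOM kill line as a dict, or None if it does not match."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    info = {k: int(v) for k, v in fields.items() if k != "comm"}
    info["comm"] = fields["comm"]
    # Memory actually resident when the task was killed (all three RSS kinds).
    info["resident_kb"] = info["anon_rss"] + info["file_rss"] + info["shmem_rss"]
    return info


if __name__ == "__main__":
    line = ("[Wed Nov 15 20:31:33 2023] Out of memory: Killed process 2627 (postgres) "
            "total-vm:37766764kB, anon-rss:24965976kB, file-rss:2476kB, "
            "shmem-rss:2224896kB, UID:115 pgtables:63852kB oom_score_adj:-900")
    info = parse_killed(line)
    print(info["comm"], info["pid"], f"{info['resident_kb'] / 2**20:.1f} GiB resident")
```

For process 2627, about 25.9 GiB was resident, compared with a total-vm of about 36 GiB.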
Searching the OS log with egrep shows that many services, not just PostgreSQL, were OOM-killed:
root@PGD001:~# egrep -i -r 'killed process' /var/log/syslog
Nov 15 20:31:34 PGD001 kernel: [1097200.699832] Out of memory: Killed process 2392264 (centrifydc) total-vm:4228kB, anon-rss:156kB, file-rss:1016kB, shmem-rss:0kB, UID:0 pgtables:48kB oom_score_adj:0
Nov 15 20:31:34 PGD001 kernel: [1097200.788886] Out of memory: Killed process 2392236 (bash) total-vm:7368kB, anon-rss:240kB, file-rss:744kB, shmem-rss:0kB, UID:0 pgtables:60kB oom_score_adj:0
Nov 15 20:31:34 PGD001 kernel: [1097200.947275] Out of memory: Killed process 872 (systemd-timesyn) total-vm:89356kB, anon-rss:128kB, file-rss:0kB, shmem-rss:0kB, UID:104 pgtables:72kB oom_score_adj:0
Nov 15 20:31:34 PGD001 kernel: [1097201.246575] Out of memory: Killed process 2392239 (boostfs) total-vm:9584kB, anon-rss:256kB, file-rss:216kB, shmem-rss:0kB, UID:0 pgtables:52kB oom_score_adj:0
Nov 15 20:31:34 PGD001 kernel: [1097201.248859] Out of memory: Killed process 2627 (postgres) total-vm:37766764kB, anon-rss:24965976kB, file-rss:2476kB, shmem-rss:2224896kB, UID:115 pgtables:63852kB oom_score_adj:-900
The OS log recorded the following about the PostgreSQL processes:
root@PGD001:~# vim /var/log/syslog
Nov 15 20:31:34 PGD001 kernel: [1097200.788558] Tasks state (memory values in pages):
Nov 15 20:31:34 PGD001 kernel: [1097200.788559] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Nov 15 20:31:34 PGD001 kernel: [1097200.788606] [ 1033] 115 1033 2169621 34143 536576 453 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788613] [ 1096] 115 1096 18294 356 118784 471 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788616] [ 1097] 115 1097 2169737 1004636 10375168 477 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788619] [ 1098] 115 1098 2169666 21642 315392 484 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788621] [ 1106] 115 1106 2169621 4454 172032 482 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788624] [ 1107] 115 1107 2170049 704 188416 495 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788626] [ 1108] 115 1108 2169647 369 139264 467 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788630] [ 1109] 115 1109 2170018 552 159744 494 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788633] [ 2627] 115 2627 9441691 6798507 65384448 989683 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788636] [2377149] 115 2377149 2171112 6198 315392 468 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788639] [2377150] 115 2377150 2171136 6728 315392 410 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788643] [2382504] 115 2382504 2172165 7311 323584 394 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788649] [2390327] 115 2390327 2172158 7270 323584 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788655] [2392025] 115 2392025 2240168 194775 3178496 371 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788659] [2392066] 115 2392066 2173236 43043 1736704 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788663] [2392076] 115 2392076 2193878 120763 2428928 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788665] [2392080] 115 2392080 2243484 197961 3190784 371 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788668] [2392093] 115 2392093 2226648 154194 2289664 371 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788671] [2392095] 115 2392095 2224712 63962 1990656 373 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788673] [2392110] 115 2392110 2241676 167738 2383872 373 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788676] [2392114] 115 2392114 2173331 41251 1609728 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788678] [2392115] 115 2392115 2170462 6530 417792 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788680] [2392116] 115 2392116 2174010 40685 1613824 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788682] [2392117] 115 2392117 2172912 35250 1581056 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788684] [2392124] 115 2392124 2172228 34549 1527808 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788687] [2392127] 115 2392127 2171267 9574 462848 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788689] [2392128] 115 2392128 2202943 240863 3563520 372 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788691] [2392140] 115 2392140 2193312 118464 2379776 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788695] [2392141] 115 2392141 2193830 121733 2387968 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788698] [2392142] 115 2392142 2193614 118047 2404352 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788701] [2392143] 115 2392143 2193945 118742 2387968 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788703] [2392144] 115 2392144 2173072 25293 753664 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788706] [2392145] 115 2392145 2173311 25541 753664 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788708] [2392150] 115 2392150 2170682 9356 434176 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788711] [2392160] 115 2392160 2171583 13385 655360 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788713] [2392162] 115 2392162 2171217 27180 1327104 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788715] [2392163] 115 2392163 2170622 9180 462848 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788717] [2392171] 115 2392171 2171229 25619 1241088 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788720] [2392173] 115 2392173 2170552 6687 376832 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788722] [2392174] 115 2392174 2170484 6309 385024 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788725] [2392176] 115 2392176 2171478 11652 610304 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788727] [2392177] 115 2392177 2170490 7124 425984 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788731] [2392178] 115 2392178 2170710 11417 561152 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788734] [2392179] 115 2392179 2170753 10507 548864 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788737] [2392180] 115 2392180 2170563 8826 471040 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788740] [2392181] 115 2392181 2170563 8319 471040 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788743] [2392182] 115 2392182 2171492 12312 598016 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788747] [2392184] 115 2392184 2171367 17877 815104 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788750] [2392185] 115 2392185 2171227 9037 430080 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788753] [2392195] 115 2392195 2170402 6457 413696 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788757] [2392197] 115 2392197 2171241 16841 843776 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788761] [2392200] 115 2392200 2174518 71413 1560576 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788764] [2392201] 115 2392201 2171813 16907 835584 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788766] [2392202] 115 2392202 2170446 6224 323584 379 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788769] [2392215] 115 2392215 2171139 13606 651264 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788773] [2392218] 115 2392218 2197655 6652 512000 405 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788777] [2392219] 115 2392219 2197655 6768 512000 405 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788780] [2392233] 115 2392233 2170461 6442 327680 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788786] [2392237] 115 2392237 2170063 977 196608 382 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788793] [2392240] 115 2392240 2170063 913 196608 390 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788797] [2392241] 115 2392241 2170063 690 155648 390 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788801] [2392242] 115 2392242 2170063 958 172032 390 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788804] [2392243] 115 2392243 2170063 1174 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788807] [2392244] 115 2392244 2170063 1117 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788810] [2392245] 115 2392245 2170063 716 155648 390 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788813] [2392246] 115 2392246 2170063 958 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788816] [2392247] 115 2392247 2170063 909 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788819] [2392249] 115 2392249 2169653 588 118784 406 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788822] [2392250] 115 2392250 2169621 441 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788825] [2392251] 115 2392251 2170063 968 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788828] [2392252] 115 2392252 2169653 403 118784 406 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788831] [2392253] 115 2392253 2169621 536 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788834] [2392254] 115 2392254 2169621 299 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788837] [2392255] 115 2392255 2170063 1316 217088 382 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788841] [2392256] 115 2392256 2170053 974 200704 420 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788845] [2392258] 115 2392258 2169621 367 118784 432 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788847] [2392259] 115 2392259 2169653 557 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788850] [2392260] 115 2392260 2169621 349 118784 433 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788853] [2392261] 115 2392261 2169621 535 118784 435 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788855] [2392262] 115 2392262 2169621 360 118784 433 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788858] [2392263] 115 2392263 2169621 299 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788860] [2392265] 115 2392265 2169621 364 118784 447 -900 postgres
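The "Tasks state" dump is in pages, not kilobytes. The kernel's oom_badness() in mm/oom_kill.c adds rss + swapents + pgtables_bytes / PAGE_SIZE, then adds oom_score_adj × (totalpages / 1000). The sketch below applies that formula to pid 2627 and to the bash process from the syslog excerpt.

Assumptions: 4 KiB pages; totalpages taken as RAM + swap from free -m before the fix; and 0 swap entries for bash, because its kill line does not show them. With -900, the big postgres backend comes out negative. This may help explain why several small processes with oom_score_adj 0 appear in the log before postgres. oom_points() is a hypothetical helper, not a kernel API.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (x86_64)


def oom_points(rss_pages: int, swapents: int, pgtables_bytes: int,
               oom_score_adj: int, totalpages: int) -> int:
    """Approximate the kernel's oom_badness() for a killable task.

    Memory term: resident pages + swapped-out pages + page-table pages.
    Adjustment: oom_score_adj is scaled to thousandths of totalpages.
    """
    points = rss_pages + swapents + pgtables_bytes // PAGE_SIZE
    return points + oom_score_adj * (totalpages // 1000)


if __name__ == "__main__":
    # totalpages = RAM + swap, from `free -m` before the fix (MiB to pages).
    totalpages = (32058 + 4095) * 1024 * 1024 // PAGE_SIZE
    # pid 2627 from the "Tasks state" dump (values already in pages).
    postgres = oom_points(6798507, 989683, 65384448, -900, totalpages)
    # bash (pid 2392236) from syslog: 240kB + 744kB resident, 60kB page
    # tables; swap entries are not logged, assumed 0.
    bash = oom_points((240 + 744) // 4, 0, 60 * 1024, 0, totalpages)
    print("postgres 2627 RSS GiB:", round(6798507 * PAGE_SIZE / 2**30, 1))
    print("postgres points:", postgres)
    print("bash points:", bash)
```

The victim is the task with the highest points, so a small process with adjustment 0 outranks a 26 GiB backend with adjustment -900.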
After PostgreSQL was OOM-killed and then restarted, check the oom_score and oom_score_adj values of the PostgreSQL service:
root@PGD001:~# ps -ef|grep postgres |grep PGDATA
postgres 2393324 1 0 02:01 ? 00:00:27 /usr/lib/postgresql/15/bin/postgres -D /PGDATA
root@PGD001:~# cat /proc/2393324/oom_score_adj
-900
root@PGD001:~# cat /proc/2393324/oom_score
70
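The value 70 looks odd next to the 0..1000 description above. As far as I can tell from the source of kernels 5.9 and later (Ubuntu 22.04's GA kernel is 5.15), fs/proc/base.c prints (1000 + badness × 1000 / totalpages) × 2 / 3. Here badness is the oom_badness() value, which already includes oom_score_adj.

The sketch below treats that formula as an assumption. It plugs in the postmaster's RSS (200092 kB) and swap (44 kB) from the top and smem output later in this article, treats page tables as negligible, and uses totalpages = RAM + 4GB swap from free -m. These inputs were captured at slightly different times, but they reproduce the observed 70. proc_oom_score() and c_div() are hypothetical helpers.

```python
def c_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, like C's '/' on longs."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def proc_oom_score(rss_kb: int, swap_kb: int, pgtables_kb: int,
                   oom_score_adj: int, totalpages: int, page_kb: int = 4) -> int:
    """Approximate /proc/<pid>/oom_score on kernels 5.9+ (assumed formula)."""
    if oom_score_adj == -1000:
        return 0  # OOM_SCORE_ADJ_MIN: the task is never selected
    # oom_badness(): memory in pages plus the scaled oom_score_adj.
    points = (rss_kb + swap_kb + pgtables_kb) // page_kb
    badness = points + oom_score_adj * (totalpages // 1000)
    # fs/proc/base.c rescales badness before printing it.
    return (1000 + c_div(badness * 1000, totalpages)) * 2 // 3


if __name__ == "__main__":
    totalpages = (32058 + 4095) * 256  # RAM + 4GB swap, MiB to 4 KiB pages
    # Postmaster 2393324: RSS 200092 kB (top/smem), swap 44 kB (smem),
    # page tables assumed negligible.
    print(proc_oom_score(200092, 44, 0, -900, totalpages))
```

The same process with oom_score_adj 0 would show about 670. This illustrates how strongly -900 pushes the postmaster down the list.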
Check the OS-level parameters:
root@DAILACHDBUD001:~# sysctl -a |grep kernel.shmmax
kernel.shmmax = 18446744073692774399
root@DAILACHDBUD001:~# sysctl -a |grep kernel.shmall
kernel.shmall = 18446744073692774399
root@DAILACHDBUD001:~# sysctl -a |grep kernel.shmmni
kernel.shmmni = 4096
root@PGD001:~# cat /etc/security/limits.conf |grep -v "#"
* soft nofile 1024
* hard nofile 2048
* soft nproc 1024
* hard nproc 2048
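These shm values are the kernel's compiled-in defaults. include/uapi/linux/shm.h defines SHMMAX and SHMALL as ULONG_MAX - (1UL << 24), and SHMMNI as 4096, so on a 64-bit system the limits are effectively unlimited.

Also worth knowing when reading the shmmax/shmall step later: since PostgreSQL 9.3, the main shared memory area is allocated with mmap by default (shared_memory_type = mmap on Linux), and only a small System V segment counts against these limits. The check below simply confirms that the observed numbers equal the defaults.

```python
ULONG_MAX = 2**64 - 1  # unsigned long on 64-bit Linux

# Defaults from include/uapi/linux/shm.h (SHMALL is counted in pages).
DEFAULT_SHMMAX = ULONG_MAX - (1 << 24)
DEFAULT_SHMALL = ULONG_MAX - (1 << 24)
DEFAULT_SHMMNI = 4096

if __name__ == "__main__":
    observed = {"kernel.shmmax": 18446744073692774399,
                "kernel.shmall": 18446744073692774399,
                "kernel.shmmni": 4096}
    defaults = {"kernel.shmmax": DEFAULT_SHMMAX,
                "kernel.shmall": DEFAULT_SHMALL,
                "kernel.shmmni": DEFAULT_SHMMNI}
    for key, value in observed.items():
        print(key, "is the kernel default:", value == defaults[key])
```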
Check the PostgreSQL database-level parameters:
postgres=# show shared_buffers;
 shared_buffers
----------------
 8GB
postgres=# show max_connections;
 max_connections
-----------------
 200
postgres=# show work_mem;
 work_mem
----------
 4MB
postgres=# show temp_buffers;
 temp_buffers
--------------
 8MB
postgres=# show maintenance_work_mem;
 maintenance_work_mem
----------------------
 64MB
postgres=# show autovacuum_work_mem;
 autovacuum_work_mem
---------------------
 -1
postgres=# show autovacuum_max_workers;
 autovacuum_max_workers
------------------------
 3
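Notes:
shared_buffers: sets the amount of memory the database server uses for shared memory buffers. On a dedicated database server with 1GB or more of RAM, a reasonable starting value for shared_buffers is 25% of system memory.
work_mem: sets the amount of memory used by internal sort operations and hash tables before writing to temporary disk files. The default is four megabytes (4MB). A complex query may run several sort or hash operations at the same time, and each one can use up to this amount.
temp_buffers: sets the maximum amount of memory used for temporary buffers within each database session. These are session-local buffers used only for access to temporary tables. The default is 8MB.
autovacuum_work_mem: specifies the maximum amount of memory each autovacuum worker process may use. The default, -1, means the value of maintenance_work_mem is used instead. When autovacuum runs, up to autovacuum_max_workers times this amount of memory may be allocated.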
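A common back-of-the-envelope check combines the parameters above. It is a heuristic, not an upper bound. The sketch assumes every connection uses one work_mem plus its full temp_buffers, and every autovacuum worker uses its whole budget.

In reality, a single query can use several multiples of work_mem (one per sort or hash node, with hash nodes allowed up to hash_mem_multiplier × work_mem). Per-backend overhead such as catalog caches is not counted at all. rough_pg_memory_mb() is a hypothetical helper.

```python
def rough_pg_memory_mb(shared_buffers_mb: int, max_connections: int,
                       work_mem_mb: int, temp_buffers_mb: int,
                       maintenance_work_mem_mb: int,
                       autovacuum_max_workers: int,
                       autovacuum_work_mem_mb: int = -1) -> int:
    """Back-of-the-envelope PostgreSQL memory estimate (a heuristic, not a bound)."""
    # autovacuum_work_mem = -1 means "use maintenance_work_mem".
    av_mem = (maintenance_work_mem_mb if autovacuum_work_mem_mb == -1
              else autovacuum_work_mem_mb)
    per_connection = work_mem_mb + temp_buffers_mb
    return (shared_buffers_mb
            + max_connections * per_connection
            + autovacuum_max_workers * av_mem)


if __name__ == "__main__":
    # Values from the `show` output above: 8GB, 200, 4MB, 8MB, 64MB, 3, -1.
    estimate = rough_pg_memory_mb(8192, 200, 4, 8, 64, 3)
    print(f"{estimate} MB of 32058 MB RAM")
```

The estimate (about 10.5GB) is well under the 32GB of RAM. This is consistent with the observation that the configured parameters look reasonable, while a single backend still grew far beyond them.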
Check the memory used by each process:
root@PGD001:~# smem -t -r -a | head -20
PID User Command Swap USS PSS RSS
2394448 postgres postgres: 15/main: veeamuser VeeamBackupReporting 172.22.137.89(50228) idle 380 20701320 21192492 21812648
2393326 postgres postgres: 15/main: checkpointer 44 842436 1842987 3047168
3846886 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58846) idle 380 256848 567859 1011720
3846852 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58654) SELECT 380 130864 308700 619768
2392350 root /opt/emc/boostfs/bin/boostfs mount /mnt/datadomaindir -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true 3888 292596 292670 294968
3846901 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58926) idle 380 87968 149992 386656
3846903 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58928) idle 380 76084 137657 373052
3846877 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58815) idle 380 26056 126282 387476
2393324 postgres /usr/lib/postgresql/15/bin/postgres -D /PGDATA 44 52700 77518 200092
3846889 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58860) idle 380 25172 57457 205336
633 root /sbin/multipathd -d -s 0 22712 23409 27924
3846902 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58927) idle 380 11204 23353 132416
961 root /usr/lib/snapd/snapd 2132 18836 18876 20848
3846890 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58861) idle 380 8552 18772 117992
3846899 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58914) idle 380 7340 18020 119676
2393327 postgres postgres: 15/main: background writer 80 336 17668 91248
3846898 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58913) idle 380 6092 16874 115916
3846904 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58930) idle 380 7024 15296 104320
3846908 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58957) idle 380 11440 14047 66592
root@PGD001:~# smem -u -p -a
User Count Swap USS PSS RSS
systemd-timesync 1 0.00% 0.00% 0.00% 0.02%
messagebus 1 0.01% 0.00% 0.00% 0.02%
systemd-network 1 0.00% 0.01% 0.01% 0.02%
syslog 1 0.00% 0.01% 0.01% 0.02%
systemd-resolve 1 0.00% 0.02% 0.02% 0.03%
root 21 0.22% 1.19% 1.25% 1.56%
postgres 43 0.34% 68.76% 75.26% 90.34%
root@PGD001:~# top
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2394448 postgres 20 0 27.0g 21.0g 2.4g S 0.0 67.1 4100:15 postgres
2393326 postgres 20 0 8678924 2.9g 2.9g S 0.0 9.3 10:32.81 postgres
3855468 postgres 20 0 8807032 846728 818960 S 0.0 2.6 0:00.73 postgres
3855453 postgres 20 0 8877256 587404 492304 S 0.0 1.8 0:07.70 postgres
3855466 postgres 20 0 8778480 383144 287312 S 20.5 1.2 0:02.36 postgres
3855465 postgres 20 0 8775984 380292 288272 S 2.6 1.2 0:01.85 postgres
3855464 postgres 20 0 8774756 376020 287796 S 3.0 1.1 0:01.97 postgres
3855481 postgres 20 0 8703792 371900 345996 S 22.8 1.1 0:01.11 postgres
2392350 root 20 0 1182364 292852 2700 S 0.0 0.9 54:37.50 boostfs
2393324 postgres 20 0 8678484 200092 197304 S 0.0 0.6 24:03.58 postgres
2393327 postgres 20 0 8678648 90992 88152 S 0.0 0.3 0:27.56 postgres
3855473 postgres 20 0 8687624 76032 69008 S 0.7 0.2 0:00.07 postgres
3855483 postgres 20 0 8694496 66440 52784 S 0.0 0.2 0:00.04 postgres
3855463 postgres 20 0 8682872 59528 54192 S 0.3 0.2 0:00.06 postgres
3855482 postgres 20 0 8681836 42672 38304 S 0.0 0.1 0:00.03 postgres
3855471 postgres 20 0 8681720 39628 35356 S 0.0 0.1 0:00.02 postgres
3799836 postgres 20 0 8684528 39412 32052 S 0.0 0.1 0:18.87 postgres
3854094 postgres 20 0 8684520 39336 32008 S 0.0 0.1 0:00.86 postgres
3746080 postgres 20 0 8684644 39300 31820 S 0.0 0.1 0:35.88 postgres
3855475 postgres 20 0 8681652 38492 34096 S 0.0 0.1 0:00.02 postgres
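Notes:
VIRT: the size of the process's virtual address space, covering both the parts already mapped to physical memory and the parts that are not. VIRT stands for virtual memory usage. Virtual memory is an abstraction: only the parts a process actually accesses are mapped to physical memory. A large VIRT only means the process can address a lot of memory, not that it occupies a lot of physical memory. VIRT also counts mapped pages that have never been touched, so it is not simply SWAP + RES.
RES: the part of the virtual address space that is currently mapped to physical memory. To see how much memory a process really uses, look at RES rather than VIRT. RES stands for resident memory usage, the physical memory the process actually occupies. When people say a process "uses" a certain amount of memory, they normally mean resident memory, not virtual memory.
SHR: short for shared; the amount of shared memory the process uses, i.e. memory that can be shared with other processes. For example, the memory occupied by the dynamic library libc.so is shared, because many different processes map and call libc.so. For PostgreSQL processes, SHR also includes the pages of shared_buffers the process has touched.
VSS: Virtual Set Size, the virtual memory the process has requested from the system; the same as VIRT.
RSS: Resident Set Size, the total memory the process actually holds in RAM; the same as RES. Shared pages are counted in full for every process that maps them.
PSS: Proportional Set Size, the process's private memory plus its proportional share of shared memory (each shared page divided by the number of processes mapping it).
USS: Unique Set Size, the physical memory used only by this process.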
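The difference between USS, PSS and RSS is easiest to see with a toy example. The numbers below are hypothetical, and memory_sets() is an illustrative helper, not part of smem. The kernel does the same proportional split per page.

```python
def memory_sets(private, shared):
    """Toy USS/PSS/RSS calculation.

    private: memory mapped only by this process.
    shared:  list of (size, number_of_processes_mapping_it) pairs.
    Units are whatever you pass in (e.g. MiB).
    """
    uss = private
    rss = private + sum(size for size, _ in shared)
    pss = private + sum(size / sharers for size, sharers in shared)
    return {"USS": uss, "PSS": pss, "RSS": rss}


if __name__ == "__main__":
    # Hypothetical backend: 100 MiB private memory plus a 300 MiB
    # shared region (think shared_buffers) mapped by 3 processes.
    print(memory_sets(100, [(300, 3)]))
```

Summing RSS over all processes counts the shared region once per process. Summing PSS counts it exactly once, which is why PSS is the better column for adding up per-user totals, as smem -u does.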
Check the current sessions in the database:
postgres=# show idle_session_timeout ;
 idle_session_timeout
----------------------
 0
postgres=# select count(*) from pg_stat_activity where state='idle';
 count
-------
 45
postgres=# select pid,usename,datname,client_addr,state from pg_stat_activity;
 pid | usename | datname | client_addr | state
---------+-----------+----------------------+---------------+--------
 2393349 | | | |
 2393351 | postgres | | |
 3931348 | postgres | postgres | 172.22.138.94 | idle
 3949134 | veeamuser | VeeamBackup | 172.22.137.89 | idle
 3949093 | veeamuser | VeeamBackup | 172.22.137.89 | idle
 3949135 | veeamuser | VeeamBackup | 172.22.137.89 | idle
 2394448 | veeamuser | VeeamBackupReporting | 172.22.137.89 | idle
 3854094 | postgres | postgres | 172.22.138.94 | idle
 3949102 | veeamuser | VeeamBackup | 172.22.137.89 | idle
 3949103 | veeamuser | VeeamBackup | 172.22.137.89 | active
 3949127 | veeamuser | VeeamBackup | 172.22.137.89 | idle
 3949132 | veeamuser | VeeamBackup | 172.22.137.89 | idle
 3949133 | veeamuser | VeeamBackup | 172.22.137.89 | idle
..
 3949108 | veeamuser | VeeamBackup | 172.22.137.89 | idle
 3949109 | veeamuser | VeeamBackup | 172.22.137.89 | idle
root@DAILAPGDBUP001:~# ps -ef|grep postgres
...
postgres 3906865 2393324 0 09:40 ? 00:00:12 postgres: 15/main: postgres postgres 172.22.138.94(52002) idle
postgres 3931348 2393324 0 15:17 ? 00:00:04 postgres: 15/main: postgres postgres 172.22.138.94(54154) idle
postgres 3949113 2393324 24 19:06 ? 00:00:15 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(51688) idle
postgres 3949122 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(51913) idle
postgres 3949123 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(51970) idle
postgres 3949179 2393324 27 19:06 ? 00:00:08 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52262) SELECT
postgres 3949182 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52282) idle
postgres 3949186 2393324 2 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52309) idle
...
postgres 3949187 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52313) idle
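From inside the database, grouping pg_stat_activity by usename and state gives the session breakdown. From the shell, the same can be read off the process titles shown by ps.

This sketch assumes the "postgres: <cluster>: <user> <db> <host(port)> <state>" title format seen above, so background processes such as the checkpointer are skipped. The sample lines are copied from the output above, except the checkpointer title, which is illustrative. count_backend_states() is a hypothetical helper.

```python
import re
from collections import Counter

# Client backend titles: "postgres: 15/main: user db host(port) state".
BACKEND_TITLE = re.compile(
    r"postgres: \S+: (?P<user>\S+) (?P<db>\S+) "
    r"(?P<client>\S+\(\d+\)|\[local\]) (?P<state>.+)$"
)


def count_backend_states(ps_lines):
    """Count client backends per (user, state) from `ps -ef` lines."""
    counts = Counter()
    for line in ps_lines:
        match = BACKEND_TITLE.search(line.rstrip())
        if match:
            counts[(match["user"], match["state"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "postgres 3931348 2393324 0 15:17 ? 00:00:04 postgres: 15/main: postgres postgres 172.22.138.94(54154) idle",
        "postgres 3949179 2393324 27 19:06 ? 00:00:08 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52262) SELECT",
        "postgres 3949182 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52282) idle",
        "postgres 3949186 2393324 2 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52309) idle",
        "postgres: 15/main: checkpointer",  # illustrative background process title
    ]
    for (user, state), n in sorted(count_backend_states(sample).items()):
        print(user, state, n)
```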
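Analysis: the server has 32GB of physical memory, yet the PostgreSQL process captured at OOM time had a total-vm of 37766764kB. That is about 36GB of virtual memory; its anon-rss was 24965976kB, about 24GB.

The memory parameters at the PostgreSQL database level all look reasonable. PostgreSQL's OOM adjustment is also already very low at -900 (at -1000 the kernel would never OOM-kill it). While PostgreSQL is active, the database service takes 70%-90% of the operating system's memory. The OOM event killed not only the PostgreSQL server but many other services as well. The OS-level parameters kernel.shmmax and kernel.shmall are therefore probably not set appropriately.

In addition, memory usage stays high even when many sessions are idle. The idle session timeout (idle_session_timeout) is probably not set sensibly either, and 4GB of swap is also too small.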
To avoid being OOM-killed again, the following measures were taken:
1. Set kernel.shmmax to 17179869184 (half of physical memory) and kernel.shmall to 4194304 (= shmmax / page_size):
root@PGD001:~# vim /etc/sysctl.conf
kernel.shmmax=17179869184
kernel.shmall=4194304
root@PGD001:~# sysctl -p
root@PGD001:~# sysctl -a |grep kernel.shmmax
kernel.shmmax = 17179869184
root@PGD001:~# sysctl -a |grep kernel.shmall
kernel.shmall = 4194304
root@PGD001:~# sysctl -a |grep kernel.shmmni
kernel.shmmni = 4096
root@PGD001:~# ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 16777216
max total shared memory (kbytes) = 16777216
min seg size (bytes) = 1
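The arithmetic behind step 1 can be checked in a few lines. This sketch assumes a 4096-byte page (what getconf PAGE_SIZE normally reports on x86_64) and the nominal 32 GiB of RAM; free -m shows 32058 MiB usable. shm_settings() is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumption: `getconf PAGE_SIZE` returns 4096 on this server


def shm_settings(ram_bytes: int, page_size: int = PAGE_SIZE):
    """shmmax = half of RAM (bytes); shmall = shmmax expressed in pages."""
    shmmax = ram_bytes // 2
    shmall = shmmax // page_size
    # `ipcs -lm` reports the shmall limit converted back to kilobytes.
    max_total_kb = shmall * page_size // 1024
    return shmmax, shmall, max_total_kb


if __name__ == "__main__":
    ram = 32 * 1024**3  # nominal 32 GiB
    print(shm_settings(ram))
```

The three results match the values in /etc/sysctl.conf and the "max total shared memory (kbytes)" line from ipcs -lm.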
2. Set idle_session_timeout = 8h:
postgres=# alter system set idle_session_timeout='8h';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
postgres=# show idle_session_timeout;
 idle_session_timeout
----------------------
 8h
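idle_session_timeout is a time-valued setting whose base unit is milliseconds. The PostgreSQL documentation lists us, ms, s, min, h and d as valid units, and 0 disables the timeout. It only ends sessions that are idle outside a transaction; idle_in_transaction_session_timeout is a separate setting.

A minimal converter, sketching how a value such as '8h' maps to milliseconds. pg_time_to_ms() is a hypothetical helper, not a PostgreSQL API.

```python
import re

# Units accepted for time-valued settings, expressed in milliseconds.
PG_TIME_UNITS_MS = {"us": 0.001, "ms": 1, "s": 1_000, "min": 60_000,
                    "h": 3_600_000, "d": 86_400_000}


def pg_time_to_ms(setting: str, default_unit: str = "ms") -> float:
    """Convert a value such as '8h' or '30min' to milliseconds.

    default_unit applies when no unit is given; for idle_session_timeout
    the base unit is milliseconds.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*", setting)
    if match is None:
        raise ValueError(f"cannot parse {setting!r}")
    number, unit = match.groups()
    unit = unit or default_unit
    if unit not in PG_TIME_UNITS_MS:
        raise ValueError(f"unknown time unit {unit!r}")
    return float(number) * PG_TIME_UNITS_MS[unit]


if __name__ == "__main__":
    for value in ("0", "8h", "30min"):
        print(value, "->", pg_time_to_ms(value), "ms")
```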
3. Increase swap to 1× physical memory, i.e. 32GB:
root@PGD001:~# free -m
total used free shared buff/cache available
Mem: 32058 20278 2195 4967 9583 6349
Swap: 4095 12 4083
root@PGD001:~# swapon -s
Filename Type Size Used Priority
/swap.img file 4194300 13120 -2
root@PGD001:/# cat /etc/fstab |grep swap
/swap.img none swap sw 0 0
root@PGD001:/# ll /swap.img
-rw------- 1 root root 4294967296 Sep 6 2022 /swap.img
root@PGD001:/# fallocate -l 4G /swap1.img
root@PGD001:/# chmod 600 /swap1.img
root@PGD001:/# ll /swap1.img
-rw------- 1 root root 4294967296 Nov 22 22:51 /swap1.img
root@PGD001:/# mkswap /swap1.img
Setting up swapspace version 1, size = 4 GiB (4294963200 bytes)
no label, UUID=85b78962-8bae-48d8-a5c0-e30903b7b8d6
root@PGD001:/# swapon /swap1.img
root@PGD001:/# free -m
total used free shared buff/cache available
Mem: 32058 20102 2325 4967 9630 6525
Swap: 8191 12 8179
root@PGD001:/# swapon -s
Filename Type Size Used Priority
/swap.img file 4194300 13120 -2
/swap1.img file 4194300 0 -3
root@PGD001:/# swapoff -v /swap.img
swapoff /swap.img
root@PGD001:/# swapon -s
Filename Type Size Used Priority
/swap1.img file 4194300 0 -2
root@PGD001:/# free -m
total used free shared buff/cache available
Mem: 32058 20149 2276 4969 9632 6476
Swap: 4095 0 4095
root@PGD001:/# fallocate -l 32G /swap.img
root@PGD001:/# chmod 600 /swap.img
root@PGD001:/# ll /swap.img
-rw------- 1 root root 34359738368 Nov 22 22:53 /swap.img
root@PGD001:/# mkswap /swap.img
mkswap: /swap.img: warning: wiping old swap signature.
Setting up swapspace version 1, size = 32 GiB (34359734272 bytes)
no label, UUID=9d658937-a89d-472b-aa94-be23e7f8703c
root@PGD001:/# swapon /swap.img
root@PGD001:/# free -m
total used free shared buff/cache available
Mem: 32058 20272 2149 4969 9637 6354
Swap: 36863 0 36863
root@PGD001:/# swapoff -v /swap1.img
swapoff /swap1.img
root@PGD001:/# swapon -s
Filename Type Size Used Priority
/swap.img file 33554428 0 -2
root@PGD001:/# free -m
total used free shared buff/cache available
Mem: 32058 20342 2078 4969 9637 6283
Swap: 32767 0 32767
root@PGD001:/# cat /etc/fstab |grep swap
/swap.img none swap sw 0 0
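swapon -s (equivalent to reading /proc/swaps) reports sizes in KiB. Each swap file shows 4 KiB less than the file size, because mkswap reserves the first page for its header. The sketch sums the Size column and converts it to the MiB figure that free -m prints. It assumes file names without spaces, and swap_total_kib() is a hypothetical helper. Both samples are copied from the session above.

```python
def swap_total_kib(swapon_s_output: str) -> int:
    """Sum the Size column (KiB) of `swapon -s` output."""
    lines = swapon_s_output.strip().splitlines()
    total = 0
    for line in lines[1:]:  # skip the "Filename Type Size Used Priority" header
        fields = line.split()
        total += int(fields[2])
    return total


BOTH_FILES = """Filename Type Size Used Priority
/swap1.img file 4194300 0 -2
/swap.img file 33554428 0 -3
"""
FINAL = """Filename Type Size Used Priority
/swap.img file 33554428 0 -2
"""

if __name__ == "__main__":
    for label, text in (("both files", BOTH_FILES), ("final", FINAL)):
        kib = swap_total_kib(text)
        print(label, kib, "KiB =", kib // 1024, "MiB")  # free -m truncates too
```

The MiB values (36863 while both files were active, 32767 at the end) match the Swap totals that free -m printed during the change.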
Source: ITPUB blog, link: http://blog.itpub.net/30126024/viewspace-2996848/. Please credit the source when reposting; otherwise legal liability will be pursued.