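When systemctl (systemd) is used to supervise a mount process, it is advisable to have the mount script referenced by ExecStart run umount before it runs the mount command. Once the process backing a mounted directory is killed by the OOM killer, the mount point is still occupied, and it has to be unmounted before it can be mounted again.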

The mount script is as follows:

root@DAILAPGDBUP001:~# cat /root/mountdatadomaindir.sh
/opt/emc/boostfs/bin/boostfs mount /mnt/datadomaindir -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true

The ExecStart directive of the systemd unit points to this mount script. The unit file is as follows:

root@DAILAPGDBUP001:~# vim /usr/lib/systemd/system/mountdatadomaindir.service
[Unit]
Description=mountdatadomaindir
After=network.target
[Service]
User=root
Group=root
Type=forking
ExecStart=/bin/bash /root/mountdatadomaindir.sh
Restart=on-failure
[Install]
WantedBy=multi-user.target

root@DAILAPGDBUP001:~# systemctl enable mountdatadomaindir

On one occasion an OOM kill occurred. Our unit already had Restart=on-failure set, yet /mnt/datadomaindir was not mounted again afterwards. /var/log/syslog contained the following entries:

Oct 15 01:01:02 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: A process of this unit has been killed by the OOM killer.
Oct 15 01:01:02 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Main process exited, code=killed, status=9/KILL
Oct 15 01:01:02 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Failed with result 'oom-kill'.
Oct 15 01:01:02 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Consumed 36min 37.125s CPU time.
Oct 15 01:01:03 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Scheduled restart job, restart counter is at 1.
Oct 15 01:01:03 DAILAPGDBUP001 systemd[1]: Stopped mountdatadomaindir.
Oct 15 01:01:03 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Consumed 36min 37.125s CPU time.
Oct 15 01:01:03 DAILAPGDBUP001 systemd[1]: Starting mountdatadomaindir...
Oct 15 01:01:04 DAILAPGDBUP001 bash[1896219]: Not able to access the mount point /mnt/datadomaindir
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Control process exited, code=exited, status=1/FAILURE
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Failed with result 'exit-code'.
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: Failed to start mountdatadomaindir.
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Scheduled restart job, restart counter is at 2.
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: Stopped mountdatadomaindir.
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: Starting mountdatadomaindir...
Oct 15 01:01:06 DAILAPGDBUP001 bash[1896286]: Not able to access the mount point /mnt/datadomaindir
Oct 15 01:01:06 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Control process exited, code=exited, status=1/FAILURE
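
To check whether a unit has hit this pattern before, the log can be searched for the 'oom-kill' result line shown above. The following is a minimal sketch; the function name count_oom_kills is hypothetical, and it assumes a syslog-style text file like the one quoted here.

```shell
#!/bin/bash
# Count how often a unit failed with result 'oom-kill' in a syslog-style file.
# $1: unit name, e.g. mountdatadomaindir.service
# $2: path to the log file to search
count_oom_kills() {
    local unit=$1 logfile=$2
    # -F treats the pattern as a fixed string, -c prints the number of matching lines.
    grep -c -F -- "${unit}: Failed with result 'oom-kill'." "$logfile"
}
```

For example, `count_oom_kills mountdatadomaindir.service /var/log/syslog` prints the number of matching lines. On hosts that keep the journal, `journalctl -u mountdatadomaindir.service` shows the same messages for that unit.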

In addition, ll /mnt/ reports ls: cannot access 'datadomaindir': Transport endpoint is not connected, and every attribute of the mounted directory is shown as a question mark (?):

root@DAILAPGDBUP001:~# ll /mnt/
ls: cannot access 'datadomaindir': Transport endpoint is not connected
total 8
drwxr-xr-x  3 root root 4096 Sep 16 04:16 ./
drwxr-xr-x 20 root root 4096 Aug 31 06:36 ../
d?????????  ? ?    ?       ?            ? datadomaindir/
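
The state shown above (the directory is still registered as a mount point, but it can no longer be accessed) can be detected from a script. The following is a rough sketch, not a definitive check: the function names is_in_mount_table and is_stale_mount are hypothetical, and it treats any stat failure on a listed mount point as "stale".

```shell
#!/bin/bash
# Return 0 if the path is listed as a mount target in the mounts table.
# The optional second argument exists only so a sample table can be supplied;
# it defaults to /proc/self/mounts.
is_in_mount_table() {
    awk -v t="$1" '$2 == t { found = 1 } END { exit !found }' "${2:-/proc/self/mounts}"
}

# Return 0 if the path is still listed as mounted but can no longer be
# stat'ed, which matches the "Transport endpoint is not connected" state.
is_stale_mount() {
    is_in_mount_table "$1" "${2:-/proc/self/mounts}" && ! stat "$1" >/dev/null 2>&1
}
```

A call such as `is_stale_mount /mnt/datadomaindir && echo "stale mount"` would then report the condition. The mounts table is read directly because tools that first stat the directory fail in exactly this situation.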

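Solution: add a umount /mnt/datadomaindir line to /root/mountdatadomaindir.sh. The reason is that once the process backing a mounted directory is killed by the OOM killer, the mount point is still occupied, and it has to be unmounted before it can be mounted again.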

root@DAILAPGDBUP001:~# vim /root/mountdatadomaindir.sh
umount /mnt/datadomaindir
/opt/emc/boostfs/bin/boostfs mount /mnt/datadomaindir -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true
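
A slightly more defensive variant of the script could look like the sketch below. It is only an illustration under a few assumptions: the names BOOSTFS_BIN and clear_and_mount are made up for this example, and the fallback to a lazy unmount (umount -l) is an addition that the original fix does not use. The mount point and the boostfs arguments are copied from the script above.

```shell
#!/bin/bash
# Sketch of a more defensive /root/mountdatadomaindir.sh.
# BOOSTFS_BIN and clear_and_mount are illustrative names, not part of boostfs.

MOUNT_POINT=/mnt/datadomaindir
BOOSTFS_BIN=/opt/emc/boostfs/bin/boostfs

clear_and_mount() {
    # Try a normal unmount first and fall back to a lazy unmount (umount -l)
    # if that fails. Both failures are tolerated so that a clean boot, where
    # nothing is mounted yet, still proceeds to the mount step.
    umount "$MOUNT_POINT" 2>/dev/null || umount -l "$MOUNT_POINT" 2>/dev/null || true

    # Same mount command as in the original script; its exit status becomes
    # the function's exit status, so systemd still sees a failed start.
    "$BOOSTFS_BIN" mount "$MOUNT_POINT" -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true
}

# In the real script, the last line would call the function:
# clear_and_mount
```

Silencing the umount errors keeps the log free of "not mounted" noise on a normal start, while the exit status of the boostfs command is still what systemd uses to decide whether to restart.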

How to handle a failed restart after a mount process supervised by systemctl is killed by the OOM killer
