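Filesystem space not released after deleting data files on Linux
First, the problem of filesystem space not being released after a data file is deleted is not specific to Linux; it can occur on any platform. The tests here are run on Linux, and other platforms behave similarly. Second, the problem only arises when data files are stored on a filesystem; a database stored in ASM is not affected. The aim of this article is to summarize the issue and to become familiar with the tools and methods involved, which is the more important part.
I wrote this article because a customer had a database on AIX 6.1 with its data files stored on a filesystem. After the UNDO tablespace was dropped with DROP TABLESPACE UNDO INCLUDING CONTENTS AND DATAFILES, the filesystem space was not released. Solving that problem was the original motivation for this article.
We usually notice that space has not been released by checking with df. If we then scan the directory with du, the usage reported by df and du may differ, sometimes by a large amount. The following MOS note explains why: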
'du' and 'df' tools report different space utilization (Doc ID 457444.1)
Applies to:
Linux OS - Version Oracle Linux 4.4 and later
Linux x86-64
Linux x86
Linux Kernel - Version: 4.4 to 5.3
Symptoms
The 'du' (/usr/bin/du) and 'df' (/bin/df) command output displays conflicting space utilisation values, for example:
# df -k /
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/sda6 9288792 8672768 144120 99% /
# du -xsh /
2.1G /
In the example above, 'df' reports 8.6 Gb to have been used on the root (/) filesystem, whereas 'du' reports only 2.1 Gb to have been used.
Cause
The 'df' command reports how many disk blocks are used, whilst 'du' traverses the filesystem and reports the actual number of blocks used (directory by directory), including any relating to files in use by processes.
In most cases, space utilisation values returned from 'df' and 'du' will be consistent. However, the potential exists for a running process to delete a large file, say. In this instance, according to 'du', the large file no longer exists, so the blocks associated with the deleted file are not reported. With the process still running, and with an open file descriptor still held against the deleted file, 'df' continues to track and report all disk blocks used, including those associated with the deleted (phantom) file. In this situation, the disk space associated with the deleted file will only be relinquished back to the system when the process completely releases the deleted file's descriptor or the process terminates (either gracefully or killed).
Solution
The solution is to identify and stop (or kill) the process that continues to hold a file descriptor open for the deleted file. To do so, run the lsof command (/usr/sbin/lsof | grep deleted) as root to identify the holding process, for example:
# lsof | grep deleted
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
cannaserv 3825 canna 0u CHR 136,0 2 /dev/pts/0 (deleted)
vmware 4295 root 6u REG 253,0 6770 13074503 /tmp/vmware-root/ui-4295.log (deleted)
vmware-re 4316 root 6u REG 253,0 6770 13074503 /tmp/vmware-root/ui-4295.log (deleted)
vmnet-nat 4448 root 0u CHR 136,0 2 /dev/pts/0 (deleted)
vmware-se 4454 root 0u CHR 136,0 2 /dev/pts/0 (deleted)
gdm-binar 4506 root 0u CHR 136,0 2 /dev/pts/0 (deleted)
gconfd-2 5392 root 12wW REG 253,0 609 13090818 /tmp/gconfd-root/lock/0t1188207163ut519551u0p5392r346479926k3219784492 (deleted)
vmware-vm 5822 root 57u REG 253,0 6520832 13074477 /tmp/vmware-root/ram0 (deleted)
vmware-vm 16487 root 57u REG 253,0 11153408 13074520 /tmp/vmware-root/ram0 (deleted)
kdeinit 17991 root 17u REG 253,0 26712 13074524 /tmp/kde-root/khtmlcacheM7jXYb.tmp (deleted)
kdeinit 17991 root 18u REG 253,0 5631 13074501 /tmp/kde-root/khtmlcacheZlJmda.tmp (deleted)
kdeinit 17991 root 21u REG 253,0 44718 13074514 /tmp/kde-root/khtmlcacheH5m4lc.tmp (deleted)
In the example above, the 7th column in the output denotes the size of the deleted files (in bytes). The 9th column denotes which file remains held open. The 1st and 2nd columns denote the process and PID that still hold the open file descriptor.
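df and du report different usage because du walks the directory tree and therefore cannot count a file that has been deleted, whereas df reads the filesystem's block usage and keeps counting a deleted file for as long as a process holds it open. Only when the process releases the file does df stop counting it. The command lsof | grep deleted finds files that have been deleted but are still held by a process.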
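When lsof is not available, the same information can be read straight from /proc on Linux. The following is a minimal sketch and not part of the MOS note: the function name list_deleted_open_files is made up for illustration, and it assumes a Linux /proc and the readlink utility. It only sees processes the current user is allowed to inspect, so run it as root for a complete picture.

```shell
#!/bin/sh
# Print "PID FD TARGET" for every open file descriptor whose target has been
# unlinked. On Linux the symlink /proc/PID/fd/N of such a descriptor reads
# "/original/path (deleted)".
list_deleted_open_files() {
    for link in /proc/[0-9]*/fd/*; do
        # Processes may exit while we scan; skip links that vanish.
        target=$(readlink "$link" 2>/dev/null) || continue
        case "$target" in
            *' (deleted)')
                pid=${link#/proc/}
                pid=${pid%%/*}
                fd=${link##*/}
                printf '%s %s %s\n' "$pid" "$fd" "$target"
                ;;
        esac
    done
}
```

The PID and FD columns it prints are the two values needed to locate the file under /proc/PID/fd.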
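From the note above we know that when deleting a large file on a filesystem does not release space, it is because a process is still holding the file. Finding that process requires the lsof and fuser commands, so it is important to be familiar with both.
>lsof (list open files) lists the files currently open on the system.
>fuser identifies the processes that are accessing a file or socket.
fuser is also a tool we commonly use to find processes, but it does not show detailed process information; it mostly gives just a PID, and its output is hard to filter. In this scenario lsof is the better fit.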
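Because lsof prints one line per open file, its output is easy to post-process. The sketch below totals the size of deleted regular files per process. The function name sum_deleted_by_pid is illustrative, and the column positions are an assumption taken from the listings in this article (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE, NODE, NAME); check them against your lsof version before relying on the numbers.

```shell
#!/bin/sh
# Read lsof's default listing on stdin and print "PID COMMAND BYTES" for each
# process, summing the size of regular files that are marked "(deleted)".
# Assumed layout: $1 COMMAND, $2 PID, $5 TYPE, $7 SIZE (or SIZE/OFF),
# last field "(deleted)". Largest holder is printed first.
sum_deleted_by_pid() {
    awk '$5 == "REG" && $NF == "(deleted)" {
             bytes[$2] += $7
             cmd[$2] = $1
         }
         END {
             for (pid in bytes)
                 printf "%s %s %.0f\n", pid, cmd[pid], bytes[pid]
         }' | sort -k3,3nr
}

# Typical use (as root):
#   lsof | sum_deleted_by_pid
```

A file held by several processes is counted once per process, so the per-PID totals should not be added together to estimate the space that will be freed.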
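In addition, as is well known, on a Linux filesystem a deleted file can still be recovered as long as some process still holds it open. This is why, when all of a database's data files have been deleted, they can be fully recovered provided the instance has not been shut down. The following MOS note discusses how to recover files in this situation:
How To Recover Deleted Files on ext3/ext4 Filesystem (Doc ID 2056343.1)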
Applies to:
Linux OS - Version Oracle Linux 5.0 and later
Linux x86-64
Linux x86
Goal
How to recover a deleted file on an ext3/ext4 filesystem while a process still has a file descriptor open on it.
Solution
A file in Linux is a pointer to an inode which contains the file data (permissions, owner and where its actual content resides on the disk).
Deleting the file removes the link, but not the inode itself.
If any other process has this file open then inode is not released for writing until the process releases it.
So if a process still has the file open, the data are there somewhere, even though the directory listing shows no files.
# ll
total 4
-rw-r--r-- 1 root root 27 Sep 16 05:19 test
# rm test
# lsof /opt/test/test
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
less 21353 root 4r REG 252,2 27 260360 /opt/test/test (deleted) <<<<<<<<<<<<<
Understanding the output of the "lsof" command:
COMMAND: Command using the file.
PID: PID of the process using the file
USER: User who owns the process
FD: File descriptor. Different flags of File descriptor are as below:
#: The number in front of the flag(s) is the file descriptor number used by the process for the file
u: File open with Read and Write permission
r: File open with Read permission
w: File open with Write permission
W: File open with Write permission and with Write Lock on entire file
mem: Memory mapped file, usually for share library
TYPE: File type. Different flags of File type are as below:
REG - Regular file
DIR - Directory
DEVICE: major, minor number of the device where file resides.
SIZE/OFF: File size
NODE: inode number
NAME: File name
Now we know that process 21353 still has the file open, and the file descriptor is 4.
Now we can look into /proc and there will be a reference to the inode, from which the deleted file can be copied.
Following steps will help to recover the deleted files:
# ls -l /proc/21353/fd/4
lr-x------ 1 root root 64 Sep 16 05:28 /proc/21353/fd/4 -> /opt/test/test (deleted)
# cp /proc/21353/fd/4 /opt/test/test.bkp
Now verify the content of the restored file.
Note: Don't use the -a flag with cp, as this will copy the (broken) symbolic link, rather than the actual file contents.
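The recovery step can be wrapped in a small helper. This is a sketch under the assumptions of the note above (Linux, the holding process is still running, and you know its PID and FD from lsof); the function name recover_from_fd and the overwrite check are additions for illustration, not part of the MOS note.

```shell
#!/bin/sh
# Copy the contents of a deleted-but-still-open file back to disk.
# Arguments: PID of the holding process, FD number from lsof, destination path.
recover_from_fd() {
    pid=$1
    fd=$2
    dest=$3
    src="/proc/$pid/fd/$fd"
    # Refuse to overwrite anything that already exists at the destination.
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    # Plain cp follows the /proc symlink and copies the file contents;
    # cp -a would copy the dangling link itself.
    cp "$src" "$dest"
}

# With the values from the example above:
#   recover_from_fd 21353 4 /opt/test/test.bkp
```

If the holding process is still writing to the file, the copy is only a snapshot taken at the moment cp runs.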
In addition, once we know which process holds the file, the following method shows that process's environment:
Checking the environment variables of ASM pmon process: It shows ORACLE_HOME is set to /oracle_grid/product/11.2.0.3/grid/ ( with 'slash' at the end )
oracle 27232 1 0 May30 ? 00:07:05 asm_pmon_+ASM1
# cat /proc/27232/environ
__CLSAGFW_TYPE_NAME=ora.asm.typeORA_CRS_HOME=/oracle_grid/product/11.2.0.3/grid/HOSTNAME=aude3od015naboi.basdev.aurdev.national.com.auTERM=xtermSHELL=/bin/bash__CR......
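The variables in the output above appear to run together because the entries in /proc/PID/environ are separated by NUL bytes rather than newlines. A minimal sketch for making it readable follows; the function name show_proc_env is illustrative, and reading another user's process normally requires root.

```shell
#!/bin/sh
# Print the environment of process PID, one NAME=value pair per line.
# /proc/PID/environ separates entries with NUL bytes, so translate each NUL
# into a newline.
show_proc_env() {
    tr '\0' '\n' < "/proc/$1/environ"
}

# Example: show only the Oracle-related variables of the ASM pmon process.
#   show_proc_env 27232 | grep '^ORA'
```

The file reflects the environment the process was started with, not changes the process made to its own environment later.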
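Summary: for this kind of problem we first need to understand why df and du calculate space differently, and then be familiar with lsof and fuser so that we can find the PID of the process still holding the file. With that PID we can recover the file under /proc, inspect the process's environment, or even kill the process to release the space.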
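Killing the holder is not always acceptable. A commonly used alternative on Linux, which does not come from the MOS notes quoted here, is to truncate the deleted file through its /proc entry so that its blocks are freed while the process keeps running. The sketch below uses a made-up function name, truncate_deleted_fd. It discards the file contents, so it is only suitable for things like log files and never for a data file that a database instance is still using. A process that keeps writing at its old offset will leave a sparse file behind.

```shell
#!/bin/sh
# Truncate a deleted-but-still-open file to zero length via /proc so that the
# filesystem can free its blocks without stopping the holding process.
# The file contents are discarded.
truncate_deleted_fd() {
    pid=$1
    fd=$2
    link="/proc/$pid/fd/$fd"
    # Only act on descriptors whose target is really marked as deleted.
    case "$(readlink "$link")" in
        *' (deleted)') : > "$link" ;;
        *) echo "$link is not a deleted file" >&2; return 1 ;;
    esac
}

# Example with the PID and FD reported by lsof:
#   truncate_deleted_fd 6006 3
```

Recover a copy first with cp if there is any chance the contents are still needed.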
Finally, a simple example to close the article:
1. First make sure the lsof tool is installed on the operating system.
[root@rac01 Server]# rpm -ivh lsof-4.78-6.x86_64.rpm
Preparing... ########################################### [100%]
1:lsof ########################################### [100%]
[root@rac01 Server]# which lsof
/usr/sbin/lsof
2. In one session, run tail -f install2.log so that the tail process holds the file open; in another session, delete the file with rm -rf install2.log.
3. Use lsof to do the following:
[root@rac01 ~]# lsof | grep deleted
tail 6006 root 3r REG 8,3 29544 4587629 /root/install2.log (deleted)
[root@rac01 ~]# cd /proc/6006/
[root@rac01 6006]# ls
attr cmdline cwd fdinfo loginuid mounts numa_maps pagemap schedstat stat task
auxv comm environ io maps mountstats oom_adj personality sessionid statm wchan
cgroup coredump_filter exe latency mem net oom_score root smaps status
clear_refs cpuset fd limits mountinfo ns oom_score_adj sched stack syscall
[root@rac01 6006]# cd fd
[root@rac01 fd]# ll
total 0
lrwx------ 1 root root 64 Dec 3 19:07 0 -> /dev/pts/0
lrwx------ 1 root root 64 Dec 3 19:07 1 -> /dev/pts/0
lrwx------ 1 root root 64 Dec 3 19:07 2 -> /dev/pts/0
lr-x------ 1 root root 64 Dec 3 19:07 3 -> /root/install2.log (deleted)
[root@rac01 fd]# cd ..
[root@rac01 6006]# cat environ
HOSTNAME=rac01TERM=vt100SHELL=/bin/bashHISTSIZE=1000SSH_CLIENT=172.168.4.123 56823 22OLDPWD=/mnt/ServerSSH_TTY=/dev/pts/0USER=rootLS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:MAIL=/var/spool/mail/rootPATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/binINPUTRC=/etc/inputrcPWD=/rootLANG=en_US.UTF-8SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpassSHLVL=1HOME=/rootLOGNAME=rootSSH_CONNECTION=172.168.4.123 56823 172.168.4.200 22LESSOPEN=|/usr/bin/lesspipe.sh %sG_BROKEN_FILENAMES=1_=/usr/bin/tail
[root@rac01 6006]# lsof /root/ <<<< find the processes that have the /root directory itself open
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bash 4934 root cwd DIR 8,3 4096 4587521 /root/
tail 6006 root cwd DIR 8,3 4096 4587521 /root/
[root@rac01 6006]# lsof -c tail <<<< list the files held by processes whose command name starts with tail
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
tail 6006 root cwd DIR 8,3 4096 4587521 /root
tail 6006 root rtd DIR 8,3 4096 2 /
tail 6006 root txt REG 8,3 37704 1448826 /usr/bin/tail
tail 6006 root mem REG 8,3 56479136 1446088 /usr/lib/locale/locale-archive
tail 6006 root mem REG 8,3 1720736 5242891 /lib64/libc-2.5.so
tail 6006 root mem REG 8,3 142488 5242884 /lib64/ld-2.5.so
tail 6006 root 0u CHR 136,0 0t0 3 /dev/pts/0
tail 6006 root 1u CHR 136,0 0t0 3 /dev/pts/0
tail 6006 root 2u CHR 136,0 0t0 3 /dev/pts/0
tail 6006 root 3r REG 8,3 29544 4587629 /root/install2.log (deleted)
[root@rac01 6006]# lsof +d /root/ <<<< show the processes accessing files in the /root directory (top level only)
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bash 4934 root cwd DIR 8,3 4096 4587521 /root/
tail 6006 root cwd DIR 8,3 4096 4587521 /root/
[root@rac01 6006]# lsof +D /root/ <<<< show the processes accessing files in /root and its subdirectories
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bash 4934 root cwd DIR 8,3 4096 4587521 /root/
tail 6006 root cwd DIR 8,3 4096 4587521 /root/
[root@rac01 6006]# lsof -d 3 | grep -v grep | grep deleted <<<< list open files with FD number 3, keeping only the deleted ones
tail 6006 root 3r REG 8,3 29544 4587629 /root/install2.log (deleted)
[root@rac01 6006]# lsof -p 6006 <<<< list the files held by process 6006
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
tail 6006 root cwd DIR 8,3 4096 4587521 /root
tail 6006 root rtd DIR 8,3 4096 2 /
tail 6006 root txt REG 8,3 37704 1448826 /usr/bin/tail
tail 6006 root mem REG 8,3 56479136 1446088 /usr/lib/locale/locale-archive
tail 6006 root mem REG 8,3 1720736 5242891 /lib64/libc-2.5.so
tail 6006 root mem REG 8,3 142488 5242884 /lib64/ld-2.5.so
tail 6006 root 0u CHR 136,0 0t0 3 /dev/pts/0
tail 6006 root 1u CHR 136,0 0t0 3 /dev/pts/0
tail 6006 root 2u CHR 136,0 0t0 3 /dev/pts/0
tail 6006 root 3r REG 8,3 29544 4587629 /root/install2.log (deleted)
[root@rac01 6006]# lsof -u root | grep deleted <<<< list the deleted files held by processes owned by user root
tail 6006 root 3r REG 8,3 29544 4587629 /root/install2.log (deleted)
[root@rac01 6006]# cd /proc/6006/fd/
[root@rac01 fd]# cp 3 /root/install2.log <<<< recover the deleted install2.log file
[root@rac01 fd]# cd /proc/6006/fd
[root@rac01 fd]# ll
total 0
lrwx------ 1 root root 64 Dec 3 19:07 0 -> /dev/pts/0
lrwx------ 1 root root 64 Dec 3 19:07 1 -> /dev/pts/0
lrwx------ 1 root root 64 Dec 3 19:07 2 -> /dev/pts/0
lr-x------ 1 root root 64 Dec 3 19:07 3 -> /root/install2.log (deleted) <<<< the (deleted) status does not change after the file is recovered
[root@rac01 fd]# cd /root/
[root@rac01 ~]# ls
anaconda-ks.cfg core.11067 core.526 core.6027 core.7965 Desktop install2.log install.log install.log.syslog
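The (deleted) marker remains after the recovery because cp creates a new file with a new inode, while tail still holds the old, unlinked inode. Until tail exits, both copies occupy space. The following sketch makes this visible by comparing inode numbers. The function name compare_inodes is illustrative, and it assumes GNU stat (the -c and -L options).

```shell
#!/bin/sh
# Print the inode held by PID on descriptor FD and the inode of PATH.
# Returns 0 when they are the same file, 1 when they differ. Different numbers
# mean the process still holds the old, unlinked file, whose space is not
# freed until that descriptor is closed.
compare_inodes() {
    held=$(stat -Lc %i "/proc/$1/fd/$2") || return 1
    current=$(stat -c %i "$3") || return 1
    printf 'held=%s current=%s\n' "$held" "$current"
    [ "$held" = "$current" ]
}

# With the values from the session above:
#   compare_inodes 6006 3 /root/install2.log
```

In the session above the two numbers would differ, since the recovered /root/install2.log is a separate file from the one tail keeps open.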
--end--
From the "ITPUB blog", link: http://blog.itpub.net/23135684/viewspace-1852841/. If you repost this article, please credit the source; otherwise legal liability will be pursued.