File system space not released after deleting data files on Linux

Posted by 尛樣兒 on 2015-12-03

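   First, file system space failing to be released after a data file is deleted is not specific to Linux; it can happen on any platform. The tests here were run on Linux, and other platforms behave similarly. Second, the problem only arises when data files are stored in a file system; a database stored in ASM is not affected. The purpose of this article is to summarize the problem and to get familiar with the relevant tools and methods, which is the more important part.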


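   This article was prompted by a customer database running on AIX 6.1 with its data files stored in a file system. After the UNDO tablespace was dropped with DROP TABLESPACE UNDO INCLUDING CONTENTS AND DATAFILES, the file system space was not released. Working out how to solve that problem is what led to this article.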


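    We usually notice that space has not been released by checking with the df command. If we then scan the directory with du, the usage reported by df and du may differ, possibly by a large amount. The following MOS note explains why: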

'du' and 'df' tools report different space utilization (Doc ID 457444.1)




Applies to:

Linux OS - Version Oracle Linux 4.4 and later

Linux x86-64

Linux x86

Linux Kernel - Version: 4.4 to 5.3

 

 

Symptoms

The 'du' (/usr/bin/du) and 'df' (/bin/df) command output displays conflicting space utilisation values, for example:

# df -k /

Filesystem    1k-blocks     Used  Available Use% Mounted on

/dev/sda6       9288792  8672768     144120 99% /

 

# du -xsh /

2.1G /

 

In the example above, 'df' reports 8.6 Gb to have been used on the root (/) filesystem, whereas 'du' reports only 2.1 Gb to have been used.

 

Cause

The 'df' command reports how many disk blocks are used, whilst 'du' traverses the filesystem and reports the actual number of blocks used (directory by directory), including any relating to files in use by processes.

 

In most cases, space utilisation values returned from 'df' and 'du' will be consistent. However, the potential exists for a running process to delete a large file, say. In this instance, according to 'du', the large file no longer exists, so the blocks associated with the deleted file are not reported. With the process still running, and with an open file descriptor still held against the deleted file, 'df' continues to track and report all disk blocks used, including those associated with the deleted (phantom) file. In this situation, the disk space associated with the deleted file will only be relinquished back to the system when the process completely releases the deleted file's descriptor or the process terminates (either gracefully or killed).

Solution

The solution is to identify and stop (or kill) the process that continues to hold a file descriptor open for the deleted file. To do so, run the lsof command (/usr/sbin/lsof | grep deleted) as root to identify the holding process, for example:

 

# lsof | grep deleted

COMMAND     PID   USER    FD  TYPE  DEVICE      SIZE       NODE NAME

cannaserv  3825  canna    0u   CHR   136,0                    2 /dev/pts/0 (deleted)

vmware     4295   root    6u   REG   253,0      6770   13074503 /tmp/vmware-root/ui-4295.log (deleted)

vmware-re  4316   root    6u   REG   253,0      6770   13074503 /tmp/vmware-root/ui-4295.log (deleted)

vmnet-nat  4448   root    0u   CHR   136,0                    2 /dev/pts/0 (deleted)

vmware-se  4454   root    0u   CHR   136,0                    2 /dev/pts/0 (deleted)

gdm-binar  4506   root    0u   CHR   136,0                    2 /dev/pts/0 (deleted)

gconfd-2   5392   root   12wW  REG   253,0       609   13090818 /tmp/gconfd-root/lock/0t1188207163ut519551u0p5392r346479926k3219784492 (deleted)

vmware-vm  5822   root   57u   REG   253,0   6520832   13074477 /tmp/vmware-root/ram0 (deleted)

vmware-vm 16487   root   57u   REG   253,0  11153408   13074520 /tmp/vmware-root/ram0 (deleted)

kdeinit   17991   root   17u   REG   253,0     26712   13074524 /tmp/kde-root/khtmlcacheM7jXYb.tmp (deleted)

kdeinit   17991   root   18u   REG   253,0      5631   13074501 /tmp/kde-root/khtmlcacheZlJmda.tmp (deleted)

kdeinit   17991   root   21u   REG   253,0     44718   13074514 /tmp/kde-root/khtmlcacheH5m4lc.tmp (deleted)

 

In the example above, the 7th column in the output denotes the size of the deleted file (in bytes). The 9th column denotes which file remains held open. The 1st and 2nd columns denote the command and PID of the process that still holds the open file descriptor.


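    df and du report different usage because du does not count files that have been deleted, whereas df keeps counting a deleted file for as long as some process holds it open. Only once the process releases the file does df stop counting it. The command lsof | grep deleted finds files that have been deleted but are still held open by a process.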
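
The behaviour described above can be reproduced in a few lines of shell. The following is a minimal sketch, assuming Linux with /proc mounted and GNU coreutils. The function name, the temporary directory, and descriptor number 9 are arbitrary choices for illustration and do not come from the MOS note.

```shell
#!/bin/sh
# Sketch: a deleted file keeps its blocks while a descriptor is still open.
# Assumptions: Linux, /proc mounted, GNU stat; fd 9 and all names are arbitrary.
demo_deleted_but_open() {
    demo_dir=$(mktemp -d) || return 1
    demo_file="$demo_dir/big.dat"

    # Create a 5 MiB file and keep it open on descriptor 9 of this shell.
    dd if=/dev/zero of="$demo_file" bs=1048576 count=5 2>/dev/null
    exec 9<"$demo_file"

    rm -f "$demo_file"

    # du walks the directory tree, so it no longer sees the file.
    echo "du after rm: $(du -sk "$demo_dir" | cut -f1) KB"

    # The inode is still reachable through the open descriptor. readlink and
    # stat inherit descriptor 9, so /proc/self/fd/9 refers to the same file.
    echo "fd 9 target: $(readlink /proc/self/fd/9)"
    echo "fd 9 size: $(stat -L -c %s /proc/self/fd/9) bytes"

    # Closing the descriptor is what lets the file system free the blocks.
    exec 9<&-
    rmdir "$demo_dir"
}

demo_deleted_but_open
```

Running df on the file system before and after the descriptor is closed should show the 5 MiB coming back only at the second point, although on a busy system other activity can mask such a small change.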

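    From the note above we know that when deleting a large file does not free space in the file system, it is because some process still holds the file open. Finding that process requires the lsof and fuser commands, so it is important to be familiar with both.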

>lsof (list open files) is a tool that lists the files currently open on the system.
>fuser identifies the processes that are using a file or socket.

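   fuser is also a tool we commonly use to find processes, but it does not show detailed process information, mostly just a process ID, and its output is hard to filter. In this scenario lsof is the better fit.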
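
If lsof is not installed, roughly the same information can be read straight from /proc. The sketch below is an illustration only, assuming Linux and GNU stat; list_deleted_fds is a hypothetical helper name, not part of lsof or fuser. It relies on the kernel appending " (deleted)" to the link target of a descriptor whose file has been unlinked, so a file whose real name ends in that string would be reported by mistake.

```shell
#!/bin/sh
# list_deleted_fds: print "PID FD SIZE_BYTES TARGET" for every open descriptor
# whose file has been unlinked. Run as root to see other users' processes.
# Assumptions: Linux /proc and GNU stat; the helper name is hypothetical.
list_deleted_fds() {
    for fd_path in /proc/[0-9]*/fd/*; do
        # Processes may exit while we scan, so ignore unreadable entries.
        target=$(readlink "$fd_path" 2>/dev/null) || continue
        case "$target" in
            /*" (deleted)") ;;
            *) continue ;;
        esac
        pid=${fd_path#/proc/}
        pid=${pid%%/*}
        fd=${fd_path##*/}
        # stat -L follows the /proc link to the still-existing inode.
        size=$(stat -L -c %s "$fd_path" 2>/dev/null) || size='?'
        printf '%s %s %s %s\n' "$pid" "$fd" "$size" "$target"
    done
}

# Largest held files first.
list_deleted_fds | sort -k3,3nr | head -n 20
```

Because it starts one readlink per descriptor, this is noticeably slower than lsof on a host with many processes; it is meant as a fallback, not a replacement.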

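In addition, as is well known, a deleted file on a Linux file system can still be recovered as long as some process holds it open. This is why, when all data files of a database have been deleted, they can be fully recovered provided the instance has not been shut down. The following MOS note discusses how to recover files in that situation: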

How To Recover Deleted Files on ext3/ext4 Filesystem (Doc ID 2056343.1)






 

Applies to:

Linux OS - Version Oracle Linux 5.0 and later

Linux x86-64

Linux x86

Goal

How to recover a deleted file on an ext3/ext4 filesystem while a process still has a file descriptor open on it.

Solution

A file in Linux is a pointer to an inode which contains the file data (permissions, owner and where its actual content resides on the disk).

Deleting the file removes the link, but not the inode itself.

If any other process has this file open then inode is not released for writing until the process releases it.

So if a process still has the file open, the data are there somewhere, even though the directory listing shows no files.

# ll

total 4

-rw-r--r-- 1 root root 27 Sep 16 05:19 test

# rm test

# lsof /opt/test/test

COMMAND     PID   USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME

less 21353 root 4r REG 252,2 27 260360 /opt/test/test (deleted)      <<<<<<<<<<<<<


Understanding output of "lsof" command:

COMMAND: Command using the file.

PID: PID of the process that has the file open

USER: Owner of the process

FD: File descriptor. Different flags of File descriptor are as below:

#:      The number in front of the flag(s) is the file descriptor number that the process uses for the file

u:      File open with Read and Write permission

r:       File open with Read permission

w:      File open with Write permission

W:      File open with Write permission and with Write Lock on entire file

mem: Memory mapped file, usually for a shared library

TYPE: File type. Different flags of File type are as below:

REG - Regular file

DIR - Directory

DEVICE: major, minor number of the device where file resides.

SIZE/OFF: File size or file offset, in bytes

NODE: inode number

NAME: File name


Now we know that process 21353 still has the file open, and the file descriptor is 4.

Now we can look into /proc and there will be a reference to the inode, from which the deleted file can be copied.

Following steps will help to recover the deleted files:

# ls -l /proc/21353/fd/4

  lr-x------ 1 root root 64 Sep 16 05:28 /proc/21353/fd/4 -> /opt/test/test (deleted)

# cp /proc/21353/fd/4 /opt/test/test.bkp

Now verify the content of the restored file.

Note: Don't use the -a flag with cp, as this will copy the (broken) symbolic link, rather than the actual file contents.
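
The recovery steps above can be wrapped in a small helper. This is a minimal sketch under stated assumptions: Linux with /proc, and a process that still holds the file open. The name recover_deleted_fd and its refusal to overwrite an existing destination are illustrative choices, not something the MOS note prescribes.

```shell
#!/bin/sh
# recover_deleted_fd PID FD DEST: copy the content behind /proc/PID/fd/FD to DEST.
# Assumptions: Linux /proc; the helper name and its checks are hypothetical.
recover_deleted_fd() {
    src="/proc/$1/fd/$2"
    dest=$3

    target=$(readlink "$src" 2>/dev/null) || {
        echo "cannot read $src" >&2
        return 1
    }
    case "$target" in
        *" (deleted)") ;;
        *)
            echo "$src does not point to a deleted file: $target" >&2
            return 1
            ;;
    esac
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi

    # Plain cp opens the link and reads the file content;
    # cp -a would copy the dangling symbolic link instead.
    cp "$src" "$dest"
}

# With the values from the example above:
#   recover_deleted_fd 21353 4 /opt/test/test.bkp
```

The copy is a snapshot taken at the moment cp runs. If the holding process is still writing to the file, as a database instance would be, the copy may not be consistent on its own.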


In addition, once you know which process holds the file, you can view that process's environment as follows:

Checking the environment variables of ASM pmon process: It shows ORACLE_HOME is set to /oracle_grid/product/11.2.0.3/grid/ ( with 'slash' at the end )

# ps -ef | grep pmon
oracle 27232 1 0 May30 ? 00:07:05 asm_pmon_+ASM1

# cat /proc/27232/environ
__CLSAGFW_TYPE_NAME=ora.asm.typeORA_CRS_HOME=/oracle_grid/product/11.2.0.3/grid/HOSTNAME=aude3od015naboi.basdev.aurdev.national.com.auTERM=xtermSHELL=/bin/bash__CR......
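
The entries in /proc/PID/environ are separated by NUL bytes rather than newlines, which is why the variables in the output above run together. A minimal sketch for making it readable, assuming Linux; show_environ is a hypothetical helper name:

```shell
#!/bin/sh
# show_environ PID: print the environment of PID, one NAME=value pair per line.
# Assumption: Linux /proc; you need to own the process or be root.
show_environ() {
    tr '\0' '\n' < "/proc/$1/environ"
}

# With the PID from the example above:
#   show_environ 27232 | grep '^ORACLE_HOME='
```

The file reflects the environment the process was started with, so variables the process changed afterwards are generally not visible here.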

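   Summary: for this kind of problem, first understand why df and du calculate space differently. Then get familiar with lsof and fuser so that you can find the PID of the process that still holds the file. With that PID you can recover the file under /proc, inspect the process's environment, or even kill the process to release the space.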
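
Before stopping or killing a process, it can help to estimate how much space that would actually give back. The sketch below is an assumption-laden illustration (Linux /proc, GNU stat, hypothetical helper name deleted_bytes_held_by): it adds up the apparent size of unlinked files a given PID holds open, counting each inode once. It reports file sizes, as the lsof SIZE column does, so sparse files will be overstated, and space is only freed if no other process holds the same file.

```shell
#!/bin/sh
# deleted_bytes_held_by PID: print the total apparent size, in bytes, of files
# that PID keeps open although no directory entry refers to them any more.
# Assumptions: Linux /proc and GNU stat; the helper name is hypothetical.
deleted_bytes_held_by() {
    total=0
    seen=" "
    for fd_path in "/proc/$1"/fd/*; do
        target=$(readlink "$fd_path" 2>/dev/null) || continue
        case "$target" in
            /*" (deleted)") ;;
            *) continue ;;
        esac
        # %h = hard-link count, %d:%i = device and inode, %s = size in bytes.
        info=$(stat -L -c '%h %d:%i %s' "$fd_path" 2>/dev/null) || continue
        links=${info%% *}
        rest=${info#* }
        inode=${rest%% *}
        size=${rest#* }
        # Space can only be freed when no name is left for the inode.
        [ "$links" -eq 0 ] || continue
        # Several descriptors may share one inode; count each inode once.
        case "$seen" in
            *" $inode "*) continue ;;
        esac
        seen="$seen$inode "
        total=$((total + size))
    done
    echo "$total"
}

# Bytes held by the current shell.
deleted_bytes_held_by "$$"
```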

Finally, a simple example to close the article:

1. First make sure lsof is installed on the operating system.

[root@rac01 Server]# rpm -ivh lsof-4.78-6.x86_64.rpm 
Preparing...                ########################################### [100%]
   1:lsof                   ########################################### [100%]
[root@rac01 Server]# which lsof
/usr/sbin/lsof

2. In one session, run tail -f install2.log so that the tail process holds the file open; in another session, delete the file with rm -rf install2.log.

3. Use lsof as follows:

[root@rac01 ~]# lsof | grep deleted
tail      6006      root    3r      REG                8,3     29544    4587629 /root/install2.log (deleted)
[root@rac01 ~]# cd /proc/6006/
[root@rac01 6006]# ls
attr        cmdline          cwd      fdinfo   loginuid   mounts      numa_maps      pagemap      schedstat  stat     task
auxv        comm             environ  io       maps       mountstats  oom_adj        personality  sessionid  statm    wchan
cgroup      coredump_filter  exe      latency  mem        net         oom_score      root         smaps      status
clear_refs  cpuset           fd       limits   mountinfo  ns          oom_score_adj  sched        stack      syscall
[root@rac01 6006]# cd fd
[root@rac01 fd]# ll
total 0
lrwx------ 1 root root 64 Dec  3 19:07 0 -> /dev/pts/0
lrwx------ 1 root root 64 Dec  3 19:07 1 -> /dev/pts/0
lrwx------ 1 root root 64 Dec  3 19:07 2 -> /dev/pts/0
lr-x------ 1 root root 64 Dec  3 19:07 3 -> /root/install2.log (deleted)
[root@rac01 fd]# cd ..
[root@rac01 6006]# cat environ 
HOSTNAME=rac01TERM=vt100SHELL=/bin/bashHISTSIZE=1000SSH_CLIENT=172.168.4.123 56823 22OLDPWD=/mnt/ServerSSH_TTY=/dev/pts/0USER=rootLS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:MAIL=/var/spool/mail/rootPATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/binINPUTRC=/etc/inputrcPWD=/rootLANG=en_US.UTF-8SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpassSHLVL=1HOME=/rootLOGNAME=rootSSH_CONNECTION=172.168.4.123 56823 172.168.4.200 22LESSOPEN=|/usr/bin/lesspipe.sh %sG_BROKEN_FILENAMES=1_=/usr/bin/tail
[root@rac01 6006]# lsof /root/    <<<< list processes that have the /root directory itself open (here as their cwd)
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
bash    4934 root  cwd    DIR    8,3     4096 4587521 /root/
tail    6006 root  cwd    DIR    8,3     4096 4587521 /root/
[root@rac01 6006]# lsof -c tail    <<<< list files held open by the tail command
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
tail    6006 root  cwd    DIR    8,3     4096 4587521 /root
tail    6006 root  rtd    DIR    8,3     4096       2 /
tail    6006 root  txt    REG    8,3    37704 1448826 /usr/bin/tail
tail    6006 root  mem    REG    8,3 56479136 1446088 /usr/lib/locale/locale-archive
tail    6006 root  mem    REG    8,3  1720736 5242891 /lib64/libc-2.5.so
tail    6006 root  mem    REG    8,3   142488 5242884 /lib64/ld-2.5.so
tail    6006 root    0u   CHR  136,0      0t0       3 /dev/pts/0
tail    6006 root    1u   CHR  136,0      0t0       3 /dev/pts/0
tail    6006 root    2u   CHR  136,0      0t0       3 /dev/pts/0
tail    6006 root    3r   REG    8,3    29544 4587629 /root/install2.log (deleted)
[root@rac01 6006]# lsof +d /root/    <<<< show processes accessing files directly under /root
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
bash    4934 root  cwd    DIR    8,3     4096 4587521 /root/
tail    6006 root  cwd    DIR    8,3     4096 4587521 /root/
[root@rac01 6006]# lsof +D /root/    <<<< show processes accessing files under /root and its subdirectories
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
bash    4934 root  cwd    DIR    8,3     4096 4587521 /root/
tail    6006 root  cwd    DIR    8,3     4096 4587521 /root/ 
[root@rac01 6006]# lsof -d 3 | grep -v grep | grep deleted    <<<< show deleted files held open on FD number 3
tail      6006      root    3r   REG                8,3    29544 4587629 /root/install2.log (deleted)
[root@rac01 6006]# lsof -p 6006    <<<< show files held open by process 6006
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
tail    6006 root  cwd    DIR    8,3     4096 4587521 /root
tail    6006 root  rtd    DIR    8,3     4096       2 /
tail    6006 root  txt    REG    8,3    37704 1448826 /usr/bin/tail
tail    6006 root  mem    REG    8,3 56479136 1446088 /usr/lib/locale/locale-archive
tail    6006 root  mem    REG    8,3  1720736 5242891 /lib64/libc-2.5.so
tail    6006 root  mem    REG    8,3   142488 5242884 /lib64/ld-2.5.so
tail    6006 root    0u   CHR  136,0      0t0       3 /dev/pts/0
tail    6006 root    1u   CHR  136,0      0t0       3 /dev/pts/0
tail    6006 root    2u   CHR  136,0      0t0       3 /dev/pts/0
tail    6006 root    3r   REG    8,3    29544 4587629 /root/install2.log (deleted)
[root@rac01 6006]# lsof -u root | grep deleted    <<<< show deleted files held open by processes owned by root
tail      6006 root    3r      REG                8,3     29544    4587629 /root/install2.log (deleted)
[root@rac01 6006]# cd /proc/6006/fd/
[root@rac01 fd]# cp 3 /root/install2.log    <<<< recover the deleted install2.log file
[root@rac01 fd]# cd /proc/6006/fd
[root@rac01 fd]# ll
total 0
lrwx------ 1 root root 64 Dec  3 19:07 0 -> /dev/pts/0
lrwx------ 1 root root 64 Dec  3 19:07 1 -> /dev/pts/0
lrwx------ 1 root root 64 Dec  3 19:07 2 -> /dev/pts/0
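lr-x------ 1 root root 64 Dec  3 19:07 3 -> /root/install2.log (deleted)   <<<< still marked deleted after the recovery: the descriptor refers to the original inode, and the restored file is a new copy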
[root@rac01 fd]# cd /root/
[root@rac01 ~]# ls
anaconda-ks.cfg  core.11067  core.526  core.6027  core.7965  Desktop  install2.log  install.log  install.log.syslog

--end--

Source: "ITPUB blog", link: http://blog.itpub.net/23135684/viewspace-1852841/. If you repost this article, please credit the source; otherwise legal action may be taken.
