How to Find Where a File Is Physically Stored on HDFS

Posted by weixin_33913332 on 2019-02-27

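Files stored on HDFS usually have several replicas (3 by default), and we access a file through its HDFS path or URL. Many people do not know which DataNodes actually hold the file, or where on those nodes' local disks it lives. HDFS provides a command that answers this, so let's look at it.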

The hdfs fsck command

HDFS supports the fsck command to check for various inconsistencies. It is designed for reporting problems with various files, for example, missing blocks for a file or under-replicated blocks. Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally, the NameNode automatically corrects most of the recoverable failures. By default, fsck ignores open files but provides an option to select all files during reporting. The HDFS fsck command is not a Hadoop shell command. It can be run as bin/hdfs fsck. For command usage, see the fsck entry in the HDFS Commands Guide. fsck can be run on the whole file system or on a subset of files.

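Usage (-files prints each file being checked, -blocks adds its block report, and -locations adds the DataNodes holding each block):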
hdfs fsck file_path_on_hdfs -files -blocks -locations
Run it against our file:
[hdfs@dlbdn3 data]$ hdfs fsck /user/ericsson/eop/template_workflow.xml -files -blocks -locations
Connecting to namenode via http://dlbdn3:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.123.4 for path /user/ericsson/eop/template_workflow.xml at Wed Feb 27 17:28:57 CST 2019
/user/ericsson/eop/template_workflow.xml 3685 bytes, 1 block(s):  OK
0. BP-358999289-192.168.123.4-1530520401469:blk_1074308735_568435 len=3685 Live_repl=3 [DatanodeInfoWithStorage[192.168.123.4:7710,DS-c440ebd2-4553-4b87-b2e1-67a8ae1e29c1,DISK], DatanodeInfoWithStorage[192.168.123.3:7710,DS-4c6c7796-0027-4cb9-a476-041a13146dcf,DISK], DatanodeInfoWithStorage[192.168.123.2:7710,DS-83c58757-f199-48e1-9d04-bd09fc996fbc,DISK]]

Status: HEALTHY
 Total size:    3685 B
 Total dirs:    0
 Total files:   1
 Total symlinks:        0
 Total blocks (validated):  1 (avg. block size 3685 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 3.0
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      3
 Number of racks:       1
FSCK ended at Wed Feb 27 17:28:57 CST 2019 in 1 milliseconds


The filesystem under path '/user/ericsson/eop/template_workflow.xml' is HEALTHY
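For a file with many blocks, the fsck output gets long. The helper below is a minimal sketch, assuming the Hadoop 2.x line format shown above (`DatanodeInfoWithStorage[ip:port,storageID,type]`). Other versions may print locations differently. The function name `fsck_block_locations` is hypothetical. For each block line, it prints the block name (`blk_<id>_<generation stamp>`) followed by the IP:port of every DataNode that holds a replica.

```shell
#!/usr/bin/env bash
# Sketch: summarize `hdfs fsck <path> -files -blocks -locations` output.
# Assumes the Hadoop 2.x block-line format:
#   0. BP-...:blk_<id>_<genstamp> len=... Live_repl=N [DatanodeInfoWithStorage[ip:port,DS-...,DISK], ...]
# Reads fsck output on stdin; prints "blk_<id>_<genstamp> ip:port ip:port ..." per block.
fsck_block_locations() {
  local line blk nodes
  local re='blk_[0-9]+_[0-9]+'
  while IFS= read -r line; do
    # Skip every line that does not name a block (headers, summary, status).
    [[ $line =~ $re ]] || continue
    blk=${BASH_REMATCH[0]}
    # Grab each "DatanodeInfoWithStorage[ip:port" and keep only "ip:port".
    nodes=$(grep -o 'DatanodeInfoWithStorage\[[^],]*' <<<"$line" \
              | sed 's/^DatanodeInfoWithStorage\[//' | tr '\n' ' ')
    printf '%s %s\n' "$blk" "${nodes% }"
  done
}
```

Usage: `hdfs fsck /user/ericsson/eop/template_workflow.xml -files -blocks -locations | fsck_block_locations`. For the run above, this yields one line: `blk_1074308735_568435 192.168.123.4:7710 192.168.123.3:7710 192.168.123.2:7710`.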

Using the IP addresses listed in DatanodeInfoWithStorage, log in to one of the corresponding nodes and run find:
[root@dlbdn3 subdir166]# find / -name "*blk_1074308735_568435*"
find: ‘/run/user/42/gvfs’: Permission denied
/data/2/dfs/dn/current/BP-358999289-192.168.123.4-1530520401469/current/finalized/subdir8/subdir166/blk_1074308735_568435.meta
[root@dlbdn3 subdir166]# cd /data/2/dfs/dn/current/BP-358999289-192.168.123.4-1530520401469/current/finalized/subdir8/subdir166
[root@dlbdn3 subdir166]# ll | grep blk_1074308735
-rw-r--r-- 1 hdfs hdfs   3685 Feb 27 16:29 blk_1074308735
-rw-r--r-- 1 hdfs hdfs     39 Feb 27 16:29 blk_1074308735_568435.meta
[root@dlbdn3 subdir166]# 
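You can also compute the block's directory directly instead of scanning the whole disk with `find /`. The listing above shows the layout: finalized blocks sit under `<data dir>/current/<block pool ID>/current/finalized/subdirX/subdirY/`, and X and Y are derived from the block ID. The sketch below assumes X = (id >> 16) & 0xFF and Y = (id >> 8) & 0xFF. This reproduces `subdir8/subdir166` for block 1074308735 (0x4008A67F) on this cluster. I believe later Hadoop releases switched to a 32x32 layout (mask 0x1F), so treat the mask as an assumption and check it against your version. The function names `block_subdir` and `find_block_file` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: locate a finalized block on a DataNode without `find /`.
# Assumption: block-ID-based layout with 256x256 subdirs (mask 0xFF), which
# matches subdir8/subdir166 for blk_1074308735 in the listing above.
# For a 32x32 layout, set BLOCK_SUBDIR_MASK=0x1F before sourcing this.
BLOCK_SUBDIR_MASK=${BLOCK_SUBDIR_MASK:-0xFF}

# block_subdir <numeric_block_id>  ->  prints "subdirX/subdirY"
block_subdir() {
  local id=$1
  local d1=$(( (id >> 16) & BLOCK_SUBDIR_MASK ))
  local d2=$(( (id >> 8) & BLOCK_SUBDIR_MASK ))
  printf 'subdir%d/subdir%d\n' "$d1" "$d2"
}

# find_block_file <block_pool_id> <numeric_block_id> <data_dir>...
# Prints the path of blk_<id> in every data dir that holds it;
# returns 0 if at least one was found, 1 otherwise.
find_block_file() {
  local bpid=$1 id=$2 dir path found=1
  shift 2
  for dir in "$@"; do
    dir=${dir#\[*\]}     # drop an optional storage-type prefix such as [DISK]
    dir=${dir#file://}   # drop an optional file:// scheme
    path="$dir/current/$bpid/current/finalized/$(block_subdir "$id")/blk_$id"
    if [[ -f $path ]]; then
      printf '%s\n' "$path"
      found=0
    fi
  done
  return "$found"
}
```

Usage on the DataNode: `IFS=, read -ra dirs <<<"$(hdfs getconf -confKey dfs.datanode.data.dir)"`, then `find_block_file BP-358999289-192.168.123.4-1530520401469 1074308735 "${dirs[@]}"`. This assumes the node's client configuration has the same `dfs.datanode.data.dir` as the DataNode process. If it does not, pass the data directories by hand. The `.meta` file sits next to the block file in the same directory.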

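Check the contents of the blk file to confirm it is the file we are looking for. (The find pattern above includes the generation stamp, so it only matched the .meta file. The block data file itself is named blk_1074308735, as the ll listing shows.)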
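This confirms it is the same file, so we have found where the HDFS file is physically stored. It is simple and quite practical, and you will often need this information.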
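You can also check the block byte for byte instead of by eye. For a file that fits in a single block, like this 3685-byte one, the block file holds exactly the file's bytes. For a multi-block file, the block files concatenated in the order fsck lists them should reproduce it. This is a minimal sketch, assuming all the block files are readable from one host. The helper name `same_content` is hypothetical, and it uses the standard `cmp` utility.

```shell
#!/usr/bin/env bash
# Sketch: check that local block files, concatenated in fsck order,
# are byte-identical to a reference copy of the HDFS file.
# same_content <reference> <block_file>...
#   <reference> can be a regular file (e.g. from `hdfs dfs -get`) or a
#   process substitution such as <(hdfs dfs -cat PATH).
# Returns 0 if identical, 1 if different, 2 on usage or read errors.
same_content() {
  local ref=$1
  shift
  (( $# > 0 )) || return 2        # need at least one block file
  cat -- "$@" | cmp -s -- - "$ref"
}
```

Usage on dlbdn3: `same_content <(hdfs dfs -cat /user/ericsson/eop/template_workflow.xml) /data/2/dfs/dn/current/BP-358999289-192.168.123.4-1530520401469/current/finalized/subdir8/subdir166/blk_1074308735 && echo "block matches"`. Comparing with `cmp` avoids relying on any particular checksum tool. It also stops at the first differing byte.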
