Hadoop: Running the Cluster

Posted by Wang Zhigang on 2022-04-04

Prerequisite: this section builds on the Hadoop configuration-file setup completed in the previous section.

Step 1: Format the NameNode

HDFS must be formatted the first time it is started; otherwise the DataNode processes will be missing. Also note that once HDFS has been run, the Hadoop working directory (/usr/local/src/hadoop/tmp in this book) will contain data. If you need to reformat, you must delete the data under the working directory first, or the format will run into problems.
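
If you do need to reformat, clear the old data on every node first, for example (a sketch assuming the default layout from the previous section; the dfs/data path on the slaves is an assumption):

[hadoop@master ~]$ rm -rf /usr/local/src/hadoop/tmp/* /usr/local/src/hadoop/dfs/name/* /usr/local/src/hadoop/dfs/data/*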

(master node)

[root@master ~]# su - hadoop
[hadoop@master ~]$ cd /usr/local/src/hadoop/
[hadoop@master hadoop]$ ./bin/hdfs namenode -format
22/04/01 17:37:26 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.100.10
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.1
……
……
22/04/01 17:37:26 INFO common.Storage: Storage directory /usr/local/src/hadoop/dfs/name has been successfully formatted.
22/04/01 17:37:26 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
22/04/01 17:37:26 INFO util.ExitUtil: Exiting with status 0
22/04/01 17:37:26 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.100.10
************************************************************/

The "successfully" message in the output above indicates that the format succeeded.
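
Run immediately after the format command, the shell's exit status confirms the same thing (it should match the "Exiting with status 0" line in the log):

[hadoop@master hadoop]$ echo $?
0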

Step 2: Start the NameNode

(master node)

[hadoop@master hadoop]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.example.com.out
[hadoop@master hadoop]$ jps
41732 NameNode
41801 Jps

Seeing NameNode in the jps output means it started successfully.

Step 3: Start the SecondaryNameNode

(master node)

[hadoop@master hadoop]$ hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.example.com.out
[hadoop@master hadoop]$ jps
41732 NameNode
41877 Jps
41834 SecondaryNameNode

Seeing SecondaryNameNode in the jps output means it started successfully.

Step 4: Start the DataNodes on the slaves

(slave1 and slave2 nodes)

[root@slave1 ~]# su - hadoop 
[hadoop@slave1 ~]$  hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.example.com.out
[hadoop@slave1 ~]$ jps
41552 DataNode
41627 Jps 

[root@slave2 ~]# su - hadoop 
[hadoop@slave2 ~]$  hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.example.com.out
[hadoop@slave2 ~]$ jps
4161 DataNode
4236 Jps 

Seeing DataNode in the jps output on each slave means they started successfully.

Step 5: View the HDFS report

(master node)

[hadoop@master hadoop]$ hdfs dfsadmin -report
Configured Capacity: 34879832064 (32.48 GB)
Present Capacity: 26675437568 (24.84 GB)
DFS Remaining: 26675429376 (24.84 GB)
DFS Used: 8192 (8 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.100.20:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 16640901120 (15.50 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 4275404800 (3.98 GB)
DFS Remaining: 12365492224 (11.52 GB)
DFS Used%: 0.00%
DFS Remaining%: 74.31%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Apr 01 17:41:17 CST 2022


Name: 192.168.100.30:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3928989696 (3.66 GB)
DFS Remaining: 14309937152 (13.33 GB)
DFS Used%: 0.00%
DFS Remaining%: 78.46%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Apr 01 17:41:17 CST 2022
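
For a quick look at just the headline numbers, the report can be filtered with standard shell tools (a sketch):

[hadoop@master hadoop]$ hdfs dfsadmin -report | grep -E 'Capacity|Remaining|Live'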

Step 6: Check node status in a browser

These steps are performed on the physical Windows host:

1. Go to C:\Windows\System32\drivers\etc\

2. Drag the hosts file in that directory to the desktop

3. Right-click to open the file and add the IP-to-hostname mappings:

192.168.100.10  master  master.example.com
192.168.100.20  slave1  slave1.example.com
192.168.100.30  slave2  slave2.example.com

4. Save the file and drag it back to its original location

Visit http://master:50070 in the browser to view NameNode and DataNode information.

Visit http://master:50090 in the browser to view SecondaryNameNode information.
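
If the pages do not load, the name resolution and web ports can be sanity-checked from the Windows command prompt first (a sketch; curl ships with recent Windows 10/11 builds):

C:\> ping master
C:\> curl -I http://master:50070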

Step 7: Configure passwordless SSH login

Passwordless SSH login must be configured before starting HDFS with the cluster scripts; otherwise, during startup you will be asked repeatedly to confirm connections and enter the hadoop user's password.

(master node)

[hadoop@master ~]$ ssh-keygen -t rsa
……
[hadoop@master ~]$ ssh-copy-id slave1
……
[hadoop@master ~]$ ssh-copy-id slave2
……
[hadoop@master ~]$ ssh-copy-id master
……
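
To verify, each node should now run a remote command without prompting for a password (any simple command will do):

[hadoop@master ~]$ ssh slave1 hostname
[hadoop@master ~]$ ssh slave2 hostname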

Step 8: Start DFS and YARN

Because the daemons above were started one at a time with hadoop-daemon.sh, first stop HDFS and then bring the whole cluster up with the cluster scripts, which rely on the passwordless login configured in Step 7.

(master node)

[hadoop@master ~]$ stop-dfs.sh
……
[hadoop@master ~]$ start-dfs.sh
……
[hadoop@master ~]$ start-yarn.sh
……
[hadoop@master ~]$ jps
45284 SecondaryNameNode
45702 Jps
45080 NameNode
45435 ResourceManager

(slave1 and slave2 nodes)

[hadoop@slave1 ~]$ jps
42986 DataNode
43213 Jps
43102 NodeManager

[hadoop@slave2 ~]$ jps
42986 DataNode
43213 Jps
43102 NodeManager

Seeing ResourceManager on the master and NodeManager on the slaves means startup succeeded.

Step 9: Run the WordCount test

Before running a MapReduce program, you must first create a data input directory in the HDFS file system to hold the input data.

Note: the /input directory created here lives in the HDFS file system and can only be viewed and manipulated with HDFS commands.

(master node)

[hadoop@master ~]$ hdfs dfs -mkdir /input
[hadoop@master ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2022-04-01 19:50 /input
[hadoop@master ~]$ mkdir ~/input
[hadoop@master ~]$ vi input/data.txt
Hello World
Hello Hadoop
Hello Huasan
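
If you prefer not to open an editor, the same file can be created non-interactively (an equivalent sketch):

[hadoop@master ~]$ cat > ~/input/data.txt <<EOF
Hello World
Hello Hadoop
Hello Huasan
EOF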

Copy the input data file into the /input directory in HDFS:

[hadoop@master ~]$ hdfs dfs -put ~/input/data.txt /input
[hadoop@master ~]$ hdfs dfs -cat /input/data.txt 
Hello World
Hello Hadoop
Hello Huasan

Run WordCount

Note: the data output directory /output must not be created in advance, or the job will fail with an error.
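
If a previous run left an /output directory behind, delete it first with the standard HDFS recursive remove:

[hadoop@master ~]$ hdfs dfs -rm -r /output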

(master node)

[hadoop@master ~]$ hadoop jar /usr/local/src/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input/data.txt /output
22/04/01 19:58:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/01 19:58:10 INFO input.FileInputFormat: Total input paths to process : 1
22/04/01 19:58:10 INFO mapreduce.JobSubmitter: number of splits:1
22/04/01 19:58:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1648813571523_0001
22/04/01 19:58:11 INFO impl.YarnClientImpl: Submitted application application_1648813571523_0001
22/04/01 19:58:11 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1648813571523_0001/
22/04/01 19:58:11 INFO mapreduce.Job: Running job: job_1648813571523_0001
22/04/01 19:58:17 INFO mapreduce.Job: Job job_1648813571523_0001 running in uber mode : false
22/04/01 19:58:17 INFO mapreduce.Job:  map 0% reduce 0%
22/04/01 19:58:20 INFO mapreduce.Job:  map 100% reduce 0%
22/04/01 19:58:25 INFO mapreduce.Job:  map 100% reduce 100%
22/04/01 19:58:26 INFO mapreduce.Job: Job job_1648813571523_0001 completed successfully
22/04/01 19:58:26 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=56
                ……

The "successfully" message indicates the job ran successfully.

Visit http://master:8088 in the browser to see that the job succeeded.

Visit http://master:50070 in the browser and choose Browse the file system from the Utilities menu to inspect the contents of HDFS.

In the output directory, the _SUCCESS file indicates that processing succeeded; the results themselves are stored in the part-r-00000 file.

You can also view the contents of part-r-00000 directly from the command line:

(master node)

[hadoop@master ~]$ hdfs dfs -cat /output/part-r-00000
Hadoop  1
Hello   3
Huasan  1
World   1
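
To keep a local copy of the results, the output directory can also be pulled out of HDFS (a sketch; the local target path is just an example):

[hadoop@master ~]$ hdfs dfs -get /output ~/wordcount-output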

Step 10: Stop Hadoop

A single stop-all.sh command stops everything:

(master node)

[hadoop@master ~]$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
192.168.100.30: stopping datanode
192.168.100.20: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
192.168.100.20: stopping nodemanager
192.168.100.30: stopping nodemanager
192.168.100.20: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
192.168.100.30: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
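
As the first line of the output warns, stop-all.sh is deprecated; the equivalent recommended form is:

[hadoop@master ~]$ stop-yarn.sh
[hadoop@master ~]$ stop-dfs.sh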

Check the Java processes:

[hadoop@master ~]$ jps
46683 Jps

[hadoop@slave1 ~]$ jps
43713 Jps

[hadoop@slave2 ~]$ jps
41702 Jps

Notice: reproduction without permission is prohibited.
