18 [Online Log Analysis] Configuring the Log Web UI for Spark on YARN (HistoryServer Service)

Posted by hackeruncle on 2017-02-19

1. Go to the Spark conf directory and create the configuration file
[root@sht-sgmhadoopnn-01 ~]# cd /root/learnproject/app/spark/conf
[root@sht-sgmhadoopnn-01 conf]# cp spark-defaults.conf.template spark-defaults.conf

2. Create the storage path for the Spark history logs on HDFS (a path on the local Linux file system also works)
[root@sht-sgmhadoopnn-01 conf]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - root root          0 2017-02-14 12:43 /spark
drwxrwx---   - root root          0 2017-02-14 12:58 /tmp
drwxr-xr-x   - root root          0 2017-02-14 12:58 /user
You have new mail in /var/spool/mail/root
[root@sht-sgmhadoopnn-01 conf]# hdfs dfs -ls /spark
Found 1 items
drwxrwxrwx   - root root          0 2017-02-15 21:44 /spark/checkpointdata
[root@sht-sgmhadoopnn-01 conf]# hdfs dfs -mkdir /spark/historylog
# Create a directory in HDFS to store the Spark event logs. The Spark History Server reads its log data from this directory.
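A plain `hdfs dfs -mkdir` fails when the directory already exists or when its parent is missing. A minimal sketch of an idempotent variant follows; the function name `ensure_hdfs_dir` is hypothetical, and it assumes the `hdfs` client is on the PATH and that the current user may write to the parent directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an HDFS directory only if it is not there yet.
# Assumes the `hdfs` command is available on PATH.
ensure_hdfs_dir() {
  dir="$1"
  if hdfs dfs -test -d "$dir"; then
    # -test -d returns 0 when the path exists and is a directory
    echo "exists: $dir"
  else
    # -p also creates missing parent directories
    hdfs dfs -mkdir -p "$dir" && echo "created: $dir"
  fi
}

# Usage with the path from this article:
#   ensure_hdfs_dir /spark/historylog
```

Running it twice is safe: the second call only reports that the directory exists.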

3. Edit spark-defaults.conf
[root@sht-sgmhadoopnn-01 conf]# vi spark-defaults.conf
spark.eventLog.enabled           true
spark.eventLog.compress          true
spark.eventLog.dir               hdfs://mycluster/spark/historylog
spark.yarn.historyServer.address 172.16.101.55:18080

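# spark.eventLog.dir: the path where event logs are written. It can be an HDFS path starting with hdfs:// or a local path starting with file://. Either way, the directory must be created in advance, and it should be the same directory that spark.history.fs.logDirectory points to in step 4; otherwise the History Server will not find the logs.
# spark.yarn.historyServer.address: the address of the Spark History Server (without the http:// prefix).
After a Spark application finishes, this address is passed to the YARN ResourceManager, so the link in the RM UI jumps to the History Server UI.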

4. Add the SPARK_HISTORY_OPTS parameter
[root@sht-sgmhadoopnn-01 conf]# vi spark-env.sh
#!/usr/bin/env bash

export SCALA_HOME=/root/learnproject/app/scala
export JAVA_HOME=/usr/java/jdk1.8.0_111
export SPARK_MASTER_IP=172.16.101.55
export SPARK_WORKER_MEMORY=1g
export SPARK_PID_DIR=/root/learnproject/app/pid
export HADOOP_CONF_DIR=/root/learnproject/app/hadoop/etc/hadoop


export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://mycluster/spark/historylog \
-Dspark.history.ui.port=18080 \
-Dspark.history.retainedApplications=20"
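The History Server only lists applications whose event logs land in the directory it reads, so spark.eventLog.dir (step 3) and spark.history.fs.logDirectory (set above) should name the same location. Below is a rough sketch of a consistency check, assuming both files use the layout shown in this article. The function names `conf_value`, `history_opt`, and `check_log_dirs` are made up for illustration.

```shell
#!/usr/bin/env bash
# Read one property from a spark-defaults.conf-style file (key, whitespace, value).
conf_value() {
  awk -v k="$2" '$1 == k { print $2; exit }' "$1"
}

# Extract the value of a -Dname=value option from a spark-env.sh-style file.
# The value ends at the first whitespace or double quote.
history_opt() {
  grep -o -- "-D$2=[^[:space:]\"]*" "$1" | head -n 1 | cut -d= -f2-
}

# Compare the directory Spark writes to with the one the History Server reads.
check_log_dirs() {
  write_dir=$(conf_value "$1" spark.eventLog.dir)
  read_dir=$(history_opt "$2" spark.history.fs.logDirectory)
  if [ "$write_dir" = "$read_dir" ]; then
    echo "OK: $write_dir"
  else
    echo "MISMATCH: eventLog.dir=$write_dir logDirectory=$read_dir"
    return 1
  fi
}

# Usage, from the Spark installation directory:
#   check_log_dirs conf/spark-defaults.conf conf/spark-env.sh
```

The check is purely textual: it does not resolve nameservice aliases, so two different URIs that happen to reach the same HDFS directory would still be reported as a mismatch.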

5. Start the service and verify it
[root@sht-sgmhadoopnn-01 spark]# ./sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /root/learnproject/app/spark/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-sht-sgmhadoopnn-01.out

[root@sht-sgmhadoopnn-01 ~]# jps
28905 HistoryServer
30407 ProdServerStart
30373 ResourceManager
30957 NameNode
16949 Jps
30280 DFSZKFailoverController
31445 JobHistoryServer
[root@sht-sgmhadoopnn-01 ~]# ps -ef|grep spark
root     17283 16928  0 21:42 pts/2    00:00:00 grep spark
root     28905     1  0 Feb16 ?        00:09:11 /usr/java/jdk1.8.0_111/bin/java -cp /root/learnproject/app/spark/conf/:/root/learnproject/app/spark/jars/*:/root/learnproject/app/hadoop/etc/hadoop/ -Dspark.history.fs.logDirectory=hdfs://mycluster/spark/historylog -Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=20 -Xmx1g org.apache.spark.deploy.history.HistoryServer
You have new mail in /var/spool/mail/root
[root@sht-sgmhadoopnn-01 ~]# netstat -nlp|grep 28905
tcp        0      0 0.0.0.0:18080               0.0.0.0:*                   LISTEN      28905/java         
[root@sht-sgmhadoopnn-01 ~]#
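
Besides jps and netstat, the History Server exposes a REST API under /api/v1 on its UI port, which gives a quick end-to-end check. Here is a small sketch, assuming curl is installed and the server is reachable at the address configured above; `list_history_apps` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask the History Server for the applications it knows about.
# $1 = host, $2 = UI port (defaults to 18080, as set via spark.history.ui.port).
list_history_apps() {
  host="$1"
  port="${2:-18080}"
  # -s: no progress output; -f: return non-zero on HTTP errors
  curl -sf "http://${host}:${port}/api/v1/applications"
}

# Usage with the address from this article:
#   list_history_apps 172.16.101.55 18080
```

The response is a JSON list of applications. A non-zero exit status suggests the server is not reachable on that host and port.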

From the ITPUB blog, link: http://blog.itpub.net/30089851/viewspace-2133897/. Please credit the source when reposting; otherwise legal liability will be pursued.
