Spark Environment Issues and Solutions

Posted by MyStitch on 2020-07-03

Spark version compatibility table

Name                                  | Version                    | Description
Spark                                 | spark-2.3.0-bin-hadoop2.7  | Spark distribution
mongo-java-driver-3.5.0.jar           | 3.5                        | MongoDB Java driver
mongo-spark-connector_2.11-2.3.1.jar  | 2.3                        | MongoDB Spark connector

Spark and MongoDB version mismatch causes errors

The mongo-spark-connector version used by Spark must match the Spark and MongoDB versions (see the compatibility table above).
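For example, one way to pin a matching connector is to declare it when the SparkSession is built. This is only a minimal sketch using the versions from the table above; the MongoDB URI, database, and collection are placeholders, not values from this post.

from pyspark.sql import SparkSession

# Pull in a connector build that matches Spark 2.3 / Scala 2.11 (see the table above).
# The URI, database, and collection below are placeholders.
spark = (SparkSession.builder
         .appName("mongo-connector-version-check")
         .config("spark.jars.packages",
                 "org.mongodb.spark:mongo-spark-connector_2.11:2.3.1")
         .config("spark.mongodb.input.uri",
                 "mongodb://localhost:27017/testdb.testcoll")
         .getOrCreate())

# Read through the connector; a version mismatch typically fails right here.
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df.printSchema()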

Python version mismatch between the Spark driver node and the worker nodes

Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

Solution: set the PYSPARK_PYTHON=/paic/spark/home/csmsopr/anaconda3/bin/python environment variable so the driver and workers use the same Python.
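Besides exporting the variables in the shell, they can be set from the driver script before the SparkContext is created; a minimal sketch, using the interpreter path from this post (adjust it to your own layout):

import os

# Point both driver and workers at the same interpreter; the path below is the
# Anaconda install mentioned above and must exist on every worker node.
python_path = "/paic/spark/home/csmsopr/anaconda3/bin/python"
os.environ["PYSPARK_PYTHON"] = python_path
os.environ["PYSPARK_DRIVER_PYTHON"] = python_path

from pyspark import SparkContext

sc = SparkContext(appName="python-version-check")
print(sc.pythonVer)  # driver's Python major.minor; workers must match
sc.stop()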

 

Hadoop directory permission problem

Failure log:

2018-11-12 16:15:38 INFO  SecurityManager:54 - Changing view acls to: csmsopr

2018-11-12 16:15:38 INFO  SecurityManager:54 - Changing modify acls to: csmsopr

2018-11-12 16:15:38 INFO  SecurityManager:54 - Changing view acls groups to:

2018-11-12 16:15:38 INFO  SecurityManager:54 - Changing modify acls groups to:

2018-11-12 16:15:38 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(csmsopr); groups with view permissions: Set(); users  with modify permissions: Set(csmsopr); groups with modify permissions: Set()

2018-11-12 16:15:38 INFO  Client:54 - Submitting application application_1541659438825_0044 to ResourceManager

 

Traceback (most recent call last):

  File "/lzp/submit_task.py", line 9, in <module>

    sc = SparkContext()

  File "/lzp/spark-2.3.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__

  File "/lzp/spark-2.3.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 180, in _do_init

  File "/lzp/spark-2.3.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 290, in _initialize_context

  File "/lzp/spark-2.3.2-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__

  File "/lzp/spark-2.3.2-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.

: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root/.sparkStaging/application_1541659438825_0024":csmsopr:supergroup:drwxr-xr-x

        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)

        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)

        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)

 

Solution

http://www.huqiwen.com/2013/07/18/hdfs-permission-denied/

In the end, there are roughly three ways to fix this:

1. Add HADOOP_USER_NAME to the system environment variables or as a Java JVM property. Its exact value depends on your setup: it should be the Linux user name that jobs run as on the Hadoop cluster. (Restart Eclipse after the change, otherwise it may not take effect.) See the sketch below this list.

2. Change the account on the current system to the Hadoop user.

3. Use the HDFS command line to change the permissions of the target directory, e.g. hadoop fs -chmod 777 /user, where /user is the path being uploaded to and will differ by case. If the upload path is hdfs://namenode/user/xxx.doc, the command above is enough; if it is hdfs://namenode/java/xxx.doc, run hadoop fs -chmod 777 /java (the /java directory must be created in HDFS first) or hadoop fs -chmod 777 / to adjust the permissions of the root directory.
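For option 1, a minimal sketch of setting the user from the PySpark driver script before the context is created (csmsopr is the directory owner shown in the log above; exporting HADOOP_USER_NAME in the shell before spark-submit works the same way):

import os

# Run the HDFS client as the user that owns /user/<name>/.sparkStaging,
# instead of the local root account that triggered the AccessControlException.
os.environ["HADOOP_USER_NAME"] = "csmsopr"

from pyspark import SparkContext

sc = SparkContext(appName="staging-dir-permission-check")
sc.parallelize(range(10)).count()  # simple action to verify the context works
sc.stop()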

 

Separating Hadoop test and production configuration

The Hadoop configuration replaces the original one. Inside Docker, how do we tell the test Hadoop configuration apart from production; can it be done with an environment variable?

Yes, use an environment variable and point each environment at its own configuration directory:

HADOOP_CONF_DIR=/app/hadoop_config/prd/

Solved by configuring it through the environment variable.
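For example, the container entrypoint could pick the configuration directory from a deployment flag before the Spark context is created. This is only a hypothetical sketch: the DEPLOY_ENV variable and the test directory are assumptions, only the prd path appears above.

import os

# Hypothetical DEPLOY_ENV flag set on the container ("prd" or "test").
deploy_env = os.environ.get("DEPLOY_ENV", "test")
conf_dirs = {
    "prd": "/app/hadoop_config/prd/",    # path from this post
    "test": "/app/hadoop_config/test/",  # assumed layout for the test cluster
}
os.environ["HADOOP_CONF_DIR"] = conf_dirs[deploy_env]

from pyspark import SparkContext

# The spark-submit process started by SparkContext inherits HADOOP_CONF_DIR,
# so the job talks to the cluster described by that directory.
sc = SparkContext(appName="env-specific-hadoop-conf")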

 

Different account when submitting tasks to the Spark cluster

The client account that submits the task differs from the cluster account; this is solved with an environment variable.

There is no need to switch to the csmsopr account; just set it in the environment, e.g. in the Dockerfile: ENV HADOOP_USER_NAME="prdopr"

 

Spark runs out of disk space

https://www.cnblogs.com/itboys/p/6021838.html

2018-12-19 13:40:49,848  INFO  2018-12-19 13:40:49 WARN  Client:87 - Failed to cleanup staging dir hdfs://governor/user/csmsopr/.sparkStaging/application_1545009795494_0018

2018-12-19 13:40:49,848  INFO  org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot delete /user/csmsopr/.sparkStaging/application_1545009795494_0018. Name node is in safe mode.

2018-12-19 13:40:49,848  INFO  Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE:  If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.

 

From the error above, the analysis is that the cluster is short on resources: the cluster's self-protection mechanism put HDFS into safe mode. I used the "hdfs dfsadmin -safemode leave" command to bring the cluster back to a usable state, but submitting to the cluster still raised the same error.

Further searching suggested the nodes were out of disk space, so I ran df -hl to check disk usage across the cluster.

The output showed usage was already at 100%.

Then I ran du -sh /* to see which large files were taking up the space.

After moving those large files elsewhere and resubmitting the task, the error was fully resolved.

Spark: No space left on device

Set the temporary data directory to a different location.

Spark: java.io.IOException: No space left on device

SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"

export SPARK_JAVA_OPTS

Link:

https://stackoverflow.com/questions/30162845/spark-java-io-ioexception-no-space-left-on-device
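SPARK_JAVA_OPTS comes from the linked answer and is deprecated in newer Spark releases; an equivalent sketch sets the same scratch-directory property through SparkConf (the /mnt paths are the ones from the answer, substitute disks that actually have free space). Note that on YARN the executors' scratch space is governed by the NodeManager's local dirs, so this mainly helps local and standalone runs.

from pyspark import SparkConf, SparkContext

# Move Spark's shuffle/scratch space off the full default disk (usually /tmp)
# onto volumes with free space; the paths are taken from the linked answer.
conf = (SparkConf()
        .setAppName("relocate-spark-local-dir")
        .set("spark.local.dir", "/mnt/spark,/mnt2/spark"))

sc = SparkContext(conf=conf)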
