1. Download VMware and install a CentOS 9 virtual machine
2. Configure a user and create the directory
1. Log in as an administrator and create a user for Spark to use
sudo adduser sparkuser
2. Set the new user's password (123456)
sudo passwd sparkuser
3. Grant the new user sparkuser sudo privileges
Switch to root: su -
Grant sparkuser the privilege by adding this line to /etc/sudoers (e.g. via visudo): sparkuser ALL=(ALL) NOPASSWD:ALL
Save and exit: :wq
4. Log in as the newly created sparkuser and create the Spark directory
sudo mkdir /opt/spark
5. Change the owner of the spark directory to sparkuser
sudo chown -R sparkuser:sparkuser /opt/spark
3. Download the Spark package, upload it to the virtual machine, and extract it into the Spark directory
sudo tar -xvzf spark-3.5.3-bin-hadoop3.tgz -C /opt/spark --strip-components=1
(The --strip-components=1 option removes the top-level directory from the extracted files, so they go directly into /opt/spark.)
sudo chown -R sparkuser:sparkuser /opt/spark
4. Set environment variables
Add Spark to your PATH by editing the .bashrc or .bash_profile of the Spark user.
echo "export SPARK_HOME=/opt/spark" >> /home/sparkuser/.bashrc echo "export PATH=\$PATH:\$SPARK_HOME/bin" >> /home/sparkuser/.bashrc source /home/sparkuser/.bashrc
5. Java Setup
Install Java
sudo yum install java-11-openjdk-devel
Check the version
java -version
Check the installation path
readlink -f $(which java)
Set environment variables
echo "export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.20.1.1-2.el9.x86_64" >> /home/sparkuser/.bashrc echo "export PATH=$JAVA_HOME/bin:$PATH" >> /home/sparkuser/.bashrc source /home/sparkuser/.bashrc
6. Start Spark
spark-shell
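Once the scala> prompt appears, a quick sanity check can confirm the installation; this is a minimal sketch, relying only on the spark session object that spark-shell creates automatically:
// print the Spark version bound to this shell (should be 3.5.3)
spark.version
// run a trivial job across the local executor; should return 100
spark.range(100).count()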
7. Start Spark with Delta Lake
bin/spark-shell --packages io.delta:delta-spark_2.12:3.2.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
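After the shell starts, the two --conf settings above can be checked from the scala> prompt; the expected values are simply what was passed on the command line:
// should return io.delta.sql.DeltaSparkSessionExtension
spark.conf.get("spark.sql.extensions")
// should return org.apache.spark.sql.delta.catalog.DeltaCatalog
spark.conf.get("spark.sql.catalog.spark_catalog")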
8. Test Delta Lake
val data = spark.range(0, 5)
data.write.format("delta").save("/tmp/delta-table")
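To confirm the write succeeded, the table can be read back in the same spark-shell session. This is a minimal sketch following the standard Delta Lake quickstart pattern; /tmp/delta-table is the path used in the save above:
// read the Delta table back and display its five rows
val df = spark.read.format("delta").load("/tmp/delta-table")
df.show()
// overwrite with new values; Delta keeps the previous snapshot, enabling time travel
val newData = spark.range(5, 10)
newData.write.format("delta").mode("overwrite").save("/tmp/delta-table")
spark.read.format("delta").load("/tmp/delta-table").show()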