Building a Delta Lake Big Data Platform from 0 to 1

Published by ZH谢工 on 2024-10-16

1. Download VMware and install a CentOS 9 virtual machine

2. Configure the user and create the directory

1. Log in as an administrator and create a dedicated user for Spark

  

sudo adduser sparkuser

2. Set the new user's password (123456)

  

sudo passwd sparkuser

  

3. Grant the new user sparkuser sudo privileges

  Switch to root: su -

  Open the sudoers file: visudo

  Grant sparkuser privileges by adding the line: sparkuser ALL=(ALL) NOPASSWD:ALL

  Save and exit: :wq

4. Log in as the newly created sparkuser and create the Spark directory

  

sudo mkdir /opt/spark

  

5. Change the owner of the Spark directory to sparkuser

  

sudo chown -R sparkuser:sparkuser /opt/spark

  

3. Download the Spark package, upload it to the VM, and extract it into the Spark directory

  

sudo tar -xvzf spark-3.5.3-bin-hadoop3.tgz -C /opt/spark --strip-components=1

  

(The --strip-components=1 option removes the top-level directory from the extracted files, so they go directly into /opt/spark.)

  sudo chown -R sparkuser:sparkuser /opt/spark

4. Set environment variables

Add Spark to your PATH by editing the .bashrc or .bash_profile of the Spark user.

echo "export SPARK_HOME=/opt/spark" >> /home/sparkuser/.bashrc

echo "export PATH=\$PATH:\$SPARK_HOME/bin" >> /home/sparkuser/.bashrc

source /home/sparkuser/.bashrc

  

5. Java setup

  Install Java

  

sudo yum install java-11-openjdk-devel

  

  Check the version

  

java -version

  

  Find the Java installation path

  

readlink -f $(which java)

  

  Set the environment variables

  

echo "export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.20.1.1-2.el9.x86_64" >> /home/sparkuser/.bashrc

echo "export PATH=$JAVA_HOME/bin:$PATH" >> /home/sparkuser/.bashrc

source /home/sparkuser/.bashrc

  

6. Start Spark

  

spark-shell
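
Once the shell is up, you can run a quick sanity check at the scala> prompt; a minimal sketch, assuming only the default SparkSession that spark-shell creates as spark:

// print the Spark version the session is running
spark.version

// run a trivial distributed job: count 10 rows (should print res: Long = 10)
spark.range(10).count()

Exit the shell with :quit before moving on to the next step.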

  

7. Start Spark with Delta Lake

  

spark-shell --packages io.delta:delta-spark_2.12:3.2.0 \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"

  

8. Test Delta Lake

// create a DataFrame with ids 0 through 4 and save it as a Delta table
val data = spark.range(0, 5)
data.write.format("delta").save("/tmp/delta-table")
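
To verify the write actually produced a Delta table, read it back; a minimal sketch following the standard Delta Lake quickstart pattern (the path matches the save call above):

// read the Delta table back; should show ids 0 through 4
val df = spark.read.format("delta").load("/tmp/delta-table")
df.show()

// overwrite the table with a new range, then re-read; should now show ids 5 through 9
val data2 = spark.range(5, 10)
data2.write.format("delta").mode("overwrite").save("/tmp/delta-table")
spark.read.format("delta").load("/tmp/delta-table").show()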

  
