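A New Chapter in Efficient Scheduling: A Detailed Guide to Building a Production-Grade DolphinScheduler 3.2.0 Cluster

Published by 海豚调度 (DolphinScheduler) on 2024-05-15

Original URL: https://www.cnblogs.com/DolphinScheduler/p/18194381

Reposted from tuoluzhe8521

Introduction: By simplifying complex task dependencies, DolphinScheduler gives data engineers powerful workflow management and scheduling capabilities. Version 3.2.0 brings a series of new features and improvements that significantly improve its stability and availability in production.

To help readers understand and apply this release, this guide walks through building a highly available DolphinScheduler 3.2.0 cluster in production. It covers environment preparation, database configuration, user permission setup, passwordless SSH login, starting ZooKeeper, and starting and stopping the services.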

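1. Environment preparation

1.1 Cluster planning

[Figure: cluster plan (image not recoverable)]

This installation uses CentOS 7.9.

1.2 Component download locations

DolphinScheduler 3.2.0 download page: https://dolphinscheduler.apache.org/zh-cn/download/3.2.0

Official installation guide: https://dolphinscheduler.apache.org/zh-cn/docs/3.2.0/guide/installation/cluster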

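1.3 Prerequisites

JDK: download JDK (1.8+), install it, set the JAVA_HOME environment variable, and append its bin directory to PATH. Skip this step if your environment already has it.
Binary package: download the DolphinScheduler binary package from the download page.
Database: PostgreSQL (8.2.15+) or MySQL (5.7+), either one. MySQL requires JDBC Driver 8.0.16 or later.
Registry: ZooKeeper (3.8.0+).
Process tree analysis:

macOS: install pstree
Fedora/Red Hat/CentOS/Ubuntu/Debian: install psmisc

[hadoop@hadoop1 ~]$ sudo yum install -y psmisc

Note: DolphinScheduler itself does not depend on Hadoop, Hive, or Spark, but if the tasks you run depend on them, the corresponding environment must be available.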
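The prerequisites listed in 1.3 can be checked on each node before going further. The following is a minimal sketch, not part of the official tooling; the helper names check_cmd and check_java_home are made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is available on PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Hypothetical helper: report whether JAVA_HOME points at a directory
# that contains an executable bin/java.
check_java_home() {
  if [ -n "${JAVA_HOME:-}" ] && [ -x "${JAVA_HOME}/bin/java" ]; then
    echo "ok: JAVA_HOME=${JAVA_HOME}"
  else
    echo "missing: JAVA_HOME is unset or has no bin/java"
  fi
}

check_java_home
check_cmd java
check_cmd pstree   # provided by the psmisc package on CentOS
```

Any line that starts with "missing:" points at a prerequisite to install before continuing.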

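2. DolphinScheduler cluster installation

2.1 Extract the installation package

Upload the DolphinScheduler installation package to the /data/software directory on the hadoop1 node.
Extract the package into the current directory.

Note: the extraction directory is not the final installation directory.

[hadoop@hadoop1 software]$ tar -zxvf apache-dolphinscheduler-3.2.0-bin.tar.gz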

2.2 Configure the database

DolphinScheduler stores its metadata in a relational database, so create the corresponding database and user first.

mysql -uroot -p
-- Create the database
mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
-- Create the user
-- Replace {user} and {password} with the username and password you want
mysql> CREATE USER '{user}'@'%' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%';
mysql> CREATE USER '{user}'@'localhost' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost';
mysql> FLUSH PRIVILEGES;

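Note:
If the following error appears, the new user's password is too simple.
ERROR 1819 (HY000): Your password does not satisfy the current policy requirements
Either use a more complex password or run the following commands to lower MySQL's password strength requirements. These are the MySQL 5.7 variable names; with the validate_password component in MySQL 8.0 the variables are named validate_password.policy and validate_password.length.

mysql> set global validate_password_policy=0;
mysql> set global validate_password_length=4;

Grant the user the required privileges (this example assumes the user is named dolphinscheduler)

mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%';
mysql> flush privileges;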

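If you use MySQL, download the mysql-connector-java driver (8.0.31) manually and copy it into the libs directory of every DolphinScheduler module: api-server/libs, alert-server/libs, master-server/libs, worker-server/libs, and tools/libs.
Note: if you only use MySQL as a data source in the Datasource Center, any version of the MySQL JDBC driver works. If you use MySQL as DolphinScheduler's metadata database, only version 8.0.16 and above is supported.
The command below assumes the extracted package is available as /data/software/dolphinscheduler-3.2.0; adjust the paths if you kept the original apache-dolphinscheduler-3.2.0-bin directory name.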

echo /data/software/dolphinscheduler-3.2.0/master-server/libs/ /data/software/dolphinscheduler-3.2.0/alert-server/libs/ /data/software/dolphinscheduler-3.2.0/api-server/libs/ /data/software/dolphinscheduler-3.2.0/worker-server/libs/ /data/software/dolphinscheduler-3.2.0/tools/libs/ | xargs -n 1 cp -v /data/software/mysql-8.0.31/mysql-connector-j-8.0.31.jar
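The same copy can be written as a loop that skips missing directories, which makes a wrong path easier to spot. This is only a sketch: DS_HOME and DRIVER_JAR default to the paths used in the command above, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed paths, taken from the command above; adjust to your layout.
DS_HOME="${DS_HOME:-/data/software/dolphinscheduler-3.2.0}"
DRIVER_JAR="${DRIVER_JAR:-/data/software/mysql-8.0.31/mysql-connector-j-8.0.31.jar}"

# Print the libs directory of every module under $1 that needs the driver.
driver_targets() {
  local module
  for module in master-server alert-server api-server worker-server tools; do
    echo "$1/$module/libs"
  done
}

# Copy the jar ($2) into each libs directory under $1,
# reporting directories that do not exist instead of failing.
# Paths containing spaces are not handled by this sketch.
copy_driver() {
  local dir
  for dir in $(driver_targets "$1"); do
    if [ -d "$dir" ]; then
      cp -v "$2" "$dir/"
    else
      echo "skip (no such directory): $dir"
    fi
  done
}

if [ -f "$DRIVER_JAR" ]; then
  copy_driver "$DS_HOME" "$DRIVER_JAR"
else
  echo "driver jar not found: $DRIVER_JAR"
fi
```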

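2.2 Prepare the DolphinScheduler startup environment

Configure passwordless sudo and permissions for the deployment user

If you already have an account for a Hadoop cluster, use it directly; no further setup is needed.

Create the deployment user and be sure to configure passwordless sudo for it. The example below creates a user named hadoop.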

# Creating a user requires logging in as root
useradd hadoop

# Set the password
echo "hadoop" | passwd --stdin hadoop

# Configure passwordless sudo
sed -i '$ahadoop  ALL=(ALL)  NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers

# Change ownership and permissions so the deployment user can operate on the extracted apache-dolphinscheduler-*-bin directory
chown -R hadoop:hadoop apache-dolphinscheduler-*-bin
chmod -R 755 apache-dolphinscheduler-*-bin

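Note:
1. The task execution service runs jobs for multiple tenants by switching Linux users with sudo -u {linux-user}, so the deployment user needs sudo privileges, and they must be passwordless. Beginners who do not follow this point can safely set it aside for now.
2. If /etc/sudoers contains the line "Defaults requiretty", comment it out as well.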
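Both points in the note can be checked mechanically. The sketch below defines two hypothetical helper functions and demonstrates them on a throwaway sample file so that it runs without root; the commented commands at the end show how the same checks might be applied on a real host.

```shell
#!/usr/bin/env bash
# Succeeds if FILE ($2) has an uncommented NOPASSWD rule for USER ($1).
has_nopasswd_rule() {
  grep -Eq "^[[:space:]]*$1[[:space:]]+ALL=\(ALL\)[[:space:]]+NOPASSWD:.*ALL[[:space:]]*$" "$2"
}

# Succeeds if FILE ($1) still has an active "Defaults requiretty" line.
has_active_requiretty() {
  grep -Eq '^[[:space:]]*Defaults[[:space:]]+requiretty' "$1"
}

# Demonstrate on a sample file that mirrors the result of the sed commands.
sample="$(mktemp)"
printf '%s\n' '#Defaults    requiretty' 'hadoop  ALL=(ALL)  NOPASSWD: ALL' > "$sample"

has_nopasswd_rule hadoop "$sample" && echo "NOPASSWD rule present for hadoop"
has_active_requiretty "$sample" || echo "requiretty is not active"
rm -f "$sample"

# On the real host the same checks apply to /etc/sudoers (run as root):
#   visudo -c                              # syntax-check the sudoers file
#   has_nopasswd_rule hadoop /etc/sudoers
# And as the hadoop user:
#   sudo -n true                           # succeeds only if no password is needed
```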

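Configure passwordless SSH login between machines

Installation pushes resources to the other machines, so every machine must be able to log in to the others over SSH without a password. The steps are as follows.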

su hadoop

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Note: once configured, run ssh localhost to verify. If you can log in without being asked for a password, the setup succeeded.
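The commands above only authorize the key on the local machine. For a cluster, the public key also has to reach the other nodes. One common way is ssh-copy-id; the sketch below assumes the five hostnames used later in install_env.sh and the hadoop deployment user, and it defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Assumed host list and user, matching the rest of this guide.
HOSTS="${HOSTS:-hadoop1 hadoop2 hadoop3 hadoop4 hadoop5}"
DEPLOY_USER="${DEPLOY_USER:-hadoop}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, anything else = run them

# Push the local public key to every host in HOSTS.
distribute_key() {
  local host
  for host in $HOSTS; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub $DEPLOY_USER@$host"
    else
      ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$DEPLOY_USER@$host"
    fi
  done
}

distribute_key
```

Run it with DRY_RUN=0 as the deployment user on each node that needs to reach the others; ssh-copy-id asks for the remote password once per host.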

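2.3 Start ZooKeeper (skip if your Hadoop cluster already provides one)

Go to the ZooKeeper installation directory, copy the zoo_sample.cfg configuration file to conf/zoo.cfg, and change the dataDir value in conf/zoo.cfg to dataDir=./tmp/zookeeper

# Start ZooKeeper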
./bin/zkServer.sh start
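The copy-and-edit step described above can be scripted and run before starting ZooKeeper. This is a minimal sketch meant to be run from the ZooKeeper installation directory; set_data_dir is a hypothetical helper, not part of ZooKeeper.

```shell
#!/usr/bin/env bash
# Rewrite the dataDir line of a ZooKeeper config file ($1) to the value in $2.
set_data_dir() {
  sed "s|^dataDir=.*|dataDir=$2|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Only acts when run from a directory that contains conf/zoo_sample.cfg.
if [ -f conf/zoo_sample.cfg ]; then
  cp conf/zoo_sample.cfg conf/zoo.cfg
  set_data_dir conf/zoo.cfg ./tmp/zookeeper
  grep '^dataDir=' conf/zoo.cfg
else
  echo "conf/zoo_sample.cfg not found; run this from the ZooKeeper directory"
fi
```

After starting the server, ./bin/zkServer.sh status reports whether it is running.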

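2.4 Modify the install_env.sh file

The install_env.sh file describes which machines DolphinScheduler will be installed on and which services each machine runs. You can find it at bin/env/install_env.sh. Change a variable either by editing the file or by exporting it, export <ENV_NAME>=<value>. The configuration used here is as follows.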

ips=${ips:-"hadoop1,hadoop2,hadoop3,hadoop4,hadoop5"}
# modify it if you use different ssh port
sshPort=${sshPort:-"xxx"}

# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
masters=${masters:-"hadoop1,hadoop2"}

# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a
# subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
workers=${workers:-"hadoop3:default,hadoop4:default,hadoop5:default"}

# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
alertServer=${alertServer:-"hadoop3"}

# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
apiServers=${apiServers:-"hadoop2"}

# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you using related path.
installPath=${installPath:-"/data/module/dolphinscheduler-3.2.0"}

# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled than the root directory needs
# to be created by this user
deployUser=${deployUser:-"hadoop"}

# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
# It will delete ${zkRoot} in the zookeeper when you run install.sh, so please keep it same as registry.zookeeper.namespace in yml files.
# Similarly, if you want to modify the value, please modify registry.zookeeper.namespace in yml files as well.
zkRoot=${zkRoot:-"/dolphinscheduler"}
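The comments in install_env.sh require masters, workers, alertServer, and apiServers to be subsets of ips. A quick sanity check before running the installer might look like the sketch below; is_subset is a hypothetical helper and the values are the ones from the configuration above.

```shell
#!/usr/bin/env bash
# Succeeds if every entry of the comma-separated list $1 appears in the
# comma-separated list $2. Worker entries of the form host:group are
# compared by host only. Prints the first offending host on failure.
is_subset() {
  local item host
  local IFS=','
  for item in $1; do
    host="${item%%:*}"
    case ",$2," in
      *",$host,"*) ;;
      *) echo "not in ips: $host"; return 1 ;;
    esac
  done
  return 0
}

ips="hadoop1,hadoop2,hadoop3,hadoop4,hadoop5"
is_subset "hadoop1,hadoop2" "$ips" && echo "masters ok"
is_subset "hadoop3:default,hadoop4:default,hadoop5:default" "$ips" && echo "workers ok"
is_subset "hadoop3" "$ips" && echo "alertServer ok"
is_subset "hadoop2" "$ips" && echo "apiServers ok"
```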

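2.5 Modify the dolphinscheduler_env.sh file

The file ./bin/env/dolphinscheduler_env.sh holds the following configuration:
The DolphinScheduler database configuration (see section 2.6 on database initialization) and the external dependency paths or libraries that some task types need. JAVA_HOME and SPARK_HOME, for example, are defined here.

If you do not use certain task types you can ignore their external dependencies, but you must change JAVA_HOME, the registry settings, and the database settings to match your environment.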

export JAVA_HOME=${JAVA_HOME:-/data/module/jdk1.8.0_212}
# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8"
export SPRING_DATASOURCE_USERNAME=xxx
export SPRING_DATASOURCE_PASSWORD=xxx

# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-xxxx:2181,xxx:2181,xxx:2181}


export HADOOP_HOME=${HADOOP_HOME:-/data/module/hadoop-3.3.4}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/data/module/hadoop-3.3.4/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/data/module/spark-3.3.1}
#export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
#export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/data/module/hive-3.1.3}
export FLINK_HOME=${FLINK_HOME:-/data/module/flink-1.16.2}
export DATAX_HOME=${DATAX_HOME:-/data/module/datax}
#export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/soft/seatunnel}
#export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}

export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
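Before initializing the database it can help to confirm that the host and port in SPRING_DATASOURCE_URL are reachable from the node. The sketch below is an illustration only: jdbc_host_port and can_connect are hypothetical helpers, the URL is assumed to carry an explicit port, and the probe relies on bash's /dev/tcp redirection and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Print "host port" extracted from a jdbc:mysql://host:port/db URL.
# Assumes the URL names an explicit port, as in the example above.
jdbc_host_port() {
  local rest="${1#jdbc:mysql://}"   # drop the scheme
  rest="${rest%%/*}"                # keep only host:port
  echo "${rest%%:*} ${rest##*:}"
}

# Succeeds if a TCP connection to host $1, port $2 opens within 3 seconds.
can_connect() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Only probe when the variable from dolphinscheduler_env.sh is set.
if [ -n "${SPRING_DATASOURCE_URL:-}" ]; then
  set -- $(jdbc_host_port "$SPRING_DATASOURCE_URL")
  if can_connect "$1" "$2"; then
    echo "database reachable at $1:$2"
  else
    echo "cannot reach database at $1:$2"
  fi
fi
```

The same can_connect helper can be pointed at each ZooKeeper host and port 2181 from REGISTRY_ZOOKEEPER_CONNECT_STRING.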

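2.6 Initialize the database

After the steps above, you have created a new database for DolphinScheduler and configured DolphinScheduler to use it. You can now initialize the database with a shell script: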

bash tools/bin/upgrade-schema.sh

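2.7 Modify the application.yaml files

There are five files. The part to change is the same in each, but the rest of their contents differ, so edit each one separately: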

master-server/conf/application.yaml
api-server/conf/application.yaml
worker-server/conf/application.yaml
alert-server/conf/application.yaml
tools/conf/application.yaml

  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
    username: xxx
    password: xxx

registry:
  type: zookeeper
  zookeeper:
    namespace: dolphinscheduler
    connect-string: xxxx
    retry-policy:
      base-sleep-time: 60ms
      max-sleep: 300ms
      max-retries: 5
    session-timeout: 30s
    connection-timeout: 9s
    block-until-connected: 600ms
    digest: ~

spring:
  config:
    activate:
      on-profile: mysql
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
    username: xxxx
    password: xxxx
  quartz:
    properties:
      org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobstore.StdJDBCDelegate

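2.8 Modify the common.properties files

There are five files. The part to change is the same in each, but the rest of their contents differ, so edit each one separately: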

master-server/conf/common.properties
api-server/conf/common.properties
worker-server/conf/common.properties
alert-server/conf/common.properties
tools/conf/common.properties

data.basedir.path=<your local file storage path>
resource.storage.type=HDFS
# resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
resource.storage.upload.base.path=<your HDFS storage path>
resource.hdfs.root.user=<your user name, consistent with the configuration earlier in this guide>
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
resource.hdfs.fs.defaultFS=hdfs://xxx:8020
# ResourceManager high-availability addresses
yarn.resourcemanager.ha.rm.ids=xxxx,xxx
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://xxx:19888/jobhistory/logs/%s

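Note: this deployment uses HDFS as DolphinScheduler's distributed storage. For other storage options, configure them as described in the official documentation.

2.9 Distribute the HDFS dependencies for distributed storage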

echo /data/software/dolphinscheduler-3.2.0/master-server/conf/ /data/software/dolphinscheduler-3.2.0/alert-server/conf/ /data/software/dolphinscheduler-3.2.0/api-server/conf/ /data/software/dolphinscheduler-3.2.0/worker-server/conf/ | xargs -n 1 cp -v /data/module/hadoop-3.3.4/etc/hadoop/core-site.xml /data/module/hadoop-3.3.4/etc/hadoop/hdfs-site.xml

2.10 Start DolphinScheduler

As the deployment user created above, run the following command to complete the deployment. The runtime logs are written to the logs folder.

bash ./bin/install.sh

Note: on a first deployment, the message sh: bin/dolphinscheduler-daemon.sh: No such file or directory may appear five times. It is harmless and can be ignored.

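2.11 Log in to DolphinScheduler

Open http://localhost:12345/dolphinscheduler/ui in a browser to log in to the system UI. When browsing from another machine, replace localhost with the API server's hostname (hadoop2 in this setup). The default username and password are admin/dolphinscheduler123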

3. Starting and stopping services

# Stop all services in the cluster with one command
bash ./bin/stop-all.sh

# Start all services in the cluster with one command
bash ./bin/start-all.sh

# Stop and start the Master
bash ./bin/dolphinscheduler-daemon.sh stop master-server
bash ./bin/dolphinscheduler-daemon.sh start master-server

# Start and stop the Worker
bash ./bin/dolphinscheduler-daemon.sh start worker-server
bash ./bin/dolphinscheduler-daemon.sh stop worker-server

# Start and stop the API server
bash ./bin/dolphinscheduler-daemon.sh start api-server
bash ./bin/dolphinscheduler-daemon.sh stop api-server

# Start and stop the Alert server
bash ./bin/dolphinscheduler-daemon.sh start alert-server
bash ./bin/dolphinscheduler-daemon.sh stop alert-server
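To restart several services on one node in a single step, the per-service commands above can be wrapped in a loop. The following is a sketch under the assumption that it is run from the installation directory on the node in question; restart_local_services is a hypothetical helper and it defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, anything else = run them

# Stop and then start each service named in the arguments.
# Pass only the services that are actually deployed on this node.
restart_local_services() {
  local svc action
  for svc in "$@"; do
    for action in stop start; do
      if [ "$DRY_RUN" = "1" ]; then
        echo "bash ./bin/dolphinscheduler-daemon.sh $action $svc"
      else
        bash ./bin/dolphinscheduler-daemon.sh "$action" "$svc"
      fi
    done
  done
}

# Example for hadoop3, which runs a worker and the alert server in this guide.
restart_local_services worker-server alert-server
```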

Original link: https://blog.csdn.net/Brother_ning/article/details/135149045

Published with support from 白鯨開源 (WhaleOps)!
