DolphinScheduler Distributed Cluster Deployment Guide (Beginner Edition)

Published by 海豚调度 on 2024-06-27

Official documentation: https://dolphinscheduler.apache.org/zh-cn/docs/3.1.9

Introduction to DolphinScheduler

From the official site: Apache DolphinScheduler is a distributed, easily extensible, visual DAG workflow task scheduling open-source system. It targets enterprise-grade scenarios and provides a solution for visually operating tasks, workflows, and the full data-processing lifecycle.

Apache DolphinScheduler is designed to untangle complex dependencies among big-data tasks and to expose to applications the relationships between data and the various ops orchestrations. It addresses the problem of ETL pipelines whose intertwined dependencies make task health impossible to monitor. DolphinScheduler assembles tasks as a DAG (Directed Acyclic Graph), monitors task execution status in real time, and supports operations such as retry, recovery from a specified failed node, pause, resume, and kill.


Installation Dependencies

  • Linux CentOS == 7.6.18 (3 nodes)
  • JDK == 1.8.151
  • ZooKeeper == 3.8.3
  • MySQL == 5.7.30
  • DolphinScheduler == 3.1.9

Environment Preparation

Common cluster environment preparation

Prepare the virtual machines

IP address      Hostname  CPU  Memory  Disk  Role
192.168.10.100  hadoop01  4U   8G      100G  DS NODE
192.168.10.101  hadoop02  4U   8G      100G  DS NODE
192.168.10.102  hadoop03  4U   8G      100G  DS NODE

Run the following command on all hosts:

cat >> /etc/hosts << "EOF"
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
EOF

Change the package source

Replace the yum mirror with the Tsinghua (TUNA) mirror:

sudo sed -e 's|^mirrorlist=|#mirrorlist=|g' \
         -e 's|^#baseurl=http://mirror.centos.org|baseurl=https://mirrors.tuna.tsinghua.edu.cn|g' \
         -i.bak \
         /etc/yum.repos.d/CentOS-*.repo
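To see what the substitution actually does, here is the second sed expression applied to a single sample baseurl line (the sample line is illustrative; the real command above rewrites every CentOS-*.repo file):

```shell
# Apply the baseurl rewrite to one sample line instead of the whole repo file
line='#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/'
echo "$line" | sed -e 's|^#baseurl=http://mirror.centos.org|baseurl=https://mirrors.tuna.tsinghua.edu.cn|g'
# prints: baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos/$releasever/os/$basearch/
```

Run `yum makecache` afterwards so the metadata cache is rebuilt against the new mirror.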

Customize the terminal colors

cat << EOF >> ~/.bashrc
PS1="\[\e[37;47m\][\[\e[32;47m\]\u\[\e[34;47m\]@\h \[\e[36;47m\]\w\[\e[0m\]]\\$ "
EOF

Apply the change:

source ~/.bashrc

Optimize the sshd service

sed -ri 's@UseDNS yes@UseDNS no@g' /etc/ssh/sshd_config

sed -ri 's#GSSAPIAuthentication yes#GSSAPIAuthentication no#g' /etc/ssh/sshd_config

grep ^UseDNS /etc/ssh/sshd_config

grep ^GSSAPIAuthentication /etc/ssh/sshd_config

Restart sshd afterwards (systemctl restart sshd) for the changes to take effect.

Disable the firewall

systemctl disable --now firewalld && systemctl is-enabled firewalld

systemctl status firewalld

Disable SELinux

sed -ri 's#(SELINUX=)enforcing#\1disabled#' /etc/selinux/config

grep ^SELINUX= /etc/selinux/config

setenforce 0

getenforce

Configure cluster passwordless login and the sync script

1) Update the host list

cat >> /etc/hosts << 'EOF'
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
EOF

2) Generate a key pair on hadoop01

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa -q

3) Copy the key from hadoop01 to every cluster node

for ((host_id=1;host_id<=3;host_id++));do ssh-copy-id hadoop0${host_id} ;done

4) Test passwordless login

ssh hadoop01
ssh hadoop02
ssh hadoop03

5) Install the rsync data-sync tool on all nodes

Online install:

yum install -y rsync

Offline install, option 1:

yum localinstall -y rsync-2.7.0.rpm

Offline install, option 2:

rpm -ivh rsync-2.7.0.rpm --force --nodeps

6) Write the sync script

vim /usr/local/sbin/data_rsync.sh

Script contents:

#!/bin/bash
# Author: kkarma

if [ $# -ne 1 ];then
    echo "Usage: $0 /path/to/file (absolute path)"
    exit 1
fi

# Make sure the target exists
if [ ! -e "$1" ];then
    echo "[ $1 ] dir or file not found!"
    exit 1
fi

# Parent directory of the target
fullpath=$(dirname "$1")

# Name of the target itself
basename=$(basename "$1")

# Work from the parent directory
cd "$fullpath"

for ((host_id=1;host_id<=3;host_id++))
do
    # Switch terminal output to green
    tput setaf 2
    echo ==== rsyncing hadoop0${host_id}: $basename ====
    # Restore the default terminal color
    tput setaf 7
    # Sync the data to the other nodes
    rsync -az "$basename" "$(whoami)@hadoop0${host_id}:$fullpath"
    if [ $? -eq 0 ];then
        echo "Command executed successfully!"
    fi
done

7) Make the sync script executable

chmod 755 /usr/local/sbin/data_rsync.sh
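The core of the script is the dirname/basename split of its single argument, which can be checked in isolation:

```shell
# How data_rsync.sh decomposes its argument before syncing
target=/etc/sysctl.conf            # any absolute path works here
fullpath=$(dirname "$target")      # parent directory -> /etc
basename=$(basename "$target")     # entry name -> sysctl.conf
echo "$fullpath $basename"
# prints: /etc sysctl.conf
```

A typical call then looks like `data_rsync.sh /etc/hosts`, which changes into /etc and pushes the hosts file to hadoop01 through hadoop03.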

Cluster time synchronization

1) Install common Linux tools

yum install -y vim net-tools

2) Install the chrony service

yum install -y ntpdate chrony

3) Edit the chrony configuration file

vim /etc/chrony.conf

Comment out the default time servers and point at a domestic one instead:

server ntp.aliyun.com iburst

4) Enable chronyd at boot

systemctl enable --now chronyd

5) Check the chronyd service

systemctl status chronyd

Tune the sysctl.conf system configuration

Edit /etc/sysctl.conf and add the following; apply the changes afterwards with sysctl -p:

vm.swappiness = 0
kernel.sysrq = 1

net.ipv4.neigh.default.gc_stale_time = 120

# see details in https://help.aliyun.com/knowledge_detail/39428.html
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2

# see details in https://help.aliyun.com/knowledge_detail/41334.html
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2



fs.file-max = 6815744
vm.max_map_count = 262144
fs.aio-max-nr = 1048576
kernel.shmall = 2097152
kernel.shmmax = 536870912
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.suid_dumpable=1

net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576

Edit the limits.conf configuration file

Append the following to the end of /etc/security/limits.conf.
If you have already created a dedicated management account (for example, an elastic account for managing Elasticsearch), configure it like this:

elastic soft nofile 65535
elastic hard nofile 65535

If that feels like too much trouble, the following blanket configuration also works:

* soft nofile 65535
* hard nofile 65535

After making the changes above, it is recommended to reboot the server so the system configuration takes effect.
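After logging back in (or after the reboot), the effective open-file limits for the session can be verified with ulimit; they should report the configured values:

```shell
# Show the current soft and hard limits on open files for this shell session
ulimit -n
ulimit -Hn
```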

JDK Installation

Skipped here; it is straightforward, and almost any blog post on the subject will walk you through the configuration.

ZooKeeper Cluster Installation

The original plan was to skip this step and reuse the ZooKeeper cluster from CDH, but in practice, after adapting the DolphinScheduler build to that lower ZooKeeper version, the deployed cluster kept hitting all sorts of startup problems. Rather than keep fighting it, a separate ZooKeeper cluster was installed; the steps below walk through that deployment.

Download and Install

First configure the cluster hostnames so that the nodes can reach each other by name:

vim /etc/hosts

Append the following to the file (required on every node):

192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03

ZooKeeper download page: https://zookeeper.apache.org/releases.html#download

After downloading, upload the package to every cluster host and extract it to /opt/software.

In the installation directory, create the data and logs directories (required on every node):

mkdir -p /opt/software/zookeeper/data

mkdir -p /opt/software/zookeeper/logs


Cluster Configuration

Go to the conf directory under the installation path, /opt/software/zookeeper/conf, to edit ZooKeeper's configuration file zoo.cfg.

Copy zoo_sample.cfg and rename it to zoo.cfg (required on every node):

cp /opt/software/zookeeper/conf/zoo_sample.cfg /opt/software/zookeeper/conf/zoo.cfg

The modified configuration is as follows:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/opt/software/zookeeper/data
# the port at which the clients will connect
# Changed from the default to avoid clashing with the ZooKeeper cluster that the
# host's Hadoop cluster depends on; the inter-node communication ports were changed too
clientPort=2191
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

## Metrics Providers
#
# https://prometheus.io Metrics Exporter
#metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
#metricsProvider.httpPort=7000
#metricsProvider.exportJvmInfo=true

# In newer ZooKeeper versions the audit log added at startup is disabled by default;
# enable it when starting on Windows
#audit.enable=true

# Internal communication settings for the ZooKeeper ensemble: one line per node
server.1=hadoop01:2999:3999
server.2=hadoop02:2999:3999
server.3=hadoop03:2999:3999
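If the host list ever changes, the server.N lines can be generated rather than typed by hand; a small sketch using the ports configured above (2999/3999):

```shell
# Generate one server.N line per ensemble member
hosts=(hadoop01 hadoop02 hadoop03)
for i in "${!hosts[@]}"; do
    printf 'server.%d=%s:2999:3999\n' "$((i + 1))" "${hosts[$i]}"
done
# prints:
# server.1=hadoop01:2999:3999
# server.2=hadoop02:2999:3999
# server.3=hadoop03:2999:3999
```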

Configure each node's server id; it must match the server.N entries in zoo.cfg.

Run the following on hadoop01:

echo 1 > /opt/software/zookeeper/data/myid


Run the following on hadoop02:

echo 2 > /opt/software/zookeeper/data/myid

Run the following on hadoop03:

echo 3 > /opt/software/zookeeper/data/myid
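Since the hostnames encode the node number, the myid value can also be derived instead of hard-coded on each node; a sketch that assumes the hadoop0N naming scheme used here:

```shell
# Derive the ZooKeeper myid from the hadoop0N hostname
host=hadoop02                  # in practice: host=$(hostname)
myid=${host#hadoop0}           # strip the "hadoop0" prefix, leaving the node number
echo "$myid"                   # this value would be written to /opt/software/zookeeper/data/myid
# prints: 2
```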

Testing and Verification

First create the cluster start/stop script:

vim /opt/software/zookeeper/zk-start-all.sh

The script contents follow.

Note:

  • Starting the ZooKeeper cluster depends on the JDK and uses the JAVA_HOME variable, so install the JDK and set the Java environment variables first.
  • If cluster passwordless login is not configured, every ssh call in the script will prompt for a password, so set up passwordless login first.
#!/bin/bash

case $1 in
"start"){
    # Iterate over every machine in the cluster
	for i in hadoop01 hadoop02 hadoop03
	do
		# Console log output
		echo =============starting zookeeper on $i====================
		# Start command
		ssh $i "/opt/software/zookeeper/bin/zkServer.sh start"
	done
}
;;
"stop"){
	for i in hadoop01 hadoop02 hadoop03
	do
		echo =============stopping zookeeper on $i====================
		ssh $i "/opt/software/zookeeper/bin/zkServer.sh stop"
	done
}
;;
"status"){
	for i in hadoop01 hadoop02 hadoop03
	do
		echo =============zookeeper status on $i====================
		ssh $i "/opt/software/zookeeper/bin/zkServer.sh status"
	done
}
;;
esac
Make the script executable:

chmod 755 /opt/software/zookeeper/zk-start-all.sh

The cluster here is already running, so rather than demonstrate startup, here is the status command, /opt/software/zookeeper/zk-start-all.sh status, which initially produced an error.

Fix: on every node, edit /opt/software/zookeeper/bin/zkEnv.sh and add your JAVA_HOME path at the very top of the script body.

Then go to /opt/software/zookeeper on hadoop01 and run ./zk-start-all.sh status to check the ZooKeeper cluster status. Once it returns the expected output, the start/stop script is working.

Commands to start, stop, and query the ZooKeeper cluster:

# Start the zookeeper cluster
sh /opt/software/zookeeper/zk-start-all.sh start

# Stop the zookeeper cluster
sh /opt/software/zookeeper/zk-start-all.sh stop

# Query the status and role of each cluster node
sh /opt/software/zookeeper/zk-start-all.sh status

MySQL Installation

For MySQL installation, see my other blog post on installing MySQL from a tar.gz package on CentOS 7.

Cluster Deployment

Download DolphinScheduler

Download URL: https://dlcdn.apache.org/dolphinscheduler/3.1.9/apache-dolphinscheduler-3.1.9-bin.tar.gz

Download it directly to a path on the server with wget. If the servers cannot reach the internet, download the binary package to a connected machine first and then upload it to every node of the cluster with an SSH client tool.

Create and configure the cluster run account

Create the user, dolphinscheduler, that will install and run the DolphinScheduler cluster.
As root, run the command to add a regular user:

useradd dolphinscheduler

Set the dolphinscheduler user's password:

passwd dolphinscheduler

Grant the dolphinscheduler user passwordless sudo:

sed -i '$adolphinscheduler  ALL=(ALL)  NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers
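The `$a` sed command appends a line at the end of the file. It can be rehearsed on a throwaway copy before touching the real /etc/sudoers (which is safer to edit via visudo, since visudo syntax-checks the result):

```shell
# Rehearse the sudoers append on a temporary file
tmp=$(mktemp)
printf 'root    ALL=(ALL)       ALL\n' > "$tmp"
sed -i '$adolphinscheduler  ALL=(ALL)  NOPASSWD: ALL' "$tmp"
tail -n 1 "$tmp"
# prints: dolphinscheduler  ALL=(ALL)  NOPASSWD: ALL
rm -f "$tmp"
```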

Copy the binary package apache-dolphinscheduler-3.1.9-bin.tar.gz to the /opt/packages directory (create the directory if it does not exist).

Change the owner and group of the apache-dolphinscheduler-3.1.9-bin.tar.gz package to dolphinscheduler:

chown dolphinscheduler:dolphinscheduler /opt/packages/apache-dolphinscheduler-3.1.9-bin.tar.gz

Configure passwordless cluster login for the user

Switch to the dolphinscheduler user and configure passwordless login (this only needs to be done on hadoop01).

1) Generate a key pair on hadoop01

ssh-keygen -t rsa

2) Copy the key from hadoop01 to every cluster node

for ((host_id=1;host_id<=3;host_id++));do ssh-copy-id hadoop0${host_id} ;done

3) Test passwordless login

ssh hadoop01
ssh hadoop02
ssh hadoop03

Database Initialization

DolphinScheduler's default database name is dolphinscheduler. Create the database, then create and authorize a dedicated management user:

create database `dolphinscheduler` DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_general_ci;

-- Create a dedicated dolphinscheduler user to manage the dolphinscheduler database
CREATE USER 'dolphinscheduler'@'%' IDENTIFIED BY 'dolphinscheduler';

-- Grant access to the database
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%';

-- Make the privilege changes take effect
flush privileges;

Extract the binary package

tar -zxf /opt/packages/apache-dolphinscheduler-3.1.9-bin.tar.gz -C /opt/packages

Adjust the installation script and configuration

DolphinScheduler mainly comprises the api-server, master-server, and worker-server services (plus alert-server). The configuration file /opt/packages/apache-dolphinscheduler-3.1.9-bin/bin/env/install_env.sh controls which machines DolphinScheduler is installed on and which services each machine runs.


#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# A comma separated list of machine hostname or IP would be installed DolphinScheduler,
# including master, worker, api, alert. If you want to deploy in pseudo-distributed
# mode, just write a pseudo-distributed hostname
# Example for hostnames: ips="ds1,ds2,ds3,ds4,ds5", Example for IPs: ips="192.168.8.1,192.168.8.2,192.168.8.3,192.168.8.4,192.168.8.5"
#ips=${ips:-"ds1,ds2,ds3,ds4,ds5"}

# Hosts on which DolphinScheduler will be installed, comma-separated
ips="hadoop01,hadoop02,hadoop03"

# Port of SSH protocol, default value is 22. For now we only support same port in all `ips` machine
# modify it if you use different ssh port
sshPort=${sshPort:-"22"}

# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
#masters=${masters:-"hadoop01"}

# Hosts in the cluster designated as master nodes, comma-separated
masters="hadoop01,hadoop02"

# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a
# subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
#workers=${workers:-"ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"}

# Hosts in the cluster designated as worker nodes, comma-separated; append ":default"
# to each node that belongs to the default worker group
workers="hadoop02:default,hadoop03:default"

# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
#alertServer=${alertServer:-"ds3"}

# Hosts in the cluster designated as alert nodes, comma-separated
alertServer="hadoop03"

# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
#apiServers=${apiServers:-"ds1"}

# Node on which the api-server service is installed
apiServers="hadoop01"

# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you using related path.
#installPath=${installPath:-"/tmp/dolphinscheduler"}
#installPath="/opt/software/dolphinscheduler"

# Default installation path of DolphinScheduler on the cluster: /opt/software/dolphinscheduler
installPath="/opt/software/dolphinscheduler"

# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled than the root directory needs
# to be created by this user
# The user that deploys the DolphinScheduler cluster
deployUser=${deployUser:-"dolphinscheduler"}

# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
#zkRoot=${zkRoot:-"/dolphinscheduler"}

# Root path under which the DolphinScheduler cluster registers in ZooKeeper
zkRoot=${zkRoot:-"/dolphinscheduler"}
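The comments in the file stress that masters, workers, alertServer, and apiServers must all be subsets of ips. A quick sanity check for typos before running install.sh (a hypothetical helper, not part of DolphinScheduler; values as configured above):

```shell
# Verify every role host appears in the ips list
ips="hadoop01,hadoop02,hadoop03"
masters="hadoop01,hadoop02"
workers="hadoop02:default,hadoop03:default"

for entry in ${masters//,/ } ${workers//,/ }; do
    h=${entry%%:*}                       # drop any ":workerGroup" suffix
    case ",$ips," in
        *,"$h",*) echo "$h ok" ;;
        *)        echo "$h is NOT in ips" ;;
    esac
done
# prints an "ok" line for each of the four role entries
```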

The configuration file /opt/packages/apache-dolphinscheduler-3.1.9-bin/bin/env/dolphinscheduler_env.sh configures DolphinScheduler's database connection and the external dependency paths or libraries for the task types it supports; variables such as JAVA_HOME, DATAX_HOME, and SPARK_HOME are defined here.

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# JAVA_HOME, will use it to start DolphinScheduler server
#export JAVA_HOME=${JAVA_HOME:-/opt/java/openjdk}

# Set the JAVA_HOME variable
export JAVA_HOME=${JAVA_HOME:-/usr/java/jdk1.8.0_181-cloudera}

# Database related configuration, set database type, username and password
#export SPRING_DATASOURCE_URL


# DolphinScheduler database connection settings
export SPRING_DATASOURCE_URL="jdbc:mysql://localhost:3306/dolphinscheduler?useTimezone=true&useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai"
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-GMT+8}
export SPRING_DATASOURCE_USERNAME=dolphinscheduler
export SPRING_DATASOURCE_PASSWORD=dolphinscheduler

# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-UTC}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}

# Registry center configuration, determines the type and link of the registry center

# Registry type used by DolphinScheduler: ZooKeeper
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
#export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2191}

# Connection string for the ZooKeeper registry cluster
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-hadoop01:2191,hadoop02:2191,hadoop03:2191}

# Tasks related configurations, need to change the configuration if you use the related tasks.
# Per-task-type environment variables. Point the entries for the task types you
# expect to use at their install paths on the server, ideally before installing the cluster.
#export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
#export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
#export HADOOP_CONF_DIR=etc/hadoop/conf
#export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
#export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
#export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
#export PYTHON_HOME=/opt/soft/python
#export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
#export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
#export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
#export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/soft/seatunnel}
#export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}
#export SQOOP_HOME=${SQOOP_HOME:-/opt/soft/sqoop}

export PATH=$HADOOP_HOME/bin:$SQOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH

Disable the Python gateway (enabled by default)

The Python gateway service starts together with the api-server by default. To disable it, set python-gateway.enabled: false in the api-server configuration file /opt/packages/apache-dolphinscheduler-3.1.9-bin/api-server/conf/application.yaml.

vim ./api-server/conf/application.yaml
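Inside application.yaml, the fragment to change looks roughly like this (the exact surrounding keys may differ by version):

```yaml
python-gateway:
  # disable the Python gateway so it does not start with api-server
  enabled: false
```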


Run the database initialization script

#Change to the directory containing the SQL scripts
cd /opt/packages/apache-dolphinscheduler-3.1.9-bin/tools/sql/sql
#Restore the schema from the SQL backup (you will be prompted for the password;
#the final argument is the target database name)
mysql -udolphinscheduler -p dolphinscheduler < dolphinscheduler_mysql.sql

Configure the datasource driver files

The MySQL driver must be JDBC Driver 8.0.16 or later. Download mysql-connector-java manually and copy it into the libs directory of every DolphinScheduler module, five directories in total:

/opt/packages/apache-dolphinscheduler-3.1.9-bin/api-server/libs

/opt/packages/apache-dolphinscheduler-3.1.9-bin/alert-server/libs

/opt/packages/apache-dolphinscheduler-3.1.9-bin/master-server/libs

/opt/packages/apache-dolphinscheduler-3.1.9-bin/worker-server/libs

/opt/packages/apache-dolphinscheduler-3.1.9-bin/tools/libs

Copy the MySQL driver into each module's dependency path:

cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/api-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/alert-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/master-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/worker-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/tools/libs/
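The five copies follow one pattern, so they can also be written as a loop. Shown here as a dry run (echo) so it is safe to paste; drop the echo to actually perform the copies:

```shell
# Dry-run loop form of the driver copies above
jar=/opt/packages/mysql-connector-j-8.0.16.jar
base=/opt/packages/apache-dolphinscheduler-3.1.9-bin
for mod in api-server alert-server master-server worker-server tools; do
    echo cp "$jar" "$base/$mod/libs/"    # remove "echo" to perform the copy
done
```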

Besides MySQL, you may also need datasource drivers for SQL Server, Oracle, Hive, and so on; they are integrated exactly the same way as MySQL. Ideally, add every dependency you need to the corresponding modules' libs directories before installing the cluster, so nothing has to be patched up afterwards; adding datasource dependencies later still works, though.

If you need any of the database dependencies above, message me your email address and I will send them to you.

Run the Cluster Installation

First, once again change the owner and group of /opt/packages/apache-dolphinscheduler-3.1.9-bin to dolphinscheduler:

chown -R dolphinscheduler:dolphinscheduler /opt/packages/apache-dolphinscheduler-3.1.9-bin

Switch to the dolphinscheduler user:

su - dolphinscheduler

Change to the extracted root directory:

cd /opt/packages/apache-dolphinscheduler-3.1.9-bin

Run the cluster installation script install.sh:

./bin/install.sh

When the installation script finishes, it automatically checks the state of every cluster node.


Cluster Start/Stop Test

After installation, the DolphinScheduler services on every node live under the default install directory /opt/software/dolphinscheduler.

Before starting, make sure the ZooKeeper service is running normally, otherwise the cluster cannot start successfully.

On hadoop01, switch to the dolphinscheduler system user:

su - dolphinscheduler

Change to the dolphinscheduler installation directory:

cd /opt/software/dolphinscheduler

Common cluster operation commands:

#Start the whole cluster
./bin/start-all.sh

#Stop the whole cluster
./bin/stop-all.sh

#Query the status of the whole cluster
./bin/status-all.sh

Web UI: http://<hadoop01 IP>:12345/dolphinscheduler/ui

Username: admin  Password: dolphinscheduler123


And with that, the DolphinScheduler distributed cluster is fully deployed.

Publication support for this article provided by 白鯨開源!
