Official documentation: https://dolphinscheduler.apache.org/zh-cn/docs/3.1.9
Introduction to DolphinScheduler
From the official site: Apache DolphinScheduler is a distributed, easily extensible, open-source visual DAG workflow task scheduling system. It targets enterprise-grade scenarios and provides a visual solution for operating tasks, workflows, and the full data-processing lifecycle.
Apache DolphinScheduler aims to untangle complex big-data task dependencies and expose the relationships between data and the various ops pipelines to applications. It addresses the problem of intricate ETL dependencies in data engineering where task health cannot be monitored. DolphinScheduler assembles tasks as a DAG (Directed Acyclic Graph), monitors task execution status in real time, and supports operations such as retry, recovery from a specified failed node, pause, resume, and kill.
Installation dependencies
- Linux CentOS == 7.6.18 (3 hosts)
- JDK == 1.8.151
- Zookeeper == 3.8.3
- MySQL == 5.7.30
- DolphinScheduler == 3.1.9
Environment preparation
Common cluster preparation
Prepare the virtual machines
IP Address | Hostname | CPU | Memory | Disk | Role |
---|---|---|---|---|---|
192.168.10.100 | hadoop01 | 4U | 8G | 100G | DS NODE |
192.168.10.101 | hadoop02 | 4U | 8G | 100G | DS NODE |
192.168.10.102 | hadoop03 | 4U | 8G | 100G | DS NODE |
Run the following command on every host:
cat >> /etc/hosts << "EOF"
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
EOF
Change the package repository
Replace the yum mirror with the Tsinghua (TUNA) mirror:
sudo sed -e 's|^mirrorlist=|#mirrorlist=|g' \
-e 's|^#baseurl=http://mirror.centos.org|baseurl=https://mirrors.tuna.tsinghua.edu.cn|g' \
-i.bak \
/etc/yum.repos.d/CentOS-*.repo
Customize the terminal prompt colors
cat << EOF >> ~/.bashrc
PS1="\[\e[37;47m\][\[\e[32;47m\]\u\[\e[34;47m\]@\h \[\e[36;47m\]\w\[\e[0m\]]\\$ "
EOF
Apply the change:
source ~/.bashrc
Tune the sshd service
sed -ri 's@UseDNS yes@UseDNS no@g' /etc/ssh/sshd_config
sed -ri 's#GSSAPIAuthentication yes#GSSAPIAuthentication no#g' /etc/ssh/sshd_config
grep ^UseDNS /etc/ssh/sshd_config
grep ^GSSAPIAuthentication /etc/ssh/sshd_config
Disable the firewall
systemctl disable --now firewalld && systemctl is-enabled firewalld
systemctl status firewalld
Disable SELinux
sed -ri 's#(SELINUX=)enforcing#\1disabled#' /etc/selinux/config
grep ^SELINUX= /etc/selinux/config
setenforce 0
getenforce
Configure passwordless cluster login and a sync script
1) Update the host list
cat >> /etc/hosts << 'EOF'
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
EOF
2) Generate a key pair on hadoop01
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa -q
3) From hadoop01, distribute the key to every cluster node
for ((host_id=1;host_id<=3;host_id++));do ssh-copy-id hadoop0${host_id} ;done
4) Test passwordless login
ssh hadoop01
ssh hadoop02
ssh hadoop03
5) Install the rsync data synchronization tool on all nodes
Online installation:
yum install -y rsync
Offline installation, option 1:
yum localinstall -y rsync-2.7.0.rpm
Offline installation, option 2:
rpm -ivh rsync-2.7.0.rpm --force --nodeps
6) Write the sync script
vim /usr/local/sbin/data_rsync.sh
Script contents:
#!/bin/bash
# Author: kkarma
if [ $# -ne 1 ];then
	echo "Usage: $0 /path/to/file (absolute path)"
	exit 1
fi
# Check that the file or directory exists
if [ ! -e "$1" ];then
	echo "[ $1 ] dir or file not found!"
	exit 1
fi
# Parent directory
fullpath=$(dirname "$1")
# File or directory name
basename=$(basename "$1")
# Enter the parent directory
cd "$fullpath"
for ((host_id=1;host_id<=3;host_id++))
do
	# Switch terminal output to green
	tput setaf 2
	echo ==== rsyncing hadoop0${host_id}: $basename ====
	# Restore the terminal color
	tput setaf 7
	# Sync the data to the other nodes
	rsync -az "$basename" "$(whoami)@hadoop0${host_id}:$fullpath"
	if [ $? -eq 0 ];then
		echo "Command executed successfully!"
	fi
done
7) Make the sync script executable
chmod 755 /usr/local/sbin/data_rsync.sh
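The sync script splits its single argument into a parent directory and a base name before calling rsync. A quick local sanity check of that splitting logic, using a hypothetical path:

```shell
# Hypothetical path, used only to illustrate the split the script performs
path=/opt/software/zookeeper/conf/zoo.cfg
fullpath=$(dirname "$path")    # parent directory
basename=$(basename "$path")   # file name
echo "$fullpath"   # prints /opt/software/zookeeper/conf
echo "$basename"   # prints zoo.cfg
```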
Cluster time synchronization
1) Install common Linux tools
yum install -y vim net-tools
2) Install the chrony service
yum install -y ntpdate chrony
3) Edit the chrony configuration file
vim /etc/chrony.conf
Comment out the default time servers and replace them with a domestic one:
server ntp.aliyun.com iburst
4) Enable chronyd at boot and start it
systemctl enable --now chronyd
5) Check the chronyd service
systemctl status chronyd
Adjust sysctl.conf system settings
Edit /etc/sysctl.conf and add the following:
vm.swappiness = 0
kernel.sysrq = 1
net.ipv4.neigh.default.gc_stale_time = 120
# see details in https://help.aliyun.com/knowledge_detail/39428.html
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
# see details in https://help.aliyun.com/knowledge_detail/41334.html
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
fs.file-max = 6815744
vm.max_map_count = 262144
fs.aio-max-nr = 1048576
kernel.shmall = 2097152
kernel.shmmax = 536870912
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.suid_dumpable=1
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
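After editing /etc/sysctl.conf, `sysctl -p` (as root) loads the new values; any single key can then be read back without root to confirm it took effect. A sketch, assuming a Linux host:

```shell
# Read back one of the keys set above directly from procfs
cat /proc/sys/vm/max_map_count
```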
Modify the limits.conf configuration
Append the following to the end of /etc/security/limits.conf.
If you have already created a dedicated service account (for example, dolphinscheduler), configure:
dolphinscheduler soft nofile 65535
dolphinscheduler hard nofile 65535
If you prefer a simpler blanket setting, this also works:
* soft nofile 65535
* hard nofile 65535
After making the changes above, reboot the server so the system configuration takes effect.
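After the limits.conf change takes effect (re-login or reboot), the effective open-file limit can be checked from the target account:

```shell
# Show the soft open-file limit for the current shell session
ulimit -n
```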
JDK installation
Skipped here; it is straightforward, and almost any blog post on the topic walks through the configuration.
Zookeeper cluster installation
I originally planned to skip this step and reuse the Zookeeper cluster from the CDH deployment. In practice, even after adapting the DolphinScheduler build to the older Zookeeper version, the cluster kept failing in various ways after deployment, so instead of fighting it I installed a separate Zookeeper cluster. The installation and deployment steps follow.
Download and install
First configure the cluster hostnames so the nodes can reach each other by name:
vim /etc/hosts
Append the following (on every node):
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
Zookeeper download page: https://zookeeper.apache.org/releases.html#download
After downloading, upload the archive to every cluster host and extract it to /opt/software.
Under the installation directory, create the data and logs directories (on every node):
mkdir -p /opt/software/zookeeper/data
mkdir -p /opt/software/zookeeper/logs
Cluster configuration
Go to the conf directory under the installation path, /opt/software/zookeeper/conf, to set up the zoo.cfg configuration file.
Copy zoo_sample.cfg and rename it zoo.cfg (on every node):
cp /opt/software/zookeeper/conf/zoo_sample.cfg /opt/software/zookeeper/conf/zoo.cfg
Modify the configuration file as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/software/zookeeper/data
# the port at which the clients will connect
# Changed the client port and the ZK inter-node ports here to avoid conflicts
# with the Zookeeper cluster that the host's hadoop cluster depends on
clientPort=2191
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
## Metrics Providers
#
# https://prometheus.io Metrics Exporter
#metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
#metricsProvider.httpPort=7000
#metricsProvider.exportJvmInfo=true
# Newer Zookeeper versions ship with audit logging disabled by default; enable it when starting on Windows
#audit.enable=true
# Internal cluster communication: one server.N entry per node
server.1=hadoop01:2999:3999
server.2=hadoop02:2999:3999
server.3=hadoop03:2999:3999
Configure each node's server id; it must match the server.N entries in zoo.cfg.
Run on hadoop01:
echo 1 > /opt/software/zookeeper/data/myid
Run on hadoop02:
echo 2 > /opt/software/zookeeper/data/myid
Run on hadoop03:
echo 3 > /opt/software/zookeeper/data/myid
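Since each hostname ends in the same digit as its server id, the three commands above can be generated in one loop. This sketch only prints the command for each node rather than executing it remotely:

```shell
# Derive each node's myid from its hostname suffix (hadoop01 -> 1, ...)
for host in hadoop01 hadoop02 hadoop03; do
  id=${host#hadoop0}
  echo "run on $host: echo $id > /opt/software/zookeeper/data/myid"
done
```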
Test and verify
First create a start/stop script for the cluster:
vim /opt/software/zookeeper/zk-start-all.sh
The script contents are below.
Note:
- Starting the Zookeeper cluster depends on the JDK and uses the JAVA_HOME variable, so install the JDK and set the Java environment variables first.
- If passwordless cluster login is not configured, the script will prompt for a password on every ssh call, so set up passwordless login first.
#!/bin/bash
case $1 in
"start"){
	# Iterate over every machine in the cluster
	for i in hadoop01 hadoop02 hadoop03
	do
		# Log to the console
		echo =============zookeeper $i start====================
		# Start command
		ssh $i "/opt/software/zookeeper/bin/zkServer.sh start"
	done
}
;;
"stop"){
	for i in hadoop01 hadoop02 hadoop03
	do
		echo =============zookeeper $i stop====================
		ssh $i "/opt/software/zookeeper/bin/zkServer.sh stop"
	done
}
;;
"status"){
	for i in hadoop01 hadoop02 hadoop03
	do
		echo =============zookeeper $i status====================
		ssh $i "/opt/software/zookeeper/bin/zkServer.sh status"
	done
}
;;
esac
chmod 755 /opt/software/zookeeper/zk-start-all.sh
My cluster is already running, so instead of demonstrating startup I ran the status command, /opt/software/zookeeper/zk-start-all.sh status, which failed with a JAVA_HOME-related error.
Fix: edit /opt/software/zookeeper/bin/zkEnv.sh on every node and add your JAVA_HOME path at the very top of the script body.
Then, on hadoop01, go to /opt/software/zookeeper and run ./zk-start-all.sh status to check the Zookeeper cluster state; each node reports its status and role, so the start/stop script works as intended.
The zk cluster start/stop/status commands:
# Start the zookeeper cluster
sh /opt/software/zookeeper/zk-start-all.sh start
# Stop the zookeeper cluster
sh /opt/software/zookeeper/zk-start-all.sh stop
# Query the status and role of each cluster node
sh /opt/software/zookeeper/zk-start-all.sh status
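A healthy 3-node ensemble reports exactly one leader and two followers in the status output. As a sketch, the role lines can be counted from captured status text (the sample text below stands in for a live query):

```shell
# Sample of the "Mode:" lines zkServer.sh status prints across the ensemble
status_output="Mode: follower
Mode: leader
Mode: follower"
leaders=$(echo "$status_output" | grep -c '^Mode: leader')
echo "leaders: $leaders"   # a healthy ensemble has exactly 1
```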
MySQL installation
For MySQL installation, see my other blog post on installing MySQL on CentOS 7 from the .tar.gz package.
Cluster deployment
Download DolphinScheduler
Download URL: https://dlcdn.apache.org/dolphinscheduler/3.1.9/apache-dolphinscheduler-3.1.9-bin.tar.gz
Download it straight to a path on the server with wget. If the servers have no internet access, download the binary package locally first and then upload it to every cluster node with an ssh client tool.
Create and configure the cluster user for dolphinscheduler
Create the user that will install and run the dolphinscheduler cluster.
As root, add the user:
useradd dolphinscheduler
Set the dolphinscheduler user's password:
passwd dolphinscheduler
Grant the dolphinscheduler user passwordless sudo:
sed -i '$a dolphinscheduler ALL=(ALL) NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers
Copy the binary package apache-dolphinscheduler-3.1.9-bin.tar.gz to /opt/packages (create the directory if it does not exist).
Change the owner and group of apache-dolphinscheduler-3.1.9-bin.tar.gz to dolphinscheduler:
chown -R dolphinscheduler:dolphinscheduler /opt/packages/apache-dolphinscheduler-3.1.9-bin.tar.gz
Configure passwordless cluster login for the user
Switch to the dolphinscheduler user and configure passwordless cluster login (hadoop01 only):
1) Generate a key pair on hadoop01
ssh-keygen -t rsa
2) From hadoop01, distribute the key to every cluster node
for ((host_id=1;host_id<=3;host_id++));do ssh-copy-id hadoop0${host_id} ;done
3) Test passwordless login
ssh hadoop01
ssh hadoop02
ssh hadoop03
Database initialization
DolphinScheduler's default database name is dolphinscheduler. Create the database, then create and authorize the management user:
create database `dolphinscheduler` DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_general_ci;
-- Create the dolphinscheduler user dedicated to managing the dolphinscheduler database
CREATE USER 'dolphinscheduler'@'%' IDENTIFIED BY 'dolphinscheduler';
-- Grant access to the database
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%';
-- Apply the privilege changes
flush privileges;
Extract the binary package
tar -zxf /opt/packages/apache-dolphinscheduler-3.1.9-bin.tar.gz
Modify the install script and parameter configuration
DolphinScheduler consists mainly of the api-server, master-server, and worker-server services. The configuration file /opt/packages/apache-dolphinscheduler-3.1.9-bin/bin/env/install_env.sh defines which machines DolphinScheduler will be installed on and which services each machine runs.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# A comma separated list of machine hostname or IP would be installed DolphinScheduler,
# including master, worker, api, alert. If you want to deploy in pseudo-distributed
# mode, just write a pseudo-distributed hostname
# Example for hostnames: ips="ds1,ds2,ds3,ds4,ds5", Example for IPs: ips="192.168.8.1,192.168.8.2,192.168.8.3,192.168.8.4,192.168.8.5"
#ips=${ips:-"ds1,ds2,ds3,ds4,ds5"}
# Hosts on which DolphinScheduler will be installed, comma separated
ips="hadoop01,hadoop02,hadoop03"
# Port of SSH protocol, default value is 22. For now we only support same port in all `ips` machine
# modify it if you use different ssh port
sshPort=${sshPort:-"22"}
# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
#masters=${masters:-"hadoop01"}
# Which cluster nodes are designated master nodes, comma separated
masters="hadoop01,hadoop02"
# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a
# subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
#workers=${workers:-"ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"}
# Which cluster nodes are designated worker nodes, comma separated; append ":default" to place a node in the default worker group
workers="hadoop02:default,hadoop03:default"
# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
#alertServer=${alertServer:-"ds3"}
# Which node is designated the alert server
alertServer="hadoop03"
# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
#apiServers=${apiServers:-"ds1"}
# Which node hosts the api-server service
apiServers="hadoop01"
# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you using related path.
#installPath=${installPath:-"/tmp/dolphinscheduler"}
#installPath="/opt/software/dolphinscheduler"
# Default installation path of dolphinscheduler on every cluster node: /opt/software/dolphinscheduler
installPath="/opt/software/dolphinscheduler"
# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled than the root directory needs
# to be created by this user
# The deployment user for the dolphinscheduler cluster
deployUser=${deployUser:-"dolphinscheduler"}
# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
#zkRoot=${zkRoot:-"/dolphinscheduler"}
# Registration root path of the dolphinscheduler cluster in zookeeper
zkRoot=${zkRoot:-"/dolphinscheduler"}
The configuration file /opt/packages/apache-dolphinscheduler-3.1.9-bin/bin/env/dolphinscheduler_env.sh holds DolphinScheduler's database connection settings and the external dependency paths or libraries for the task types it supports; JAVA_HOME, DATAX_HOME, and SPARK_HOME are all defined here.
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# JAVA_HOME, will use it to start DolphinScheduler server
#export JAVA_HOME=${JAVA_HOME:-/opt/java/openjdk}
# Set the JAVA_HOME variable
export JAVA_HOME=${JAVA_HOME:-/usr/java/jdk1.8.0_181-cloudera}
# Database related configuration, set database type, username and password
#export SPRING_DATASOURCE_URL
# DolphinScheduler database connection settings
export SPRING_DATASOURCE_URL="jdbc:mysql://localhost:3306/dolphinscheduler?useTimezone=true&useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai"
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-GMT+8}
export SPRING_DATASOURCE_USERNAME=dolphinscheduler
export SPRING_DATASOURCE_PASSWORD=dolphinscheduler
# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}
# Registry center configuration, determines the type and link of the registry center
# Use Zookeeper as DolphinScheduler's registry
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
#export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2191}
# Connection string of the zookeeper registry cluster
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-hadoop01:2191,hadoop02:2191,hadoop03:2191}
# Tasks related configurations, need to change the configuration if you use the related tasks.
# Per-task-type environment variables: set the server-side installation path of every service your task types may need; best configured before installing the cluster
#export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
#export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
#export HADOOP_CONF_DIR=etc/hadoop/conf
#export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
#export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
#export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
#export PYTHON_HOME=/opt/soft/python
#export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
#export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
#export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
#export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/soft/seatunnel}
#export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}
#export SQOOP_HOME=${SQOOP_HOME:-/opt/soft/sqoop}
export PATH=$HADOOP_HOME/bin:$SQOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH
Disable the Python gateway (enabled by default)
The Python gateway service starts together with the api-server by default. To disable it, set python-gateway.enabled: false in the api-server configuration file /opt/packages/apache-dolphinscheduler-3.1.9-bin/api-server/conf/application.yaml.
vim ./api-server/conf/application.yaml
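In the nested YAML form, the relevant fragment of application.yaml looks roughly like this after the change (surrounding keys omitted):

```yaml
# api-server/conf/application.yaml (fragment)
python-gateway:
  enabled: false
```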
Run the database initialization script
# Change to the directory holding the SQL scripts
cd /opt/packages/apache-dolphinscheduler-3.1.9-bin/tools/sql/sql
# Restore the schema from the SQL file
mysql -udolphinscheduler -p dolphinscheduler < dolphinscheduler_mysql.sql
Configure data source driver files
The MySQL driver must be JDBC Driver 8.0.16 or later. Download mysql-connector-java manually and copy it into the libs directory of each DolphinScheduler module; there are five such directories:
/opt/packages/apache-dolphinscheduler-3.1.9-bin/api-server/libs
/opt/packages/apache-dolphinscheduler-3.1.9-bin/alert-server/libs
/opt/packages/apache-dolphinscheduler-3.1.9-bin/master-server/libs
/opt/packages/apache-dolphinscheduler-3.1.9-bin/worker-server/libs
/opt/packages/apache-dolphinscheduler-3.1.9-bin/tools/libs
Copy the mysql driver into each module's dependency path:
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/api-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/alert-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/master-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/worker-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/tools/libs/
Besides MySQL you may also need drivers for SQLServer, Oracle, Hive, and other data sources; they are integrated the same way as MySQL. Ideally add all required dependencies to the corresponding modules' libs directories before installing the cluster so no post-install work is needed, though adding data source dependencies afterwards also works.
If you need any of the database dependencies above, message me your email and I will send them when I see it.
Run the cluster installation
First, change the owner and group of /opt/packages/apache-dolphinscheduler-3.1.9-bin to dolphinscheduler again:
chown -R dolphinscheduler:dolphinscheduler /opt/packages/apache-dolphinscheduler-3.1.9-bin
Switch to the dolphinscheduler user:
su - dolphinscheduler
Change to the extracted root directory:
cd /opt/packages/apache-dolphinscheduler-3.1.9-bin
Run the cluster installation script install.sh:
./bin/install.sh
After the installation script finishes, it automatically probes the state of every cluster node.
Cluster start/stop test
After installation, the default install directory of the DolphinScheduler services on every node is /opt/software/dolphinscheduler.
Before starting, make sure the zookeeper service is running; otherwise the cluster will not start successfully.
On the hadoop01 node, switch to the dolphinscheduler system user:
su - dolphinscheduler
Change to the dolphinscheduler installation directory:
cd /opt/software/dolphinscheduler
Common cluster operation commands:
# Start the whole cluster
./bin/start-all.sh
# Stop the whole cluster
./bin/stop-all.sh
# Query the status of the whole cluster
./bin/status-all.sh
Web UI: http://<hadoop01 IP>:12345/dolphinscheduler/ui
Username: admin
Password: dolphinscheduler123
With that, the DolphinScheduler distributed cluster setup is complete.
Published with support from WhaleOps (白鯨開源).