Introduction
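As big data has grown, task scheduling systems have become a critical part of data processing and management. Apache DolphinScheduler is an open-source distributed workflow scheduling platform that is widely used in big data scenarios.
In this article, we take a close look at the source code of Apache DolphinScheduler 1.3.9, focusing on how the Master and the Worker interact.
If you are interested, you can also read our previous article: Apache DolphinScheduler-1.3.9 Source Code Analysis (Part 1)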
Worker configuration file
# worker listener port
worker.listen.port=1234
# worker execute thread number, which limits how many task instances can run in parallel
worker.exec.threads=100
# worker heartbeat interval, in seconds
worker.heartbeat.interval=10
# worker max cpu load average: the worker executes tasks only while the system cpu load average is below this value
# default value -1: max cpu load average = number of cpu cores * 2
worker.max.cpuload.avg=-1
# worker reserved memory, in GB: the worker executes tasks only while the system available memory is at least this value
# default value 0.3 (GB)
worker.reserved.memory=0.3
# worker groups, separated by commas, e.g. 'worker.groups=default,test'
worker.groups=default
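The following minimal Java sketch makes the two resource thresholds concrete by modeling the rules that the comments above describe. The class `WorkerResourceCheck` and its methods are hypothetical names used only for illustration. The sketch covers just the arithmetic: -1 resolves to cores * 2, the load comparison is strict, and free memory must be at least the reserved amount. It does not model how DolphinScheduler actually reads system metrics.

```java
// Hypothetical helper that models the two thresholds described in the config comments.
final class WorkerResourceCheck {
    private final double maxCpuLoadAvg;    // resolved worker.max.cpuload.avg
    private final double reservedMemoryGb; // worker.reserved.memory, in GB

    WorkerResourceCheck(double configuredMaxCpuLoadAvg, double reservedMemoryGb, int cpuCores) {
        // -1 means "number of cpu cores * 2", as the config comment states
        this.maxCpuLoadAvg = configuredMaxCpuLoadAvg == -1 ? cpuCores * 2 : configuredMaxCpuLoadAvg;
        this.reservedMemoryGb = reservedMemoryGb;
    }

    double maxCpuLoadAvg() {
        return maxCpuLoadAvg;
    }

    /** Tasks may run only if the load is below the limit and enough memory is free. */
    boolean canAcceptTasks(double loadAverage, double availableMemoryGb) {
        return loadAverage < maxCpuLoadAvg && availableMemoryGb >= reservedMemoryGb;
    }
}
```

For example, on an 8-core machine with the default settings, this model stops accepting tasks once the load average reaches 16 or free memory drops below 0.3 GB.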
WorkerServer startup
public void run() {
// init remoting server
NettyServerConfig serverConfig = new NettyServerConfig();
serverConfig.setListenPort(workerConfig.getListenPort());
this.nettyRemotingServer = new NettyRemotingServer(serverConfig);
this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_REQUEST, new TaskExecuteProcessor());
this.nettyRemotingServer.registerProcessor(CommandType.TASK_KILL_REQUEST, new TaskKillProcessor());
this.nettyRemotingServer.registerProcessor(CommandType.DB_TASK_ACK, new DBTaskAckProcessor());
this.nettyRemotingServer.registerProcessor(CommandType.DB_TASK_RESPONSE, new DBTaskResponseProcessor());
this.nettyRemotingServer.start();
// worker registry
try {
this.workerRegistry.registry();
this.workerRegistry.getZookeeperRegistryCenter().setStoppable(this);
Set<String> workerZkPaths = this.workerRegistry.getWorkerZkPaths();
this.workerRegistry.getZookeeperRegistryCenter().getRegisterOperator().handleDeadServer(workerZkPaths, ZKNodeType.WORKER, Constants.DELETE_ZK_OP);
} catch (Exception e) {
logger.error(e.getMessage(), e);
throw new RuntimeException(e);
}
// retry report task status
this.retryReportTaskStatusThread.start();
/**
* register hooks, which are called before the process exits
*/
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
if (Stopper.isRunning()) {
close("shutdownHook");
}
}));
}
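run() does the following:
- Registers four command processors:
  - TASK_EXECUTE_REQUEST: task execution request
  - TASK_KILL_REQUEST: task kill request
  - DB_TASK_ACK: the Master's confirmation that it has persisted the task's ACK; on success, the Worker removes the task from its ACK cache
  - DB_TASK_RESPONSE: the Master's confirmation that it has persisted the task's execution result; on success, the Worker removes the task from its response cache
- Registers the WorkerServer in ZooKeeper and sends heartbeats
- Starts the thread that retries reporting task status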
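Conceptually, registerProcessor builds a lookup table from command type to handler, and each incoming command is routed by its type. The sketch below uses simplified stand-ins: a `CommandType` enum with only the four Worker-side types, a `CommandProcessor` interface and a `ProcessorRegistry` class, all hypothetical. It is not DolphinScheduler's NettyRemotingServer API.

```java
import java.util.EnumMap;
import java.util.Map;

// Simplified stand-ins, not DolphinScheduler's real types.
enum CommandType { TASK_EXECUTE_REQUEST, TASK_KILL_REQUEST, DB_TASK_ACK, DB_TASK_RESPONSE }

interface CommandProcessor {
    void process(String body);
}

final class ProcessorRegistry {
    // One handler per command type
    private final Map<CommandType, CommandProcessor> processors = new EnumMap<>(CommandType.class);

    void registerProcessor(CommandType type, CommandProcessor processor) {
        processors.put(type, processor);
    }

    /** Routes an incoming command to the processor registered for its type. */
    void dispatch(CommandType type, String body) {
        CommandProcessor processor = processors.get(type);
        if (processor == null) {
            throw new IllegalStateException("no processor registered for " + type);
        }
        processor.process(body);
    }
}
```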
RetryReportTaskStatusThread
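This is a fallback mechanism. It periodically re-reports task status to the Master until the Master acknowledges it, so that status updates are not lost. Every 5 minutes it checks whether the ACK cache and the response cache in ResponceCache are empty. If they are not, it resends the cached ACK commands and response commands to the Master.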
public void run() {
ResponceCache responceCache = ResponceCache.get();
while (Stopper.isRunning()){
// sleep 5 minutes
ThreadUtils.sleep(RETRY_REPORT_TASK_STATUS_INTERVAL);
try {
if (!responceCache.getAckCache().isEmpty()){
Map<Integer,Command> ackCache = responceCache.getAckCache();
for (Map.Entry<Integer, Command> entry : ackCache.entrySet()){
Integer taskInstanceId = entry.getKey();
Command ackCommand = entry.getValue();
taskCallbackService.sendAck(taskInstanceId,ackCommand);
}
}
if (!responceCache.getResponseCache().isEmpty()){
Map<Integer,Command> responseCache = responceCache.getResponseCache();
for (Map.Entry<Integer, Command> entry : responseCache.entrySet()){
Integer taskInstanceId = entry.getKey();
Command responseCommand = entry.getValue();
taskCallbackService.sendResult(taskInstanceId,responseCommand);
}
}
}catch (Exception e){
logger.warn("retry report task status error", e);
}
}
}
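The retry idea fits in a small sketch. Every ACK or result stays cached until the Master confirms it, and each periodic pass resends whatever is still cached. `RetryingReporter` and its methods are hypothetical, and the `BiConsumer` stands in for taskCallbackService. In the real code, the confirmation arrives through DBTaskAckProcessor and DBTaskResponseProcessor.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical, simplified model of the ACK/result retry cache.
final class RetryingReporter {
    private final Map<Integer, String> ackCache = new ConcurrentHashMap<>();
    private final Map<Integer, String> resultCache = new ConcurrentHashMap<>();
    private final BiConsumer<Integer, String> sender; // stands in for taskCallbackService

    RetryingReporter(BiConsumer<Integer, String> sender) {
        this.sender = sender;
    }

    void cacheAck(int taskInstanceId, String cmd) { ackCache.put(taskInstanceId, cmd); }
    void cacheResult(int taskInstanceId, String cmd) { resultCache.put(taskInstanceId, cmd); }

    // Called when the Master confirms with a successful DB_TASK_ACK / DB_TASK_RESPONSE
    void onAckConfirmed(int taskInstanceId) { ackCache.remove(taskInstanceId); }
    void onResultConfirmed(int taskInstanceId) { resultCache.remove(taskInstanceId); }

    /** One retry pass; the thread above runs such a pass every 5 minutes. */
    void retryOnce() {
        ackCache.forEach(sender);
        resultCache.forEach(sender);
    }
}
```

Entries are removed only on confirmation. If the Master is briefly unreachable, it simply receives the same ACK or result again on a later pass, so the Master side should tolerate duplicate reports.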
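Interaction design between Master and Worker
In Apache DolphinScheduler, the Master and the Worker are two independent JVM processes that can be deployed on different servers. They communicate through Netty-based RPC, using seven processors in total.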
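Module | Processor | Purpose |
---|---|---|
master | masterTaskResponseProcessor | Handles TaskExecuteResponseCommand messages and adds them to the task response queue in TaskResponseService |
master | masterTaskAckProcessor | Handles TaskExecuteAckCommand messages and adds them to the task response queue in TaskResponseService |
master | masterTaskKillResponseProcessor | Handles TaskKillResponseCommand messages and logs their content |
worker | workerTaskExecuteProcessor | Handles TaskExecuteRequestCommand messages, sends a TaskExecuteAckCommand to the master, and submits the task for execution |
worker | workerTaskKillProcessor | Handles TaskKillRequestCommand messages, runs kill -9 on the task's processes, tries to kill related YARN jobs, and sends a TaskKillResponseCommand to the master |
worker | workerDBTaskAckProcessor | Handles DBTaskAckCommand messages; if the status is SUCCESS, removes the task's entry from the ACK cache in ResponceCache |
worker | workerDBTaskResponseProcessor | Handles DBTaskResponseCommand messages; if the status is SUCCESS, removes the task's entry from the response cache in ResponceCache |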
How task dispatch works
master#TaskPriorityQueueConsumer
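The Master runs a TaskPriorityQueueConsumer. It repeatedly takes a batch of tasks from the TaskPriorityQueue and dispatches them to Workers. The batch size defaults to 3 and is set by master.dispatch.task.num. The TaskExecuteRequestCommand is created during this dispatch step.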
TaskPriorityQueueConsumer#run()
@Override
public void run() {
List<TaskPriority> failedDispatchTasks = new ArrayList<>();
while (Stopper.isRunning()){
try {
// number of tasks dispatched per batch, master.dispatch.task.num = 3 by default
int fetchTaskNum = masterConfig.getMasterDispatchTaskNumber();
failedDispatchTasks.clear();
for(int i = 0; i < fetchTaskNum; i++){
if(taskPriorityQueue.size() <= 0){
Thread.sleep(Constants.SLEEP_TIME_MILLIS);
continue;
}
// if there is no task, this call blocks
// take a task from the queue
TaskPriority taskPriority = taskPriorityQueue.take();
// dispatch it to a worker
boolean dispatchResult = dispatch(taskPriority);
if(!dispatchResult){
failedDispatchTasks.add(taskPriority);
}
}
if (!failedDispatchTasks.isEmpty()) {
// tasks that failed to dispatch are put back into the queue to be dispatched again
for (TaskPriority dispatchFailedTask : failedDispatchTasks) {
taskPriorityQueue.put(dispatchFailedTask);
}
// If there are tasks in a cycle that cannot find the worker group,
// sleep for 1 second
if (taskPriorityQueue.size() <= failedDispatchTasks.size()) {
TimeUnit.MILLISECONDS.sleep(Constants.SLEEP_TIME_MILLIS);
}
}
}catch (Exception e){
logger.error("dispatcher task error",e);
}
}
}
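The loop above boils down to three steps: take up to N tasks, try to dispatch each one, and put failures back into the queue. Here is a minimal sketch of one iteration. It assumes that tasks are plain `Integer` priorities in a `PriorityBlockingQueue` (smaller value first) and that a `Predicate` stands in for dispatch(). `BatchDispatcher` is a hypothetical name. Unlike the original, the sketch ends the batch early when the queue is empty instead of sleeping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.function.Predicate;

// Hypothetical sketch of one TaskPriorityQueueConsumer iteration.
final class BatchDispatcher {
    private final PriorityBlockingQueue<Integer> queue; // smaller value = higher priority
    private final Predicate<Integer> dispatcher;        // true if dispatch succeeded

    BatchDispatcher(PriorityBlockingQueue<Integer> queue, Predicate<Integer> dispatcher) {
        this.queue = queue;
        this.dispatcher = dispatcher;
    }

    /** Returns the tasks that failed to dispatch and were put back into the queue. */
    List<Integer> runOnce(int batchSize) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            Integer task = queue.poll(); // non-blocking here; the original sleeps when empty
            if (task == null) {
                break;
            }
            if (!dispatcher.test(task)) {
                failed.add(task);
            }
        }
        // Re-queue failures so a later batch retries them
        queue.addAll(failed);
        return failed;
    }
}
```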
dispatcher
/**
* dispatch task
*
* @param taskPriority taskPriority
* @return result
*/
protected boolean dispatch(TaskPriority taskPriority) {
boolean result = false;
try {
int taskInstanceId = taskPriority.getTaskId();
TaskExecutionContext context = getTaskExecutionContext(taskInstanceId);
// the TaskExecuteRequestCommand is created here
ExecutionContext executionContext = new ExecutionContext(context.toCommand(), ExecutorType.WORKER, context.getWorkerGroup());
if (taskInstanceIsFinalState(taskInstanceId)){
// when task finish, ignore this task, there is no need to dispatch anymore
return true;
}else{
// dispatch the task
// supported dispatch algorithms: lowest-load first, random, and round-robin
result = dispatcher.dispatch(executionContext);
}
} catch (ExecuteException e) {
logger.error("dispatch error: {}",e.getMessage());
}
return result;
}
TaskExecutionContext
// excerpted from org.apache.dolphinscheduler.server.entity.TaskExecutionContext#toCommand
public Command toCommand(){
TaskExecuteRequestCommand requestCommand = new TaskExecuteRequestCommand();
requestCommand.setTaskExecutionContext(FastJsonSerializer.serializeToString(this));
return requestCommand.convert2Command();
}
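toCommand() serializes the whole TaskExecutionContext to a JSON string and stores it in a TaskExecuteRequestCommand. convert2Command() then wraps that request in a generic Command, whose body the Worker later deserializes back into a TaskExecuteRequestCommand. The sketch below models only that envelope, under simplifying assumptions. `SimpleCommand` is hypothetical and the type is a plain string. The id counter imitates the per-request `opaque` value that TaskExecuteProcessor.process() passes to NettyRemoteChannel, which is assumed here to come from an incrementing counter.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified command envelope: a type, a unique request id,
// and a serialized payload carried as bytes.
final class SimpleCommand {
    private static final AtomicLong REQUEST_ID = new AtomicLong(1);

    final long opaque;   // lets a reply be matched to its request
    final String type;
    final byte[] body;

    SimpleCommand(String type, String jsonPayload) {
        this.opaque = REQUEST_ID.getAndIncrement();
        this.type = type;
        this.body = jsonPayload.getBytes(StandardCharsets.UTF_8);
    }

    String bodyAsString() {
        return new String(body, StandardCharsets.UTF_8);
    }
}
```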
Dispatch algorithm implementations
Random algorithm
public class RandomSelector<T> implements Selector<T> {
private final Random random = new Random();
public T select(final Collection<T> source) {
if (source == null || source.size() == 0) {
throw new IllegalArgumentException("Empty source.");
}
if (source.size() == 1) {
return (T) source.toArray()[0];
}
int size = source.size();
int randomIndex = random.nextInt(size);
return (T) source.toArray()[randomIndex];
}
}
Round-robin algorithm
public class RoundRobinSelector<T> implements Selector<T> {
private final AtomicInteger index = new AtomicInteger(0);
public T select(Collection<T> source) {
if (source == null || source.size() == 0) {
throw new IllegalArgumentException("Empty source.");
}
if (source.size() == 1) {
return (T)source.toArray()[0];
}
int size = source.size();
return (T) source.toArray()[index.getAndIncrement() % size];
}
}
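The excerpt has one caveat. `index.getAndIncrement() % size` becomes negative once the AtomicInteger wraps past Integer.MAX_VALUE, which happens after roughly 2^31 selections. Indexing the array with that negative value would throw ArrayIndexOutOfBoundsException. A minimal overflow-safe variant uses Math.floorMod, which always returns a value in [0, size). `SafeRoundRobinSelector` is a hypothetical name, and the `start` constructor parameter exists only to make the wrap-around easy to demonstrate.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin selection that stays valid after the counter overflows.
final class SafeRoundRobinSelector<T> {
    private final AtomicInteger index;

    SafeRoundRobinSelector(int start) {
        this.index = new AtomicInteger(start);
    }

    T select(Collection<T> source) {
        if (source == null || source.isEmpty()) {
            throw new IllegalArgumentException("Empty source.");
        }
        List<T> list = new ArrayList<>(source);
        // floorMod never returns a negative value for a positive divisor
        return list.get(Math.floorMod(index.getAndIncrement(), list.size()));
    }
}
```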
Lowest-load-first algorithm
public class LowerWeightRoundRobin implements Selector<HostWeight>{
public HostWeight select(Collection<HostWeight> sources){
int totalWeight = 0;
int lowWeight = 0;
HostWeight lowerNode = null;
for (HostWeight hostWeight : sources) {
totalWeight += hostWeight.getWeight();
hostWeight.setCurrentWeight(hostWeight.getCurrentWeight() + hostWeight.getWeight());
if (lowerNode == null || lowWeight > hostWeight.getCurrentWeight() ) {
lowerNode = hostWeight;
lowWeight = hostWeight.getCurrentWeight();
}
}
lowerNode.setCurrentWeight(lowerNode.getCurrentWeight() + totalWeight);
return lowerNode;
}
}
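The sketch below shows how LowerWeightRoundRobin behaves by running the same steps on two hosts and recording the picks. It assumes, as the heading suggests, that a larger weight means a more heavily loaded host. `Node` is a simplified stand-in for HostWeight, and `LowerWeightDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for HostWeight: a fixed weight plus a running total.
final class Node {
    final String name;
    final int weight;
    int currentWeight;

    Node(String name, int weight) {
        this.name = name;
        this.weight = weight;
    }
}

final class LowerWeightDemo {
    // Same steps as LowerWeightRoundRobin.select in the excerpt
    static Node select(List<Node> nodes) {
        int totalWeight = 0;
        int lowWeight = 0;
        Node lowerNode = null;
        for (Node node : nodes) {
            totalWeight += node.weight;
            node.currentWeight += node.weight;
            if (lowerNode == null || lowWeight > node.currentWeight) {
                lowerNode = node;
                lowWeight = node.currentWeight;
            }
        }
        lowerNode.currentWeight += totalWeight;
        return lowerNode;
    }

    // Runs several selections and records which host was picked each time
    static List<String> trace(List<Node> nodes, int rounds) {
        List<String> picks = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            picks.add(select(nodes).name);
        }
        return picks;
    }
}
```

With weights 1 (A) and 2 (B), the first six picks are A, B, A, A, B, A, so the lighter host A is chosen twice as often in this run. The scheme mirrors smooth weighted round-robin in reverse. Smooth WRR picks the highest running weight and subtracts the total; this one picks the lowest running weight and adds the total, so lower weights win more often.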
TaskExecuteRequestCommand
TaskExecuteProcessor
Constructor
public TaskExecuteProcessor() {
this.taskCallbackService = SpringApplicationContext.getBean(TaskCallbackService.class);
this.workerConfig = SpringApplicationContext.getBean(WorkerConfig.class);
// worker.exec.threads, default 100
this.workerExecService = ThreadUtils.newDaemonFixedThreadExecutor("Worker-Execute-Thread", workerConfig.getWorkerExecThreads());
this.taskExecutionContextCacheManager = SpringApplicationContext.getBean(TaskExecutionContextCacheManagerImpl.class);
}
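ThreadUtils.newDaemonFixedThreadExecutor caps the number of concurrently running tasks at worker.exec.threads. The sketch below shows a plausible equivalent built from plain JDK classes. `DaemonPools` is hypothetical and may differ from DolphinScheduler's own ThreadUtils. In this sketch, Executors.newFixedThreadPool uses an unbounded queue, so tasks beyond the thread count wait in the queue rather than being rejected.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical fixed-size pool of named daemon threads.
final class DaemonPools {
    static ThreadFactory daemonThreadFactory(String namePrefix) {
        AtomicInteger counter = new AtomicInteger(1);
        return runnable -> {
            Thread t = new Thread(runnable, namePrefix + "-" + counter.getAndIncrement());
            t.setDaemon(true); // daemon threads do not keep the JVM alive on shutdown
            return t;
        };
    }

    static ExecutorService newDaemonFixedThreadExecutor(String namePrefix, int threads) {
        return Executors.newFixedThreadPool(threads, daemonThreadFactory(namePrefix));
    }
}
```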
process() method
public void process(Channel channel, Command command) {
Preconditions.checkArgument(CommandType.TASK_EXECUTE_REQUEST == command.getType(),
String.format("invalid command type : %s", command.getType()));
// deserialize the TaskExecuteRequestCommand
TaskExecuteRequestCommand taskRequestCommand = FastJsonSerializer.deserialize(
command.getBody(), TaskExecuteRequestCommand.class);
logger.info("received command : {}", taskRequestCommand);
if (taskRequestCommand == null) {
logger.error("task execute request command is null");
return;
}
String contextJson = taskRequestCommand.getTaskExecutionContext();
TaskExecutionContext taskExecutionContext = JSONObject.parseObject(contextJson, TaskExecutionContext.class);
if (taskExecutionContext == null) {
logger.error("task execution context is null");
return;
}
// store the context in taskExecutionContextCacheManager
setTaskCache(taskExecutionContext);
// create the task logger
Logger taskLogger = LoggerFactory.getLogger(LoggerUtils.buildTaskId(LoggerUtils.TASK_LOGGER_INFO_PREFIX,
taskExecutionContext.getProcessDefineId(),
taskExecutionContext.getProcessInstanceId(),
taskExecutionContext.getTaskInstanceId()));
taskExecutionContext.setHost(NetUtils.getAddr(workerConfig.getListenPort()));
taskExecutionContext.setStartTime(new Date());
taskExecutionContext.setLogPath(getTaskLogPath(taskExecutionContext));
// local execute path
String execLocalPath = getExecLocalPath(taskExecutionContext);
logger.info("task instance local execute path : {}", execLocalPath);
taskExecutionContext.setExecutePath(execLocalPath);
// keep the task logger in a ThreadLocal
FileUtils.taskLoggerThreadLocal.set(taskLogger);
try {
// create the execution directory and the tenant user if they do not exist
FileUtils.createWorkDirAndUserIfAbsent(execLocalPath, taskExecutionContext.getTenantCode());
} catch (Throwable ex) {
String errorLog = String.format("create execLocalPath : %s", execLocalPath);
LoggerUtils.logError(Optional.ofNullable(logger), errorLog, ex);
LoggerUtils.logError(Optional.ofNullable(taskLogger), errorLog, ex);
taskExecutionContextCacheManager.removeByTaskInstanceId(taskExecutionContext.getTaskInstanceId());
}
FileUtils.taskLoggerThreadLocal.remove();
taskCallbackService.addRemoteChannel(taskExecutionContext.getTaskInstanceId(),
new NettyRemoteChannel(channel, command.getOpaque()));
// send a TaskExecuteAckCommand to the master
this.doAck(taskExecutionContext);
// submit task
workerExecService.submit(new TaskExecuteThread(taskExecutionContext, taskCallbackService, taskLogger));
}
private void doAck(TaskExecutionContext taskExecutionContext){
// tell master that task is in executing
TaskExecuteAckCommand ackCommand = buildAckCommand(taskExecutionContext);
ResponceCache.get().cache(taskExecutionContext.getTaskInstanceId(),ackCommand.convert2Command(),Event.ACK);
taskCallbackService.sendAck(taskExecutionContext.getTaskInstanceId(), ackCommand.convert2Command());
}
TaskExecuteThread
Constructor
public TaskExecuteThread(TaskExecutionContext taskExecutionContext
, TaskCallbackService taskCallbackService
, Logger taskLogger) {
this.taskExecutionContext = taskExecutionContext;
this.taskCallbackService = taskCallbackService;
this.taskExecutionContextCacheManager = SpringApplicationContext.getBean(TaskExecutionContextCacheManagerImpl.class);
this.taskLogger = taskLogger;
}
run() method
public void run() {
TaskExecuteResponseCommand responseCommand = new TaskExecuteResponseCommand(taskExecutionContext.getTaskInstanceId());
try {
logger.info("script path : {}", taskExecutionContext.getExecutePath());
// task node
TaskNode taskNode = JSONObject.parseObject(taskExecutionContext.getTaskJson(), TaskNode.class);
// copy hdfs/minio file to local
// download the required resources, e.g. Spark/Flink jars and UDFs
downloadResource(taskExecutionContext.getExecutePath(),
taskExecutionContext.getResources(),
logger);
taskExecutionContext.setTaskParams(taskNode.getParams());
taskExecutionContext.setEnvFile(CommonUtils.getSystemEnvPath());
taskExecutionContext.setDefinedParams(getGlobalParamsMap());
// set task timeout
setTaskTimeout(taskExecutionContext, taskNode);
taskExecutionContext.setTaskAppId(String.format("%s_%s_%s",
taskExecutionContext.getProcessDefineId(),
taskExecutionContext.getProcessInstanceId(),
taskExecutionContext.getTaskInstanceId()));
// create the task
task = TaskManager.newTask(taskExecutionContext, taskLogger);
// initialize the task
task.init();
// build the parameters the task needs
preBuildBusinessParams();
// execute the task
task.handle();
// post-execution actions
task.after();
responseCommand.setStatus(task.getExitStatus().getCode());
responseCommand.setEndTime(new Date());
responseCommand.setProcessId(task.getProcessId());
responseCommand.setAppIds(task.getAppIds());
logger.info("task instance id : {},task final status : {}", taskExecutionContext.getTaskInstanceId(), task.getExitStatus());
} catch (Exception e) {
logger.error("task scheduler failure", e);
// on exception, kill the task
kill();
responseCommand.setStatus(ExecutionStatus.FAILURE.getCode());
responseCommand.setEndTime(new Date());
responseCommand.setProcessId(task.getProcessId());
responseCommand.setAppIds(task.getAppIds());
} finally {
// remove the task execution context from the cache
taskExecutionContextCacheManager.removeByTaskInstanceId(taskExecutionContext.getTaskInstanceId());
// cache the responseCommand
ResponceCache.get().cache(taskExecutionContext.getTaskInstanceId(), responseCommand.convert2Command(), Event.RESULT);
// send the ResponseCommand to the master
taskCallbackService.sendResult(taskExecutionContext.getTaskInstanceId(), responseCommand.convert2Command());
// clean up the task execution path
clearTaskExecPath();
}
}
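The key property of run() is that the finally block always caches and sends a response, whether the task succeeded or threw. The minimal sketch below shows that control flow with a simplified lifecycle (`init` and `handle` only). `SimpleTask`, `Status` and `TaskRunner` are hypothetical types, and the `reported` list stands in for ResponceCache plus sendResult.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified types; not DolphinScheduler's ExecutionStatus or AbstractTask.
enum Status { SUCCESS, FAILURE }

interface SimpleTask {
    void init() throws Exception;
    Status handle() throws Exception; // returns the task's exit status
}

final class TaskRunner {
    final List<String> reported = new ArrayList<>(); // stands in for cache + sendResult

    Status run(int taskInstanceId, SimpleTask task) {
        Status status = Status.FAILURE; // pessimistic default
        try {
            task.init();
            status = task.handle();
        } catch (Exception e) {
            // the original calls kill() here, then marks the task as FAILURE
            status = Status.FAILURE;
        } finally {
            // always runs: cache the response and report it to the Master
            reported.add(taskInstanceId + ":" + status);
        }
        return status;
    }
}
```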
master#TaskResponseService
When a Worker executes a dispatched task normally, it sends an ACK Command and a Response Command to the Master. On the Master side, TaskAckProcessor and TaskResponseProcessor handle these commands.
TaskAckProcessor
public void process(Channel channel, Command command) {
Preconditions.checkArgument(CommandType.TASK_EXECUTE_ACK == command.getType(), String.format("invalid command type : %s", command.getType()));
TaskExecuteAckCommand taskAckCommand = FastJsonSerializer.deserialize(command.getBody(), TaskExecuteAckCommand.class);
logger.info("taskAckCommand : {}", taskAckCommand);
// add to the cache
taskInstanceCacheManager.cacheTaskInstance(taskAckCommand);
String workerAddress = ChannelUtils.toAddress(channel).getAddress();
ExecutionStatus ackStatus = ExecutionStatus.of(taskAckCommand.getStatus());
// TaskResponseEvent
TaskResponseEvent taskResponseEvent = TaskResponseEvent.newAck(ackStatus,
taskAckCommand.getStartTime(),
workerAddress,
taskAckCommand.getExecutePath(),
taskAckCommand.getLogPath(),
taskAckCommand.getTaskInstanceId(),
channel);
// main processing logic
taskResponseService.addResponse(taskResponseEvent);
}
TaskResponseProcessor
public void process(Channel channel, Command command) {
Preconditions.checkArgument(CommandType.TASK_EXECUTE_RESPONSE == command.getType(), String.format("invalid command type : %s", command.getType()));
TaskExecuteResponseCommand responseCommand = FastJsonSerializer.deserialize(command.getBody(), TaskExecuteResponseCommand.class);
logger.info("received command : {}", responseCommand);
// add to the cache
taskInstanceCacheManager.cacheTaskInstance(responseCommand);
// TaskResponseEvent
TaskResponseEvent taskResponseEvent = TaskResponseEvent.newResult(ExecutionStatus.of(responseCommand.getStatus()),
responseCommand.getEndTime(),
responseCommand.getProcessId(),
responseCommand.getAppIds(),
responseCommand.getTaskInstanceId(),
channel);
// main processing logic
taskResponseService.addResponse(taskResponseEvent);
}
TaskResponseService
TaskResponseProcessor and TaskAckProcessor both hand their events to the TaskResponseService class, where the main logic lives. TaskResponseService processes these events in its TaskResponseWorker thread.
// the TaskResponseEvent queue is a blocking queue
private final BlockingQueue<TaskResponseEvent> eventQueue = new LinkedBlockingQueue<>(5000);
class TaskResponseWorker extends Thread {
@Override
public void run() {
while (Stopper.isRunning()){
try {
// blocks here while there are no task events
TaskResponseEvent taskResponseEvent = eventQueue.take();
// persist the task instance state to the database
persist(taskResponseEvent);
} catch (InterruptedException e){
break;
} catch (Exception e){
logger.error("persist task error",e);
}
}
logger.info("TaskResponseWorker stopped");
}
}
/**
* persist taskResponseEvent
* @param taskResponseEvent taskResponseEvent
*/
private void persist(TaskResponseEvent taskResponseEvent){
Event event = taskResponseEvent.getEvent();
Channel channel = taskResponseEvent.getChannel();
switch (event){
case ACK:
try {
TaskInstance taskInstance = processService.findTaskInstanceById(taskResponseEvent.getTaskInstanceId());
if (taskInstance != null) {
ExecutionStatus status = taskInstance.getState().typeIsFinished() ? taskInstance.getState() : taskResponseEvent.getState();
processService.changeTaskState(status,
taskResponseEvent.getStartTime(),
taskResponseEvent.getWorkerAddress(),
taskResponseEvent.getExecutePath(),
taskResponseEvent.getLogPath(),
taskResponseEvent.getTaskInstanceId());
}
// send a DB_TASK_ACK request to the worker
DBTaskAckCommand taskAckCommand = new DBTaskAckCommand(ExecutionStatus.SUCCESS.getCode(), taskResponseEvent.getTaskInstanceId());
channel.writeAndFlush(taskAckCommand.convert2Command());
}catch (Exception e){
logger.error("worker ack master error",e);
DBTaskAckCommand taskAckCommand = new DBTaskAckCommand(ExecutionStatus.FAILURE.getCode(),-1);
channel.writeAndFlush(taskAckCommand.convert2Command());
}
break;
case RESULT:
try {
TaskInstance taskInstance = processService.findTaskInstanceById(taskResponseEvent.getTaskInstanceId());
if (taskInstance != null){
processService.changeTaskState(taskResponseEvent.getState(),
taskResponseEvent.getEndTime(),
taskResponseEvent.getProcessId(),
taskResponseEvent.getAppIds(),
taskResponseEvent.getTaskInstanceId());
}
// send a DB_TASK_RESPONSE request to the worker
DBTaskResponseCommand taskResponseCommand = new DBTaskResponseCommand(ExecutionStatus.SUCCESS.getCode(),taskResponseEvent.getTaskInstanceId());
channel.writeAndFlush(taskResponseCommand.convert2Command());
}catch (Exception e){
logger.error("worker response master error",e);
DBTaskResponseCommand taskResponseCommand = new DBTaskResponseCommand(ExecutionStatus.FAILURE.getCode(),-1);
channel.writeAndFlush(taskResponseCommand.convert2Command());
}
break;
default:
throw new IllegalArgumentException("invalid event type : " + event);
}
}
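TaskResponseService follows a classic producer-consumer design. Netty processors call addResponse(), and a single TaskResponseWorker thread takes events from the bounded queue and persists them one at a time, in arrival order. The sketch below keeps that shape, using hypothetical names and string events. One deliberate difference: it shuts down with a sentinel event rather than Stopper plus interrupt, so events already in the queue are drained before the thread exits.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical single-consumer event service with a bounded queue.
final class ResponseService {
    private static final String STOP = "__STOP__"; // sentinel, illustrative only
    private final BlockingQueue<String> eventQueue = new LinkedBlockingQueue<>(5000);
    private final List<String> persisted = new CopyOnWriteArrayList<>();
    private final Thread worker = new Thread(this::drain, "TaskResponseWorker");

    void start() {
        worker.start();
    }

    boolean addResponse(String event) {
        try {
            eventQueue.put(event); // blocks while 5000 events are already waiting
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Enqueues the sentinel and waits until every earlier event is persisted. */
    void stopAndDrain() {
        addResponse(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> persisted() {
        return persisted;
    }

    private void drain() {
        while (true) {
            try {
                String event = eventQueue.take(); // blocks until an event arrives
                if (STOP.equals(event)) {
                    return;
                }
                persisted.add(event); // stands in for persist(): DB update + DB_TASK_* reply
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}
```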
Worker#DBTaskAckProcessor and DBTaskResponseProcessor
When the Worker receives the Master's DB_TASK_ACK and DB_TASK_RESPONSE commands, DBTaskAckProcessor and DBTaskResponseProcessor handle them respectively. If the status is SUCCESS, each removes the corresponding task instance's cached command from ResponceCache.
DBTaskAckProcessor
public void process(Channel channel, Command command) {
Preconditions.checkArgument(CommandType.DB_TASK_ACK == command.getType(),
String.format("invalid command type : %s", command.getType()));
DBTaskAckCommand taskAckCommand = FastJsonSerializer.deserialize(
command.getBody(), DBTaskAckCommand.class);
if (taskAckCommand == null){
return;
}
if (taskAckCommand.getStatus() == ExecutionStatus.SUCCESS.getCode()){
ResponceCache.get().removeAckCache(taskAckCommand.getTaskInstanceId());
}
}
DBTaskResponseProcessor
public void process(Channel channel, Command command) {
Preconditions.checkArgument(CommandType.DB_TASK_RESPONSE == command.getType(),
String.format("invalid command type : %s", command.getType()));
DBTaskResponseCommand taskResponseCommand = FastJsonSerializer.deserialize(
command.getBody(), DBTaskResponseCommand.class);
if (taskResponseCommand == null){
return;
}
if (taskResponseCommand.getStatus() == ExecutionStatus.SUCCESS.getCode()){
ResponceCache.get().removeResponseCache(taskResponseCommand.getTaskInstanceId());
}
}
How task kill works
MasterTaskExecThread#waitTaskQuit
public Boolean waitTaskQuit(){
// query new state
taskInstance = processService.findTaskInstanceById(taskInstance.getId());
while (Stopper.isRunning()){
try {
// code omitted...
// ask the worker to kill the task instance:
// if the master receives a cancel request, or the workflow is in the READY_STOP state,
// the master sends a kill request command to the worker
if(this.cancel || this.processInstance.getState() == ExecutionStatus.READY_STOP){
cancelTaskInstance();
}
// code omitted...
} catch (Exception e) {
// code omitted...
}
}
return true;
}
private void cancelTaskInstance() throws Exception{
if(alreadyKilled){
return;
}
alreadyKilled = true;
taskInstance = processService.findTaskInstanceById(taskInstance.getId());
if(StringUtils.isBlank(taskInstance.getHost())){
taskInstance.setState(ExecutionStatus.KILL);
taskInstance.setEndTime(new Date());
processService.updateTaskInstance(taskInstance);
return;
}
// build the TaskKillRequestCommand
TaskKillRequestCommand killCommand = new TaskKillRequestCommand();
killCommand.setTaskInstanceId(taskInstance.getId());
ExecutionContext executionContext = new ExecutionContext(killCommand.convert2Command(), ExecutorType.WORKER);
Host host = Host.of(taskInstance.getHost());
executionContext.setHost(host);
nettyExecutorManager.executeDirectly(executionContext);
logger.info("master kill taskInstance name :{} taskInstance id:{}",
taskInstance.getName(), taskInstance.getId() );
}
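The waitTaskQuit loop calls cancelTaskInstance() repeatedly, so the alreadyKilled flag ensures it runs only once. A task that was never assigned to a Worker host is simply marked KILL in the database. The decision logic can be sketched as follows. `TaskCanceller` and its `Action` values are hypothetical and model only the branches, not the database update or the Netty call.

```java
// Hypothetical model of cancelTaskInstance()'s branches.
final class TaskCanceller {
    enum Action { NONE, MARK_KILLED_LOCALLY, SEND_KILL_REQUEST }

    private boolean alreadyKilled;

    Action cancel(String workerHost) {
        if (alreadyKilled) {
            return Action.NONE; // kill already handled once
        }
        alreadyKilled = true;
        if (workerHost == null || workerHost.trim().isEmpty()) {
            return Action.MARK_KILLED_LOCALLY; // never reached a Worker: set state KILL in the DB
        }
        return Action.SEND_KILL_REQUEST; // send a TaskKillRequestCommand to that host
    }
}
```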
Worker#TaskKillProcessor
TaskKillProcessor handles the kill request command sent by the Master.
public void process(Channel channel, Command command) {
Preconditions.checkArgument(CommandType.TASK_KILL_REQUEST == command.getType(), String.format("invalid command type : %s", command.getType()));
TaskKillRequestCommand killCommand = FastJsonSerializer.deserialize(command.getBody(), TaskKillRequestCommand.class);
logger.info("received kill command : {}", killCommand);
Pair<Boolean, List<String>> result = doKill(killCommand);
taskCallbackService.addRemoteChannel(killCommand.getTaskInstanceId(),
new NettyRemoteChannel(channel, command.getOpaque()));
// send the kill response command to the master
TaskKillResponseCommand taskKillResponseCommand = buildKillTaskResponseCommand(killCommand,result);
taskCallbackService.sendResult(taskKillResponseCommand.getTaskInstanceId(), taskKillResponseCommand.convert2Command());
taskExecutionContextCacheManager.removeByTaskInstanceId(taskKillResponseCommand.getTaskInstanceId());
}
private Pair<Boolean, List<String>> doKill(TaskKillRequestCommand killCommand){
boolean processFlag = true;
List<String> appIds = Collections.emptyList();
int taskInstanceId = killCommand.getTaskInstanceId();
TaskExecutionContext taskExecutionContext = taskExecutionContextCacheManager.getByTaskInstanceId(taskInstanceId);
try {
Integer processId = taskExecutionContext.getProcessId();
if (processId.equals(0)) {
taskExecutionContextCacheManager.removeByTaskInstanceId(taskInstanceId);
logger.info("the task has not been executed and has been cancelled, task id:{}", taskInstanceId);
return Pair.of(true, appIds);
}
// run kill -9 to kill the processes directly
// Spark or Flink jobs submitted to a cluster cannot be killed this way for now
String pidsStr = ProcessUtils.getPidsStr(taskExecutionContext.getProcessId());
if (StringUtils.isNotEmpty(pidsStr)) {
String cmd = String.format("sudo kill -9 %s", ProcessUtils.getPidsStr(taskExecutionContext.getProcessId()));
logger.info("process id:{}, cmd:{}", taskExecutionContext.getProcessId(), cmd);
OSUtils.exeCmd(cmd);
}
} catch (Exception e) {
processFlag = false;
logger.error("kill task error", e);
}
// find log and kill yarn job
Pair<Boolean, List<String>> yarnResult = killYarnJob(Host.of(taskExecutionContext.getHost()).getIp(),
taskExecutionContext.getLogPath(),
taskExecutionContext.getExecutePath(),
taskExecutionContext.getTenantCode());
return Pair.of(processFlag && yarnResult.getLeft(), yarnResult.getRight());
}
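doKill() passes the string returned by ProcessUtils.getPidsStr(processId) to kill -9. That string contains several pids, presumably the task's process tree, so the kill targets more than a single pid. As an illustration only, the sketch below collects a process and its descendants with the JDK 9+ ProcessHandle API and builds the same kind of `sudo kill -9 ...` string. `KillCommandBuilder` is a hypothetical name. The sketch builds the command but does not execute it.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: list a process tree and build a kill command for it.
final class KillCommandBuilder {
    /** The root pid followed by all of its descendants; empty if the process is gone. */
    static List<Long> processTree(long rootPid) {
        return ProcessHandle.of(rootPid)
                .map(root -> Stream.concat(Stream.of(root), root.descendants()))
                .orElseGet(Stream::empty)
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    static String buildKillCommand(List<Long> pids) {
        if (pids.isEmpty()) {
            return ""; // nothing to kill, like the isNotEmpty(pidsStr) check above
        }
        String joined = pids.stream().map(String::valueOf).collect(Collectors.joining(" "));
        return "sudo kill -9 " + joined;
    }
}
```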
master#TaskKillResponseProcessor
On the Master side, TaskKillResponseProcessor handles the Worker's response to a task kill request. As the code shows, it only logs the response.
public void process(Channel channel, Command command) {
Preconditions.checkArgument(CommandType.TASK_KILL_RESPONSE == command.getType(), String.format("invalid command type : %s", command.getType()));
TaskKillResponseCommand responseCommand = FastJsonSerializer.deserialize(command.getBody(), TaskKillResponseCommand.class);
logger.info("received task kill response command : {}", responseCommand);
}
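This analysis of the Apache DolphinScheduler 1.3.9 source code has given us a closer look at how its core modules are designed and implemented, and at how the Master and the Worker interact.
If you are interested in the Apache DolphinScheduler source code, you can dig further into the details of its task scheduling strategies. You can also customize it for your own business scenarios to make full use of DolphinScheduler's scheduling capabilities.
The End!
Publication of this article is supported by WhaleOps (白鯨開源).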