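A Brief Look at the Source Code of Bull, a Node.js Job Queue

Posted by keith666 on 2019-02-26

Original article: www.jianshu.com/p/1ed50e6d4…

Bull is a Redis-based job queue library for Node.js. It supports delayed jobs, prioritized jobs, repeatable jobs, atomic operations, and more.

This article walks through Bull's source code starting from basic usage. Repeatable jobs, separate processes, and similar features are not covered here.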

Bull: Premium Queue package for handling jobs and messages in NodeJS.


Basic usage

Using Bull takes three steps:

  1. Create a queue
  2. Bind a job handler
  3. Add a job

For example:

const Bull = require('bull');
// 1. Create a queue
const myFirstQueue = new Bull('my-first-queue');
// 2. Bind a job handler; the payload is available as job.data
myFirstQueue.process(async (job) => {
  return doSomething(job.data);
});
// 3. Add a job (await must run inside an async function)
const job = await myFirstQueue.add({
  foo: 'bar'
});

Creating the queue

The queue is created by calling require and then new, so the first step is to find the entry point that require resolves to. Open package.json:

{
  "name": "bull",
  "version": "3.7.0",
  "description": "Job manager",
  "main": "./index.js",
  ...
}

The entry point is index.js. Open it:

module.exports = require('./lib/queue');
module.exports.Job = require('./lib/job');

This leads to the file that contains the target function, ./lib/queue:

module.exports = Queue;

The export is Queue, so next we look at the Queue function:

const Queue = function Queue(name, url, opts) {
  ...
  // Default settings
  this.settings = _.defaults(opts.settings, {
    lockDuration: 30000,
    stalledInterval: 30000,
    maxStalledCount: 1,
    guardInterval: 5000,
    retryProcessDelay: 5000,
    drainDelay: 5, // brpoplpush timeout, in seconds, while the queue is empty
    backoffStrategies: {}
  });
  ...
  // Bind these methods to avoid constant rebinding and/or creating closures
  // in processJobs etc.
  this.moveUnlockedJobsToWait = this.moveUnlockedJobsToWait.bind(this);
  this.processJob = this.processJob.bind(this);
  this.getJobFromId = Job.fromId.bind(null, this);
  ...
};

The constructor mainly initializes parameters and binds functions.
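To make the merging of defaults concrete, here is a minimal sketch in plain JavaScript. It assumes the usual lodash _.defaults rule (a default is applied only where the key is undefined). The helper applyDefaults and the constant DEFAULT_SETTINGS are hypothetical names for illustration, not part of Bull, and unlike _.defaults the helper returns a copy instead of mutating its first argument.

```javascript
// Hypothetical helper that mimics how _.defaults fills in missing settings.
function applyDefaults(settings, defaults) {
  const result = Object.assign({}, settings);
  for (const key of Object.keys(defaults)) {
    // A default is used only when the caller left the key undefined.
    if (result[key] === undefined) {
      result[key] = defaults[key];
    }
  }
  return result;
}

// A subset of the defaults shown in the constructor excerpt.
const DEFAULT_SETTINGS = {
  lockDuration: 30000,
  stalledInterval: 30000,
  maxStalledCount: 1,
  drainDelay: 5
};

// The caller overrides lockDuration only; everything else falls back.
const settings = applyDefaults({ lockDuration: 60000 }, DEFAULT_SETTINGS);
console.log(settings);
```

In this sketch lockDuration keeps the caller's value of 60000 while the other keys take their defaults.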

Binding the job handler

This step starts at myFirstQueue.process, so first look at the process function:

Queue.prototype.process = function (name, concurrency, handler) {
  ...
  this.setHandler(name, handler); // 1. Bind the handler

  return this._initProcess().then(() => {
    return this.start(concurrency); // 2. Start the queue
  });
};

The function does two things:

  1. Binds the handler
  2. Starts the queue

First, binding the handler:

Queue.prototype.setHandler = function (name, handler) {
  ...
  if (this.handlers[name]) {
    throw new Error('Cannot define the same handler twice ' + name);
  }
  ...
  if (typeof handler === 'string') {
    ...
  } else {
    handler = handler.bind(this);
    // Store the handler under its name
    if (handler.length > 1) {
      this.handlers[name] = promisify(handler);
    } else {
      this.handlers[name] = function () {
        ...
      };
    }
  }
};
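The handler.length check decides whether a handler is treated as callback-style or promise-style. The sketch below illustrates that distinction with Node's util.promisify. The wrapHandler function is a hypothetical stand-in, and the body of its second branch is an assumption, since the excerpt elides what the real branch does.

```javascript
const { promisify } = require('util');

// Hypothetical stand-in for the branch in setHandler: a handler that declares
// more than one parameter is treated as callback-style and promisified.
function wrapHandler(handler) {
  if (handler.length > 1) {
    return promisify(handler);
  }
  // Assumption: wrap the return value so callers always receive a promise.
  return function (job) {
    try {
      return Promise.resolve(handler(job));
    } catch (err) {
      return Promise.reject(err);
    }
  };
}

// Callback-style handler: (job, done)
const callbackStyle = wrapHandler((job, done) => done(null, job.data.n * 2));
// Promise-style handler: (job)
const promiseStyle = wrapHandler(async job => job.data.n + 1);

callbackStyle({ data: { n: 21 } }).then(result => console.log(result)); // logs 42
promiseStyle({ data: { n: 41 } }).then(result => console.log(result)); // logs 42
```

One consequence: a promise-returning handler should declare only the job parameter. If it declares a second parameter, it is taken as callback-style and never settles unless that callback is invoked.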

Next, starting the queue:

Queue.prototype.start = function (concurrency) {
  return this.run(concurrency).catch(err => {
    this.emit('error', err, 'error running queue');
    throw err;
  });
};

The run function:

Queue.prototype.run = function (concurrency) {
  const promises = [];

  return this.isReady()
    .then(() => {
      return this.moveUnlockedJobsToWait(); // Move unlocked jobs back to the wait list
    })
    .then(() => {
      return utils.isRedisReady(this.bclient);
    })
    .then(() => {
      while (concurrency--) {
        promises.push(
          new Promise(resolve => {
            this.processJobs(concurrency, resolve); // Process jobs
          })
        );
      }

      this.startMoveUnlockedJobsToWait(); // Periodically check for unlocked jobs

      return Promise.all(promises);
    });
};

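Unlocked job (stalled job): a job needs a lock while it runs. Normally a job acquires the lock when it becomes active and releases it on completion. The lock expires after lockDuration and is renewed periodically, every lockRenewTime. If an active job holds no lock, the process was blocked or crashed and the lock expired.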
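The lock lifecycle described above can be sketched with an in-memory map instead of Redis. The JobLocks class is hypothetical and exists only for illustration; timestamps are passed in explicitly so the example is deterministic, and the 15-second renewal interval is an assumed value.

```javascript
// In-memory stand-in for the per-job lock; Bull keeps the real lock in Redis.
class JobLocks {
  constructor(lockDuration) {
    this.lockDuration = lockDuration;
    this.expiresAt = new Map(); // jobId -> expiry timestamp in ms
  }

  // Take or renew the lock: push the expiry lockDuration ms into the future.
  extend(jobId, now) {
    this.expiresAt.set(jobId, now + this.lockDuration);
  }

  // Drop the lock when the job completes.
  release(jobId) {
    this.expiresAt.delete(jobId);
  }

  // A job is locked only while its expiry lies in the future.
  isLocked(jobId, now) {
    const expiry = this.expiresAt.get(jobId);
    return expiry !== undefined && expiry > now;
  }
}

const locks = new JobLocks(30000);
locks.extend('job-1', 0);      // job becomes active at t = 0
locks.extend('job-1', 15000);  // a healthy worker renews at t = 15 s

const healthy = locks.isLocked('job-1', 40000); // lock is valid until t = 45 s
const stalled = !locks.isLocked('job-1', 50000); // no further renewal, lock expired
console.log({ healthy, stalled }); // { healthy: true, stalled: true }
```

A worker whose event loop is blocked cannot run the renewal step, which is how a job that is still running can end up looking stalled.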

processJobs:

Queue.prototype.processJobs = function (index, resolve, job) {
  const processJobs = this.processJobs.bind(this, index, resolve);
  process.nextTick(() => {
    this._processJobOnNextTick(processJobs, index, resolve, job);
  });
};

Next, _processJobOnNextTick:

        // Key code
        const gettingNextJob = job ? Promise.resolve(job) : this.getNextJob();

        return (this.processing[index] = gettingNextJob
          .then(this.processJob)
          .then(processJobs, err => {
            ...
          }));

The code above can be described as follows:

  1. If job is empty, call getNextJob to fetch a job
  2. Run processJob
  3. Run processJobs

First, getNextJob:

if (this.drained) {
    //
    // Waiting for new jobs to arrive
    //
    console.log('bclient start get new job');
    return this.bclient
      .brpoplpush(this.keys.wait, this.keys.active, this.settings.drainDelay)
      .then(
        jobId => {
          if (jobId) {
            return this.moveToActive(jobId);
          }
        },
        err => {
         ...
        }
      );
  } else {
    return this.moveToActive();
  }

Jobs are fetched with the Redis blocking pop-and-push command BRPOPLPUSH, which times out after drainDelay.
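To show what that pop-and-push does to the two lists, here is an in-memory sketch of RPOPLPUSH, the non-blocking sibling of BRPOPLPUSH. Arrays stand in for the Redis lists, with index 0 as the head. The sketch assumes producers push at the head, so the oldest job sits at the tail.

```javascript
// In-memory stand-in for RPOPLPUSH: pop the tail of `source` and push the
// element onto the head of `destination` in one step.
function rpoplpush(source, destination) {
  if (source.length === 0) {
    // BRPOPLPUSH would block here for up to the timeout before returning nil.
    return null;
  }
  const jobId = source.pop();
  destination.unshift(jobId);
  return jobId;
}

const wait = ['3', '2', '1']; // head ... tail; job '1' was pushed first
const active = [];

const first = rpoplpush(wait, active);  // '1'
const second = rpoplpush(wait, active); // '2'
console.log(first, second, wait, active); // 1 2 [ '3' ] [ '2', '1' ]
```

Because the job ID lands in the active list in the same operation that removes it from the wait list, a worker that crashes right after fetching does not lose the job.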

Next, processJob:

Queue.prototype.processJob = function (job) {
  ... 
  const handleCompleted = result => {
    return job.moveToCompleted(result).then(jobData => {
      ...
      return jobData ? this.nextJobFromJobData(jobData[0], jobData[1]) : null;
    });
  };

  // Extend the lock duration
  lockExtender();
  const handler = this.handlers[job.name] || this.handlers['*'];

  if (!handler) {
    ...
  } else {
    let jobPromise = handler(job);
    ...

    return jobPromise
      .then(handleCompleted) 
      .catch(handleFailed)
      .finally(() => {
        stopTimer();
      });
  }
};

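After a job is processed successfully, handleCompleted is called, which in turn calls the job's moveToCompleted. After a few more intermediate calls, the Lua script moveToFinished is eventually invoked: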

  ...
  -- Try to get next job to avoid an extra roundtrip if the queue is not closing, 
  -- and not rate limited.
  ...

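The script moves the job to the completed or failed set and then fetches the next job.

Once processJob finishes, processJobs runs again. This loop is the core of Bull, as shown in the figure:

[Figure: the job-processing loop (image.png)]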
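The loop can be sketched with an in-memory array in place of Redis. Everything here is hypothetical and simplified: runWorker is not a Bull function, the array `waiting` stands in for the wait list, and the loop stops when the list is empty, whereas the real queue blocks and waits for new jobs.

```javascript
// Hypothetical in-memory version of the processJobs loop.
async function runWorker(waiting, handler) {
  const completed = [];

  // Stand-in for getNextJob: the real one blocks on Redis when the list is empty.
  const getNextJob = () => waiting.shift();

  // Stand-in for processJob: run the handler, record the result, and hand back
  // the next waiting job, as moveToFinished does to save a round trip.
  const processJob = async job => {
    completed.push({ id: job.id, result: await handler(job) });
    return waiting.shift();
  };

  let job;
  // Fetch a job only when the previous iteration did not hand one over.
  while ((job = job || getNextJob())) {
    job = await processJob(job);
  }
  return completed;
}

runWorker([{ id: 1 }, { id: 2 }, { id: 3 }], async job => job.id * 10)
  .then(completed => console.log(completed));
```

Under these assumptions getNextJob runs only on the first and last iterations; every other job arrives as the return value of processJob.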

Adding a job

Go straight to the add function:

Queue.prototype.add = function (name, data, opts) {
  ...
  if (opts.repeat) {
    ...
  } else {
    return Job.create(this, name, data, opts);
  }
};

This calls the create function in Job:

Job.create = function (queue, name, data, opts) {
  const job = new Job(queue, name, data, opts); // 1. Create the job

  return queue
    .isReady()
    .then(() => {
      return addJob(queue, job); // 2. Add the job to the queue
    })
    ...
};

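Following addJob further, the call eventually reaches the Lua script addJob, which stores the job in Redis according to the job's options.

Questions

1. Why does the error "job stalled more than allowable limit" occur?

The run function calls this.startMoveUnlockedJobsToWait(). Here is that function: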

Queue.prototype.startMoveUnlockedJobsToWait = function () {
  clearInterval(this.moveUnlockedJobsToWaitInterval);
  if (this.settings.stalledInterval > 0 && !this.closing) {
    this.moveUnlockedJobsToWaitInterval = setInterval(
      this.moveUnlockedJobsToWait,
      this.settings.stalledInterval
    );
  }
};

This function runs moveUnlockedJobsToWait periodically:

Queue.prototype.moveUnlockedJobsToWait = function () {
  ...
  return scripts
    .moveUnlockedJobsToWait(this)
    .then(([failed, stalled]) => {
      const handleFailedJobs = failed.map(jobId => {
        return this.getJobFromId(jobId).then(job => {
          this.emit(
            'failed',
            job,
            new Error('job stalled more than allowable limit'),
            'active'
          );
          return null;
        });
      });
      ...
    })
    ...
    ;
};

This function goes through scripts.moveUnlockedJobsToWait and eventually calls the Lua script moveUnlockedJobsToWait:

  ...
  local MAX_STALLED_JOB_COUNT = tonumber(ARGV[1])
  ...
        if(stalledCount > MAX_STALLED_JOB_COUNT) then
          rcall("ZADD", KEYS[4], ARGV[3], jobId)
          rcall("HSET", jobKey, "failedReason", "job stalled more than allowable limit")
          table.insert(failed, jobId)
        else
          -- Move the job back to the wait queue, to immediately be picked up by a waiting worker.
          rcall("RPUSH", dst, jobId)
          rcall('PUBLISH', KEYS[1] .. '@', jobId)
          table.insert(stalled, jobId)
        end
  ...
return {failed, stalled}
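  • MAX_STALLED_JOB_COUNT: defaults to 1 (the maxStalledCount setting)

A job whose stalled count exceeds MAX_STALLED_JOB_COUNT is marked as failed and returned in the failed list. The queue then emits a failed event for it, which produces the error in question. Other stalled jobs go back to the wait list.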
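The branch in the Lua excerpt can be restated in JavaScript. The function classifyStalledJobs is a hypothetical port for illustration, not Bull code. It assumes each count already includes the current stalled check.

```javascript
// Hypothetical JavaScript port of the branch in the Lua excerpt.
// `stalledCounts` maps jobId -> number of times the job has been found stalled.
function classifyStalledJobs(stalledCounts, maxStalledCount) {
  const failed = [];
  const stalled = [];
  for (const [jobId, count] of Object.entries(stalledCounts)) {
    if (count > maxStalledCount) {
      // Over the limit: recorded as failed with the reason
      // "job stalled more than allowable limit".
      failed.push(jobId);
    } else {
      // Within the limit: goes back to the wait list for another attempt.
      stalled.push(jobId);
    }
  }
  return [failed, stalled];
}

// With the default limit of 1, a job stalled once is retried and a job
// stalled twice is failed.
const [failedJobs, stalledJobs] = classifyStalledJobs({ a: 1, b: 2 }, 1);
console.log(failedJobs, stalledJobs); // [ 'b' ] [ 'a' ]
```

So with the default settings the error appears the second time the same job is found without a lock.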
