MQTT Protocol: A Brief Analysis of the keepAlive Source Code in paho.mqtt.golang

# In this article you will learn:
# 1. The MQTT KeepAlive mechanism
# 2. How the MQTT KeepAlive mechanism is implemented in Go
# 3. Recommendations for choosing a KeepAlive interval


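While load testing MQTT at work recently, I found that a small number of MQTT devices had their connection actively dropped at the protocol layer while publishing. The only error the client caught was var ErrNotConnected = errors.New("Not Connected"), yet the network was known to be healthy, and the senior developers said the MQTT broker was nowhere near full load. So I started reading the relevant source code in the paho.mqtt.golang package to look for useful clues, and this article is the result.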

1. The MQTT Keep Alive Mechanism

MQTT Keep Alive

MQTT includes a keep alive function that provides a workaround for the issue of half-open connections (or at least makes it possible to assess if the connection is still open).


Keep alive ensures that the connection between the broker and client is still open and that the broker and the client are aware of being connected. When the client establishes a connection to the broker, the client communicates a time interval in seconds to the broker. This interval defines the maximum length of time that the broker and client may not communicate with each other.


The MQTT specification says the following:

“The Keep Alive … is the maximum time interval that is permitted to elapse between the point at which the Client finishes transmitting one Control Packet and the point it starts sending the next. It is the responsibility of the Client to ensure that the interval between Control Packets being sent does not exceed the Keep Alive value. In the absence of sending any other Control Packets, the Client MUST send a PINGREQ Packet.”

As long as messages are exchanged frequently and the keep-alive interval is not exceeded, there is no need to send an extra message to establish whether the connection is still open.


If the client does not send a message during the keep-alive period, it must send a PINGREQ packet to the broker to confirm that it is available and to make sure that the broker is also still available.


The broker must disconnect a client that does not send a message or a PINGREQ packet in one and a half times the keep alive interval. Likewise, the client is expected to close the connection if it does not receive a response from the broker in a reasonable amount of time.



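From the passage above we can see that the MQTT protocol layer provides a keepAlive mechanism that lets the client and broker maintain a long-lived connection. To keep such a connection alive, the client and broker must exchange packets at intervals, whether ordinary traffic such as messages carrying a payload or dedicated pings. Since MQTT runs on top of TCP, the keepAlive exchange between client and broker is not hard to understand.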

2. A Brief Look at the MQTT keepAlive Source Code in Go


// This is the ping (keepalive) source code from the paho.mqtt.golang package
func keepalive(c *client) {
	defer c.workers.Done()
	DEBUG.Println(PNG, "keepalive starting")
	var checkInterval int64
	var pingSent time.Time
	// Read the keepAlive value configured on the client, i.e. the one set via
	// opts.SetKeepAlive(time.Duration(ktime) * time.Second)
	if c.options.KeepAlive > 10 {
		checkInterval = 5
	} else {
		checkInterval = c.options.KeepAlive / 2
	}
	// Create a ticker that periodically checks the connection against keepAlive.
	// Note: the ticker interval is therefore at most 5 s
	// (KeepAlive/2 when KeepAlive is 10 s or less).
	intervalTicker := time.NewTicker(time.Duration(checkInterval * int64(time.Second)))
	defer intervalTicker.Stop()
	for {
		select {
		case <-c.stop:
			DEBUG.Println(PNG, "keepalive stopped")
			return
		case <-intervalTicker.C:
			lastSent := c.lastSent.Load().(time.Time)         // time of the client's most recent send (updated below when a PINGREQ is written)
			lastReceived := c.lastReceived.Load().(time.Time) // time the client last heard from the broker

			DEBUG.Println(PNG, "ping check", time.Since(lastSent).Seconds())
			// If the time since the last send >= keepAlive, or the time since the
			// last receive >= keepAlive, enter the PINGREQ sending logic.
			if time.Since(lastSent) >= time.Duration(c.options.KeepAlive*int64(time.Second)) || time.Since(lastReceived) >= time.Duration(c.options.KeepAlive*int64(time.Second)) {
				if atomic.LoadInt32(&c.pingOutstanding) == 0 {
					DEBUG.Println(PNG, "keepalive sending ping")
					ping := packets.NewControlPacket(packets.Pingreq).(*packets.PingreqPacket)
					// We don't want to wait behind large messages being sent, the Write call
					// will block until it is able to send the packet.
					atomic.StoreInt32(&c.pingOutstanding, 1) // mark a ping as outstanding
					ping.Write(c.conn)                       // send the PINGREQ to the broker
					c.lastSent.Store(time.Now())             // record the send time on the client
					pingSent = time.Now()                    // remember when this PINGREQ went out
				}
			}
			// Check whether the client has received the PINGRESP.
			// Note the default PingTimeout: 10 * time.Second.
			// If c.pingOutstanding > 0 and the time since pingSent >= PingTimeout,
			// the client has not received a PINGRESP from the broker.
			if atomic.LoadInt32(&c.pingOutstanding) > 0 && time.Now().Sub(pingSent) >= c.options.PingTimeout {
				CRITICAL.Println(PNG, "pingresp not received, disconnecting")
				c.errors <- errors.New("pingresp not received, disconnecting")
				return
			}
		}
	}
}

3. Summary


  • The neat handling of the time.Ticker polling period

    if c.options.KeepAlive > 10 {
    	checkInterval = 5
    } else {
    	checkInterval = c.options.KeepAlive / 2
    }
    intervalTicker := time.NewTicker(time.Duration(checkInterval * int64(time.Second)))

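    In my view, a ticker period that is too short makes the CPU switch contexts frequently. A KeepAlive above 10 s pins the ticker at 5 s, whereas a KeepAlive of 10 s or less shrinks it to KeepAlive/2. I therefore personally recommend a keepAlive interval between 30 and 60 s, and I do not recommend setting it below 10 s.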
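
To put rough numbers on this recommendation, the sketch below counts how often the keepalive goroutine wakes up per minute for a few KeepAlive values. It reuses only the interval rule quoted above; `tickerSeconds` and `wakeupsPerMinute` are hypothetical names for this illustration, and the figures count ticker wakeups, not measured CPU cost.

```go
package main

import "fmt"

// tickerSeconds applies the interval rule quoted above.
func tickerSeconds(keepAlive int64) int64 {
	if keepAlive > 10 {
		return 5
	}
	return keepAlive / 2
}

// wakeupsPerMinute returns how many times the ticker fires in 60 s.
// A KeepAlive of 0 or 1 yields a zero interval, which time.NewTicker
// rejects with a panic, so it is reported as 0 here.
func wakeupsPerMinute(keepAlive int64) int64 {
	interval := tickerSeconds(keepAlive)
	if interval <= 0 {
		return 0
	}
	return 60 / interval
}

func main() {
	for _, ka := range []int64{2, 4, 10, 30, 60} {
		fmt.Printf("KeepAlive=%2ds ticker=%ds wakeups/min=%d\n",
			ka, tickerSeconds(ka), wakeupsPerMinute(ka))
	}
}
```

The wakeup rate is flat at 12 per minute from a KeepAlive of 10 s upward and climbs to 60 per minute at 2 s, so the extra polling only appears once KeepAlive drops below 10 s.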
