From a Production Crash to the GCD Source Code

By 克里斯, published 2020-03-01

It all started with analyzing a crash reported on Fabric.

Crash Log

Fabric suddenly started reporting crashes in our download code, caused by its use of a GCD group:

#0. Crashed: com.apple.main-thread
0  libdispatch.dylib              0x192759b3c dispatch_group_leave.cold.1 + 36
1  libdispatch.dylib              0x19272ad84 _dispatch_group_wake + 114
2  MTXX                           0x103be1af8 __38-[xxxxxx downloadCompletion]_block_invoke + 108 (xxxxxx.m:108)
3  libdispatch.dylib              0x192728b7c _dispatch_call_block_and_release + 32
4  libdispatch.dylib              0x192729fd8 _dispatch_client_callout + 20
5  libdispatch.dylib              0x192735cc8 _dispatch_main_queue_callback_4CF + 968
6  CoreFoundation                 0x1929ffcc8 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 16
7  CoreFoundation                 0x1929faa24 __CFRunLoopRun + 1980
8  CoreFoundation                 0x1929f9f40 CFRunLoopRunSpecific + 480
9  GraphicsServices               0x19cc8a534 GSEventRunModal + 108
10 UIKitCore                      0x196b85580 UIApplicationMain + 1940
11 MTXX                           0x105c6af10 main + 16 (main.m:16)
12 libdyld.dylib                  0x192878e18 start + 4

From past experience, this was clearly an enter/leave mismatch on a GCD group. The documentation of dispatch_group_enter states explicitly that calls must be balanced with dispatch_group_leave:

/*!
 * @function dispatch_group_enter
 *
 * @abstract
 * Manually indicate a block has entered the group
 *
 * @discussion
 * Calling this function indicates another block has joined the group through
 * a means other than dispatch_group_async(). Calls to this function must be
 * balanced with dispatch_group_leave().
 *
 * @param group
 * The dispatch group to update.
 * The result of passing NULL in this parameter is undefined.
 */
API_AVAILABLE(macos(10.6), ios(4.0))
DISPATCH_EXPORT DISPATCH_NONNULL_ALL DISPATCH_NOTHROW
void
dispatch_group_enter(dispatch_group_t group);

First Analysis

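After a careful review, I did find a hole that could keep dispatch_group_leave from ever running. The logic is roughly as follows. This is pseudocode showing only the parts that may be relevant here: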

- (void)downloadURLs:(NSArray<NSURL *> *)urls finishCompletion:(void(^)(void))finishCompletion {
    dispatch_group_t dispatchGroup = dispatch_group_create();
    for (NSURL *url in urls) {
        dispatch_group_enter(dispatchGroup);
        [self downloadURL:url finishCompletion:^(NSURL *url, BOOL isSuccess) {
            // Handle the success or failure of this download
            // xxxxx
            dispatch_group_leave(dispatchGroup);
        }];
    }
    dispatch_group_notify(dispatchGroup, dispatch_get_main_queue(), ^{
        if (finishCompletion) {
            finishCompletion();
        }
    });
}

- (void)downloadURL:(NSURL *)url finishCompletion:(void(^)(NSURL *URL, BOOL isSuccess))finishCompletion {
    // Assorted logic, if/else checks, etc. (the project code is fairly old)

    // Among them is a check for a paused task, roughly:
    DownloadItem *downloadItem = [self downloadItemForURL:url];
    if (downloadItem.isPaused) { // pseudocode: "the item is currently paused"
        // Resume the download
        return;
    }

    // xxxxxx
    // Trigger the actual download
}

Because the code was old, the resume path never passed finishCompletion along, so finishCompletion never got a chance to run. That leaves the group's enter and leave unbalanced. I changed the code as follows:

if (downloadItem.isPaused) { // pseudocode: "the item is currently paused"
    // Resume the download
    downloadItem.finishCompletion = finishCompletion;
    return;
}

Digging Deeper

After the change, I still didn't feel confident. Was this really the right fix?

Read the documentation carefully again:

dispatch_group_enter: Calling this function indicates another block has joined the group through a means other than dispatch_group_async(). Calls to this function must be balanced with dispatch_group_leave().

It never says that a missing dispatch_group_leave causes a crash. So let's try it in code:

Experiments

Missing dispatch_group_leave

- (void)group_leave_not_crash_1 {
    dispatch_group_t group = dispatch_group_create();
    
    dispatch_group_enter(group);
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSLog(@"global_queue block 1");
    });
    
    dispatch_group_enter(group);
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSLog(@"global_queue block 2");
    });
    
    dispatch_group_notify(group, dispatch_get_main_queue(), ^{
        NSLog(@"dispatch_group_notify");
    });
    
    NSLog(@"done");
}

Output:

done
global_queue block 1
global_queue block 2

No crash at all. The only sound was my own theory being slapped down.

Missing dispatch_group_enter

- (void)group_leave_crash {
    dispatch_group_t group = dispatch_group_create();
    
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSLog(@"dispatch_group_notify main_queue block 1");
        dispatch_group_leave(group);
    });
    
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSLog(@"dispatch_group_notify main_queue block 2");
        dispatch_group_leave(group);
    });
    
    dispatch_group_notify(group, dispatch_get_main_queue(), ^{
        NSLog(@"dispatch_group_notify");
    });
    
    NSLog(@"done");
}

Either of the two excess dispatch_group_leave calls crashes the app. Whichever block runs first brings the process down.

dispatch_group_enter and dispatch_group_leave not strictly paired

- (void)group_leave_not_crash_2 {
    dispatch_group_t group = dispatch_group_create();
    
    dispatch_group_enter(group);
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSLog(@"global_queue block 1");
    });
    
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSLog(@"global_queue block 2");
        dispatch_group_leave(group);
    });
    
    dispatch_group_notify(group, dispatch_get_main_queue(), ^{
        NSLog(@"dispatch_group_notify");
    });
    
    NSLog(@"done");
}

- (void)group_leave_not_crash_3 {
    dispatch_group_t group = dispatch_group_create();
    
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSLog(@"global_queue block 1");
        dispatch_group_leave(group);
    });
    
    dispatch_group_enter(group);
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSLog(@"global_queue block 2");
    });
    
    dispatch_group_notify(group, dispatch_get_main_queue(), ^{
        NSLog(@"dispatch_group_notify");
    });
    
    NSLog(@"done");
}

Both methods print:

done
global_queue block 1
global_queue block 2
dispatch_group_notify

Inside the blocks, dispatch_group_enter and dispatch_group_leave are not paired one-to-one. Even so, the notification block passed to dispatch_group_notify still ran. That's a little strange...

Conclusions

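  1. dispatch_group_enter without a matching dispatch_group_leave does not crash. However, the notify block never runs: the first experiment's output has no "dispatch_group_notify" line.
  2. dispatch_group_leave without a preceding dispatch_group_enter crashes immediately.
  3. If enter and leave are not paired inside each block but their counts match, nothing goes wrong, provided no leave runs before its enter.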
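The three conclusions can be restated as a tiny state machine. The C sketch below is a hypothetical toy model, not libdispatch. The names toy_group, toy_enter, toy_leave and toy_notify are invented for illustration, and the model is single-threaded. It only encodes the counting rules the experiments revealed, with a `crashed` flag standing in for the real trap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a dispatch group (not libdispatch):
 * it only encodes the counting rules observed in the experiments. */
typedef struct {
    int outstanding;   /* enters not yet balanced by a leave */
    bool has_notify;   /* a notify block is waiting for the count to hit 0 */
    int notify_runs;   /* how many times the notify block has fired */
    bool crashed;      /* set where libdispatch would trap instead */
} toy_group;

static void toy_enter(toy_group *g) { g->outstanding++; }

static void toy_leave(toy_group *g) {
    if (g->outstanding == 0) {        /* leave without a prior enter */
        g->crashed = true;
        return;
    }
    g->outstanding--;
    if (g->outstanding == 0 && g->has_notify) {
        g->has_notify = false;        /* the notify block fires once */
        g->notify_runs++;
    }
}

static void toy_notify(toy_group *g) {
    if (g->outstanding == 0) {
        g->notify_runs++;             /* nothing pending: fire right away */
    } else {
        g->has_notify = true;         /* fire when the last leave arrives */
    }
}
```

In the third scenario, calling toy_leave before toy_enter would set `crashed`. The model checks the count only at the moment leave runs. As the source reading below shows, the real counter does the same.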

Reading the libdispatch Source

Analyzing the Crash Stack

In the demo that omits dispatch_group_enter, the crash happens right on the dispatch_group_leave(group); line, with Thread 1: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0). Printing the group object gives <OS_dispatch_group: group[0x600001088190] = { xref = 1, ref = 1, count = 1073741823, gen = 0, waiters = 0, notifs = 0 }>.

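Here is the disassembly. It comes from a stripped-down version of the demo that just creates a group and calls dispatch_group_leave on it: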

     0x108e8fb30 <+0>:  pushq  %rbp
     0x108e8fb31 <+1>:  movq   %rsp, %rbp
     0x108e8fb34 <+4>:  subq   $0x20, %rsp
     0x108e8fb38 <+8>:  movq   %rdi, -0x8(%rbp)
     0x108e8fb3c <+12>: movq   %rsi, -0x10(%rbp)
     0x108e8fb40 <+16>: callq  0x108e900a8               ; symbol stub for: dispatch_group_create
     0x108e8fb45 <+21>: movq   %rax, -0x18(%rbp)
     0x108e8fb49 <+25>: movq   -0x18(%rbp), %rdi
     0x108e8fb4d <+29>: callq  0x108e900b4               ; symbol stub for: dispatch_group_leave
 ->  0x108e8fb52 <+34>: xorl   %ecx, %ecx
     0x108e8fb54 <+36>: movl   %ecx, %esi
     0x108e8fb56 <+38>: leaq   -0x18(%rbp), %rax
     0x108e8fb5a <+42>: movq   %rax, %rdi
     0x108e8fb5d <+45>: callq  0x108e900f6               ; symbol stub for: objc_storeStrong
     0x108e8fb62 <+50>: addq   $0x20, %rsp
     0x108e8fb66 <+54>: popq   %rbp
     0x108e8fb67 <+55>: retq
libdispatch.dylib`dispatch_group_leave:
     0x10f528955 <+0>:  movl   $0x4, %eax
     0x10f52895a <+5>:  lock
     0x10f52895b <+6>:  xaddq  %rax, 0x30(%rdi)
     0x10f528960 <+11>: cmpl   $-0x4, %eax
     0x10f528963 <+14>: jae    0x10f52896d               ; <+24>
     0x10f528965 <+16>: andl   $-0x4, %eax
     0x10f528968 <+19>: testl  %eax, %eax
     0x10f52896a <+21>: je     0x10f5289a3               ; <+78>
     0x10f52896c <+23>: retq
     0x10f52896d <+24>: addq   $0x4, %rax
     0x10f528971 <+28>: movq   %rax, %rsi
     0x10f528974 <+31>: movq   %rax, %rcx
     0x10f528977 <+34>: andq   $-0x4, %rcx
     0x10f52897b <+38>: testl  $0xfffffffc, %esi         ; imm = 0xFFFFFFFC
     0x10f528981 <+44>: cmovneq %rax, %rcx
     0x10f528985 <+48>: andq   $-0x3, %rcx
     0x10f528989 <+52>: cmpq   %rcx, %rax
     0x10f52898c <+55>: je     0x10f528999               ; <+68>
     0x10f52898e <+57>: movq   %rsi, %rax
     0x10f528991 <+60>: lock
     0x10f528992 <+61>: cmpxchgq %rcx, 0x30(%rdi)
     0x10f528997 <+66>: jne    0x10f528971               ; <+28>
     0x10f528999 <+68>: movl   $0x1, %edx
     0x10f52899e <+73>: jmp    0x10f5289af               ; _dispatch_group_wake
     0x10f5289a3 <+78>: pushq  %rbp
     0x10f5289a4 <+79>: movq   %rsp, %rbp
     0x10f5289a7 <+82>: movq   %rax, %rdi
     0x10f5289aa <+85>: callq  0x10f55a66d               ; dispatch_group_leave.cold.1
libdispatch.dylib`dispatch_group_leave.cold.1:
     0x10f55a66d <+0>:  movq   %rdi, %rax
     0x10f55a670 <+3>:  leaq   0x5bd6(%rip), %rcx        ; "BUG IN CLIENT OF LIBDISPATCH: Unbalanced call to dispatch_group_leave()"
     0x10f55a677 <+10>: movq   %rcx, 0x27ad2(%rip)       ; gCRAnnotations + 8
     0x10f55a67e <+17>: movq   %rax, 0x27afb(%rip)       ; gCRAnnotations + 56
 ->  0x10f55a685 <+24>: ud2

The key crash information is below. It confirms an Unbalanced call and matches the crash log from Fabric.

Thread 1: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)

dispatch_group_leave(group);
callq  0x10f55a66d               ; dispatch_group_leave.cold.1
"BUG IN CLIENT OF LIBDISPATCH: Unbalanced call to dispatch_group_leave()"

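So the production crash must also come from an excess dispatch_group_leave call. That means the first fix was indeed wrong. A missing leave only keeps the notify block from running; it does not crash.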

Excess dispatch_group_leave calls do crash, but what exactly is the mechanism? The only way to make sense of all of the above is to study the GCD source.

dispatch_group_leave

The source of dispatch_group_leave:

void
dispatch_group_leave(dispatch_group_t dg)
{
	// The value is incremented on a 64bits wide atomic so that the carry for
	// the -1 -> 0 transition increments the generation atomically.
	uint64_t new_state, old_state = os_atomic_add_orig2o(dg, dg_state,
			DISPATCH_GROUP_VALUE_INTERVAL, release);
	uint32_t old_value = (uint32_t)(old_state & DISPATCH_GROUP_VALUE_MASK);

	if (unlikely(old_value == DISPATCH_GROUP_VALUE_1)) {
		old_state += DISPATCH_GROUP_VALUE_INTERVAL;
		do {
			new_state = old_state;
			if ((old_state & DISPATCH_GROUP_VALUE_MASK) == 0) {
				new_state &= ~DISPATCH_GROUP_HAS_WAITERS;
				new_state &= ~DISPATCH_GROUP_HAS_NOTIFS;
			} else {
				// If the group was entered again since the atomic_add above,
				// we can't clear the waiters bit anymore as we don't know for
				// which generation the waiters are for
				new_state &= ~DISPATCH_GROUP_HAS_NOTIFS;
			}
			if (old_state == new_state) break;
		} while (unlikely(!os_atomic_cmpxchgv2o(dg, dg_state,
				old_state, new_state, &old_state, relaxed)));
		return _dispatch_group_wake(dg, old_state, true);
	}

	if (unlikely(old_value == 0)) {
		DISPATCH_CLIENT_CRASH((uintptr_t)old_value,
				"Unbalanced call to dispatch_group_leave()");
	}
}

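An Unbalanced call is reported when old_value is 0. os_atomic_add_orig2o is an atomic add. It adds DISPATCH_GROUP_VALUE_INTERVAL to the 64-bit dg_state field of the dispatch_group_t object, whose low 32 bits are the dg_bits field that dispatch_group_enter works on, and it returns the value from before the add. Masking that old value with DISPATCH_GROUP_VALUE_MASK gives old_value.

So if old_value is already 0 when dispatch_group_leave is called, no dispatch_group_enter is outstanding, and the Unbalanced call crash is triggered.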
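To make the arithmetic concrete, here is a single-threaded C model of dispatch_group_leave. The constant values are the ones I believe libdispatch defines in semaphore_internal.h; treat them as an assumption. model_leave and model_enter are hypothetical helpers. Plain adds and masks stand in for the atomics, and the cmpxchg loop collapses to a single store because nothing can race here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match libdispatch's definitions (semaphore_internal.h). */
#define GROUP_VALUE_MASK     0x00000000fffffffcULL
#define GROUP_VALUE_INTERVAL 0x0000000000000004ULL
#define GROUP_VALUE_1        GROUP_VALUE_MASK
#define GROUP_HAS_NOTIFS     0x0000000000000002ULL
#define GROUP_HAS_WAITERS    0x0000000000000001ULL

typedef enum { LEAVE_OK, LEAVE_WAKE, LEAVE_UNBALANCED } leave_result;

/* enter: 32-bit subtract on dg_bits (low half), no borrow from dg_gen. */
static void model_enter(uint64_t *dg_state) {
    uint32_t bits = (uint32_t)*dg_state - (uint32_t)GROUP_VALUE_INTERVAL;
    *dg_state = (*dg_state & ~0xffffffffULL) | bits;
}

/* leave: 64-bit add, so the carry out of dg_bits bumps dg_gen. */
static leave_result model_leave(uint64_t *dg_state) {
    uint64_t old_state = *dg_state;
    *dg_state = old_state + GROUP_VALUE_INTERVAL;
    uint32_t old_value = (uint32_t)(old_state & GROUP_VALUE_MASK);

    if (old_value == GROUP_VALUE_1) {
        /* last outstanding enter balanced: clear flags, then wake */
        *dg_state &= ~(GROUP_HAS_WAITERS | GROUP_HAS_NOTIFS);
        return LEAVE_WAKE;
    }
    if (old_value == 0) {
        return LEAVE_UNBALANCED;   /* where DISPATCH_CLIENT_CRASH fires */
    }
    return LEAVE_OK;
}
```

Two enters followed by two leaves end with dg_bits back at 0 and the upper 32 bits at 1. This is the carry described by the comment at the top of dispatch_group_leave: the -1 -> 0 transition increments the generation. A leave on a fresh group leaves dg_bits at 4. That matches the count = 1073741823 printout shown earlier.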

dispatch_group_enter

So a dispatch_group_enter without a matching leave does not crash. What happens if the imbalance is on the dispatch_group_enter side?

void
dispatch_group_enter(dispatch_group_t dg)
{
	// The value is decremented on a 32bits wide atomic so that the carry
	// for the 0 -> -1 transition is not propagated to the upper 32bits.
	uint32_t old_bits = os_atomic_sub_orig2o(dg, dg_bits,
			DISPATCH_GROUP_VALUE_INTERVAL, acquire);
	uint32_t old_value = old_bits & DISPATCH_GROUP_VALUE_MASK;
	if (unlikely(old_value == 0)) {
		_dispatch_retain(dg); // <rdar://problem/22318411>
	}
	if (unlikely(old_value == DISPATCH_GROUP_VALUE_MAX)) {
		DISPATCH_CLIENT_CRASH(old_bits,
				"Too many nested calls to dispatch_group_enter()");
	}
}

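enter is easy to follow. os_atomic_sub_orig2o is an atomic subtract. It subtracts DISPATCH_GROUP_VALUE_INTERVAL from the 32-bit dg_bits field of the dispatch_group_t object and returns the value from before the subtraction. Masking that value gives old_value. Because the subtraction is done on 32 bits, it wraps around instead of borrowing from the upper half, as the comment in the source explains. When old_value is DISPATCH_GROUP_VALUE_MAX, calling dispatch_group_enter crashes. The message differs from the leave case, though: it is "Too many nested calls to dispatch_group_enter()" rather than an Unbalanced call.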
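Here is a matching sketch for the enter side. model_enter is again a hypothetical single-threaded stand-in, and the values are assumed to match libdispatch. Starting the counter near the limit avoids looping a billion times. It also shows exactly which call trips DISPATCH_GROUP_VALUE_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed libdispatch values, narrowed to the 32-bit dg_bits field. */
#define GROUP_VALUE_MASK     0xfffffffcu
#define GROUP_VALUE_INTERVAL 0x00000004u
#define GROUP_VALUE_MAX      GROUP_VALUE_INTERVAL

typedef enum { ENTER_OK, ENTER_FIRST, ENTER_TOO_MANY } enter_result;

/* Model of dispatch_group_enter(): 32-bit subtract that wraps mod 2^32. */
static enter_result model_enter(uint32_t *dg_bits) {
    uint32_t old_bits = *dg_bits;
    *dg_bits = old_bits - GROUP_VALUE_INTERVAL;
    uint32_t old_value = old_bits & GROUP_VALUE_MASK;
    if (old_value == 0) return ENTER_FIRST;                  /* real code retains dg */
    if (old_value == GROUP_VALUE_MAX) return ENTER_TOO_MANY; /* real code traps */
    return ENTER_OK;
}

/* Enters that succeed on a fresh group before the next one traps. */
static uint32_t max_outstanding_enters(void) {
    return GROUP_VALUE_MASK / GROUP_VALUE_INTERVAL;          /* 2^30 - 1 */
}
```

A fresh group reaches dg_bits = 8 after 2^30 - 2 enters, because the counter moves by -4 each time, modulo 2^32. Starting from that state, the second call is the 2^30-th enter overall. It returns ENTER_TOO_MANY and leaves dg_bits at 0, which is consistent with the count = 0 printout in group_enter_crash_1.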

Let's test it:

- (void)group_enter_crash_1 {
    dispatch_group_t group = dispatch_group_create();
    
    while (YES) {
        dispatch_group_enter(group); // takes quite a while, then traps in dispatch_group_enter.cold.2
        // <OS_dispatch_group: group[0x600003c73a70] = { xref = 1, ref = 2, count = 0, gen = 0, waiters = 0, notifs = 0 }>
    }
}

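It does crash, but only after a few seconds. os_atomic_sub_orig2o has to run a very large number of times before old_value reaches DISPATCH_GROUP_VALUE_MAX: on a fresh group, it is the 2^30-th call, about 1.07 billion. The key stack information at that point: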

Thread 1: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    
dispatch_group_enter
callq  0x10e155687               ; dispatch_group_enter.cold.1
movl   %eax, %edi
callq  0x10e155697               ; dispatch_group_enter.cold.2
"BUG IN CLIENT OF LIBDISPATCH: Too many nested calls to dispatch_group_enter()"

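So GCD group enter/leave simply subtract from and add to a counter field, and the way to avoid unbalanced calls is to pair them. This also explains why the experiments with unpaired blocks did not crash. dispatch_group_leave only checks the counter at the moment it runs, and in those runs every leave found an outstanding enter. In group_leave_not_crash_3 that came down to timing: the block containing the leave was dispatched before dispatch_group_enter was called.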

dispatch_group_create

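The source of dispatch_group_create is below. It is just initialization. dispatch_group_create passes n = 0, so nothing is stored, and the field used by enter/leave keeps its initial value of 0. That is consistent with the leave-on-a-fresh-group crash above.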

DISPATCH_ALWAYS_INLINE
static inline dispatch_group_t
_dispatch_group_create_with_count(uint32_t n)
{
	dispatch_group_t dg = _dispatch_object_alloc(DISPATCH_VTABLE(group),
			sizeof(struct dispatch_group_s));
	dg->do_next = DISPATCH_OBJECT_LISTLESS;
	dg->do_targetq = _dispatch_get_default_queue(false);
	if (n) {
		os_atomic_store2o(dg, dg_bits,
				-n * DISPATCH_GROUP_VALUE_INTERVAL, relaxed);
		os_atomic_store2o(dg, do_ref_cnt, 1, relaxed); // <rdar://22318411>
	}
	return dg;
}

dispatch_group_t
dispatch_group_create(void)
{
	return _dispatch_group_create_with_count(0);
}

How dispatch group enter/leave works

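At this point the mechanism behind dispatch_group_enter and dispatch_group_leave is basically clear. dispatch_group_enter decrements the dg_bits field of the dispatch_group_t object, and dispatch_group_leave increments it. Each step is DISPATCH_GROUP_VALUE_INTERVAL (4) rather than 1, because the two lowest bits hold the DISPATCH_GROUP_HAS_WAITERS and DISPATCH_GROUP_HAS_NOTIFS flags. Whenever dispatch_group_leave runs, a dispatch_group_enter must already have been called, meaning the value field is non-zero (below zero if read as a signed count). That is what a balanced call means.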
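This encoding also explains the count values the debugger printed earlier. The sketch below assumes the group's debug description computes count by negating the value field and shifting right by 2. That is my reading of libdispatch and has not been verified here; debug_count is a hypothetical name. Both printouts in this article follow from the formula: count = 1073741823 after a single unbalanced leave, and count = 0 when the enter loop traps.

```c
#include <assert.h>
#include <stdint.h>

#define GROUP_VALUE_MASK 0xfffffffcu   /* assumed libdispatch value */

/* Assumed derivation of "count": enter subtracts 4 per call, so negate
 * the value field and drop the two flag bits. */
static uint32_t debug_count(uint32_t dg_bits) {
    return (uint32_t)(-(dg_bits & GROUP_VALUE_MASK)) >> 2;
}
```

Under this formula, one outstanding enter (dg_bits = 0xfffffffc) reads as count = 1. A leave on a fresh group (dg_bits = 4) reads as 0x3fffffff.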

Second Analysis

Given the analysis above, it is now clear that the first analysis was wrong.

Let's look again at when downloadURL actually calls finishCompletion:

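- (void)downloadURL:(NSURL *)url finishCompletion:(void(^)(NSURL *URL, BOOL isSuccess))finishCompletion {
    // Assorted logic, if/else checks, etc. (the project code is fairly old)

    if (isDownloading) { // pseudocode: "already downloading", judged from url and downloadItems
        // Corresponding handling
        return;
    }
    
    // Among them is a check for a paused task, roughly:
    DownloadItem *downloadItem = [self downloadItemForURL:url];
    if (downloadItem.isPaused) { // pseudocode: "the item is currently paused"
        // Resume the download
        return;
    }

    // xxxxxx
    // Trigger the actual download:
    // 1. Build an NSURLRequest from url, then an AFDownloadRequestOperation (task)
    // 2. Build a downloadItem for url, hand it finishCompletion, and store it in the downloadItems dictionary
    // 3. Set the completion block, which fetches downloadItem by url and, depending on conditions, calls its finishCompletion
    // 4. Add the operation to the queue to send the request
    [task setCompletionBlockWithSuccess:^(AFHTTPRequestOperation *operation, id responseObject) {
        DownloadItem *downloadItem = [self getDownloadItem:operation.request.URL];
        // Call downloadItem's finishCompletion according to the download state
    } failure:nil]; // failure handling omitted in this excerpt
}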

The code stores each download wrapper object, downloadItem, in the downloadItems dictionary. finishCompletion is the download completion callback passed in from outside.

self.downloadItems[url.absoluteString] = downloadItem; // same key as the lookup below

- (DownloadItem *)getDownloadItem:(NSURL *)url
{
    return self.downloadItems[url.absoluteString];
}

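The code is old, but the flow seemed fine... On reflection, though, logic involving the downloadItems dictionary is the easiest place to bury a trap. Sure enough, after some thought it clicked.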

The problem was indeed in the downloadItems dictionary:

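  1. Suppose a URL comes in. downloadItem A is built with finishCompletion A and stored in downloadItems, and download A is started for the URL. So far everything is normal.
  2. Downloading the same URL again is normally fine, but what about multiple threads? The code does not protect its accesses to the downloadItems dictionary.
  3. With multiple threads, two requests for the same URL can both pass the filtering checks and both start a real download. downloadItem B is built with finishCompletion B and stored in downloadItems, and download B is started for the URL.
  4. In downloadItems, the entry for the URL has now changed from downloadItem A to downloadItem B.
  5. When both downloads finish, each looks up downloadItem by URL and can only get downloadItem B. So download A and download B both run finishCompletion B, which contains a dispatch_group_leave. B's group receives an unbalanced dispatch_group_leave, and the app crashes.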

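With the root cause known, the fix is simple. In the task's completion block (setCompletionBlockWithSuccess), don't fetch downloadItem from the downloadItems dictionary. Capture the local downloadItem variable instead, so each task always gets its own downloadItem.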
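The failure sequence can be replayed deterministically. The C sketch below is hypothetical: toy_group, toy_item, items_slot and the completion functions are invented stand-ins, and a single slot plays the downloadItems entry for the shared URL. I assume the two downloads were started by two different downloadURLs:finishCompletion: calls, each with its own group. With one shared group, two leaves on two enters would happen to balance.

```c
#include <assert.h>

/* Hypothetical stand-ins for the download code. */
typedef struct { int outstanding; int unbalanced_leaves; } toy_group;
typedef struct { toy_group *group; } toy_item;  /* finishCompletion -> leave(group) */

static toy_item *items_slot;  /* downloadItems[url] for the one shared URL */

static void toy_leave(toy_group *g) {
    if (g->outstanding == 0) {
        g->unbalanced_leaves++;  /* libdispatch would crash here */
    } else {
        g->outstanding--;
    }
}

/* downloadURLs enters its group; downloadURL stores the item by URL. */
static void start_download(toy_item *item, toy_group *g) {
    g->outstanding++;
    item->group = g;
    items_slot = item;           /* a second start overwrites the first */
}

/* Buggy completion: looks the item up by URL when the task finishes. */
static void complete_by_lookup(const toy_item *finished) {
    (void)finished;              /* ignored: that is the bug */
    toy_leave(items_slot->group);
}

/* Fixed completion: uses the item captured when the task was created. */
static void complete_captured(const toy_item *finished) {
    toy_leave(finished->group);
}
```

With the lookup version, group B sees one leave too many while group A is never balanced, so A's notify block would never run either. The captured version balances both groups.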

Summary

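Most of Apple's official iOS documentation is very well written. A few parts, such as GCD groups, are too brief, though, and leave you with only a half-understanding. That's when it's time for "show me the code".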

One More Thing

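Now that we understand how GCD group enter/leave works, I trust we won't make this kind of mistake again. One question remains: how exactly does the notification block passed to dispatch_group_notify get executed?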

dispatch_group_t

We haven't looked at the dispatch_group_t struct itself yet.

typedef struct dispatch_group_s *dispatch_group_t;

struct dispatch_group_s {
	DISPATCH_OBJECT_HEADER(group);
	DISPATCH_UNION_LE(uint64_t volatile dg_state,
			uint32_t dg_bits,
			uint32_t dg_gen
	) DISPATCH_ATOMIC64_ALIGN;
	struct dispatch_continuation_s *volatile dg_notify_head;
	struct dispatch_continuation_s *volatile dg_notify_tail;
};

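dg_bits is the field that enter/leave operate on. Through the DISPATCH_UNION_LE union, it is the low 32 bits of the 64-bit dg_state, whose high 32 bits are dg_gen. Other parts of the group machinery use these fields too; for example, _dispatch_group_wake wakes waiters on &dg->dg_gen. The two dispatch_continuation_s pointers, dg_notify_head and dg_notify_tail, belong to the notification blocks. Each notification block is wrapped in a continuation, and the group keeps these continuations in a linked list.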
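The head/tail pair behaves like a FIFO linked list. Below is a single-threaded C sketch of the push side; node, notify_list and list_push are hypothetical names. libdispatch's os_mpsc_push_update_tail uses an atomic exchange so that several threads can push concurrently. Plain stores replace it here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node { struct node *do_next; int id; } node;
typedef struct { node *head; node *tail; } notify_list;  /* ~ dg_notify_head/tail */

/* Append n and return the previous tail; NULL means the list was empty. */
static node *list_push(notify_list *l, node *n) {
    n->do_next = NULL;
    node *prev = l->tail;      /* real code: atomic exchange of the tail */
    l->tail = n;
    if (prev == NULL) {
        l->head = n;           /* empty list: the new node is also the head */
    } else {
        prev->do_next = n;     /* otherwise link it behind the old tail */
    }
    return prev;
}
```

os_mpsc_push_was_empty(prev) tests the returned previous tail in _dispatch_group_notify. Only the first push onto an empty list retains the group and sets DISPATCH_GROUP_HAS_NOTIFS.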

typedef struct dispatch_continuation_s {
	DISPATCH_CONTINUATION_HEADER(continuation);
} *dispatch_continuation_t;

// If dc_flags is less than 0x1000, then the object is a continuation.
// Otherwise, the object has a private layout and memory management rules. The
// layout until after 'do_next' must align with normal objects.
#define DISPATCH_CONTINUATION_HEADER(x) \
	union { \
		const void *do_vtable; \
		uintptr_t dc_flags; \
	}; \
	union { \
		pthread_priority_t dc_priority; \
		int dc_cache_cnt; \
		uintptr_t dc_pad; \
	}; \
	struct voucher_s *dc_voucher; \
	struct dispatch_##x##_s *volatile do_next; \
	dispatch_function_t dc_func; \
	void *dc_ctxt; \
	void *dc_data; \
	void *dc_other

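The dispatch_continuation_t struct doesn't contain much. With almost no comments, though, the definition alone says little about what each field is for.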

dispatch_group_notify

Here is the source of dispatch_group_notify:

DISPATCH_ALWAYS_INLINE
static inline void
_dispatch_group_notify(dispatch_group_t dg, dispatch_queue_t dq,
		dispatch_continuation_t dsn)
{
	uint64_t old_state, new_state;
	dispatch_continuation_t prev;

	dsn->dc_data = dq;
	_dispatch_retain(dq);

	prev = os_mpsc_push_update_tail(os_mpsc(dg, dg_notify), dsn, do_next);
	if (os_mpsc_push_was_empty(prev)) _dispatch_retain(dg);
	os_mpsc_push_update_prev(os_mpsc(dg, dg_notify), prev, dsn, do_next);
	if (os_mpsc_push_was_empty(prev)) {
		os_atomic_rmw_loop2o(dg, dg_state, old_state, new_state, release, {
			new_state = old_state | DISPATCH_GROUP_HAS_NOTIFS;
			if ((uint32_t)old_state == 0) {
				os_atomic_rmw_loop_give_up({
					return _dispatch_group_wake(dg, new_state, false);
				});
			}
		});
	}
}

DISPATCH_NOINLINE
void
dispatch_group_notify_f(dispatch_group_t dg, dispatch_queue_t dq, void *ctxt,
		dispatch_function_t func)
{
	dispatch_continuation_t dsn = _dispatch_continuation_alloc();
	_dispatch_continuation_init_f(dsn, dq, ctxt, func, 0, DC_FLAG_CONSUME);
	_dispatch_group_notify(dg, dq, dsn);
}

#ifdef __BLOCKS__
void
dispatch_group_notify(dispatch_group_t dg, dispatch_queue_t dq,
		dispatch_block_t db)
{
	dispatch_continuation_t dsn = _dispatch_continuation_alloc();
	_dispatch_continuation_init(dsn, dq, db, 0, DC_FLAG_CONSUME);
	_dispatch_group_notify(dg, dq, dsn);
}
#endif

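The notification block is clearly run via _dispatch_group_wake. When the first notify is registered (the list was empty) and the low 32 bits of dg_state are 0, no dispatch_group_enter is outstanding. In that case _dispatch_group_wake is called right away.

dispatch_group_notify calls _dispatch_continuation_init to wrap the dispatch_block_t db in a dispatch_continuation_t dsn. _dispatch_group_notify then appends dsn to the notify list of the dispatch_group_t dg.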
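The immediate-wake condition is easy to misread. Here is a single-threaded C model of the os_atomic_rmw_loop2o block in _dispatch_group_notify, which only runs when the notify list was empty. model_register_first_notify is a hypothetical name, and the flag value is assumed to match libdispatch. Giving up the loop is modeled as returning without storing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GROUP_HAS_NOTIFS 0x0000000000000002ULL  /* assumed libdispatch value */

/* Returns true when the real code would call _dispatch_group_wake at once. */
static bool model_register_first_notify(uint64_t *dg_state) {
    uint64_t old_state = *dg_state;
    uint64_t new_state = old_state | GROUP_HAS_NOTIFS;
    if ((uint32_t)old_state == 0) {
        return true;           /* give up: state is left untouched */
    }
    *dg_state = new_state;     /* a later leave sees HAS_NOTIFS and wakes */
    return false;
}
```

The check looks only at the low 32 bits, i.e. dg_bits. A group whose previous enter/leave round has completed therefore still qualifies: only dg_gen has changed.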

_dispatch_continuation_init

_dispatch_continuation_init performs the various initialization steps for the dispatch_continuation_t object:

DISPATCH_ALWAYS_INLINE
static inline dispatch_qos_t
_dispatch_continuation_init_f(dispatch_continuation_t dc,
		dispatch_queue_class_t dqu, void *ctxt, dispatch_function_t f,
		dispatch_block_flags_t flags, uintptr_t dc_flags)
{
	pthread_priority_t pp = 0;
	dc->dc_flags = dc_flags | DC_FLAG_ALLOCATED;
	dc->dc_func = f;
	dc->dc_ctxt = ctxt;
	// in this context DISPATCH_BLOCK_HAS_PRIORITY means that the priority
	// should not be propagated, only taken from the handler if it has one
	if (!(flags & DISPATCH_BLOCK_HAS_PRIORITY)) {
		pp = _dispatch_priority_propagate();
	}
	_dispatch_continuation_voucher_set(dc, flags);
	return _dispatch_continuation_priority_set(dc, dqu, pp, flags);
}

DISPATCH_ALWAYS_INLINE
static inline dispatch_qos_t
_dispatch_continuation_init(dispatch_continuation_t dc,
		dispatch_queue_class_t dqu, dispatch_block_t work,
		dispatch_block_flags_t flags, uintptr_t dc_flags)
{
	void *ctxt = _dispatch_Block_copy(work);

	dc_flags |= DC_FLAG_BLOCK | DC_FLAG_ALLOCATED;
	if (unlikely(_dispatch_block_has_private_data(work))) {
		dc->dc_flags = dc_flags;
		dc->dc_ctxt = ctxt;
		// will initialize all fields but requires dc_flags & dc_ctxt to be set
		return _dispatch_continuation_init_slow(dc, dqu, flags);
	}

	dispatch_function_t func = _dispatch_Block_invoke(work);
	if (dc_flags & DC_FLAG_CONSUME) {
		func = _dispatch_call_block_and_release;
	}
	return _dispatch_continuation_init_f(dc, dqu, ctxt, func, flags, dc_flags);
}

DISPATCH_ALWAYS_INLINE
static inline dispatch_qos_t
_dispatch_continuation_priority_set(dispatch_continuation_t dc,
		dispatch_queue_class_t dqu,
		pthread_priority_t pp, dispatch_block_flags_t flags)
{
	dispatch_qos_t qos = DISPATCH_QOS_UNSPECIFIED;
#if HAVE_PTHREAD_WORKQUEUE_QOS
	dispatch_queue_t dq = dqu._dq;

	if (likely(pp)) {
		bool enforce = (flags & DISPATCH_BLOCK_ENFORCE_QOS_CLASS);
		bool is_floor = (dq->dq_priority & DISPATCH_PRIORITY_FLAG_FLOOR);
		bool dq_has_qos = (dq->dq_priority & DISPATCH_PRIORITY_REQUESTED_MASK);
		if (enforce) {
			pp |= _PTHREAD_PRIORITY_ENFORCE_FLAG;
			qos = _dispatch_qos_from_pp_unsafe(pp);
		} else if (!is_floor && dq_has_qos) {
			pp = 0;
		} else {
			qos = _dispatch_qos_from_pp_unsafe(pp);
		}
	}
	dc->dc_priority = pp;
#else
	(void)dc; (void)dqu; (void)pp; (void)flags;
#endif
	return qos;
}

Note that in _dispatch_continuation_init, the parameter dispatch_block_t work is the notification block that was passed in.

void *ctxt = _dispatch_Block_copy(work);
// xxxxxx
dc->dc_ctxt = ctxt;

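So the notification block is copied with _dispatch_Block_copy, and the copy is stored in the dc_ctxt field of the dispatch_continuation_t dc. dispatch_group_notify passes DC_FLAG_CONSUME, so dc_func is set to _dispatch_call_block_and_release. That function runs the block and then releases the copy.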
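In plain C terms, a continuation is a function pointer plus its argument. The sketch below mimics that pairing with a heap-allocated closure instead of a Block. toy_continuation, toy_closure and the helper functions are hypothetical. call_closure_and_release plays the role of _dispatch_call_block_and_release: invoke, then release the copy.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*toy_function_t)(void *);

/* Only the two fields that matter here: dc_func is called with dc_ctxt. */
typedef struct {
    toy_function_t dc_func;
    void *dc_ctxt;
} toy_continuation;

/* Hypothetical heap "closure" standing in for a copied Block. */
typedef struct { int *counter; } toy_closure;

/* Like _dispatch_call_block_and_release: run the work, then free the copy. */
static void call_closure_and_release(void *ctxt) {
    toy_closure *c = ctxt;
    (*c->counter)++;
    free(c);
}

/* Like _dispatch_continuation_init with DC_FLAG_CONSUME set. */
static void toy_continuation_init(toy_continuation *dc, int *counter) {
    toy_closure *copy = malloc(sizeof *copy);   /* ~ _dispatch_Block_copy */
    if (copy == NULL) abort();
    copy->counter = counter;
    dc->dc_ctxt = copy;
    dc->dc_func = call_closure_and_release;
}

/* Invoke the stored function with its stored context. */
static void toy_continuation_invoke(const toy_continuation *dc) {
    dc->dc_func(dc->dc_ctxt);
}
```

Because the release happens inside the invoked function, each continuation built this way must be invoked exactly once. That matches the "consume" meaning of the flag.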

_dispatch_group_wake

DISPATCH_NOINLINE
static void
_dispatch_group_wake(dispatch_group_t dg, uint64_t dg_state, bool needs_release)
{
	uint16_t refs = needs_release ? 1 : 0; // <rdar://problem/22318411>

	if (dg_state & DISPATCH_GROUP_HAS_NOTIFS) {
		dispatch_continuation_t dc, next_dc, tail;

		// Snapshot before anything is notified/woken <rdar://problem/8554546>
		dc = os_mpsc_capture_snapshot(os_mpsc(dg, dg_notify), &tail);
		do {
			dispatch_queue_t dsn_queue = (dispatch_queue_t)dc->dc_data;
			next_dc = os_mpsc_pop_snapshot_head(dc, tail, do_next);
			_dispatch_continuation_async(dsn_queue, dc,
					_dispatch_qos_from_pp(dc->dc_priority), dc->dc_flags);
			_dispatch_release(dsn_queue);
		} while ((dc = next_dc));

		refs++;
	}

	if (dg_state & DISPATCH_GROUP_HAS_WAITERS) {
		_dispatch_wake_by_address(&dg->dg_gen);
	}

	if (refs) _dispatch_release_n(dg, refs);
}

#define os_mpsc_capture_snapshot(Q, tail)  ({ \
		os_mpsc_node_type(Q) _head = os_mpsc_get_head(Q); \
		os_atomic_store(_os_mpsc_head Q, NULL, relaxed); \
		/* 22708742: set tail to NULL with release, so that NULL write */ \
		/* to head above doesn't clobber head from concurrent enqueuer */ \
		*(tail) = os_atomic_xchg(_os_mpsc_tail Q, NULL, release); \
		_head; \
	})

#define os_mpsc_pop_snapshot_head(head, tail, _o_next) ({ \
		typeof(head) _head = (head), _tail = (tail), _n = NULL; \
		if (_head != _tail) _n = os_mpsc_get_next(_head, _o_next); \
		_n; \
	})

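The definition of os_mpsc_pop_snapshot_head and the line next_dc = os_mpsc_pop_snapshot_head(dc, tail, do_next); show the core logic of _dispatch_group_wake. First, os_mpsc_capture_snapshot empties the group's notify list and returns its head and tail. The function then walks that snapshot from the head to the captured tail and calls _dispatch_continuation_async on each dispatch_continuation_t dc. That call is what actually triggers the notification block: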
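The snapshot walk can be sketched in single-threaded C. node, notify_list, capture_snapshot, pop_snapshot_head and drain are hypothetical names, and plain stores replace the atomics. Recording each visited id stands in for _dispatch_continuation_async.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node { struct node *do_next; int id; } node;
typedef struct { node *head; node *tail; } notify_list;

/* ~ os_mpsc_capture_snapshot: take the whole list, leave it empty. */
static node *capture_snapshot(notify_list *l, node **tail) {
    node *head = l->head;
    l->head = NULL;
    *tail = l->tail;           /* libdispatch: atomic exchange with release */
    l->tail = NULL;
    return head;
}

/* ~ os_mpsc_pop_snapshot_head: the captured tail ends this snapshot,
 * so its do_next is never followed. */
static node *pop_snapshot_head(node *head, node *tail) {
    return head != tail ? head->do_next : NULL;
}

/* Mirrors the do/while loop in _dispatch_group_wake; records visit order. */
static int drain(notify_list *l, int *order, int cap) {
    node *tail = NULL;
    node *dc = capture_snapshot(l, &tail);
    node *next;
    int n = 0;
    if (dc == NULL) return 0;  /* real code only drains if HAS_NOTIFS is set */
    do {
        next = pop_snapshot_head(dc, tail);
        if (n < cap) order[n] = dc->id;
        n++;
    } while ((dc = next));
    return n;
}
```

Notification blocks are therefore handed to their queues in the order they were registered. The group's list is left empty for the next round.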

_dispatch_continuation_async(dsn_queue, dc,
					_dispatch_qos_from_pp(dc->dc_priority), dc->dc_flags);

_dispatch_continuation_async

DISPATCH_ALWAYS_INLINE
static inline void
_dispatch_continuation_async(dispatch_queue_class_t dqu,
		dispatch_continuation_t dc, dispatch_qos_t qos, uintptr_t dc_flags)
{
#if DISPATCH_INTROSPECTION
	if (!(dc_flags & DC_FLAG_NO_INTROSPECTION)) {
		_dispatch_trace_item_push(dqu, dc);
	}
#else
	(void)dc_flags;
#endif
	return dx_push(dqu._dq, dc, qos);
}

Now consider dx_push(dqu._dq, dc, qos):

#define dx_push(x, y, z) dx_vtable(x)->dq_push(x, y, z)
#define dx_vtable(x) (&(x)->do_vtable->_os_obj_vtable)

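What is this do_vtable? Note that x here is dqu._dq, the target queue, not the continuation, so dx_vtable reads the queue object's do_vtable. Objects and continuations share the same leading layout; the comment above DISPATCH_CONTINUATION_HEADER requires everything up to do_next to align with normal objects. So the field sits in the same first slot that the continuation header uses: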

union { \
		const void *do_vtable; \
		uintptr_t dc_flags; \
	}; \
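
For queue objects, the vtable type is declared with DISPATCH_QUEUE_VTABLE_HEADER, whose dq_push slot is what dx_push calls:
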
#define DISPATCH_QUEUE_VTABLE_HEADER(x); \
	DISPATCH_OBJECT_VTABLE_HEADER(x); \
	void (*const dq_activate)(dispatch_queue_class_t, bool *allow_resume); \
	void (*const dq_wakeup)(dispatch_queue_class_t, dispatch_qos_t, \
			dispatch_wakeup_flags_t); \
	void (*const dq_push)(dispatch_queue_class_t, dispatch_object_t, \
			dispatch_qos_t)

So when _dispatch_group_wake runs, it pushes (dx_push) each notification block onto the queue specified in dispatch_group_notify. That completes the full life cycle of a GCD group.
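The dx_push indirection is ordinary C polymorphism through a function table. The sketch below is a hypothetical miniature: toy_queue, toy_dx_push and the two push functions are invented. It shows that the target queue's vtable chooses the push implementation, not the item being pushed.

```c
#include <assert.h>

enum { KIND_NONE, KIND_MAIN, KIND_WORKLOOP };

struct toy_queue;
/* Per-type function table, like the dq_push slot in the queue vtable. */
typedef struct {
    void (*dq_push)(struct toy_queue *q, int item);
} toy_queue_vtable;

/* Every queue object starts with a pointer to its type's vtable. */
typedef struct toy_queue {
    const toy_queue_vtable *do_vtable;
    int items[8];
    int count;
    int last_push_kind;  /* which push implementation ran last */
} toy_queue;

/* Same shape as: #define dx_push(x, y, z) dx_vtable(x)->dq_push(x, y, z) */
#define toy_dx_push(q, item) ((q)->do_vtable->dq_push((q), (item)))

static void append(toy_queue *q, int item) {
    if (q->count < 8) q->items[q->count++] = item;
}
static void main_queue_push(toy_queue *q, int item) {
    q->last_push_kind = KIND_MAIN;
    append(q, item);
}
static void workloop_push(toy_queue *q, int item) {
    q->last_push_kind = KIND_WORKLOOP;
    append(q, item);
}

static const toy_queue_vtable main_vtable = { .dq_push = main_queue_push };
static const toy_queue_vtable workloop_vtable = { .dq_push = workloop_push };
```

The same toy_dx_push call site reaches different code depending on the queue it is given. The same holds for the dx_push in _dispatch_continuation_async.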

_dispatch_workloop_push

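As for #define dx_push(x, y, z) dx_vtable(x)->dq_push(x, y, z): for workloops, the DISPATCH_VTABLE_INSTANCE macro binds dq_push to _dispatch_workloop_push. Other queue types install their own dq_push in their vtables. For instance, the main queue targeted by the demos in this article uses _dispatch_main_queue_push.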

DISPATCH_VTABLE_INSTANCE(workloop,
	.do_type        = DISPATCH_WORKLOOP_TYPE,
	.do_dispose     = _dispatch_workloop_dispose,
	.do_debug       = _dispatch_queue_debug,
	.do_invoke      = _dispatch_workloop_invoke,

	.dq_activate    = _dispatch_queue_no_activate,
	.dq_wakeup      = _dispatch_workloop_wakeup,
	.dq_push        = _dispatch_workloop_push,
);

The definition of _dispatch_workloop_push:

void
_dispatch_workloop_push(dispatch_workloop_t dwl, dispatch_object_t dou,
		dispatch_qos_t qos)
{
	struct dispatch_object_s *prev;

	if (unlikely(_dispatch_object_is_waiter(dou))) {
		return _dispatch_workloop_push_waiter(dwl, dou._dsc, qos);
	}

	if (qos < _dispatch_priority_qos(dwl->dq_priority)) {
		qos = _dispatch_priority_qos(dwl->dq_priority);
	}
	if (qos == DISPATCH_QOS_UNSPECIFIED) {
		qos = _dispatch_priority_fallback_qos(dwl->dq_priority);
	}
	prev = _dispatch_workloop_push_update_tail(dwl, qos, dou._do);
	if (unlikely(os_mpsc_push_was_empty(prev))) {
		_dispatch_retain_2_unsafe(dwl);
	}
	_dispatch_workloop_push_update_prev(dwl, qos, prev, dou._do);
	if (unlikely(os_mpsc_push_was_empty(prev))) {
		return _dispatch_workloop_wakeup(dwl, qos, DISPATCH_WAKEUP_CONSUME_2 |
				DISPATCH_WAKEUP_MAKE_DIRTY);
	}
}

For a workloop target, the earlier dx_push(dqu._dq, dc, qos); is therefore equivalent to _dispatch_workloop_push(dqu._dq, dc, qos);.

Calling _dispatch_workloop_push enqueues the dispatch_continuation_t dc on the queue dqu._dq and passes along the qos value.

How the blocks in a queue actually get executed calls for more digging in the GCD source. I'll leave that open for now and come back to it later!
