Pthreads Parallel Programming: Spin Lock vs. Mutex Performance Comparison

Posted by 工程師WWW on 2013-11-27

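POSIX threads (Pthreads for short) is a widely used API for parallel programming on multicore platforms. Thread synchronization is a very important means of coordination in parallel programming. Its most typical use is protecting a critical section shared by multiple threads with one of the locks Pthreads provides (another common synchronization mechanism is the barrier).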

Pthreads provides several locking and synchronization mechanisms:
(1) Mutex: pthread_mutex_***
(2) Spin lock: pthread_spin_***
(3) Condition Variable: pthread_cond_***
(4) Read/Write lock: pthread_rwlock_***

The main mutex lock operations provided by Pthreads are:
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_trylock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);

The main spin lock operations provided by Pthreads are:
int pthread_spin_lock(pthread_spinlock_t *lock);
int pthread_spin_trylock(pthread_spinlock_t *lock);
int pthread_spin_unlock(pthread_spinlock_t *lock);
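
The trylock variant in each list is the non-blocking form. It neither sleeps nor spins. If the lock is already held, it returns EBUSY at once. The sketch below is minimal and assumes Linux with glibc. trylock_demo() is a hypothetical helper that checks this behaviour for both lock types from a single thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: returns 0 if both trylock variants behave as described. */
int trylock_demo(void)
{
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_spinlock_t spin;

    /* Spin locks have no static initializer and must be initialised at run time.
       The article passes the literal 0, which is PTHREAD_PROCESS_PRIVATE in glibc. */
    if (pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE) != 0)
        return -1;

    /* Mutex: while it is held, trylock fails with EBUSY instead of blocking. */
    pthread_mutex_lock(&mutex);
    int mutex_busy = pthread_mutex_trylock(&mutex);
    pthread_mutex_unlock(&mutex);
    int mutex_free = pthread_mutex_trylock(&mutex); /* lock is free: returns 0 */
    pthread_mutex_unlock(&mutex);

    /* Spin lock: same pattern with the spin lock API. */
    pthread_spin_lock(&spin);
    int spin_busy = pthread_spin_trylock(&spin);
    pthread_spin_unlock(&spin);
    int spin_free = pthread_spin_trylock(&spin);
    pthread_spin_unlock(&spin);

    pthread_spin_destroy(&spin);
    pthread_mutex_destroy(&mutex);

    return (mutex_busy == EBUSY && mutex_free == 0 &&
            spin_busy == EBUSY && spin_free == 0) ? 0 : 1;
}
```

trylock_demo() returns 0 when both conditions hold for both lock types. A trylock on a held lock must report EBUSY, and a trylock on a free lock must succeed.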

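In terms of implementation, a mutex is a sleep-waiting lock. Take a dual-core machine running two threads, A and B, on Core0 and Core1 respectively. Suppose thread A calls pthread_mutex_lock to acquire the lock for a critical section while thread B holds it. Thread A then blocks: Core0 performs a context switch and puts thread A on a wait queue. Core0 is then free to run other work (for example another thread C) instead of busy-waiting. A spin lock is different because it is a busy-waiting lock. If thread A requests the lock with pthread_spin_lock, it keeps busy-waiting on Core0 and retries the request over and over until it gets the lock.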
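
A toy test-and-set spin lock built on C11 atomic_flag makes the busy-waiting behaviour concrete. This is a minimal sketch and is not NPTL's pthread_spin_lock(). The names toy_spinlock, toy_spin_lock(), toy_spin_unlock() and count_with_spinlock() are hypothetical. A thread whose test-and-set fails simply retries in a loop, so its core stays busy until the holder clears the flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Toy test-and-set spin lock (hypothetical, for illustration only). */
typedef struct { atomic_flag held; } toy_spinlock;

static void toy_spin_lock(toy_spinlock *l)
{
    /* Busy-wait: test-and-set returns the previous value, so keep retrying
       until it returns false, i.e. until this thread flipped clear -> set. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin: the core does no useful work while waiting */
}

static void toy_spin_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static toy_spinlock g_lock = { ATOMIC_FLAG_INIT };
static long g_counter;

static void *worker(void *arg)
{
    long iterations = *(const long *)arg;
    for (long i = 0; i < iterations; i++) {
        toy_spin_lock(&g_lock);
        g_counter++;              /* critical section: a single increment */
        toy_spin_unlock(&g_lock);
    }
    return NULL;
}

/* Two threads each increment g_counter `iterations` times under the toy lock. */
long count_with_spinlock(long iterations)
{
    pthread_t t1, t2;
    g_counter = 0;
    pthread_create(&t1, NULL, worker, &iterations);
    pthread_create(&t2, NULL, worker, &iterations);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return g_counter;
}
```

count_with_spinlock(100000) runs two threads that each increment a shared counter 100,000 times, so it should return 200000. Without the lock, lost updates would usually make the result smaller.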

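NPTL (Native POSIX Thread Library) is the glibc implementation of the Pthreads API on Linux. The command "getconf GNU_LIBPTHREAD_VERSION" prints the NPTL version installed on your system. Reading the NPTL source shows what happens when pthread_mutex_lock() fails to acquire the lock: the calling thread is put on that mutex's wait queue and goes to sleep through a system call. Current NPTL builds this on futexes (fast user-space mutexes), which has improved performance considerably. An uncontended lock or unlock is a single atomic operation in user space. Only a contended operation enters the kernel, through the futex() system call (FUTEX_WAIT to sleep, FUTEX_WAKE to wake a waiter).

A spin lock, by contrast, can be thought of as a lock operation written in inline assembly inside a while(1) loop. If I remember correctly, a paper I once read said that in the Linux kernel acquiring a spin lock takes only two CPU instructions and releasing it only one. If you are interested, look at the Pthreads implementation in another kernel, the sanos microkernel: mutex.c and spinlock.c. The code differs from NPTL, but it is simple and easy to follow, which makes it quite helpful for understanding how spin locks and mutexes behave.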
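
A raw futex can illustrate the idea of "fast path in user space, system call only on contention". The sketch follows the well-known three-state mutex from Ulrich Drepper's paper "Futexes Are Tricky". It is an illustration only, not glibc's actual pthread_mutex_lock() code. The names toy_mutex, futex(), toy_mutex_lock(), toy_mutex_unlock() and count_with_toy_mutex() are hypothetical. It works on Linux only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* state: 0 = unlocked, 1 = locked with no waiters, 2 = locked, waiters possible */
typedef struct { atomic_int state; } toy_mutex;

/* Thin wrapper over the raw futex system call (glibc has no futex() function). */
static long futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, (int *)addr, op, val, NULL, NULL, 0);
}

static void toy_mutex_lock(toy_mutex *m)
{
    int c = 0;
    /* Fast path: 0 -> 1 entirely in user space, no system call. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark "waiters possible" (2), then sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex(&m->state, FUTEX_WAIT_PRIVATE, 2); /* sleeps only while state == 2 */
        c = atomic_exchange(&m->state, 2);
    }
}

static void toy_mutex_unlock(toy_mutex *m)
{
    /* Old value 1 means nobody waits: released without a system call. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        futex(&m->state, FUTEX_WAKE_PRIVATE, 1); /* wake one sleeping thread */
    }
}

static toy_mutex g_lock; /* static storage is zero-initialised: unlocked */
static long g_counter;

static void *worker(void *arg)
{
    long iterations = *(const long *)arg;
    for (long i = 0; i < iterations; i++) {
        toy_mutex_lock(&g_lock);
        g_counter++;
        toy_mutex_unlock(&g_lock);
    }
    return NULL;
}

/* Two threads each increment g_counter `iterations` times under the toy mutex. */
long count_with_toy_mutex(long iterations)
{
    pthread_t t1, t2;
    g_counter = 0;
    pthread_create(&t1, NULL, worker, &iterations);
    pthread_create(&t2, NULL, worker, &iterations);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return g_counter;
}
```

The waiting thread spends its time asleep inside FUTEX_WAIT rather than looping on the CPU. This is the sleep-waiting behaviour described above.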

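So in real programs, which performs better, a mutex or a spin lock? Spin locks are used very widely in the Linux kernel. Does that mean spin locks perform better? Let's test it with real code (make sure a recent g++ is installed on your system).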

// Name: spinlockvsmutex1.cc
// Source: http://www.alexonlinux.com/pthread-mutex-vs-pthread-spinlock
// Compile (spin lock version): g++ -o spin_version -DUSE_SPINLOCK spinlockvsmutex1.cc -lpthread
// Compile (mutex version): g++ -o mutex_version spinlockvsmutex1.cc -lpthread

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <errno.h>
#include <sys/time.h>
#include <list>
#include <pthread.h>

#define LOOPS 50000000

using namespace std;

list<int> the_list;

#ifdef USE_SPINLOCK
pthread_spinlock_t spinlock;
#else
pthread_mutex_t mutex;
#endif

// Get the kernel thread id. The function is named get_tid() rather than gettid():
// glibc 2.30+ declares its own gettid() in <unistd.h>, and g++ defines _GNU_SOURCE
// by default, so a second gettid() definition would fail to compile there.
pid_t get_tid() { return syscall(__NR_gettid); }

// Each consumer pops elements from the shared list until the list is empty.
void *consumer(void *ptr)
{
    int i;

    printf("Consumer TID %lu\n", (unsigned long)get_tid());

    while (1)
    {
#ifdef USE_SPINLOCK
        pthread_spin_lock(&spinlock);
#else
        pthread_mutex_lock(&mutex);
#endif

        if (the_list.empty())
        {
#ifdef USE_SPINLOCK
            pthread_spin_unlock(&spinlock);
#else
            pthread_mutex_unlock(&mutex);
#endif
            break;
        }

        i = the_list.front();
        the_list.pop_front();

#ifdef USE_SPINLOCK
        pthread_spin_unlock(&spinlock);
#else
        pthread_mutex_unlock(&mutex);
#endif
    }

    return NULL;
}

int main()
{
    int i;
    pthread_t thr1, thr2;
    struct timeval tv1, tv2;

#ifdef USE_SPINLOCK
    pthread_spin_init(&spinlock, 0);
#else
    pthread_mutex_init(&mutex, NULL);
#endif

    // Creating the list content...
    for (i = 0; i < LOOPS; i++)
        the_list.push_back(i);

    // Measuring time before starting the threads...
    gettimeofday(&tv1, NULL);

    pthread_create(&thr1, NULL, consumer, NULL);
    pthread_create(&thr2, NULL, consumer, NULL);

    pthread_join(thr1, NULL);
    pthread_join(thr2, NULL);

    // Measuring time after threads finished...
    gettimeofday(&tv2, NULL);

    // Borrow one second if the microsecond difference would be negative.
    if (tv1.tv_usec > tv2.tv_usec)
    {
        tv2.tv_sec--;
        tv2.tv_usec += 1000000;
    }

    // %06ld keeps the leading zeros of the microsecond part (5.088750, not 5.88750).
    printf("Result - %ld.%06ld\n", (long)(tv2.tv_sec - tv1.tv_sec),
           (long)(tv2.tv_usec - tv1.tv_usec));

#ifdef USE_SPINLOCK
    pthread_spin_destroy(&spinlock);
#else
    pthread_mutex_destroy(&mutex);
#endif

    return 0;
}

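The program works as follows. The main thread first initializes a list and inserts LOOPS entries into it. It then creates two new threads, both running consumer(), which pop elements from the list concurrently. The main thread measures the time from creating the two threads until both have finished and prints it as the "Result" shown below.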
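
The "borrow one second" arithmetic at the end of main() can be avoided altogether. The minimal sketch below assumes POSIX clock_gettime() is available. It measures with CLOCK_MONOTONIC, which wall-clock adjustments do not affect, and converts the difference to a double. elapsed_seconds() and time_it() are hypothetical helpers and are not part of the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical helper: seconds elapsed between two timespec readings.
   A negative nanosecond difference is handled by the floating-point sum. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical wrapper mirroring the gettimeofday() pair in main():
   times a callback that creates and joins the consumer threads. */
double time_it(void (*work)(void))
{
    struct timespec t1, t2;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    work();
    clock_gettime(CLOCK_MONOTONIC, &t2);
    return elapsed_seconds(t1, t2);
}
```

The result can be printed with printf("Result - %.6f\n", seconds). This always gives six decimal places, with no manual carry.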

Test machine:
Ubuntu 9.04 X86_64
Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
4.0 GB Memory

The test results are as follows:

gchen@gchen-desktop:~/Workspace/mutex$ g++ -o spin_version -DUSE_SPINLOCK spinvsmutex1.cc -lpthread
gchen@gchen-desktop:~/Workspace/mutex$ g++ -o mutex_version spinvsmutex1.cc -lpthread
gchen@gchen-desktop:~/Workspace/mutex$ time ./spin_version
Consumer TID 5520
Consumer TID 5521
Result - 5.888750

real    0m10.918s
user    0m15.601s
sys     0m0.804s

gchen@gchen-desktop:~/Workspace/mutex$ time ./mutex_version
Consumer TID 5691
Consumer TID 5692
Result - 9.116376

real    0m14.031s
user    0m12.245s
sys     0m4.368s

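The spin lock version performs better in this program. The sys time is also worth noting. The mutex version spent far more time in the kernel (4.368 s vs. 0.804 s). On lock contention, a mutex puts the waiting thread to sleep through a system call (a futex wait in modern NPTL), and the holder must later wake it up.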
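
The program can also read the user/sys split that the shell's time command reports, using getrusage(). It can read the voluntary context-switch counter ru_nvcsw as well, which should grow when threads block on a contended mutex. This is a minimal sketch assuming a POSIX system (ru_nvcsw is maintained on Linux). cpu_usage() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: CPU time used so far by the whole process (all threads),
   split into user-mode and kernel-mode seconds, plus the number of voluntary
   context switches (a thread giving up the CPU, e.g. to sleep on a lock). */
int cpu_usage(double *user_sec, double *sys_sec, long *voluntary_switches)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user_sec = (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
    *sys_sec  = (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
    *voluntary_switches = ru.ru_nvcsw;
    return 0;
}
```

Call it once before pthread_create() and once after pthread_join(), then subtract. The difference covers only the consumer phase and leaves out the list-building time, which time's "real" and "user" figures also include.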

Does this mean spin locks are always better? Let's look at another example program, this one with extremely heavy lock contention:

// Name: svm2.c
// Source: http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Locks
// Compile (spin lock version): gcc -o spin -DUSE_SPINLOCK svm2.c -lpthread
// Compile (mutex version): gcc -o mutex svm2.c -lpthread

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/syscall.h>

#define THREAD_NUM 2

pthread_t g_thread[THREAD_NUM];

#ifdef USE_SPINLOCK
pthread_spinlock_t g_spin;
#else
pthread_mutex_t g_mutex;
#endif

uint64_t g_count;

// Kernel thread id (named get_tid() to avoid clashing with glibc 2.30+'s gettid()).
pid_t get_tid(void)
{
    return syscall(SYS_gettid);
}

// Each thread takes the lock 10,000 times and holds it for 100,000 increments
// each time, so the critical section is very long and contention is heavy.
void *run_amuck(void *arg)
{
    int i, j;

    printf("Thread %lu started.\n", (unsigned long)get_tid());

    for (i = 0; i < 10000; i++) {
#ifdef USE_SPINLOCK
        pthread_spin_lock(&g_spin);
#else
        pthread_mutex_lock(&g_mutex);
#endif
        for (j = 0; j < 100000; j++) {
            if (g_count++ == 123456789)
                printf("Thread %lu wins!\n", (unsigned long)get_tid());
        }
#ifdef USE_SPINLOCK
        pthread_spin_unlock(&g_spin);
#else
        pthread_mutex_unlock(&g_mutex);
#endif
    }

    printf("Thread %lu finished!\n", (unsigned long)get_tid());

    return (NULL);
}

int main(int argc, char *argv[])
{
    int i, threads = THREAD_NUM;

    printf("Creating %d threads...\n", threads);

#ifdef USE_SPINLOCK
    pthread_spin_init(&g_spin, 0);
#else
    pthread_mutex_init(&g_mutex, NULL);
#endif

    // Cast through intptr_t so converting the int index to a pointer is well-defined.
    for (i = 0; i < threads; i++)
        pthread_create(&g_thread[i], NULL, run_amuck, (void *)(intptr_t)i);

    for (i = 0; i < threads; i++)
        pthread_join(g_thread[i], NULL);

    printf("Done.\n");

    return (0);
}

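This program has a very large critical section, so the two threads contend for the lock very heavily. This is of course an extreme case. In real applications, critical sections are not this large and contention is not this fierce. The test results show that the mutex version performs better: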

gchen@gchen-desktop:~/Workspace/mutex$ time ./spin
Creating 2 threads...
Thread 31796 started.
Thread 31797 started.
Thread 31797 wins!
Thread 31797 finished!
Thread 31796 finished!
Done.

real    0m5.748s
user    0m10.257s
sys     0m0.004s

gchen@gchen-desktop:~/Workspace/mutex$ time ./mutex
Creating 2 threads...
Thread 31801 started.
Thread 31802 started.
Thread 31802 wins!
Thread 31802 finished!
Thread 31801 finished!
Done.

real    0m4.823s
user    0m4.772s
sys     0m0.032s

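Another detail worth noting is that the spin lock version consumed much more user time (10.257 s vs. 4.772 s). The two threads run on two cores, and most of the time only one of them can hold the lock. The other thread busy-waits on its core the whole time, keeping CPU usage at 100%. A mutex behaves differently: when a lock request fails, a context switch occurs, which frees that core for other work. This context switching also costs something for the thread that already holds the lock. When it releases the lock, it has to ask the operating system to wake the blocked threads, which is extra overhead.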
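
svm2.c is deliberately extreme because each thread holds the lock for 100,000 iterations at a time. As noted above, real critical sections are usually far shorter. When the shared update is a simple sum, one common way to shorten the critical section is to do the work on a thread-local variable and take the lock only once to publish the result. The sketch below makes that assumption. run_local() and run_local_then_merge() are hypothetical names. The approach also changes the program's semantics: increments no longer pass through the shared counter one by one, so svm2.c's "wins" check does not carry over.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define THREAD_NUM 2 /* same thread count as svm2.c */

static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint64_t g_count;

static void *run_local(void *arg)
{
    long iterations = *(const long *)arg;
    uint64_t local = 0;

    /* The long loop works on a thread-local counter with no lock held... */
    for (long i = 0; i < iterations; i++)
        local++;

    /* ...and the shared counter is updated in one very short critical section. */
    pthread_mutex_lock(&g_mutex);
    g_count += local;
    pthread_mutex_unlock(&g_mutex);
    return NULL;
}

/* Hypothetical driver: returns the merged total from THREAD_NUM threads. */
uint64_t run_local_then_merge(long iterations_per_thread)
{
    pthread_t t[THREAD_NUM];
    g_count = 0;
    for (int i = 0; i < THREAD_NUM; i++)
        pthread_create(&t[i], NULL, run_local, &iterations_per_thread);
    for (int i = 0; i < THREAD_NUM; i++)
        pthread_join(t[i], NULL);
    return g_count;
}
```

Each thread now takes the lock once instead of 10,000 times. With contention this low, the choice between a spin lock and a mutex matters much less.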

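Summary
(1) A mutex suits scenarios where the lock is heavily contended or held for a long time, and it adapts better in general. It costs more than a spin lock, mainly because of context switches. In exchange, it copes with the complex scenarios of real development and offers more flexibility while keeping reasonable performance.

(2) A spin lock's lock/unlock operations are cheaper because they take fewer CPU instructions, but spin locks suit only critical sections that execute very quickly. In real software development, using spin locks is not a good idea unless the programmer understands the program's locking behavior very well. A multithreaded program typically performs tens of thousands of lock operations. If too many of them are contended lock requests, a lot of time is wasted busy-waiting.

(3) A safer approach may be to use a mutex first, conservatively, and then try tuning with spin locks if more performance is needed. After all, our programs do not have performance requirements as demanding as the Linux kernel's (the most common locks in the Linux kernel are spin locks and rw locks).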
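
For the tuning step in (3), glibc also offers a middle ground that keeps the mutex API. The non-portable mutex type PTHREAD_MUTEX_ADAPTIVE_NP spins for a limited number of attempts before it falls back to sleeping in the kernel. This is a minimal sketch that assumes glibc. The _NP suffix marks the type as non-portable, so the code will not build against other C libraries. init_adaptive_mutex() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: initialise *m as a glibc adaptive mutex.
   Returns 0 on success, otherwise the pthread error code. */
int init_adaptive_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr); /* attr is no longer needed after init */
    return rc;
}
```

In the mutex build of spinlockvsmutex1.cc, this call would replace pthread_mutex_init(&mutex, NULL). The lock and unlock calls stay the same. Whether it beats either plain version on a given workload has to be measured.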

Addendum, March 3, 2010: Oracle's (Berkeley DB) documentation supports this view:

During configuration, Berkeley DB selects a mutex implementation for the architecture. Berkeley DB normally prefers blocking-mutex implementations over non-blocking ones. For example, Berkeley DB will select POSIX pthread mutex interfaces rather than assembly-code test-and-set spin mutexes because pthread mutexes are usually more efficient and less likely to waste CPU cycles spinning without getting any work accomplished.

P.S. Both syscall(SYS_gettid) and syscall(__NR_gettid) return the current thread's ID :)
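
Both calls work because glibc's <sys/syscall.h> defines SYS_gettid as an alias for the kernel's __NR_gettid. Since glibc 2.30 there is also a gettid() wrapper in <unistd.h>, declared when _GNU_SOURCE is defined (g++ defines it by default). Defining your own function named gettid() can therefore clash with glibc's declaration on newer systems. The minimal sketch below sidesteps the name. current_tid() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread id via the raw system call; works with any glibc version. */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In the main thread, the kernel thread ID equals the process ID, so current_tid() == getpid() there. In the consumer threads of the first program, the two values differ.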
