MySQL案例-不同尋常的[ERROR]Can't create a new thread (errno 11)

wangwenan6發表於2017-09-20
-------------------------------------------------------------------------------------------------正文---------------------------------------------------------------------------------------------------------------

場景:
MySQL-5.7.17, 程式端報異常

點選(此處)摺疊或開啟

  1. OperationalError: (1135, "Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug")

結論:
肯定不是files open limit和innodb_open_files的問題~
PS: 是的話, 就沒有這篇部落格了~
先賣個關子~\(≧▽≦)/~


不同尋常的地方:
程式在建立約32300+的資料庫連線之後, 必定會出現連線異常, 清理掉部分連線以後, 會恢復正常, 但是再次到達32300+的數量之後, 問題還是會出現;

在測試環境必現, 5.7.17和5.7.19都有這個問題;



分析:
首先考慮到的就是files open limit和innodb_open_files的問題, 但是試過了, 都沒有用;

把記憶體相關的設定, files open之類的設定進行調整之後, 這個問題依然沒有解決, 感覺問題可能並沒有出在MySQL, 難道是系統層面的一些限制or bug?

遂編譯了5.7.19版本的MySQL, 開啟了debug, 並寫了一個簡單的python指令碼來hold 32300+的資料庫連線;


點選(此處)摺疊或開啟

  1. import MySQLdb
  2. import sys
  3. import time

  4. loop = 10000
  5. conn_list = []

  6. def my_conn(ip) :
  7.     return MySQLdb.connect(host=ip
  8.             ,port=3306
  9.             ,user='temp'
  10.             ,passwd='test')

  11. def conn_test(ip) :
  12.     for i in range(1,loop) :
  13.         conn = my_conn(ip)
  14.         conn_list.append(conn)
  15.     num = 0
  16.     while(True) :
  17.         print num
  18.         if num == loop - 1 :
  19.             num = 0
  20.             time.sleep(10)
  21.         num = num + 1
  22.         time.sleep(1)
  23.     print rst
  24.     return True


  25. if __name__ == '__main__' :
  26.     conn_test("192.168.1.1")

多次嘗試之下, 確認在建立到第32373個連線時一定會報錯, 那麼看一下mysql trace:

這是出問題的時候的資訊:

點選(此處)摺疊或開啟

  1. T@0: >Per_thread_connection_handler::add_connection
  2. T@0: | >my_raw_malloc
  3. T@0: | | my: size: 232 my_flags: 16
  4. T@0: | | exit: ptr: 0x44046ec0
  5. T@0: | <my_raw_malloc 219
  6. T@0: | >my_free
  7. T@0: | | my: ptr: 0x44046ec0
  8. T@0: | <my_free 292
  9. T@0: >Per_thread_connection_handler::add_connection

這是正常的時候:

點選(此處)摺疊或開啟

  1. T@0: >Per_thread_connection_handler::add_connection
  2. T@0: | >my_raw_malloc
  3. T@0: | | my: size: 232 my_flags: 16
  4. T@0: | | exit: ptr: 0x4238d9c0
  5. T@0: | <my_raw_malloc 219
  6. T@0: | info: Thread created
  7. T@0: <Per_thread_connection_handler::add_connection 425

那麼確實如錯誤資訊描述一般, mysql在建立新連線的時候遇到了問題,
具體的來說, 是在申請完mysql建立connection需要的記憶體之後, 發生了問題, 所以釋放掉了這一部分記憶體, 並丟擲異常;
那麼看看在這個方法裡面, mysql在幹嘛:

點選(此處)摺疊或開啟

  1. connection_handler_per_thread.cc

  2. bool Per_thread_connection_handler::add_connection(Channel_info* channel_info)
  3. {
  4.   int error= 0;
  5.   my_thread_handle id;

  6.   DBUG_ENTER("Per_thread_connection_handler::add_connection");

  7.   // Simulate thread creation for test case before we check thread cache
  8.   DBUG_EXECUTE_IF("fail_thread_create", error= 1; goto handle_error;);

  9.   if (!check_idle_thread_and_enqueue_connection(channel_info))
  10.     DBUG_RETURN(false);

  11.   /*
  12.     There are no idle threads avaliable to take up the new
  13.     connection. Create a new thread to handle the connection
  14.   */
  15.   channel_info->set_prior_thr_create_utime();
  16.   error= mysql_thread_create(key_thread_one_connection, &id,     //<----在這裡, error不是0
  17.                              &connection_attrib,
  18.                              handle_connection,
  19.                              (void*) channel_info);
  20. #ifndef DBUG_OFF
  21. handle_error:
  22. #endif // !DBUG_OFF

  23.   if (error)                                                     //<----所以進入了這個if邏輯
  24.   {
  25.     connection_errors_internal++;
  26.     if (!create_thd_err_log_throttle.log())
  27.       sql_print_error("Can't create thread to handle new connection(errno= %d)",
  28.                       error);
  29.     channel_info->send_error_and_close_channel(ER_CANT_CREATE_THREAD,
  30.                                                error, true);
  31.     Connection_handler_manager::dec_connection_count();
  32.     DBUG_RETURN(true);
  33.   }

  34.   Global_THD_manager::get_instance()->inc_thread_created();
  35.   DBUG_PRINT("info",("Thread created"));
  36.   DBUG_RETURN(false);
  37. }


既然是mysql_thread_create出了問題, 那繼續往下追蹤, 通過各種def的轉換, 最終到了這段程式碼;
PS: trace中的my_free是一個很重要的資訊, 通過這個資訊可以確認到並不是MySQL自身的程式碼出現了問題~


點選(此處)摺疊或開啟

  1. my_thread.c

  2. int my_thread_create(my_thread_handle *thread, const my_thread_attr_t *attr,
  3.                      my_start_routine func, void *arg)
  4. {
  5. #ifndef _WIN32
  6.   return pthread_create(&thread->thread, attr, func, arg);
  7. #else
  8.   ......
  9. }

可以發現, 從add_connection開始, 一路呼叫各種方法, 最終error的返回值是由pthread_create決定的;

而出問題的這個方法, 其實是glibc的函式, 就算在gdb中進行除錯, 也無法看到具體的程式碼, 如果hold住32000+連線後再用gdb除錯, 那效率真是....
(╯‵□′)╯︵┻━┻

於是在google搜了一下pthread_create方法和Can't create a new thread的關鍵字, 找到一些資訊, 大體上的說法就是一些Linux系統層面的引數會限制可建立的執行緒數;

似乎是有些眉目了, 於是仔細找了一圈, 發現一個比較早的討論帖, 正好是在討論不能建立32K連線數的問題;
相關連結:
https://listman.redhat.com/archives/phil-list/2003-August/msg00005.html
https://listman.redhat.com/archives/phil-list/2003-August/msg00010.html
https://listman.redhat.com/archives/phil-list/2003-August/msg00025.html

把討論的內容貼過來:

點選(此處)摺疊或開啟

  1. Hi,
    
    I was using the 'thread-limit' program from http://people.redhat.com/alikins/tuning_utils/thread-limit.c to test the
    number of threads it could create.  It seems that it was always hitting
    some limit at 32K threads (cannot create thread 32762, to be exact). The
    error is ENOMEM.  Here's the kernel/ulimit settings,
    
    /proc/sys/kernel/pid_max 300000
    /proc/sys/kernel/threads-max 100000
    
    ulimit -a
    core file size        (blocks, -c) 0
    data seg size         (kbytes, -d) unlimited
    file size             (blocks, -f) unlimited
    max locked memory     (kbytes, -l) unlimited
    max memory size       (kbytes, -m) unlimited
    open files                    (-n) 100000
    pipe size          (512 bytes, -p) 8
    stack size            (kbytes, -s) 32
    cpu time             (seconds, -t) unlimited
    max user processes            (-u) 100000
    virtual memory        (kbytes, -v) unlimited
    
    It gave the same result on both a Debian 3 box with NPTL 0.56 compiled
    with gcc 3.4 CVS and GlibC CVS, kernel 2.5.70, and vanilla Redhat 9.
    
    I know I must be missing something because 100K threads with NPTL was
    reported.  Thanks.
    
    -- 
    Feng Zhou
    Graduate Student,
    CS Division, U.C. Berkeley http://www.cs.berkeley.edu/~zf/



點選(此處)摺疊或開啟

  1. 
    
    
    On Fri, 7 Aug 2003, Feng Zhou wrote:
    
    > I was using the 'thread-limit' program from
    > http://people.redhat.com/alikins/tuning_utils/thread-limit.c to test the
    > number of threads it could create.  It seems that it was always hitting
    > some limit at 32K threads (cannot create thread 32762, to be exact). The
    > error is ENOMEM.  Here's the kernel/ulimit settings,
    
    what is the current value of your /proc/sys/vm/max_map_count tunable? Can
    you max out RAM if you double the current limit?
    
    	Ingo


點選(此處)摺疊或開啟

  1. Yes, that's it.  Actually I have to change MAX_MAP_COUNT in
    include/linux/sched.h and recompile the 2.5.70 kernel because it doesn't
    have such a sysctl file.  After doubling the value from 65536 to 131072,
    I can create 65530 thread before it fails with ENOMEM.  
    
    BTW, the system begins thrashing at around 63000 threads, where resident
    set of the process is around 250MB.  This makes sense to me because each
    empty thread actually uses the first 4K page in its 16K stack.  Given
    the system has 1GB of physical memory.  The kernel memory each thread
    uses seems to be around 12KB ((1GB-250MB)/63000).
    
    - Feng Zhou
    
    On Mon, 2003-08-11 at 02:29, Ingo Molnar wrote:
    > On Fri, 7 Aug 2003, Feng Zhou wrote:
    > 
    > > I was using the 'thread-limit' program from
    > > http://people.redhat.com/alikins/tuning_utils/thread-limit.c to test the
    > > number of threads it could create.  It seems that it was always hitting
    > > some limit at 32K threads (cannot create thread 32762, to be exact). The
    > > error is ENOMEM.  Here's the kernel/ulimit settings,
    > 
    > what is the current value of your /proc/sys/vm/max_map_count tunable? Can
    > you max out RAM if you double the current limit?
    > 
    > 	Ingo
    > 

事實上, 就如這個討論所言,  在調整了
max_map_count的設定之後, mysql也可以建立超過32000+的連線了!

那麼這個引數調高以後, 有什麼影響呢?
這篇文章有提到這個引數的影響: https://www.novell.com/support/kb/doc.php?id=7000830

擷取其中的重要部分:

點選(此處)摺疊或開啟

  1. How are they affected? Well, since there will be more elements in the VM red-black tree, all operations on the VMA will take longer. The slow-down of most operations is logarithmic, e.g. further mmap's, munmap's et al. as well as handling page faults (both major and minor). Some operations will slow down linearly, e.g. copying the VMAs when a new process is forked.

  2. In short, there is absolutely no impact on memory footprint or performance for processes which use the same number of maps. On the other hand, processes where one of the memory mapping functions would have failed with ENOMEM because of hitting the limit, will now be allowed to consume the additional kernel memory with all the implications described above.


結論:
所以調高vm.max_map_count之後, 程式就不會再丟擲異常了~
從實際結果測試來看, 預設的65535能支援32000+的連結, 翻倍的話, 應該能支援雙倍的連結上限





來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29510932/viewspace-2145248/,如需轉載,請註明出處,否則將追究法律責任。

相關文章