JVM Source Code Analysis: How Many Threads Can a Java Process Actually Create?

Posted by 寒泉子 on 2016-12-23

Overview

Although the title of this article flies the flag of JVM source code analysis, the analysis here is not limited to the JVM sources; a good deal of it comes from the Linux kernel sources. Today's topic is a fairly common question in the JVM world.

The question comes in several phrasings:

  • How many threads can a Java process actually create?
  • Which factors determine how many threads can be created?
  • What exactly is behind the java.lang.OutOfMemoryError: unable to create new native thread exception?

A disclaimer first: I may not manage to enumerate every factor completely. I am not a Linux kernel developer, and there are plenty of details I have not noticed. I will share the factors I have been able to analyze; if you run into other factors in your daily work, feel free to leave a comment below so more people can join the discussion.

Starting from the JVM

Everyone is familiar with threads: new Thread().start() creates one. The first point to make is that new Thread() does not actually create a real (native) thread; a native thread is only created when the start method is called. You can verify this from the Java code: Thread's constructor is pure Java, while start calls into the native method start0, which is in fact JVM_StartThread:

JVM_ENTRY(void, JVM_StartThread(JNIEnv* env, jobject jthread))

  ...
          
      // We could also check the stillborn flag to see if this thread was already stopped, but
      // for historical reasons we let the thread detect that itself when it starts running

      jlong size =
             java_lang_Thread::stackSize(JNIHandles::resolve_non_null(jthread));
      // Allocate the C++ Thread structure and create the native thread.  The
      // stack size retrieved from java is signed, but the constructor takes
      // size_t (an unsigned type), so avoid passing negative values which would
      // result in really large stacks.
      size_t sz = size > 0 ? (size_t) size : 0;
      native_thread = new JavaThread(&thread_entry, sz);
        
  ...    

  if (native_thread->osthread() == NULL) {
    ...
    THROW_MSG(vmSymbols::java_lang_OutOfMemoryError(),
              "unable to create new native thread");
  }

  Thread::start(native_thread);

JVM_END

In the code above, first note the final check, if (native_thread->osthread() == NULL): if osthread is NULL, the familiar unable to create new native thread OOM is thrown. So osthread being NULL is the crucial condition; we will see below under what circumstances osthread ends up NULL.

Also note native_thread = new JavaThread(&thread_entry, sz); this is where a thread actually gets created:

JavaThread::JavaThread(ThreadFunction entry_point, size_t stack_sz) :
  Thread()
#ifndef SERIALGC
  , _satb_mark_queue(&_satb_mark_queue_set),
  _dirty_card_queue(&_dirty_card_queue_set)
#endif // !SERIALGC
{
  if (TraceThreadEvents) {
    tty->print_cr("creating thread %p", this);
  }
  initialize();
  _jni_attach_state = _not_attaching_via_jni;
  set_entry_point(entry_point);
  // Create the native thread itself.
  // %note runtime_23
  os::ThreadType thr_type = os::java_thread;
  thr_type = entry_point == &compiler_thread_entry ? os::compiler_thread :
                                                     os::java_thread;
  os::create_thread(this, thr_type, stack_sz);

}

os::create_thread(this, thr_type, stack_sz) in the code above creates the thread via pthread_create; on Linux the implementation looks like this:

bool os::create_thread(Thread* thread, ThreadType thr_type, size_t stack_size) {
  assert(thread->osthread() == NULL, "caller responsible");

  // Allocate the OSThread object
  OSThread* osthread = new OSThread(NULL, NULL);
  if (osthread == NULL) {
    return false;
  }

  // set the correct thread state
  osthread->set_thread_type(thr_type);

  // Initial state is ALLOCATED but not INITIALIZED
  osthread->set_state(ALLOCATED);

  thread->set_osthread(osthread);

  // init thread attributes
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

  // stack size
  if (os::Linux::supports_variable_stack_size()) {
    // calculate stack size if it's not specified by caller
    if (stack_size == 0) {
      stack_size = os::Linux::default_stack_size(thr_type);

      switch (thr_type) {
      case os::java_thread:
        // Java threads use ThreadStackSize which default value can be
        // changed with the flag -Xss
        assert (JavaThread::stack_size_at_create() > 0, "this should be set");
        stack_size = JavaThread::stack_size_at_create();
        break;
      case os::compiler_thread:
        if (CompilerThreadStackSize > 0) {
          stack_size = (size_t)(CompilerThreadStackSize * K);
          break;
        } // else fall through:
          // use VMThreadStackSize if CompilerThreadStackSize is not defined
      case os::vm_thread:
      case os::pgc_thread:
      case os::cgc_thread:
      case os::watcher_thread:
        if (VMThreadStackSize > 0) stack_size = (size_t)(VMThreadStackSize * K);
        break;
      }
    }

    stack_size = MAX2(stack_size, os::Linux::min_stack_allowed);
    pthread_attr_setstacksize(&attr, stack_size);
  } else {
    // let pthread_create() pick the default value.
  }

  // glibc guard page
  pthread_attr_setguardsize(&attr, os::Linux::default_guard_size(thr_type));

  ThreadState state;

  {
    // Serialize thread creation if we are running with fixed stack LinuxThreads
    bool lock = os::Linux::is_LinuxThreads() && !os::Linux::is_floating_stack();
    if (lock) {
      os::Linux::createThread_lock()->lock_without_safepoint_check();
    }

    pthread_t tid;
    int ret = pthread_create(&tid, &attr, (void* (*)(void*)) java_start, thread);

    pthread_attr_destroy(&attr);

    if (ret != 0) {
      if (PrintMiscellaneous && (Verbose || WizardMode)) {
        perror("pthread_create()");
      }
      // Need to clean up stuff we've allocated so far
      thread->set_osthread(NULL);
      delete osthread;
      if (lock) os::Linux::createThread_lock()->unlock();
      return false;
    }

    // Store pthread info into the OSThread
    osthread->set_pthread_id(tid);
     ...
  }
   ...
  return true;
}

If new OSThread itself fails, osthread is obviously NULL, and going back to the first code snippet, java.lang.OutOfMemoryError: unable to create new native thread is thrown. When can new OSThread fail? For example, when memory runs out — and the memory here is the C heap, not the Java heap. So from the JVM's perspective, the factors affecting thread creation include Xmx, MaxPermSize, MaxDirectMemorySize, ReservedCodeCacheSize, and so on, because these parameters determine how much memory is left over.
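
To make the memory factor concrete, here is a rough, hypothetical back-of-the-envelope sketch (plain Python, not JVM code; the function name and the 4 GB budget are illustration only): the native memory left for thread stacks is roughly what remains of the process memory budget after the Java heap, permgen, direct memory, and code cache are carved out, and each thread costs about one Xss worth of stack.

```python
M = 1024 * 1024

def estimate_max_threads(process_mem, xmx, max_perm, max_direct, code_cache, xss):
    """Very rough heuristic: threads ~= leftover native memory / stack size.
    All sizes are in bytes. Real limits also depend on kernel settings."""
    native_free = process_mem - xmx - max_perm - max_direct - code_cache
    return max(native_free // xss, 0)

# e.g. a 4 GB process budget with -Xmx2g, 256m permgen, 240m code cache, -Xss1m:
print(estimate_max_threads(4096 * M, 2048 * M, 256 * M, 0, 240 * M, 1 * M))  # → 1552
```

The point is only that raising Xmx or the other native-memory consumers shrinks the room left for thread stacks.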

Also note that when pthread_create fails, thread->set_osthread(NULL) clears the pointer, so osthread is again NULL, the same OOM is thrown, and thread creation fails. So next we need to analyze what can make pthread_create fail.

pthread_create in glibc

stack_size

pthread_create is implemented in glibc:

int
__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
              void *(*start_routine) (void *), void *arg)
{
  STACK_VARIABLES;

  const struct pthread_attr *iattr = (struct pthread_attr *) attr;
  struct pthread_attr default_attr;
  ...
  struct pthread *pd = NULL;
  int err = ALLOCATE_STACK (iattr, &pd);
  int retval = 0;

  if (__glibc_unlikely (err != 0))
    /* Something went wrong.  Maybe a parameter of the attributes is
       invalid or we could not allocate memory.  Note we have to
       translate error codes.  */
    {
      retval = err == ENOMEM ? EAGAIN : err;
      goto out;
    }
    
    ...
   
  }

The line I mainly want to highlight above is int err = ALLOCATE_STACK (iattr, &pd): as the name suggests, it allocates the thread's stack. In short, based on the stackSize specified in iattr, it uses mmap to carve out a block of memory for the thread to use as its stack.
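
At its core, this stack allocation is an anonymous mmap. A minimal illustration using Python's mmap module (an analogy, not glibc internals — glibc's actual mapping flags include MAP_PRIVATE|MAP_ANONYMOUS):

```python
import mmap

STACK_SIZE = 512 * 1024  # e.g. a 512 KB stack, the 32-bit Linux default below

# An anonymous mapping, conceptually what ALLOCATE_STACK obtains for the stack.
stack = mmap.mmap(-1, STACK_SIZE)  # fd = -1 means anonymous memory
print(len(stack))  # → 524288
stack.close()
```

If the kernel cannot satisfy this mapping, pthread_create fails with ENOMEM, which glibc translates to EAGAIN, and the JVM surfaces it as the OOM above.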

Now about stackSize. As everyone understands, a thread needs some stack space to run; clearly, if memory runs out while allocating the stack, creation is bound to fail. Under the JVM, stackSize can be specified with -Xss; if it is not specified there is a default value. The defaults from JDK 6 onward (inclusive) are:

// return default stack size for thr_type
size_t os::Linux::default_stack_size(os::ThreadType thr_type) {
  // default stack size (compiler thread needs larger stack)
#ifdef AMD64
  size_t s = (thr_type == os::compiler_thread ? 4 * M : 1 * M);
#else
  size_t s = (thr_type == os::compiler_thread ? 2 * M : 512 * K);
#endif // AMD64
  return s;
}

Many people probably wonder whether thread stack memory belongs to the Java heap controlled by -Xmx. To be clear: it does not. So from this glibc logic, the JVM's Xss is another very important factor affecting how many threads can be created.
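
Because each stack is carved out of the process address space, Xss also caps the thread count through address space alone, which matters most on 32-bit systems. A hypothetical sketch (the ~3 GB usable user address space figure is a typical 32-bit Linux assumption, not something from the JVM source):

```python
K, G = 1024, 1024 ** 3

def addr_space_thread_cap(user_addr_space, xss):
    """Upper bound on threads imposed purely by address space / stack size."""
    return user_addr_space // xss

# 32-bit Linux, ~3 GB user address space, default 512 KB stacks:
print(addr_space_thread_cap(3 * G, 512 * K))  # → 6144
# Shrinking the stack raises the cap, e.g. -Xss256k:
print(addr_space_thread_cap(3 * G, 256 * K))  # → 12288
```

This is why lowering -Xss is a common workaround when a 32-bit process hits the unable to create new native thread error with plenty of free RAM.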

clone in the Linux Kernel

If stack allocation succeeds, the next step is to create the thread itself, roughly:

retval = create_thread (pd, iattr, true, STACK_VARIABLES_ARGS,
                  &thread_ran);

create_thread in turn invokes the clone system call:

const int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM
               | CLONE_SIGHAND | CLONE_THREAD
               | CLONE_SETTLS | CLONE_PARENT_SETTID
               | CLONE_CHILD_CLEARTID
               | 0);

  TLS_DEFINE_INIT_TP (tp, pd);

  if (__glibc_unlikely (ARCH_CLONE (&start_thread, STACK_VARIABLES_ARGS,
                    clone_flags, pd, &pd->tid, tp, &pd->tid)
            == -1))
    return errno;

The system call takes us into the Linux kernel.

The clone system call eventually reaches do_fork; let's dissect that function to see what other limiting factors live in the kernel.

max_user_processes

retval = -EAGAIN;
if (atomic_read(&p->real_cred->user->processes) >=
        task_rlimit(p, RLIMIT_NPROC)) {
    if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
        p->real_cred->user != INIT_USER)
        goto bad_fork_free;
}

Look at this snippet first: it checks how many processes the user has. As you know, on Linux processes and threads share the same underlying data structure, so the "process count" here can be understood as the number of lightweight processes, i.e. threads. The maximum can be queried with ulimit -u, so if the current user has already started more threads than this limit, creation will certainly fail. The limit can be raised with ulimit -u <value>.
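
The limit the kernel checks here, RLIMIT_NPROC, is visible from user space via getrlimit — it is exactly what ulimit -u reports. A small sketch using Python's standard resource module (Unix-only):

```python
import resource

# The soft limit is what 'ulimit -u' shows; RLIM_INFINITY means unlimited.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("max user processes (soft):",
      "unlimited" if soft == resource.RLIM_INFINITY else soft)
print("max user processes (hard):",
      "unlimited" if hard == resource.RLIM_INFINITY else hard)
```

A process can raise its soft limit up to the hard limit with setrlimit; raising the hard limit requires privileges.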

max_map_count

This code path involves plenty of malloc operations, implemented underneath via the brk system call, as well as the mmap used above to allocate the stack. Whether malloc or mmap, at the bottom the kernel performs a similar check:

if (mm->map_count > sysctl_max_map_count)
        return -ENOMEM;

If the number of memory mappings held by the process exceeds sysctl_max_map_count, the allocation fails. On Linux this value corresponds to /proc/sys/vm/max_map_count, with a default of 65530, and it can be changed by writing to that file.
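
To check this limit programmatically, reading the procfs file directly is enough. A small sketch (hypothetical helper; falls back to the documented default when procfs is unavailable, e.g. on non-Linux systems):

```python
def read_max_map_count(default=65530):
    """Return vm.max_map_count, or the documented default if unreadable."""
    try:
        with open("/proc/sys/vm/max_map_count") as f:
            return int(f.read().strip())
    except OSError:
        return default

print(read_max_map_count())
```

With a 64 KB guard-page-plus-stack mapping per thread, it is easy to see how tens of thousands of threads can bump into this ceiling.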

max_threads

There is also a max_threads limit; the code is as follows:

/*
 * If multiple threads are within copy_process(), then this check
 * triggers too late. This doesn't hurt, the check is only there
 * to stop root fork bombs.
 */
retval = -EAGAIN;
if (nr_threads >= max_threads)
    goto bad_fork_cleanup_count;

This value can be viewed or modified via /proc/sys/kernel/threads-max.
It is bounded by the amount of physical memory and is computed at fork_init time:

    /*
     * The default maximum number of threads is set to a safe
     * value: the thread structures can take up at most half
     * of memory.
     */
    max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);

    /*
     * we need to allow at least 20 threads to boot a system
     */
    if(max_threads < 20)
        max_threads = 20;
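
The fork_init arithmetic above is easy to reproduce. A sketch (THREAD_SIZE and PAGE_SIZE vary by architecture; the 8 KB and 4 KB values below are typical 32-bit x86 figures, used here only as an example):

```python
def kernel_max_threads(mempages, thread_size=8192, page_size=4096):
    """Mirror of fork_init: thread structures may take at most 1/8 of memory."""
    max_threads = mempages // (8 * thread_size // page_size)
    return max(max_threads, 20)  # at least 20 threads are needed to boot

# A machine with 2 GB of RAM has 2 GB / 4 KB = 524288 pages:
print(kernel_max_threads(524288))  # → 32768
```

So a box with more RAM boots with a proportionally higher threads-max, which you can still override through procfs.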

pid_max

PIDs are limited as well:

   if (pid != &init_struct_pid) {
        retval = -ENOMEM;
        pid = alloc_pid(p->nsproxy->pid_ns);
        if (!pid)
            goto bad_fork_cleanup_io;

        if (clone_flags & CLONE_NEWPID) {
            retval = pid_ns_prepare_proc(p->nsproxy->pid_ns);
            if (retval < 0)
                goto bad_fork_free_pid;
        }
    }

alloc_pid is defined as follows:

struct pid *alloc_pid(struct pid_namespace *ns)
{
    struct pid *pid;
    enum pid_type type;
    int i, nr;
    struct pid_namespace *tmp;
    struct upid *upid;

    pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
    if (!pid)
        goto out;

    tmp = ns;
    for (i = ns->level; i >= 0; i--) {
        nr = alloc_pidmap(tmp);
        if (nr < 0)
            goto out_free;

        pid->numbers[i].nr = nr;
        pid->numbers[i].ns = tmp;
        tmp = tmp->parent;
    }
    ...
}

alloc_pidmap checks against pid_max, which is defined as follows:


/*
 * This controls the default maximum pid allocated to a process
 */
#define PID_MAX_DEFAULT (CONFIG_BASE_SMALL ? 0x1000 : 0x8000)

/*
 * A maximum of 4 million PIDs should be enough for a while.
 * [NOTE: PID/TIDs are limited to 2^29 ~= 500+ million, see futex.h.]
 */
#define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
    (sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
    
int pid_max = PID_MAX_DEFAULT;

#define RESERVED_PIDS        300

int pid_max_min = RESERVED_PIDS + 1;
int pid_max_max = PID_MAX_LIMIT;

This value can be viewed or modified via /proc/sys/kernel/pid_max.
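
The macros above work out to concrete numbers. A sketch evaluating them (PAGE_SIZE assumed to be 4096, and a typical 64-bit build with CONFIG_BASE_SMALL off):

```python
PAGE_SIZE = 4096  # assumption: typical x86 page size

def pid_max_default(base_small=False):
    # PID_MAX_DEFAULT: 0x1000 for CONFIG_BASE_SMALL, else 0x8000
    return 0x1000 if base_small else 0x8000

def pid_max_limit(base_small=False, sizeof_long=8):
    # PID_MAX_LIMIT: 4M PIDs on 64-bit, PID_MAX_DEFAULT on 32-bit
    if base_small:
        return PAGE_SIZE * 8
    return 4 * 1024 * 1024 if sizeof_long > 4 else pid_max_default()

print(pid_max_default())  # → 32768
print(pid_max_limit())    # → 4194304
```

So out of the box pid_max is 32768, and on 64-bit kernels it can be raised as far as 4194304 via procfs.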

Summary

Through this source analysis of the JVM, glibc, and the Linux kernel, we have so far identified the following factors that affect thread creation:

  • JVM: Xmx, Xss, MaxPermSize, MaxDirectMemorySize, ReservedCodeCacheSize
  • Kernel: max_user_processes, max_map_count, max_threads, pid_max

Since my time spent reading the kernel sources is limited, this summary is not necessarily complete; feel free to add to it.
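
As a practical companion to this checklist, here is a hypothetical inspection sketch that collects the kernel-side limits discussed above (procfs paths as covered in each section; values differ per machine, and non-Linux systems report None):

```python
def thread_limit_report():
    """Gather the kernel-side knobs that bound thread creation."""
    report = {}
    try:
        import resource
        report["max_user_processes"] = resource.getrlimit(resource.RLIMIT_NPROC)[0]
    except (ImportError, OSError, ValueError):
        report["max_user_processes"] = None
    for key, path in {
        "max_map_count": "/proc/sys/vm/max_map_count",
        "threads-max": "/proc/sys/kernel/threads-max",
        "pid_max": "/proc/sys/kernel/pid_max",
    }.items():
        try:
            with open(path) as f:
                report[key] = int(f.read().strip())
        except OSError:
            report[key] = None  # not on Linux, or procfs unavailable
    return report

print(thread_limit_report())
```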

