Python's Thread Locals Are Weird

Posted by garfielder007 on 2016-05-05

The Weirdness

What do you think this script prints?

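# Python 2 script: the thread module was renamed _thread in Python 3.
import thread, threading, sys

class Weeper(object):
    def __del__(self):
        # Report which thread the finalizer runs on.
        sys.stdout.write('oh cruel world %s\n' % thread.get_ident())

local = threading.local()

def target():
    # Store a Weeper in the worker thread's copy of the local.
    local.weeper = Weeper()

t = threading.Thread(target=target)
t.start()
t.join()
sys.stdout.write('done %s\n' % thread.get_ident())
# Touch the local from the main thread after the worker has died.
getattr(local, 'whatever', None)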

If you guessed something like this:

oh cruel world 4475731968
done 140735297751392

...then you'd be right, in Python 2.7.1 and later. In Python 2.7.0 and older (including the whole 2.6 series), the order of messages is reversed:

done 140735297751392
oh cruel world 140735297751392

In New Python (2.7.1 and later), the Weeper is deleted as soon as its thread dies, and __del__ runs on the dying thread. In Old Python (2.7.0 and older), the Weeper isn't deleted until the thread is dead and a different thread accesses the local's __dict__. Thus the Weeper is deleted at the line getattr(local, 'whatever', None), after the thread dies, and Weeper.__del__ runs on the main thread.
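For readers on a modern interpreter, here is a minimal Python 3 port of the script, offered as a sketch rather than as part of the original experiment. It assumes CPython 3, where threading.get_ident() replaces thread.get_ident(). The events list is a hypothetical addition that records the order of the messages before printing them.

```python
import threading

# Hypothetical helper: collect messages so their order can be inspected.
events = []


class Weeper:
    def __del__(self):
        # Record which thread the finalizer runs on.
        events.append(('oh cruel world', threading.get_ident()))


local = threading.local()


def target():
    # Store a Weeper in the worker thread's copy of the local.
    local.weeper = Weeper()


t = threading.Thread(target=target)
t.start()
t.join()
events.append(('done', threading.get_ident()))

for message, ident in events:
    print(message, ident)
```

On CPython 3 this should show the New Python ordering, with the Weeper's message first and tagged with the worker thread's id. Other interpreters may collect the object later.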

What if we remove the getattr call? In Old Python, this happens:

done 140735297751392
Exception AttributeError: "'NoneType' object has no attribute 'get_ident'"
    in <bound method Weeper.__del__ of <__main__.Weeper object at 0x104f95590>>
    ignored

Without getattr, the Weeper isn't deleted until interpreter shutdown. The shutdown sequence is complex and hard to predict. In this case the thread module has been set to None before the Weeper is deleted, so Weeper.__del__ can't do thread.get_ident().
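One common way to make a finalizer survive this kind of shutdown is to bind what it needs as default arguments, so __del__ never looks names up in module globals. This is a sketch of that general idiom, not something the original post proposes. It is written for Python 3 (threading.get_ident), and the farewells list and the parameter names are hypothetical.

```python
import threading

# Hypothetical sink for messages, used here instead of sys.stdout.
farewells = []


class Weeper:
    # Default arguments are evaluated once, when the method is defined,
    # so __del__ holds its own references to get_ident and record and
    # does not depend on module globals still being intact.
    def __del__(self, get_ident=threading.get_ident, record=farewells.append):
        record('oh cruel world %s' % get_ident())


w = Weeper()
del w  # on CPython the refcount drops to zero and __del__ runs here
print(farewells)
```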

Thread Locals in Old Python

To understand why locals act this way in Old Python, let's look at the implementation in C. The core interpreter's PyThreadState struct has a dict attribute, and each threading.local object has a key attribute formatted like "thread.local.<memory address of self>". Each local has a __dict__ of attributes per thread, stored in PyThreadState's dict with the local's key.

threadmodule.c includes a function _ldict(localobject *self), which takes a local, finds and returns its __dict__ for the current thread, and also stores that __dict__ in self->dict.

This architecture has, in my opinion, a bug. Here's the implementation of _ldict():

static PyObject * _ldict(localobject *self)
{
    PyObject *tdict = PyThreadState_GetDict(); // get PyThreadState->dict for this thread
    PyObject *ldict = PyDict_GetItem(tdict, self->key);
    if (ldict == NULL) {
        ldict = PyDict_New(); /* we own ldict */
        PyDict_SetItem(tdict, self->key, ldict);
        Py_CLEAR(self->dict);
        Py_INCREF(ldict);
        self->dict = ldict; /* still borrowed */

        if (Py_TYPE(self)->tp_init != PyBaseObject_Type.tp_init) {
            Py_TYPE(self)->tp_init((PyObject*)self, self->args, self->kw);
        }
    }

    /* The call to tp_init above may have caused another thread to run.
       Install our ldict again. */
    if (self->dict != ldict) {
        Py_CLEAR(self->dict);
        Py_INCREF(ldict);
        self->dict = ldict;
    }

    return ldict;
}

I've edited for brevity. There are a few interesting things here. One is the check for a custom __init__ method: if this object is an instance of a subclass of local that overrides __init__, then __init__ is called whenever a new thread accesses this local's attributes for the first time.
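The per-thread __init__ call is easy to observe from Python. The following is a small sketch for Python 3; the Counted class and the init_threads and seen lists are hypothetical names introduced only for illustration.

```python
import threading

# Hypothetical bookkeeping: which threads ran __init__, and what they saw.
init_threads = []
seen = []


class Counted(threading.local):
    def __init__(self, start):
        # Runs once per thread that touches this local, with the
        # arguments originally passed to the constructor.
        init_threads.append(threading.get_ident())
        self.value = start


counted = Counted(10)  # __init__ runs on the main thread


def target():
    # First attribute access from this thread triggers __init__ again.
    seen.append(counted.value)


t = threading.Thread(target=target)
t.start()
t.join()

print(len(init_threads), seen, counted.value)
```

The constructor arguments are replayed for the worker thread, so it sees its own value attribute rather than the main thread's.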

But the main thing I'm showing you is the two calls to Py_CLEAR(self->dict), each of which decrements the refcount of the dict that self->dict points to and sets self->dict to NULL. Py_CLEAR runs when a thread accesses this local's attributes for the first time, or when this thread accesses the local's attributes after a different thread has accessed them, that is, if self->dict != ldict.

So now we clearly understand why a thread's locals aren't deleted immediately after it dies:

1. The worker thread stores a Weeper in local.weeper. _ldict() creates a new __dict__ for this thread, stores it as a value in PyThreadState->dict, and also stores it in local->dict. So there are two references to this thread's __dict__: one from PyThreadState, one from local.
2. The worker thread dies, and the interpreter deletes its PyThreadState. Now there's one reference to the dead thread's __dict__: local->dict.
3. Finally, we do getattr(local, 'whatever', None) from the main thread. In _ldict(), self->dict != ldict, so the reference held in self->dict is released and replaced with the main thread's __dict__. Now the last reference to the dead thread's __dict__ is gone, and the Weeper is deleted.

The bug is that _ldict() both returns the local's __dict__ for the current thread, and stores a reference to it. This is why the __dict__ isn't deleted as soon as its thread dies: there's a useless but persistent reference to the __dict__ until another thread comes along and clears it.
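The sequence above can be replayed with a toy model in pure Python. This is only a sketch of the reference bookkeeping, not the real C code: OldLocal, worker_tdict and main_tdict are hypothetical stand-ins for the local object and for each thread's PyThreadState->dict, and a weak reference is used to check whether the Weeper is still alive. It assumes CPython's reference counting.

```python
import weakref


class Weeper:
    pass


class OldLocal:
    """Toy model of Old Python's threading.local (not the real C code)."""

    def __init__(self):
        self.key = 'thread.local.%d' % id(self)
        self.dict = None  # the extra reference that _ldict() keeps

    def ldict(self, tdict):
        # tdict stands in for the calling thread's PyThreadState->dict.
        d = tdict.get(self.key)
        if d is None:
            d = {}
            tdict[self.key] = d
        if self.dict is not d:
            self.dict = d  # releases the previous thread's dict, if any
        return d


local = OldLocal()
worker_tdict = {}
main_tdict = {}

# Step 1: the worker thread stores a Weeper in the local.
weeper = Weeper()
ref = weakref.ref(weeper)
local.ldict(worker_tdict)['weeper'] = weeper
del weeper

# Step 2: the worker dies and its thread state dict goes away.
worker_tdict.clear()
alive_after_thread_death = ref() is not None

# Step 3: the main thread touches the local.
local.ldict(main_tdict)
alive_after_main_access = ref() is not None

print(alive_after_thread_death, alive_after_main_access)
```

In this model the Weeper outlives its thread only because local.dict still points at the dead thread's dict.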

Thread Locals in New Python

In New Python, the architecture's a little more complex. Each PyThreadState's dict contains a dummy for each local, and each local holds a dict mapping weak references to dummies to a per-thread __dict__.

When a thread is dying and its PyThreadState is deleted, weakref callbacks fire immediately on that thread, removing the thread's __dict__ for each local. Conversely, when a local is deleted, it removes its dummy from PyThreadState->dict.

_ldict() in New Python acts more sanely than in Old Python. It finds the current thread's dummy in the PyThreadState, and gets the __dict__ for this thread from the dummy. But unlike in Old Python, it doesn't store an extra reference to __dict__ anywhere. It simply returns it:

static PyObject * _ldict(localobject *self)
{
    PyObject *tdict, *ldict, *dummy;
    tdict = PyThreadState_GetDict();
    dummy = PyDict_GetItem(tdict, self->key);
    if (dummy == NULL) {
        ldict = _local_create_dummy(self);
        if (Py_TYPE(self)->tp_init != PyBaseObject_Type.tp_init) {
            Py_TYPE(self)->tp_init((PyObject*)self, self->args, self->kw);
        }
    } else {
        ldict = ((localdummyobject *) dummy)->localdict;
    }

    return ldict;
}
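The dummy-and-weakref arrangement can also be modelled in pure Python. The sketch below is a simplification under stated assumptions, not the real C code: NewLocal, Dummy and worker_tdict are hypothetical names, and weakref.WeakKeyDictionary stands in for the local's dict of weak references with removal callbacks. It assumes CPython's reference counting.

```python
import weakref


class Weeper:
    pass


class Dummy:
    """Stands in for the per-thread, per-local dummy object."""


class NewLocal:
    """Toy model of New Python's threading.local (not the real C code)."""

    def __init__(self):
        self.key = 'thread.local.%d' % id(self)
        # Maps each dummy (held weakly) to that thread's __dict__.
        # The entry is dropped automatically when the dummy dies.
        self.dicts = weakref.WeakKeyDictionary()

    def ldict(self, tdict):
        # tdict stands in for the calling thread's PyThreadState->dict.
        dummy = tdict.get(self.key)
        if dummy is None:
            dummy = Dummy()
            tdict[self.key] = dummy
            self.dicts[dummy] = {}
        # No extra reference is stored anywhere; just return the dict.
        return self.dicts[dummy]


local = NewLocal()
worker_tdict = {}

weeper = Weeper()
ref = weakref.ref(weeper)
local.ldict(worker_tdict)['weeper'] = weeper
del weeper
alive_while_thread_runs = ref() is not None

# The worker dies: its thread state dict, and so its dummy, goes away.
worker_tdict.clear()
alive_after_thread_death = ref() is not None

print(alive_while_thread_runs, alive_after_thread_death, len(local.dicts))
```

Here the only strong reference to the dummy lives in the thread's own dict, so the per-thread __dict__ disappears as soon as the thread state does.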

This whole weakrefs-to-dummies technique is, apparently, intended to deal with some cyclic garbage collection problem I don't understand very well. I believe the real reason why New Python acts as expected when executing my script, and why Old Python acts weird, is that Old Python stores the extra useless reference to the __dict__ and New Python does not.

Update: I finally found the bug reports that describe Old Python's weirdness and 2.7.1's solution. See:

from: https://emptysqua.re/blog/pythons-thread-locals-are-weird/
