7.Python3原始碼—Dict物件

whj0709發表於2018-06-06

7.1. 雜湊表

雜湊表的基本思想,是通過一定的函式將需搜尋的鍵值對映為一個整數,將這個整數視為索引值去訪問某片連續的記憶體區域。理論上,在最優情況下,雜湊表能提供O(1)複雜度的搜尋效率。

用於對映的函式稱為雜湊函式(hash function),而對映後的值稱為元素的雜湊值(hash value)。在雜湊表的實現中,所選擇的雜湊函式的優劣將直接決定所實現的雜湊表的搜尋效率的高低。

在使用雜湊表的過程中,不同的物件經過雜湊函式的作用,可能被對映為相同的雜湊值。而且隨著需要儲存的資料的增多,這樣的衝突就會發生得越來越頻繁。雜湊衝突是雜湊技術與生俱來的問題。

裝載率是雜湊表中已使用空間和總空間的比值。如果雜湊表一共可以容納10個元素,而當前已經裝入了6個元素,那麼裝載率就是6/10。研究表明,當雜湊表的裝載率大於2/3時,雜湊衝突發生的概率就會大大增加。

當產生雜湊衝突時,Python會通過一個二次探測函式f,計算下一個候選位置addr,如果位置addr可用,則可將待插入元素放到位置addr;如果位置addr不可用,則Python會再次使用探測函式f,獲得下一個候選位置,如此不斷探測,總會找到一個可用的位置。

當需要刪除某條探測鏈上的某個元素時,問題就產生了。假如這條鏈的首元素位置為a,尾元素的位置為c,現在需要刪除中間的某個位置b上的元素。如果直接將位置b上的元素刪除,則會導致探測鏈的斷裂,從而找不到c。所以,在採用開放定址的衝突解決策略的雜湊表中,刪除某條探測鏈上的元素時不能進行真正的刪除,而是進行一種“偽刪除”操作,必須要讓該元素還存在於探測鏈上,擔當承前啟後的重任。

7.2. Dict物件資料結構

7.2.1. Dict物件的C原始碼

PyDictObject的C原始碼如下:

// dictobject.h
typedef struct _dictkeysobject PyDictKeysObject;

/* The ma_values pointer is NULL for a combined table
 * or points to an array of PyObject* for a split table
 */
typedef struct {
    PyObject_HEAD

    /* Number of items in the dictionary */
    Py_ssize_t ma_used;

    /* Dictionary version: globally unique, value change each time the dictionary is modified */
    uint64_t ma_version_tag;

    PyDictKeysObject *ma_keys;

    /* If ma_values is NULL, the table is "combined": keys and values are stored in ma_keys.
       If ma_values is not NULL, the table is splitted: keys are stored in ma_keys and values are stored in ma_values */
    PyObject **ma_values;
} PyDictObject;

PyDictObject結構裡最重要的一個屬性是ma_keys,在combined table模式下ma_keys存著key和value。ma_keys為PyDictKeysObject 型別,PyDictKeysObject的C原始碼如下:

// dict-common.h
typedef struct {
    /* Cached hash code of me_key. */
    Py_hash_t me_hash;
    PyObject *me_key;
    PyObject *me_value; /* This field is only meaningful for combined tables */
} PyDictKeyEntry;

/* dict_lookup_func() returns index of entry which can be used like DK_ENTRIES(dk)[index].
 * -1 when no entry found, -3 when compare raises error.
 */
typedef Py_ssize_t (*dict_lookup_func)
    (PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject **value_addr);

/* See dictobject.c for actual layout of DictKeysObject */
struct _dictkeysobject {
    Py_ssize_t dk_refcnt;

    /* Size of the hash table (dk_indices). It must be a power of 2. */
    Py_ssize_t dk_size;

    /* Function to lookup in the hash table (dk_indices): */
    dict_lookup_func dk_lookup;

    /* Number of usable entries in dk_entries. */
    Py_ssize_t dk_usable;

    /* Number of used entries in dk_entries. */
    Py_ssize_t dk_nentries;

    /* Actual hash table of dk_size entries. It holds indices in dk_entries,
       or DKIX_EMPTY(-1) or DKIX_DUMMY(-2).

       Indices must be: 0 <= indice < USABLE_FRACTION(dk_size).

       The size in bytes of an indice depends on dk_size:

       - 1 byte if dk_size <= 0xff (char*)
       - 2 bytes if dk_size <= 0xffff (int16_t*)
       - 4 bytes if dk_size <= 0xffffffff (int32_t*)
       - 8 bytes otherwise (int64_t*)

       Dynamically sized, 8 is minimum. */
    union {
        int8_t as_1[8];
        int16_t as_2[4];
        int32_t as_4[2];
#if SIZEOF_VOID_P > 4
        int64_t as_8[1];
#endif
    } dk_indices;
};

7.2.2. PyDictKeysObject記憶體佈局

記憶體佈局如下:

+---------------+
| dk_refcnt     |
| dk_size       |
| dk_lookup     |
| dk_usable     |
| dk_nentries   |
+---------------+
| dk_indices    |
|               |
+---------------+
| dk_entries    |
|               |
+---------------+

其中:

  • dk_indices是實際的hash表,它儲存dk_entries中的index,或DKIX_EMPTY(-1),或DKIX_DUMMY(-2)。dk_indices的size為dk_size。
  • dk_size不同,index的型別也不同,如下表。值得注意的是,由於-1和-2有特殊的含義,所以索引必須是有符號整數,所以dk_size <= 128的時候使用int8,而不是dk_size <=256的時候使用int8。
int8  for          dk_size <= 128
int16 for 256   <= dk_size <= 2**15
int32 for 2**16 <= dk_size <= 2**31
int64 for 2**32 <= dk_size
  • dk_entries是PyDictKeyEntry型別的陣列,陣列的size是USABLE_FRACTION(dk_size),即可dk_size * 2/3。通過DK_ENTRIES(dk)可以獲取dk_entries的指標。

7.2.3. PyDictKeysObject的兩種形式

分為combined table和split table:

  • combined table:ma_values == NULL,dk_refcnt == 1,ma_keys儲存key和值;
  • split table:ma_values != NULL,dk_refcnt >= 1,ma_keys儲存key,me_value陣列裡儲存值;

下面的分析基於combined table。

7.3. Dict物件

Dict物件是“變長物件”。

7.3.1. Python中的建立

Python中Dict物件最重要的建立方法為PyDict_New,如下Python語句最終會呼叫到PyDict_New:

test = {1:`hello`}

7.3.2. PyDict_New的C呼叫棧

// compile.c
PyAST_CompileObject

// symtable.c
=>PySymtable_BuildObject
=>symtable_new

// dictobject.c
=> PyDict_New

7.3.3. PyDict_New原始碼

// dictobject.c
#define USABLE_FRACTION(n) (((n) << 1)/3)
#define PyDict_MINSIZE 8

#define DK_SIZE(dk) ((dk)->dk_size)

#define DK_IXSIZE(dk)                          
    (DK_SIZE(dk) <= 0xff ?                     
        1 : DK_SIZE(dk) <= 0xffff ?            
            2 : DK_SIZE(dk) <= 0xffffffff ?    
                4 : sizeof(int64_t))

#define DK_ENTRIES(dk) 
    ((PyDictKeyEntry*)(&(dk)->dk_indices.as_1[DK_SIZE(dk) * DK_IXSIZE(dk)]))

PyObject *
PyDict_New(void)
{
    PyDictKeysObject *keys = new_keys_object(PyDict_MINSIZE);
    if (keys == NULL)
        return NULL;
    return new_dict(keys, NULL);
}

new_keys_object用於建立PyDictKeysObject物件。計算usable,如果緩衝區沒有可用的PyDictKeysObject物件則計算PyDictKeysObject物件的實際大小並分配記憶體,然後各種設定,最後設定查詢函式lookdict_unicode_nodummy,並memset記憶體。原始碼如下:

// dictobject.c
static PyDictKeysObject *new_keys_object(Py_ssize_t size)
{
    PyDictKeysObject *dk;
    Py_ssize_t es, usable;

    assert(size >= PyDict_MINSIZE);
    assert(IS_POWER_OF_2(size));

    usable = USABLE_FRACTION(size);
    if (size <= 0xff) {
        es = 1;
    }
    else if (size <= 0xffff) {
        es = 2;
    }
#if SIZEOF_VOID_P > 4
    else if (size <= 0xffffffff) {
        es = 4;
    }
#endif
    else {
        es = sizeof(Py_ssize_t);
    }

    if (size == PyDict_MINSIZE && numfreekeys > 0) {
        dk = keys_free_list[--numfreekeys];
    }
    else {
        dk = PyObject_MALLOC(sizeof(PyDictKeysObject)
                             - Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)
                             + es * size
                             + sizeof(PyDictKeyEntry) * usable);
        if (dk == NULL) {
            PyErr_NoMemory();
            return NULL;
        }
    }
    DK_DEBUG_INCREF dk->dk_refcnt = 1;
    dk->dk_size = size;
    dk->dk_usable = usable;
    dk->dk_lookup = lookdict_unicode_nodummy;
    dk->dk_nentries = 0;
    memset(&dk->dk_indices.as_1[0], 0xff, es * size);
    memset(DK_ENTRIES(dk), 0, sizeof(PyDictKeyEntry) * usable);
    return dk;
}

new_dict建立PyDictObject 物件,PyDictKeysObject物件的殼。如果緩衝區沒有可用的PyDictObject物件則分配記憶體,並且各種設定。原始碼如下:

// dictobject.c
static PyObject *
new_dict(PyDictKeysObject *keys, PyObject **values)
{
    PyDictObject *mp;
    assert(keys != NULL);
    if (numfree) {
        mp = free_list[--numfree];
        assert (mp != NULL);
        assert (Py_TYPE(mp) == &PyDict_Type);
        _Py_NewReference((PyObject *)mp);
    }
    else {
        mp = PyObject_GC_New(PyDictObject, &PyDict_Type);
        if (mp == NULL) {
            DK_DECREF(keys);
            free_values(values);
            return NULL;
        }
    }
    mp->ma_keys = keys;
    mp->ma_values = values;
    mp->ma_used = 0;
    mp->ma_version_tag = DICT_NEXT_VERSION();
    assert(_PyDict_CheckConsistency(mp));
    return (PyObject *)mp;
}

可以看到:

  • PyDictKeysObject物件緩衝:通過操作numfreekeys實現,numfreekeys在free_keys_object和dictresize方法中進行調整。
// dictobject.c
#define PyDict_MAXFREELIST 80

static PyDictKeysObject *keys_free_list[PyDict_MAXFREELIST];
static int numfreekeys = 0;
  • PyDictObject物件緩衝:通過操作numfree實現,numfree在dict_dealloc方法中進行調整。
// dictobject.c
#define PyDict_MAXFREELIST 80

static PyDictObject *free_list[PyDict_MAXFREELIST];
static int numfree = 0;

7.4. Dict物件的查詢

Dict物件的查詢是Dict物件最重要的方法。Python Dict物件預設的查詢方法為lookdict_unicode_nodummy,在lookdict_unicode_nodummy方法裡會判斷如果key不是unicode型別,則將查詢方法設定為lookdict,並呼叫lookdict進行查詢。

lookdict_unicode_nodummy與lookdict最重要的區別在於,hash相同的情況下對key的比對,lookdict_unicode_nodummy呼叫的unicode_eq,而lookdict呼叫的PyObject_RichCompareBool。由於Python原始碼實現中大量的使用string作為key的dict物件,所以這是一項優化。但是對於整個查詢機制而言,只要分析明白其中一個方法即可。

lookdict_unicode_nodummy原始碼如下:

// dictobject.c
#define DK_MASK(dk) (((dk)->dk_size)-1)

static Py_ssize_t _Py_HOT_FUNCTION
lookdict_unicode_nodummy(PyDictObject *mp, PyObject *key,
                   Py_hash_t hash, PyObject **value_addr)
{
    assert(mp->ma_values == NULL);
    if (!PyUnicode_CheckExact(key)) {
        mp->ma_keys->dk_lookup = lookdict;
        return lookdict(mp, key, hash, value_addr);
    }

    PyDictKeyEntry *ep0 = DK_ENTRIES(mp->ma_keys);
    size_t mask = DK_MASK(mp->ma_keys);
    size_t perturb = (size_t)hash;
    size_t i = (size_t)hash & mask;

    for (;;) {
        Py_ssize_t ix = dk_get_index(mp->ma_keys, i);
        assert (ix != DKIX_DUMMY);
        if (ix == DKIX_EMPTY) {
            *value_addr = NULL;
            return DKIX_EMPTY;
        }
        PyDictKeyEntry *ep = &ep0[ix];
        assert(ep->me_key != NULL);
        assert(PyUnicode_CheckExact(ep->me_key));
        if (ep->me_key == key ||
            (ep->me_hash == hash && unicode_eq(ep->me_key, key))) {
            *value_addr = ep->me_value;
            return ix;
        }
        perturb >>= PERTURB_SHIFT;
        i = mask & (i*5 + perturb + 1);
    }
    Py_UNREACHABLE();
}

lookdict_unicode_nodummy中通過:

i = (size_t)hash & mask;
Py_ssize_t ix = dk_get_index(dk, i);

計算i在dk_entries中的索引。

如果該索引:

1. DKIX_EMPTY,即沒有找到,則value=NULL,並返回索引值(DKIX_EMPTY);
2. DKIX_DUMMY,跳轉到4;
3. 索引值ix >= 0:
    3.1. 如果key物件完全一致,則返回value和索引值;
    3.2. 如果hash值一致,則呼叫unicode_eq或PyObject_RichCompareBool 比較key
        3.2.1. 如果key相等,則返回value和索引;
        3.2.2. 否則跳轉到4;
    3.3. 否則跳轉到4
4. 根據探測函式(i = mask & (i*5 + perturb + 1))計算下一個ix;

所以lookdict_unicode_nodummy方法:

1. 要麼返回NULL和DKIX_EMPTY;
2. 要麼返回value和索引;

7.5. Dict物件的維護

7.5.1. 設定/新增元素

如下Python語句最終會呼叫到PyDict_SetItem:

test = {100: 200}

PyDict_SetItem的C呼叫棧:

// pystate.c
PyInterpreterState_New

// ceval.c
=>_PyEval_EvalFrameDefault (case BUILD_MAP)

// dictobject.c
=> PyDict_SetItem

PyDict_SetItem原始碼:

// dictobject.c
int
PyDict_SetItem(PyObject *op, PyObject *key, PyObject *value)
{
    PyDictObject *mp;
    Py_hash_t hash;
    if (!PyDict_Check(op)) {
        PyErr_BadInternalCall();
        return -1;
    }
    assert(key);
    assert(value);
    mp = (PyDictObject *)op;
    if (!PyUnicode_CheckExact(key) ||
        (hash = ((PyASCIIObject *) key)->hash) == -1)
    {
        hash = PyObject_Hash(key);
        if (hash == -1)
            return -1;
    }

    /* insertdict() handles any resizing that might be necessary */
    return insertdict(mp, key, hash, value);
}

PyDict_SetItem方法中做了一下檢查,呼叫insertdict方法:

// dictobject.c
static int
insertdict(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject *value)
{
    PyObject *old_value;
    PyDictKeyEntry *ep;

    Py_INCREF(key);
    Py_INCREF(value);
    if (mp->ma_values != NULL && !PyUnicode_CheckExact(key)) {
        if (insertion_resize(mp) < 0)
            goto Fail;
    }

    Py_ssize_t ix = mp->ma_keys->dk_lookup(mp, key, hash, &old_value);
    if (ix == DKIX_ERROR)
        goto Fail;

    assert(PyUnicode_CheckExact(key) || mp->ma_keys->dk_lookup == lookdict);
    MAINTAIN_TRACKING(mp, key, value);

    /* When insertion order is different from shared key, we can`t share
     * the key anymore.  Convert this instance to combine table.
     */
    if (_PyDict_HasSplitTable(mp) &&
        ((ix >= 0 && old_value == NULL && mp->ma_used != ix) ||
         (ix == DKIX_EMPTY && mp->ma_used != mp->ma_keys->dk_nentries))) {
        if (insertion_resize(mp) < 0)
            goto Fail;
        ix = DKIX_EMPTY;
    }

    if (ix == DKIX_EMPTY) {
        /* Insert into new slot. */
        assert(old_value == NULL);
        if (mp->ma_keys->dk_usable <= 0) {
            /* Need to resize. */
            if (insertion_resize(mp) < 0)
                goto Fail;
        }
        Py_ssize_t hashpos = find_empty_slot(mp->ma_keys, hash);
        ep = &DK_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
        dk_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries);
        ep->me_key = key;
        ep->me_hash = hash;
        if (mp->ma_values) {
            assert (mp->ma_values[mp->ma_keys->dk_nentries] == NULL);
            mp->ma_values[mp->ma_keys->dk_nentries] = value;
        }
        else {
            ep->me_value = value;
        }
        mp->ma_used++;
        mp->ma_version_tag = DICT_NEXT_VERSION();
        mp->ma_keys->dk_usable--;
        mp->ma_keys->dk_nentries++;
        assert(mp->ma_keys->dk_usable >= 0);
        assert(_PyDict_CheckConsistency(mp));
        return 0;
    }

    if (_PyDict_HasSplitTable(mp)) {
        mp->ma_values[ix] = value;
        if (old_value == NULL) {
            /* pending state */
            assert(ix == mp->ma_used);
            mp->ma_used++;
        }
    }
    else {
        assert(old_value != NULL);
        DK_ENTRIES(mp->ma_keys)[ix].me_value = value;
    }

    mp->ma_version_tag = DICT_NEXT_VERSION();
    Py_XDECREF(old_value); /* which **CAN** re-enter (see issue #22653) */
    assert(_PyDict_CheckConsistency(mp));
    Py_DECREF(key);
    return 0;

Fail:
    Py_DECREF(value);
    Py_DECREF(key);
    return -1;
}

insertdict中呼叫lookdict_unicode_nodummy或lookdict方法尋找Dice物件:

1. 沒有找到key
    1.1. 檢查是否已經用完空間,如果用完,呼叫insertion_resize調整dict大小;
    1.2. 呼叫find_empty_slot方法尋找探測鏈上第一個為DKIX_DUMMY或DKIX_EMPTY的,設定索引值,各種設定和調整
2. 否則設定value即可

find_empty_slot方法原始碼如下:

// dictobject.c
static Py_ssize_t
find_empty_slot(PyDictKeysObject *keys, Py_hash_t hash)
{
    assert(keys != NULL);

    const size_t mask = DK_MASK(keys);
    size_t i = hash & mask;
    Py_ssize_t ix = dk_get_index(keys, i);
    for (size_t perturb = hash; ix >= 0;) {
        perturb >>= PERTURB_SHIFT;
        i = (i*5 + perturb + 1) & mask;
        ix = dk_get_index(keys, i);
    }
    return i;
}

7.5.2. 刪除元素

如下Python語句最終會呼叫到PyDict_DelItem:

test = {100: 200}
del test[100]

PyDict_SetItem的C呼叫棧:

// dictobject.c
dict_ass_sub
=> PyDict_DelItem

PyDict_SetItem原始碼:

// dictobject.c
int
PyDict_DelItem(PyObject *op, PyObject *key)
{
    Py_hash_t hash;
    assert(key);
    if (!PyUnicode_CheckExact(key) ||
        (hash = ((PyASCIIObject *) key)->hash) == -1) {
        hash = PyObject_Hash(key);
        if (hash == -1)
            return -1;
    }

    return _PyDict_DelItem_KnownHash(op, key, hash);
}

PyDict_SetItem方法中做了一下檢查,呼叫_PyDict_DelItem_KnownHash方法:

// dictobject.c
int
_PyDict_DelItem_KnownHash(PyObject *op, PyObject *key, Py_hash_t hash)
{
    Py_ssize_t ix;
    PyDictObject *mp;
    PyObject *old_value;

    if (!PyDict_Check(op)) {
        PyErr_BadInternalCall();
        return -1;
    }
    assert(key);
    assert(hash != -1);
    mp = (PyDictObject *)op;
    ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &old_value);
    if (ix == DKIX_ERROR)
        return -1;
    if (ix == DKIX_EMPTY || old_value == NULL) {
        _PyErr_SetKeyError(key);
        return -1;
    }

    // Split table doesn`t allow deletion.  Combine it.
    if (_PyDict_HasSplitTable(mp)) {
        if (dictresize(mp, DK_SIZE(mp->ma_keys))) {
            return -1;
        }
        ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &old_value);
        assert(ix >= 0);
    }

    return delitem_common(mp, hash, ix, old_value);
}

查詢key,查詢到了呼叫delitem_common。需要注意的是查詢到了,只是把狀態設定為DKIX_DUMMY, 並沒有從探測鏈上摘除。delitem_common原始碼如下:

// dictobject.c
static int
delitem_common(PyDictObject *mp, Py_hash_t hash, Py_ssize_t ix,
               PyObject *old_value)
{
    PyObject *old_key;
    PyDictKeyEntry *ep;

    Py_ssize_t hashpos = lookdict_index(mp->ma_keys, hash, ix);
    assert(hashpos >= 0);

    mp->ma_used--;
    mp->ma_version_tag = DICT_NEXT_VERSION();
    ep = &DK_ENTRIES(mp->ma_keys)[ix];
    dk_set_index(mp->ma_keys, hashpos, DKIX_DUMMY);
    ENSURE_ALLOWS_DELETIONS(mp);
    old_key = ep->me_key;
    ep->me_key = NULL;
    ep->me_value = NULL;
    Py_DECREF(old_key);
    Py_DECREF(old_value);

    assert(_PyDict_CheckConsistency(mp));
    return 0;
}

static Py_ssize_t
lookdict_index(PyDictKeysObject *k, Py_hash_t hash, Py_ssize_t index)
{
    size_t mask = DK_MASK(k);
    size_t perturb = (size_t)hash;
    size_t i = (size_t)hash & mask;

    for (;;) {
        Py_ssize_t ix = dk_get_index(k, i);
        if (ix == index) {
            return i;
        }
        if (ix == DKIX_EMPTY) {
            return DKIX_EMPTY;
        }
        perturb >>= PERTURB_SHIFT;
        i = mask & (i*5 + perturb + 1);
    }
    Py_UNREACHABLE();
}

7.5.3. 調整大小

無論是設定、插入還是刪除,在滿足一定條件下(參見上面的程式碼分析),都會呼叫insertion_resize。insertion_resize方法會重新建立PyDictKeysObject物件,在這個過程中會拷貝舊的物件所有的資料。

// dictobject.c
#define GROWTH_RATE(d) (((d)->ma_used*2)+((d)->ma_keys->dk_size>>1))

static int
insertion_resize(PyDictObject *mp)
{
    return dictresize(mp, GROWTH_RATE(mp));
}

static int
dictresize(PyDictObject *mp, Py_ssize_t minsize)
{
    Py_ssize_t newsize, numentries;
    PyDictKeysObject *oldkeys;
    PyObject **oldvalues;
    PyDictKeyEntry *oldentries, *newentries;

    /* Find the smallest table size > minused. */
    for (newsize = PyDict_MINSIZE;
         newsize < minsize && newsize > 0;
         newsize <<= 1)
        ;
    if (newsize <= 0) {
        PyErr_NoMemory();
        return -1;
    }

    oldkeys = mp->ma_keys;

    /* Allocate a new table. */
    mp->ma_keys = new_keys_object(newsize);
    if (mp->ma_keys == NULL) {
        mp->ma_keys = oldkeys;
        return -1;
    }
    // New table must be large enough.
    assert(mp->ma_keys->dk_usable >= mp->ma_used);
    if (oldkeys->dk_lookup == lookdict)
        mp->ma_keys->dk_lookup = lookdict;

    numentries = mp->ma_used;
    oldentries = DK_ENTRIES(oldkeys);
    newentries = DK_ENTRIES(mp->ma_keys);
    oldvalues = mp->ma_values;
    if (oldvalues != NULL) {
        /* Convert split table into new combined table.
         * We must incref keys; we can transfer values.
         * Note that values of split table is always dense.
         */
        for (Py_ssize_t i = 0; i < numentries; i++) {
            assert(oldvalues[i] != NULL);
            PyDictKeyEntry *ep = &oldentries[i];
            PyObject *key = ep->me_key;
            Py_INCREF(key);
            newentries[i].me_key = key;
            newentries[i].me_hash = ep->me_hash;
            newentries[i].me_value = oldvalues[i];
        }

        DK_DECREF(oldkeys);
        mp->ma_values = NULL;
        if (oldvalues != empty_values) {
            free_values(oldvalues);
        }
    }
    else {  // combined table.
        if (oldkeys->dk_nentries == numentries) {
            memcpy(newentries, oldentries, numentries * sizeof(PyDictKeyEntry));
        }
        else {
            PyDictKeyEntry *ep = oldentries;
            for (Py_ssize_t i = 0; i < numentries; i++) {
                while (ep->me_value == NULL)
                    ep++;
                newentries[i] = *ep++;
            }
        }

        assert(oldkeys->dk_lookup != lookdict_split);
        assert(oldkeys->dk_refcnt == 1);
        if (oldkeys->dk_size == PyDict_MINSIZE &&
            numfreekeys < PyDict_MAXFREELIST) {
            DK_DEBUG_DECREF keys_free_list[numfreekeys++] = oldkeys;
        }
        else {
            DK_DEBUG_DECREF PyObject_FREE(oldkeys);
        }
    }

    build_indices(mp->ma_keys, newentries, numentries);
    mp->ma_keys->dk_usable -= numentries;
    mp->ma_keys->dk_nentries = numentries;
    return 0;
}

由於擴/縮容,所以要調整索引,呼叫build_indices方法:

// dictobject.c
static void
build_indices(PyDictKeysObject *keys, PyDictKeyEntry *ep, Py_ssize_t n)
{
    size_t mask = (size_t)DK_SIZE(keys) - 1;
    for (Py_ssize_t ix = 0; ix != n; ix++, ep++) {
        Py_hash_t hash = ep->me_hash;
        size_t i = hash & mask;
        for (size_t perturb = hash; dk_get_index(keys, i) != DKIX_EMPTY;) {
            perturb >>= PERTURB_SHIFT;
            i = mask & (i*5 + perturb + 1);
        }
        dk_set_index(keys, i, ix);
    }
}

7.6. Dict物件的特性

支援tp_as_sequence、tp_as_mapping兩種操作。

7.6.1. 序列操作

// dictobject.c
&dict_as_sequence,                      /* tp_as_sequence */
// dictobject.c
static PySequenceMethods dict_as_sequence = {
    0,                          /* sq_length */
    0,                          /* sq_concat */
    0,                          /* sq_repeat */
    0,                          /* sq_item */
    0,                          /* sq_slice */
    0,                          /* sq_ass_item */
    0,                          /* sq_ass_slice */
    PyDict_Contains,            /* sq_contains */
    0,                          /* sq_inplace_concat */
    0,                          /* sq_inplace_repeat */
};

其中:

  • PyDict_Contains
test = {200:100}
200 in test    # True
100 in test    # False

7.6.2. 關聯操作

// dictobject.c
&dict_as_mapping,                          /* tp_as_mapping */
// dictobject.c
static PyMappingMethods dict_as_mapping = {
    (lenfunc)dict_length, /*mp_length*/
    (binaryfunc)dict_subscript, /*mp_subscript*/
    (objobjargproc)dict_ass_sub, /*mp_ass_subscript*/
};

其中:

  • dict_length
test = {200:100}
len(test)
  • dict_subscript
test = {200:100}
test[200]
  • dict_ass_sub
test = {}
test[200] = 100

7.6.3. to string

// dictobject.c
(reprfunc)dict_repr,                        /* tp_repr */
0,                                          /* tp_str */

7.6.4. hash

// dictobject.c
PyObject_HashNotImplemented,                /* tp_hash */

7.6.5. 比較

// dictobject.c
dict_richcompare,                           /* tp_richcompare */

7.6.6. 內建方法

// dictobject.c
mapp_methods,                              /* tp_methods */

7.7. 參考

  • Python原始碼剖析


相關文章