Notes on the Python collections Module

Posted by wcode on 2018-04-01

Original URL: https://juejin.im/post/5ac0ab37518825558c479265

namedtuple

collections.namedtuple is a factory function that builds a tuple subclass with named fields and a class name of its own. The class name is a great help when debugging.

We can create a User class like this:

 import collections
 User = collections.namedtuple('User', ['name', 'age', 'height'])

Here is how to use a named tuple to record information about a city:

In [1]: from collections import namedtuple

In [2]: City = namedtuple('City', 'name country population coordinates')

In [3]: tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))

In [4]: tokyo
Out[4]: City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))

In [5]: tokyo.population
Out[5]: 36.933

In [6]: tokyo.coordinates
Out[6]: (35.689722, 139.691667)

In [7]: tokyo[1]
Out[7]: 'JP'

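Creating a named tuple takes two arguments: the class name and the names of its fields. The field names can be given as an iterable of strings or as a single string with the names separated by spaces.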
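
The two ways of passing field names can be compared side by side. The sketch below is only an illustration: the User fields come from the earlier example, the sample values are made up, and the comma-separated form is an extra spelling accepted by the standard library rather than something the text above relies on.

```python
from collections import namedtuple

# Three equivalent ways to spell the same field names.
UserFromList = namedtuple('User', ['name', 'age', 'height'])
UserFromSpaces = namedtuple('User', 'name age height')
UserFromCommas = namedtuple('User', 'name, age, height')

assert UserFromList._fields == UserFromSpaces._fields == UserFromCommas._fields

# Instances accept positional or keyword arguments (values are made up).
user = UserFromList('Alice', age=30, height=1.68)
print(user)        # User(name='Alice', age=30, height=1.68)
print(user.age)    # 30

# Like any tuple, a named tuple is immutable.
try:
    user.age = 31
except AttributeError:
    print('fields are read-only')
```

The string form is handy for short definitions typed by hand, while the list form is easier when the field names are produced by other code.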

In addition to the attributes inherited from plain tuples, named tuples have a few attributes and methods of their own.

In [8]: City._fields
Out[8]: ('name', 'country', 'population', 'coordinates')

In [9]: LatLong = namedtuple('LatLong', 'lat long')

In [10]: delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))

In [11]: delhi = City._make(delhi_data)

In [12]: delhi._asdict()
Out[12]: 
OrderedDict([('name', 'Delhi NCR'),
             ('country', 'IN'),
             ('population', 21.935),
             ('coordinates', LatLong(lat=28.613889, long=77.208889))])

In [13]: for key, value in delhi._asdict().items():
    ...:     print(key + ':', value)
    ...:     
name: Delhi NCR
country: IN
population: 21.935
coordinates: LatLong(lat=28.613889, long=77.208889)


The _fields attribute is a tuple containing the names of all the fields of the class.

_make() builds an instance of the class from an iterable; City._make(delhi_data) does the same thing as City(*delhi_data).

_asdict() returns the named tuple as a collections.OrderedDict (a plain dict since Python 3.8), which makes it easy to present the tuple's contents in a readable way.
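
As a compact recap of these three members, here is a minimal sketch. Point is a hypothetical type invented for the illustration, and wrapping the result of _asdict() in dict() is just a way to get the same printed output on every Python version.

```python
from collections import namedtuple

# Point is a hypothetical type used only for this illustration.
Point = namedtuple('Point', 'x y')

row = [3, 4]                      # any iterable with one value per field
p_from_make = Point._make(row)
p_from_star = Point(*row)
assert p_from_make == p_from_star

# _fields lists the names in declaration order.
print(Point._fields)              # ('x', 'y')

# _asdict() maps field names to values; dict() normalises the result
# to a plain dict whatever the Python version.
as_dict = dict(p_from_make._asdict())
print(as_dict)                    # {'x': 3, 'y': 4}
```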

defaultdict

Let's start with an example.

Counting how many times each string appears in a list with a plain dict:

In [1]: langs = ['java', 'php', 'python', 'C#', 'kotlin', 'swift', 'python']

In [2]: res_dict = {}

In [3]: for lang in langs:
   ...:     if lang in res_dict:
   ...:         res_dict[lang] += 1
   ...:     else:
   ...:         res_dict[lang] = 1
   ...: 

In [4]: res_dict
Out[4]: {'C#': 1, 'java': 1, 'kotlin': 1, 'php': 1, 'python': 2, 'swift': 1}


This performs a membership test on every iteration. Calling the setdefault method removes the explicit test.

In [1]: langs = ['java', 'php', 'python', 'C#', 'kotlin', 'swift', 'python']

In [2]: res_dict = {}

In [3]: for lang in langs:
   ...:     res_dict.setdefault(lang, 0)
   ...:     res_dict[lang] += 1
   ...: 

In [4]: res_dict
Out[4]: {'C#': 1, 'java': 1, 'kotlin': 1, 'php': 1, 'python': 2, 'swift': 1}

One problem remains, though: every lookup still needs a check first, because reading a key that does not exist raises an exception:

In [5]: res_dict['c++']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-5-269671e9ed5a> in <module>()
----> 1 res_dict['c++']

KeyError: 'c++'


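Sometimes, for convenience, we want a lookup to return a default value even when the key is not in the mapping. There are two ways to get this: use defaultdict instead of a plain dict, or define your own subclass of dict and implement the __missing__ method in it.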

Using defaultdict

In [7]: from collections import defaultdict

In [8]: res_dict= defaultdict(int)

In [9]: for lang in langs:
   ...:     res_dict[lang] += 1
   ...: 

In [10]: res_dict
Out[10]: 
defaultdict(int,
            {'C#': 1,
             'java': 1,
             'kotlin': 1,
             'php': 1,
             'python': 2,
             'swift': 1})

In [11]: res_dict['c++']
Out[11]: 0

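This solves both problems above. The defaultdict constructor takes a callable; when __getitem__ cannot find a key, it calls that callable with no arguments, stores the result under the key, and returns it.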
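
To make the role of the callable concrete, here is a small sketch that uses list as the factory to group words by their first letter. The word list and variable names are made up for the illustration. It also shows that only a d[key] lookup triggers the factory.

```python
from collections import defaultdict

# Group words by their first letter; list() supplies an empty list
# the first time each letter is looked up.
words = ['java', 'php', 'python', 'kotlin', 'perl']
by_initial = defaultdict(list)
for word in words:
    by_initial[word[0]].append(word)

print(dict(by_initial))
# {'j': ['java'], 'p': ['php', 'python', 'perl'], 'k': ['kotlin']}

# get() and "in" do not call the factory and leave the mapping untouched.
assert by_initial.get('c') is None
assert 'c' not in by_initial

# A d[key] lookup, by contrast, creates and stores the default.
assert by_initial['c'] == []
assert 'c' in by_initial
```

Because a plain lookup inserts the key, reading from a defaultdict can grow it; use get() when a lookup should have no side effects.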

Because the factory can be any callable, the default can be a more complex value:

In [25]: def gen_dict():
    ...:     return {'name': 'None', 'age': 0}
    ...: 

In [26]: res_dict = defaultdict(gen_dict)

In [27]: res_dict['zhangsan']
Out[27]: {'age': 0, 'name': 'None'}


The __missing__ method

In [28]: class CustomDict(dict):
    ...:     
    ...:     def __missing__(self, key):
    ...:         return {'name': 'None', 'age': 18}
    ...: 

In [29]: res_dict = CustomDict()

In [30]: res_dict['lisi']
Out[30]: {'age': 18, 'name': 'None'}
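
Note that the CustomDict above returns a default without storing it, so each failed lookup builds a fresh dict. If the default should be remembered, __missing__ can save it first. The sketch below shows one possible variation; StoringDict and the sample keys are hypothetical names.

```python
class StoringDict(dict):
    """Hypothetical dict subclass whose defaults are stored on first access."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        value = {'name': 'None', 'age': 18}
        self[key] = value          # remember the default for later lookups
        return value


people = StoringDict()
record = people['lisi']            # triggers __missing__
record['age'] = 20                 # mutate the stored default

print(people['lisi'])              # {'name': 'None', 'age': 20}
print(people.get('wangwu'))        # None: get() does not call __missing__
print(len(people))                 # 1
```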


deque

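The collections.deque class (a double-ended queue) is a thread-safe data type for quickly adding or removing items at either end. It is also a good choice when you need to keep track of “the last few items used”: a deque can be created with a maximum size, and once it is full, adding a new item at one end discards the oldest item from the opposite end.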

In [1]: from collections import deque

In [2]: dq = deque(range(10), maxlen=10)

In [3]: dq
Out[3]: deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [4]: dq.rotate(3)

In [5]: dq
Out[5]: deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6])

In [6]: dq.rotate(-4)

In [7]: dq
Out[7]: deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

In [8]: dq.appendleft(-1)

In [9]: dq
Out[9]: deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [10]: dq.extend([11, 22, 33])

In [11]: dq
Out[11]: deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33])

In [12]: dq.extendleft([10, 20, 30, 40])

In [13]: dq
Out[13]: deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8])


maxlen is an optional argument giving the maximum number of items the deque can hold. Once set, it is a read-only attribute and cannot be changed.

The rotate operation takes an argument n. When n > 0, the n rightmost items are moved to the left end of the deque; when n < 0, the |n| leftmost items are moved to the right end.

Appending to the tail of a deque that is already full (len(d) == d.maxlen) discards the item at its head.

extendleft(iter) adds the items from the iterable to the left end of the deque one at a time, so they end up in the deque in reverse order.
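
The rules above can be checked on a smaller scale. The following sketch uses a bounded deque as a "recently visited" list; the page names and maxlen=3 are arbitrary values chosen for the illustration.

```python
from collections import deque

# Keep only the three most recent items (maxlen=3 is an arbitrary choice).
recent = deque(maxlen=3)
for page in ['home', 'docs', 'blog', 'about']:
    recent.append(page)        # once full, the oldest item drops off the left

print(recent)                  # deque(['docs', 'blog', 'about'], maxlen=3)

# maxlen is read-only after construction.
try:
    recent.maxlen = 5
except AttributeError:
    print('maxlen cannot be changed')

# rotate(n) with n > 0 moves items from the right end to the left end.
dq = deque([1, 2, 3, 4, 5])
dq.rotate(2)
print(dq)                      # deque([4, 5, 1, 2, 3])

# extendleft() pushes items one at a time, so they end up reversed.
dq.extendleft('ab')
print(dq)                      # deque(['b', 'a', 4, 5, 1, 2, 3])
```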

Counter

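This mapping type keeps an integer count for each key; updating an existing key adds to its count. It can be used to count hashable objects or as a multiset, that is, a set whose elements may occur more than once. Counter implements the + and - operators for combining tallies, along with useful methods such as most_common([n]), which returns the n most common keys and their counts, ordered from most to least common.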

In [1]: from collections import Counter

In [2]: langs = ['java', 'php', 'python', 'C#', 'kotlin', 'swift', 'python']

In [3]: ct = Counter(langs)

In [4]: ct
Out[4]: Counter({'C#': 1, 'java': 1, 'kotlin': 1, 'php': 1, 'python': 2, 'swift': 1})

In [5]: ct.update(['java', 'c'])

In [6]: ct
Out[6]: 
Counter({'C#': 1,
         'c': 1,
         'java': 2,
         'kotlin': 1,
         'php': 1,
         'python': 2,
         'swift': 1})

In [7]: ct.most_common(2)
Out[7]: [('java', 2), ('python', 2)]

A Counter can, of course, also be built directly from a string:

In [9]: ct = Counter('abracadabra')

In [10]: ct
Out[10]: Counter({'a': 5, 'b': 2, 'c': 1, 'd': 1, 'r': 2})

In [11]: ct.update('aaaaazzz')

In [12]: ct
Out[12]: Counter({'a': 10, 'b': 2, 'c': 1, 'd': 1, 'r': 2, 'z': 3})

In [13]: ct.most_common(2)
Out[13]: [('a', 10), ('z', 3)]
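
The + and - operators mentioned at the start of this section are not shown in the sessions above. Here is a minimal sketch of how they behave; the two tallies are made-up data.

```python
from collections import Counter

# Two made-up tallies to combine.
morning = Counter(['java', 'python', 'python'])
evening = Counter(['python', 'go'])

# + adds counts key by key.
total = morning + evening
assert total == Counter(java=1, python=3, go=1)

# - subtracts counts and drops any result that is zero or negative,
# so 'go' (0 - 1) does not appear in the result.
remaining = morning - evening
assert remaining == Counter(java=1, python=1)
assert 'go' not in remaining

# Missing keys read as 0 instead of raising KeyError.
assert total['rust'] == 0

# most_common(1) returns a list holding the single highest count.
print(total.most_common(1))    # [('python', 3)]
```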


OrderedDict

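This type maintains keys in insertion order, so iteration order is always predictable. By default, the popitem method of an OrderedDict removes and returns the last item, but when called as my_odict.popitem(last=False) it removes and returns the first item that was added.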

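move_to_end(key, last=True) moves an existing key to one end of the ordered dictionary. With last=True (the default) the item is moved to the right end; with last=False it is moved to the beginning. A KeyError is raised if the key does not exist: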

In [1]: from collections import OrderedDict

In [2]: d = OrderedDict.fromkeys('abcde')

In [3]: d.move_to_end('b')

In [4]: ''.join(d.keys())
Out[4]: 'acdeb'

In [5]: d.move_to_end('b', last=False)

In [6]: ''.join(d.keys())
Out[6]: 'bacde'
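
The session above covers move_to_end but not popitem or the KeyError case. The following sketch fills in those two; the keys are arbitrary placeholders.

```python
from collections import OrderedDict

# Arbitrary placeholder keys; fromkeys() sets every value to None.
tasks = OrderedDict.fromkeys(['first', 'second', 'third'])

# Default: remove the most recently added item (LIFO).
print(tasks.popitem())             # ('third', None)

# last=False: remove the oldest item instead (FIFO).
print(tasks.popitem(last=False))   # ('first', None)

print(list(tasks))                 # ['second']

# move_to_end() on a key that is not present raises KeyError.
try:
    tasks.move_to_end('missing')
except KeyError as exc:
    print('KeyError:', exc)        # KeyError: 'missing'
```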


Because an OrderedDict remembers insertion order, it can be combined with sorted to build a sorted dictionary:

In [11]: d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
# Sort by key
In [12]: OrderedDict(sorted(d.items(), key=lambda t:t[0]))
Out[12]: OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
# Sort by value
In [13]: OrderedDict(sorted(d.items(), key=lambda t:t[1]))
Out[13]: OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])
# Sort by key length
In [14]: OrderedDict(sorted(d.items(), key=lambda t: len(t[0])))
Out[14]: OrderedDict([('pear', 1), ('apple', 4), ('banana', 3), ('orange', 2)])


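The sorted dictionary keeps its order when entries are deleted. When a new key is added, however, it is appended at the end and the sort order is not maintained.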
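
This behaviour is easy to verify. The sketch below reuses the fruit dictionary from the example above; the 'cherry' entry is a made-up addition, and rebuilding the OrderedDict at the end is one simple way to restore the order, not the only one.

```python
from collections import OrderedDict

d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
by_key = OrderedDict(sorted(d.items()))   # (key, value) tuples sort by key first
print(list(by_key))            # ['apple', 'banana', 'orange', 'pear']

# Deleting keeps the remaining keys in sorted order.
del by_key['banana']
print(list(by_key))            # ['apple', 'orange', 'pear']

# A new key is appended at the end, even though 'cherry' sorts earlier.
by_key['cherry'] = 5
print(list(by_key))            # ['apple', 'orange', 'pear', 'cherry']

# Re-sorting means building a new OrderedDict.
by_key = OrderedDict(sorted(by_key.items()))
print(list(by_key))            # ['apple', 'cherry', 'orange', 'pear']
```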

ChainMap

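The ChainMap class quickly links several dicts so that they can be treated as a single unit. It is often much faster than creating a new dict and running multiple update() calls.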

In [1]: from collections import ChainMap

In [2]: d1 = {'java': 3, 'python': 4}

In [3]: d2 = {'c++': 1, 'java': 2}

In [4]: for key, val in ChainMap(d1, d2).items():
   ...:     print(key, val)
   ...:     
c++ 1
java 3
python 4


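When a key appears in more than one dict, the value from the earliest dict in the chain is used and later duplicates are ignored.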

ChainMap stores the linked dicts in a list. The list is public and can be accessed or updated through the maps attribute.

In [10]: c1 = ChainMap(d1, d2)

In [11]: c1.maps[0]
Out[11]: {'java': 3, 'python': 4}

In [12]: c1.maps[0]['python'] = 2

In [13]: c1.items()
Out[13]: ItemsView(ChainMap({'java': 3, 'python': 2}, {'c++': 1, 'java': 2}))

In [14]: dict(c1)
Out[14]: {'c++': 1, 'java': 3, 'python': 2}
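
Two further points are worth seeing in code: lookups search the dicts from left to right, and writes made through the chain land in the first dict only. The sketch below illustrates both; the names defaults and overrides and all values are hypothetical.

```python
from collections import ChainMap

defaults = {'java': 2, 'c++': 1}       # hypothetical lower-priority mapping
overrides = {'java': 3, 'python': 4}   # hypothetical higher-priority mapping
chain = ChainMap(overrides, defaults)

# Lookups search the mappings left to right, so the first hit wins.
print(chain['java'])           # 3
print(chain['c++'])            # 1

# The chain holds references, so later changes to a dict show through.
defaults['go'] = 7
print(chain['go'])             # 7

# Writes through the chain go to the first mapping only.
chain['c++'] = 99
print(overrides['c++'])        # 99
print(defaults['c++'])         # 1
```

Since the chain only references the underlying dicts, no data is copied when it is built, which is where the speed advantage over repeated update() calls comes from.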


References

Must-know Python modules: collections
8.3. collections — Container datatypes
Relevant chapters of Fluent Python
