Python itertools in Detail

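Posted by wcode on 2018-05-11

Original URL: https://juejin.im/post/5af56230f265da0b93485cca

Translated and summarized from the official documentation: itertools — Functions creating iterators for efficient looping
Previous post: A Complete List of Python Built-in Functions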

itertools contains many useful utility functions, and using them fluently can greatly improve productivity.

It provides three categories of iterators.

Infinite iterators

Iterator	Arguments	Results	Example
count()	start, [step]	start, start+step, start+2*step, ...	`count(10) --> 10 11 12 13 14 ...`
cycle()	p	p0, p1, ... plast, p0, p1, ...	`cycle('ABCD') --> A B C D A B C D ...`
repeat()	elem [,n]	elem, elem, elem, ... endlessly or up to n times	`repeat(10, 3) --> 10 10 10`

Iterators terminating on the shortest input sequence

Iterator	Arguments	Results	Example
accumulate()	p [,func]	p0, p0+p1, p0+p1+p2, …	`accumulate([1,2,3,4,5]) --> 1 3 6 10 15`
chain()	p, q, …	p0, p1, … plast, q0, q1, …	`chain('ABC', 'DEF') --> A B C D E F`
chain.from_iterable()	iterable	p0, p1, … plast, q0, q1, …	`chain.from_iterable(['ABC', 'DEF']) --> A B C D E F`
compress()	data, selectors	(d[0] if s[0]), (d[1] if s[1]), …	`compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F`
dropwhile()	pred, seq	seq[n], seq[n+1], starting when pred first returns False	`dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1`
filterfalse()	pred, seq	elements of seq where pred(elem) is False	`filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8`
groupby()	iterable[, key]	sub-iterators grouped by the value of key(v)
islice()	seq, [start,] stop [, step]	`elements from seq[start:stop:step]`	`islice('ABCDEFG', 2, None) --> C D E F G`
starmap()	func, seq	func(*seq[0]), func(*seq[1]), …	`starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000`
takewhile()	pred, seq	seq[0], seq[1], until pred returns False	`takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4`
tee()	it, n	it1, it2, … itn, splitting one iterator into n
zip_longest()	p, q, …	(p[0], q[0]), (p[1], q[1]), …	`zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-`

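Combinatoric iterators

Iterator	Arguments	Results
product()	p, q, … [repeat=1]	Cartesian product, equivalent to a nested for-loop
permutations()	p[, r]	r-length tuples, all possible orderings, no repeated elements
combinations()	p, r	r-length tuples, in sorted order, no repeated elements
combinations_with_replacement()	p, r	r-length tuples, in sorted order, with repeated elements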
`product('ABCD', repeat=2)`		`AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD`
`permutations('ABCD', 2)`		`AB AC AD BA BC BD CA CB CD DA DB DC`
`combinations('ABCD', 2)`		`AB AC AD BC BD CD`
`combinations_with_replacement('ABCD', 2)`		`AA AB AC AD BB BC BD CC CD DD`

Infinite iterators

count

itertools.count(start=0, step=1)

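Make an iterator that returns evenly spaced values starting with the number start. Often used as an argument to map() to generate consecutive data points. It can also be used with zip() to add sequence numbers. Roughly equivalent to: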

def count(start=0, step=1):
    # count(10) --> 10 11 12 13 14 ...
    # count(2.5, 0.5) -> 2.5 3.0 3.5 ...
    n = start
    while True:
        yield n
        n += step

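When counting with floating-point numbers, better accuracy can sometimes be achieved by substituting multiplicative code such as: (start + step * i for i in count()).

!> Changed in Python 3.1: added the step argument and allowed non-integer arguments.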
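
The two uses of count() described above can be sketched as follows. The `names` list is hypothetical sample data, and the float comparison only prints the last values rather than claiming a specific rounding result.

```python
from itertools import count, islice

# Hypothetical sample data, used only for illustration.
names = ["alice", "bob", "carol"]

# zip() stops at the shorter input, so the infinite counter is safe here.
numbered = list(zip(count(1), names))
print(numbered)  # [(1, 'alice'), (2, 'bob'), (3, 'carol')]

# Additive stepping: each value is the previous value plus step,
# so floating-point rounding error can accumulate.
additive = list(islice(count(0.0, 0.1), 11))

# Multiplicative form: each value is computed directly from its index.
multiplicative = list(islice((0.0 + 0.1 * i for i in count()), 11))

print(additive[-1], multiplicative[-1])
```

Both sequences are cut off with islice() because count() never ends on its own.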

cycle

itertools.cycle(iterable)

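Make an iterator that returns elements from iterable and saves a copy of each. When iterable is exhausted, return elements from the saved copy. Repeats indefinitely. Roughly equivalent to: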

def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element

!> Note that cycle() may require significant auxiliary storage (depending on the length of iterable).
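
One possible use of cycle() is round-robin assignment. The sketch below pairs a finite list with a cycled one; the `tasks` and `workers` names are made up for the example.

```python
from itertools import cycle

# Hypothetical task and worker names.
tasks = ["t1", "t2", "t3", "t4", "t5"]
workers = ["w1", "w2"]

# zip() ends when tasks runs out, so the endless cycle is bounded.
assignment = list(zip(tasks, cycle(workers)))
print(assignment)
# [('t1', 'w1'), ('t2', 'w2'), ('t3', 'w1'), ('t4', 'w2'), ('t5', 'w1')]
```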

repeat

itertools.repeat(object[, times])

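Make an iterator that returns object over and over again. Runs indefinitely unless the times argument is specified. Used as an argument to map() to supply invariant parameters to the called function. Also used with zip() to create the invariant part of a tuple record.

Roughly equivalent to: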

def repeat(object, times=None):
    # repeat(10, 3) --> 10 10 10
    if times is None:
        while True:
            yield object
    else:
        for i in range(times):
            yield object

A common use for repeat() is to supply a stream of constant values to map or zip:

>>> list(map(pow, range(10), repeat(2)))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Iterators terminating on the shortest input sequence

accumulate

itertools.accumulate(iterable[, func])

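Make an iterator that returns accumulated sums, or the accumulated results of another binary function (specified via the optional func argument). If func is supplied, it should be a function of two arguments. Elements of iterable may be any type that func accepts as arguments. (For example, with the default operation of addition, elements may be any addable type, including Decimal or Fraction.) If the input iterable is empty, the output iterable will also be empty.

An example explains it better than words: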

In : list(itertools.accumulate([1, 2, 3, 4, 5]))
Out: [1, 3, 6, 10, 15]

The input sequence is 1, 2, 3, 4, 5.

The output is 1, 1+2, 1+2+3, 1+2+3+4, 1+2+3+4+5, that is, 1, 3, 6, 10, 15.

Another example:

In : list(itertools.accumulate([1, 2, 3, 4, 5], func=lambda x, y:x*y))
Out: [1, 2, 6, 24, 120]

The input sequence is 1, 2, 3, 4, 5.

The output is 1, 1*2, 1*2*3, 1*2*3*4, 1*2*3*4*5, that is, 1, 2, 6, 24, 120.

Roughly equivalent to:

import operator

def accumulate(iterable, func=operator.add):
    'Return running totals'
    # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
    # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return
    yield total
    for element in it:
        total = func(total, element)
        yield total

!> The func argument is available in Python 3.3 and later.
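
Because func can be any two-argument function, accumulate() is not limited to sums and products. As a small sketch with made-up numbers, passing the built-in max gives a running maximum, and the default addition gives a running balance:

```python
from itertools import accumulate

# Hypothetical readings; max keeps the largest value seen so far.
readings = [3, 1, 4, 1, 5, 9, 2]
running_max = list(accumulate(readings, max))
print(running_max)  # [3, 3, 4, 4, 5, 9, 9]

# Hypothetical deposits and withdrawals; default func is addition.
changes = [100, -20, 50, -70]
balances = list(accumulate(changes))
print(balances)  # [100, 80, 130, 60]
```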

chain

itertools.chain(*iterables)

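Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted. Used for treating consecutive sequences as a single sequence.

Here is an example: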

In : a1 = ["a", "b", "c"]

In : a2 = ("d", "e")

In : a3 = (it for it in range(3))

In : list(itertools.chain(a1, a2, a3))
Out: ['a', 'b', 'c', 'd', 'e', 0, 1, 2]

Roughly equivalent to:

def chain(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it:
            yield element

chain.from_iterable

classmethod chain.from_iterable(iterable)

chain.from_iterable works like chain, except that it takes a single iterable argument, and every item of that iterable must itself be iterable.

Here is an example:

In : a1 = ["a", "b", "c"]

In : a2 = ("d", "e")

In : a3 = (it for it in range(3))

In : list(itertools.chain.from_iterable([a1, a2, a3]))
Out: ['a', 'b', 'c', 'd', 'e', 0, 1, 2]

Roughly equivalent to:

def from_iterable(iterables):
    # chain.from_iterable(['ABC', 'DEF']) --> A B C D E F
    for it in iterables:
        for element in it:
            yield element
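
One practical difference from chain(*iterables) is that chain.from_iterable reads its argument lazily, so the outer iterable may itself be a generator, even an endless one. A minimal sketch, where `pairs` is a hypothetical helper written for this example:

```python
from itertools import chain, count, islice

def pairs():
    # Hypothetical endless stream of small lists: [0, 0], [1, 1], [2, 2], ...
    for i in count():
        yield [i, i]

# chain(*pairs()) would never return, because unpacking needs every item first.
flat = chain.from_iterable(pairs())
first_five = list(islice(flat, 5))
print(first_five)  # [0, 0, 1, 1, 2]
```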

compress

itertools.compress(data, selectors)

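Make an iterator that filters elements from data, returning only those whose corresponding element in selectors evaluates to True. Stops when either the data or the selectors iterable is exhausted.

Roughly equivalent to: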

def compress(data, selectors):
    # compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F
    return (d for d, s in zip(data, selectors) if s)
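
A short sketch of both behaviours described above: stopping at the shorter input, and using a computed mask as selectors. All values are made up for illustration.

```python
from itertools import compress

data = ["a", "b", "c", "d", "e"]
selectors = [True, False, True]  # shorter than data on purpose

# Iteration stops once selectors is exhausted, so 'd' and 'e' are never seen.
short = list(compress(data, selectors))
print(short)  # ['a', 'c']

# Selectors are often computed from another sequence.
values = [3, 8, 1, 9, 4]
mask = [v > 3 for v in values]
large = list(compress(values, mask))
print(large)  # [8, 9, 4]
```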

dropwhile

itertools.dropwhile(predicate, iterable)

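Passes the elements of iterable to predicate one by one and drops them as long as predicate(elem) is true. Once predicate(elem) is False for some elem, every element from that elem onward is returned.

Roughly equivalent to: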

def dropwhile(predicate, iterable):
    # dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x
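
As the second loop in the code above suggests, the predicate is no longer consulted once it has failed. A small sketch with hypothetical input lines, skipping only the leading comment lines:

```python
from itertools import dropwhile

# Hypothetical file contents.
lines = ["# header", "# more", "data1", "# not dropped", "data2"]

body = list(dropwhile(lambda line: line.startswith("#"), lines))
print(body)  # ['data1', '# not dropped', 'data2']
```

The later comment line survives because dropping ends at the first non-matching element.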

filterfalse

itertools.filterfalse(predicate, iterable)

Returns the elements of iterable for which predicate(elem) is False, the opposite of filter:

For example:

In : list(itertools.filterfalse(lambda x: x%2, range(10)))
Out: [0, 2, 4, 6, 8]

In : list(filter(lambda x: x%2, range(10)))
Out: [1, 3, 5, 7, 9]

Roughly equivalent to:

def filterfalse(predicate, iterable):
    # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
    if predicate is None:
        predicate = bool
    for x in iterable:
        if not predicate(x):
            yield x
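
The `predicate is None` branch above means that passing None keeps the elements that are themselves falsy. A brief sketch with arbitrary sample values:

```python
from itertools import filterfalse

# Arbitrary mix of truthy and falsy values.
mixed = [0, 1, "", "x", None, [], [0]]

falsy = list(filterfalse(None, mixed))
print(falsy)  # [0, '', None, []]
```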

groupby

itertools.groupby(iterable, key=None)

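Groups the elements of iterable by key. If key is None, elements are grouped by their own value.

Important: the iterable generally needs to be sorted by the same key before grouping, because groupby builds groups by comparing adjacent elements.

Here is an example: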

In : {k:list(g) for k, g in itertools.groupby('aaaabbbcccccc')}
Out:
{'a': ['a', 'a', 'a', 'a'],
 'b': ['b', 'b', 'b'],
 'c': ['c', 'c', 'c', 'c', 'c', 'c']}

# With the order shuffled, the result is no longer correct

In : {k:list(g) for k, g in itertools.groupby('aaccaabbbccc')}
Out: {'a': ['a', 'a'], 'c': ['c', 'c', 'c'], 'b': ['b', 'b', 'b']}

Each group is a lazy iterator, so list() is used here to make the output easier to read.
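
The sort-then-group pattern with an explicit key can be sketched like this. The word list is made up, and len is used as the key for both sorted() and groupby():

```python
from itertools import groupby

# Hypothetical input.
words = ["apple", "bob", "cat", "dodge", "eel"]

# Sort by the same key that will be used for grouping.
by_len = sorted(words, key=len)
grouped = {k: list(g) for k, g in groupby(by_len, key=len)}
print(grouped)  # {3: ['bob', 'cat', 'eel'], 5: ['apple', 'dodge']}

# Without sorting, each run of equal keys becomes its own group.
runs = [(k, list(g)) for k, g in groupby(words, key=len)]
print(runs)  # [(5, ['apple']), (3, ['bob', 'cat']), (5, ['dodge']), (3, ['eel'])]
```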

groupby() is roughly equivalent to:

class groupby:
    # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
    # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
    def __init__(self, iterable, key=None):
        if key is None:
            key = lambda x: x
        self.keyfunc = key
        self.it = iter(iterable)
        self.tgtkey = self.currkey = self.currvalue = object()
    def __iter__(self):
        return self
    def __next__(self):
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it)    # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey))
    def _grouper(self, tgtkey):
        while self.currkey == tgtkey:
            yield self.currvalue
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                return
            self.currkey = self.keyfunc(self.currvalue)

islice

itertools.islice(iterable, stop)
itertools.islice(iterable, start, stop[, step])

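Make an iterator that returns selected elements from iterable, starting at position start, stopping before position stop, and advancing by step.

Roughly equivalent to: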

import sys

def islice(iterable, *args):
    # islice('ABCDEFG', 2) --> A B
    # islice('ABCDEFG', 2, 4) --> C D
    # islice('ABCDEFG', 2, None) --> C D E F G
    # islice('ABCDEFG', 0, None, 2) --> A C E G
    s = slice(*args)
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        # Consume *iterable* up to the *start* position.
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        # Consume to *stop*.
        for i, element in zip(range(i + 1, stop), iterable):
            pass
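
A few typical uses of islice(), as a sketch: bounding an infinite iterator, slicing with a step, and taking a prefix from an iterator while leaving the rest available.

```python
from itertools import count, islice

# Take a finite prefix of an infinite iterator.
first_five = list(islice(count(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# start=1, stop=6, step=2 selects positions 1, 3 and 5.
every_other = list(islice("ABCDEFG", 1, 6, 2))
print(every_other)  # ['B', 'D', 'F']

# islice consumes from the underlying iterator; the remainder stays usable.
it = iter(range(10))
head = list(islice(it, 3))
rest = list(it)
print(head, rest)  # [0, 1, 2] [3, 4, 5, 6, 7, 8, 9]
```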

starmap

itertools.starmap(function, iterable)

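Passes each element elem of iterable to function as function(*elem) and returns the results. Each element must therefore be an iterable of arguments (typically a tuple) that function accepts.

Roughly equivalent to: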

def starmap(function, iterable):
    # starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000
    for args in iterable:
        yield function(*args)

Another example: find the minimum of each sequence in a list of sequences.

In : a1 = [(1, 2,), (5,1), (32, 22, 11)]

In : list(itertools.starmap(min,a1))
Out: [1, 1, 11]

takewhile

itertools.takewhile(predicate, iterable)

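Passes the elements of iterable to predicate one by one and returns the elements for which predicate(elem) is True, stopping immediately the first time predicate(elem) is False (if it is False for the very first element, nothing is returned). This is the opposite of dropwhile().

Roughly equivalent to: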

def takewhile(predicate, iterable):
    # takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break
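
Since takewhile() and dropwhile() are complementary, using the same predicate on the same data splits a sequence into a head and a tail. A minimal sketch using the data from the comment above:

```python
from itertools import dropwhile, takewhile

data = [1, 4, 6, 4, 1]

def is_small(x):
    return x < 5

head = list(takewhile(is_small, data))
tail = list(dropwhile(is_small, data))
print(head, tail)  # [1, 4] [6, 4, 1]

# The two parts together reproduce the original list.
print(head + tail == data)  # True
```

This works here because data is a list and can be iterated twice; a one-shot iterator would need a different approach.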

tee

itertools.tee(iterable, n=2)

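Return n independent iterators from a single iterable.

The following Python code helps explain what tee does (although the actual implementation is more complex and uses only a single underlying FIFO queue).

Roughly equivalent to: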

import collections

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # when the local deque is empty
                try:
                    newval = next(it)   # fetch a new value and
                except StopIteration:
                    return
                for d in deques:        # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)

Here is an example:

In : a1 = [1, 2, 3, 4, 5]

In : [list(it) for it in itertools.tee(a1)]
Out: [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]

In : [list(it) for it in itertools.tee(a1, 3)]
Out: [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
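
A common pattern built on tee() is pairing each element with its successor. The helper below is a sketch named `pairwise_sketch` (a made-up name, to avoid confusion with any built-in of a similar name); it advances one copy by a single step and zips the two copies together.

```python
from itertools import tee

def pairwise_sketch(iterable):
    # Two independent iterators over the same underlying data.
    a, b = tee(iterable)
    # Advance the second copy by one element; the default avoids an
    # error on empty input.
    next(b, None)
    return zip(a, b)

pairs = list(pairwise_sketch([1, 2, 3, 4]))
print(pairs)  # [(1, 2), (2, 3), (3, 4)]
```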

zip_longest

itertools.zip_longest(*iterables, fillvalue=None)

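Make an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled in with fillvalue. Iteration continues until the longest iterable is exhausted.

Roughly equivalent to: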

from itertools import chain, repeat

class ZipExhausted(Exception):
    pass

def zip_longest(*args, **kwds):
    # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    counter = len(args) - 1
    def sentinel():
        nonlocal counter
        if not counter:
            raise ZipExhausted
        counter -= 1
        yield fillvalue
    fillers = repeat(fillvalue)
    iterators = [chain(it, sentinel(), fillers) for it in args]
    try:
        while iterators:
            yield tuple(map(next, iterators))
    except ZipExhausted:
        pass

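If one of the iterables is potentially infinite, the zip_longest() call should be wrapped with something that limits the number of calls (for example islice() or takewhile()).

If fillvalue is not specified, it defaults to None.

Here is an example: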

In : a1 = [23, 18, 56]

In : a2 = ('zhang', 'li', 'wang', 'zhao', 'qiao')

In : list(itertools.zip_longest(a1, a2, fillvalue=20))
Out: [(23, 'zhang'), (18, 'li'), (56, 'wang'), (20, 'zhao'), (20, 'qiao')]
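
The earlier advice about potentially infinite inputs can be sketched as follows: count() never ends, so the zip_longest() result is capped with islice(). The cap of 4 is an arbitrary choice for the example.

```python
from itertools import count, islice, zip_longest

# count() is infinite, so zip_longest() alone would never finish.
bounded = list(islice(zip_longest(count(), "ab", fillvalue="-"), 4))
print(bounded)  # [(0, 'a'), (1, 'b'), (2, '-'), (3, '-')]
```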

Combinatoric iterators

product

itertools.product(*iterables, repeat=1)

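Roughly equivalent to nested for-loops in a generator expression. For example, product(A, B) returns the same as ((x,y) for x in A for y in B).

To compute the product of an iterable with itself, specify the number of repetitions with the optional repeat keyword argument. For example, product(A, repeat=4) means the same as product(A, A, A, A).

This function is roughly equivalent to the following code, except that the actual implementation does not build up intermediate results in memory: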

def product(*args, repeat=1):
    # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
    # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
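
The equivalence with nested loops, and the meaning of repeat, can be checked with a short sketch. A and B are arbitrary sample inputs.

```python
from itertools import product

A = [1, 2]
B = "xy"

pairs = list(product(A, B))
nested = [(x, y) for x in A for y in B]
print(pairs)            # [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]
print(pairs == nested)  # True

# repeat=2 multiplies the iterable with itself.
bits = list(product([0, 1], repeat=2))
print(bits)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```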

permutations

itertools.permutations(iterable, r=None)

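Return successive r-length permutations of the elements in iterable (for example, with r=2 the elements are arranged two at a time).

Different orderings count as distinct results, and an element is not repeated within a single result.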

In : list(itertools.permutations('123', 2))
Out: [('1', '2'), ('1', '3'), ('2', '1'), ('2', '3'), ('3', '1'), ('3', '2')]

If r is not specified or is None, r defaults to the length of iterable.

In : list(itertools.permutations('123'))
Out:
[('1', '2', '3'),
 ('1', '3', '2'),
 ('2', '1', '3'),
 ('2', '3', '1'),
 ('3', '1', '2'),
 ('3', '2', '1')]

Roughly equivalent to:

def permutations(iterable, r=None):
    # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
    # permutations(range(3)) --> 012 021 102 120 201 210
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r > n:
        return
    indices = list(range(n))
    cycles = list(range(n, n-r, -1))
    yield tuple(pool[i] for i in indices[:r])
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])
                break
        else:
            return

combinations

itertools.combinations(iterable, r)

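Return r-length combinations of the elements in iterable (for example, with r=2 the elements are combined two at a time).

Order does not matter, so each selection appears only once, and an element is not repeated within a single combination.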

In : list(itertools.combinations('123', 2))
Out: [('1', '2'), ('1', '3'), ('2', '3')]

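combinations requires r to be specified: because order does not matter, taking all 3 elements of '123' would yield only a single combination.

Roughly equivalent to: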

def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)

combinations_with_replacement

itertools.combinations_with_replacement(iterable, r)

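Return r-length combinations of the elements in iterable, allowing individual elements to be chosen more than once (for example, with r=2 the elements are combined two at a time).

Order does not matter, and elements may be repeated.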

In : list(itertools.combinations_with_replacement('123', 2))
Out: [('1', '1'), ('1', '2'), ('1', '3'), ('2', '2'), ('2', '3'), ('3', '3')]

Roughly equivalent to:

def combinations_with_replacement(iterable, r):
    # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yield tuple(pool[i] for i in indices)
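
To compare the four combinatoric iterators side by side, the sketch below counts their results for the same input and checks the counts against the usual formulas. It assumes Python 3.8 or later for math.perm and math.comb; the input 'ABC' and r=2 are arbitrary.

```python
import math
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)

pool = "ABC"
n, r = len(pool), 2

counts = {
    "product": len(list(product(pool, repeat=r))),
    "permutations": len(list(permutations(pool, r))),
    "combinations": len(list(combinations(pool, r))),
    "combinations_with_replacement": len(
        list(combinations_with_replacement(pool, r))
    ),
}
print(counts)
# {'product': 9, 'permutations': 6, 'combinations': 3,
#  'combinations_with_replacement': 6}

# Expected sizes: n**r, n!/(n-r)!, C(n, r) and C(n+r-1, r).
expected = [n ** r, math.perm(n, r), math.comb(n, r), math.comb(n + r - 1, r)]
print(list(counts.values()) == expected)  # True
```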
