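1. Concepts
An iterable is an object that can be iterated over. The built-in iter function returns its iterator, and to support this an iterable implements __iter__, which returns the associated iterator.
An iterator performs the actual element-by-element traversal. It implements __next__ to return the associated data one element at a time, and it also implements __iter__ so that it can be used wherever an iterable is expected.
A generator is a kind of iterator that produces its data lazily: each element is generated only when it is requested.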
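As a minimal sketch of the three roles described above, the snippet below uses a built-in list as the iterable. The variable names are illustrative only.

```python
numbers = [1, 2, 3]          # a list is an iterable

it = iter(numbers)           # iter() asks the iterable for an iterator
first = next(it)             # next() pulls one element at a time
second = next(it)

# An iterator is itself iterable: its __iter__ returns the iterator itself.
same_object = iter(it) is it

# A generator expression produces values lazily, only when asked.
squares = (n * n for n in numbers)
first_square = next(squares)

print(first, second, same_object, first_square)  # 1 2 True 1
```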
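2. Iterability of sequences
Python's built-in sequences can be iterated with for: the interpreter calls iter to obtain the sequence's iterator. iter also supports objects that only implement __getitem__; for these it automatically creates an iterator that fetches items by index, starting from 0, until IndexError is raised.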
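The iterator created this way can be seen in the following example, where WordAnalyzer implements only __getitem__: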
import re
from dis import dis


class WordAnalyzer:
    reg_word = re.compile(r'\w+')  # raw string avoids an invalid-escape warning

    def __init__(self, text):
        self.words = self.__class__.reg_word.findall(text)

    def __getitem__(self, index):
        # No __iter__ here: iter() falls back to calling __getitem__ with 0, 1, 2, ...
        return self.words[index]


def iter_word_analyzer():
    wa = WordAnalyzer('this is mango word analyzer')
    print('start for wa')
    for w in wa:
        print(w)
    print('start while wa_iter')
    wa_iter = iter(wa)
    while True:
        try:
            print(next(wa_iter))
        except StopIteration as e:
            break


iter_word_analyzer()
dis(iter_word_analyzer)
# start for wa
# this
# is
# mango
# word
# analyzer
# start while wa_iter
# this
# is
# mango
# word
# analyzer
# 15 0 LOAD_GLOBAL 0 (WordAnalyzer)
# 2 LOAD_CONST 1 ('this is mango word analyzer')
# 4 CALL_FUNCTION 1
# 6 STORE_FAST 0 (wa)
#
# 16 8 LOAD_GLOBAL 1 (print)
# 10 LOAD_CONST 2 ('start for wa')
# 12 CALL_FUNCTION 1
# 14 POP_TOP
#
# 17 16 LOAD_FAST 0 (wa)
# 18 GET_ITER
# >> 20 FOR_ITER 12 (to 34)
# 22 STORE_FAST 1 (w)
#
# 18 24 LOAD_GLOBAL 1 (print)
# 26 LOAD_FAST 1 (w)
# 28 CALL_FUNCTION 1
# 30 POP_TOP
# 32 JUMP_ABSOLUTE 20
#
# 20 >> 34 LOAD_GLOBAL 1 (print)
# 36 LOAD_CONST 3 ('start while wa_iter')
# 38 CALL_FUNCTION 1
# 40 POP_TOP
#
# 21 42 LOAD_GLOBAL 2 (iter)
# 44 LOAD_FAST 0 (wa)
# 46 CALL_FUNCTION 1
# 48 STORE_FAST 2 (wa_iter)
#
# 23 >> 50 SETUP_FINALLY 16 (to 68)
#
# 24 52 LOAD_GLOBAL 1 (print)
# 54 LOAD_GLOBAL 3 (next)
# 56 LOAD_FAST 2 (wa_iter)
# 58 CALL_FUNCTION 1
# 60 CALL_FUNCTION 1
# 62 POP_TOP
# 64 POP_BLOCK
# 66 JUMP_ABSOLUTE 50
#
# 25 >> 68 DUP_TOP
# 70 LOAD_GLOBAL 4 (StopIteration)
# 72 JUMP_IF_NOT_EXC_MATCH 114
# 74 POP_TOP
# 76 STORE_FAST 3 (e)
# 78 POP_TOP
# 80 SETUP_FINALLY 24 (to 106)
#
# 26 82 POP_BLOCK
# 84 POP_EXCEPT
# 86 LOAD_CONST 0 (None)
# 88 STORE_FAST 3 (e)
# 90 DELETE_FAST 3 (e)
# 92 JUMP_ABSOLUTE 118
# 94 POP_BLOCK
# 96 POP_EXCEPT
# 98 LOAD_CONST 0 (None)
# 100 STORE_FAST 3 (e)
# 102 DELETE_FAST 3 (e)
# 104 JUMP_ABSOLUTE 50
# >> 106 LOAD_CONST 0 (None)
# 108 STORE_FAST 3 (e)
# 110 DELETE_FAST 3 (e)
# 112 RERAISE
# >> 114 RERAISE
# 116 JUMP_ABSOLUTE 50
# >> 118 LOAD_CONST 0 (None)
# 120 RETURN_VALUE
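The __getitem__ fallback described in this section can be approximated in pure Python. The sketch below is a rough emulation, not CPython's actual implementation; getitem_iter and Letters are hypothetical names introduced only for illustration.

```python
from collections.abc import Iterable


def getitem_iter(obj):
    """Roughly emulate iter() for objects that only define __getitem__."""
    index = 0
    while True:
        try:
            item = obj[index]      # indices are tried from 0 upwards
        except IndexError:
            return                 # IndexError marks the end of the data
        yield item
        index += 1


class Letters:
    """Hypothetical class with __getitem__ but no __iter__."""

    def __init__(self, text):
        self.text = text

    def __getitem__(self, index):
        return self.text[index]


letters = Letters('abc')
print(list(getitem_iter(letters)))    # ['a', 'b', 'c']
print(list(iter(letters)))            # ['a', 'b', 'c']
print(isinstance(letters, Iterable))  # False: the ABC only checks for __iter__
```

The last line shows why calling iter(obj) is a more reliable test of iterability than an isinstance check against Iterable.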
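3. The classic iterator pattern
A standard iterator implements two methods: __next__, which returns the next element, and __iter__, which simply returns self.
When an iterator has run out of elements it raises StopIteration. Python's built-in constructs such as for loops, list comprehensions, and tuple unpacking handle this exception automatically.
Implementing __iter__ on the iterator makes it usable anywhere an iterable is expected, for example directly in a for loop.
An iterator can be traversed only once. To iterate again, call iter on the iterable again to get a new iterator. Each iterator therefore has to maintain its own internal state, which means an iterable should not also act as its own iterator.
From the viewpoint of the classic object-oriented design pattern, an iterable can produce an associated iterator at any time, while the iterator is responsible for the actual traversal of the elements.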
import re


class WordAnalyzer:
    reg_word = re.compile(r'\w+')

    def __init__(self, text):
        self.words = self.__class__.reg_word.findall(text)

    def __iter__(self):
        # Each call returns a new iterator with its own position
        return WordAnalyzerIterator(self.words)


class WordAnalyzerIterator:
    def __init__(self, words):
        self.words = words
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word


def iter_word_analyzer():
    wa = WordAnalyzer('this is mango word analyzer')
    print('start for wa')
    for w in wa:
        print(w)
    print('start while wa_iter')
    wa_iter = iter(wa)
    while True:
        try:
            print(next(wa_iter))
        except StopIteration:
            break


iter_word_analyzer()
# start for wa
# this
# is
# mango
# word
# analyzer
# start while wa_iter
# this
# is
# mango
# word
# analyzer
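The single-pass nature of iterators described in this section can be checked with a short sketch. It uses a plain list as a stand-in for the iterable, so nothing here depends on the WordAnalyzer class; the variable names are illustrative.

```python
words = ['this', 'is', 'mango']   # the iterable

it = iter(words)
first_pass = list(it)             # consumes the iterator completely
second_pass = list(it)            # the same iterator is now exhausted

fresh_pass = list(iter(words))    # a new iterator starts from the beginning

# Two iterators over the same iterable keep independent positions.
a, b = iter(words), iter(words)
next(a)
next(a)
positions = (next(a), next(b))

print(first_pass)    # ['this', 'is', 'mango']
print(second_pass)   # []
print(fresh_pass)    # ['this', 'is', 'mango']
print(positions)     # ('mango', 'this')
```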
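4. Generators are iterators too
A generator is created by calling a generator function, which is a factory function whose body contains yield.
A generator is itself an iterator: it can be advanced with next, and it raises StopIteration once it is exhausted.
When a generator runs, it pauses at each yield statement and returns the value of the expression to the right of yield.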
def gen_func():
    print('first yield')
    yield 'first'
    print('second yield')
    yield 'second'


print(gen_func)
g = gen_func()
print(g)
for val in g:
    print(val)

g = gen_func()       # the first generator is exhausted, so create a new one
print(next(g))
print(next(g))
print(next(g))       # nothing left to yield: raises StopIteration
# <function gen_func at 0x7f1198175040>
# <generator object gen_func at 0x7f1197fb6cf0>
# first yield
# first
# second yield
# second
# first yield
# first
# second yield
# second
# StopIteration
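To make the pause-and-resume behaviour visible without relying on the order of printed output, the sketch below records each step in a list. The names countdown and log are hypothetical and exist only for this illustration.

```python
from collections.abc import Iterator

log = []


def countdown(n):
    """Hypothetical generator function that records when its body runs."""
    log.append('started')
    while n > 0:
        yield n                    # execution pauses here until the next next()
        log.append(f'resumed after {n}')
        n -= 1


g = countdown(2)
print(log)                         # []  (calling the function runs no body code)
print(isinstance(g, Iterator))     # True
print(iter(g) is g)                # True

print(next(g))                     # 2
print(log)                         # ['started']
print(next(g))                     # 1
print(log)                         # ['started', 'resumed after 2']
print(next(g, 'done'))             # done  (the default replaces StopIteration)
print(log)                         # ['started', 'resumed after 2', 'resumed after 1']
```

Passing a default as the second argument to next is one way to avoid handling StopIteration by hand.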
We can write __iter__ as a generator function:
import re


class WordAnalyzer:
    reg_word = re.compile(r'\w+')

    def __init__(self, text):
        self.words = self.__class__.reg_word.findall(text)

    def __iter__(self):
        # __iter__ is a generator function: each call returns a new generator
        for word in self.words:
            yield word


def iter_word_analyzer():
    wa = WordAnalyzer('this is mango word analyzer')
    print('start for wa')
    for w in wa:
        print(w)
    print('start while wa_iter')
    wa_iter = iter(wa)
    while True:
        try:
            print(next(wa_iter))
        except StopIteration:
            break


iter_word_analyzer()
# start for wa
# this
# is
# mango
# word
# analyzer
# start while wa_iter
# this
# is
# mango
# word
# analyzer
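Because calling a generator function creates a new generator each time, an __iter__ written this way hands out an independent iterator on every call. The sketch below illustrates this with a hypothetical Words class (not part of the original example). It also uses yield from, which delegates to another iterable and is equivalent here to the explicit for loop.

```python
class Words:
    """Hypothetical iterable whose __iter__ is a generator function."""

    def __init__(self, text):
        self.words = text.split()

    def __iter__(self):
        # Equivalent to: for word in self.words: yield word
        yield from self.words


ws = Words('this is mango')
first, second = iter(ws), iter(ws)

print(first is second)        # False: each call builds a new generator
print(next(first))            # this
print(next(first))            # is
print(next(second))           # this
print(list(ws))               # ['this', 'is', 'mango']
```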
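5. Implementing a lazy iterator
A major strength of iterators is that __next__ delivers elements one at a time, which makes it feasible to traverse very large data containers.
The earlier implementations called findall during initialization and built the whole list of words up front, which is not ideal. With finditer the matches can instead be produced during traversal.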
import re


class WordAnalyzer:
    reg_word = re.compile(r'\w+')

    def __init__(self, text):
        # self.words = self.__class__.reg_word.findall(text)
        self.text = text  # keep only the text; matching happens during iteration

    def __iter__(self):
        g = self.__class__.reg_word.finditer(self.text)
        print(g)
        for match in g:
            yield match.group()


def iter_word_analyzer():
    wa = WordAnalyzer('this is mango word analyzer')
    print('start for wa')
    for w in wa:
        print(w)
    print('start while wa_iter')
    wa_iter = iter(wa)
    wa_iter1 = iter(wa)  # never advanced, so its generator body (and print) never runs
    while True:
        try:
            print(next(wa_iter))
        except StopIteration:
            break


iter_word_analyzer()
# start for wa
# <callable_iterator object at 0x7feed103e040>
# this
# is
# mango
# word
# analyzer
# start while wa_iter
# <callable_iterator object at 0x7feed103e040>
# this
# is
# mango
# word
# analyzer
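A rough way to see the benefit of the lazy version is to stop early: only the matches that are actually requested get produced. The sketch below assumes a hypothetical lazy_words helper and a module-level WORD_RE pattern, and uses itertools.islice to take the first few words.

```python
import re
from itertools import islice

WORD_RE = re.compile(r'\w+')


def lazy_words(text):
    """Hypothetical helper: yield words one at a time via finditer."""
    for match in WORD_RE.finditer(text):
        yield match.group()


text = 'this is mango word analyzer ' * 100_000   # a large input

# Only three matches are requested, so iteration stops early.
first_three = list(islice(lazy_words(text), 3))
print(first_three)            # ['this', 'is', 'mango']

# findall, by contrast, builds the complete list before returning.
all_words = WORD_RE.findall(text)
print(len(all_words))         # 500000
```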
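6. Simplifying the lazy iterator with a generator expression
A generator expression is a declarative way to define a generator. Its syntax is similar to a list comprehension, but its elements are produced lazily.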
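def gen_func():
    print('first yield')
    yield 'first'
    print('second yield')
    yield 'second'


# The list comprehension runs the generator to completion immediately
l = [x for x in gen_func()]
for x in l:
    print(x)
print()

# The generator expression does nothing until it is iterated
ge = (x for x in gen_func())
print(ge)
for x in ge:
    print(x)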
# first yield
# second yield
# first
# second
#
# <generator object <genexpr> at 0x7f78ff5dfd60>
# first yield
# first
# second yield
# second
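Two practical consequences follow from this laziness: a generator expression can be passed directly to a function that consumes an iterable, and it can be consumed only once. A minimal sketch, with illustrative variable names:

```python
words = ['this', 'is', 'mango', 'word', 'analyzer']

# A generator expression can be passed straight to a consuming function;
# no intermediate list of lengths is built.
total_length = sum(len(w) for w in words)
print(total_length)               # 23

lengths = (len(w) for w in words)
first_read = list(lengths)
second_read = list(lengths)
print(first_read)                 # [4, 2, 5, 4, 8]
print(second_read)                # []  (already exhausted)
```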
Implementing the word analyzer with a generator expression:
import re


class WordAnalyzer:
    reg_word = re.compile(r'\w+')

    def __init__(self, text):
        # self.words = self.__class__.reg_word.findall(text)
        self.text = text

    def __iter__(self):
        # g = self.__class__.reg_word.finditer(self.text)
        # print(g)
        # for match in g:
        #     yield match.group()
        ge = (match.group() for match in self.__class__.reg_word.finditer(self.text))
        print(ge)
        return ge


def iter_word_analyzer():
    wa = WordAnalyzer('this is mango word analyzer')
    print('start for wa')
    for w in wa:
        print(w)
    print('start while wa_iter')
    wa_iter = iter(wa)
    while True:
        try:
            print(next(wa_iter))
        except StopIteration:
            break


iter_word_analyzer()
# start for wa
# <generator object WordAnalyzer.__iter__.<locals>.<genexpr> at 0x7f4178189200>
# this
# is
# mango
# word
# analyzer
# start while wa_iter
# <generator object WordAnalyzer.__iter__.<locals>.<genexpr> at 0x7f4178189200>
# this
# is
# mango
# word
# analyzer
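One subtle difference from the yield-based version: here __iter__ is an ordinary method that returns a generator, not a generator function itself. Either way iter receives a generator object. The sketch below checks this with inspect.isgeneratorfunction; YieldVersion, GenexpVersion, and WORD_RE are hypothetical names used only for the comparison.

```python
import inspect
import re
from types import GeneratorType

WORD_RE = re.compile(r'\w+')


class YieldVersion:
    """Hypothetical: __iter__ is a generator function."""

    def __init__(self, text):
        self.text = text

    def __iter__(self):
        for match in WORD_RE.finditer(self.text):
            yield match.group()


class GenexpVersion:
    """Hypothetical: __iter__ is a plain method returning a generator expression."""

    def __init__(self, text):
        self.text = text

    def __iter__(self):
        return (match.group() for match in WORD_RE.finditer(self.text))


print(inspect.isgeneratorfunction(YieldVersion.__iter__))    # True
print(inspect.isgeneratorfunction(GenexpVersion.__iter__))   # False

for cls in (YieldVersion, GenexpVersion):
    it = iter(cls('this is mango'))
    print(isinstance(it, GeneratorType), list(it))
# True ['this', 'is', 'mango']  (printed once per class)
```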