Python - function list generator

Posted by michael_wq on 2020-11-11

*args

Passes any number of positional arguments into a function. Inside the function they arrive as a tuple.
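A minimal sketch of *args in use. The function name total is made up for illustration and is not part of the original notes.

```python
def total(*args):
    # args is a tuple holding every positional argument passed in
    return sum(args)

print(total(1, 2, 3))      # 6

# An existing list can be unpacked into positional arguments with *
numbers = [10, 20, 30]
print(total(*numbers))     # 60
```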

**kwargs

Passes any number of keyword arguments into a function. Inside the function they arrive as a dictionary.

def func(**kwargs):
    # kwargs is a dict of keyword names to values
    for key, value in kwargs.items():
        # An f-string also works when value is not a string
        print(f'{key}:{value}')
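A short usage sketch for **kwargs, assuming a hypothetical helper named describe that returns its formatted pairs instead of printing them. It also shows that an existing dictionary can be unpacked into keyword arguments with **.

```python
def describe(**kwargs):
    # kwargs is a dict mapping each keyword name to its value
    return [f"{key}:{value}" for key, value in kwargs.items()]

print(describe(name="barry", age=28))
# ['name:barry', 'age:28']

# An existing dict can be unpacked into keyword arguments with **
settings = {"lang": "en", "retweets": 3}
print(describe(**settings))
# ['lang:en', 'retweets:3']
```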

lambda

map(func, seq) applies func to every item in seq.

⚠️ Note: in Python 3, map returns a lazy iterator, so wrap it in list(...) to read the values.

filter(func, seq)

func acts as a test: filter keeps the original items for which func returns a truthy value.

list(map(lambda x : x % 2, range(10)))
# => [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
list(filter(lambda x : x % 2, range(10)))
# => [1, 3, 5, 7, 9]
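As a hedged aside, the sketch below shows two things that follow from map and filter being lazy iterators: a map object can be read only once, and the same results can be written as list comprehensions. The variable names are chosen for illustration.

```python
squares = map(lambda x: x * x, range(5))
print(list(squares))  # [0, 1, 4, 9, 16]
print(list(squares))  # [] because the iterator is already exhausted

# Comprehension equivalents of the map and filter calls above
remainders = [x % 2 for x in range(10)]
odds = [x for x in range(10) if x % 2]
print(remainders)  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(odds)        # [1, 3, 5, 7, 9]
```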

iter() + next()

e.g. each call to next() reads one item from the iterator

# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Create an iterator for flash: superhero
superhero = iter(flash)

# Print each item from the iterator
print(next(superhero)) # jay garrick
print(next(superhero)) # barry allen
print(next(superhero)) # wally west
print(next(superhero)) # bart allen
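What happens after the last item is worth seeing too. A small sketch, reusing the same list: one more next() raises StopIteration, and passing a default value as the second argument avoids the exception.

```python
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']
superhero = iter(flash)

# Consume all four items
for _ in range(4):
    next(superhero)

# A fifth call raises StopIteration because nothing is left
try:
    next(superhero)
except StopIteration:
    print('iterator exhausted')

# Passing a default value avoids the exception
print(next(superhero, 'no more heroes'))  # no more heroes
```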

Q: How do you build a list of tuples like these?
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'dd')]

[(100, 'a'), (101, 'b'), (102, 'c'), (103, 'dd')]

a = ['a', 'b', 'c', 'dd']
enu_a = enumerate(a)
print(type(enu_a))
#<class 'enumerate'>

list_enu_a = list(enu_a)
print(list_enu_a)
#[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'dd')]

list(enumerate(a, start=100))
# [(100, 'a'), (101, 'b'), (102, 'c'), (103, 'dd')]

Q: How do you loop over the values that enumerate produces?

# Unpack and print the tuple pairs
# (enumerate the list a itself: enu_a was already exhausted by list(enu_a))
for index1, value1 in enumerate(a):
    print(index1, value1)

# Change the start index
for index2, value2 in enumerate(a, start=1):
    print(index2, value2)
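One thing to watch for: an enumerate object is an iterator and can be read only once, so loops should call enumerate on the list itself. A minimal sketch (first_pass and second_pass are illustrative names):

```python
a = ['a', 'b', 'c', 'dd']
enu_a = enumerate(a)

first_pass = list(enu_a)
second_pass = list(enu_a)   # the enumerate object is already exhausted

print(first_pass)   # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'dd')]
print(second_pass)  # []

# Calling enumerate(a) again gives a fresh iterator
for index, value in enumerate(a, start=1):
    print(index, value)
```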

Q: What is zip() good for?

a = ['a', 'b', 'c', 'dd']
b = ['q', 'w', 'e', 'rr']
c = ['a', 's', 'd', 'ff']
list(zip(a, b, c))
# [('a', 'q', 'a'), ('b', 'w', 's'), ('c', 'e', 'd'), ('dd', 'rr', 'ff')]
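A few common uses, sketched with the same lists: walking two lists in step, building a dictionary, and undoing a zip with zip(*...). Note that zip stops at the shortest input. The names lookup, pairs, letters and partners are illustrative.

```python
a = ['a', 'b', 'c', 'dd']
b = ['q', 'w', 'e', 'rr']

# Walk two lists in step
for left, right in zip(a, b):
    print(left, right)

# Build a dict from a list of keys and a list of values
lookup = dict(zip(a, b))
print(lookup)  # {'a': 'q', 'b': 'w', 'c': 'e', 'dd': 'rr'}

# zip stops at the shortest input
print(list(zip(a, [1, 2])))  # [('a', 1), ('b', 2)]

# zip(*pairs) reverses the operation
pairs = list(zip(a, b))
letters, partners = zip(*pairs)
print(letters)   # ('a', 'b', 'c', 'dd')
```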

Q: Scenario: processing large amounts of Twitter data
Sometimes the data we have to process is too large to fit in a computer's memory. This is a common problem for data scientists. One solution is to process the data source chunk by chunk instead of loading it all at once.

Use chunksize to process the file in batches:

import pandas as pd

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk (10 rows at a time)
for chunk in pd.read_csv('tweets.csv', chunksize=10):

    # Iterate over the 'lang' column of each chunk
    for entry in chunk['lang']:
        if entry in counts_dict:
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)
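The same chunk-by-chunk idea can be sketched with only the standard library. This is a swapped-in technique for illustration, not the pandas API: read_in_chunks is a hypothetical helper, and the in-memory sample_csv stands in for tweets.csv so the sketch runs without a file.

```python
import csv
import io
from collections import Counter
from itertools import islice

# Hypothetical stand-in for tweets.csv so the sketch runs without a file
sample_csv = io.StringIO("id,lang\n1,en\n2,en\n3,et\n4,und\n5,en\n")

def read_in_chunks(rows, chunksize):
    """Yield lists of at most chunksize rows from an iterator."""
    while True:
        chunk = list(islice(rows, chunksize))
        if not chunk:
            return
        yield chunk

counts = Counter()
reader = csv.DictReader(sample_csv)
for chunk in read_in_chunks(reader, chunksize=2):
    # Only one chunk of rows is held in memory at a time
    counts.update(row['lang'] for row in chunk)

print(dict(counts))  # {'en': 3, 'et': 1, 'und': 1}
```

Counter replaces the manual if/else bookkeeping; the chunking logic itself is the part that keeps memory use bounded.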

Q: How can this matrix (five rows, each [0, 1, 2, 3, 4]) be written in one line with [...]?

matrix = [[col for col in range(5)] for row in range(5)]

or

# Caution: * 5 repeats a reference to the same inner list, so all five rows are one object
matrix = [[col for col in range(5)]] * 5
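The two forms print the same matrix but behave differently once a cell is modified. A minimal sketch of the difference (independent and aliased are illustrative names):

```python
independent = [[col for col in range(5)] for row in range(5)]
aliased = [[col for col in range(5)]] * 5

# Both print as the same 5x5 matrix
print(independent == aliased)  # True

# Changing one cell shows the difference
independent[0][0] = 99
aliased[0][0] = 99

print(independent[1][0])  # 0, only the first row changed
print(aliased[1][0])      # 99, every row is the same list object
```

If the rows will ever be modified, the nested comprehension is the safer choice.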

generator function

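Use case: a very large list takes up a lot of memory. A generator is another way to handle this: it computes values one at a time as they are requested instead of all at once, which may not fit in memory. yield plays a role similar to return, except that the function pauses at that point and resumes on the next request (see the linked article for details). next() can be used with a generator to read its values.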
e.g.

# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)

This is equivalent to:

# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']
# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""
    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)
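To make the pausing behavior and the memory point concrete, here is a small sketch. count_up_to is a hypothetical generator function, and sys.getsizeof reports only the size of the container object itself, so treat the comparison as a rough indication.

```python
import sys

def count_up_to(limit):
    """Hypothetical generator function: yields 0, 1, ..., limit - 1."""
    for number in range(limit):
        yield number   # execution pauses here until the next request

gen = count_up_to(3)
print(next(gen))  # 0
print(next(gen))  # 1
print(next(gen))  # 2

# A list stores every element; a generator stores only its current state
as_list = [n for n in range(100_000)]
as_gen = (n for n in range(100_000))
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))  # True
```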
