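Python scripts often have to process large amounts of data, and a careless choice of operation can make the run time balloon: code that is quick to write but painfully slow to run. The two small tasks below each compare a fast version with a slow one.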
import numpy as np
from tqdm import tqdm
import time
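Task: for each element of array B, find the index of the element of array A with the smallest absolute difference, and append that index to C.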
A = np.random.rand(10000)
B = np.random.rand(10000)
[Fast version] Let argmin return the index directly
ss = time.time()
C = [abs(A - b).argmin() for b in tqdm(B)]
ee = time.time()
print('cost {}'.format(ee - ss))
100%|██████████| 10000/10000 [00:00<00:00, 39883.84it/s]
cost 0.2566795349121094
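To make the one-liner above easier to follow, here is a minimal sketch on a tiny hand-picked array (the values are illustrative and not part of the benchmark). Subtracting the scalar b from A broadcasts over every element, so a single argmin call does the work of the inner Python loop.

```python
import numpy as np

# Illustrative values only, chosen so the nearest element is easy to see
A = np.array([0.2, 0.9, 0.5])
b = 0.6

# Broadcasting: b is subtracted from every element of A in one step
diffs = np.abs(A - b)

# argmin returns the position of the smallest difference (0.5 is closest to 0.6)
nearest = int(diffs.argmin())
print(nearest)
```

The loop over B still runs in Python, but each iteration hands the 10000 comparisons to NumPy's compiled code.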
[Slow version] Search with nested for-loops
ss = time.time()
C = []
for b in tqdm(B):
    # Track the closest element of A seen so far for this b
    min_i = -1
    min_d = float('inf')
    for i, a in enumerate(A):
        d = abs(a - b)
        if d < min_d:
            min_d = d
            min_i = i
    C.append(min_i)
ee = time.time()
print('cost {}'.format(ee - ss))
100%|██████████| 10000/10000 [00:45<00:00, 217.41it/s]
cost 45.99927496910095
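Both versions above compare every b against every element of A. If that becomes too slow for larger inputs, one possible alternative (not part of the original benchmark and not timed here) is to sort A once and binary-search each b with np.searchsorted. The helper name nearest_indices is hypothetical, and when two elements of A are exactly equally distant from b it may pick a different one than argmin does.

```python
import numpy as np

def nearest_indices(A, B):
    """Hypothetical helper: for each b in B, index in A of the closest value."""
    A = np.asarray(A)
    B = np.asarray(B)
    order = np.argsort(A, kind="stable")  # original indices of A, ascending by value
    sorted_A = A[order]
    # Insertion point of each b in sorted_A; its nearest value sits at pos-1 or pos
    pos = np.searchsorted(sorted_A, B)
    left = np.clip(pos - 1, 0, len(A) - 1)
    right = np.clip(pos, 0, len(A) - 1)
    # Keep the closer neighbour (assumption: exact ties go to the smaller value)
    use_left = np.abs(B - sorted_A[left]) <= np.abs(sorted_A[right] - B)
    # Map positions in the sorted array back to indices in the original A
    return order[np.where(use_left, left, right)]

# Small usage example with illustrative values
print(nearest_indices([0.1, 0.9, 0.5], [0.45, 0.95, 0.0]))
```

Sorting costs O(n log n) once and each lookup is O(log n), instead of a linear scan of A for every element of B.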
Task: for each element of B, check whether it equals the .a attribute of some object in list A; if it does, append that object's index to C.
class Apple(object):
    def __init__(self, a):
        self.a = a
A = [Apple(a) for a in np.random.randint(1, 10000, 10000)]
B = np.random.randint(1, 10000, 10000)
[Fast version] Build a dictionary mapping attribute value to index in advance
ss = time.time()
A_keys = {one.a: index for index, one in enumerate(A)}
C = [A_keys[b] for b in tqdm(B) if b in A_keys]
ee = time.time()
print('cost {}'.format(ee - ss))
100%|██████████| 10000/10000 [00:00<00:00, 2108220.16it/s]
cost 0.008636236190795898
[Slow version] Look up the position with the list's index method
ss = time.time()
C = []
A_keys = [one.a for one in A]
for b in tqdm(B):
    # Both the membership test and index() scan the list linearly
    if b in A_keys:
        C.append(A_keys.index(b))
ee = time.time()
print('cost {}'.format(ee - ss))
100%|██████████| 10000/10000 [00:01<00:00, 5652.18it/s]
cost 1.7722303867340088
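One caveat when swapping the list for the dictionary: the two versions agree only if the attribute values in A are unique. A dict comprehension keeps the last index seen for a repeated value, while list.index returns the first, and with 10000 random draws from 1 to 9999 repeats are practically guaranteed. The sketch below shows the difference on a tiny list; first_index_map is a hypothetical helper that keeps the first-occurrence behaviour of list.index.

```python
def first_index_map(values):
    """Hypothetical helper: map each value to the index of its first occurrence."""
    mapping = {}
    for index, value in enumerate(values):
        # setdefault only stores the index if the value has not been seen yet
        mapping.setdefault(value, index)
    return mapping

values = [7, 3, 7, 5]  # illustrative data with a repeated value

last_wins = {v: i for i, v in enumerate(values)}  # later duplicates overwrite earlier ones
first_wins = first_index_map(values)

print(last_wins[7], first_wins[7], values.index(7))
```

Either mapping gives the same average O(1) lookup per element of B; which one to use depends on whether the first or the last matching index is wanted.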
This work is released under a CC license; reposts must credit the author and link to this article.