PyTorch - DataLoader

Posted by kingchou007 on 2024-04-01

Basically, the DataLoader works with a Dataset object, so to use the DataLoader you need to get your data into this Dataset wrapper. To do this you only need to implement two magic methods: __getitem__ and __len__. __getitem__ takes an index and returns an (x, y) pair; __len__ just returns the size of the data. And that's that. [1]
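
As a minimal sketch of such a wrapper (the class name PairDataset is illustrative, not an existing API):

import torch
from torch.utils.data import Dataset

class PairDataset(Dataset):
    """Hypothetical Dataset wrapping two equally long tensors."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getitem__(self, index):
        # Return one (x, y) pair for the given index
        return self.x[index], self.y[index]

    def __len__(self):
        # Number of samples in the dataset
        return len(self.x)

For plain tensor pairs like the ones below, PyTorch already ships this pattern as torch.utils.data.TensorDataset, which is what the following code uses.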

How the DataLoader reads data

import torch
from torch.utils.data import DataLoader, TensorDataset

# Define some sample data
X = torch.randn(5, 3)  # inputs
y = torch.randn(5, 3)  # labels

# Wrap the tensors in a Dataset so the DataLoader can index them
dataset = TensorDataset(X, y)

print(X, y)

Our data looks like this:

tensor([[-0.5138, -1.7766, -0.6183],
        [ 0.2235,  0.1974,  0.2892],
        [ 1.6249, -0.5768, -1.5081],
        [ 0.5972, -0.1788,  0.7579],
        [ 1.3844, -0.5480, -1.5612]]) 
tensor([[-0.5818,  0.1668,  0.5073],
        [-1.7707, -0.2907,  1.4918],
        [ 1.2157, -2.8250, -0.0247],
        [ 0.2748,  0.1086,  1.6052],
        [-0.7613, -1.3326, -0.5267]])

Then we read it through the DataLoader:

# batch_size=1 means we read one sample at a time
# shuffle=True means the data is reshuffled at the start of every epoch
dataloader = DataLoader(dataset, batch_size=1, shuffle=True)

for i, (batch_x, batch_y) in enumerate(dataloader):
    print(f"Batch {i}: input {batch_x}, label {batch_y}")

We get:

Batch 0: input tensor([[-0.5138, -1.7766, -0.6183]]), label tensor([[-0.5818,  0.1668,  0.5073]])
Batch 1: input tensor([[ 0.5972, -0.1788,  0.7579]]), label tensor([[0.2748, 0.1086, 1.6052]])
Batch 2: input tensor([[ 1.6249, -0.5768, -1.5081]]), label tensor([[ 1.2157, -2.8250, -0.0247]])
Batch 3: input tensor([[ 1.3844, -0.5480, -1.5612]]), label tensor([[-0.7613, -1.3326, -0.5267]])
Batch 4: input tensor([[0.2235, 0.1974, 0.2892]]), label tensor([[-1.7707, -0.2907,  1.4918]])
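
For comparison, a larger batch_size stacks several rows into each batch along a new leading dimension. A quick sketch, reusing the dataset above (the shapes in the comments are what it would print):

dataloader = DataLoader(dataset, batch_size=2, shuffle=False)

for i, (batch_x, batch_y) in enumerate(dataloader):
    print(f"Batch {i}: input shape {tuple(batch_x.shape)}, label shape {tuple(batch_y.shape)}")

# Batch 0: input shape (2, 3), label shape (2, 3)
# Batch 1: input shape (2, 3), label shape (2, 3)
# Batch 2: input shape (1, 3), label shape (1, 3)  <- last batch is smaller

By default the final, smaller batch is kept; pass drop_last=True to discard it.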

Is the order in which samples come out of the DataLoader the same as the order they went in?
Answer: Not if shuffle=True: the data is randomly reshuffled at the start of every epoch, which is why the batches above are out of order. With shuffle=False, the original order is preserved.
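
A quick way to verify this yourself, as a sketch building on the dataset above:

# With shuffle=False, the DataLoader walks indices 0..len-1 in order
ordered = DataLoader(dataset, batch_size=1, shuffle=False)
assert all(torch.equal(bx[0], X[i]) for i, (bx, by) in enumerate(ordered))

# With shuffle=True the order changes (re-randomized each epoch),
# but every input still arrives paired with its own label
shuffled = DataLoader(dataset, batch_size=1, shuffle=True)
for bx, by in shuffled:
    j = (X == bx[0]).all(dim=1).nonzero().item()  # the row's original index
    assert torch.equal(by[0], y[j])               # its label came along with it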
