PyTorch Hands-On Introduction (1): Building an MLP

Posted by 秋山丶雪緒 on 2020-12-05

Preface

Environment:
  Python 3.7.4
  PyTorch 1.3.1

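Task:
  Build a multi-layer perceptron (MLP) that learns randomly generated input and output data.
P.S. The MLP goes by several names. It is simply a fully connected network, and some also call it an artificial neural network (ANN), although strictly speaking that term covers neural networks in general. Its fully connected layers are usually named FC when building a network, whereas FCN usually refers to fully convolutional networks.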

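Workflow:

  1. Build the MLP with numpy to review the underlying principles
  2. Convert the code to PyTorch step by step, getting familiar with common APIs and the usual framework patterns
  3. End up with code that resembles most PyTorch projects: small, but with all the essential parts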

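API reference: the official PyTorch website
If you are not familiar with the basics of neural networks, the 3Blue1Brown deep learning videos (p1, p2, p3) are recommended.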

1. Building a neural network with numpy

import numpy as np

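N = 64  # number of training samples
D_in = 1000  # dimension of each training sample, also the number of input-layer neurons
H = 100  # number of hidden-layer neurons
D_out = 10  # number of output-layer neurons, also the dimension of each label

# Randomly generate the training data
x = np.random.randn(N, D_in)    # [64, 1000]: 64 inputs of dimension 1000
y = np.random.randn(N, D_out)   # [64, 10]: the 64 corresponding outputs of dimension 10

# Randomly initialize the network weights (this network has no bias terms)
w1 = np.random.randn(D_in, H)   # [1000, 100] weights between the input and hidden layers
w2 = np.random.randn(H, D_out)  # [100, 10] weights between the hidden and output layers
learning_rate = 1e-6  # learning rate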

# Train the network for 500 iterations
for it in range(500):
    # Forward pass
    z1 = x.dot(w1)              # matrix product x w1, shape [64, 100]
    a1 = np.maximum(z1, 0)      # ReLU activation: negatives become 0, positives are unchanged
    y_pred = a1.dot(w2)         # matrix product a1 w2, shape [64, 10]
    # Loss
    loss = np.square(y_pred - y).sum()  # sum of squared errors (MSE without the averaging)
    print(it, loss)

    # Backward pass
    # Gradients: the derivation of the formulas is omitted here; the shapes noted
    # in the comments serve as a simple sanity check on the computation
    grad_y_pred = 2.0 * (y_pred - y)         # [64,10]
    grad_w2 = a1.T.dot(grad_y_pred)          # [100,10] = [100,64] * [64,10]
    grad_a1 = grad_y_pred.dot(w2.T)          # [64,100] = [64,10] * [10,100]
    grad_z1 = grad_a1.copy()
    grad_z1[z1<0] = 0                        # [64,100]
    grad_w1 = x.T.dot(grad_z1)               # [1000,100] = [1000,64] * [64,100]
    # Update the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

Output:

0 36093135.37125358
1 32398017.92254483
2 30489201.634114698
3 25977952.979760103
......
497 5.757372208142832e-06
498 5.501219493141868e-06
499 5.256529222978903e-06
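
For reference, the gradient formulas used in the backward pass above can be written out with the chain rule. This is a sketch in matrix notation, where the symbols X, Y, W_1, W_2 simply stand for the arrays x, y, w1, w2 in the code, and the indicator mask follows the code's convention of zeroing the gradient only where z1 < 0:

```latex
\begin{aligned}
Z_1 &= X W_1, \qquad A_1 = \max(Z_1, 0), \qquad \hat{Y} = A_1 W_2, \qquad
L = \sum_{i,j} \bigl(\hat{Y}_{ij} - Y_{ij}\bigr)^2 \\
\frac{\partial L}{\partial \hat{Y}} &= 2\,(\hat{Y} - Y) \\
\frac{\partial L}{\partial W_2} &= A_1^{\top} \, \frac{\partial L}{\partial \hat{Y}} \\
\frac{\partial L}{\partial A_1} &= \frac{\partial L}{\partial \hat{Y}} \, W_2^{\top} \\
\frac{\partial L}{\partial Z_1} &= \frac{\partial L}{\partial A_1} \odot \mathbb{1}[Z_1 \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top} \, \frac{\partial L}{\partial Z_1}
\end{aligned}
```

Each line corresponds to one of grad_y_pred, grad_w2, grad_a1, grad_z1, and grad_w1, in that order.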

2. Converting the numpy code to PyTorch

2.1 Same code structure, only the API is swapped

import torch

N = 64
D_in = 1000
H = 100
D_out = 10

# np.random.randn() becomes torch.randn()
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H) 
w2 = torch.randn(H, D_out)
learning_rate = 1e-6

for it in range(500):
    # Forward pass
    z1 = x.mm(w1)         # dot() becomes mm()
    a1 = z1.clamp(min=0)  # np.maximum() becomes .clamp(min, max), which clamps values into [min, max]
    y_pred = a1.mm(w2)
    # Loss
    loss = (y_pred - y).pow(2).sum().item()
    # np.square becomes .pow(2); the computed loss is a tensor, and .item() extracts its numeric value
    print(it, loss)
    # Backward pass
    # Gradients
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = a1.t().mm(grad_y_pred)          # .T becomes .t()
    grad_a1 = grad_y_pred.mm(w2.t())
    grad_z1 = grad_a1.clone()                 # .copy() becomes .clone()
    grad_z1[z1<0] = 0
    grad_w1 = x.t().mm(grad_z1)
    # Update the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

Output:

0 32200542.0
1 31099402.0
2 35729468.0
......
497 0.00011954964429605752
498 0.00011727648234227672
499 0.00011476786312414333
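
The API substitutions used in this section can be checked side by side. The following is a minimal sketch, with small hypothetical array sizes chosen only for illustration, that runs each numpy call and its PyTorch counterpart on the same data and compares the results:

```python
import numpy as np
import torch

rng = np.random.default_rng(0)
a_np = rng.standard_normal((4, 3))
b_np = rng.standard_normal((3, 2))

# torch.from_numpy shares memory with the array and keeps its dtype (float64 here)
a_t = torch.from_numpy(a_np)
b_t = torch.from_numpy(b_np)

# dot() on 2-D arrays  <->  mm()
same_matmul = np.allclose(a_np.dot(b_np), a_t.mm(b_t).numpy())
# np.maximum(z, 0)     <->  clamp(min=0)
same_relu = np.allclose(np.maximum(a_np, 0), a_t.clamp(min=0).numpy())
# .T                   <->  .t()
same_transpose = np.allclose(a_np.T, a_t.t().numpy())
# np.square(z).sum()   <->  pow(2).sum(), with .item() to get a Python float
same_loss = np.isclose(np.square(a_np).sum(), a_t.pow(2).sum().item())

print(same_matmul, same_relu, same_transpose, same_loss)
```

One difference worth keeping in mind: torch.randn() produces float32 tensors by default, while np.random.randn() produces float64 arrays, which is why the losses printed by the PyTorch version carry fewer significant digits.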

2.2 Computing gradients automatically

import torch

N = 64
D_in = 1000
H = 100
D_out = 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Setting requires_grad to True enables automatic differentiation for the tensor
# In version 1.3.1 it defaults to False
w1 = torch.randn(D_in, H, requires_grad = True)
w2 = torch.randn(H, D_out, requires_grad = True)
learning_rate = 1e-6

for it in range(500):
    # Forward pass
    # With autograd we no longer need to keep the intermediate outputs ourselves,
    # so the forward pass fits on one line
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    # Loss
    loss = (y_pred - y).pow(2).sum()
    print(it, loss.item())

    # Backward pass
    # Gradients
    loss.backward()
    # Update the weights
    with torch.no_grad():
        w1 -= learning_rate * w1.grad  # .grad holds the corresponding gradient
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()    # gradients must be zeroed before the next backward pass, otherwise they keep accumulating
        w2.grad.zero_()
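
To see that loss.backward() produces the same gradients as the hand-written formulas from section 2.1, and that gradients really do accumulate when they are not zeroed, the two can be compared directly. This is a sketch under some assumptions: the sizes are small hypothetical values chosen to keep the check fast, float64 is used so the comparison is not affected by rounding, and forward_loss is a helper name introduced here for illustration.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 8, 20, 10, 3   # small hypothetical sizes, not the ones used in the article
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

def forward_loss():
    # Same forward pass and loss as in this section
    return (x.mm(w1).clamp(min=0).mm(w2) - y).pow(2).sum()

# Gradients from autograd
forward_loss().backward()

# Gradients from the hand-written formulas of section 2.1
with torch.no_grad():
    z1 = x.mm(w1)
    a1 = z1.clamp(min=0)
    grad_y_pred = 2.0 * (a1.mm(w2) - y)
    grad_w2 = a1.t().mm(grad_y_pred)
    grad_z1 = grad_y_pred.mm(w2.t())
    grad_z1[z1 < 0] = 0
    grad_w1 = x.t().mm(grad_z1)

match_w1 = torch.allclose(w1.grad, grad_w1)
match_w2 = torch.allclose(w2.grad, grad_w2)

# A second backward() without zeroing adds to .grad instead of replacing it
forward_loss().backward()
accumulated = torch.allclose(w1.grad, 2 * grad_w1)

print(match_w1, match_w2, accumulated)
```

The last check is the reason for the zero_() calls in the training loop: after two backward passes on the same loss, w1.grad holds twice the gradient.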

2.3 Building the network and the loss function

import torch
import torch.nn as nn

N = 64
D_in = 1000
H = 100
D_out = 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out) 

# Build the network directly with nn.Sequential
model = nn.Sequential(nn.Linear(D_in, H, bias = False),
                      nn.ReLU(),
                      nn.Linear(H, D_out, bias = False))
# Initialize the weights
nn.init.normal_(model[0].weight)
nn.init.normal_(model[2].weight)
# Loss function
loss_fn = nn.MSELoss(reduction = 'sum')
learning_rate = 1e-6

for it in range(500):
    # Forward Pass
    y_pred = model(x)  # calling the model runs model.forward(x)
    # Loss
    loss = loss_fn(y_pred, y)
    print(it, loss.item())
    # Backward Pass
    # Gradient
    loss.backward()
    # update weights
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
    model.zero_grad()
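
When moving from hand-made weight matrices to nn.Linear, it helps to know how the layer stores its weight. The sketch below, which reuses the layer sizes from this section and a small hypothetical batch of 4 samples, inspects the parameter shapes and checks that the nn.Sequential model computes the same result as the hand-written forward pass:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D_in, H, D_out = 1000, 100, 10
model = nn.Sequential(nn.Linear(D_in, H, bias=False),
                      nn.ReLU(),
                      nn.Linear(H, D_out, bias=False))

# nn.Linear stores its weight as [out_features, in_features],
# i.e. transposed relative to w1 and w2 in the earlier sections
print(model[0].weight.shape)
print(model[2].weight.shape)

x = torch.randn(4, D_in)
# Hand-written forward pass using the layers' own weights
manual = x.mm(model[0].weight.t()).clamp(min=0).mm(model[2].weight.t())
same = torch.allclose(model(x), manual, atol=1e-5)

# ReLU has no parameters, so only the two weight matrices are returned
shapes = [tuple(p.shape) for p in model.parameters()]
print(same, shapes)
```

This is also why the training loop can simply iterate over model.parameters(): with bias disabled, it yields exactly the two weight matrices.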

2.4 Automatic gradient descent

import torch
import torch.nn as nn

N = 64
D_in = 1000
H = 100
D_out = 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = nn.Sequential(nn.Linear(D_in, H, bias = False),
                      nn.ReLU(),
                      nn.Linear(H, D_out, bias = False))
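# Initialize the weights
nn.init.normal_(model[0].weight)
nn.init.normal_(model[2].weight)
# Loss function
loss_fn = nn.MSELoss(reduction = 'sum')
# Optimizer: performs the gradient-descent update
# Use one of the two optimizers below; to try SGD, comment out the Adam lines and uncomment the SGD lines
learning_rate = 1e-4    # Adam is commonly used with 1e-3 to 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)
# learning_rate = 1e-6    # for SGD, 1e-6 is used here, as in the earlier sections
# optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)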

for it in range(500):
    # Forward Pass
    y_pred = model(x)
    # Loss
    loss = loss_fn(y_pred, y)
    print(it, loss.item())
    # Backward Pass
    # Gradient
    loss.backward()
    # update weights
    optimizer.step()
    optimizer.zero_grad()

  • Output with the Adam optimizer and weights initialized by nn.init.normal_():
0 22112800.0
1 22049420.0
2 21986150.0
......
497 4801638.0
498 4786025.0
499 4770454.0

  • Output with the Adam optimizer and the two weight-initialization lines commented out:
0 665.029296875
1 648.2096557617188
2 631.8590087890625
......
497 1.3149491451258655e-07
498 1.2488904133078904e-07
499 1.1852827697111934e-07

  • Output with the SGD optimizer and weights initialized by nn.init.normal_():
0 30186936.0
1 27895552.0
2 28996608.0
......
497 3.768844544538297e-05
498 3.709576412802562e-05
499 3.662251037894748e-05

  • Output with the SGD optimizer and the two weight-initialization lines commented out:
0 663.9910278320312
1 663.4700927734375
2 662.9500122070312
......
497 470.6908264160156
498 470.3971252441406
499 470.1034240722656

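  These results show that, depending on the optimizer, the way the weights are initialized strongly affects both how well and how fast the network trains. The underlying reasons still need further study.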
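
One factor that probably contributes to these differences is the scale of the initial weights. The sketch below is an illustration, not a full explanation: it compares the standard deviation of the first layer's weights and the initial loss under nn.init.normal_() (standard deviation 1) and under the default nn.Linear initialization, which draws much smaller values on the order of 1/sqrt(in_features). The helper make_model and the results dictionary are names introduced here for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
loss_fn = nn.MSELoss(reduction='sum')

def make_model(use_normal_init):
    # Hypothetical helper: same architecture as in the article, with or without normal_ init
    model = nn.Sequential(nn.Linear(D_in, H, bias=False),
                          nn.ReLU(),
                          nn.Linear(H, D_out, bias=False))
    if use_normal_init:
        nn.init.normal_(model[0].weight)   # mean 0, std 1
        nn.init.normal_(model[2].weight)
    return model

results = {}
for name, flag in [("normal_", True), ("default", False)]:
    model = make_model(flag)
    with torch.no_grad():
        results[name] = {
            "w1_std": model[0].weight.std().item(),
            "init_loss": loss_fn(model(x), y).item(),
        }
    print(name, results[name])
```

A plausible reading, offered as a hypothesis rather than a verified result: Adam moves each weight by roughly the learning rate per step regardless of gradient size, so 500 steps at 1e-4 change unit-scale weights very little but are significant for the small default weights. Plain SGD scales its step with the gradient, so a learning rate of 1e-6 works with the huge gradients of the normal_ initialization but barely moves the weights when the gradients are small.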

2.5 How a typical project is written

import torch
import torch.nn as nn

N = 64
D_in = 1000
H = 100
D_out = 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Define the model as a class, which helps a great deal when building complex models
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        self.linear1 = nn.Linear(D_in, H, bias = False)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(H, D_out, bias = False)
        
    def forward(self, x):
        y_pred = self.linear2(self.relu(self.linear1(x)))
        return y_pred
# model
model = TwoLayerNet(D_in, H, D_out)
# loss
loss_fn = nn.MSELoss(reduction = 'sum')
# optimizer
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)

for it in range(500):
    # Forward Pass
    y_pred = model(x)
    # Loss
    loss = loss_fn(y_pred, y)
    print(it, loss.item())
    # Backward Pass
    loss.backward()
    # update model parameters
    optimizer.step()
    optimizer.zero_grad()
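
In larger projects the loop above is usually wrapped in a function so it can be reused. The following is a minimal sketch of that idea, assuming the same model, loss, and Adam settings as in this section. The train helper and its steps and lr parameters are hypothetical names introduced here, and only 50 steps are run to keep the example quick.

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = nn.Linear(D_in, H, bias=False)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(H, D_out, bias=False)

    def forward(self, x):
        return self.linear2(self.relu(self.linear1(x)))

def train(model, x, y, steps, lr=1e-4):
    """Hypothetical helper: run `steps` Adam updates and return the loss history."""
    loss_fn = nn.MSELoss(reduction='sum')
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for _ in range(steps):
        optimizer.zero_grad()          # clear old gradients before the new backward pass
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        history.append(loss.item())
    return history

torch.manual_seed(0)
x = torch.randn(64, 1000)
y = torch.randn(64, 10)
model = TwoLayerNet(1000, 100, 10)
history = train(model, x, y, steps=50)

# Total number of trainable weights: 1000*100 + 100*10
n_params = sum(p.numel() for p in model.parameters())
print(n_params, history[0], history[-1])
```

Calling optimizer.zero_grad() at the start of each iteration rather than at the end is equivalent here. Either placement works as long as the gradients are cleared before the next backward().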
