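PyTorch Hands-On Introduction (1): Building an MLP
Preface
Environment:
Python 3.7.4
PyTorch 1.3.1
Task:
Build a multi-layer perceptron (MLP) that learns randomly generated input and output data.
PS: MLP goes by several names. It is simply a fully connected network, and some people also call it an artificial neural network (ANN), although ANN is the broader term. Its fully connected layers are usually given the layer name FC when building a network, whereas FCN usually refers to a fully convolutional network.
Workflow:
- Build the MLP with numpy and review the underlying principles
- Convert the code to PyTorch step by step, getting familiar with commonly used APIs and framework conventions
- End up with code structured like most PyTorch projects: small, but with all the essential parts
API lookup: the official PyTorch website
If you are not familiar with neural network basics, 3Blue1Brown's deep learning videos (p1, p2, p3) are recommended.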
1. Building a neural network with numpy
import numpy as np

N = 64        # number of training samples
D_in = 1000   # input dimension, i.e. number of input-layer neurons
H = 100       # number of hidden-layer neurons
D_out = 10    # number of output-layer neurons, i.e. dimension of the labels

# Randomly generate the training data
x = np.random.randn(N, D_in)   # [64, 1000]: 64 inputs of dimension 1000
y = np.random.randn(N, D_out)  # [64, 10]: the 64 corresponding 10-dimensional outputs

# Randomly initialize the weights (this network omits the bias)
w1 = np.random.randn(D_in, H)   # [1000, 100] weights between input and hidden layer
w2 = np.random.randn(H, D_out)  # [100, 10] weights between hidden and output layer

learning_rate = 1e-6  # learning rate

# Train the network for 500 iterations
for it in range(500):
    # Forward pass
    z1 = x.dot(w1)           # x * w1, shape [64, 100]
    a1 = np.maximum(z1, 0)   # ReLU activation: negatives become 0, positives are unchanged
    y_pred = a1.dot(w2)      # a1 * w2, shape [64, 10]

    # Loss: sum of squared errors (MSE without dividing by the number of elements)
    loss = np.square(y_pred - y).sum()
    print(it, loss)

    # Backward pass
    # Gradients: the derivation is omitted here; the shapes in the comments
    # serve as a quick sanity check of the computation
    grad_y_pred = 2.0 * (y_pred - y)   # [64,10]
    grad_w2 = a1.T.dot(grad_y_pred)    # [100,10] = [100,64] * [64,10]
    grad_a1 = grad_y_pred.dot(w2.T)    # [64,100] = [64,10] * [10,100]
    grad_z1 = grad_a1.copy()
    grad_z1[z1 < 0] = 0                # [64,100]
    grad_w1 = x.T.dot(grad_z1)         # [1000,100] = [1000,64] * [64,100]

    # Update the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
Output:
0 36093135.37125358
1 32398017.92254483
2 30489201.634114698
3 25977952.979760103
......
497 5.757372208142832e-06
498 5.501219493141868e-06
499 5.256529222978903e-06
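Since the derivation of the gradient formulas is skipped above, one way to gain confidence in them is a numerical gradient check. The sketch below is not part of the original code: it reuses the same forward and backward formulas on a much smaller network (the sizes, the seed, and the helper names `loss_fn`, `analytic_grads`, and `numeric_grad` are assumptions made for illustration) and compares the hand-written gradients with central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 6, 5, 3  # tiny sizes so the check runs quickly

x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn(w1, w2):
    # Same forward pass and sum-of-squares loss as in the training loop
    a1 = np.maximum(x.dot(w1), 0)
    return np.square(a1.dot(w2) - y).sum()

def analytic_grads(w1, w2):
    # Same backward formulas as in the training loop
    z1 = x.dot(w1)
    a1 = np.maximum(z1, 0)
    y_pred = a1.dot(w2)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = a1.T.dot(grad_y_pred)
    grad_z1 = grad_y_pred.dot(w2.T)
    grad_z1[z1 < 0] = 0
    grad_w1 = x.T.dot(grad_z1)
    return grad_w1, grad_w2

def numeric_grad(f, w, eps=1e-6):
    # Central finite difference, perturbing one weight at a time in place
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = f()
        w[idx] = old - eps
        f_minus = f()
        w[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

grad_w1, grad_w2 = analytic_grads(w1, w2)
num_w1 = numeric_grad(lambda: loss_fn(w1, w2), w1)
num_w2 = numeric_grad(lambda: loss_fn(w1, w2), w2)

max_err_w1 = np.abs(grad_w1 - num_w1).max()
max_err_w2 = np.abs(grad_w2 - num_w2).max()
print(max_err_w1, max_err_w2)  # both should be very small
```

If the two errors are tiny compared with the gradient magnitudes, the manual backward pass agrees with the numerical one. The check can be off near the ReLU kink (a pre-activation extremely close to 0), which is unlikely with random data.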
2. Converting the numpy code to PyTorch code
2.1 Same code structure, only the API is swapped
import torch

N = 64
D_in = 1000
H = 100
D_out = 10

# np.random.randn() becomes torch.randn()
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6

for it in range(500):
    # Forward pass
    z1 = x.mm(w1)            # dot() becomes mm()
    a1 = z1.clamp(min=0)     # np.maximum() becomes .clamp(min, max), which clamps values into [min, max]
    y_pred = a1.mm(w2)

    # Loss
    # np.square becomes .pow(2); the computed loss is a tensor, and .item() extracts its Python number
    loss = (y_pred - y).pow(2).sum().item()
    print(it, loss)

    # Backward pass
    # Gradients
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = a1.t().mm(grad_y_pred)   # .T becomes .t()
    grad_a1 = grad_y_pred.mm(w2.t())
    grad_z1 = grad_a1.clone()          # .copy() becomes .clone()
    grad_z1[z1 < 0] = 0
    grad_w1 = x.t().mm(grad_z1)

    # Update the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
Output:
0 32200542.0
1 31099402.0
2 35729468.0
......
497 0.00011954964429605752
498 0.00011727648234227672
499 0.00011476786312414333
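The PyTorch losses above print with fewer digits than the numpy ones and end at a larger value. The random data differs between the two runs, so they are not directly comparable, but one likely contributor is precision: `np.random.randn` produces float64 arrays, while `torch.randn` produces float32 tensors by default. The following sketch (variable names are illustrative) shows the dtype difference and confirms that the swapped API calls compute the same values when fed the same data.

```python
import numpy as np
import torch

x_np = np.random.randn(3, 4)
x_t = torch.randn(3, 4)
print(x_np.dtype)  # float64
print(x_t.dtype)   # torch.float32 (assuming the default dtype was not changed)

# torch.from_numpy shares memory with the array and keeps its float64 dtype
a = torch.from_numpy(x_np)
w_np = np.random.randn(4, 2)
w_t = torch.from_numpy(w_np)

# np.maximum(z, 0) versus .clamp(min=0)
same_relu = np.allclose(np.maximum(x_np, 0), a.clamp(min=0).numpy())
# .dot() versus .mm()
same_mm = np.allclose(x_np.dot(w_np), a.mm(w_t).numpy())
# .T versus .t()
same_t = np.allclose(x_np.T, a.t().numpy())
print(same_relu, same_mm, same_t)
```

To experiment with double precision in the PyTorch version, passing `dtype=torch.float64` to `torch.randn` is one option.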
2.2 Computing gradients automatically
import torch

N = 64
D_in = 1000
H = 100
D_out = 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Setting requires_grad to True enables automatic differentiation for the tensor
# (it defaults to False in version 1.3.1)
w1 = torch.randn(D_in, H, requires_grad = True)
w2 = torch.randn(H, D_out, requires_grad = True)

learning_rate = 1e-6

for it in range(500):
    # Forward pass
    # Autograd does not need the intermediate outputs, so the forward pass fits on one line
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Loss
    loss = (y_pred - y).pow(2).sum()
    print(it, loss.item())

    # Backward pass
    # Gradients
    loss.backward()

    # Update the weights
    with torch.no_grad():
        w1 -= learning_rate * w1.grad   # .grad holds the corresponding gradient
        w2 -= learning_rate * w2.grad
        # Gradients must be zeroed before the next backward pass, otherwise they keep accumulating
        w1.grad.zero_()
        w2.grad.zero_()
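The need to zero the gradients can be seen on a very small example. This sketch is separate from the training code above and uses a hypothetical two-element tensor `w`: calling `backward()` twice without zeroing adds the second gradient onto the first.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(w^2) = 2w
loss = (w ** 2).sum()
loss.backward()
first = w.grad.clone()         # tensor([2., 4.])

# Second backward pass without zeroing: the new gradient is added to the old one
loss = (w ** 2).sum()
loss.backward()
accumulated = w.grad.clone()   # tensor([4., 8.])

# After zeroing, a backward pass yields the plain gradient again
w.grad.zero_()
loss = (w ** 2).sum()
loss.backward()
after_zero = w.grad.clone()    # tensor([2., 4.])

print(first, accumulated, after_zero)
```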
2.3 Building the network and the loss function
import torch
import torch.nn as nn

N = 64
D_in = 1000
H = 100
D_out = 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Build the network directly with nn.Sequential
model = nn.Sequential(nn.Linear(D_in, H, bias = False),
                      nn.ReLU(),
                      nn.Linear(H, D_out, bias = False))

# Weight initialization (standard normal, matching the earlier sections)
nn.init.normal_(model[0].weight)
nn.init.normal_(model[2].weight)

# Loss function
loss_fn = nn.MSELoss(reduction = 'sum')

learning_rate = 1e-6

for it in range(500):
    # Forward pass
    y_pred = model(x)   # essentially calls model.forward(x)

    # Loss
    loss = loss_fn(y_pred, y)
    print(it, loss.item())

    # Backward pass
    # Gradients
    loss.backward()

    # Update the weights
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
    model.zero_grad()
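Two details of this version are easy to miss: `reduction = 'sum'` is what keeps the loss identical to the hand-written `(y_pred - y).pow(2).sum()`, and `nn.Linear` stores its weight as `[out_features, in_features]`, the transpose of the `w1` layout used earlier. The sketch below checks both on random tensors; the tensor and variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
y_pred = torch.randn(64, 10)
y = torch.randn(64, 10)

manual = (y_pred - y).pow(2).sum()
sum_loss = nn.MSELoss(reduction='sum')(y_pred, y)    # same as the manual sum
mean_loss = nn.MSELoss(reduction='mean')(y_pred, y)  # the sum divided by 64 * 10 elements
print(manual.item(), sum_loss.item(), mean_loss.item())

# nn.Linear(in_features, out_features) keeps its weight as [out_features, in_features]
layer = nn.Linear(1000, 100, bias=False)
print(layer.weight.shape)  # torch.Size([100, 1000])
```

With the default `reduction='mean'`, the loss and its gradients would be smaller by a factor of 640 here, so the learning rate of 1e-6 would no longer behave the same way.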
2.4 Performing gradient descent automatically
import torch
import torch.nn as nn

N = 64
D_in = 1000
H = 100
D_out = 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = nn.Sequential(nn.Linear(D_in, H, bias = False),
                      nn.ReLU(),
                      nn.Linear(H, D_out, bias = False))

# Weight initialization (comment out these two lines to use PyTorch's default initialization)
nn.init.normal_(model[0].weight)
nn.init.normal_(model[2].weight)

# Loss function
loss_fn = nn.MSELoss(reduction = 'sum')

# Optimizer: performs the gradient-descent update.
# Keep only one of the two choices below; as written, the later SGD assignment is the one in effect.
learning_rate = 1e-4   # Adam is commonly used with 1e-3 to 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)
learning_rate = 1e-6   # SGD: 1e-6 is used here, matching the earlier sections
optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x)

    # Loss
    loss = loss_fn(y_pred, y)
    print(it, loss.item())

    # Backward pass
    # Gradients
    loss.backward()

    # Update the weights
    optimizer.step()
    optimizer.zero_grad()
- Output with Adam as the optimizer and the weights initialized by nn.init.normal_():
0 22112800.0
1 22049420.0
2 21986150.0
......
497 4801638.0
498 4786025.0
499 4770454.0
- Output with Adam as the optimizer and the two weight-initialization lines commented out:
0 665.029296875
1 648.2096557617188
2 631.8590087890625
......
497 1.3149491451258655e-07
498 1.2488904133078904e-07
499 1.1852827697111934e-07
- Output with SGD as the optimizer and the weights initialized by nn.init.normal_():
0 30186936.0
1 27895552.0
2 28996608.0
......
497 3.768844544538297e-05
498 3.709576412802562e-05
499 3.662251037894748e-05
- Output with SGD as the optimizer and the two weight-initialization lines commented out:
0 663.9910278320312
1 663.4700927734375
2 662.9500122070312
......
497 470.6908264160156
498 470.3971252441406
499 470.1034240722656
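As the outputs show, the weight initialization strongly affects how well and how fast the network trains, and the effect differs between optimizers. The underlying reasons are left for further study.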
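A first step toward understanding the four runs above is to look at how large the weights are under each initialization. Per the PyTorch documentation, `nn.Linear` draws its default weights from a uniform distribution bounded by 1/sqrt(in_features), whereas `nn.init.normal_()` with default arguments draws from a standard normal. The sketch below only measures that difference in scale (variable names are illustrative); it is a partial observation, not a full explanation of the training behavior.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
D_in, H = 1000, 100

default_layer = nn.Linear(D_in, H, bias=False)   # PyTorch default initialization
normal_layer = nn.Linear(D_in, H, bias=False)
nn.init.normal_(normal_layer.weight)             # standard normal: mean 0, std 1

default_std = default_layer.weight.std().item()
normal_std = normal_layer.weight.std().item()
default_max = default_layer.weight.abs().max().item()
bound = 1 / math.sqrt(D_in)                      # about 0.0316 for D_in = 1000

print(default_std, normal_std)   # the default weights are far smaller
print(default_max, bound)        # the default weights stay within the bound
```

The much larger normal-initialized weights are consistent with the initial loss being in the tens of millions rather than the hundreds. A fixed step size such as SGD's 1e-6 then moves the small default weights very slowly, which matches the slow SGD run without `normal_` initialization.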
2.5 How it is usually written in a project
import torch
import torch.nn as nn

N = 64
D_in = 1000
H = 100
D_out = 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Define the model as a class, which helps a lot when building complex models
class TwoLayerNet(nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        self.linear1 = nn.Linear(D_in, H, bias = False)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(H, D_out, bias = False)

    def forward(self, x):
        y_pred = self.linear2(self.relu(self.linear1(x)))
        return y_pred

# Model
model = TwoLayerNet(D_in, H, D_out)

# Loss
loss_fn = nn.MSELoss(reduction = 'sum')

# Optimizer
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x)

    # Loss
    loss = loss_fn(y_pred, y)
    print(it, loss.item())

    # Backward pass
    loss.backward()

    # Update the model parameters
    optimizer.step()
    optimizer.zero_grad()
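One benefit of the class-based form is that the model can be inspected and reused outside the training loop. The following sketch goes beyond the original code: it redefines the same `TwoLayerNet` so that it is self-contained, counts the parameters, lists their names, and runs a forward pass on new random inputs without tracking gradients. The batch size of 5 is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = nn.Linear(D_in, H, bias=False)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(H, D_out, bias=False)

    def forward(self, x):
        return self.linear2(self.relu(self.linear1(x)))

model = TwoLayerNet(1000, 100, 10)

# Parameters are registered automatically from the layer attributes
names = [name for name, _ in model.named_parameters()]
n_params = sum(p.numel() for p in model.parameters())
print(names)     # ['linear1.weight', 'linear2.weight']
print(n_params)  # 1000*100 + 100*10 = 101000

# Inference: switch to eval mode and disable gradient tracking
model.eval()
with torch.no_grad():
    out = model(torch.randn(5, 1000))
print(out.shape)  # torch.Size([5, 10])
```

`model.eval()` makes no difference for this particular network, since it has no dropout or batch-norm layers, but calling it before inference is a common habit in PyTorch projects.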