🥥 Table of Contents
Resource 1: Optimizers | SGD | Momentum | Adagrad | RMSProp | Adam - Bilibili
Resource 2: AdamW and Adam with weight decay
Resource 3: Can stochastic gradient descent converge on non-convex functions? Community discussion: yes, but only under conditions, and convergence is harder than in the convex case - 機器之心
- Gradient Descent
- SGD
- SGD with Momentum
- Adagrad
- RMSProp
- Adam
- AdamW
🥑 Get Started!
Gradient Descent
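Vanilla (full-batch) gradient descent updates the parameters with the gradient of the loss computed over the entire dataset: w ← w − lr · ∇L(w). PyTorch has no dedicated GradientDescent class, so here is a minimal manual sketch of a single step, using hypothetical tensors x, y, and a parameter vector w:
import torch
x = torch.randn(100, 10)                 # hypothetical full dataset (features)
y = torch.randn(100)                     # hypothetical full dataset (targets)
w = torch.randn(10, requires_grad=True)  # parameter vector
lr = 0.01
loss = ((x @ w - y) ** 2).mean()         # loss over the entire dataset, not a mini-batch
loss.backward()                          # compute dloss/dw
with torch.no_grad():
    w -= lr * w.grad                     # w <- w - lr * grad
    w.grad.zero_()                       # clear the gradient for the next step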
SGD
import torch
import torch.nn as nn
from torch.optim import SGD
model = nn.Linear(10, 1)  # placeholder model; any nn.Module works
optimizer = SGD(model.parameters(), lr=0.01)  # plain SGD update: w <- w - lr * grad
SGD with Momentum
import torch
import torch.nn as nn
from torch.optim import SGD
model = nn.Linear(10, 1)  # placeholder model; any nn.Module works
# momentum=0.9 keeps a decaying average of past gradients, damping oscillations
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
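All optimizers in this note plug into the same training step. A minimal sketch, assuming a model, a loss function criterion, and a dataloader (all hypothetical names) already exist:
for inputs, targets in dataloader:            # hypothetical dataloader
    optimizer.zero_grad()                     # clear gradients from the previous step
    loss = criterion(model(inputs), targets)  # forward pass and loss
    loss.backward()                           # backpropagate
    optimizer.step()                          # apply the optimizer's update rule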
Adagrad
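Adagrad gives each parameter its own effective learning rate by dividing by the square root of the sum of all past squared gradients, so frequently updated parameters take smaller steps. A minimal sketch, reusing the placeholder model from the SGD example above:
import torch
from torch.optim import Adagrad
optimizer = Adagrad(model.parameters(), lr=0.01)  # per-parameter step scaled by accumulated squared gradients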
RMSProp
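RMSProp addresses Adagrad's ever-shrinking step size by replacing the cumulative sum with an exponential moving average of squared gradients. Note that the PyTorch class is spelled RMSprop. A minimal sketch with the placeholder model:
import torch
from torch.optim import RMSprop
optimizer = RMSprop(model.parameters(), lr=0.01, alpha=0.99)  # alpha is the decay rate of the squared-gradient average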
Adam
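Adam combines momentum (a moving average of gradients) with RMSProp-style scaling (a moving average of squared gradients), and bias-corrects both averages. A minimal sketch with the placeholder model:
import torch
from torch.optim import Adam
optimizer = Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))  # betas: decay rates for the first and second moment estimates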
AdamW
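AdamW decouples weight decay from the gradient-based update: the decay is applied directly to the weights rather than folded into the gradient as L2 regularization, which is what Adam's weight_decay does (see Resource 2). A minimal sketch with the placeholder model:
import torch
from torch.optim import AdamW
optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)  # decoupled weight decay applied directly to the weights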