PyTorch and TensorFlow, a love-hate story: parameter initialization

Posted by 西西嘛呦 on 2020-10-07

PyTorch version: 1.6.0

TensorFlow version: 1.15.0

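Parameter initialization mostly comes down to sampling from a few mathematical distributions, such as the normal distribution and the uniform distribution.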

1. PyTorch

(1) Custom trainable parameters

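torch.bernoulli(input, out=None) → Tensor	Draws binary random numbers (0 or 1) from a Bernoulli distribution
torch.multinomial(input, num_samples, replacement=False, out=None) → LongTensor	Returns a tensor in which each row contains num_samples indices sampled from the multinomial distribution given by the corresponding row of input
torch.normal(means, std, out=None)	Returns a tensor of random numbers drawn from separate normal distributions whose means and standard deviations are given element-wise
torch.normal(mean=0.0, std, out=None)	Like the function above, but all drawn elements share the same mean
torch.normal(means, std=1.0, out=None)	Like the function above, but all drawn elements share the same standard deviation
torch.rand(*sizes, out=None) → Tensor	Returns a tensor filled with random numbers from a uniform distribution on the interval [0, 1); the shape is given by the variable argument sizes
torch.randn(*sizes, out=None) → Tensor	Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1; the shape is given by the variable argument sizes
torch.randperm(n, out=None) → LongTensor	Returns a random permutation of the integers from 0 to n - 1
In-place random sampling
torch.Tensor.bernoulli_()	In-place version of torch.bernoulli()
torch.Tensor.cauchy_()	Numbers drawn from a Cauchy distribution
torch.Tensor.exponential_()	Numbers drawn from an exponential distribution
torch.Tensor.geometric_()	Elements drawn from a geometric distribution
torch.Tensor.log_normal_()	Samples from a log-normal distribution
torch.Tensor.normal_()	In-place version of torch.normal()
torch.Tensor.random_()	Numbers sampled from a discrete uniform distribution
torch.Tensor.uniform_()	Numbers sampled from a continuous uniform distribution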

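Note: methods whose names end with an underscore, such as normal_(), operate in place, that is, they overwrite the tensor they are called on.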
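As a minimal sketch of the underscore convention (the shape, range and seed are arbitrary choices for illustration), the following compares an out-of-place sampler with an in-place one:

```python
import torch

torch.manual_seed(0)  # fixed seed, only so the sketch is repeatable

t = torch.zeros(2, 3)

# Out-of-place: torch.rand returns a brand-new tensor; t is left untouched.
r = torch.rand(2, 3)
untouched = bool((t == 0).all())

# In-place: uniform_ overwrites t's own storage and returns that same tensor.
returned = t.uniform_(-1.0, 1.0)
shares_storage = returned.data_ptr() == t.data_ptr()
in_range = bool(((t >= -1.0) & (t <= 1.0)).all())

print(untouched, shares_storage, in_range)
```

The in-place form is the one that matters for initialization, because it fills a tensor that already exists (for example a layer's weight) instead of allocating a new one.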

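There are also functions such as torch.zeros() and torch.ones(), together with in-place counterparts such as torch.Tensor.zero_(), torch.nn.init.zeros_() and torch.nn.init.ones_().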

The examples below use these distributions to initialize parameters:

a = torch.Tensor(3, 3).bernoulli_()

tensor([[1., 1., 1.],
        [0., 1., 0.],
        [0., 1., 0.]])

a = torch.Tensor(3, 3).normal_(0,1)

tensor([[ 0.7777,  0.9153, -0.1495],
        [-0.0533,  1.6500, -1.2531],
        [-0.5321,  0.1954, -1.3835]])

Next we pass it to torch.tensor() and enable gradient computation:

b = torch.tensor(a,requires_grad=True)

E:\anaconda2\envs\python36\lib\site-packages\ipykernel_launcher.py:1: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  """Entry point for launching an IPython kernel.
Out[7]:
tensor([[ 0.7777,  0.9153, -0.1495],
        [-0.0533,  1.6500, -1.2531],
        [-0.5321,  0.1954, -1.3835]], requires_grad=True)

This raises the warning shown above; following its suggestion, we change the code to:

c = a.clone().detach().requires_grad_(True)

The result is the same:

tensor([[ 0.7777,  0.9153, -0.1495],
        [-0.0533,  1.6500, -1.2531],
        [-0.5321,  0.1954, -1.3835]], requires_grad=True)
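To make it clearer what clone().detach().requires_grad_(True) produces, here is a small sketch (the loss function is an arbitrary choice for illustration). It checks that the new tensor is a leaf that receives gradients and that it no longer shares memory with the source tensor:

```python
import torch

a = torch.randn(3, 3)
c = a.clone().detach().requires_grad_(True)

# c is a leaf tensor, so backward() accumulates gradients in c.grad.
loss = (c ** 2).sum()
loss.backward()
is_leaf = c.is_leaf
grad_matches = torch.allclose(c.grad, 2 * c.detach())  # d/dc sum(c^2) = 2c

# c owns its own memory: modifying a afterwards does not change c.
before = c.detach().clone()
a.zero_()
independent = torch.equal(c.detach(), before)

print(is_leaf, grad_matches, independent)
```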

(2) Initializing layer parameters in a network

In PyTorch, the default initialization of parameters lives in each layer's reset_parameters() method.

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, input, hidden, classes):
        super(Net, self).__init__()
        self.input = input
        self.hidden = hidden
        self.classes = classes

        # Allocate uninitialized weights and biases; reset_parameters() fills them in.
        self.w0 = nn.Parameter(torch.Tensor(self.input, self.hidden))
        self.b0 = nn.Parameter(torch.Tensor(self.hidden))
        self.w1 = nn.Parameter(torch.Tensor(self.hidden, self.classes))
        self.b1 = nn.Parameter(torch.Tensor(self.classes))
        self.reset_parameters()

    def reset_parameters(self):
        # Weights are drawn from N(0, 1); biases are set to 0.
        nn.init.normal_(self.w0)
        nn.init.constant_(self.b0, 0)
        nn.init.normal_(self.w1)
        nn.init.constant_(self.b1, 0)

    def forward(self, x):
        # Two-layer perceptron: linear -> ReLU -> linear.
        out = torch.matmul(x, self.w0) + self.b0
        out = F.relu(out)
        out = torch.matmul(out, self.w1) + self.b1
        return out
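The built-in layers follow the same reset_parameters() convention, which can be observed directly. The sketch below overwrites the weights of an nn.Linear layer and then re-runs its default initialization. The bound of 1/sqrt(in_features) is an assumption about nn.Linear's default scheme in the PyTorch versions I am aware of; it is not something stated in this article.

```python
import math

import torch
import torch.nn as nn

layer = nn.Linear(4, 2)

# Overwrite the weights so the effect of re-initialization is visible.
with torch.no_grad():
    layer.weight.fill_(5.0)

# Re-run the layer's own default initialization.
layer.reset_parameters()

# Assumption: the default keeps every weight within 1/sqrt(in_features).
bound = 1.0 / math.sqrt(layer.in_features)
within_bound = bool((layer.weight.abs() <= bound).all())
print(within_bound)
```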

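What nn.Parameter() does: wrapping a tensor in nn.Parameter registers it as a parameter of the module, so that its value keeps being updated during training in order to reach the optimum.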
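A small sketch of the difference between an nn.Parameter and a plain tensor attribute (the class name Tiny and its attributes are made up for illustration): only the wrapped tensor shows up in named_parameters(), which is what an optimizer would receive.

```python
import torch
import torch.nn as nn


class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(2, 2))  # registered, trainable
        self.t = torch.zeros(2, 2)                # plain attribute, not registered


model = Tiny()
names = [name for name, _ in model.named_parameters()]
print(names)                  # only 'w' is listed
print(model.w.requires_grad)  # parameters require gradients by default
```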

The initialization functions in the torch.nn.init module can be used:

import torch
import torch.nn as nn

w = torch.empty(2, 3)

# 1. Uniform distribution - U(a, b)
# torch.nn.init.uniform_(tensor, a=0, b=1)
nn.init.uniform_(w)
# tensor([[ 0.0578,  0.3402,  0.5034],
#         [ 0.7865,  0.7280,  0.6269]])

# 2. Normal distribution with the given mean and standard deviation
# torch.nn.init.normal_(tensor, mean=0, std=1)
nn.init.normal_(w)
# tensor([[ 0.3326,  0.0171, -0.6745],
#        [ 0.1669,  0.1747,  0.0472]])

# 3. Constant - fixed value val
# torch.nn.init.constant_(tensor, val)
nn.init.constant_(w, 0.3)
# tensor([[ 0.3000,  0.3000,  0.3000],
#         [ 0.3000,  0.3000,  0.3000]])

# 4. Ones on the diagonal, zeros elsewhere
# torch.nn.init.eye_(tensor)
nn.init.eye_(w)
# tensor([[ 1.,  0.,  0.],
#         [ 0.,  1.,  0.]])

# 5. Dirac delta initialization, only for {3, 4, 5}-dimensional torch.Tensor
# torch.nn.init.dirac_(tensor)
w1 = torch.empty(3, 16, 5, 5)
nn.init.dirac_(w1)

# 6. xavier_uniform initialization
# torch.nn.init.xavier_uniform_(tensor, gain=1)
# From - Understanding the difficulty of training deep feedforward neural networks - Glorot & Bengio 2010
nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
# tensor([[ 1.3374,  0.7932, -0.0891],
#         [-1.3363, -0.0206, -0.9346]])

# 7. xavier_normal initialization
# torch.nn.init.xavier_normal_(tensor, gain=1)
nn.init.xavier_normal_(w)
# tensor([[-0.1777,  0.6740,  0.1139],
#         [ 0.3018, -0.2443,  0.6824]])

# 8. kaiming_uniform initialization
# From - Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He et al. 2015
# torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
# tensor([[ 0.6426, -0.9582, -1.1783],
#         [-0.0515, -0.4975,  1.3237]])

# 9. kaiming_normal initialization
# torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
# tensor([[ 0.2530, -0.4382,  1.5995],
#         [ 0.0544,  1.6392, -2.0752]])

# 10. (Semi-)orthogonal matrix
# From - Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe 2013
# torch.nn.init.orthogonal_(tensor, gain=1)
nn.init.orthogonal_(w)
# tensor([[ 0.5786, -0.5642, -0.5890],
#         [-0.7517, -0.0886, -0.6536]])

# 11. Sparse matrix
# Non-zero elements are drawn from a normal distribution with mean 0 and std 0.01.
# From - Deep learning via Hessian-free optimization - Martens 2010
# torch.nn.init.sparse_(tensor, sparsity, std=0.01)
nn.init.sparse_(w, sparsity=0.1)
# tensor(1.00000e-03 *
#        [[-0.3382,  1.9501, -1.7761],
#         [ 0.0000,  0.0000,  0.0000]])
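The sampled values above look arbitrary, but each scheme fixes their scale. As a worked check for xavier_uniform_ (a sketch; the seed is an arbitrary choice), the values are drawn from U(-bound, bound) with bound = gain * sqrt(6 / (fan_in + fan_out)). For the 2 x 3 tensor with the ReLU gain sqrt(2), this gives sqrt(2.4), roughly 1.549:

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability

w = torch.empty(2, 3)  # 2-D weight: fan_out = 2 (rows), fan_in = 3 (columns)
gain = nn.init.calculate_gain('relu')  # sqrt(2) for ReLU
nn.init.xavier_uniform_(w, gain=gain)

fan_out, fan_in = w.shape
bound = gain * math.sqrt(6.0 / (fan_in + fan_out))
inside = bool((w.abs() <= bound).all())

print(bound, inside)
```

This also explains why none of the xavier_uniform_ sample values printed earlier exceeds about 1.55 in absolute value.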

For the parameters of PyTorch's built-in layers, initialization can be done like this:

for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)

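The code above walks over every module in the model; if a module is of type nn.Conv2d or nn.Linear, it takes the weight parameter m.weight and applies xavier_uniform initialization to it. The bias can be reached in the same way through m.bias. Next, here is the parameter-initialization code from the PyTorch version of ResNet: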

for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

This block is used inside __init__, so self here refers to the model itself.
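The loops above can also be packaged as a function and passed to Module.apply, which calls it on every submodule. The sketch below additionally zeroes the bias through m.bias; the function name init_weights and the toy model are made up for illustration. Note that m.bias is None when a layer is built with bias=False, so it is checked first.

```python
import torch
import torch.nn as nn


def init_weights(m):
    # Hypothetical helper; Module.apply calls it once for every submodule.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:  # bias is None for layers built with bias=False
            nn.init.zeros_(m.bias)


model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8, 2, bias=False),
)
model.apply(init_weights)

conv_bias_is_zero = bool((model[0].bias == 0).all())
print(conv_bias_is_zero, model[3].bias is None)
```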

References:

https://blog.csdn.net/ys1305/article/details/94332007

2. TensorFlow

(1) Custom parameter initialization

Create a 2*3 matrix with every element set to 0 (dtype tf.float32).

a = tf.zeros([2,3], dtype = tf.float32)

Create a 3*4 matrix with every element set to 1.

b = tf.ones([3,4])

Create a 1*10 matrix filled with 2 (the dtype is tf.int32 and can be omitted).

c = tf.constant(2, dtype=tf.int32, shape=[1,10])

Create a 1*10 matrix whose elements follow a normal distribution with mean 20 and standard deviation 3.

d = tf.random_normal([1,10],mean = 20, stddev = 3)

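Any of the values above can be used to initialize a variable. For example, fill a 1*2 matrix with 0.01 and use it to initialize a variable called bias.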

bias = tf.Variable(tf.zeros([1,2]) + 0.01)

(2) Initializing with the *_initializer() classes

Initializing with a constant

import tensorflow as tf
 
value = [0, 1, 2, 3, 4, 5, 6, 7]
init = tf.constant_initializer(value)
 
with tf.Session() as sess:
 
  x = tf.get_variable('x', shape=[8], initializer=init)
  x.initializer.run()
  print(x.eval())
 
#output:
#[ 0.  1.  2.  3.  4.  5.  6.  7.]

The tf.zeros_initializer() and tf.ones_initializer() classes initialize tensors to all zeros and all ones, respectively.

import tensorflow as tf
 
init_zeros=tf.zeros_initializer()
init_ones = tf.ones_initializer()
 
 
with tf.Session() as sess:
 
  x = tf.get_variable('x', shape=[8], initializer=init_zeros)
  y = tf.get_variable('y', shape=[8], initializer=init_ones)
  x.initializer.run()
  y.initializer.run()
  print(x.eval())
  print(y.eval())
 
#output:
# [ 0.  0.  0.  0.  0.  0.  0.  0.]
# [ 1.  1.  1.  1.  1.  1.  1.  1.]

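Initializing from a normal distribution

Normal initialization is the most widely used scheme in neural networks; parameters can be drawn from either a normal distribution or a truncated normal distribution.

TF provides the tf.random_normal_initializer() class to generate tensors that follow a normal distribution.

TF provides the tf.truncated_normal_initializer() class to generate tensors that follow a truncated normal distribution, in which values more than two standard deviations from the mean are discarded and redrawn.

mean: mean of the normal distribution, default 0
stddev: standard deviation of the normal distribution, default 1
seed: random seed; with a fixed seed the same data is generated every time
dtype: data type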

import tensorflow as tf
 
init_random = tf.random_normal_initializer(mean=0.0, stddev=1.0, seed=None, dtype=tf.float32)
init_truncated = tf.truncated_normal_initializer(mean=0.0, stddev=1.0, seed=None, dtype=tf.float32)
 
 
with tf.Session() as sess:
 
  x = tf.get_variable('x', shape=[10], initializer=init_random)
  y = tf.get_variable('y', shape=[10], initializer=init_truncated)
  x.initializer.run()
  y.initializer.run()
 
  print(x.eval())
  print(y.eval())
 
 
#output:
# [-0.40236568 -0.35864913 -0.94253045 -0.40153521  0.1552504   1.16989613
#   0.43091929 -0.31410623  0.70080078 -0.9620409 ]
# [ 0.18356581 -0.06860946 -0.55245203  1.08850253 -1.13627422 -0.1006074
#   0.65564936  0.03948414  0.86558545 -0.4964745 ]
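To illustrate what "truncated" means here, the following NumPy sketch redraws every sample that falls more than two standard deviations from the mean. It is a stand-in written for illustration, not TensorFlow's implementation; the helper name truncated_normal and the seed are assumptions.

```python
import numpy as np


def truncated_normal(shape, mean=0.0, stddev=1.0, seed=None):
    """Hypothetical helper: redraw samples farther than 2 * stddev from mean."""
    rng = np.random.default_rng(seed)
    out = rng.normal(mean, stddev, size=shape)
    bad = np.abs(out - mean) > 2 * stddev
    while bad.any():
        # Replace only the out-of-range entries with fresh draws.
        out[bad] = rng.normal(mean, stddev, size=int(bad.sum()))
        bad = np.abs(out - mean) > 2 * stddev
    return out


samples = truncated_normal((1000,), mean=0.0, stddev=1.0, seed=0)
print(np.abs(samples).max() <= 2.0)
```

Cutting off the tails avoids the occasional very large initial weight that a plain normal distribution can produce.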

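Initializing from a uniform distribution

TF provides the tf.random_uniform_initializer class to generate tensors that follow a uniform distribution.

minval: minimum value
maxval: maximum value
seed: random seed
dtype: data type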

import tensorflow as tf
 
init_uniform = tf.random_uniform_initializer(minval=0, maxval=10, seed=None, dtype=tf.float32)
 
 
with tf.Session() as sess:
 
  x = tf.get_variable('x', shape=[10], initializer=init_uniform)
  x.initializer.run()
 
  print(x.eval())
 
# output:
# [ 6.93343639  9.41196823  5.54009819  1.38017178  1.78720832  5.38881063
#   3.39674473  8.12443542  0.62157512  8.36026382]

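Some others:

tf.orthogonal_initializer() initializes to a random orthogonal matrix; the shape must have at least two dimensions

tf.glorot_uniform_initializer() initializes from a uniform distribution on [-limit, limit], where limit = sqrt(6 / (fan_in + fan_out)) depends on the number of input and output units

tf.glorot_normal_initializer() initializes from a truncated normal distribution with stddev = sqrt(2 / (fan_in + fan_out)), which likewise depends on the number of input and output units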
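As a sketch of what "orthogonal" means for the first initializer in this list, the NumPy code below builds a random orthogonal matrix from the QR decomposition of a Gaussian matrix and checks that its columns are orthonormal. This is one common construction, shown under the assumption that it conveys the property; it is not claimed to be what TensorFlow does internally, and the size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# QR decomposition of a random Gaussian matrix yields an orthogonal factor q.
a = rng.normal(size=(4, 4))
q, r = np.linalg.qr(a)

# Flip column signs so the result does not depend on QR's sign convention.
q = q * np.sign(np.diag(r))

# Orthogonality: q^T q is the identity matrix.
is_orthogonal = np.allclose(q.T @ q, np.eye(4))
print(is_orthogonal)
```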

In practice:

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

Running the code above initializes the parameters.

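Addendum: the names of the two methods already hint at the difference. Variable defines a variable, whereas get_variable gets one (and defines a new variable if it cannot find an existing one).

For the detailed differences, see: https://blog.csdn.net/kevindree/article/details/86936476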
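The difference can be sketched in a few lines. This assumes the tf.compat.v1 namespace, which ships with TensorFlow 1.15; the variable and scope names are made up for illustration. tf.Variable always creates a new variable and renames it on a clash, while get_variable returns the existing variable when the scope is re-entered with reuse=True.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style API

graph = tf.Graph()
with graph.as_default():
    # Variable: a second variable with the same name is created and renamed.
    v1 = tf1.Variable(tf1.zeros([2]), name="v")
    v2 = tf1.Variable(tf1.zeros([2]), name="v")

    # get_variable: the name is looked up in the current variable scope.
    with tf1.variable_scope("layer"):
        w1 = tf1.get_variable("w", shape=[2], initializer=tf1.zeros_initializer())
    with tf1.variable_scope("layer", reuse=True):
        w2 = tf1.get_variable("w")

variable_names = (v1.name, v2.name)
shared = w1 is w2
print(variable_names, shared)
```

This is why get_variable is the usual choice when the same weights have to be shared between several parts of a graph.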

References:

https://blog.csdn.net/dcrmg/article/details/80034075
