PyTorch: An Introduction to torch.nn

Posted by 金字塔下的蜗牛 on 2024-09-14

Purpose:

This article describes the return values of the following nn.Module instance methods in PyTorch: model.modules(), model.named_modules(), model.children(), model.named_children(), model.parameters(), model.named_parameters(), and model.state_dict().

Example:

import torch
import torch.nn as nn

class Net(nn.Module):

    def __init__(self, num_class=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3),
            nn.BatchNorm2d(6),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=6, out_channels=9, kernel_size=3),
            nn.BatchNorm2d(9),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        self.classifier = nn.Sequential(
            nn.Linear(9*8*8, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(128, num_class)
        )

    def forward(self, x):
        output = self.features(x)
        output = output.view(output.size()[0], -1)
        output = self.classifier(output)

        return output

model = Net()
The code above defines a network made up of two convolutional layers and two fully connected layers. Note that, from the outside in, this Net has three levels:

Net:

----features:

------------Conv2d
------------BatchNorm2d
------------ReLU
------------MaxPool2d
------------Conv2d
------------BatchNorm2d
------------ReLU
------------MaxPool2d

----classifier:

------------Linear
------------ReLU
------------Dropout
------------Linear
Net itself is an nn.Module subclass; it contains two further nn.Module subclasses built from Sequential containers, features and classifier; and features and classifier in turn contain many individual layers, each of which is also an nn.Module subclass. So from the outside in there are three levels.
Let's now look at what each of these instance methods returns.

In [7]: model.named_modules()
Out[7]: <generator object Module.named_modules at 0x7f5db88f3840>

In [8]: model.modules()
Out[8]: <generator object Module.modules at 0x7f5db3f53c00>

In [9]: model.children()
Out[9]: <generator object Module.children at 0x7f5db3f53408>

In [10]: model.named_children()
Out[10]: <generator object Module.named_children at 0x7f5db80305e8>

In [11]: model.parameters()
Out[11]: <generator object Module.parameters at 0x7f5db3f534f8>

In [12]: model.named_parameters()
Out[12]: <generator object Module.named_parameters at 0x7f5d42da7570>

In [13]: model.state_dict()
Out[13]:
OrderedDict([('features.0.weight', tensor([[[[ 0.1200, -0.1627, -0.0841],
[-0.1369, -0.1525, 0.0541],
[ 0.1203, 0.0564, 0.0908]],
……

As you can see, except for model.state_dict(), which returns a dictionary (an OrderedDict), the other methods all return a generator, i.e. an iterable. Let's pull the values out of each generator with a list comprehension so we can inspect them further:

In [14]: model_modules = [x for x in model.modules()]

In [15]: model_named_modules = [x for x in model.named_modules()]

In [16]: model_children = [x for x in model.children()]

In [17]: model_named_children = [x for x in model.named_children()]

In [18]: model_parameters = [x for x in model.parameters()]

In [19]: model_named_parameters = [x for x in model.named_parameters()]
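Since model.state_dict() is just an ordered mapping from parameter (and buffer) names to tensors, it is also the standard way to save and restore weights. A minimal sketch (the file name checkpoint.pth is made up here):

# Save only the tensors, not the Python class definition.
torch.save(model.state_dict(), 'checkpoint.pth')

# To restore: rebuild the architecture, then copy the saved tensors into it.
model2 = Net()
model2.load_state_dict(torch.load('checkpoint.pth'))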

1. model.modules()

model.modules() iterates over all sub-modules of the model. "Sub-modules" here means nn.Module subclasses: in this example, Net(), the features and classifier containers, and every nn.Xxx layer they contain (Conv2d, MaxPool2d, ReLU, Linear, BatchNorm2d, Dropout, ...) are all nn.Module subclasses, so model.modules() visits every one of them. Let's look at the list model_modules:

In [20]: model_modules
Out[20]:
[Net(
(features): Sequential(
(0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
(1): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(6, 9, kernel_size=(3, 3), stride=(1, 1))
(5): BatchNorm2d(9, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): ReLU(inplace)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Linear(in_features=576, out_features=128, bias=True)
(1): ReLU(inplace)
(2): Dropout(p=0.5)
(3): Linear(in_features=128, out_features=10, bias=True)
)
),
Sequential(
(0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
(1): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(6, 9, kernel_size=(3, 3), stride=(1, 1))
(5): BatchNorm2d(9, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): ReLU(inplace)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
),
Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1)),
BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
ReLU(inplace),
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),
Conv2d(6, 9, kernel_size=(3, 3), stride=(1, 1)),
BatchNorm2d(9, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
ReLU(inplace),
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),
Sequential(
(0): Linear(in_features=576, out_features=128, bias=True)
(1): ReLU(inplace)
(2): Dropout(p=0.5)
(3): Linear(in_features=128, out_features=10, bias=True)
),
Linear(in_features=576, out_features=128, bias=True),
ReLU(inplace),
Dropout(p=0.5),
Linear(in_features=128, out_features=10, bias=True)]

In [21]: len(model_modules)
Out[21]: 15
As you can see, model_modules contains 15 elements: first the whole Net, then the features sub-module, then every layer inside features, then the classifier sub-module and every layer inside it. In other words, model.modules() recursively traverses all sub-modules of the model.

2. model.named_modules()

As the name suggests, this is model.modules() with names. model.named_modules() returns not only all sub-modules of the model but also their names:

In [28]: len(model_named_modules)
Out[28]: 15

In [29]: model_named_modules
Out[29]:
[('', Net(
(features): Sequential(
(0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
(1): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(6, 9, kernel_size=(3, 3), stride=(1, 1))
(5): BatchNorm2d(9, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): ReLU(inplace)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Linear(in_features=576, out_features=128, bias=True)
(1): ReLU(inplace)
(2): Dropout(p=0.5)
(3): Linear(in_features=128, out_features=10, bias=True)
)
)),
('features', Sequential(
(0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
(1): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(6, 9, kernel_size=(3, 3), stride=(1, 1))
(5): BatchNorm2d(9, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): ReLU(inplace)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)),
('features.0', Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))),
('features.1', BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), ('features.2', ReLU(inplace)),
('features.3', MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)),
('features.4', Conv2d(6, 9, kernel_size=(3, 3), stride=(1, 1))),
('features.5', BatchNorm2d(9, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), ('features.6', ReLU(inplace)),
('features.7', MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)),
('classifier',
Sequential(
(0): Linear(in_features=576, out_features=128, bias=True)
(1): ReLU(inplace)
(2): Dropout(p=0.5)
(3): Linear(in_features=128, out_features=10, bias=True)
)),
('classifier.0', Linear(in_features=576, out_features=128, bias=True)),
('classifier.1', ReLU(inplace)),
('classifier.2', Dropout(p=0.5)),
('classifier.3', Linear(in_features=128, out_features=10, bias=True))]
As you can see, model.named_modules() also visits 15 elements, but each element now carries a name. Apart from features and classifier, which were named when the model was defined, the other names are generated automatically by PyTorch following a fixed rule. Getting each layer together with its name is useful because you can iterate over the model and modify specific layers by name. For example, if you named your convolutional layers conv1, conv2, ... when defining the model, you could do the following:

for name, layer in model.named_modules():
    if 'conv' in name:
        ...  # process the layer here
Of course, when names are not returned, isinstance() achieves the same thing:

for layer in model.modules():
    if isinstance(layer, nn.Conv2d):
        ...  # process the layer here
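As a concrete sketch of the two patterns above (freezing the layers is just an example of "processing" them), the loop below finds every Conv2d in the Net defined earlier and switches off gradients for its parameters:

# Freeze all convolutional layers of the example Net.
for name, layer in model.named_modules():
    if isinstance(layer, nn.Conv2d):
        for p in layer.parameters():
            p.requires_grad = False
        print('froze', name)   # froze features.0 / froze features.4
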
3. model.children()

If we divide the network Net into levels from the outside in, features and classifier are the children of Net, while Conv2d, ReLU, BatchNorm2d and MaxPool2d are children of features, and Linear, Dropout and ReLU are children of classifier. model.modules() above traverses not only the model's children but also the children's children, i.e. all descendants.
model.children(), by contrast, only traverses the model's immediate children, which here are features and classifier.

In [22]: len(model_children)
Out[22]: 2

In [22]: model_children
Out[22]:
[Sequential(
(0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
(1): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(6, 9, kernel_size=(3, 3), stride=(1, 1))
(5): BatchNorm2d(9, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): ReLU(inplace)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
),
Sequential(
(0): Linear(in_features=576, out_features=128, bias=True)
(1): ReLU(inplace)
(2): Dropout(p=0.5)
(3): Linear(in_features=128, out_features=10, bias=True)
)]
As you can see, it visits only two elements: features and classifier.
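One pattern children() is often used for is rebuilding part of a model; for instance, one could reuse everything except the last child as a feature extractor. A minimal sketch for the Net above:

# Keep all children except the last one (the classifier head).
feature_extractor = nn.Sequential(*list(model.children())[:-1])
print(feature_extractor)  # just the features Sequential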

4. model.named_children()

model.named_children() is model.children() with names: besides iterating over the model's immediate children, it also returns each child's name:

In [23]: len(model_named_children)
Out[23]: 2

In [24]: model_named_children
Out[24]:
[('features', Sequential(
(0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
(1): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(6, 9, kernel_size=(3, 3), stride=(1, 1))
(5): BatchNorm2d(9, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): ReLU(inplace)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)),
('classifier', Sequential(
(0): Linear(in_features=576, out_features=128, bias=True)
(1): ReLU(inplace)
(2): Dropout(p=0.5)
(3): Linear(in_features=128, out_features=10, bias=True)
))]
Compared with model.children() above, model.named_children() additionally returns the names of the two children: features and classifier.

5. model.parameters()

Iterates over all parameters of the model.

In [30]: len(model_parameters)
Out[30]: 12

In [31]: model_parameters
Out[31]:
[Parameter containing:
tensor([[[[ 0.1200, -0.1627, -0.0841],
[-0.1369, -0.1525, 0.0541],
[ 0.1203, 0.0564, 0.0908]],
……
[[-0.1587, 0.0735, -0.0066],
[ 0.0210, 0.0257, -0.0838],
[-0.1797, 0.0675, 0.1282]]]], requires_grad=True),
Parameter containing:
tensor([-0.1251, 0.1673, 0.1241, -0.1876, 0.0683, 0.0346],
requires_grad=True),
Parameter containing:
tensor([0.0072, 0.0272, 0.8620, 0.0633, 0.9411, 0.2971], requires_grad=True),
Parameter containing:
tensor([0., 0., 0., 0., 0., 0.], requires_grad=True),
Parameter containing:
tensor([[[[ 0.0632, -0.1078, -0.0800],
[-0.0488, 0.0167, 0.0473],
[-0.0743, 0.0469, -0.1214]],
……
[[-0.1067, -0.0851, 0.0498],
[-0.0695, 0.0380, -0.0289],
[-0.0700, 0.0969, -0.0557]]]], requires_grad=True),
Parameter containing:
tensor([-0.0608, 0.0154, 0.0231, 0.0886, -0.0577, 0.0658, -0.1135, -0.0221,
0.0991], requires_grad=True),
Parameter containing:
tensor([0.2514, 0.1924, 0.9139, 0.8075, 0.6851, 0.4522, 0.5963, 0.8135, 0.4010],
requires_grad=True),
Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True),
Parameter containing:
tensor([[ 0.0223, 0.0079, -0.0332, ..., -0.0394, 0.0291, 0.0068],
[ 0.0037, -0.0079, 0.0011, ..., -0.0277, -0.0273, 0.0009],
[ 0.0150, -0.0110, 0.0319, ..., -0.0110, -0.0072, -0.0333],
...,
[-0.0274, -0.0296, -0.0156, ..., 0.0359, -0.0303, -0.0114],
[ 0.0222, 0.0243, -0.0115, ..., 0.0369, -0.0347, 0.0291],
[ 0.0045, 0.0156, 0.0281, ..., -0.0348, -0.0370, -0.0152]],
requires_grad=True),
Parameter containing:
tensor([ 0.0072, -0.0399, -0.0138, 0.0062, -0.0099, -0.0006, -0.0142, -0.0337,
……
-0.0370, -0.0121, -0.0348, -0.0200, -0.0285, 0.0367, 0.0050, -0.0166],
requires_grad=True),
Parameter containing:
tensor([[-0.0130, 0.0301, 0.0721, ..., -0.0634, 0.0325, -0.0830],
[-0.0086, -0.0374, -0.0281, ..., -0.0543, 0.0105, 0.0822],
[-0.0305, 0.0047, -0.0090, ..., 0.0370, -0.0187, 0.0824],
...,
[ 0.0529, -0.0236, 0.0219, ..., 0.0250, 0.0620, -0.0446],
[ 0.0077, -0.0576, 0.0600, ..., -0.0412, -0.0290, 0.0103],
[ 0.0375, -0.0147, 0.0622, ..., 0.0350, 0.0179, 0.0667]],
requires_grad=True),
Parameter containing:
tensor([-0.0709, -0.0675, -0.0492, 0.0694, 0.0390, -0.0861, -0.0427, -0.0638,
-0.0123, 0.0845], requires_grad=True)]
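In practice, model.parameters() is most often handed straight to an optimizer. A minimal sketch (the learning rate and choice of SGD here are arbitrary):

import torch.optim as optim

# The optimizer receives and updates every registered parameter of the model.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
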
6. model.named_parameters()

If you have followed along so far, you can guess what this does: it iterates over the parameters together with their names. Each parameter's name carries a .weight or .bias suffix, which distinguishes weights from biases:

In [32]: len(model_named_parameters)
Out[32]: 12

In [33]: model_named_parameters
Out[33]:
[('features.0.weight', Parameter containing:
tensor([[[[ 0.1200, -0.1627, -0.0841],
[-0.1369, -0.1525, 0.0541],
[ 0.1203, 0.0564, 0.0908]],
……
[[-0.1587, 0.0735, -0.0066],
[ 0.0210, 0.0257, -0.0838],
[-0.1797, 0.0675, 0.1282]]]], requires_grad=True)),
('features.0.bias', Parameter containing:
tensor([-0.1251, 0.1673, 0.1241, -0.1876, 0.0683, 0.0346],
requires_grad=True)),
('features.1.weight', Parameter containing:
tensor([0.0072, 0.0272, 0.8620, 0.0633, 0.9411, 0.2971], requires_grad=True)),
('features.1.bias', Parameter containing:
tensor([0., 0., 0., 0., 0., 0.], requires_grad=True)),
('features.4.weight', Parameter containing:
tensor([[[[ 0.0632, -0.1078, -0.0800],
[-0.0488, 0.0167, 0.0473],
[-0.0743, 0.0469, -0.1214]],
……
[[-0.1067, -0.0851, 0.0498],
[-0.0695, 0.0380, -0.0289],
[-0.0700, 0.0969, -0.0557]]]], requires_grad=True)),
('features.4.bias', Parameter containing:
tensor([-0.0608, 0.0154, 0.0231, 0.0886, -0.0577, 0.0658, -0.1135, -0.0221,
0.0991], requires_grad=True)),
('features.5.weight', Parameter containing:
tensor([0.2514, 0.1924, 0.9139, 0.8075, 0.6851, 0.4522, 0.5963, 0.8135, 0.4010],
requires_grad=True)),
('features.5.bias', Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)),
('classifier.0.weight', Parameter containing:
tensor([[ 0.0223, 0.0079, -0.0332, ..., -0.0394, 0.0291, 0.0068],
……
[ 0.0045, 0.0156, 0.0281, ..., -0.0348, -0.0370, -0.0152]],
requires_grad=True)),
('classifier.0.bias', Parameter containing:
tensor([ 0.0072, -0.0399, -0.0138, 0.0062, -0.0099, -0.0006, -0.0142, -0.0337,
……
-0.0370, -0.0121, -0.0348, -0.0200, -0.0285, 0.0367, 0.0050, -0.0166],
requires_grad=True)),
('classifier.3.weight', Parameter containing:
tensor([[-0.0130, 0.0301, 0.0721, ..., -0.0634, 0.0325, -0.0830],
[-0.0086, -0.0374, -0.0281, ..., -0.0543, 0.0105, 0.0822],
[-0.0305, 0.0047, -0.0090, ..., 0.0370, -0.0187, 0.0824],
...,
[ 0.0529, -0.0236, 0.0219, ..., 0.0250, 0.0620, -0.0446],
[ 0.0077, -0.0576, 0.0600, ..., -0.0412, -0.0290, 0.0103],
[ 0.0375, -0.0147, 0.0622, ..., 0.0350, 0.0179, 0.0667]],
requires_grad=True)),
('classifier.3.bias', Parameter containing:
tensor([-0.0709, -0.0675, -0.0492, 0.0694, 0.0390, -0.0861, -0.0427, -0.0638,
-0.0123, 0.0845], requires_grad=True))]
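Because each name encodes the sub-module path plus a .weight or .bias suffix, named_parameters() makes selective freezing straightforward. A small sketch, assuming we only want to train the classifier of the Net above:

# Freeze everything outside the classifier.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith('classifier')

print([name for name, p in model.named_parameters() if p.requires_grad])
# ['classifier.0.weight', 'classifier.0.bias', 'classifier.3.weight', 'classifier.3.bias']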

Introduction to torch

torch.nn

Parameters

nn.Parameter inherits from torch.Tensor. It differs from a plain tensor in the following ways:

  1. A torch.Tensor has requires_grad=False by default, while an nn.Parameter has requires_grad=True by default.
  2. Parameter() can be thought of as a type-conversion function: it turns a non-trainable tensor into a trainable parameter. When you build a network and assign an nn.Parameter to a module attribute, that parameter is automatically added to the module's parameter list (it automatically shows up in the parameters() iterator). If you instead define a network parameter with a plain torch.Tensor (even one made trainable with requires_grad=True), it is not added automatically.
  3. Parameter takes two arguments: nn.Parameter(data=None, requires_grad=True)
    • data: the input tensor
    • requires_grad: defaults to True
  4. Parameters are usually accessed through two methods and one attribute defined on nn.Module:
    1. .parameters(): returns an iterator over all parameters of the model
    2. .named_parameters(): returns an iterator over (name, parameter) pairs
    3. ._parameters: an nn.Module attribute, an OrderedDict holding the parameters registered directly on that module
import torch
import torch.nn as nn
import torch.nn.functional as F
import os
import numpy as np
import math
import random
from torch.utils import data

class LR1(nn.Module):
    def __init__(self, dim):
        super(LR1, self).__init__()
        self.weight = torch.randn(dim, requires_grad=True)
        self.bias = torch.randn(1, requires_grad=True)

    def forward(self, X):
        out = torch.matmul(X,self.weight) + self.bias
        return 1 / (torch.exp(out)+1)

class LR2(nn.Module):
    def __init__(self, dim):
        super(LR2, self).__init__()
        self.weight = torch.nn.Parameter(torch.randn(dim))
        self.bias = torch.nn.Parameter(torch.randn(1))

    def forward(self, X):
        out = torch.matmul(X,self.weight) + self.bias
        return 1 / (torch.exp(out)+1)

net1 = LR1(3)
net2 = LR2(3)
print(net1._parameters)  # OrderedDict()
print(net2._parameters)  # prints weight and bias

# Read the model parameters
for param in net1.parameters():  # prints nothing: net1 has no registered parameters
    print('net1: param: ', param)
for param in net2.parameters():
    print('net2:param:', param)
# Output
net2:param: Parameter containing:
tensor([-0.1286,  1.6369,  1.8213], requires_grad=True)
net2:param: Parameter containing:
tensor([-0.8231], requires_grad=True)

# Print the parameter names together with the parameters
for name,param in net2.named_parameters():
    print('name={}'.format(name), param)
# Output
name=weight Parameter containing:
tensor([-0.1286,  1.6369,  1.8213], requires_grad=True)
name=bias Parameter containing:
tensor([-0.8231], requires_grad=True)

Containers

nn.Module

Every layer, block, or network is a subclass of nn.Module. This base class provides many important methods and attributes, described below.

1. Module-related methods and attributes

Attributes

  • self._modules: type: Dict[str, Optional['Module']]
    • A dictionary used to store the network's children modules. "Children" here means modules defined directly on this nn.Module, i.e. only first-order descendants ("sons"); the children's own sub-modules are not traversed.

First group of methods

  • children(): returns an iterator over the current network's children modules, i.e. only the immediate "sons".
  • modules(): returns an iterator over all modules, visiting every component of the model (layers, custom layers, blocks, ...) from shallow to deep.
    • Useful whenever you need each individual layer of the model, e.g. for weight initialization or when loading parameters.
  • named_modules(): also a depth-first traversal over the modules defined in the network, returning all of them. It is essentially equivalent to modules(), except that it yields a tuple: the first element is the name and the second is the module object.
  • get_submodule(target): returns a specific sub-module; note that target is the sub-module's name, and if sub-modules were not named explicitly the names start from '0'.

The source code of these methods is as follows:

def children(self) -> Iterator['Module']:
    """Return an iterator over immediate children modules."""
    for name, module in self.named_children():
        yield module

def named_children(self) -> Iterator[Tuple[str, 'Module']]:
    memo = set()
    for name, module in self._modules.items():  # read layers and their names from the self._modules dict
        if module is not None and module not in memo:
            memo.add(module)
            yield name, module

def modules(self) -> Iterator['Module']:
    for _, module in self.named_modules():  # delegates to named_modules()
        yield module

def named_modules(self, memo: Optional[Set['Module']] = None, prefix: str = '', remove_duplicate: bool = True):
    if memo is None:
        memo = set()
    if self not in memo:
        if remove_duplicate:
            memo.add(self)
        yield prefix, self
        for name, module in self._modules.items():
            if module is None:
                continue
            submodule_prefix = prefix + ('.' if prefix else '') + name
            yield from module.named_modules(memo, submodule_prefix, remove_duplicate)  # recursive call

Example 1: self._modules

net = nn.Sequential(nn.Linear(2,2), nn.Linear(2,1))
print(net._modules)       # 1
print(net._modules['0'])  # 2

# Output 1
OrderedDict({'0': Linear(in_features=2, out_features=2, bias=True), '1': Linear(in_features=2, out_features=1, bias=True)})
# Output 2
Linear(in_features=2, out_features=2, bias=True)

Example 2: get_submodule, modules() vs. children()

print(net.get_submodule('1'))
# Output: Linear(in_features=2, out_features=1, bias=True)

for idx,module in enumerate(net.modules()):
    print(idx,'-->',module)
# Output
0 --> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=1, bias=True)
)
1 --> Linear(in_features=2, out_features=2, bias=True)
2 --> Linear(in_features=2, out_features=1, bias=True)

for idx, module in enumerate(net.children()):
    print(idx, '-->', module)
# Output
0 --> Linear(in_features=2, out_features=2, bias=True)
1 --> Linear(in_features=2, out_features=1, bias=True)

Second group of methods

  • add_module(name, module): adds a module. As the source code below shows, it is exactly equivalent to register_module.
def add_module(self, name: str, module: Optional['Module']) -> None:
    ...  # argument checks omitted
    self._modules[name] = module

def register_module(self, name: str, module: Optional['Module']) -> None:
    r"""Alias for :func:`add_module`."""
    self.add_module(name, module)
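A quick sketch of add_module in use (the names fc1 and fc2 are arbitrary); modules registered this way show up in _modules, named_children() and parameters() exactly like attributes assigned in __init__:

net = nn.Sequential()
net.add_module('fc1', nn.Linear(2, 2))
net.add_module('fc2', nn.Linear(2, 1))
print([name for name, _ in net.named_children()])  # ['fc1', 'fc2']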

2. Parameter-related methods

apply

apply is defined on the base class nn.Module: it recursively walks self.children() and applies a function f to every sub-module of the network (and finally to the module itself), much like apply in pandas.

Purpose: recursively apply a function f to each sub-module of the network; commonly used for parameter initialization.

def init_weight(m):
    if type(m) == nn.Linear:
        m.weight.data.fill_(1.0)
        m.bias.data.fill_(0.0)
        
model = nn.Sequential(nn.Linear(2,2), nn.Linear(2,1))
model.apply(init_weight)
for param in model.parameters():
    print(param)
# Output
Parameter containing:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
Parameter containing:
tensor([0., 0.], requires_grad=True)
Parameter containing:
tensor([[1., 1.]], requires_grad=True)
Parameter containing:
tensor([0.], requires_grad=True)

add_module(name, module)

eval()

forward

load_state_dict(state_dict)

modules()

Returns an iterator over all modules of the current model.

parameters()

Returns an iterator over all parameters of the model.

register_buffer(name, tensor)

Adds a persistent buffer to the model.

Use case: sometimes we need to store some state in the model (for example running statistics) that is not an optimizable parameter: it should not be returned by parameters() or updated by the optimizer, but it should still be saved as part of the model.
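A small sketch of register_buffer, assuming we want to track a running mean of the inputs that is saved with the model but never optimized:

class RunningMean(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Saved in state_dict() and moved by .to(device), but not returned by parameters().
        self.register_buffer('mean', torch.zeros(dim))

    def forward(self, x):
        with torch.no_grad():
            self.mean.mul_(0.9).add_(0.1 * x.mean(dim=0))
        return x - self.mean

m = RunningMean(4)
print(list(m.parameters()))  # []
print(list(m.state_dict()))  # ['mean']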

Attribute: training

apply: passes every sub-module to a function f for processing.

Buffer-related methods:

  • buffers()
  • get_buffer(target)
  • named_buffers()
  • register_buffer(name, tensor, persistent=True)

Parameter-related methods:

  • parameters()
  • get_parameter()
  • named_parameters()
  • register_parameter(name, param)
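register_parameter is the explicit counterpart of assigning an nn.Parameter attribute; a small sketch (the module name Scale is made up):

class Scale(nn.Module):
    def __init__(self):
        super().__init__()
        # Equivalent to: self.gamma = nn.Parameter(torch.ones(1))
        self.register_parameter('gamma', nn.Parameter(torch.ones(1)))

    def forward(self, x):
        return self.gamma * x

print(list(Scale().named_parameters()))  # [('gamma', Parameter containing: tensor([1.], ...))]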

Module-related methods:

  • modules()
  • get_submodule(target)
  • register_module(name, module)
  • named_modules()
