OctConv: Reproducing Octave Convolution

Published by Huawei Cloud Developer Alliance on 2023-04-12
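Abstract: Unlike conventional convolution, octave convolution processes the high-frequency and low-frequency signals of an image separately.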

This article is shared from the Huawei Cloud Community post "OctConv: Reproducing Octave Convolution" by 李長安 (Li Chang'an).

1. Paper Overview

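Octave convolution was proposed in 2019 in the paper "Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution" and attracted considerable attention at the time. It modifies conventional convolution to reduce spatial redundancy. "Drop an Octave" refers to lowering a pitch by one octave (eight notes of the scale), which halves its frequency. Correspondingly, the low-frequency feature maps are stored at half the spatial resolution.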

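Unlike conventional convolution, octave convolution operates on the high-frequency and low-frequency signals of an image. First, recall these concepts from digital image processing, where they are also called the high-frequency and low-frequency components of an image.

High-frequency components are pixels where the image intensity (brightness or gray level) changes sharply. Examples include edges (contours), fine details, and noise. A noisy pixel counts as high frequency because its gray value differs markedly from those of its neighbors, so the intensity changes sharply around it. Low-frequency components are pixels where the intensity changes gradually, such as large areas of uniform color. For example, when reading a book we focus on the text rather than on the paper itself. The text is the high-frequency component, and the blank paper is the low-frequency component.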
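The following minimal plain-Python sketch makes the distinction concrete. No Paddle is needed. It uses a 3×3 box blur as the low-pass filter; this is an illustrative choice, not something the paper prescribes. The high-frequency component is whatever the blur removes, so the two parts always add back up to the original image.

```python
def box_blur(img):
    """Low-frequency part: mean over each pixel's 3x3 neighborhood.

    Neighbors outside the image are skipped, so border pixels average fewer values.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += img[ii][jj]
                        count += 1
            out[i][j] = total / count
    return out


def split_frequencies(img):
    """Return (low, high) such that low + high equals img pixel by pixel."""
    low = box_blur(img)
    high = [[img[i][j] - low[i][j] for j in range(len(img[0]))]
            for i in range(len(img))]
    return low, high


# A flat gray patch (like blank paper) has no high-frequency content.
flat = [[100.0] * 4 for _ in range(4)]
_, high_flat = split_frequencies(flat)
print(max(abs(v) for row in high_flat for v in row))  # 0.0

# A vertical edge (dark left half, bright right half) does, but only next to the edge.
edge = [[0.0, 0.0, 255.0, 255.0] for _ in range(4)]
_, high_edge = split_frequencies(edge)
print(high_edge[0])  # [0.0, -85.0, 85.0, 0.0]
```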

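The figure below is an example from the paper. The left image is the original, the middle image shows the low-frequency signal, and the right image shows the high-frequency signal.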

[Figure: original image (left), low-frequency component (middle), high-frequency component (right)]

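In the paper, the authors observe that higher frequencies usually encode fine details, while lower frequencies usually encode global structure. An image can be split into high and low frequencies, so the authors argue that the feature maps produced by convolutions can be split the same way. High-frequency feature maps carry contours, edges, and similar information that help the model learn what the image contains, for example in saliency detection. Low-frequency feature maps carry less information. Processing both kinds of map at the same resolution and cost is therefore wasteful, because the benefit from the high-frequency maps is much larger. This is the redundancy in feature maps: the low-frequency part carries little information yet is stored and processed at full resolution.

The paper therefore proposes a divide-and-conquer scheme called the Octave Feature Representation, which handles high- and low-frequency feature maps separately. As shown in the figure below, the low-frequency feature maps are stored at half the height and half the width. This reduces redundant data. It also helps capture global information, because each low-frequency position covers a larger area of the original image.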
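The memory saving of this representation follows from simple arithmetic. The sketch below assumes that a fraction α of the channels is low-frequency, with the split rounded the same way as the OctaveConv code in Section 2.1 (`int(alpha * channels)` low-frequency channels). Storing those channels at half height and half width leaves (1 - α) + α/4 = 1 - 3α/4 of the original activation size. The helper names are illustrative.

```python
def octave_split(channels, alpha):
    """Split a channel count into (high, low); low = int(alpha * channels)."""
    low = int(alpha * channels)
    return channels - low, low


def feature_memory_ratio(channels, alpha, height, width):
    """Activation size of an octave feature map relative to a vanilla one.

    High-frequency channels keep the full H x W grid; low-frequency channels use
    H/2 x W/2 (height and width are assumed to be even).
    """
    high, low = octave_split(channels, alpha)
    octave = high * height * width + low * (height // 2) * (width // 2)
    vanilla = channels * height * width
    return octave / vanilla


print(octave_split(64, 0.5))                  # (32, 32)
print(feature_memory_ratio(64, 0.5, 56, 56))  # 0.625, i.e. 1 - 3 * 0.5 / 4
```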

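[Figure: Octave Feature Representation, with high-frequency maps at full resolution and low-frequency maps at half resolution]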

In scale-space theory, features are expected to be invariant to scale and rotation:

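  • Scale invariance: a person recognizes an object correctly whether it is near or far.
  • Rotation invariance: a person still recognizes the object correctly after it has been rotated.

When a machine vision system analyzes an unknown scene, the computer cannot know in advance the scale of the objects in the image. We therefore need to describe the image at several scales at once to find the best scale for the objects of interest. For example, a high-resolution image corresponds to observing from close up, and a low-resolution image corresponds to observing from far away.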
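A standard way to obtain such a multi-scale description is an image pyramid. The image is repeatedly smoothed and its resolution halved, so each level corresponds to viewing the scene from farther away. This minimal sketch uses 2×2 average pooling with stride 2 for each step. That is the same operation the OctaveConv code in Section 2.1 uses for downsampling (`nn.AvgPool2D(kernel_size=(2, 2), stride=2)`). The plain-Python helpers are illustrative only.

```python
def avg_pool_2x2(img):
    """2x2 average pooling with stride 2 (height and width assumed even)."""
    return [
        [(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4
         for j in range(0, len(img[0]), 2)]
        for i in range(0, len(img), 2)
    ]


def pyramid(img, levels):
    """Return [img, img at 1/2 size, img at 1/4 size, ...] with `levels` entries."""
    out = [img]
    for _ in range(levels - 1):
        out.append(avg_pool_2x2(out[-1]))
    return out


img = [[float(8 * i + j) for j in range(8)] for i in range(8)]  # 8x8 intensity ramp
for level in pyramid(img, 3):
    print(len(level), "x", len(level[0]))  # 8 x 8, then 4 x 4, then 2 x 2
```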

2. Reproduction Details

2.1 Reproducing Oct-Conv

To support both updates within each frequency and exchange between frequencies, the convolution kernel is split into four parts:

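  • a high-to-high frequency kernel (H→H)
  • a high-to-low frequency kernel (H→L)
  • a low-to-high frequency kernel (L→H)
  • a low-to-low frequency kernel (L→L)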

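The figure below shows the octave convolution kernel: the four parts together form a k×k kernel. Here, "in" and "out" denote properties of the input and output feature maps. In this illustration, the input has the same low-frequency ratio and number of channels as the output (α_in = α_out, c_in = c_out).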

[Figure: the k×k octave convolution kernel split into the four parts H→H, H→L, L→H, and L→L]
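The sketch below shows how the four parts fit together. It lists the weight shape `[out_channels, in_channels, k, k]` of each part for `groups=1`. It uses the same channel split and the same rules for skipping a path as the OctaveConv class below. For example, with α_in = 0 there is no low-frequency input, so only H→H and H→L exist. The function names are illustrative.

The four blocks tile the full `c_out × c_in` grid, so their parameters add up to exactly those of a vanilla k×k convolution. The savings come from running some paths at lower resolution, not from having fewer weights.

```python
def octave_kernel_shapes(c_in, c_out, k, alpha_in=0.5, alpha_out=0.5):
    """Weight shapes [out, in, k, k] of the four OctConv sub-kernels (groups=1)."""
    lo_in, lo_out = int(alpha_in * c_in), int(alpha_out * c_out)
    hi_in, hi_out = c_in - lo_in, c_out - lo_out
    shapes = {}
    # A path exists only if both its input and its output frequency group exist.
    if alpha_in < 1 and alpha_out < 1:
        shapes["h2h"] = [hi_out, hi_in, k, k]
    if alpha_in < 1 and alpha_out > 0:
        shapes["h2l"] = [lo_out, hi_in, k, k]
    if alpha_in > 0 and alpha_out < 1:
        shapes["l2h"] = [hi_out, lo_in, k, k]
    if alpha_in > 0 and alpha_out > 0:
        shapes["l2l"] = [lo_out, lo_in, k, k]
    return shapes


def num_params(shapes):
    """Total number of weights over all sub-kernels."""
    return sum(o * i * kh * kw for o, i, kh, kw in shapes.values())


shapes = octave_kernel_shapes(64, 64, 3)
print(shapes["h2l"])                          # [32, 32, 3, 3]
print(num_params(shapes) == 64 * 64 * 3 * 3)  # True
print(sorted(octave_kernel_shapes(3, 64, 3, alpha_in=0)))  # ['h2h', 'h2l']
```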

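With the kernel defined, we can describe how octave convolution maps its input to its output. As shown in the figure below, the low- and high-frequency inputs pass through octave convolution to produce low- and high-frequency outputs. Red denotes high frequency and blue denotes low frequency. Green arrows denote updates within the same frequency, and red arrows denote exchange between frequencies.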

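H and W denote the height and width of the feature maps. The low-frequency maps have half the height and half the width of the high-frequency maps. Because the resolutions differ, information passed between frequencies must be resized first. High to low uses pooling (downsampling), and low to high uses upsampling.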

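[Figure: octave convolution data flow, with green arrows for updates within a frequency and red arrows for exchange between frequencies]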
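This resolution bookkeeping is where the computational saving comes from. Below is a rough cost model. It assumes α_in = α_out = α, equal input and output channels, and stride 1. It also assumes that the H→L and L→H convolutions run at the low H/2 × W/2 resolution, with pooling before H→L and upsampling after L→H, as in the OctaveConv implementation below.

Summing the four paths reproduces the closed form 1 - (3/4)·α·(2 - α) relative to a vanilla convolution. At α = 0.5, a layer therefore needs 43.75% of the multiply-accumulates. The default values for `h`, `w`, `c`, and `k` are arbitrary illustration values.

```python
def octave_macs_ratio(alpha, h=56, w=56, c=64, k=3):
    """Multiply-accumulates of one OctConv layer divided by those of a vanilla conv."""
    lo = int(alpha * c)
    hi = c - lo
    full, half = h * w, (h // 2) * (w // 2)
    macs = (
        full * k * k * hi * hi      # H -> H at full resolution
        + half * k * k * hi * lo    # H -> L, computed after 2x2 pooling
        + half * k * k * lo * hi    # L -> H, computed before nearest upsampling
        + half * k * k * lo * lo    # L -> L at low resolution
    )
    vanilla = full * k * k * c * c
    return macs / vanilla


def closed_form(alpha):
    """1 - 3/4 * alpha * (2 - alpha)."""
    return 1 - 0.75 * alpha * (2 - alpha)


for a in (0.0, 0.25, 0.5, 0.75):
    print(a, octave_macs_ratio(a), closed_form(a))  # the last two columns agree
```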
import math

import paddle
import paddle.nn as nn


class OctaveConv(nn.Layer):
    """Octave convolution: channels are split into a high-frequency group
    (full resolution) and a low-frequency group (half resolution)."""

    def __init__(self, in_channels, out_channels, kernel_size, alpha_in=0.5, alpha_out=0.5,
                 stride=1, padding=0, dilation=1, groups=1, bias=False):
        super(OctaveConv, self).__init__()
        assert stride == 1 or stride == 2, "Stride should be 1 or 2."
        assert 0 <= alpha_in <= 1 and 0 <= alpha_out <= 1, "Alphas should be in the interval from 0 to 1."
        # High -> low uses 2x2 average pooling; low -> high uses nearest-neighbor upsampling
        self.downsample = nn.AvgPool2D(kernel_size=(2, 2), stride=2)
        self.upsample = nn.Upsample(scale_factor=2, mode='nearest')
        self.stride = stride
        # Depthwise convolution: no exchange between the two frequencies
        self.is_dw = groups == in_channels
        self.alpha_in, self.alpha_out = alpha_in, alpha_out
        # Paddle's Conv2D has no `bias` argument; bias_attr=False disables the bias term
        bias_attr = None if bias else False
        # Channel split: a fraction alpha of the channels is low-frequency
        lo_in, lo_out = int(alpha_in * in_channels), int(alpha_out * out_channels)
        hi_in, hi_out = in_channels - lo_in, out_channels - lo_out
        # The four kernels; a path is omitted when its input or output group is empty.
        # Positional arguments: in, out, kernel_size, stride=1, padding, dilation, groups
        self.conv_l2l = None if alpha_in == 0 or alpha_out == 0 else \
            nn.Conv2D(lo_in, lo_out, kernel_size, 1, padding, dilation,
                      math.ceil(alpha_in * groups), bias_attr=bias_attr)
        self.conv_l2h = None if alpha_in == 0 or alpha_out == 1 or self.is_dw else \
            nn.Conv2D(lo_in, hi_out, kernel_size, 1, padding, dilation, groups, bias_attr=bias_attr)
        self.conv_h2l = None if alpha_in == 1 or alpha_out == 0 or self.is_dw else \
            nn.Conv2D(hi_in, lo_out, kernel_size, 1, padding, dilation, groups, bias_attr=bias_attr)
        self.conv_h2h = None if alpha_in == 1 or alpha_out == 1 else \
            nn.Conv2D(hi_in, hi_out, kernel_size, 1, padding, dilation,
                      math.ceil(groups - alpha_in * groups), bias_attr=bias_attr)

    def forward(self, x):
        # The input is either a plain tensor (high frequency only) or a (high, low) tuple
        x_h, x_l = x if isinstance(x, tuple) else (x, None)
        # Stride 2 is implemented by pooling first; the convolutions themselves use stride 1
        x_h = self.downsample(x_h) if self.stride == 2 else x_h
        x_h2h = self.conv_h2h(x_h)
        # High -> low: pool to half resolution, then convolve
        x_h2l = self.conv_h2l(self.downsample(x_h)) if self.alpha_out > 0 and not self.is_dw else None
        if x_l is not None:
            x_l2l = self.downsample(x_l) if self.stride == 2 else x_l
            x_l2l = self.conv_l2l(x_l2l) if self.alpha_out > 0 else None
            if self.is_dw:
                return x_h2h, x_l2l
            # Low -> high: convolve at low resolution, then upsample
            # (with stride 2 the low input already matches the pooled high branch)
            x_l2h = self.conv_l2h(x_l)
            x_l2h = self.upsample(x_l2h) if self.stride == 1 else x_l2h
            x_h = x_l2h + x_h2h
            x_l = x_h2l + x_l2l if x_h2l is not None and x_l2l is not None else None
            return x_h, x_l
        return x_h2h, x_h2l


class Conv_BN(nn.Layer):
    """OctaveConv followed by separate BatchNorm layers for the high and low branches."""

    def __init__(self, in_channels, out_channels, kernel_size, alpha_in=0.5, alpha_out=0.5,
                 stride=1, padding=0, dilation=1, groups=1, bias=False, norm_layer=nn.BatchNorm2D):
        super(Conv_BN, self).__init__()
        self.conv = OctaveConv(in_channels, out_channels, kernel_size, alpha_in, alpha_out,
                               stride, padding, dilation, groups, bias)
        # Channel counts match the split used inside OctaveConv
        self.bn_h = None if alpha_out == 1 else norm_layer(out_channels - int(alpha_out * out_channels))
        self.bn_l = None if alpha_out == 0 else norm_layer(int(alpha_out * out_channels))

    def forward(self, x):
        x_h, x_l = self.conv(x)
        x_h = self.bn_h(x_h)
        x_l = self.bn_l(x_l) if x_l is not None else None
        return x_h, x_l


class Conv_BN_ACT(nn.Layer):
    """OctaveConv + BatchNorm + activation, applied to both frequency branches."""

    def __init__(self, in_channels=3, out_channels=32, kernel_size=3, alpha_in=0.5, alpha_out=0.5,
                 stride=1, padding=0, dilation=1, groups=1, bias=False,
                 norm_layer=nn.BatchNorm2D, activation_layer=nn.ReLU):
        super(Conv_BN_ACT, self).__init__()
        self.conv = OctaveConv(in_channels, out_channels, kernel_size, alpha_in, alpha_out,
                               stride, padding, dilation, groups, bias)
        # Channel counts match the split used inside OctaveConv
        self.bn_h = None if alpha_out == 1 else norm_layer(out_channels - int(alpha_out * out_channels))
        self.bn_l = None if alpha_out == 0 else norm_layer(int(alpha_out * out_channels))
        self.act = activation_layer()

    def forward(self, x):
        x_h, x_l = self.conv(x)
        x_h = self.act(self.bn_h(x_h))
        x_l = self.act(self.bn_l(x_l)) if x_l is not None else None
        return x_h, x_l
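`forward` adds tensors from different paths, so their spatial sizes have to line up. The plain-Python trace below follows only the (height, width) of each path through `OctaveConv.forward`, for the non-depthwise case. It assumes a size-preserving kernel (for example k = 3 with padding = 1, or k = 1 with padding = 0) and a low-frequency input at half the high-frequency size.

The trace shows why H→L is pooled and why L→H is upsampled only when stride = 1. It also shows that, with stride 1, an odd high-frequency size breaks the sum: 2×2 pooling floors (7 → 3) and upsampling doubles (3 → 6). The function names are illustrative.

```python
def halve(size):
    """Spatial effect of nn.AvgPool2D(kernel_size=2, stride=2): floor division by 2."""
    return size[0] // 2, size[1] // 2


def double(size):
    """Spatial effect of nn.Upsample(scale_factor=2)."""
    return size[0] * 2, size[1] * 2


def trace_octave_conv(h, w, stride=1, has_low_input=True):
    """Return the (height, width) of the high and low outputs of OctaveConv.forward."""
    x_h = (h, w)
    x_l = halve(x_h) if has_low_input else None
    if stride == 2:
        x_h = halve(x_h)
    h2h = x_h          # conv_h2h keeps the size
    h2l = halve(x_h)   # pooled, then conv_h2l
    if x_l is None:    # first layer: plain tensor in, (high, low) out
        return h2h, h2l
    l2l = halve(x_l) if stride == 2 else x_l   # conv_l2l on the (pooled) low input
    l2h = x_l if stride == 2 else double(x_l)  # conv_l2h, upsampled only for stride 1
    if l2h != h2h or l2l != h2l:
        raise ValueError(f"size mismatch: H {h2h} vs {l2h}, L {h2l} vs {l2l}")
    return h2h, l2l


print(trace_octave_conv(56, 56))            # ((56, 56), (28, 28))
print(trace_octave_conv(56, 56, stride=2))  # ((28, 28), (14, 14))
```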

2.2 Reproducing Oct-MobileNetV1

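Oct-MobileNetV1 is obtained by replacing the depthwise and pointwise Conv2D layers of MobileNetV1 with Oct-Conv and keeping everything else unchanged. The 3×3 stem convolution stays a plain Conv2D. The network structure and parameter count of Oct-MobileNetV1 are printed later in Section 3.1 for reference.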

# Oct-MobileNetV1
import paddle.nn as nn

__all__ = ['oct_mobilenet']


def conv_bn(inp, oup, stride):
    # Plain 3x3 stem convolution; it outputs an ordinary tensor (no low-frequency part)
    return nn.Sequential(
        nn.Conv2D(inp, oup, 3, stride, 1),
        nn.BatchNorm2D(oup),
        nn.ReLU()
    )


def conv_dw(inp, oup, stride, alpha_in=0.5, alpha_out=0.5):
    # Depthwise-separable block built from octave convolutions:
    # a 3x3 depthwise OctConv (keeps alpha_in), then a 1x1 pointwise OctConv (alpha_in -> alpha_out)
    return nn.Sequential(
        Conv_BN_ACT(inp, inp, kernel_size=3, stride=stride, padding=1, groups=inp,
                    alpha_in=alpha_in, alpha_out=alpha_in),
        Conv_BN_ACT(inp, oup, kernel_size=1, alpha_in=alpha_in, alpha_out=alpha_out)
    )


class OctMobileNet(nn.Layer):
    def __init__(self, num_classes=1000):
        super(OctMobileNet, self).__init__()
        # conv_dw(in, out, stride, alpha_in, alpha_out): the first block splits the input
        # into two frequencies, and the block with alpha_out=0 merges them back into one
        self.features = nn.Sequential(
            conv_bn(3, 32, 2),
            conv_dw(32, 64, 1, 0, 0.5),
            conv_dw(64, 128, 2),
            conv_dw(128, 128, 1),
            conv_dw(128, 256, 2),
            conv_dw(256, 256, 1),
            conv_dw(256, 512, 2),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1, 0.5, 0),
            conv_dw(512, 1024, 2, 0, 0),
            conv_dw(1024, 1024, 1, 0, 0),
        )
        self.avgpool = nn.AdaptiveAvgPool2D((1, 1))
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, x):
        x_h, x_l = self.features(x)  # x_l is None because the last blocks use alpha = 0
        x = self.avgpool(x_h)
        x = x.reshape([-1, 1024])
        x = self.fc(x)
        return x


def oct_mobilenet(**kwargs):
    """
    Constructs an Octave MobileNet V1 model
    """
    return OctMobileNet(**kwargs)
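Each block hands a (high, low) tuple to the next, so the α_out of one `conv_dw` must equal the α_in of the following one, and the channel counts must chain. The sketch below copies the layer table from `OctMobileNet.features` into plain Python, checks both conditions, and tracks the high-frequency spatial size for square inputs. It also flags any block that carries a low-frequency branch while the high-frequency map has an odd size, because 2×2 pooling followed by 2× upsampling would not restore that size. `LAYERS` and `check_schedule` are illustrative names.

```python
# (in_ch, out_ch, stride, alpha_in, alpha_out) for each conv_dw in OctMobileNet.features;
# conv_dw's defaults are alpha_in = alpha_out = 0.5.
LAYERS = [
    (32, 64, 1, 0, 0.5),
    (64, 128, 2, 0.5, 0.5),
    (128, 128, 1, 0.5, 0.5),
    (128, 256, 2, 0.5, 0.5),
    (256, 256, 1, 0.5, 0.5),
    (256, 512, 2, 0.5, 0.5),
    *[(512, 512, 1, 0.5, 0.5)] * 4,
    (512, 512, 1, 0.5, 0),
    (512, 1024, 2, 0, 0),
    (1024, 1024, 1, 0, 0),
]


def check_schedule(layers, input_size=224):
    """Return (spatial size, channels, alpha) after the last block, or raise ValueError."""
    size, channels, alpha = input_size // 2, 32, 0  # conv_bn(3, 32, 2): plain tensor out
    for idx, (c_in, c_out, stride, a_in, a_out) in enumerate(layers):
        if c_in != channels or a_in != alpha:
            raise ValueError(f"block {idx}: expects ({c_in}, {a_in}), gets ({channels}, {alpha})")
        size //= stride
        if (a_in > 0 or a_out > 0) and size % 2:
            raise ValueError(f"block {idx}: odd size {size} with a low-frequency branch")
        channels, alpha = c_out, a_out
    return size, channels, alpha


print(check_schedule(LAYERS))      # (7, 1024, 0)
print(check_schedule(LAYERS, 32))  # (1, 1024, 0)
```

The final 7×7×1024 map at 224×224 input is what `AdaptiveAvgPool2D` reduces to the 1024 features expected by the `fc` layer.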

2.3 Reproducing Oct-ResNet

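Oct-ResNet is obtained by replacing the Conv2D layers in ResNet's bottleneck blocks and projection shortcuts with Oct-Conv and keeping everything else unchanged. The 7×7 stem convolution stays a plain Conv2D. The network structure and parameter count of Oct-ResNet are printed later in Section 3.5 for reference.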

import paddle.nn as nn

__all__ = ['OctResNet', 'oct_resnet50', 'oct_resnet101', 'oct_resnet152', 'oct_resnet200']


class Bottleneck(nn.Layer):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, alpha_in=0.5, alpha_out=0.5, norm_layer=None, output=False):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = Conv_BN_ACT(inplanes, width, kernel_size=1, alpha_in=alpha_in, alpha_out=alpha_out,
                                 norm_layer=norm_layer)
        # In the output stage (output=True) the rest of the block is a plain convolution (alpha = 0)
        self.conv2 = Conv_BN_ACT(width, width, kernel_size=3, stride=stride, padding=1, groups=groups,
                                 norm_layer=norm_layer, alpha_in=0 if output else 0.5,
                                 alpha_out=0 if output else 0.5)
        self.conv3 = Conv_BN(width, planes * self.expansion, kernel_size=1, norm_layer=norm_layer,
                             alpha_in=0 if output else 0.5, alpha_out=0 if output else 0.5)
        self.relu = nn.ReLU()
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity_h = x[0] if isinstance(x, tuple) else x
        identity_l = x[1] if isinstance(x, tuple) else None
        x_h, x_l = self.conv1(x)
        x_h, x_l = self.conv2((x_h, x_l))
        x_h, x_l = self.conv3((x_h, x_l))
        if self.downsample is not None:
            identity_h, identity_l = self.downsample(x)
        # Residual connection on both frequency branches
        x_h = x_h + identity_h
        x_l = x_l + identity_l if identity_l is not None else None
        x_h = self.relu(x_h)
        x_l = self.relu(x_l) if x_l is not None else None
        return x_h, x_l


class OctResNet(nn.Layer):
    # zero_init_residual is accepted for API compatibility but is not used in this port
    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False,
                 groups=1, width_per_group=64, norm_layer=None):
        super(OctResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        self.inplanes = 64
        self.groups = groups
        self.base_width = width_per_group
        # Plain (non-octave) stem, as in the standard ResNet; BN follows, so no bias is needed
        self.conv1 = nn.Conv2D(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias_attr=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
        # layer1 splits the plain input into two frequencies (alpha_in=0);
        # layer4 merges them back into one (alpha_out=0, output=True)
        self.layer1 = self._make_layer(block, 64, layers[0], norm_layer=norm_layer, alpha_in=0)
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2, norm_layer=norm_layer)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2, norm_layer=norm_layer)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2, norm_layer=norm_layer,
                                       alpha_out=0, output=True)
        self.avgpool = nn.AdaptiveAvgPool2D((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, blocks, stride=1, alpha_in=0.5, alpha_out=0.5,
                    norm_layer=None, output=False):
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            # Projection shortcut, also built from an octave convolution
            downsample = nn.Sequential(
                Conv_BN(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride,
                        alpha_in=alpha_in, alpha_out=alpha_out)
            )
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, alpha_in, alpha_out, norm_layer, output))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, norm_layer=norm_layer,
                                alpha_in=0 if output else 0.5, alpha_out=0 if output else 0.5,
                                output=output))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x_h, x_l = self.layer1(x)
        x_h, x_l = self.layer2((x_h, x_l))
        x_h, x_l = self.layer3((x_h, x_l))
        x_h, x_l = self.layer4((x_h, x_l))  # x_l is None after layer4
        x = self.avgpool(x_h)
        x = x.reshape([x.shape[0], -1])
        x = self.fc(x)
        return x


def oct_resnet50(pretrained=False, **kwargs):
    """Constructs an Octave ResNet-50 model.
    Args:
        pretrained (bool): ignored; this reproduction does not load pre-trained weights
    """
    model = OctResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
    return model


def oct_resnet101(pretrained=False, **kwargs):
    """Constructs an Octave ResNet-101 model.
    Args:
        pretrained (bool): ignored; this reproduction does not load pre-trained weights
    """
    model = OctResNet(Bottleneck, [3, 4, 23, 3], **kwargs)
    return model


def oct_resnet152(pretrained=False, **kwargs):
    """Constructs an Octave ResNet-152 model.
    Args:
        pretrained (bool): ignored; this reproduction does not load pre-trained weights
    """
    model = OctResNet(Bottleneck, [3, 8, 36, 3], **kwargs)
    return model


def oct_resnet200(pretrained=False, **kwargs):
    """Constructs an Octave ResNet-200 model.
    Args:
        pretrained (bool): ignored; this reproduction does not load pre-trained weights
    """
    model = OctResNet(Bottleneck, [3, 24, 36, 3], **kwargs)
    return model
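The number in each factory name counts the weighted layers. Each Bottleneck block has three convolutions, and the count adds the 7×7 stem convolution and the final fully connected layer. By convention, the 1×1 projection shortcuts are not counted. Below is a quick arithmetic check against the block counts passed to `OctResNet` above; the dictionary name is illustrative.

```python
# Bottleneck blocks per stage (layer1..layer4), as passed to OctResNet above.
CONFIGS = {
    "oct_resnet50": [3, 4, 6, 3],
    "oct_resnet101": [3, 4, 23, 3],
    "oct_resnet152": [3, 8, 36, 3],
    "oct_resnet200": [3, 24, 36, 3],
}


def weighted_depth(blocks_per_stage):
    """3 convs per Bottleneck + 7x7 stem conv + final fc layer."""
    return 3 * sum(blocks_per_stage) + 2


for name, blocks in CONFIGS.items():
    print(name, weighted_depth(blocks))  # the depth matches the name's suffix
```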

3. Comparison Experiments

Dataset: CIFAR-10

CIFAR-10 is a small dataset for recognizing common objects, compiled by Hinton's students Alex Krizhevsky and Ilya Sutskever. It contains RGB color images from 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Each image is 32×32, and the dataset has 50,000 training images and 10,000 test images. Sample CIFAR-10 images are shown in the figure below.
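When reading predictions, note that CIFAR-10 labels are the integers 0 to 9 in the class order listed above, which is the standard CIFAR-10 ordering. The experiments below also resize every 32×32 image to 224×224. That is a 7× upscale per side and gives 49× as many pixels per image, which makes training much more expensive than at the native resolution. The list and helper names are illustrative.

```python
# CIFAR-10 class names in label order (label 0 = airplane, ..., label 9 = truck).
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def label_name(label):
    """Map an integer label (0-9) to its class name."""
    return CIFAR10_CLASSES[label]


scale = 224 / 32  # Resize(size=(224, 224)) applied to 32x32 images

print(label_name(3))      # cat
print(scale, scale ** 2)  # 7.0 49.0
```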

[Figure: sample CIFAR-10 images]

3.1 Visualizing the Oct_MobileNetV1 Network Structure

import paddle

Octmobilnet_model = oct_mobilenet(num_classes=10)
# Optional forward pass on a random batch:
# inputs = paddle.randn((1, 3, 224, 224))
# print(Octmobilnet_model(inputs).shape)
paddle.summary(Octmobilnet_model, (16, 3, 224, 224))

3.2 Training Oct_MobileNetV1

import paddle
from paddle.metric import Accuracy
from paddle.vision.transforms import Compose, Normalize, Resize, ToTensor

callback = paddle.callbacks.VisualDL(log_dir='visualdl_log_dir_octmobilenet')
# Upscale the 32x32 images to 224x224, convert to a CHW float tensor in [0, 1],
# then normalize each channel to [-1, 1] with mean 0.5 and std 0.5
transform = Compose([
    Resize(size=(224, 224)),
    ToTensor(),
    Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], data_format='CHW'),
])
cifar10_train = paddle.vision.datasets.Cifar10(mode='train', transform=transform)
cifar10_test = paddle.vision.datasets.Cifar10(mode='test', transform=transform)
# Build the training-set data loader
train_loader = paddle.io.DataLoader(cifar10_train, batch_size=768, shuffle=True, drop_last=True)
# Build the test-set data loader (no shuffling or dropping, so every test image is evaluated)
test_loader = paddle.io.DataLoader(cifar10_test, batch_size=768, shuffle=False, drop_last=False)
Octmobilnet_model = paddle.Model(oct_mobilenet(num_classes=10))
optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=Octmobilnet_model.parameters())
Octmobilnet_model.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(),
    Accuracy()
)
Octmobilnet_model.fit(train_data=train_loader,
                      eval_data=test_loader,
                      epochs=12,
                      callbacks=callback,
                      verbose=1)
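With `drop_last=True`, the last incomplete batch is skipped every epoch. The arithmetic for the training set at the batch sizes used in this article (768 for the MobileNet runs, 256 for the ResNet runs) is easy to check. The helper below is a hypothetical illustration of DataLoader-style batching, not a Paddle API.

```python
def loader_stats(num_samples, batch_size, drop_last=True):
    """Return (batches per epoch, samples skipped per epoch)."""
    if drop_last:
        return num_samples // batch_size, num_samples % batch_size
    return -(-num_samples // batch_size), 0  # ceiling division; the last batch is smaller


print(loader_stats(50_000, 768))                   # (65, 80)
print(loader_stats(50_000, 256))                   # (195, 80)
print(loader_stats(10_000, 768))                   # (13, 16)
print(loader_stats(10_000, 768, drop_last=False))  # (14, 0)
```

So 12 epochs at batch size 768 amount to 780 optimizer steps. On the 10,000-image test set, `drop_last=True` with batch size 768 would evaluate only 9,984 images. For that reason, evaluation loaders usually keep `drop_last=False`.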

3.3 Visualizing the MobileNetV1 Network Structure

import paddle
from paddle.vision.models import MobileNetV1

mobile_model = MobileNetV1(num_classes=10)
# Optional forward pass on a random batch:
# inputs = paddle.randn((1, 3, 224, 224))
# print(mobile_model(inputs).shape)
paddle.summary(mobile_model, (16, 3, 224, 224))

3.4 Training MobileNetV1

import paddle
from paddle.metric import Accuracy
from paddle.vision.models import MobileNetV1
from paddle.vision.transforms import Compose, Normalize, Resize, ToTensor

callback = paddle.callbacks.VisualDL(log_dir='visualdl_log_dir_mobilenet')
# Upscale the 32x32 images to 224x224, convert to a CHW float tensor in [0, 1],
# then normalize each channel to [-1, 1] with mean 0.5 and std 0.5
transform = Compose([
    Resize(size=(224, 224)),
    ToTensor(),
    Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], data_format='CHW'),
])
cifar10_train = paddle.vision.datasets.Cifar10(mode='train', transform=transform)
cifar10_test = paddle.vision.datasets.Cifar10(mode='test', transform=transform)
# Build the training-set data loader
train_loader = paddle.io.DataLoader(cifar10_train, batch_size=768, shuffle=True, drop_last=True)
# Build the test-set data loader (no shuffling or dropping, so every test image is evaluated)
test_loader = paddle.io.DataLoader(cifar10_test, batch_size=768, shuffle=False, drop_last=False)
mobile_model = paddle.Model(MobileNetV1(num_classes=10))
optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=mobile_model.parameters())
mobile_model.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(),
    Accuracy()
)
mobile_model.fit(train_data=train_loader,
                 eval_data=test_loader,
                 epochs=12,
                 callbacks=callback,
                 verbose=1)

3.5 Visualizing the Oct_ResNet50 Network Structure

import paddle

octresnet50_model = oct_resnet50(num_classes=10)
paddle.summary(octresnet50_model, (16, 3, 224, 224))

3.6 Training Oct_ResNet50

import paddle
from paddle.metric import Accuracy
from paddle.vision.transforms import Compose, Normalize, Resize, ToTensor

callback = paddle.callbacks.VisualDL(log_dir='visualdl_log_dir_octresnet')
# Upscale the 32x32 images to 224x224, convert to a CHW float tensor in [0, 1],
# then normalize each channel to [-1, 1] with mean 0.5 and std 0.5
transform = Compose([
    Resize(size=(224, 224)),
    ToTensor(),
    Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], data_format='CHW'),
])
cifar10_train = paddle.vision.datasets.Cifar10(mode='train', transform=transform)
cifar10_test = paddle.vision.datasets.Cifar10(mode='test', transform=transform)
# Build the training-set data loader
train_loader = paddle.io.DataLoader(cifar10_train, batch_size=256, shuffle=True, drop_last=True)
# Build the test-set data loader (no shuffling or dropping, so every test image is evaluated)
test_loader = paddle.io.DataLoader(cifar10_test, batch_size=256, shuffle=False, drop_last=False)
# Use a new variable name so the oct_resnet50 factory function is not overwritten
octres_model = paddle.Model(oct_resnet50(num_classes=10))
optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=octres_model.parameters())
octres_model.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(),
    Accuracy()
)
octres_model.fit(train_data=train_loader,
                 eval_data=test_loader,
                 epochs=12,
                 callbacks=callback,
                 verbose=1)

3.7 Visualizing the ResNet50 Network Structure

import paddle
from paddle.vision.models import resnet50

# Build the model
resmodel = resnet50(num_classes=10)
paddle.summary(resmodel, (16, 3, 224, 224))

3.8 Training ResNet50

import paddle
from paddle.metric import Accuracy
from paddle.vision.transforms import Compose, Normalize, Resize, ToTensor
from paddle.vision.models import resnet50

callback = paddle.callbacks.VisualDL(log_dir='visualdl_log_dir_resnet')
# Upscale the 32x32 images to 224x224, convert to a CHW float tensor in [0, 1],
# then normalize each channel to [-1, 1] with mean 0.5 and std 0.5
transform = Compose([
    Resize(size=(224, 224)),
    ToTensor(),
    Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], data_format='CHW'),
])
cifar10_train = paddle.vision.datasets.Cifar10(mode='train', transform=transform)
cifar10_test = paddle.vision.datasets.Cifar10(mode='test', transform=transform)
# Build the training-set data loader
train_loader = paddle.io.DataLoader(cifar10_train, batch_size=256, shuffle=True, drop_last=True)
# Build the test-set data loader (no shuffling or dropping, so every test image is evaluated)
test_loader = paddle.io.DataLoader(cifar10_test, batch_size=256, shuffle=False, drop_last=False)
res_model = paddle.Model(resnet50(num_classes=10))
optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=res_model.parameters())
res_model.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(),
    Accuracy()
)
res_model.fit(train_data=train_loader,
              eval_data=test_loader,
              epochs=12,
              callbacks=callback,
              verbose=1)

3.9 Experimental Results

This subsection presents the ablation results and the visualized training curves for four experiments on CIFAR-10: Oct-MobileNetV1, MobileNetV1, Oct-ResNet50, and ResNet50.


Figure 1: Comparison results for Oct_MobileNetV1


Figure 2: Comparison results for Oct_ResNet50


4. References

d-li14/octconv.pytorch

神經網路學習之OctConv:八度卷積 (Neural Network Study: OctConv, Octave Convolution)

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

5. Summary

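The conclusions we have reached so far do not match those in the paper. The paper's official code uses the MXNet framework, while this reproduction follows a PyTorch reproduction (d-li14/octconv.pytorch). We cannot yet tell whether the discrepancy comes from the framework or from training settings, such as the initialization scheme or too few training epochs. This remains to be verified. If you are interested, feel free to discuss this question with me in the comments.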

 
