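This article walks through the code logic of deformable convolution and visualizes its effect. Everything is in Python; there is no C++. Most of the code comes from https://github.com/oeway/pytorch-deform-conv. After studying it for quite a while, however, I found problems in that code which mean deformable convolution is not actually implemented. I noticed this while visualizing the sampling points of the deformable convolution, where something looked off. After my fix, the sampling points visualize correctly and accuracy improves somewhat.
1 Code logic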
import torch.nn as nn

# For visualization
class ConvOffset2D(nn.Conv2d):
    """ConvOffset2D

    Convolutional layer responsible for learning the 2D offsets and outputting
    the deformed feature map using bilinear interpolation.

    Note that this layer does not perform convolution on the deformed feature
    map. See get_deform_cnn in cnn.py for usage.
    """
    def __init__(self, filters, init_normal_stddev=0.01, **kwargs):
        """Init

        Parameters
        ----------
        filters : int
            Number of channels of the input feature map
        init_normal_stddev : float
            Standard deviation for the normal kernel initialization
        **kwargs:
            Passed to the superclass. See the Conv2d layer in PyTorch
        """
        self.filters = filters
        self._grid_param = None
        super(ConvOffset2D, self).__init__(self.filters, self.filters*2, 3, padding=1, bias=False, **kwargs)
        self.weight.data.copy_(self._init_weights(self.weight, init_normal_stddev))

    def forward(self, x):
        """Return the deformed feature map and the raw offsets"""
        x_shape = x.size()
        # offsets_: (b, 2c, h, w), raw output of the offset convolution
        offsets_ = super(ConvOffset2D, self).forward(x)
        # offsets: (b*c, h, w, 2)
        # self._to_bc_h_w_2 is the code I modified
        offsets = self._to_bc_h_w_2(offsets_, x_shape)
        # x: (b*c, h, w)
        x = self._to_bc_h_w(x, x_shape)
        # x_offset: (b*c, h, w)
        x_offset = th_batch_map_offsets(x, offsets, grid=self._get_grid(self,x))
        # x_offset: (b, c, h, w)
        x_offset = self._to_b_c_h_w(x_offset, x_shape)
        # offsets_ is returned as well so that it can be visualized
        return x_offset,offsets_
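Going through forward step by step:
- offsets_ = super(ConvOffset2D, self).forward(x)
  The offset convolution outputs 2c channels, two offset components per input channel. In this example offsets_ is a 10-channel 28x28 feature map.
- offsets = self._to_bc_h_w_2(offsets_, x_shape)
  This call reshapes the offset map from (b, 2c, h, w) to (b*c, h, w, 2).
- x = self._to_bc_h_w(x, x_shape)
  This reshapes the original feature map to (b*c, h, w).
- x_offset = th_batch_map_offsets(x, offsets, grid=self._get_grid(self,x))
  This applies the offsets computed above to the feature map x.
- x_offset = self._to_b_c_h_w(x_offset, x_shape)
  This reshapes the resampled feature map back to (b, c, h, w).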
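To make the (b, 2c, h, w) to (b*c, h, w, 2) step concrete, here is a minimal sketch of one way such a reshape can be written. It is not the author's actual _to_bc_h_w_2, which is not shown in this section. The standalone function name to_bc_h_w_2 and the assumption that channels 2k and 2k+1 hold the two offset components for input channel k are illustrative only.

```python
import torch


def to_bc_h_w_2(offsets, x_shape):
    """Reshape offsets from (b, 2c, h, w) to (b*c, h, w, 2).

    Assumption (hypothetical layout): channels 2k and 2k+1 of `offsets`
    are the two offset components belonging to input channel k.
    """
    b, c, h, w = (int(s) for s in x_shape)
    # Split the channel axis into c pairs: (b, c, 2, h, w)
    offsets = offsets.contiguous().view(b, c, 2, h, w)
    # Move the pair axis to the end: (b, c, h, w, 2)
    offsets = offsets.permute(0, 1, 3, 4, 2).contiguous()
    # Merge batch and channel: (b*c, h, w, 2)
    return offsets.view(b * c, h, w, 2)


def naive_view(offsets, x_shape):
    """Plain view without permute; only reinterprets the memory layout."""
    h, w = int(x_shape[2]), int(x_shape[3])
    return offsets.contiguous().view(-1, h, w, 2)
```

The two functions return tensors of the same shape but with different contents. With a plain view, the last axis pairs two horizontally adjacent values of a single channel, not the two offset components of one pixel, so the permute step is what actually moves the channel axis to the end.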
def th_batch_map_offsets(input, offsets, grid=None, order=1):
    """Batch map offsets into input

    Parameters
    ----------
    input : torch.Tensor. shape = (b, s, s)
    offsets: torch.Tensor. shape = (b, s, s, 2)

    Returns
    -------
    torch.Tensor. shape = (b, s, s)
    """
    batch_size = input.size(0)
    input_height = input.size(1)
    input_width = input.size(2)

    offsets = offsets.view(batch_size, -1, 2)
    if grid is None:
        grid = th_generate_grid(batch_size, input_height, input_width, offsets.data.type(), offsets.data.is_cuda)

    coords = offsets + grid

    mapped_vals = th_batch_map_coordinates(input, coords)
    return mapped_vals
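Going through th_batch_map_offsets step by step:
- offsets = offsets.view(batch_size, -1, 2)
  offsets arrives in the (b*c, h, w, 2) layout built earlier. The input passed in is (b*c, h, w), so batch_size here is b*c, and this view flattens the spatial dimensions to give (b*c, h*w, 2).
- coords = offsets + grid
  grid plays the role of the pixel x/y coordinates and offsets is a relative displacement, so offsets + grid gives the absolute coordinates after the shift, which can be used to look up elements in the feature map directly. Pixel coordinates are integers, but the offsets are fractional, so the value at a fractional coordinate, which does not exist as a pixel, is obtained by bilinear interpolation.
- mapped_vals = th_batch_map_coordinates(input, coords)
  This looks up the feature map values at coords, which is where the bilinear interpolation described above takes place.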
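The grid-plus-offset lookup described above can be sketched in plain Python on a single 2D map. This is only an illustration, not the repository's th_batch_map_coordinates. The function names bilinear_sample and sample_with_offsets are hypothetical. Two further assumptions are made: the first offset component is treated as the row shift and the second as the column shift, and out-of-range coordinates are clamped to the border.

```python
import math


def bilinear_sample(image, row, col):
    """Sample a 2D list `image` at a fractional (row, col) position."""
    h, w = len(image), len(image[0])
    # Assumption: coordinates outside the map are clamped to the border
    row = min(max(float(row), 0.0), h - 1.0)
    col = min(max(float(col), 0.0), w - 1.0)
    # Integer neighbours around the fractional position
    r0, c0 = int(math.floor(row)), int(math.floor(col))
    r1, c1 = min(r0 + 1, h - 1), min(c0 + 1, w - 1)
    dr, dc = row - r0, col - c0
    # Interpolate along columns first, then along rows
    top = image[r0][c0] * (1 - dc) + image[r0][c1] * dc
    bottom = image[r1][c0] * (1 - dc) + image[r1][c1] * dc
    return top * (1 - dr) + bottom * dr


def sample_with_offsets(image, offsets):
    """Resample `image` at grid + offsets.

    offsets[i][j] is a (row_shift, col_shift) pair for pixel (i, j);
    the integer pair (i, j) itself plays the role of the grid.
    """
    h, w = len(image), len(image[0])
    return [
        [
            bilinear_sample(image, i + offsets[i][j][0], j + offsets[i][j][1])
            for j in range(w)
        ]
        for i in range(h)
    ]
```

With all offsets set to zero, sample_with_offsets returns the original map unchanged. A shift of half a pixel in both directions blends the four neighbouring values equally.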
2 Results
3 Full code
