Downsampling and Upsampling Methods in Neural Networks (TensorFlow & PyTorch)

Posted by 凌逆戰 on 2021-01-09

Original article: https://www.cnblogs.com/LXP-Never/p/13181168.html

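　　The well-known UNet, like the encoder-decoder models we see so often, first downsamples the data (a step also described as feature extraction) and then restores the downsampled features to their original dimensions. The feature-extraction step is called "downsampling" and the restoration step is called "upsampling". This post is a summary of the downsampling and upsampling methods used in neural networks. Apologies for any rough edges.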

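Downsampling methods in neural networks

Pooling layers

　　Downsampling is usually done with pooling layers (average pooling, max pooling) or with convolution.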

Average pooling layers

PyTorch
- nn.AvgPool1d
- nn.AvgPool2d
TensorFlow (1.x API; TensorFlow 2 exposes the same layers under tf.keras.layers)
- tf.layers.AveragePooling1D
- tf.layers.AveragePooling2D

Max pooling layers

PyTorch
- nn.MaxPool1d
- nn.MaxPool2d
TensorFlow
- tf.layers.MaxPooling1D
- tf.layers.MaxPooling2D

PyTorch also has several other pooling layers: nn.LPPool1d/2d, nn.AdaptiveMaxPool1d/2d/3d, nn.AdaptiveAvgPool1d/2d/3d, and nn.FractionalMaxPool2d.
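
To make the effect of these layers concrete, here is a minimal PyTorch sketch. The 4x4 input and the 2x2 window with stride 2 are illustrative choices, not values from the original post.

```python
import torch
import torch.nn as nn

# Toy input of shape (batch_size, channels, height, width) holding 0..15
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# A 2x2 window with stride 2 halves the height and the width
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)

y_avg = avg_pool(x)
y_max = max_pool(x)

print(y_avg.shape)   # torch.Size([1, 1, 2, 2])
print(y_avg[0, 0])   # [[2.5, 4.5], [10.5, 12.5]]
print(y_max[0, 0])   # [[5., 7.], [13., 15.]]

# Adaptive pooling fixes the output size instead of the window size
global_avg = nn.AdaptiveAvgPool2d(output_size=1)(x)
print(global_avg.shape)  # torch.Size([1, 1, 1, 1])
```

Average pooling keeps the mean of each window, while max pooling keeps only the largest value, which is why max pooling cannot be inverted exactly.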

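Convolution

Ordinary convolution

PyTorch
- nn.Conv1d
- nn.Conv2d
TensorFlow
- tf.layers.Conv1D
- tf.layers.Conv2D

There are also some more specialized convolutions, which interested readers can explore on their own:

Dilated convolution (also called atrous convolution): tf.nn.atrous_conv2d
Depthwise convolution: tf.nn.depthwise_conv2d
Separable convolution: tf.nn.separable_conv2d
Quantized convolution: tf.nn.quantized_conv2d
...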
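
A convolution downsamples when its stride is greater than 1. The sketch below assumes a 3x3 kernel with stride 2 and padding 1 (illustrative values) and checks the result against the output-size formula given in the PyTorch documentation; `conv_out_size` is a helper introduced here only for illustration.

```python
import torch
import torch.nn as nn


def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


x = torch.randn(8, 16, 32, 32)  # (batch_size, channels, height, width)

# Stride 2 halves the spatial resolution: 32 -> 16
down = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=2, padding=1)
y_down = down(x)
print(y_down.shape)  # torch.Size([8, 32, 16, 16])

# A dilated convolution enlarges the receptive field without downsampling
dilated = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, dilation=2, padding=2)
y_dilated = dilated(x)
print(y_dilated.shape)  # torch.Size([8, 16, 32, 32])

expected_down = conv_out_size(32, kernel_size=3, stride=2, padding=1)
expected_dilated = conv_out_size(32, kernel_size=3, padding=2, dilation=2)
```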

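Upsampling methods

Interpolation

There are many interpolation methods, including step (piecewise-constant) interpolation, linear interpolation, and cubic spline interpolation.

I have already covered the NumPy implementations in another article. To avoid repetition, readers who want the details can refer to "Interpolation Methods and Their Python Implementation" (插值方法及python實現).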
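
For a quick feel of the two simplest methods, here is a small NumPy sketch. It is not taken from the article mentioned above; the sample values are made up for illustration.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])     # original sample positions
y = np.array([0.0, 10.0, 20.0])   # original sample values

# Step interpolation: repeat every sample r times
r = 2
y_step = np.repeat(y, r)
print(y_step)  # [ 0.  0. 10. 10. 20. 20.]

# Linear interpolation: evaluate the piecewise-linear curve on a denser grid
x_new = np.linspace(0.0, 2.0, 5)  # 0, 0.5, 1, 1.5, 2
y_linear = np.interp(x_new, x, y)
print(y_linear)  # [ 0.  5. 10. 15. 20.]
```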

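PyTorch implementation

torch.nn.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None)

Upsamples multi-channel 1D (temporal), 2D (spatial), or 3D (volumetric) data.

1D (vector data): the input Tensor is 3-dimensional: (batch_size, channels, width)
2D (image data): the input Tensor is 4-dimensional: (batch_size, channels, height, width)
3D (volumetric data): the input Tensor is 5-dimensional: (batch_size, channels, depth, height, width)

Parameters

size: the target output size (one, two, or three integers, depending on whether the data is 1D, 2D, or 3D)
scale_factor: the multiplier applied to the spatial size
mode: the upsampling algorithm: nearest (nearest-neighbor interpolation), linear (linear interpolation), bilinear (bilinear interpolation), bicubic (bicubic interpolation), or trilinear (trilinear interpolation)
align_corners: if True, the corner pixels of the input and output tensors are aligned, which preserves the values at those pixels. Only takes effect when mode is 'linear', 'bilinear', 'bicubic', or 'trilinear'. Default: False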

Input: $(N, C, W_{in})$, $(N, C, H_{in}, W_{in})$ or $(N, C, D_{in}, H_{in}, W_{in})$
Output: $(N, C, W_{out})$, $(N, C, H_{out}, W_{out})$ or $(N, C, D_{out}, H_{out}, W_{out})$

$D_{out} = \lfloor D_{in} \times \text{scale\_factor} \rfloor$

$H_{out} = \lfloor H_{in} \times \text{scale\_factor} \rfloor$

$W_{out} = \lfloor W_{in} \times \text{scale\_factor} \rfloor$
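
The following sketch shows nn.Upsample in use. The 2x2 input and the chosen modes are illustrative assumptions; the point is how `scale_factor`, `size`, and `align_corners` affect the result.

```python
import torch
import torch.nn as nn

# Toy image of shape (batch_size, channels, height, width)
x = torch.tensor([[[[1.0, 2.0],
                    [3.0, 4.0]]]])

# Nearest-neighbor: every pixel is repeated in a 2x2 block
up_nearest = nn.Upsample(scale_factor=2, mode='nearest')
y_nearest = up_nearest(x)
print(y_nearest[0, 0])
# [[1., 1., 2., 2.],
#  [1., 1., 2., 2.],
#  [3., 3., 4., 4.],
#  [3., 3., 4., 4.]]

# Bilinear with align_corners=True: the four corner values are preserved
up_bilinear = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
y_bilinear = up_bilinear(x)
print(y_bilinear.shape)  # torch.Size([1, 1, 4, 4])

# `size` sets the output resolution directly instead of a scale factor
up_sized = nn.Upsample(size=(5, 7), mode='nearest')
y_sized = up_sized(x)
print(y_sized.shape)  # torch.Size([1, 1, 5, 7])
```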

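Unpooling

　　In CNNs, unpooling commonly refers to the inverse of max pooling. The idea comes from "Visualizing and Understanding Convolutional Networks", published in 2013 by Matthew D. Zeiler and Rob Fergus of New York University: since max pooling is not invertible, an approximation is used to recover the state before the max pooling operation.

　　Put simply, record the position of the maximum in each window during max pooling. For example, take a 3x3 matrix with a 2x2 max pooling window and stride 1. Unpooling puts each value back at its recorded position and sets every other position to 0: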

$$\left[\begin{array}{lll}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{array}\right] \xrightarrow{\text{max pooling}} \left[\begin{array}{ll}
5 & 6 \\
8 & 9
\end{array}\right] \xrightarrow{\text{unpooling}} \left[\begin{array}{lll}
0 & 0 & 0 \\
0 & 5 & 6 \\
0 & 8 & 9
\end{array}\right]$$
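
The example above can be reproduced with a few lines of NumPy. This is only a sketch of the mechanism for a single 2D array without padding; the function names `max_pool_with_argmax` and `unpool` are introduced here for illustration and are not library calls.

```python
import numpy as np


def max_pool_with_argmax(x, k, stride):
    """Max-pool a single (H, W) array and record the flat index of each maximum."""
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    pooled = np.zeros((out_h, out_w), dtype=x.dtype)
    argmax = np.zeros((out_h, out_w), dtype=np.int64)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + k, j * stride:j * stride + k]
            di, dj = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[di, dj]
            # Flat index of the maximum inside the original array
            argmax[i, j] = (i * stride + di) * x.shape[1] + (j * stride + dj)
    return pooled, argmax


def unpool(pooled, argmax, out_shape):
    """Place each pooled value at its recorded position; everything else is 0."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[argmax.ravel()] = pooled.ravel()
    return out


x = np.arange(1, 10).reshape(3, 3)
pooled, argmax = max_pool_with_argmax(x, k=2, stride=1)
restored = unpool(pooled, argmax, x.shape)

print(pooled)    # [[5 6], [8 9]]
print(restored)  # [[0 0 0], [0 5 6], [0 8 9]]
```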

Method 1

import tensorflow as tf  # TensorFlow 1.x (graph mode)


def unpool_with_with_argmax(pooled, ind, ksize=[1, 2, 2, 1]):
    """https://github.com/sangeet259/tensorflow_unpooling
      To unpool the tensor after max_pool_with_argmax.
      Arguments:
          pooled:    the max pooled output tensor
          ind:       argmax indices, the second output of max_pool_with_argmax
          ksize:     ksize should be the same as what you have used to pool
      Returns:
          unpooled:  the tensor after unpooling
      Some points to keep in mind ::
          1. In tensorflow the indices in argmax are flattened. This function assumes they are per-sample indices, i.e. a maximum at position [b, y, x, c] has index (y * width + x) * channels + c, which is what max_pool_with_argmax returns when include_batch_in_index is False (the default).
          2. Due to point 1, the batch index is attached separately (via broadcasting) so that the values land at the right locations.
    """
    # Get the shape of the tensor in the form of a list
    input_shape = pooled.get_shape().as_list()
    # Determine the output shape
    output_shape = (input_shape[0], input_shape[1] * ksize[1], input_shape[2] * ksize[2], input_shape[3])
    # Reshape into one flat tensor for easier handling
    pooled_ = tf.reshape(pooled, [input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3]])
    # Build a tensor shaped like `ind` that holds the batch index (0 .. batch_size - 1) of every element
    batch_range = tf.reshape(tf.range(output_shape[0], dtype=ind.dtype), shape=[input_shape[0], 1, 1, 1])
    b = tf.ones_like(ind) * batch_range
    b_ = tf.reshape(b, [input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3], 1])
    ind_ = tf.reshape(ind, [input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3], 1])
    # Each row of ind_ becomes [batch index, flat index within the sample]
    ind_ = tf.concat([b_, ind_], 1)
    ref = tf.Variable(tf.zeros([output_shape[0], output_shape[1] * output_shape[2] * output_shape[3]]))
    # Scatter the pooled values into the zero-filled matrix, one row per sample
    unpooled_ = tf.scatter_nd_update(ref, ind_, pooled_)
    # Reshape the matrix to get the final result
    unpooled = tf.reshape(unpooled_, [output_shape[0], output_shape[1], output_shape[2], output_shape[3]])
    return unpooled


original_tensor = tf.random_uniform([1, 4, 4, 3], maxval=100, dtype='float32', seed=2)
pooled_tensor, max_indices = tf.nn.max_pool_with_argmax(original_tensor, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                                                        padding='SAME')
print(pooled_tensor.shape)  # (1, 2, 2, 3)
unpooled_tensor = unpool_with_with_argmax(pooled_tensor, max_indices)
print(unpooled_tensor.shape)    # (1, 4, 4, 3)


Method 2

import tensorflow as tf  # TensorFlow 1.x (graph mode)
from tensorflow.python.ops import gen_nn_ops

inputs = tf.get_variable(name="a", shape=[64, 32, 32, 4], dtype=tf.float32,
                         initializer=tf.random_normal_initializer(mean=0, stddev=1))

# Max pooling
pool1 = tf.nn.max_pool(inputs,
                       ksize=[1, 2, 2, 1],
                       strides=[1, 2, 2, 1],
                       padding='SAME')
print(pool1.shape)  # (64, 16, 16, 4)
# Max unpooling, implemented with the gradient op of max pooling
grad = gen_nn_ops.max_pool_grad(inputs,  # tensor before pooling, i.e. the input of max pool
                                pool1,  # tensor after pooling, i.e. the output of max pool
                                pool1,  # tensor to unpool; any tensor with the same shape as pool1
                                ksize=[1, 2, 2, 1],
                                strides=[1, 2, 2, 1],
                                padding='SAME')

print(grad.shape)   # (64, 32, 32, 4)


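TensorFlow 2.4 provides a ready-made upsampling layer that can stand in for unpooling. Note that it upsamples by interpolation (repeating values when interpolation='nearest') rather than restoring each maximum to its recorded position:

tf.keras.layers.UpSampling2D(size=(2, 2), data_format=None, interpolation='nearest')

PyTorch version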
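
The original post leaves this part empty. As a hedged sketch, PyTorch's built-in nn.MaxUnpool2d can be paired with nn.MaxPool2d(return_indices=True); the example below reuses the 3x3 matrix from the unpooling section (2x2 window, stride 1).

```python
import torch
import torch.nn as nn

# The 3x3 example matrix, shaped (batch_size, channels, height, width)
x = torch.arange(1, 10, dtype=torch.float32).reshape(1, 1, 3, 3)

# return_indices=True makes the pooling layer also return the argmax positions
pool = nn.MaxPool2d(kernel_size=2, stride=1, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=1)

pooled, indices = pool(x)
# output_size removes any ambiguity about the shape to restore
restored = unpool(pooled, indices, output_size=x.size())

print(pooled[0, 0])    # [[5., 6.], [8., 9.]]
print(restored[0, 0])  # [[0., 0., 0.], [0., 5., 6.], [0., 8., 9.]]
```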

Transposed convolution

　　Transposed convolution is also called deconvolution. Unlike unpooling, upsampling with a transposed convolution is learnable. It is typically used to upsample the output of convolutional layers back to the resolution of the original image.

PyTorch
- nn.ConvTranspose1d(in_channels=N, out_channels=2N, kernel_size=2*S, stride=S, padding=S//2 + S%2, output_padding=S%2)
- nn.ConvTranspose2d
TensorFlow
- tf.nn.conv2d_transpose
- tf.nn.conv1d_transpose
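
With the ConvTranspose1d settings listed above (kernel_size=2*S, stride=S, padding=S//2 + S%2, output_padding=S%2), the output length works out to exactly S times the input length for both even and odd S. The sketch below checks this; `make_upsampler`, the channel counts, and the input length of 100 are illustrative assumptions.

```python
import torch
import torch.nn as nn


def make_upsampler(in_channels, out_channels, S):
    """Transposed convolution that upsamples the length by a factor of S (S >= 2)."""
    return nn.ConvTranspose1d(in_channels, out_channels,
                              kernel_size=2 * S,
                              stride=S,
                              padding=S // 2 + S % 2,
                              output_padding=S % 2)


x = torch.randn(4, 8, 100)  # (batch_size, channels, width)

out_lengths = {}
for S in (2, 3, 4):
    y = make_upsampler(8, 8, S)(x)
    out_lengths[S] = y.shape[-1]

print(out_lengths)  # {2: 200, 3: 300, 4: 400}
```

The result follows from the PyTorch formula L_out = (L_in - 1) * stride - 2 * padding + (kernel_size - 1) + output_padding + 1 (with dilation 1).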

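PixelShuffle

　　The PixelShuffle algorithm (see the figure in the original post) takes a low-resolution input image of size [H, W] and turns it into a high-resolution image of size [r*H, r*W] through a sub-pixel operation.

　　It does not produce the high-resolution image directly by interpolation. Instead, a convolution first produces $r^2$ feature maps per output channel at the resolution of the input, and these feature maps are then rearranged (periodic shuffling) into the high-resolution image, where $r$ is the upscale factor.

2D sub-pixel upsampling

[batch, height, width, channels * r * r] --> [batch, height * r, width * r, channels]

TensorFlow implementation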

import tensorflow as tf


def _phase_shift(I, r):
    # Phase shift: rearrange r*r channels into an r-times larger spatial grid
    bsize, a, b, c = I.get_shape().as_list()
    bsize = tf.shape(I)[0]  # Handling Dimension(None) type for undefined batch dim
    X = tf.reshape(I, (bsize, a, b, r, r))
    X = tf.transpose(X, (0, 1, 2, 4, 3))  # bsize, a, b, r, r
    X = tf.split(X, a, 1)  # a, [bsize, b, r, r]
    X = tf.concat([tf.squeeze(x, axis=1) for x in X], axis=2)  # bsize, b, a*r, r
    X = tf.split(X, b, 1)  # b, [bsize, a*r, r]
    X = tf.concat([tf.squeeze(x, axis=1) for x in X], axis=2)  # bsize, a*r, b*r
    return tf.reshape(X, (bsize, a * r, b * r, 1))


def PixelShuffle(X, r, color=False):
    if color:
        Xc = tf.split(X, 3, 3)
        X = tf.concat([_phase_shift(x, r) for x in Xc], axis=3)
    else:
        X = _phase_shift(X, r)
    return X


if __name__ == "__main__":
    X1 = tf.get_variable(name='X1',
                         shape=[2, 8, 8, 4],
                         initializer=tf.random_normal_initializer(stddev=1.0),
                         dtype=tf.float32)
    Y = PixelShuffle(X1, 2)
    print(Y.shape)  # (2, 16, 16, 1)

    X2 = tf.get_variable(name='X2',
                         shape=[2, 8, 8, 4 * 3],
                         initializer=tf.random_normal_initializer(stddev=1.0),
                         dtype=tf.float32)
    Y2 = PixelShuffle(X2, 2, color=True)
    print(Y2.shape)  # (2, 16, 16, 3)


PyTorch implementation

import torch
import torch.nn as nn

input = torch.randn(size=(1, 9, 4, 4))
ps = nn.PixelShuffle(3)
output = ps(input)
print(output.size())    # torch.Size([1, 1, 12, 12])
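
To see what nn.PixelShuffle does internally, the rearrangement can be written with a reshape and a permute. This is a sketch for the (N, C * r * r, H, W) layout that PyTorch uses; `pixel_shuffle_manual` is a name introduced here for illustration.

```python
import torch
import torch.nn.functional as F


def pixel_shuffle_manual(x, r):
    """Rearrange (N, C*r*r, H, W) into (N, C, H*r, W*r)."""
    n, c_in, h, w = x.shape
    c_out = c_in // (r * r)
    # Split the channel axis into (c_out, r, r)
    x = x.reshape(n, c_out, r, r, h, w)
    # Interleave the two r axes with the spatial axes: (n, c_out, h, r, w, r)
    x = x.permute(0, 1, 4, 2, 5, 3)
    return x.reshape(n, c_out, h * r, w * r)


x = torch.randn(2, 9, 4, 4)
y_manual = pixel_shuffle_manual(x, 3)
y_builtin = F.pixel_shuffle(x, 3)

print(y_manual.shape)                    # torch.Size([2, 1, 12, 12])
print(torch.equal(y_manual, y_builtin))  # True
```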


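NumPy implementation

import numpy as np


def PS(I, r):
  # I has shape (H, W, C * r * r); the output has shape (H * r, W * r, C)
  assert len(I.shape) == 3
  assert r > 0
  r = int(r)
  C = I.shape[2] // (r * r)  # number of output channels
  O = np.zeros((I.shape[0]*r, I.shape[1]*r, C), dtype=I.dtype)
  for x in range(O.shape[0]):
    for y in range(O.shape[1]):
      for c in range(C):
        a = x // r
        b = y // r
        # Channel index from the periodic shuffling formula in the sub-pixel paper;
        # this ordering is not necessarily the one used by torch.nn.PixelShuffle
        d = C*r*(y%r) + C*(x%r) + c
        O[x, y, c] = I[a, b, d]
  return O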


1D sub-pixel upsampling

(batch_size, width, channels * r)-->(batch_size, width * r, channels)

TensorFlow implementation

import tensorflow as tf


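def SubPixel1D(I, r):
    """1D sub-pixel upsampling layer.
    Input shape: (batch, width, channels * r); output shape: (batch, width * r, channels).
    """
    with tf.name_scope('subpixel'):
        X = tf.transpose(I, [2, 1, 0])  # (channels * r, width, batch)
        X = tf.batch_to_space_nd(X, [r], [[0, 0]])  # (channels, width * r, batch)
        X = tf.transpose(X, [2, 1, 0])  # (batch, width * r, channels)
        return X

# Example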
# ---------------------------------------------------
if __name__ == "__main__":
    inputs = tf.get_variable(name='input',
                             shape=[64, 8192, 32],
                             initializer=tf.random_normal_initializer(stddev=1.0),
                             dtype=tf.float32)
    upsample_SubPixel1D = SubPixel1D(I=inputs, r=2)
    print(upsample_SubPixel1D.shape)  # (64, 16384, 16)


PyTorch implementation

import torch.nn as nn


class PixelShuffle1D(nn.Module):
    """
    1D pixel shuffler. https://arxiv.org/pdf/1609.05158.pdf
    Upscales sample length, downscales channel length
    "short" is input, "long" is output
    """

    def __init__(self, upscale_factor):
        super(PixelShuffle1D, self).__init__()
        self.upscale_factor = upscale_factor

    def forward(self, x):
        batch_size, channels, in_width = x.size()

        channels //= self.upscale_factor
        out_width = self.upscale_factor * in_width

        x = x.contiguous().view([batch_size, channels, self.upscale_factor, in_width])
        x = x.permute(0, 1, 3, 2).contiguous()
        x = x.view(batch_size, channels, out_width)

        return x
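
# A quick way to see the resulting sample order is to run the same rearrangement on
# a small integer tensor. The lines below are a self-contained functional sketch of
# the same view/permute/view steps; `pixel_shuffle_1d` is a name introduced here
# only for illustration.

```python
import torch


def pixel_shuffle_1d(x, r):
    """Rearrange (batch, channels * r, width) into (batch, channels, width * r)."""
    batch_size, channels, in_width = x.shape
    channels //= r
    x = x.reshape(batch_size, channels, r, in_width)
    x = x.permute(0, 2 - 1, 3, 2)  # same as permute(0, 1, 3, 2): (batch, channels, width, r)
    return x.reshape(batch_size, channels, r * in_width)


# Two channels of width 4: [0, 1, 2, 3] and [4, 5, 6, 7]
x = torch.arange(8).reshape(1, 2, 4)
y = pixel_shuffle_1d(x, 2)

print(y.shape)     # torch.Size([1, 1, 8])
print(y.tolist())  # [[[0, 4, 1, 5, 2, 6, 3, 7]]]
```

# The two input channels are interleaved sample by sample, which doubles the width
# and halves the channel count.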


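Sub-pixel (or fractional) convolution can be regarded as a special case of transposed convolution.

Meta upscale module

It can upsample by an arbitrary scale factor. It is not yet widely known, so I will fill in this section once it becomes more popular.

References

　　Many of the APIs shared here are still from TensorFlow 1.x, mainly because TensorFlow 1 was what I used when I first learned deep learning. I have since switched to PyTorch. Looking at TensorFlow today, version 2 has been out for more than a year and version 1 has effectively been abandoned. Version 2 fixes the old problems, but people look forward: I am already comfortable with PyTorch, and going back to learn TensorFlow 2 is not something I am keen to do. In my view TensorFlow 2 is also in decline, and apart from Google's own projects, less and less open-source code uses it. Keep going, TensorFlow. Deep down I still like you, but PyTorch is just too convenient, and its open-source community is very strong.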

[Docs] TensorFlow official documentation

[Docs] PyTorch official documentation

[Code] 2D_subpixel

[Code] 1D_pytorch-pixelshuffle1d

[Code] 1D_pytorch_pixelshuffle

[Paper] "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network"

[Animation] Convolution animations