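Well-known models such as UNet, and the encoder-decoder models we see so often, all first downsample the data (a step also called feature extraction) and then restore the downsampled features to their original dimensions. The feature-extraction step is called "downsampling" and the restoring step is called "upsampling". This post summarizes the downsampling and upsampling methods used in neural networks. Apologies in advance for any rough spots.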
Downsampling methods in neural networks
Pooling layers
Pooling layers (average pooling, max pooling) and convolution
Average pooling
- pytorch
- tensorflow
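To show what average pooling actually computes, here is a minimal NumPy sketch. It assumes an (N, C, H, W) layout and a non-overlapping k x k window (kernel size = stride = k, no padding), which is the setting of PyTorch's `nn.AvgPool2d(k)`. The helper `avg_pool2d` is hypothetical, not a library function.

```python
import numpy as np

def avg_pool2d(x: np.ndarray, k: int) -> np.ndarray:
    """Non-overlapping k x k average pooling on an (N, C, H, W) array.

    Assumes stride == k and no padding; trailing rows/columns that do not
    fill a whole window are dropped (floor behaviour).
    """
    n, c, h, w = x.shape
    h_out, w_out = h // k, w // k
    x = x[:, :, :h_out * k, :w_out * k]
    # Split each spatial axis into (output index, offset inside the window)
    windows = x.reshape(n, c, h_out, k, w_out, k)
    return windows.mean(axis=(3, 5))

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
print(avg_pool2d(x, 2)[0, 0])
```

For the 4x4 ramp 0..15, each output is the mean of one 2x2 block: 2.5, 4.5, 10.5 and 12.5.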
Max pooling
- pytorch
- tensorflow
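The same idea with a max instead of a mean gives max pooling. The NumPy sketch below assumes NCHW input and no padding; in PyTorch's `nn.MaxPool2d`, `stride` defaults to the kernel size. The hypothetical helper `max_pool2d` uses explicit loops for readability rather than speed, and it also accepts overlapping windows (stride < k).

```python
import numpy as np

def max_pool2d(x: np.ndarray, k: int, stride: int) -> np.ndarray:
    """k x k max pooling over an (N, C, H, W) array without padding.

    Output size per spatial axis: floor((H - k) / stride) + 1.
    """
    n, c, h, w = x.shape
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    out = np.empty((n, c, h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            window = x[:, :, i * stride:i * stride + k, j * stride:j * stride + k]
            out[:, :, i, j] = window.max(axis=(2, 3))  # max over the window
    return out

x = np.arange(16).reshape(1, 1, 4, 4)
print(max_pool2d(x, k=2, stride=2)[0, 0])
```

With k=2 and stride=2 the 4x4 ramp reduces to [[5, 7], [13, 15]]. With stride=1 on the 3x3 matrix 1..9, the result is [[5, 6], [8, 9]].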
There are also other pooling layers: nn.LPPool, nn.AdaptiveMaxPool, nn.AdaptiveAvgPool, and nn.FractionalMaxPool2d.
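Adaptive pooling differs from ordinary pooling: you specify the output size, and the window boundaries are derived from it. The NumPy sketch below uses the bin rule `[floor(i*L/out), ceil((i+1)*L/out))`. To the best of my knowledge, this is the rule PyTorch's adaptive pooling uses, but treat it as an assumption. The helper name is hypothetical.

```python
import numpy as np

def adaptive_avg_pool1d(x: np.ndarray, out_size: int) -> np.ndarray:
    """Adaptive average pooling over the last axis of x.

    Output bin i averages input positions [floor(i*L/out), ceil((i+1)*L/out)),
    so the bins always cover the whole input and may overlap when L is not
    a multiple of out_size.
    """
    length = x.shape[-1]
    bins = []
    for i in range(out_size):
        start = (i * length) // out_size                    # floor
        end = ((i + 1) * length + out_size - 1) // out_size  # ceil
        bins.append(x[..., start:end].mean(axis=-1))
    return np.stack(bins, axis=-1)

x = np.arange(5, dtype=float)      # [0, 1, 2, 3, 4]
print(adaptive_avg_pool1d(x, 3))   # bins [0, 1], [1, 2, 3], [3, 4]
```

Here 5 inputs map to 3 outputs with overlapping bins, giving 0.5, 2.0 and 3.5.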
Convolution
Standard convolution
- pytorch
- tensorflow
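A convolution downsamples when its stride is greater than 1. Below is a minimal single-channel NumPy sketch with zero padding. The helper `conv2d_single` is hypothetical. Like `nn.Conv2d` and `tf.nn.conv2d`, it computes a cross-correlation, so the kernel is not flipped.

```python
import numpy as np

def conv2d_single(x: np.ndarray, kernel: np.ndarray, stride: int = 1, padding: int = 0) -> np.ndarray:
    """Single-channel 2D convolution (cross-correlation) with zero padding.

    Output size per axis: floor((H + 2*padding - k) / stride) + 1.
    """
    x = np.pad(x, padding)  # zero-pad both spatial axes
    kh, kw = kernel.shape
    h_out = (x.shape[0] - kh) // stride + 1
    w_out = (x.shape[1] - kw) // stride + 1
    out = np.zeros((h_out, w_out), dtype=np.result_type(x, kernel))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

y = conv2d_single(np.ones((8, 8)), np.ones((3, 3)), stride=2, padding=1)
print(y.shape)  # (4, 4)
```

A 3x3 kernel with stride 2 and padding 1 halves an 8x8 input to 4x4, since (8 + 2 - 3) // 2 + 1 = 4.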
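There are also some special kinds of convolution; interested readers can look into them:
- Dilated convolution (also called atrous convolution): tf.nn.atrous_conv2d
- Depthwise convolution: tf.nn.depthwise_conv2d
- Depthwise separable convolution: tf.nn.separable_conv2d
- Quantized convolution: tf.nn.quantized_conv2d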
- ...
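Two formulas are worth having at hand for the dilated and depthwise separable variants. The plain-Python sketch below computes two things:

- the span covered by a dilated kernel and the resulting output size;
- the weight counts of a standard convolution versus a depthwise separable one.

It assumes a depth multiplier of 1 and ignores biases. All function names are hypothetical.

```python
def effective_kernel_size(k: int, dilation: int) -> int:
    """Span covered by a k-tap kernel whose taps are `dilation` apart."""
    return k + (k - 1) * (dilation - 1)

def conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Output length along one axis: floor((n + 2p - k_eff) / s) + 1."""
    return (n + 2 * padding - effective_kernel_size(k, dilation)) // stride + 1

def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k (one filter per input channel) + pointwise 1 x 1."""
    return c_in * k * k + c_in * c_out

print(effective_kernel_size(3, dilation=2))            # 5
print(conv_output_size(32, 3, padding=2, dilation=2))  # 32
print(standard_conv_params(64, 128, 3), separable_conv_params(64, 128, 3))  # 73728 8768
```

A 3x3 kernel with dilation 2 sees a 5x5 area. In this example, the separable version needs roughly 8x fewer weights.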
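Upsampling methods
Interpolation
There are many interpolation methods: step interpolation, linear interpolation, cubic spline interpolation, and so on.
I covered the numpy implementations in another post. To avoid repetition, interested readers can refer to 【Interpolation methods and their Python implementation】.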
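As a quick reminder of the difference, here is a small NumPy illustration of the first two methods on a 1D signal. Step interpolation is taken in its zero-order-hold form: each new point takes the value of the previous known sample. Linear interpolation uses `np.interp`. Cubic splines would need something like SciPy and are left out. The sample values are made up.

```python
import numpy as np

xp = np.array([0.0, 1.0, 2.0, 3.0])    # positions of the known samples
fp = np.array([0.0, 10.0, 20.0, 0.0])  # values of the known samples (made up)
x_new = np.linspace(0.0, 3.0, 7)       # a 2x denser grid: 0, 0.5, ..., 3

# Step interpolation (zero-order hold): repeat the previous known sample
step_idx = np.clip(np.floor(x_new).astype(int), 0, len(fp) - 1)
step = fp[step_idx]

# Linear interpolation between neighbouring samples
linear = np.interp(x_new, xp, fp)

print(step)
print(linear)
```

The step version yields 0, 0, 10, 10, 20, 20, 0, while the linear version fills in the midpoints: 0, 5, 10, 15, 20, 10, 0.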
PyTorch implementation
torch.nn.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None)
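Upsamples multi-channel 1D (temporal), 2D (spatial), or 3D (volumetric) data.
- 1D (sequence data): the input tensor is 3D: (batch_size, channels, width)
- 2D (image data): the input tensor is 4D: (batch_size, channels, height, width)
- 3D (volumetric data, e.g. voxelized point clouds): the input tensor is 5D: (batch_size, channels, depth, height, width)
Parameters
- size: the target output spatial size (an int, or a tuple for 1D, 2D, or 3D data)
- scale_factor: the multiplier applied to the spatial size
- mode: the upsampling algorithm, one of nearest (nearest-neighbor interpolation, the default), linear (linear interpolation, 3D input), bilinear (bilinear interpolation, 4D input), bicubic (bicubic interpolation, 4D input), or trilinear (trilinear interpolation, 5D input)
- align_corners: if True, the corner pixels of the input and output tensors are aligned, which preserves the values at those pixels. Only has an effect when mode is 'linear', 'bilinear', 'bicubic', or 'trilinear'. Default: False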
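`align_corners` changes how each output position is mapped back to the input. The NumPy sketch below does 1D linear upsampling under both settings. It assumes the mappings are the ones I understand PyTorch to use:

- `align_corners=True`: `src = dst·(in−1)/(out−1)`
- `align_corners=False`: `src = (dst+0.5)·in/out − 0.5`, clamped at 0

The helper name is hypothetical.

```python
import numpy as np

def upsample_linear_1d(x: np.ndarray, out_len: int, align_corners: bool) -> np.ndarray:
    """Linearly resample a 1D signal to out_len samples."""
    in_len = len(x)
    dst = np.arange(out_len, dtype=float)
    if align_corners:
        # First/last output samples land exactly on the first/last input samples
        scale = (in_len - 1) / (out_len - 1) if out_len > 1 else 0.0
        src = dst * scale
    else:
        # Samples are treated as centres of unit-width cells
        src = (dst + 0.5) * in_len / out_len - 0.5
        src = np.maximum(src, 0.0)  # clamp at the left border
    i0 = np.minimum(np.floor(src).astype(int), in_len - 1)
    i1 = np.minimum(i0 + 1, in_len - 1)
    w = src - i0  # weight of the right neighbour
    return (1.0 - w) * x[i0] + w * x[i1]

x = np.array([0.0, 10.0])
print(upsample_linear_1d(x, 4, align_corners=True))   # values 0, 10/3, 20/3, 10
print(upsample_linear_1d(x, 4, align_corners=False))  # values 0, 2.5, 7.5, 10
```

With `align_corners=True` the end values are interpolated over evenly. With `False`, the border outputs simply copy the edge samples.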
Shapes:
- Input: $(N, C, W_{in})$, $(N, C, H_{in}, W_{in})$ or $(N, C, D_{in}, H_{in}, W_{in})$
- Output: $(N, C, W_{out})$, $(N, C, H_{out}, W_{out})$ or $(N, C, D_{out}, H_{out}, W_{out})$, where
$D_{out}=\left\lfloor D_{in} \times \text{scale\_factor} \right\rfloor$
$H_{out}=\left\lfloor H_{in} \times \text{scale\_factor} \right\rfloor$
$W_{out}=\left\lfloor W_{in} \times \text{scale\_factor} \right\rfloor$
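The size formula can be checked with a nearest-neighbour sketch in NumPy. It assumes one common convention: output index d copies input index floor(d / scale_factor). For integer factors this reduces to repeating rows and columns. The helper name is hypothetical.

```python
import math
import numpy as np

def upsample_nearest_2d(x: np.ndarray, scale_factor: float) -> np.ndarray:
    """Nearest-neighbour upsampling of an (N, C, H, W) array.

    Output size is floor(in * scale_factor); output index d copies input
    index floor(d / scale_factor), clipped to the valid range.
    """
    n, c, h, w = x.shape
    h_out = math.floor(h * scale_factor)
    w_out = math.floor(w * scale_factor)
    rows = np.minimum((np.arange(h_out) / scale_factor).astype(int), h - 1)
    cols = np.minimum((np.arange(w_out) / scale_factor).astype(int), w - 1)
    return x[:, :, rows[:, None], cols[None, :]]

x = np.arange(4).reshape(1, 1, 2, 2)
print(upsample_nearest_2d(x, 2)[0, 0])
print(upsample_nearest_2d(np.zeros((1, 1, 3, 3)), 1.5).shape)  # (1, 1, 4, 4)
```

A non-integer factor such as 1.5 on a 3x3 input gives floor(4.5) = 4 rows and columns, as the formulas above predict.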
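Unpooling
Unpooling is commonly used in CNNs to denote the inverse of max pooling. The idea comes from "Visualizing and Understanding Convolutional Networks", published in 2013 by Matthew D. Zeiler and Rob Fergus of New York University. Since max pooling is not invertible, an approximation is used to recover the situation before the max pooling operation.
Put simply, unpooling remembers the position of the maximum item in each window during max pooling, puts the value back there, and sets all other positions to 0. For example, for a 3x3 matrix with a 2x2 max pooling window and stride 1: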
$$\left[\begin{array}{lll}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{array}\right]->(\text { maxpooling })\left[\begin{array}{ll}
5 & 6 \\
8 & 9
\end{array}\right]->(\text { unpooling })\left[\begin{array}{lll}
0 & 0 & 0 \\
0 & 5 & 6 \\
0 & 8 & 9
\end{array}\right]$$
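The example above can be reproduced with a small NumPy sketch. It pools while recording the flat index of each window's maximum, then scatters the pooled values back into a zero array. The helper names are hypothetical. With overlapping windows (stride 1 here), a position that wins several windows is simply written more than once with the same value.

```python
import numpy as np

def max_pool_with_argmax(x: np.ndarray, k: int, stride: int):
    """2D max pooling on a single-channel array; also returns the flat
    index (row * W + col) of each window maximum in the input."""
    h, w = x.shape
    h_out, w_out = (h - k) // stride + 1, (w - k) // stride + 1
    pooled = np.empty((h_out, w_out), dtype=x.dtype)
    idx = np.empty((h_out, w_out), dtype=int)
    for i in range(h_out):
        for j in range(w_out):
            window = x[i * stride:i * stride + k, j * stride:j * stride + k]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[r, c]
            idx[i, j] = (i * stride + r) * w + (j * stride + c)
    return pooled, idx

def max_unpool(pooled: np.ndarray, idx: np.ndarray, out_shape) -> np.ndarray:
    """Put each pooled value back at its recorded position; zeros elsewhere."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[idx.ravel()] = pooled.ravel()
    return out

x = np.arange(1, 10).reshape(3, 3)
pooled, idx = max_pool_with_argmax(x, k=2, stride=1)
print(pooled)
print(max_unpool(pooled, idx, x.shape))
```

Running it gives the pooled matrix [[5, 6], [8, 9]] and the same unpooled matrix as in the example.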
Method 1
import tensorflow as tf

def unpool_with_argmax(pooled, ind, ksize=[1, 2, 2, 1]):
    """https://github.com/sangeet259/tensorflow_unpooling
    Unpool a tensor produced by max_pool_with_argmax.
    Arguments:
        pooled: the max-pooled output tensor
        ind: argmax indices, the second output of max_pool_with_argmax
        ksize: should be the same ksize that was used for pooling
    Returns:
        unpooled: the tensor after unpooling
    Some points to keep in mind:
        1. With include_batch_in_index=False (the default in recent TF versions),
           a maximum at position [b, y, x, c] is stored as the flattened index
           (y * width + x) * channels + c, i.e. relative to its own batch element.
        2. Because of point 1, the batch index is paired with each flat index
           below so every value is scattered into the right sample.
    """
    # Get the static shape of the pooled tensor as a list
    input_shape = pooled.get_shape().as_list()
    # Output shape: spatial dims multiplied by the pooling window
    output_shape = (input_shape[0], input_shape[1] * ksize[1], input_shape[2] * ksize[2], input_shape[3])
    flat_size = input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3]
    # Reshape into one long vector for easier handling
    pooled_ = tf.reshape(pooled, [flat_size])
    # Batch index of every pooled element: shape (batch, 1, 1, 1), broadcast to ind's shape
    batch_range = tf.reshape(tf.range(output_shape[0], dtype=ind.dtype), shape=[input_shape[0], 1, 1, 1])
    b = tf.ones_like(ind) * batch_range
    b_ = tf.reshape(b, [flat_size, 1])
    ind_ = tf.reshape(ind, [flat_size, 1])
    # Each row is (batch index, flat index inside that sample)
    ind_ = tf.concat([b_, ind_], 1)
    # Scatter the pooled values into a zero tensor of shape (batch, H*W*C)
    target_shape = tf.constant([output_shape[0], output_shape[1] * output_shape[2] * output_shape[3]], dtype=ind.dtype)
    unpooled_ = tf.scatter_nd(ind_, pooled_, target_shape)
    # Reshape the vector to get the final result
    unpooled = tf.reshape(unpooled_, output_shape)
    return unpooled

original_tensor = tf.random.uniform([1, 4, 4, 3], maxval=100, dtype='float32', seed=2)
pooled_tensor, max_indices = tf.nn.max_pool_with_argmax(original_tensor, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
print(pooled_tensor.shape)  # (1, 2, 2, 3)
unpooled_tensor = unpool_with_argmax(pooled_tensor, max_indices)
print(unpooled_tensor.shape)  # (1, 4, 4, 3)
Method 2
import tensorflow as tf
from tensorflow.python.ops import gen_nn_ops

inputs = tf.random.normal([64, 32, 32, 4], mean=0, stddev=1)
# Max pooling
pool1 = tf.nn.max_pool(inputs, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
print(pool1.shape)  # (64, 16, 16, 4)
# Max unpooling: the gradient of max pooling routes each value back to the position of its maximum
grad = gen_nn_ops.max_pool_grad(inputs,  # tensor before pooling, i.e. the input of max pool
                                pool1,   # tensor after pooling, i.e. the output of max pool
                                pool1,   # tensor to unpool; any tensor with the same shape as pool1
                                ksize=[1, 2, 2, 1],
                                strides=[1, 2, 2, 1],
                                padding='SAME')
print(grad.shape)  # (64, 32, 32, 4)
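TensorFlow 2.4 provides a ready-made upsampling layer. Note that it repeats (or interpolates) values rather than putting them back at the recorded max positions, so it is not a true unpooling: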
tf.keras.layers.UpSampling2D(size=(2, 2), data_format=None, interpolation='nearest')
PyTorch version
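In PyTorch, unpooling is usually done by pairing `nn.MaxPool2d(..., return_indices=True)` with `nn.MaxUnpool2d`. The returned indices are flattened per (sample, channel) plane as `y * W + x`, whereas TensorFlow's argmax indices above also fold in the channel. The NumPy sketch below mimics that per-plane layout for the non-overlapping case (kernel = stride = k), so it runs without PyTorch. It is an illustration, not PyTorch's implementation, and the helper names are hypothetical.

```python
import numpy as np

def maxpool_return_indices(x: np.ndarray, k: int):
    """(N, C, H, W) -> pooled (N, C, H//k, W//k) and per-plane flat indices y*W + x."""
    n, c, h, w = x.shape
    h_out, w_out = h // k, w // k
    win = x[:, :, :h_out * k, :w_out * k].reshape(n, c, h_out, k, w_out, k)
    win = win.transpose(0, 1, 2, 4, 3, 5).reshape(n, c, h_out, w_out, k * k)
    arg = win.argmax(axis=-1)                                  # position inside each window
    pooled = np.take_along_axis(win, arg[..., None], axis=-1)[..., 0]
    rows = np.arange(h_out)[:, None] * k + arg // k            # absolute row of each max
    cols = np.arange(w_out)[None, :] * k + arg % k             # absolute column of each max
    return pooled, rows * w + cols

def max_unpool2d(pooled: np.ndarray, indices: np.ndarray, out_hw) -> np.ndarray:
    """Scatter pooled values to their flat per-plane indices; zeros elsewhere."""
    n, c = pooled.shape[:2]
    h, w = out_hw
    out = np.zeros((n, c, h * w), dtype=pooled.dtype)
    np.put_along_axis(out, indices.reshape(n, c, -1), pooled.reshape(n, c, -1), axis=-1)
    return out.reshape(n, c, h, w)

x = np.array([[[[1, 2, 0, 0],
                [3, 4, 0, 9],
                [0, 0, 5, 0],
                [0, 7, 0, 6]]]])
pooled, indices = maxpool_return_indices(x, 2)
print(pooled[0, 0], indices[0, 0])
print(max_unpool2d(pooled, indices, x.shape[2:])[0, 0])
```

Here the maxima 4, 9, 7, 6 sit at flat positions 5, 7, 13, 15, and unpooling puts them back there.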
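Transposed convolution
Transposed convolution is also called deconvolution. Unlike unpooling, upsampling with a transposed convolution is learnable, because its kernel weights are trained. It is commonly used to upsample the output of convolutional layers back to the resolution of the original image.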
- pytorch
nn.ConvTranspose1d(in_channels=N, out_channels=2N, kernel_size=2*S, stride=S, padding=S//2 + S%2, output_padding=S%2) (for S ≥ 2, this maps an input of length L to an output of length exactly L·S)
nn.ConvTranspose2d
- tensorflow
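The mechanics of a transposed convolution are easy to see in one dimension. The single-channel NumPy sketch below has each input sample add a scaled copy of the kernel, with copies placed `stride` apart. It also checks the ConvTranspose1d recipe above against the output-length formula from the PyTorch documentation. The helper names are hypothetical, and dilation is not modelled in `conv_transpose1d`.

```python
import numpy as np

def conv_transpose1d(x: np.ndarray, kernel: np.ndarray, stride: int = 1,
                     padding: int = 0, output_padding: int = 0) -> np.ndarray:
    """Single-channel 1D transposed convolution (no dilation).

    Every input sample adds a scaled copy of the kernel to the output, and
    consecutive copies start `stride` samples apart. `output_padding` extends
    the right end, then `padding` samples are trimmed from both ends.
    """
    k = len(kernel)
    full_len = (len(x) - 1) * stride + k
    out = np.zeros(full_len + output_padding, dtype=np.result_type(x, kernel))
    for i, v in enumerate(x):
        out[i * stride:i * stride + k] += v * kernel
    return out[padding:len(out) - padding]

def conv_transpose1d_len(l_in, k, stride=1, padding=0, output_padding=0, dilation=1):
    """Output length as given in the PyTorch ConvTranspose1d documentation."""
    return (l_in - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1

# Kernel [1, 1] with stride 2 just repeats each sample: [1, 2] -> [1, 1, 2, 2]
print(conv_transpose1d(np.array([1.0, 2.0]), np.array([1.0, 1.0]), stride=2))

# The recipe above: kernel 2S, stride S, padding S//2 + S%2, output_padding S%2
for S in (2, 3, 4):
    print(S, conv_transpose1d_len(10, 2 * S, S, S // 2 + S % 2, S % 2))  # 10 * S
```

With a learned kernel instead of `[1, 1]`, the same scatter-and-add yields a trainable upsampler. That is the difference from unpooling noted above.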
PixelShuffle
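The pixel shuffle procedure is shown in the figure above. It turns a low-resolution input image of size [H, W] into a high-resolution image of size [r*H, r*W] through a sub-pixel operation.
It does not produce the high-resolution image directly by interpolation. Instead, a convolution first produces a feature map with $r^2$ channels per output channel, at the same spatial size as the low-resolution input. Periodic shuffling then rearranges this feature map into the high-resolution image. Here $r$ is the upscaling factor, i.e. the magnification of the image.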
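The periodic shuffle itself is one reshape plus one transpose. The NumPy sketch below assumes channels-first input. To my understanding it matches the channel ordering of PyTorch's `nn.PixelShuffle`, but treat the hypothetical helper `pixel_shuffle` as an illustration rather than the library's code.

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange (N, C*r*r, H, W) into (N, C, H*r, W*r).

    Output pixel (h*r + i, w*r + j) of channel c is taken from input
    channel c*r*r + i*r + j at position (h, w).
    """
    n, c_rr, h, w = x.shape
    assert c_rr % (r * r) == 0
    c = c_rr // (r * r)
    x = x.reshape(n, c, r, r, h, w)    # split channels into (C, i, j)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # -> (N, C, H, i, W, j)
    return x.reshape(n, c, h * r, w * r)

x = np.arange(9 * 4 * 4, dtype=float).reshape(1, 9, 4, 4)
print(pixel_shuffle(x, 3).shape)  # (1, 1, 12, 12)
```

No values are created or interpolated; every output pixel is copied from exactly one input channel.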
2D sub-pixel upsampling
[batch, height, width, channels * r * r] --> [batch, height * r, width * r, channels]
TensorFlow implementation
import tensorflow as tf

def _phase_shift(I, r):
    # Phase shift: rearrange the r*r channels of each pixel into an r x r block
    bsize, a, b, c = I.get_shape().as_list()
    bsize = tf.shape(I)[0]  # Handle an undefined (None) batch dimension
    X = tf.reshape(I, (bsize, a, b, r, r))
    X = tf.transpose(X, (0, 1, 2, 4, 3))  # bsize, a, b, r, r
    X = tf.split(X, a, 1)  # a, [bsize, b, r, r]
    X = tf.concat([tf.squeeze(x, axis=1) for x in X], axis=2)  # bsize, b, a*r, r
    X = tf.split(X, b, 1)  # b, [bsize, a*r, r]
    X = tf.concat([tf.squeeze(x, axis=1) for x in X], axis=2)  # bsize, a*r, b*r
    return tf.reshape(X, (bsize, a * r, b * r, 1))

def PixelShuffle(X, r, color=False):
    if color:
        # Shuffle each of the 3 colour groups (r*r channels each) separately
        Xc = tf.split(X, 3, 3)
        X = tf.concat([_phase_shift(x, r) for x in Xc], axis=3)
    else:
        X = _phase_shift(X, r)
    return X

if __name__ == "__main__":
    X1 = tf.random.normal([2, 8, 8, 4], stddev=1.0)
    Y = PixelShuffle(X1, 2)
    print(Y.shape)  # (2, 16, 16, 1)
    X2 = tf.random.normal([2, 8, 8, 4 * 3], stddev=1.0)
    Y2 = PixelShuffle(X2, 2, color=True)
    print(Y2.shape)  # (2, 16, 16, 3)
PyTorch implementation
import torch
import torch.nn as nn

x = torch.randn(size=(1, 9, 4, 4))
ps = nn.PixelShuffle(3)
output = ps(x)
print(output.size())  # torch.Size([1, 1, 12, 12])
NumPy implementation
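import numpy as np

def PS(I, r):
    """Sub-pixel shuffle of an (H, W, C*r*r) array into (H*r, W*r, C),
    using the same channel ordering as the TensorFlow version above."""
    assert len(I.shape) == 3
    assert r > 0
    r = int(r)
    assert I.shape[2] % (r * r) == 0
    O = np.zeros((I.shape[0] * r, I.shape[1] * r, I.shape[2] // (r * r)), dtype=I.dtype)
    for x in range(O.shape[0]):
        for y in range(O.shape[1]):
            for c in range(O.shape[2]):
                a = x // r  # source row in the low-resolution input
                b = y // r  # source column in the low-resolution input
                # Output channel c uses input channels c*r*r ... c*r*r + r*r - 1
                d = c * r * r + r * (y % r) + (x % r)
                O[x, y, c] = I[a, b, d]
    return O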
1D sub-pixel upsampling
(batch_size, width, channels * r)-->(batch_size, width * r, channels)
TensorFlow implementation
import tensorflow as tf

def SubPixel1D(I, r):
    """1D sub-pixel upsampling layer.
    Input: (batch, width, channels * r); output: (batch, width * r, channels).
    """
    with tf.name_scope('subpixel'):
        X = tf.transpose(I, [2, 1, 0])  # (channels * r, w, b)
        X = tf.batch_to_space_nd(X, [r], [[0, 0]])  # (channels, w * r, b); TF 2.x: tf.batch_to_space(X, [r], [[0, 0]])
        X = tf.transpose(X, [2, 1, 0])  # (b, w * r, channels)
        return X

# Example
# ---------------------------------------------------
if __name__ == "__main__":
    inputs = tf.random.normal([64, 8192, 32], stddev=1.0)
    upsample_SubPixel1D = SubPixel1D(I=inputs, r=2)
    print(upsample_SubPixel1D.shape)  # (64, 16384, 16)
PyTorch implementation
import torch.nn as nn

class PixelShuffle1D(nn.Module):
    """
    1D pixel shuffler. https://arxiv.org/pdf/1609.05158.pdf
    Upscales sample length, downscales channel length
    "short" is input, "long" is output
    """
    def __init__(self, upscale_factor):
        super(PixelShuffle1D, self).__init__()
        self.upscale_factor = upscale_factor

    def forward(self, x):
        batch_size, channels, in_width = x.size()
        channels //= self.upscale_factor
        out_width = self.upscale_factor * in_width
        # (B, C*r, W) -> (B, C, r, W) -> (B, C, W, r) -> (B, C, W*r)
        x = x.contiguous().view([batch_size, channels, self.upscale_factor, in_width])
        x = x.permute(0, 1, 3, 2).contiguous()
        x = x.view(batch_size, channels, out_width)
        return x
Sub-pixel (fractionally strided) convolution can be viewed as a special case of transposed convolution.
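The claim can be checked numerically in its simplest case. Assume a single channel, no padding, and a kernel whose length equals the stride r. A transposed convolution then writes non-overlapping copies of the kernel, which is exactly a 1x1 convolution producing r channels followed by a 1D pixel shuffle. The helper names are hypothetical, and larger kernels need more bookkeeping than this sketch covers.

```python
import numpy as np

def transposed_conv_stride_r(x: np.ndarray, kernel: np.ndarray, r: int) -> np.ndarray:
    """1D transposed convolution, single channel, kernel length == stride == r."""
    assert len(kernel) == r
    out = np.zeros(len(x) * r)
    for i, v in enumerate(x):
        out[i * r:(i + 1) * r] += v * kernel  # copies do not overlap
    return out

def subpixel_conv(x: np.ndarray, kernel: np.ndarray, r: int) -> np.ndarray:
    """1x1 convolution to r channels, followed by a 1D pixel shuffle."""
    y = np.outer(kernel, x)   # (r, W): channel k is kernel[k] * x
    return y.T.reshape(-1)    # interleave channels: out[i*r + k] = y[k, i]

x = np.array([1.0, -2.0, 3.0])
kernel = np.array([0.5, 1.0, 2.0])
print(np.allclose(transposed_conv_stride_r(x, kernel, 3), subpixel_conv(x, kernel, 3)))  # True
```

Both paths produce the same interleaved output, which is why the sub-pixel layer can be read as a transposed convolution with a particular weight layout.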
Meta upscale module
It supports arbitrary upsampling sizes. It is not very well known yet; I'll fill this section in once it becomes more popular.
References
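Many of the APIs shared here are still TensorFlow 1.* ones. That is mainly because I used TensorFlow 1 when I first started learning deep learning; I have since switched to PyTorch. I took a look at TensorFlow today. Version 2 has been out for more than a year and version 1 has effectively been abandoned. Version 2 does fix the earlier problems, but I prefer to look forward, and now that I'm comfortable with PyTorch, going back to learn TensorFlow 2 feels like a chore. My impression is also that TensorFlow 2 is in decline: apart from Google itself, fewer and fewer open-source projects seem to use it. Keep going, TensorFlow; deep down I still like you, but PyTorch is just too convenient, and its open-source community has become very strong.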
【Docs】TensorFlow official documentation
【Docs】PyTorch official documentation
【Code】2D_subpixel
【Code】1D_pytorch-pixelshuffle1d
【Animation】Convolution animations