An Upgrade to Residual Networks: Deep Residual Shrinkage Networks (with Keras Code)
Essentially, the deep residual shrinkage network (DRSN) is a convolutional neural network: a variant of the deep residual network (ResNet). Its core idea is that, when a deep network learns features, removing redundant information matters a great deal, and soft thresholding is a highly flexible way to delete that redundant information.
1. Deep Residual Networks
Introductions to deep residual shrinkage networks usually start from the deep residual network itself. The figure below shows the basic module of a deep residual network: a few nonlinear layers (the residual path) plus a cross-layer identity shortcut. The identity shortcut is the core of ResNet and a key reason for its excellent performance.
2. Deep Residual Shrinkage Networks
A deep residual shrinkage network is a deep residual network whose residual path is "shrunk". The "shrinkage" here refers to soft thresholding.
Soft thresholding is a core step in many signal-denoising methods. It sets features that are close to zero (more precisely, whose absolute value is below some threshold τ, i.e. features inside the interval [-τ, τ]) to zero, and shrinks the remaining features, those farther from zero, toward zero as well.
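In formula form, soft thresholding maps x to sign(x) · max(|x| − τ, 0). A minimal NumPy sketch (my own illustration, not part of the original program):

import numpy as np

def soft_threshold(x, tau):
    # Features inside [-tau, tau] become zero; the rest shrink toward zero by tau
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])
print(soft_threshold(x, tau=1.0))  # -> [-1. -0.  0.  0.  0.5]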
If we also take into account the bias b of the preceding convolutional layer, the zeroed interval shifts: a pre-bias feature x is zeroed whenever |x + b| ≤ τ, i.e. whenever x lies in [-τ - b, τ - b]. For example, with b = 2 and τ = 1, pre-bias features in [-3, -1] are deleted. Since both τ and b are learned automatically, soft thresholding can in effect zero out features in an arbitrary interval. It is thus a flexible way of deleting features within a certain value range, or, viewed differently, a more flexible nonlinear mapping.
From another angle, the two convolutional layers, two batch-normalization layers, and two activation functions that precede the soft thresholding transform the features of redundant information toward values near zero, and the useful features toward values far from zero. A set of thresholds is then learned automatically, and soft thresholding deletes the redundant features while keeping the useful ones.
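To make the threshold-learning step concrete, here is a small NumPy sketch of the per-channel threshold computation. It is my own illustration mirroring the Keras code further below; the real block uses a two-layer fully connected head with batch normalization, which this sketch compresses into a single hypothetical layer (w, b). Each channel's threshold is the average of its absolute feature values scaled by a sigmoid output, so it is always positive and never exceeds that average magnitude:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_thresholds(features, w, b):
    # features: a (H, W, C) feature map from the residual path
    # w: (C, C) weights, b: (C,) bias of a hypothetical one-layer FC head
    abs_mean = np.abs(features).mean(axis=(0, 1))  # per-channel mean of |x|
    scales = sigmoid(abs_mean @ w + b)             # scaling coefficients in (0, 1)
    return abs_mean * scales                       # thresholds: 0 < tau < abs_mean

# Example: a random feature map with 4 channels
feats = np.random.randn(7, 7, 4)
w = np.random.randn(4, 4) * 0.1
b = np.zeros(4)
print(channel_thresholds(feats, w, b))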
Stacking a number of these basic modules yields the complete deep residual shrinkage network, as shown in the figure below:
3. Image Recognition and Keras Implementation
Although the deep residual shrinkage network was originally proposed for vibration-based fault diagnosis, it is in fact a general-purpose feature-learning method and should be useful in many other tasks (computer vision, speech, text) as well.
Below is an MNIST handwritten-digit recognition program based on the deep residual shrinkage network (the program is very simple and for reference only):
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Dec 28 23:24:05 2019
Implemented using TensorFlow 1.0.1 and Keras 2.2.1

M. Zhao, S. Zhong, X. Fu, et al., Deep Residual Shrinkage Networks for Fault
Diagnosis, IEEE Transactions on Industrial Informatics, 2019,
DOI: 10.1109/TII.2019.2943898

@author: super_9527
"""

from __future__ import print_function
import keras
import numpy as np
from keras.datasets import mnist
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, GlobalAveragePooling2D
from keras.optimizers import Adam
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model
from keras.layers.core import Lambda

K.set_learning_phase(1)

# Input image dimensions
img_rows, img_cols = 28, 28

# The data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# Noised data: scale to [0, 1] and add uniform noise
# (note: the noise shape assumes channels_last)
x_train = x_train.astype('float32') / 255. + 0.5*np.random.random([x_train.shape[0], img_rows, img_cols, 1])
x_test = x_test.astype('float32') / 255. + 0.5*np.random.random([x_test.shape[0], img_rows, img_cols, 1])
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

def abs_backend(inputs):
    return K.abs(inputs)

def expand_dim_backend(inputs):
    # (None, C) -> (None, 1, 1, C) so the scales broadcast over the feature map
    return K.expand_dims(K.expand_dims(inputs, 1), 1)

def sign_backend(inputs):
    return K.sign(inputs)

def pad_backend(inputs, in_channels, out_channels):
    # Zero-pad the channel (last) axis of a 4-D feature map.
    # K.spatial_3d_padding expects a 5-D tensor, so add and remove a dummy axis.
    pad_dim = (out_channels - in_channels) // 2
    inputs = K.expand_dims(inputs, -1)
    inputs = K.spatial_3d_padding(inputs, padding=((0, 0), (0, 0), (pad_dim, pad_dim)))
    return K.squeeze(inputs, -1)

# Residual shrinkage block
def residual_shrinkage_block(incoming, nb_blocks, out_channels,
                             downsample=False, downsample_strides=2):

    residual = incoming

    for i in range(nb_blocks):

        identity = residual
        # Recompute in_channels every iteration, since the first block
        # of a stage may have changed the channel count
        in_channels = residual.get_shape().as_list()[-1]

        if not downsample:
            downsample_strides = 1

        residual = BatchNormalization()(residual)
        residual = Activation('relu')(residual)
        residual = Conv2D(out_channels, 3,
                          strides=(downsample_strides, downsample_strides),
                          padding='same', kernel_initializer='he_normal',
                          kernel_regularizer=l2(1e-4))(residual)

        residual = BatchNormalization()(residual)
        residual = Activation('relu')(residual)
        residual = Conv2D(out_channels, 3, padding='same',
                          kernel_initializer='he_normal',
                          kernel_regularizer=l2(1e-4))(residual)

        # Calculate global means of the absolute feature map
        residual_abs = Lambda(abs_backend)(residual)
        abs_mean = GlobalAveragePooling2D()(residual_abs)

        # Calculate scaling coefficients
        scales = Dense(out_channels, activation=None,
                       kernel_initializer='he_normal',
                       kernel_regularizer=l2(1e-4))(abs_mean)
        scales = BatchNormalization()(scales)
        scales = Activation('relu')(scales)
        scales = Dense(out_channels, activation='sigmoid',
                       kernel_regularizer=l2(1e-4))(scales)
        scales = Lambda(expand_dim_backend)(scales)

        # Calculate thresholds
        thres = keras.layers.multiply([abs_mean, scales])

        # Soft thresholding: sign(x) * max(|x| - thres, 0)
        sub = keras.layers.subtract([residual_abs, thres])
        zeros = keras.layers.subtract([sub, sub])  # zeros tensor of the same shape
        n_sub = keras.layers.maximum([sub, zeros])
        residual = keras.layers.multiply([Lambda(sign_backend)(residual), n_sub])

        # Downsampling (it is important to use a pool size of (1, 1))
        if downsample_strides > 1:
            identity = AveragePooling2D(pool_size=(1, 1), strides=(2, 2))(identity)

        # Zero-padding to match channels (it is important to use zero padding
        # rather than a 1-by-1 convolution)
        if in_channels != out_channels:
            identity = Lambda(pad_backend,
                              arguments={'in_channels': in_channels,
                                         'out_channels': out_channels})(identity)

        residual = keras.layers.add([residual, identity])

    return residual

# Define and train a model
inputs = Input(shape=input_shape)
net = Conv2D(8, 3, padding='same', kernel_initializer='he_normal',
             kernel_regularizer=l2(1e-4))(inputs)
net = residual_shrinkage_block(net, 1, 8, downsample=True)
net = BatchNormalization()(net)
net = Activation('relu')(net)
net = GlobalAveragePooling2D()(net)
outputs = Dense(10, activation='softmax', kernel_initializer='he_normal',
                kernel_regularizer=l2(1e-4))(net)
model = Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer=Adam(),
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=5, verbose=1,
          validation_data=(x_test, y_test))

# Get results
K.set_learning_phase(0)
DRSN_train_score = model.evaluate(x_train, y_train, batch_size=100, verbose=0)
print('Train loss:', DRSN_train_score[0])
print('Train accuracy:', DRSN_train_score[1])
DRSN_test_score = model.evaluate(x_test, y_test, batch_size=100, verbose=0)
print('Test loss:', DRSN_test_score[0])
print('Test accuracy:', DRSN_test_score[1])
For comparison, the code for an ordinary deep residual network is as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Dec 28 23:19:03 2019
Implemented using TensorFlow 1.0 and Keras 2.2.1

K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition,
CVPR, 2016.

@author: me
"""

from __future__ import print_function
import numpy as np
import keras
from keras.datasets import mnist
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, GlobalAveragePooling2D
from keras.optimizers import Adam
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model
from keras.layers.core import Lambda

K.set_learning_phase(1)

# Input image dimensions
img_rows, img_cols = 28, 28

# The data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# Noised data: scale to [0, 1] and add uniform noise
# (note: the noise shape assumes channels_last)
x_train = x_train.astype('float32') / 255. + 0.5*np.random.random([x_train.shape[0], img_rows, img_cols, 1])
x_test = x_test.astype('float32') / 255. + 0.5*np.random.random([x_test.shape[0], img_rows, img_cols, 1])
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

def pad_backend(inputs, in_channels, out_channels):
    # Zero-pad the channel (last) axis of a 4-D feature map.
    # K.spatial_3d_padding expects a 5-D tensor, so add and remove a dummy axis.
    pad_dim = (out_channels - in_channels) // 2
    inputs = K.expand_dims(inputs, -1)
    inputs = K.spatial_3d_padding(inputs, padding=((0, 0), (0, 0), (pad_dim, pad_dim)))
    return K.squeeze(inputs, -1)

# Ordinary residual block
def residual_block(incoming, nb_blocks, out_channels,
                   downsample=False, downsample_strides=2):

    residual = incoming

    for i in range(nb_blocks):

        identity = residual
        # Recompute in_channels every iteration, since the first block
        # of a stage may have changed the channel count
        in_channels = residual.get_shape().as_list()[-1]

        if not downsample:
            downsample_strides = 1

        residual = BatchNormalization()(residual)
        residual = Activation('relu')(residual)
        residual = Conv2D(out_channels, 3,
                          strides=(downsample_strides, downsample_strides),
                          padding='same', kernel_initializer='he_normal',
                          kernel_regularizer=l2(1e-4))(residual)

        residual = BatchNormalization()(residual)
        residual = Activation('relu')(residual)
        residual = Conv2D(out_channels, 3, padding='same',
                          kernel_initializer='he_normal',
                          kernel_regularizer=l2(1e-4))(residual)

        # Downsampling (it is important to use a pool size of (1, 1))
        if downsample_strides > 1:
            identity = AveragePooling2D(pool_size=(1, 1), strides=(2, 2))(identity)

        # Zero-padding to match channels (it is important to use zero padding
        # rather than a 1-by-1 convolution)
        if in_channels != out_channels:
            identity = Lambda(pad_backend,
                              arguments={'in_channels': in_channels,
                                         'out_channels': out_channels})(identity)

        residual = keras.layers.add([residual, identity])

    return residual

# Define and train a model
inputs = Input(shape=input_shape)
net = Conv2D(8, 3, padding='same', kernel_initializer='he_normal',
             kernel_regularizer=l2(1e-4))(inputs)
net = residual_block(net, 1, 8, downsample=True)
net = BatchNormalization()(net)
net = Activation('relu')(net)
net = GlobalAveragePooling2D()(net)
outputs = Dense(10, activation='softmax', kernel_initializer='he_normal',
                kernel_regularizer=l2(1e-4))(net)
model = Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer=Adam(),
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=5, verbose=1,
          validation_data=(x_test, y_test))

# Get results
K.set_learning_phase(0)
resnet_train_score = model.evaluate(x_train, y_train, batch_size=100, verbose=0)
print('Train loss:', resnet_train_score[0])
print('Train accuracy:', resnet_train_score[1])
resnet_test_score = model.evaluate(x_test, y_test, batch_size=100, verbose=0)
print('Test loss:', resnet_test_score[0])
print('Test accuracy:', resnet_test_score[1])
Notes:
(1) A deep residual shrinkage network has a more complex structure than an ordinary deep residual network and may be harder to train.
(2) The program uses only one basic module; on more complex datasets, more modules can be stacked as appropriate (see the sketch after these notes).
(3) If you run into TypeError: softmax() got an unexpected keyword argument 'axis', open tensorflow_backend.py and, in the line return tf.nn.softmax(x, axis=axis), change the first axis to dim.
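Regarding note (2), here is a hedged sketch of a deeper variant that reuses residual_shrinkage_block from the script above. The stage widths and block counts are arbitrary choices for illustration, not a configuration from the paper:

inputs = Input(shape=input_shape)
net = Conv2D(8, 3, padding='same', kernel_initializer='he_normal',
             kernel_regularizer=l2(1e-4))(inputs)
net = residual_shrinkage_block(net, 1, 16, downsample=True)  # halve resolution, widen channels
net = residual_shrinkage_block(net, 2, 16)                   # two blocks at the same width
net = residual_shrinkage_block(net, 1, 32, downsample=True)
net = residual_shrinkage_block(net, 2, 32)
net = BatchNormalization()(net)
net = Activation('relu')(net)
net = GlobalAveragePooling2D()(net)
outputs = Dense(10, activation='softmax', kernel_initializer='he_normal',
                kernel_regularizer=l2(1e-4))(net)
model = Model(inputs=inputs, outputs=outputs)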
References:
M. Zhao, S. Zhong, X. Fu, et al., Deep residual shrinkage networks for fault diagnosis, IEEE Transactions on Industrial Informatics, 2019, DOI: 10.1109/TII.2019.2943898
https://ieeexplore.ieee.org/document/8850096