Classic Paper Reproduction | Image Super-Resolution Using Deep Convolutional Networks

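For this reproduction I chose SRCNN, the classic paper published in 2015 by the team of Prof. Xiaoou Tang and Kaiming He. Super-resolution is the task of reconstructing a high-resolution image from an observed low-resolution one, and it has important applications in surveillance equipment, satellite imagery, and medical imaging. Amid the wave of deep convolutional networks, this paper was the first to propose an end-to-end super-resolution algorithm based on a deep convolutional network.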


Reproduction code: http://aistudio.baidu.com/aistudio/#/projectdetail/24446

SRCNN pipeline

▲ SRCNN algorithm framework

Drawing on the relationship between deep learning and traditional sparse coding, SRCNN divides its 3-layer network into patch extraction and representation, non-linear mapping, and final reconstruction.

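The SRCNN pipeline works as follows:

1. First upscale the low-resolution image to the target size with bicubic interpolation (for example 2×, 3×, or 4×). The upscaled image is still referred to as the low-resolution image; it is the input in the figure.

2. Feed the low-resolution image into the three-layer convolutional network. For example, one experimental setting in the paper reconstructs only the Y channel of the YCbCr color space, with a network of the form (conv1+relu1) → (conv2+relu2) → conv3. First layer: kernel size 9×9 (f1×f1), 64 filters (n1), 64 output feature maps. Second layer: kernel size 1×1 (f2×f2), 32 filters (n2), 32 output feature maps. Third layer: kernel size 5×5 (f3×f3), 1 filter (n3), 1 output feature map, which is the final reconstructed high-resolution image. The third layer is linear: the paper applies no ReLU after it, and neither does the code in this article.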
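To make the 9-1-5 setting above concrete, the following minimal sketch counts the convolution weights of each layer. It assumes a single-channel (Y only) input; the helper name `count_weights` and the variable names are illustrative and do not come from the reproduction code.

```python
# (layer name, kernel size, input channels, number of filters),
# taken from the 9-1-5 setting described above (single-channel input assumed).
layers = [
    ("conv1", 9, 1, 64),
    ("conv2", 1, 64, 32),
    ("conv3", 5, 32, 1),
]


def count_weights(kernel, in_channels, filters):
    """Number of convolution weights in one layer, excluding biases."""
    return kernel * kernel * in_channels * filters


weights = {name: count_weights(k, c, n) for name, k, c, n in layers}
biases = {name: n for name, _, _, n in layers}  # one bias per filter

total_weights = sum(weights.values())
total_with_biases = total_weights + sum(biases.values())

print(weights)
print("weights:", total_weights, "weights + biases:", total_with_biases)
```

The network is very small by modern standards, which is one reason it trains in reasonable time even on a CPU.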


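Training

Training dataset

One experiment in the paper uses 91 natural images as the training set. Each training image is first downscaled to the low-resolution size with bicubic interpolation, then upscaled back to the target size, and finally cut into many 33 × 33 patches that serve as training inputs. The label for each input patch is the corresponding central 21 × 21 patch of the original image (this size follows from the convolution layer settings described below).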
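The patch cutting described above can be sketched with NumPy as follows. This is an illustration rather than the reproduction code itself: the function name `extract_patches` is hypothetical, the stride of 14 is the value used in this article's `FLAGS`, and a random array stands in for a real image (the "upscaled" input is simply a copy of the original, so the down/up-sampling step is skipped).

```python
import numpy as np


def extract_patches(lr_up, hr, image_size=33, label_size=21, stride=14):
    """Cut aligned (input, label) patch pairs from two same-sized 2-D images.

    lr_up: interpolated low-resolution image, hr: original image.
    Each label is the central label_size x label_size region of its input window.
    """
    border = (image_size - label_size) // 2  # 6 pixels on each side
    inputs, labels = [], []
    h, w = lr_up.shape
    for x in range(0, h - image_size + 1, stride):
        for y in range(0, w - image_size + 1, stride):
            inputs.append(lr_up[x:x + image_size, y:y + image_size])
            labels.append(hr[x + border:x + border + label_size,
                             y + border:y + border + label_size])
    return np.stack(inputs), np.stack(labels)


# Synthetic stand-in data (assumption: no real image is loaded here).
rng = np.random.default_rng(0)
hr = rng.random((61, 61))
lr_up = hr.copy()

patch_inputs, patch_labels = extract_patches(lr_up, hr)
print(patch_inputs.shape, patch_labels.shape)
```

With a 61 × 61 image and stride 14, the window starts at offsets 0, 14, and 28 along each axis, so nine patch pairs are produced.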

Loss function

The mean squared error (MSE) is used as the loss function of the convolutional network.
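Written out, the MSE loss takes the following form. The notation is introduced here for illustration and does not appear in this article: $Y_i$ is the $i$-th interpolated low-resolution patch, $X_i$ the matching high-resolution label, $F(\cdot;\Theta)$ the network with parameters $\Theta$, and $n$ the number of training samples.

```latex
L(\Theta) = \frac{1}{n} \sum_{i=1}^{n} \left\| F(Y_i; \Theta) - X_i \right\|^2
```

The code in this article averages the squared error over every pixel as well as every sample, which differs from this form only by a constant factor.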


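Convolution layer details

The first layer uses a 9 × 9 kernel, giving a feature map of size (33-9)/1+1=25. The second layer uses a 1 × 1 kernel, which leaves the size unchanged. The third layer uses a 5 × 5 kernel, giving a feature map of size (25-5)/1+1=21. The network output during training is therefore 21 × 21, which is why the central 21 × 21 patch is used as the label (no padding is applied to the convolutions during training).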
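The size arithmetic above can be checked with a few lines of Python. The helper `conv_output_size` is a hypothetical name used only for this sketch; it implements the standard output-size formula for a convolution with stride 1 and no padding.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial output size of an unpadded convolution."""
    return (size - kernel) // stride + 1


size = 33
sizes = []
for kernel in (9, 1, 5):  # kernel sizes of the three layers
    size = conv_output_size(size, kernel)
    sizes.append(size)

print(sizes)  # [25, 25, 21]
```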

# List the files in the personal persistent workspace (notebook shell command)
!ls /home/aistudio/work/
# coding=utf-8
import os
import paddle.fluid as fluid
import paddle.v2 as paddle
from PIL import Image
import numpy as np
import scipy.misc
import scipy.ndimage
import h5py
import glob

FLAGS={"epoch": 10,"batch_size": 128,"image_size": 33,"label_size": 21,
      "learning_rate": 1e-4,"c_dim": 1,"scale": 3,"stride": 14,
      "checkpoint_dir": "checkpoint","sample_dir": "sample","is_train": True}

#utils
def read_data(path):
    with h5py.File(path, 'r') as hf:
        data = np.array(hf.get('data'))
        label = np.array(hf.get('label'))
        return data, label

def preprocess(path, scale=3):

    image = imread(path, is_grayscale=True)
    label_ = modcrop(image, scale)

    label_ = label_ / 255.
    input_ = scipy.ndimage.interpolation.zoom(label_, zoom=(1. / scale), prefilter=False)  # first pass: downscale by 1/scale
    input_ = scipy.ndimage.interpolation.zoom(input_, zoom=(scale / 1.), prefilter=False)  # second pass: upscale back (cubic spline, zoom's default order=3)

    return input_, label_

def prepare_data(dataset):
    if FLAGS['is_train']:
        data_dir = os.path.join(os.getcwd(), dataset)
        data = glob.glob(os.path.join(data_dir, "*.bmp"))
    else:
        data_dir = os.path.join(os.sep, (os.path.join(os.getcwd(), dataset)), "Set5")
        data = glob.glob(os.path.join(data_dir, "*.bmp"))

    return data

def make_data(data, label):
    if not os.path.exists('data/checkpoint'):
        os.makedirs('data/checkpoint')
    if FLAGS['is_train']:
        savepath = os.path.join(os.getcwd(), 'data/checkpoint/train.h5')
    # else:
    #     savepath = os.path.join(os.getcwd(), 'data/checkpoint/test.h5')

    with h5py.File(savepath, 'w') as hf:
        hf.create_dataset('data', data=data)
        hf.create_dataset('label', data=label)

def imread(path, is_grayscale=True):
    if is_grayscale:
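        # np.float was removed in NumPy 1.24; np.float64 is the same type
        return scipy.misc.imread(path, flatten=True, mode='YCbCr').astype(np.float64)  # convert the image to grayscale
    else:
        return scipy.misc.imread(path, mode='YCbCr').astype(np.float64)  # keep all YCbCr channels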

def modcrop(image, scale=3):

    if len(image.shape) == 3:  # color image, e.g. 800*600*3
        h, w, _ = image.shape
        h = h - np.mod(h, scale)
        w = w - np.mod(w, scale)
        image = image[0:h, 0:w, :]
    else:  # grayscale image, e.g. 800*600
        h, w = image.shape
        h = h - np.mod(h, scale)
        w = w - np.mod(w, scale)
        image = image[0:h, 0:w]
    return image

def input_setup(config):
    if config['is_train']:
        data = prepare_data(dataset="data/data899/Train.zip_files/Train")
    else:
        data = prepare_data(dataset="Test")

    sub_input_sequence = []
    sub_label_sequence = []
    padding = abs(config['image_size'] - config['label_size']) // 2  # 6-pixel border on each side

    if config['is_train']:
        for i in range(len(data)):
            input_, label_ = preprocess(data[i], config['scale'])  # data[i] is the path of one image file

            if len(input_.shape) == 3:
                h, w, _ = input_.shape
            else:
                h, w = input_.shape
            for x in range(0, h - config['image_size'] + 1, config['stride']):
                for y in range(0, w - config['image_size'] + 1, config['stride']):
                    sub_input = input_[x:x + config['image_size'], y:y + config['image_size']]  # [33 x 33]
                    sub_label = label_[x + padding:x + padding + config['label_size'],
                                y + padding:y + padding + config['label_size']]  # [21 x 21]

                    # Add a channel axis (single channel)
                    sub_input = sub_input.reshape([config['image_size'], config['image_size'], 1])
                    sub_label = sub_label.reshape([config['label_size'], config['label_size'], 1])

                    sub_input_sequence.append(sub_input)
                    sub_label_sequence.append(sub_label)
        arrdata = np.asarray(sub_input_sequence)  # [?, 33, 33, 1]
        arrlabel = np.asarray(sub_label_sequence)  # [?, 21, 21, 1]

        make_data(arrdata, arrlabel)  # save the processed patches to data/checkpoint/train.h5
    else:
        input_, label_ = preprocess(data[4], config['scale'])

        if len(input_.shape) == 3:
            h, w, _ = input_.shape
        else:
            h, w = input_.shape
        input = input_.reshape([h, w, 1])

        label = label_[6:h - 6, 6:w - 6]
        label = label.reshape([h - 12, w - 12, 1])

        sub_input_sequence.append(input)
        sub_label_sequence.append(label)

        input1 = np.asarray(sub_input_sequence)
        label1 = np.asarray(sub_label_sequence)
        return input1, label1, h, w

def imsave(image, path):
    return scipy.misc.imsave(path, image)
# train
def reader_creator_image_and_label():
    input_setup(FLAGS)
    data_dir= os.path.join('./data/{}'.format(FLAGS['checkpoint_dir']), "train.h5")
    images,labels=read_data(data_dir)
    def reader():
        # Yield one (patch, label) pair at a time; paddle.batch groups them into batches
        for i in range(len(images)):
            yield images[i], labels[i]
    return reader
def train(use_cuda, num_passes,BATCH_SIZE = 128, model_save_dir='../models'):
    if FLAGS['is_train']:
      images = fluid.layers.data(name='images', shape=[1, FLAGS['image_size'], FLAGS['image_size']], dtype='float32')
      labels = fluid.layers.data(name='labels', shape=[1, FLAGS['label_size'], FLAGS['label_size']], dtype='float32')
    else:
      _,_,FLAGS['image_size'],FLAGS['label_size']=input_setup(FLAGS)
      images = fluid.layers.data(name='images', shape=[1, FLAGS['image_size'], FLAGS['label_size']], dtype='float32')
      labels = fluid.layers.data(name='labels', shape=[1, FLAGS['image_size']-12, FLAGS['label_size']-12], dtype='float32')

    #feed_order=['images','labels']
    # Build the network and get its prediction
    predict = model(images)
    # Per-pixel squared error between prediction and label
    cost = fluid.layers.square_error_cost(input=predict, label=labels)
    # Average the loss
    avg_cost = fluid.layers.mean(cost)
    # Define the optimizer
    optimizer = fluid.optimizer.Momentum(learning_rate=1e-4,momentum=0.9)
    opts =optimizer.minimize(avg_cost)

    # Run on GPU or CPU
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

    # Initialize the executor
    exe=fluid.Executor(place)
    exe.run(fluid.default_startup_program())
    # Build the training data reader
    train_reader = paddle.batch(
        reader_creator_image_and_label(), batch_size=BATCH_SIZE)
    # Build the test data reader
    # test_reader = paddle.batch(
    #     read_data(), batch_size=BATCH_SIZE)
    #print(len(next(train_reader())))
    feeder = fluid.DataFeeder(place=place, feed_list=[images, labels])
    for pass_id in range(num_passes):
        for batch_id, data in enumerate(train_reader()):
            avg_cost_value = exe.run(fluid.default_main_program(),
                                    feed=feeder.feed(data),
                                    fetch_list=[avg_cost])

            if batch_id%100 == 0:
                # exe.run returns a list of arrays; convert to str before concatenating
                print("loss=" + str(avg_cost_value[0]))

def model(images):
    conv1=fluid.layers.conv2d(input=images, num_filters=64, filter_size=9, act='relu')
    conv2=fluid.layers.conv2d(input=conv1, num_filters=32, filter_size=1,act='relu')
    conv3=fluid.layers.conv2d(input=conv2, num_filters=1, filter_size=5)
    return conv3

if __name__ == '__main__':
    # Start training
    train(use_cuda=False, num_passes=10)

Testing

Fully convolutional network

The network is fully convolutional, so at test time a complete image can be fed in directly.

Padding

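Training effectively reconstructs the image minus a border of (33-21)/2=6 pixels on each side. If the training configuration (no padding) is used as is, the output loses 6 pixels on every side (for example, a 512 × 512 interpolated input yields a 500 × 500 output).

Therefore, at test time every convolution layer is padded (the 1 × 1 layer needs no padding), so that the output has the same size as the interpolated input.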
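The effect of padding on the output size can be sketched as below. The article does not state the padding amounts, so this sketch assumes the usual "same" padding of (k-1)/2 pixels per side for an odd kernel size k; the function names are hypothetical.

```python
def padded_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution with symmetric zero padding."""
    return (size + 2 * padding - kernel) // stride + 1


def network_output_size(size, kernels, paddings):
    """Push an input size through a stack of convolution layers."""
    for kernel, padding in zip(kernels, paddings):
        size = padded_output_size(size, kernel, padding)
    return size


kernels = (9, 1, 5)
# Assumed "same" padding: (k - 1) // 2 pixels on each side.
same_paddings = [(k - 1) // 2 for k in kernels]

unpadded = network_output_size(512, kernels, [0, 0, 0])
padded = network_output_size(512, kernels, same_paddings)
print(same_paddings, unpadded, padded)
```

Under this assumption the 9 × 9 layer gets 4 pixels of padding, the 1 × 1 layer none, and the 5 × 5 layer 2, which together restore the 6 pixels lost on each side.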

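Reconstruction results

Objective metrics (PSNR and SSIM): SRCNN achieves better reconstruction quality than the traditional methods it is compared against.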
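For reference, PSNR is derived directly from the MSE between the reconstruction and the ground truth. The sketch below is a minimal NumPy version, not the evaluation code behind the reported results; it assumes pixel values scaled to [0, 1] (as in this article's preprocessing, which divides by 255), and the function name `psnr` is illustrative.

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the maximum pixel value."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


# Toy example: a constant error of 0.1 gives an MSE of 0.01, i.e. about 20 dB.
reference = np.zeros((8, 8))
estimate = np.full((8, 8), 0.1)
value = psnr(reference, estimate)
print(round(value, 2))
```

A higher PSNR means a lower MSE, so a network trained with the MSE loss is optimizing this metric directly.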


Subjective quality: SRCNN's reconstructions also look better than those of the traditional methods.
