Deep Fun (深度有趣) | 30 Fast Image Style Transfer

Posted by Zhang Honglun (張宏倫) on 2018-09-21

Original article: https://juejin.im/post/5ba4e029e51d450e9874eaf3

Introduction

Implementing Fast Neural Style Transfer with TensorFlow

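How It Works

In the image style transfer covered earlier in this series, we optimized the input image against a content image and a style image so that the content loss and the style loss became as small as possible.

As with DeepDream, the network parameters stay fixed and the input data is adjusted according to the loss function. Generating each image is therefore equivalent to training a model, which takes a long time.

Training a model is slow, but running inference with an already trained model is fast.

Fast image style transfer greatly shortens the time needed to generate one stylized image. Its model consists of a transformation network and a loss network.

The style image is fixed while the content image is a variable input, so the model quickly converts any image into the specified style.

Transformation network: its parameters are trained; it converts a content image into a stylized image
Loss network: computes the style loss between the stylized image and the style image, and the content loss between the stylized image and the original content image

After training, the stylized image produced by the transformation network resembles the input content image in content and the specified style image in style.

At inference time only the transformation network is used: feed in a content image and it returns the corresponding stylized image.

To support several style images, train one model per style.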

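Implementation

The code is adapted from the following two projects: github.com/lengstrom/f… and github.com/hzy46/fast-…

As before, imagenet-vgg-verydeep-19.mat is used to compute the content loss and the style loss.

Some images are needed as content inputs. There are no requirements on what they depict and no annotations are needed. Here we use the train2014 split of the MSCOCO dataset (cocodataset.org/#download), 82612 images in total.

Load the libraries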

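# -*- coding: utf-8 -*-

# Uses the TensorFlow 1.x API (tf.placeholder, tf.Session, tf.layers, tf.contrib)
import tensorflow as tf
import numpy as np
import cv2
from imageio import imread, imsave
import scipy.io
import os
import glob
from tqdm import tqdm
import matplotlib.pyplot as plt
# IPython/Jupyter magic; remove this line when running as a plain script
%matplotlib inline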

List the style images, 10 in total

# Sort so that style_index picks the same file on every run (glob order is not guaranteed)
style_images = sorted(glob.glob('styles/*.jpg'))
print(style_images)

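Load the content images, drop the grayscale ones, and crop and resize the rest to the specified size. No normalization is applied yet, so pixel values stay in the range 0 to 255.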

def resize_and_crop(image, image_size):
    # Center-crop along the longer side, then resize to image_size x image_size
    h = image.shape[0]
    w = image.shape[1]
    if h > w:
        image = image[h // 2 - w // 2: h // 2 + w // 2, :, :]
    else:
        image = image[:, w // 2 - h // 2: w // 2 + h // 2, :]
    image = cv2.resize(image, (image_size, image_size))
    return image

X_data = []
image_size = 256
paths = glob.glob('train2014/*.jpg')
for path in tqdm(paths):
    image = imread(path)
    # Grayscale images have no channel axis; skip them
    if len(image.shape) < 3:
        continue
    X_data.append(resize_and_crop(image, image_size))
X_data = np.array(X_data)
print(X_data.shape)
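The crop arithmetic in resize_and_crop can be checked in isolation. The following is a minimal NumPy-only sketch; the helper names center_crop_box and center_crop are made up for this illustration and are not part of the original code. It shows that an odd shorter side leaves the crop one pixel short of square, which the later cv2.resize call evens out.

```python
import numpy as np

def center_crop_box(h, w):
    """Return (top, bottom, left, right) of the centered crop taken before resizing."""
    if h > w:
        return h // 2 - w // 2, h // 2 + w // 2, 0, w
    return 0, h, w // 2 - h // 2, w // 2 + h // 2

def center_crop(image):
    # Hypothetical helper: applies the same slicing as resize_and_crop, without the resize
    top, bottom, left, right = center_crop_box(image.shape[0], image.shape[1])
    return image[top:bottom, left:right, :]

landscape = np.zeros((480, 640, 3), dtype=np.uint8)
portrait = np.zeros((641, 427, 3), dtype=np.uint8)
print(center_crop(landscape).shape)  # (480, 480, 3)
print(center_crop(portrait).shape)   # (426, 427, 3): one row short because 427 is odd
```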

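Load the VGG19 model and define a function that, for a given input, returns the outputs of each VGG19 layer. As in the GAN posts, variable_scope reuse makes the network reusable.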

vgg = scipy.io.loadmat('imagenet-vgg-verydeep-19.mat')
vgg_layers = vgg['layers']

def vgg_endpoints(inputs, reuse=None):
    with tf.variable_scope('endpoints', reuse=reuse):
        def _weights(layer, expected_layer_name):
            # Read the kernel and bias of one layer from the .mat structure
            W = vgg_layers[0][layer][0][0][2][0][0]
            b = vgg_layers[0][layer][0][0][2][0][1]
            layer_name = vgg_layers[0][layer][0][0][0][0]
            assert layer_name == expected_layer_name
            return W, b

        def _conv2d_relu(prev_layer, layer, layer_name):
            # The pretrained weights become constants, so the loss network is never trained
            W, b = _weights(layer, layer_name)
            W = tf.constant(W)
            b = tf.constant(np.reshape(b, (b.size)))
            return tf.nn.relu(tf.nn.conv2d(prev_layer, filter=W, strides=[1, 1, 1, 1], padding='SAME') + b)

        def _avgpool(prev_layer):
            # Average pooling is used here in place of VGG's max pooling
            return tf.nn.avg_pool(prev_layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        graph = {}
        graph['conv1_1']  = _conv2d_relu(inputs, 0, 'conv1_1')
        graph['conv1_2']  = _conv2d_relu(graph['conv1_1'], 2, 'conv1_2')
        graph['avgpool1'] = _avgpool(graph['conv1_2'])
        graph['conv2_1']  = _conv2d_relu(graph['avgpool1'], 5, 'conv2_1')
        graph['conv2_2']  = _conv2d_relu(graph['conv2_1'], 7, 'conv2_2')
        graph['avgpool2'] = _avgpool(graph['conv2_2'])
        graph['conv3_1']  = _conv2d_relu(graph['avgpool2'], 10, 'conv3_1')
        graph['conv3_2']  = _conv2d_relu(graph['conv3_1'], 12, 'conv3_2')
        graph['conv3_3']  = _conv2d_relu(graph['conv3_2'], 14, 'conv3_3')
        graph['conv3_4']  = _conv2d_relu(graph['conv3_3'], 16, 'conv3_4')
        graph['avgpool3'] = _avgpool(graph['conv3_4'])
        graph['conv4_1']  = _conv2d_relu(graph['avgpool3'], 19, 'conv4_1')
        graph['conv4_2']  = _conv2d_relu(graph['conv4_1'], 21, 'conv4_2')
        graph['conv4_3']  = _conv2d_relu(graph['conv4_2'], 23, 'conv4_3')
        graph['conv4_4']  = _conv2d_relu(graph['conv4_3'], 25, 'conv4_4')
        graph['avgpool4'] = _avgpool(graph['conv4_4'])
        graph['conv5_1']  = _conv2d_relu(graph['avgpool4'], 28, 'conv5_1')
        graph['conv5_2']  = _conv2d_relu(graph['conv5_1'], 30, 'conv5_2')
        graph['conv5_3']  = _conv2d_relu(graph['conv5_2'], 32, 'conv5_3')
        graph['conv5_4']  = _conv2d_relu(graph['conv5_3'], 34, 'conv5_4')
        graph['avgpool5'] = _avgpool(graph['conv5_4'])

        return graph
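The layer numbers passed to _conv2d_relu (0, 2, 5, 7, 10, ...) look arbitrary at first. A small sketch, assuming the .mat file stores the layers in the usual VGG-19 order (each convolution followed by a ReLU, each block closed by a pooling layer), reproduces them. The function name vgg19_conv_indices is hypothetical and only serves this illustration.

```python
def vgg19_conv_indices():
    """Map conv layer names to their position in a conv/relu/.../pool layer list."""
    convs_per_block = [2, 2, 4, 4, 4]  # VGG-19: five blocks of convolutions
    indices = {}
    position = 0
    for block, n_convs in enumerate(convs_per_block, start=1):
        for conv in range(1, n_convs + 1):
            indices['conv%d_%d' % (block, conv)] = position
            position += 2  # the conv layer itself plus its relu
        position += 1      # the pooling layer that closes the block
    return indices

indices = vgg19_conv_indices()
print(indices['conv1_2'], indices['conv3_1'], indices['conv5_4'])  # 2 10 34
```

The printed positions match the indices used in vgg_endpoints, which is consistent with the assumed layer order.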

Pick one style image, subtract the per-channel color means, get its outputs at each VGG19 layer, and compute the Gram matrices for the four style layers.

style_index = 1
X_style_data = resize_and_crop(imread(style_images[style_index]), image_size)
X_style_data = np.expand_dims(X_style_data, 0)
print(X_style_data.shape)

MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

X_style = tf.placeholder(dtype=tf.float32, shape=X_style_data.shape, name='X_style')
style_endpoints = vgg_endpoints(X_style - MEAN_VALUES)
STYLE_LAYERS = ['conv1_2', 'conv2_2', 'conv3_3', 'conv4_3']
style_features = {}

sess = tf.Session()
for layer_name in STYLE_LAYERS:
    features = sess.run(style_endpoints[layer_name], feed_dict={X_style: X_style_data})
    # Flatten to (H * W, C), then normalize the C x C Gram matrix by the number of activations
    features = np.reshape(features, (-1, features.shape[3]))
    gram = np.matmul(features.T, features) / features.size
    style_features[layer_name] = gram
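The Gram matrix step does not depend on TensorFlow and can be tried on random data. Below is a minimal NumPy sketch of the same computation; gram_matrix is a name chosen for this illustration, and the random array merely stands in for a real VGG19 layer output.

```python
import numpy as np

def gram_matrix(feature_map):
    """feature_map: (1, H, W, C) activations -> (C, C) Gram matrix, divided by H * W * C."""
    features = feature_map.reshape(-1, feature_map.shape[3])  # (H * W, C)
    return features.T @ features / features.size

rng = np.random.default_rng(0)
fake_activations = rng.random((1, 8, 8, 4)).astype(np.float32)  # stand-in for a VGG layer output
gram = gram_matrix(fake_activations)
print(gram.shape)                 # (4, 4)
print(np.allclose(gram, gram.T))  # True
```

Each entry (i, j) measures how strongly channels i and j fire together over all spatial positions, which is why the matrix is symmetric and keeps no information about where in the image a pattern occurs.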

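Define the transformation network, a typical convolution, residual, deconvolution structure. The content image also has the per-channel color means subtracted before it is fed in.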

batch_size = 4
X = tf.placeholder(dtype=tf.float32, shape=[None, None, None, 3], name='X')
k_initializer = tf.truncated_normal_initializer(0, 0.1)

def relu(x):
    return tf.nn.relu(x)

def conv2d(inputs, filters, kernel_size, strides):
    # Reflect-pad by half the kernel size, then apply a 'valid' convolution
    p = kernel_size // 2
    h0 = tf.pad(inputs, [[0, 0], [p, p], [p, p], [0, 0]], mode='reflect')
    return tf.layers.conv2d(inputs=h0, filters=filters, kernel_size=kernel_size, strides=strides, padding='valid', kernel_initializer=k_initializer)

def deconv2d(inputs, filters, kernel_size, strides):
    # Upsampling is done by nearest-neighbor resizing followed by a strided convolution,
    # not by a transposed convolution: resize by strides * 2, then convolve with `strides`
    shape = tf.shape(inputs)
    height, width = shape[1], shape[2]
    h0 = tf.image.resize_images(inputs, [height * strides * 2, width * strides * 2], tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    return conv2d(h0, filters, kernel_size, strides)
    
def instance_norm(inputs):
    return tf.contrib.layers.instance_norm(inputs)

def residual(inputs, filters, kernel_size):
    h0 = relu(conv2d(inputs, filters, kernel_size, 1))
    h0 = conv2d(h0, filters, kernel_size, 1)
    return tf.add(inputs, h0)

with tf.variable_scope('transformer', reuse=None):
    # Add a 10-pixel reflected border on each side; it is sliced off again at the end
    h0 = tf.pad(X - MEAN_VALUES, [[0, 0], [10, 10], [10, 10], [0, 0]], mode='reflect')
    h0 = relu(instance_norm(conv2d(h0, 32, 9, 1)))
    h0 = relu(instance_norm(conv2d(h0, 64, 3, 2)))
    h0 = relu(instance_norm(conv2d(h0, 128, 3, 2)))

    for i in range(5):
        h0 = residual(h0, 128, 3)

    h0 = relu(instance_norm(deconv2d(h0, 64, 3, 2)))
    h0 = relu(instance_norm(deconv2d(h0, 32, 3, 2)))
    h0 = tf.nn.tanh(instance_norm(conv2d(h0, 3, 9, 1)))
    # Map the tanh output from [-1, 1] to [0, 255]
    h0 = (h0 + 1) / 2 * 255.
    shape = tf.shape(h0)
    # Remove the border; the tensor is named 'g' so it can be fetched by name at inference time
    g = tf.slice(h0, [0, 10, 10, 0], [-1, shape[1] - 20, shape[2] - 20, -1], name='g')
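To see how the spatial size changes through the transformation network, the padding and stride arithmetic can be traced in plain Python. This is only a sketch of the size bookkeeping, assuming the standard output-size rule for a 'valid' convolution; conv_out, deconv_out and transformer_out are hypothetical helper names.

```python
def conv_out(size, kernel_size, strides):
    """Reflect-pad by kernel_size // 2 on each side, then apply a 'valid' convolution."""
    padded = size + 2 * (kernel_size // 2)
    return (padded - kernel_size) // strides + 1

def deconv_out(size, kernel_size, strides):
    """Nearest-neighbor resize by strides * 2, then a strided convolution."""
    return conv_out(size * strides * 2, kernel_size, strides)

def transformer_out(size, border=10):
    size += 2 * border             # initial reflect padding
    size = conv_out(size, 9, 1)
    size = conv_out(size, 3, 2)    # downsample
    size = conv_out(size, 3, 2)    # downsample
    # the five residual blocks use stride 1 and keep the size unchanged
    size = deconv_out(size, 3, 2)  # upsample
    size = deconv_out(size, 3, 2)  # upsample
    size = conv_out(size, 9, 1)
    return size - 2 * border       # final slice removes the border

print(transformer_out(256))  # 256
print(transformer_out(81))   # 84
```

Training images of side 256 come out at the same size. For sides where the padded size is not divisible by 4, the output can be a few pixels larger than the input, so the result may need cropping back to the input size.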

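Feed both the output of the transformation network (the stylized image) and the original content image into VGG19, take the outputs of the corresponding layer, and compute the content loss.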

CONTENT_LAYER = 'conv3_3'
content_endpoints = vgg_endpoints(X - MEAN_VALUES, True)
g_endpoints = vgg_endpoints(g - MEAN_VALUES, True)

def get_content_loss(endpoints_x, endpoints_y, layer_name):
    x = endpoints_x[layer_name]
    y = endpoints_y[layer_name]
    return 2 * tf.nn.l2_loss(x - y) / tf.to_float(tf.size(x))

content_loss = get_content_loss(content_endpoints, g_endpoints, CONTENT_LAYER)
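Since tf.nn.l2_loss(t) is defined as sum(t ** 2) / 2, the expression 2 * l2_loss(x - y) / size is simply the mean squared difference between the two feature maps. A minimal NumPy sketch with made-up numbers illustrates the equivalence; l2_loss and content_loss_np are stand-in names for this example.

```python
import numpy as np

def l2_loss(t):
    """Same definition as tf.nn.l2_loss: sum(t ** 2) / 2."""
    return np.sum(t ** 2) / 2

def content_loss_np(x, y):
    # Mirrors get_content_loss, with NumPy arrays in place of tensors
    return 2 * l2_loss(x - y) / x.size

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[1.0, 0.0], [3.0, 2.0]])
print(content_loss_np(x, y))   # 2.0
print(np.mean((x - y) ** 2))   # 2.0
```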

Compute the style loss from the outputs of the stylized image and the style image at the chosen style layers.

style_loss = []
for layer_name in STYLE_LAYERS:
    layer = g_endpoints[layer_name]
    shape = tf.shape(layer)
    bs, height, width, channel = shape[0], shape[1], shape[2], shape[3]
    
    features = tf.reshape(layer, (bs, height * width, channel))
    gram = tf.matmul(tf.transpose(features, (0, 2, 1)), features) / tf.to_float(height * width * channel)
    
    style_gram = style_features[layer_name]
    style_loss.append(2 * tf.nn.l2_loss(gram - style_gram) / tf.to_float(tf.size(layer)))

style_loss = tf.reduce_sum(style_loss)
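In the style loss, the Gram matrices of a whole batch, shaped (batch, C, C), are compared against a single precomputed style Gram matrix, shaped (C, C), through broadcasting. The NumPy sketch below mirrors that per-layer computation on random data; batch_gram and layer_style_loss are illustrative names, not functions from the original code.

```python
import numpy as np

def batch_gram(layer):
    """layer: (B, H, W, C) -> (B, C, C) Gram matrices, each divided by H * W * C."""
    bs, h, w, c = layer.shape
    features = layer.reshape(bs, h * w, c)
    return np.transpose(features, (0, 2, 1)) @ features / (h * w * c)

def layer_style_loss(layer, style_gram):
    # (B, C, C) - (C, C) broadcasts the style Gram matrix over the batch;
    # 2 * l2_loss(d) equals sum(d ** 2)
    diff = batch_gram(layer) - style_gram
    return np.sum(diff ** 2) / layer.size

rng = np.random.default_rng(0)
style_activations = rng.random((1, 6, 6, 4))   # stand-in for the style image's layer output
style_gram = batch_gram(style_activations)[0]

same = np.repeat(style_activations, 2, axis=0)  # a batch whose images match the style exactly
other = rng.random((2, 6, 6, 4))
print(layer_style_loss(same, style_gram))       # 0.0
print(layer_style_loss(other, style_gram) > 0)  # True
```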

Compute the total variation regularizer and combine everything into the total loss.

def get_total_variation_loss(inputs):
    h = inputs[:, :-1, :, :] - inputs[:, 1:, :, :]
    w = inputs[:, :, :-1, :] - inputs[:, :, 1:, :]
    return tf.nn.l2_loss(h) / tf.to_float(tf.size(h)) + tf.nn.l2_loss(w) / tf.to_float(tf.size(w)) 

total_variation_loss = get_total_variation_loss(g)

content_weight = 1
style_weight = 250
total_variation_weight = 0.01

loss = content_weight * content_loss + style_weight * style_loss + total_variation_weight * total_variation_loss
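The total variation term penalizes differences between neighboring pixels, so it is zero for a flat image and large for a noisy one. A minimal NumPy sketch of the same formula on two toy images; total_variation_loss_np is a name chosen for this illustration.

```python
import numpy as np

def total_variation_loss_np(images):
    """images: (B, H, W, C). Mean squared vertical plus horizontal neighbor difference, halved."""
    h = images[:, :-1, :, :] - images[:, 1:, :, :]
    w = images[:, :, :-1, :] - images[:, :, 1:, :]
    return np.sum(h ** 2) / 2 / h.size + np.sum(w ** 2) / 2 / w.size

flat = np.full((1, 4, 4, 3), 128.0)
# Checkerboard of 0 and 255: every pair of neighbors differs by 255
checker = (np.indices((4, 4)).sum(axis=0) % 2) * 255.0
checker = np.broadcast_to(checker[None, :, :, None], (1, 4, 4, 3))

print(total_variation_loss_np(flat))     # 0.0
print(total_variation_loss_np(checker))  # 65025.0
```

With pixel values in the 0 to 255 range the raw term can be numerically large, which is one reason a small weight such as total_variation_weight = 0.01 is paired with it.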

Define the optimizer, which lowers the total loss by adjusting only the parameters of the transformation network.

vars_t = [var for var in tf.trainable_variables() if var.name.startswith('transformer')]
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss, var_list=vars_t)

Train the model. After each epoch, run one test image through the network, and write some tensor values to an events file so they can be inspected in TensorBoard.

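style_name = style_images[style_index]
# Use the file name without directory or extension. str.rstrip('.jpg') would strip any
# trailing '.', 'j', 'p' or 'g' characters instead of the suffix, mangling some names
style_name = os.path.splitext(os.path.basename(style_name))[0]
OUTPUT_DIR = 'samples_%s' % style_name
if not os.path.exists(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)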

tf.summary.scalar('losses/content_loss', content_loss)
tf.summary.scalar('losses/style_loss', style_loss)
tf.summary.scalar('losses/total_variation_loss', total_variation_loss)
tf.summary.scalar('losses/loss', loss)
tf.summary.scalar('weighted_losses/weighted_content_loss', content_weight * content_loss)
tf.summary.scalar('weighted_losses/weighted_style_loss', style_weight * style_loss)
tf.summary.scalar('weighted_losses/weighted_total_variation_loss', total_variation_weight * total_variation_loss)
tf.summary.image('transformed', g)
tf.summary.image('origin', X)
summary = tf.summary.merge_all()
writer = tf.summary.FileWriter(OUTPUT_DIR)

sess.run(tf.global_variables_initializer())
losses = []
epochs = 2

X_sample = imread('sjtu.jpg')
h_sample = X_sample.shape[0]
w_sample = X_sample.shape[1]

for e in range(epochs):
    data_index = np.arange(X_data.shape[0])
    np.random.shuffle(data_index)
    X_data = X_data[data_index]
    
    for i in tqdm(range(X_data.shape[0] // batch_size)):
        X_batch = X_data[i * batch_size: i * batch_size + batch_size]
        ls_, _ = sess.run([loss, optimizer], feed_dict={X: X_batch})
        losses.append(ls_)
        
        if i > 0 and i % 20 == 0:
            writer.add_summary(sess.run(summary, feed_dict={X: X_batch}), e * X_data.shape[0] // batch_size + i)
            writer.flush()
        
    print('Epoch %d Loss %f' % (e, np.mean(losses)))
    losses = []

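    gen_img = sess.run(g, feed_dict={X: [X_sample]})[0]
    gen_img = np.clip(gen_img, 0, 255)
    # Place the original (left) and the stylized image (right) side by side, scaled to [0, 1]
    result = np.zeros((h_sample, w_sample * 2, 3))
    result[:, :w_sample, :] = X_sample / 255.
    # The network output can be a few pixels larger than the input, so crop it
    result[:, w_sample:, :] = gen_img[:h_sample, :w_sample, :] / 255.
    plt.axis('off')
    plt.imshow(result)
    plt.show()
    # Save as 8-bit pixels rather than relying on imageio's float conversion
    imsave(os.path.join(OUTPUT_DIR, 'sample_%d.jpg' % e), (result * 255).astype(np.uint8))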
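The loop above reshuffles the whole dataset array at the start of every epoch, which creates a full copy of it. As an alternative sketch (not the author's code), the shuffled indices can be applied one batch at a time; iterate_batches is a hypothetical helper, and like the original loop it drops the incomplete final batch.

```python
import numpy as np

def iterate_batches(data, batch_size, rng):
    """Yield shuffled mini-batches, dropping the incomplete final batch."""
    order = rng.permutation(len(data))
    for i in range(len(data) // batch_size):
        # Index only this batch instead of copying the whole shuffled array
        yield data[order[i * batch_size: (i + 1) * batch_size]]

images = np.zeros((10, 2, 2, 3), dtype=np.uint8)  # stand-in for X_data
rng = np.random.default_rng(0)
for batch in iterate_batches(images, 4, rng):
    print(batch.shape)  # prints (4, 2, 2, 3) twice; the last 2 images are skipped this epoch
```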

Save the model

saver = tf.train.Saver()
saver.save(sess, os.path.join(OUTPUT_DIR, 'fast_style_transfer'))

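The test image is again the photo of the Shanghai Jiao Tong University gate (sjtu.jpg) used in earlier posts.

Style transfer results

During training, TensorBoard can be used to monitor progress: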

tensorboard --logdir=samples_starry

On a single machine, the following code performs style transfer quickly. Even on a CPU it takes only about 10 seconds.

# -*- coding: utf-8 -*-

import tensorflow as tf
import numpy as np
from imageio import imread, imsave
import os
import time

def the_current_time():
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(int(time.time()))))

style = 'wave'
model = 'samples_%s' % style
content_image = 'sjtu.jpg'
result_image = 'sjtu_%s.jpg' % style
X_image = imread(content_image)

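sess = tf.Session()
# The graph is still empty at this point, so this initializer does nothing;
# the trained weights are loaded by saver.restore() below
sess.run(tf.global_variables_initializer())

saver = tf.train.import_meta_graph(os.path.join(model, 'fast_style_transfer.meta'))
saver.restore(sess, tf.train.latest_checkpoint(model))

graph = tf.get_default_graph()
# Look up the input placeholder and the output tensor by the names given during training
X = graph.get_tensor_by_name('X:0')
g = graph.get_tensor_by_name('transformer/g:0')

the_current_time()

gen_img = sess.run(g, feed_dict={X: [X_image]})[0]
# Clip to the valid range and save as 8-bit pixels
gen_img = np.clip(gen_img, 0, 255).astype(np.uint8)
imsave(result_image, gen_img)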

the_current_time()
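The script prints a timestamp before and after inference, with one-second resolution. To measure the elapsed time directly, a small context manager built on time.perf_counter is one option. This is a sketch using only the standard library; the Timer class is a hypothetical helper, and time.sleep stands in for the sess.run call.

```python
import time

class Timer:
    """Context manager that records elapsed wall-clock seconds in .elapsed."""
    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc_info):
        self.elapsed = time.perf_counter() - self.start
        return False  # do not swallow exceptions

with Timer() as t:
    time.sleep(0.02)  # stand-in for sess.run(g, feed_dict={X: [X_image]})
print('inference took %.3f s' % t.elapsed)
```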

For other style images, train the corresponding models in the same way.

References

Perceptual Losses for Real-Time Style Transfer and Super-Resolution: arxiv.org/abs/1603.08…
Fast Style Transfer in TensorFlow: github.com/lengstrom/f…
A Tensorflow Implementation for Fast Neural Style: github.com/hzy46/fast-…

Video Course

Deep Fun (Part 1)
