TensorFlow in Practice: Neural Style

Published 2017-06-21

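Neural Style is a fun deep learning application: given one image that supplies the content and another that supplies the style, a deep network produces a new image that combines that content with that style.

TensorFlow is one of the most popular open-source deep learning frameworks, released by Google. The author anishathalye implemented Neural Style in TensorFlow and open-sourced it on GitHub. This article walks through his code in detail. The code is in his neural-style repository on GitHub.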

Pretrained VGG-19 Model

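VGG took first place in localization and second place in classification at ILSVRC 2014. VGG-19 is one of the models in that family. Its authors publish pretrained weights, which are widely used as a feature extractor for raw images.

VGG-19 is a very deep neural network with 19 weight layers in total. Its basic structure is as follows:

The early part of the network alternates convolution blocks with max-pooling, each block containing several convolutional layers, and three fully connected layers follow at the end. Specifically, the first and second blocks contain 2 convolutional layers each, and the third, fourth, and fifth blocks contain 4 each, for 16 convolutional layers in all. Adding the 3 fully connected layers gives 19 layers, hence the name VGG-19. The network structure of VGG-19 is shown in the table below:

Neural Style relies only on the convolutional part of VGG-19. The layers it needs are listed below: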

VGG19_LAYERS = (
    'conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',

    'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',

    'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3',
    'relu3_3', 'conv3_4', 'relu3_4', 'pool3',

    'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3',
    'relu4_3', 'conv4_4', 'relu4_4', 'pool4',

    'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3',
    'relu5_3', 'conv5_4', 'relu5_4'
)
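
As a quick sanity check on the layer count described above, the following sketch counts the convolutional layers per block in the list. The variable names `conv_layers` and `per_block` are introduced here for illustration only and are not part of the original code.

```python
# Layer names as listed above (conv/relu/pool only; the fully connected layers are omitted).
VGG19_LAYERS = (
    'conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',
    'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',
    'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3',
    'relu3_3', 'conv3_4', 'relu3_4', 'pool3',
    'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3',
    'relu4_3', 'conv4_4', 'relu4_4', 'pool4',
    'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3',
    'relu5_3', 'conv5_4', 'relu5_4'
)

conv_layers = [name for name in VGG19_LAYERS if name.startswith('conv')]

# Group the conv layers by block number, e.g. 'conv3_2' belongs to block 3.
per_block = {}
for name in conv_layers:
    block = int(name[4])
    per_block[block] = per_block.get(block, 0) + 1

print(per_block)             # {1: 2, 2: 2, 3: 4, 4: 4, 5: 4}
print(len(conv_layers) + 3)  # 16 conv layers + 3 fully connected layers = 19
```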

The pretrained VGG-19 weight file can be downloaded from the MatConvNet download page. The file is in MATLAB format, and it can be read in Python with scipy.io.

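The file holds a lot of information. What we need are the kernels and bias of each layer. The kernels are at data['layers'][0][i][0][0][0][0][0], with shape [width, height, in_channels, out_channels], and the bias is at data['layers'][0][i][0][0][0][0][1], with shape [1, out_channels]. Every convolution in VGG-19 uses 3x3 filters, so width and height are both 3. Note that the layer index i counts layers at the finest granularity, covering conv, relu, pool, and fc operations alike: i=0 is a convolution, i=1 a relu, i=2 a convolution, i=3 a relu, i=4 a pool, i=5 a convolution, and so on, up to i=37, which is a fully connected layer.

VGG-19 pools with 2x2 max-pooling. Neural Style replaces it with average-pooling because the author found that the results are slightly better.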
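
The fine-grained indexing described above can be reproduced with a short sketch. The helper `vgg19_fine_grained_names` is hypothetical, and the names of the classifier layers (`fc6`, `relu6`, and so on) are an assumption about how the weight file labels them; only the positions matter here.

```python
def vgg19_fine_grained_names():
    """Build the per-operation layer list: conv, relu, pool, then the classifier."""
    convs_per_block = (2, 2, 4, 4, 4)
    names = []
    for block, n_convs in enumerate(convs_per_block, start=1):
        for k in range(1, n_convs + 1):
            names += ['conv%d_%d' % (block, k), 'relu%d_%d' % (block, k)]
        names.append('pool%d' % block)
    # Assumed names for the fully connected part of the network.
    names += ['fc6', 'relu6', 'fc7', 'relu7', 'fc8', 'prob']
    return names

names = vgg19_fine_grained_names()
index_of = {name: i for i, name in enumerate(names)}

print(index_of['conv1_1'], index_of['pool1'], index_of['conv2_1'], index_of['fc6'])
# 0 4 5 37
```

Because Neural Style stops at relu5_4, only indices 0 through 35 are ever read from the file.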

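VGG-19 requires one preprocessing step on the input image: subtract the per-channel RGB mean computed over the training set from every pixel. The mean can be obtained with np.mean(data['normalization'][0][0][0], axis=(0, 1)), and its value is [123.68, 116.779, 103.939].

Putting this together, the following code, vgg.py, loads the VGG-19 network for use in building the Neural Style model.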
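
A minimal NumPy sketch of this mean subtraction and its inverse, using the mean values quoted above. The 2x2 test image is made up for illustration.

```python
import numpy as np

# RGB mean quoted in the text.
mean_pixel = np.array([123.68, 116.779, 103.939])

# Hypothetical 2x2 RGB image with values in the 0..255 range.
image = np.full((2, 2, 3), 128.0)
image[0, 0] = [255.0, 0.0, 0.0]

# The mean has shape (3,), so it broadcasts over the last (channel) axis.
centered = image - mean_pixel
restored = centered + mean_pixel

print(np.allclose(restored, image))  # True
```

Subtracting and then adding the same mean recovers the original pixels, which is why the generated image can be mapped back to displayable values after optimization.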

import tensorflow as tf
import numpy as np
import scipy.io

def load_net(data_path):
    data = scipy.io.loadmat(data_path)
    mean = data['normalization'][0][0][0]
    mean_pixel = np.mean(mean, axis=(0,1))
    weights = data['layers'][0]
    return weights, mean_pixel

def net_preloaded(weights, input_image, pooling):
    net = {}
    current = input_image
    for i, name in enumerate(VGG19_LAYERS):
        kind = name[:4]
        if kind == 'conv':
            kernels, bias = weights[i][0][0][0][0]
            # matconvnet: weights are [width, height, in_channels, out_channels]
            # tensorflow: weights are [height, width, in_channels, out_channels]
            kernels = np.transpose(kernels, (1, 0, 2, 3))
            bias = bias.reshape(-1)
            current = _conv_layer(current, kernels, bias)
        elif kind == 'relu':
            current = tf.nn.relu(current)
        elif kind == 'pool':
            current = _pool_layer(current, pooling)
        net[name] = current
    return net
    
def _conv_layer(input, weights, bias):
    conv = tf.nn.conv2d(input, tf.constant(weights), strides=(1, 1, 1, 1),
            padding='SAME')
    return tf.nn.bias_add(conv, bias)

def _pool_layer(input, pooling):
    if pooling == 'avg':
        return tf.nn.avg_pool(input, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1),
            padding='SAME')
    else:
        return tf.nn.max_pool(input, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1),
            padding='SAME')

def preprocess(image, mean_pixel):
    return image - mean_pixel

def unprocess(image, mean_pixel):
    return image + mean_pixel
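
To get a feel for what `net` contains, the sketch below traces feature-map shapes through the five blocks. It assumes the standard VGG-19 channel counts (64, 128, 256, 512, 512) and relies on the fact that a 3x3 convolution with stride 1 and 'SAME' padding keeps height and width, while 2x2 pooling with stride 2 and 'SAME' padding yields ceil(n / 2). The helper `relu_shapes` is hypothetical and not part of vgg.py.

```python
import math

# Output channels of conv blocks 1..5 in VGG-19.
BLOCK_CHANNELS = (64, 128, 256, 512, 512)

def relu_shapes(height, width):
    """Return {layer_name: (height, width, channels)} for the first relu of each block."""
    shapes = {}
    for block, channels in enumerate(BLOCK_CHANNELS, start=1):
        # Convolutions inside a block preserve the spatial size.
        shapes['relu%d_1' % block] = (height, width, channels)
        # The pooling layer that ends the block halves each side, rounding up.
        height, width = math.ceil(height / 2), math.ceil(width / 2)
    return shapes

shapes = relu_shapes(224, 224)
print(shapes['relu1_1'])  # (224, 224, 64)
print(shapes['relu4_1'])  # (28, 28, 512)
print(shapes['relu5_1'])  # (14, 14, 512)
```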

Neural Style

The core idea of Neural Style is illustrated in the figure below:

Part 1: Content Reconstruction

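The basic approach is as follows: pass the content image p and a randomly generated image x through the VGG-19 convolutional network, take the feature maps output at selected layers, and require the difference between the two to be as small as possible. The loss between the two at layer l is defined as follows:

Here F_{ij}^l is the activation of the i-th filter at position j for the random image, and P_{ij}^l is the activation of the i-th filter at position j for the content image.

The feature maps of the content image are computed as follows: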
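
In the formulation of Gatys et al., this is a squared-error loss over all filters i and positions j:

$$
\mathcal{L}_{content}(p, x, l) = \frac{1}{2} \sum_{i,j} \left( F_{ij}^l - P_{ij}^l \right)^2
$$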

# Parameters
# network: path to the VGG-19 weight file
# content: the content image converted to an array
# pooling: the pooling method

CONTENT_LAYERS = ('relu4_2', 'relu5_2')  # the original paper uses only relu4_2
content_features = {}
shape = (1,) + content.shape  # input shape: [batch, height, width, channels], only one image, so batch=1.

# Load the pretrained VGG-19 weights and the RGB mean
vgg_weights, vgg_mean_pixel = vgg.load_net(network)


# Compute the feature maps of the content image
g = tf.Graph()
with g.as_default(), g.device('/cpu:0'), tf.Session() as sess:
    # Build the computation graph; image is the feed, and net holds the output of every VGG-19 layer
    image = tf.placeholder('float', shape=shape)
    net = vgg.net_preloaded(vgg_weights, image, pooling)
    # Preprocess the content image
    content_pre = np.array([vgg.preprocess(content, vgg_mean_pixel)])
    # Feed the preprocessed content_pre to the graph and collect the results
    for layer in CONTENT_LAYERS:
        content_features[layer] = net[layer].eval(feed_dict={image: content_pre})

The feature maps of the random image and the content loss are computed as follows:

# Parameters
# image: the randomly generated image
# pooling: the pooling method
# content_weight_blend: split between the two content layers; defaults to 1, which uses only the finer layer relu4_2. The more abstract layer relu5_2 gets 1 - content_weight_blend.
# content_weight: coefficient of the content loss

from functools import reduce  # reduce is not a builtin in Python 3

with tf.Graph().as_default():
    net = vgg.net_preloaded(vgg_weights, image, pooling)
    content_layers_weights = {}
    content_layers_weights['relu4_2'] = content_weight_blend
    content_layers_weights['relu5_2'] = 1.0 - content_weight_blend

    content_loss = 0
    content_losses = []
    for content_layer in CONTENT_LAYERS:
        content_losses.append(content_layers_weights[content_layer] * content_weight * (2 * tf.nn.l2_loss(net[content_layer] - content_features[content_layer]) / content_features[content_layer].size))
    # Sum the per-layer losses once, after the loop has collected all of them
    content_loss += reduce(tf.add, content_losses)
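
The per-layer term in the code is easier to read once you recall that tf.nn.l2_loss(t) computes sum(t ** 2) / 2. Doubling it and dividing by the number of elements therefore gives the mean squared difference between the two feature maps. The sketch below checks this with NumPy. The function `l2_loss` is a stand-in written for this example and the feature arrays are random placeholders.

```python
import numpy as np

def l2_loss(t):
    """NumPy stand-in for tf.nn.l2_loss: sum(t ** 2) / 2."""
    return np.sum(t ** 2) / 2.0

rng = np.random.default_rng(0)
generated = rng.normal(size=(1, 4, 4, 8))  # placeholder for net[content_layer]
target = rng.normal(size=(1, 4, 4, 8))     # placeholder for content_features[content_layer]

per_layer = 2 * l2_loss(generated - target) / target.size
mse = np.mean((generated - target) ** 2)

print(np.isclose(per_layer, mse))  # True
```

Normalizing by the element count keeps the loss on a comparable scale across layers and image sizes, which differs from the unnormalized 1/2 factor in the paper.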

Part 2: Style Reconstruction

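How to define style mathematically is one of the more interesting parts of Neural Style. Each convolutional filter can be viewed as extracting one kind of feature from the image. In this paper, style is reduced to the correlation between any two such features. The correlation is described with cosine similarity, which in turn is proportional to the dot product of the two features. Style is therefore defined mathematically as the dot product of filter i and filter j within a layer, denoted G_{ij}^l.

As with the loss in Content Reconstruction, the style image and the randomly generated noise image are passed through the same VGG-19 convolutional network, and the filters of selected layers are taken. For each layer, we compute the difference between the $G_{ij}^l$ of the two images.

The weighted sum over the layers is the final style loss:

The feature maps of the style images are computed as follows: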
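
In the notation of Gatys et al., the Gram matrix, the per-layer difference, and the total style loss can be written as below. The symbols A_{ij}^l (Gram matrix of the style image), N_l (number of filters in layer l), M_l (number of positions in each feature map), and w_l (layer weight) come from the paper and are not defined in the text above.

$$
G_{ij}^l = \sum_{k} F_{ik}^l F_{jk}^l
$$

$$
E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_{ij}^l - A_{ij}^l \right)^2
$$

$$
\mathcal{L}_{style} = \sum_{l} w_l E_l
$$

The code in this article normalizes slightly differently: it divides the Gram matrix by the feature-map size and the squared difference by the number of Gram entries.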

# Parameters
# styles: the set of style images; there may be more than one
# style_blend_weights: weights among the style images
# style_layers_weights: weights of the individual network layers

STYLE_LAYERS = ('relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1')
style_shapes = [(1,) + style.shape for style in styles]
style_features = [{} for _ in styles]


# Compute the feature maps of the style images
for i in range(len(styles)):
    g = tf.Graph()
    with g.as_default(), g.device('/cpu:0'), tf.Session() as sess:
        image = tf.placeholder('float', shape=style_shapes[i])
        net = vgg.net_preloaded(vgg_weights, image, pooling)
        style_pre = np.array([vgg.preprocess(styles[i], vgg_mean_pixel)])
        for layer in STYLE_LAYERS:
            features = net[layer].eval(feed_dict={image: style_pre})
            features = np.reshape(features, (-1, features.shape[3]))  # features.shape[3] is the number of filters
            gram = np.matmul(features.T, features) / features.size
            style_features[i][layer] = gram
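
The Gram matrix step can be tried in isolation with NumPy. The sketch below uses a small random array in place of real VGG-19 activations and checks one entry against the explicit dot product of two filter responses. All names are local to the example.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(1, 6, 5, 3))  # placeholder: [batch, height, width, filters]

# Flatten every spatial position into a row, one column per filter: shape (30, 3).
flat = features.reshape(-1, features.shape[3])
gram = flat.T @ flat / flat.size

# Entry (0, 2) is the dot product of filter 0 and filter 2 over all positions.
manual = np.sum(features[0, :, :, 0] * features[0, :, :, 2]) / features.size

print(gram.shape)                      # (3, 3)
print(np.isclose(gram[0, 2], manual))  # True
print(np.allclose(gram, gram.T))       # True
```

The result has one row and one column per filter regardless of the image size, which is what allows style images of any resolution to be compared with the generated image.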

The feature maps of the random image and the style loss are computed as follows:

from functools import reduce  # reduce is not a builtin in Python 3

# style loss
style_loss = 0
for i in range(len(styles)):
    style_losses = []
    for style_layer in STYLE_LAYERS:
        layer = net[style_layer]
        _, height, width, number = map(lambda d: d.value, layer.get_shape())
        size = height * width * number
        feats = tf.reshape(layer, (-1, number))
        gram = tf.matmul(tf.transpose(feats), feats) / size
        style_gram = style_features[i][style_layer]
        style_losses.append(style_layers_weights[style_layer] * 2 * tf.nn.l2_loss(gram - style_gram) / style_gram.size)
    # Accumulate once per style image, after all of its layers have been processed
    style_loss += style_weight * style_blend_weights[i] * reduce(tf.add, style_losses)

# tv_loss
# Note: The total variation (TV) loss encourages spatial smoothness in the generated image. It was not used by Gatys et al in their CVPR paper but it can sometimes improve the results; for more details and explanation see Mahendran and Vedaldi "Understanding Deep Image Representations by Inverting Them" CVPR 2015.
tv_loss = ...

loss = content_loss + style_loss + tv_loss
train_step = tf.train.AdamOptimizer(learning_rate, beta1, beta2, epsilon).minimize(loss)

Assembling the code above in order gives the second key file of the Neural Style TensorFlow code, stylize.py.
