Header image from: Experiments with style transfer
Finally got around to writing this one. Over the past two years, apps for artistic-style image processing have appeared one after another, such as the once wildly popular Prisma.
This article briefly introduces and implements the style-transfer algorithm; for more background, see the previously translated article (Image Stylization, AI Composition: Machine Learning and Art).
Note that real applications may need further tuning; for video, for example, frame-to-frame stability must be considered.
01 - Simple Linear Model | 02 - Convolutional Neural Network | 03 - PrettyTensor | 04 - Save & Restore
05 - Ensemble Learning | 06 - CIFAR-10 | 07 - Inception Model | 08 - Transfer Learning
09 - Video Data | 11 - Adversarial Examples | 12 - Adversarial Noise for MNIST | 13 - Visual Analysis
14 - DeepDream
by Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube
Chinese translation: thrillerist / GitHub
If you repost this article, please include a link to the original.
Introduction
In the previous Tutorial #14, we saw how to maximize the feature activations inside a neural network so as to amplify patterns in the input image. This is called DeepDream.
This tutorial uses a similar idea, but takes two input images: a content image and a style image. We then wish to create a mixed image which has the contours of the content image and the texture of the style image.
This tutorial builds on the previous ones. You should be roughly familiar with neural networks (see Tutorials #01 and #02), and familiarity with DeepDream in Tutorial #14 is also helpful.
Flowchart
This flowchart shows roughly the idea of the style-transfer algorithm, although the VGG-16 model we use has many more layers than shown here.
Two images are input to the neural network: a content image and a style image. We wish to create a mixed image which has the contours of the content image and the texture of the style image.
We do this by creating several loss functions that can be optimized.
The loss function for the content image tries to minimize the difference between the activations of the content image and the mixed image, at one or more layers of the network. This makes the contours of the mixed image resemble those of the content image.
The loss function for the style image is slightly more complicated, because it instead tries to minimize the difference between the Gram matrices of the style image and the mixed image, at one or more layers of the network. The Gram matrix measures which features are activated simultaneously in a given layer. Changing the mixed image so that it mimics the activation patterns of the style image causes the colours and textures to be transferred.
We use TensorFlow to automatically derive the gradients of these loss functions. The gradients are then used to update the mixed image. This procedure is repeated a number of times, until we are satisfied with the resulting image.
Some details of the style-transfer algorithm are not shown in this flowchart, e.g. regarding the calculation of the Gram matrices, the calculation and storing of intermediate values for efficiency, a loss function for denoising the mixed image, and the normalization of the loss functions so they are easier to scale relative to each other.
from IPython.display import Image, display
Image('images/15_style_transfer_flowchart.png')
Imports
%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import PIL.Image
This was developed using Python 3.5.2 (Anaconda) with TensorFlow version:
tf.__version__
'0.11.0rc0'
The VGG-16 Model
I spent two days trying to get the style-transfer algorithm to work with the Inception 5h model that we used for DeepDream in Tutorial #14, but I could not get images that looked good enough. This is a bit strange, because the images produced in Tutorial #14 looked quite good. But recall that we also used a few tricks there to achieve that quality, such as smoothing the gradient and recursively downscaling and processing the image.
The original paper used the VGG-19 convolutional neural network. For some reason, the pre-trained VGG-19 models available for TensorFlow were not stable enough for this tutorial. So we use the VGG-16 model instead, which someone else has made and which can easily be obtained and loaded in TensorFlow. For convenience, we have wrapped it in a class.
import vgg16
The VGG-16 model is downloaded from the internet. This is the default directory where the data files are stored. The directory is created if it does not exist.
# vgg16.data_dir = 'vgg16/'
Download the data for the VGG-16 model if it doesn't already exist in the directory.
WARNING: It is 550 MB!
vgg16.maybe_download()
Downloading VGG16 Model ...
Data has apparently already been downloaded and unpacked.
Helper Functions for Image Manipulation
This function loads an image and returns it as a numpy array of floating points. The image can be automatically resized so the largest of the height or width equals max_size.
def load_image(filename, max_size=None):
    image = PIL.Image.open(filename)

    if max_size is not None:
        # Calculate the appropriate rescale-factor for
        # ensuring a max height and width, while keeping
        # the proportion between them.
        factor = max_size / np.max(image.size)

        # Scale the image's height and width.
        size = np.array(image.size) * factor

        # The size is now floating-point because it was scaled.
        # But PIL requires the size to be integers.
        size = size.astype(int)

        # Resize the image.
        image = image.resize(size, PIL.Image.LANCZOS)

    # Convert to numpy floating-point array.
    return np.float32(image)
Save an image as a jpeg-file. The image is given as a numpy array with pixel-values between 0 and 255.
def save_image(image, filename):
    # Ensure the pixel-values are between 0 and 255.
    image = np.clip(image, 0.0, 255.0)

    # Convert to bytes.
    image = image.astype(np.uint8)

    # Write the image-file in jpeg-format.
    with open(filename, 'wb') as file:
        PIL.Image.fromarray(image).save(file, 'jpeg')
This function plots a large image. The image is given as a numpy array with pixel-values between 0 and 255.
def plot_image_big(image):
    # Ensure the pixel-values are between 0 and 255.
    image = np.clip(image, 0.0, 255.0)

    # Convert pixels to bytes.
    image = image.astype(np.uint8)

    # Convert to a PIL-image and display it.
    display(PIL.Image.fromarray(image))
This function plots the content image, the mixed image and the style image.
def plot_images(content_image, style_image, mixed_image):
    # Create figure with sub-plots.
    fig, axes = plt.subplots(1, 3, figsize=(10, 10))

    # Adjust vertical spacing.
    fig.subplots_adjust(hspace=0.1, wspace=0.1)

    # Use interpolation to smooth pixels?
    smooth = True

    # Interpolation type.
    if smooth:
        interpolation = 'sinc'
    else:
        interpolation = 'nearest'

    # Plot the content-image.
    # Note that the pixel-values are normalized to
    # the [0.0, 1.0] range by dividing with 255.
    ax = axes.flat[0]
    ax.imshow(content_image / 255.0, interpolation=interpolation)
    ax.set_xlabel("Content")

    # Plot the mixed-image.
    ax = axes.flat[1]
    ax.imshow(mixed_image / 255.0, interpolation=interpolation)
    ax.set_xlabel("Mixed")

    # Plot the style-image.
    ax = axes.flat[2]
    ax.imshow(style_image / 255.0, interpolation=interpolation)
    ax.set_xlabel("Style")

    # Remove ticks from all the plots.
    for ax in axes.flat:
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
Loss Functions
These helper functions create the loss functions that are used in optimization with TensorFlow.
This function creates a TensorFlow operation for calculating the Mean Squared Error between the two input tensors.
def mean_squared_error(a, b):
    return tf.reduce_mean(tf.square(a - b))
This function creates the loss function for the content image. It is the Mean Squared Error of the feature activations for the content image and the mixed image, at the given layers in the model. When this content loss is minimized, it means the mixed image has feature activations at the given layers that are very similar to those of the content image. Depending on which layers you select, this should transfer the contours of the content image to the mixed image.
def create_content_loss(session, model, content_image, layer_ids):
    """
    Create the loss-function for the content-image.

    Parameters:
    session: An open TensorFlow session for running the model's graph.
    model: The model, e.g. an instance of the VGG16-class.
    content_image: Numpy float array with the content-image.
    layer_ids: List of integer id's for the layers to use in the model.
    """

    # Create a feed-dict with the content-image.
    feed_dict = model.create_feed_dict(image=content_image)

    # Get references to the tensors for the given layers.
    layers = model.get_layer_tensors(layer_ids)

    # Calculate the output values of those layers when
    # feeding the content-image to the model.
    values = session.run(layers, feed_dict=feed_dict)

    # Set the model's graph as the default so we can add
    # computational nodes to it. It is not always clear
    # when this is necessary in TensorFlow, but if you
    # want to re-use this code then it may be necessary.
    with model.graph.as_default():
        # Initialize an empty list of loss-functions.
        layer_losses = []

        # For each layer and its corresponding values
        # for the content-image.
        for value, layer in zip(values, layers):
            # These are the values that are calculated
            # for this layer in the model when inputting
            # the content-image. Wrap it to ensure it
            # is a const - although this may be done
            # automatically by TensorFlow.
            value_const = tf.constant(value)

            # The loss-function for this layer is the
            # Mean Squared Error between the layer-values
            # when inputting the content- and mixed-images.
            # Note that the mixed-image is not calculated
            # yet, we are merely creating the operations
            # for calculating the MSE between those two.
            loss = mean_squared_error(layer, value_const)

            # Add the loss-function for this layer to the
            # list of loss-functions.
            layer_losses.append(loss)

        # The combined loss for all layers is just the average.
        # The loss-functions could be weighted differently for
        # each layer. You can try it and see what happens.
        total_loss = tf.reduce_mean(layer_losses)

    return total_loss
We will do something similar for the style layers, but now we want to measure which features in the style layers activate simultaneously for the style image, and then copy this activation pattern to the mixed image.
One way of doing this is to calculate a so-called Gram matrix for the tensor output of the style layer. The Gram matrix is essentially a matrix of dot products for the vectors of the feature activations of a style layer.
If an entry in the Gram matrix has a value close to zero, it means the two features in the given layer do not activate simultaneously for the style image. And vice versa, if an entry in the Gram matrix has a large value, it means the two features do activate simultaneously for the given style image. We will then try and create a mixed image that replicates this activation pattern of the style image.
This is the helper function for calculating the Gram matrix of a tensor output by a convolutional layer in the neural network. The actual loss function is created further below.
def gram_matrix(tensor):
    shape = tensor.get_shape()

    # Get the number of feature channels for the input tensor,
    # which is assumed to be from a convolutional layer with 4-dim.
    num_channels = int(shape[3])

    # Reshape the tensor so it is a 2-dim matrix. This essentially
    # flattens the contents of each feature-channel.
    matrix = tf.reshape(tensor, shape=[-1, num_channels])

    # Calculate the Gram-matrix as the matrix-product of
    # the 2-dim matrix with itself. This calculates the
    # dot-products of all combinations of the feature-channels.
    gram = tf.matmul(tf.transpose(matrix), matrix)

    return gram
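To make the intuition concrete, here is a minimal toy illustration (not part of the original notebook) of what the Gram matrix measures, sketched in plain numpy with two hypothetical feature channels:

import numpy as np

# Two feature-channels over four "pixels": each row is one pixel,
# each column is one channel. These two channels are never active
# at the same pixel.
features = np.array([[1.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.0],
                     [0.0, 1.0]])

# The same computation as gram_matrix() above, in numpy.
gram = features.T.dot(features)

# gram == [[2., 0.],
#          [0., 2.]]
# The zero off-diagonal entries show that the two channels never
# activate at the same pixel; large off-diagonal values would mean
# they often activate together, which is exactly the pattern the
# style-loss tries to copy to the mixed image.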
The next function creates the loss function for the style image. It is quite similar to create_content_loss() above, except that we calculate the Mean Squared Error for the Gram matrices instead of the raw tensor outputs of the layers.
def create_style_loss(session, model, style_image, layer_ids):
    """
    Create the loss-function for the style-image.

    Parameters:
    session: An open TensorFlow session for running the model's graph.
    model: The model, e.g. an instance of the VGG16-class.
    style_image: Numpy float array with the style-image.
    layer_ids: List of integer id's for the layers to use in the model.
    """

    # Create a feed-dict with the style-image.
    feed_dict = model.create_feed_dict(image=style_image)

    # Get references to the tensors for the given layers.
    layers = model.get_layer_tensors(layer_ids)

    # Set the model's graph as the default so we can add
    # computational nodes to it. It is not always clear
    # when this is necessary in TensorFlow, but if you
    # want to re-use this code then it may be necessary.
    with model.graph.as_default():
        # Construct the TensorFlow-operations for calculating
        # the Gram-matrices for each of the layers.
        gram_layers = [gram_matrix(layer) for layer in layers]

        # Calculate the values of those Gram-matrices when
        # feeding the style-image to the model.
        values = session.run(gram_layers, feed_dict=feed_dict)

        # Initialize an empty list of loss-functions.
        layer_losses = []

        # For each Gram-matrix layer and its corresponding values.
        for value, gram_layer in zip(values, gram_layers):
            # These are the Gram-matrix values that are calculated
            # for this layer in the model when inputting the
            # style-image. Wrap it to ensure it is a const,
            # although this may be done automatically by TensorFlow.
            value_const = tf.constant(value)

            # The loss-function for this layer is the
            # Mean Squared Error between the Gram-matrix values
            # for the content- and mixed-images.
            # Note that the mixed-image is not calculated
            # yet, we are merely creating the operations
            # for calculating the MSE between those two.
            loss = mean_squared_error(gram_layer, value_const)

            # Add the loss-function for this layer to the
            # list of loss-functions.
            layer_losses.append(loss)

        # The combined loss for all layers is just the average.
        # The loss-functions could be weighted differently for
        # each layer. You can try it and see what happens.
        total_loss = tf.reduce_mean(layer_losses)

    return total_loss
This creates the loss function for denoising the mixed image. The algorithm is called Total Variation Denoising, and essentially just shifts the image one pixel in the x- and y-axis, calculates the difference from the original image, takes the absolute value to ensure the difference is a positive number, and sums over all the pixels in the image. This creates a loss function that can be minimized so as to suppress some of the noise in the image.
def create_denoise_loss(model):
    loss = tf.reduce_sum(tf.abs(model.input[:,1:,:,:] - model.input[:,:-1,:,:])) + \
           tf.reduce_sum(tf.abs(model.input[:,:,1:,:] - model.input[:,:,:-1,:]))

    return loss
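As a quick sanity check, here is a hypothetical numpy version of the same calculation (not from the original notebook) on a tiny 1x2x2x1 "image":

import numpy as np

# A 2x2 single-channel "image" with a hard vertical edge.
img = np.array([[[[0.0], [10.0]],
                 [[0.0], [10.0]]]])  # shape (1, 2, 2, 1)

# Shift by one pixel along each spatial axis and sum the
# absolute differences, just like create_denoise_loss() above.
tv = np.sum(np.abs(img[:, 1:, :, :] - img[:, :-1, :, :])) + \
     np.sum(np.abs(img[:, :, 1:, :] - img[:, :, :-1, :]))

# tv == 20.0: the rows are constant, so the vertical shift contributes
# nothing, while the edge in each of the two rows contributes 10.
# Smoothing the image lowers this value, which is why minimizing it
# suppresses noise.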
The Style-Transfer Algorithm
This is the main optimization algorithm for style transfer. It is basically just gradient descent on the loss functions defined above.
This algorithm also uses normalization of the loss functions. This appears to be a novel idea not previously published. In each iteration of the optimization, the loss values are adjusted so that each of them equals one. This allows the user to set the loss weights for the chosen style and content layers independently. It also adapts the weighting during the optimization, to ensure the desired ratio between style, content and denoising is preserved.
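To see why this normalization helps, consider a toy calculation with assumed raw loss values (the numbers here are hypothetical, merely chosen at very different scales):

# Hypothetical raw loss-values, e.g. as observed at some iteration.
loss_content_val = 3.2e10
loss_style_val = 4.7e27

# Reciprocal adjustment values, as computed in the code below.
adj_content = 1.0 / (loss_content_val + 1e-10)
adj_style = 1.0 / (loss_style_val + 1e-10)

# Both adjusted losses are now ~1.0, so weight_content and weight_style
# directly express the desired ratio between content and style,
# independently of which layers were chosen.
print(adj_content * loss_content_val)  # 1.0
print(adj_style * loss_style_val)      # 1.0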
def style_transfer(content_image, style_image,
                   content_layer_ids, style_layer_ids,
                   weight_content=1.5, weight_style=10.0,
                   weight_denoise=0.3,
                   num_iterations=120, step_size=10.0):
    """
    Use gradient descent to find an image that minimizes the
    loss-functions of the content-layers and style-layers. This
    should result in a mixed-image that resembles the contours
    of the content-image, and resembles the colours and textures
    of the style-image.

    Parameters:
    content_image: Numpy 3-dim float-array with the content-image.
    style_image: Numpy 3-dim float-array with the style-image.
    content_layer_ids: List of integers identifying the content-layers.
    style_layer_ids: List of integers identifying the style-layers.
    weight_content: Weight for the content-loss-function.
    weight_style: Weight for the style-loss-function.
    weight_denoise: Weight for the denoising-loss-function.
    num_iterations: Number of optimization iterations to perform.
    step_size: Step-size for the gradient in each iteration.
    """

    # Create an instance of the VGG16-model. This is done
    # in each call of this function, because we will add
    # operations to the graph so it can grow very large
    # and run out of RAM if we keep using the same instance.
    model = vgg16.VGG16()

    # Create a TensorFlow-session.
    session = tf.InteractiveSession(graph=model.graph)

    # Print the names of the content-layers.
    print("Content layers:")
    print(model.get_layer_names(content_layer_ids))
    print()

    # Print the names of the style-layers.
    print("Style layers:")
    print(model.get_layer_names(style_layer_ids))
    print()

    # Create the loss-function for the content-layers and -image.
    loss_content = create_content_loss(session=session,
                                       model=model,
                                       content_image=content_image,
                                       layer_ids=content_layer_ids)

    # Create the loss-function for the style-layers and -image.
    loss_style = create_style_loss(session=session,
                                   model=model,
                                   style_image=style_image,
                                   layer_ids=style_layer_ids)

    # Create the loss-function for the denoising of the mixed-image.
    loss_denoise = create_denoise_loss(model)

    # Create TensorFlow variables for adjusting the values of
    # the loss-functions. This is explained below.
    adj_content = tf.Variable(1e-10, name='adj_content')
    adj_style = tf.Variable(1e-10, name='adj_style')
    adj_denoise = tf.Variable(1e-10, name='adj_denoise')

    # Initialize the adjustment values for the loss-functions.
    session.run([adj_content.initializer,
                 adj_style.initializer,
                 adj_denoise.initializer])

    # Create TensorFlow operations for updating the adjustment values.
    # These are basically just the reciprocal values of the
    # loss-functions, with a small value 1e-10 added to avoid the
    # possibility of division by zero.
    update_adj_content = adj_content.assign(1.0 / (loss_content + 1e-10))
    update_adj_style = adj_style.assign(1.0 / (loss_style + 1e-10))
    update_adj_denoise = adj_denoise.assign(1.0 / (loss_denoise + 1e-10))

    # This is the weighted loss-function that we will minimize
    # below in order to generate the mixed-image.
    # Because we multiply the loss-values with their reciprocal
    # adjustment values, we can use relative weights for the
    # loss-functions that are easier to select, as they are
    # independent of the exact choice of style- and content-layers.
    loss_combined = weight_content * adj_content * loss_content + \
                    weight_style * adj_style * loss_style + \
                    weight_denoise * adj_denoise * loss_denoise

    # Use TensorFlow to get the mathematical function for the
    # gradient of the combined loss-function with regard to
    # the input image.
    gradient = tf.gradients(loss_combined, model.input)

    # List of tensors that we will run in each optimization iteration.
    run_list = [gradient, update_adj_content, update_adj_style, \
                update_adj_denoise]

    # The mixed-image is initialized with random noise.
    # It is the same size as the content-image.
    mixed_image = np.random.rand(*content_image.shape) + 128

    for i in range(num_iterations):
        # Create a feed-dict with the mixed-image.
        feed_dict = model.create_feed_dict(image=mixed_image)

        # Use TensorFlow to calculate the value of the
        # gradient, as well as updating the adjustment values.
        grad, adj_content_val, adj_style_val, adj_denoise_val \
            = session.run(run_list, feed_dict=feed_dict)

        # Reduce the dimensionality of the gradient.
        grad = np.squeeze(grad)

        # Scale the step-size according to the gradient-values.
        step_size_scaled = step_size / (np.std(grad) + 1e-8)

        # Update the image by following the gradient.
        mixed_image -= grad * step_size_scaled

        # Ensure the image has valid pixel-values between 0 and 255.
        mixed_image = np.clip(mixed_image, 0.0, 255.0)

        # Print a little progress-indicator.
        print(". ", end="")

        # Display status once every 10 iterations, and the last.
        if (i % 10 == 0) or (i == num_iterations - 1):
            print()
            print("Iteration:", i)

            # Print adjustment weights for loss-functions.
            msg = "Weight Adj. for Content: {0:.2e}, Style: {1:.2e}, Denoise: {2:.2e}"
            print(msg.format(adj_content_val, adj_style_val, adj_denoise_val))

            # Plot the content-, style- and mixed-images.
            plot_images(content_image=content_image,
                        style_image=style_image,
                        mixed_image=mixed_image)

    print()
    print("Final image:")
    plot_image_big(mixed_image)

    # Close the TensorFlow session to release its resources.
    session.close()

    # Return the mixed-image.
    return mixed_image
Example
This example shows how to transfer the style of various images onto a portrait.
First we load the content image, which has the overall contours that we want in the mixed image.
content_filename = 'images/willy_wonka_old.jpg'
content_image = load_image(content_filename, max_size=None)
Then we load the style image, which has the colours and textures we want in the mixed image.
style_filename = 'images/style7.jpg'
style_image = load_image(style_filename, max_size=300)
Then we define a list of integers which identify the layers in the neural network that we want to use for matching the content image. These are indices into the layers of the network. For the VGG16 model, the 5th layer (index 4) seems to be the only one that works well as the content layer.
content_layer_ids = [4]
Then we define another list of integers for the style layers.
# The VGG16-model has 13 convolutional layers.
# This selects all those layers as the style-layers.
# This is somewhat slow to optimize.
style_layer_ids = list(range(13))
# You can also select a sub-set of the layers, e.g. like this:
# style_layer_ids = [1, 2, 3, 4]
Now perform the style transfer. This automatically creates the appropriate loss functions for the style and content images, and then performs a number of optimization iterations. This gradually generates a mixed image which has the overall contours of the content image, with the textures and colours resembling those of the style image.
This can be very slow on a CPU!
%%time
img = style_transfer(content_image=content_image,
                     style_image=style_image,
                     content_layer_ids=content_layer_ids,
                     style_layer_ids=style_layer_ids,
                     weight_content=1.5,
                     weight_style=10.0,
                     weight_denoise=0.3,
                     num_iterations=60,
                     step_size=10.0)
Content layers:
['conv3_1/conv3_1']

Style layers:
['conv1_1/conv1_1', 'conv1_2/conv1_2', 'conv2_1/conv2_1', 'conv2_2/conv2_2', 'conv3_1/conv3_1', 'conv3_2/conv3_2', 'conv3_3/conv3_3', 'conv4_1/conv4_1', 'conv4_2/conv4_2', 'conv4_3/conv4_3', 'conv5_1/conv5_1', 'conv5_2/conv5_2', 'conv5_3/conv5_3']

Iteration: 0
Weight Adj. for Content: 5.18e-11, Style: 2.14e-29, Denoise: 5.61e-06
. . . . . . . . . .
Iteration: 10
Weight Adj. for Content: 2.79e-11, Style: 4.13e-28, Denoise: 1.25e-07
. . . . . . . . . .
Iteration: 20
Weight Adj. for Content: 2.63e-11, Style: 1.09e-27, Denoise: 1.30e-07
. . . . . . . . . .
Iteration: 30
Weight Adj. for Content: 2.66e-11, Style: 1.27e-27, Denoise: 1.27e-07
. . . . . . . . . .
Iteration: 40
Weight Adj. for Content: 2.73e-11, Style: 1.16e-27, Denoise: 1.26e-07
. . . . . . . . . .
Iteration: 50
Weight Adj. for Content: 2.75e-11, Style: 1.12e-27, Denoise: 1.24e-07
. . . . . . . . .
Iteration: 59
Weight Adj. for Content: 1.85e-11, Style: 3.86e-28, Denoise: 1.01e-07

Final image:

CPU times: user 20min 1s, sys: 45.5 s, total: 20min 46s
Wall time: 3min 4s
Conclusion
This tutorial showed the basic idea of using a neural network to combine the content and style of two images. The results were unfortunately not as good as those of some commercial systems, such as DeepArt, which was developed by some of the pioneers of this technique. The reason is unclear. Perhaps we simply need more computational power, so we can perform more optimization iterations with smaller step sizes on higher-resolution images. Or perhaps we need a more sophisticated optimization method. The exercises below give some suggestions that may improve the quality, and you are encouraged to try them.
Exercises
These are a few suggested exercises that may help improve your skills with TensorFlow. Hands-on experience is important for learning how to use TensorFlow properly.
You may want to backup this Notebook before making any changes to it.
- Try using other images. Several style images are included with this tutorial, and you can also use your own.
- Try more optimization iterations (e.g. 1000-5000) and smaller step sizes (e.g. 1.0-3.0). Does it improve the quality?
- Change the weights for the style, content and denoising losses.
- Try starting the optimization from the content or style image, or perhaps an average of the two. You can also mix in some noise.
- Try changing the resolution of the style and content images. You can use the max_size argument of the load_image() function to resize the images. How does it affect the result?
- Try using other layers of the VGG-16 model.
- Modify the code so it saves the image every 10 optimization iterations.
- Use constant weights throughout the optimization. How does it affect the result?
- Use different weights for the style layers. Also try adjusting these weights automatically, like the other loss weights.
- Use TensorFlow's ADAM optimizer instead of basic gradient descent; a hedged sketch follows after this list.
- Use the L-BFGS optimizer. It is currently not implemented in TensorFlow. Can you use the optimizer implemented in SciPy with the style-transfer algorithm? Does it improve the result?
- Use another pre-trained network, e.g. the Inception 5h model we used in Tutorial #14, or the VGG-19 model that you can find on the internet.
- Explain to a friend how the program works.
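For the ADAM exercise above, here is a minimal hedged sketch of one possible approach. Since the optimization loop already receives the gradient as a numpy array, an ADAM-style update can be applied in plain numpy in place of the scaled gradient step; adam_update and its state dict are hypothetical names, not part of the original notebook:

import numpy as np

def adam_update(image, grad, state, t, step_size=10.0,
                beta1=0.9, beta2=0.999, eps=1e-8):
    # Update the biased first and second moment estimates.
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad**2

    # Bias-corrected moment estimates.
    m_hat = state['m'] / (1 - beta1**t)
    v_hat = state['v'] / (1 - beta2**t)

    # Take the ADAM step on the image.
    return image - step_size * m_hat / (np.sqrt(v_hat) + eps)

# In style_transfer() this would replace the step-size scaling and the
# gradient-update lines inside the loop, e.g.:
# state = {'m': np.zeros_like(mixed_image), 'v': np.zeros_like(mixed_image)}
# ...
# mixed_image = adam_update(mixed_image, grad, state, t=i + 1)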