Course notes

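The algorithm treats an image as content plus style; once these two are fixed, the image itself is determined. The core idea of generating the image is therefore to update it iteratively using two loss functions, a content loss and a style loss.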

Neural style transfer proceeds in three steps:

1. Build the content cost function $J_{content}(C, G)$.
2. Build the style cost function $J_{style}(S, G)$.
3. Combine the two with weights to get the total cost function $J(G) = \alpha J_{content}(C, G) + \beta J_{style}(S, G)$.

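A CNN is a neural network that processes an input image. It is usually built from convolutional, pooling, and fully connected layers, each of which performs numerical operations on the pixel values. The image enters the network as a matrix, and the output of every layer is still a matrix; rendering that matrix back as an image shows what the layer has done to the input.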

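In a neural network, the first few (shallow) layers generally detect low-level features of the image, such as edges and textures, while the later (deep) layers generally detect high-level features, such as specific object classes.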
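To make the idea of a shallow layer detecting edges concrete, here is a minimal sketch in NumPy. The helper `conv2d_valid`, the toy image, and the hand-written vertical-edge kernel are all assumptions made for illustration; a real CNN learns its filters rather than having them fixed by hand.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the responses."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the patch under it, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 4x6 image: dark left half (0), bright right half (1)
image = np.zeros((4, 6))
image[:, 3:] = 1.0

# Hand-written filter that responds to vertical edges
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

response = conv2d_valid(image, kernel)
print(response)
```

The response is nonzero only in the columns that straddle the dark-to-bright boundary, which is the sense in which such a filter "detects" a vertical edge.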

Content cost function

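We want the generated image G to have content similar to the input image C. But which layer's output should represent the content of an image? The assignment uses a layer in the middle of the network, neither too shallow nor too deep, which gives good results. (After completing this exercise, feel free to come back and try different layers to see how the results change.)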

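Take the pretrained VGG network, set the image C as its input, and run forward propagation. The output of layer $[l]$ is $a^{[l](C)}$, written $a^{(C)}$ here. Then generate a white-noise image G and repeat the same procedure to obtain $a^{(G)}$. This gives the content cost function: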

$$J_{content}(C,G) = \frac{1}{4 \times n_H \times n_W \times n_C} \sum_{\text{all entries}} (a^{(C)} - a^{(G)})^2 \tag{1}$$

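$n_H$, $n_W$ and $n_C$ are the height, width, and number of channels of the chosen layer's output. To make the computation easier, the output volume is unrolled into a matrix:
[Figure: unrolling the layer's output volume into a matrix]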
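Equation (1) can be checked by hand on a tiny input. The sketch below uses NumPy in place of the TensorFlow ops in the program further down, so it runs without a session; the function name `content_cost` and the toy activations are assumptions made for illustration.

```python
import numpy as np

def content_cost(a_C, a_G):
    """Content cost of equation (1) for activations of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    # Unroll each activation volume into a (n_H * n_W, n_C) matrix
    a_C_unrolled = a_C.reshape(n_H * n_W, n_C)
    a_G_unrolled = a_G.reshape(n_H * n_W, n_C)
    # Sum of squared differences over all entries, with the 1 / (4 n_H n_W n_C) factor
    return np.sum((a_C_unrolled - a_G_unrolled) ** 2) / (4 * n_H * n_W * n_C)

# Toy activations: every one of the 2 * 2 * 3 = 12 entries differs by exactly 1
a_C = np.zeros((1, 2, 2, 3))
a_G = np.ones((1, 2, 2, 3))

print(content_cost(a_C, a_G))  # 12 / (4 * 12) = 0.25
```

Because the cost is a sum over all entries, the unrolling does not change its value; it only puts the data in the matrix layout that the Gram matrix computation needs later.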

What you should remember:
- The content cost takes a hidden layer activation of the neural network, and measures how different $a^{(C)}$ and $a^{(G)}$ are.
- When we minimize the content cost later, this will help make sure $G$ has similar content as $C$.

Style cost function

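The content representation above uses the chosen layer's output directly. The style, by contrast, is represented here by a "Gram matrix", also called a correlation matrix.

To compute the Gram matrix, first unroll the activations into a matrix, then multiply that matrix by its own transpose.
[Figure: computing the Gram matrix from the unrolled activations]
In linear algebra, the Gram matrix of a set of vectors $(v_1, \dots, v_n)$ collects their pairwise inner products, which indicate how similar the vectors are. The entries of G are computed as: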

$$G_{ij} = v_i^T v_j = \text{np.dot}(v_i, v_j)$$
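Each diagonal entry is the inner product of a vector with itself. Each off-diagonal entry is the inner product of two different vectors, and its magnitude reflects how correlated they are: the larger the value, the stronger the correlation.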
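A small worked example may help here. The sketch below builds the Gram matrix of three hand-picked row vectors with NumPy; the matrix `A` is a made-up illustration, chosen so that the third row is twice the first and the second row is orthogonal to both.

```python
import numpy as np

def gram_matrix(A):
    """Gram matrix of the rows of A: entry (i, j) is the inner product of row i and row j."""
    return A @ A.T

A = np.array([[1.0, 0.0, 2.0],   # v_1
              [0.0, 1.0, 0.0],   # v_2, orthogonal to v_1 and v_3
              [2.0, 0.0, 4.0]])  # v_3 = 2 * v_1

G = gram_matrix(A)
print(G)
# [[ 5.  0. 10.]
#  [ 0.  1.  0.]
#  [10.  0. 20.]]
```

The zero entries show that $v_2$ shares nothing with the other two vectors, while the large value $G_{13} = 10$ reflects that $v_1$ and $v_3$ point in the same direction. The result is always symmetric, since $v_i^T v_j = v_j^T v_i$.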

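In a neural network, the vectors of the unrolled matrix are the outputs of the different filters in the same layer. A diagonal entry $G_{ii}$ of the Gram matrix therefore measures how strongly the feature detected by filter $i$ is present in the image. For example, if filter $i$ detects vertical structure, $G_{ii}$ measures how much vertical structure the image contains. An off-diagonal entry $G_{ij}$ measures how similar the responses of filters $i$ and $j$ are. In my view, the main contribution to the style function comes from the diagonal of the Gram matrix.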

With the Gram matrix in hand, the style cost for a layer is defined as follows:

$$J_{style}^{[l]}(S,G) = \frac{1}{4 \times {n_C}^2 \times (n_H \times n_W)^2} \sum_{i=1}^{n_C} \sum_{j=1}^{n_C} (G^{(S)}_{ij} - G^{(G)}_{ij})^2 \tag{2}$$
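Equation (2) can be traced on toy activations in the same way as the content cost. This is a NumPy sketch standing in for the TensorFlow version in the program below; `layer_style_cost` and the input values are assumptions made for illustration.

```python
import numpy as np

def layer_style_cost(a_S, a_G):
    """Style cost of equation (2) for activations of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    # Unroll to (n_H * n_W, n_C), then transpose so each row holds one filter's output
    A_S = a_S.reshape(n_H * n_W, n_C).T
    A_G = a_G.reshape(n_H * n_W, n_C).T
    # Gram matrices of shape (n_C, n_C)
    GS = A_S @ A_S.T
    GG = A_G @ A_G.T
    return np.sum((GS - GG) ** 2) / (4 * n_C ** 2 * (n_H * n_W) ** 2)

# Toy activations with n_H = n_W = n_C = 2
a_S = np.ones((1, 2, 2, 2))
a_G = np.zeros((1, 2, 2, 2))

# GS is a 2x2 matrix of 4s and GG is all zeros, so the sum is 4 * 16 = 64
# and the normaliser is 4 * 2**2 * 4**2 = 256
print(layer_style_cost(a_S, a_G))  # 64 / 256 = 0.25
```

Note that the cost compares Gram matrices rather than activations, so two images with the same filter statistics have zero style cost even if their pixels are arranged differently.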

What you should remember:
- The style of an image can be represented using the Gram matrix of a hidden layer’s activations. However, we get even better results combining this representation from multiple different layers. This is in contrast to the content representation, where usually using just a single hidden layer is sufficient.
- Minimizing the style cost will cause the image $G$ to follow the style of the image $S$.

Total cost function

Finally, the content cost and the style cost are combined as a weighted sum to give the total cost function:

$$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$$

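With the total cost defined, the parameters updated at each iteration are the pixels of the white-noise input image. It is as if the network looks at two pictures, extracts their features (the outputs of layer $[l]$), measures where the generated image differs (the total cost), and corrects it pixel by pixel until the desired result emerges. How exactly the pixels get updated is something I have yet to study; in the program below the update is left to TensorFlow's Adam optimizer minimizing $J$.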
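The update loop can be illustrated without TensorFlow under some strong simplifications. The sketch below is not the method used in the program that follows: it assumes the unrolled activations themselves are the quantities being optimized (no VGG network in between), it uses plain gradient descent instead of Adam, and the gradients are derived by hand. All names, sizes, and the learning rate are assumptions made for illustration.

```python
import numpy as np

def gram(A):
    return A @ A.T

def total_cost_and_grad(A_G, A_C, A_S, alpha=10.0, beta=40.0):
    """Total cost J and its gradient with respect to A_G.

    All inputs are unrolled matrices of shape (n_C, n_H * n_W).
    """
    n_C, n_HW = A_G.shape

    # Content term: sum of squared differences / (4 n_H n_W n_C)
    diff_c = A_G - A_C
    J_content = np.sum(diff_c ** 2) / (4 * n_HW * n_C)
    grad_content = diff_c / (2 * n_HW * n_C)

    # Style term: squared difference of Gram matrices / (4 n_C^2 (n_H n_W)^2)
    diff_s = gram(A_G) - gram(A_S)
    J_style = np.sum(diff_s ** 2) / (4 * n_C ** 2 * n_HW ** 2)
    # d/dA ||A A^T - M||^2 = 4 (A A^T - M) A for symmetric M; the 4 cancels the 1/4
    grad_style = diff_s @ A_G / (n_C ** 2 * n_HW ** 2)

    J = alpha * J_content + beta * J_style
    grad = alpha * grad_content + beta * grad_style
    return J, grad

rng = np.random.default_rng(0)
A_C = rng.normal(size=(3, 16))   # stand-in for the content activations
A_S = rng.normal(size=(3, 16))   # stand-in for the style activations
A_G = rng.normal(size=(3, 16))   # "white noise" starting point

initial_cost, _ = total_cost_and_grad(A_G, A_C, A_S)

learning_rate = 0.05
for step in range(200):
    J, grad = total_cost_and_grad(A_G, A_C, A_S)
    A_G = A_G - learning_rate * grad   # move the generated values downhill on J

final_cost, _ = total_cost_and_grad(A_G, A_C, A_S)
print("initial cost:", initial_cost)
print("final cost:  ", final_cost)
```

In the real setting the gradient is taken with respect to the input pixels and has to pass back through the VGG layers, which is why the program relies on TensorFlow's automatic differentiation rather than hand-written formulas.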

PyCharm version of the program

Training with TensorFlow

import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
from nst_utils import *
import numpy as np
import tensorflow as tf

import datetime


# GRADED FUNCTION: compute_content_cost
def compute_content_cost(a_C, a_G):
    """
    Computes the content cost

    Arguments:
    a_C -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image C
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image G

    Returns:
    J_content -- scalar that you compute using equation 1 above.
    """

    ### START CODE HERE ###
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()                # a_C and a_G share the same shape, so either could be used here

    # Reshape a_C and a_G (≈2 lines)
    a_C_unrolled = tf.reshape(a_C,[n_H * n_W, n_C])
    a_G_unrolled = tf.reshape(a_G,[n_H * n_W, n_C])

    # compute the cost with tensorflow (≈1 line)
    J_content = tf.reduce_sum(tf.square(tf.subtract(a_C_unrolled, a_G_unrolled))) / (4*n_H*n_W*n_C)
    ### END CODE HERE ###

    return J_content


# GRADED FUNCTION: gram_matrix
def gram_matrix(A):
    """
    Argument:
    A -- matrix of shape (n_C, n_H*n_W)

    Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
    """

    ### START CODE HERE ### (≈1 line)
    GA = tf.matmul(A, A, transpose_a=False, transpose_b=True)       # matrix product A * A^T; the transpose_* flags say whether each operand is transposed first
    ### END CODE HERE ###

    return GA


# GRADED FUNCTION: compute_layer_style_cost
def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G

    Returns:
    J_style_layer -- tensor representing a scalar value, style cost defined above by equation (2)
    """

    ### START CODE HERE ###
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Reshape the images to have them of shape (n_H*n_W, n_C) (≈2 lines)
    a_S = tf.reshape(a_S, [n_W*n_H, n_C])
    a_G = tf.reshape(a_G, [n_W*n_H, n_C])

    # Computing gram_matrices for both images S and G (≈2 lines)
    GS = gram_matrix(tf.transpose(a_S))
    GG = gram_matrix(tf.transpose(a_G))
    # GS = gram_matrix(a_S)
    # GG = gram_matrix(a_G)

    # Computing the loss (≈1 line)
    J_style_layer = tf.reduce_sum(tf.square(tf.subtract(GS, GG))) / (4*tf.to_float(tf.square(n_C*n_H*n_W)))

    ### END CODE HERE ###

    return J_style_layer


def compute_style_cost(model, STYLE_LAYERS):
    """
    Computes the overall style cost from several chosen layers

    Arguments:
    model -- our tensorflow model
    STYLE_LAYERS -- A python list containing:
                        - the names of the layers we would like to extract style from
                        - a coefficient for each of them

    Returns:
    J_style -- tensor representing a scalar value, style cost defined above by equation (2)
    """

    # initialize the overall style cost
    J_style = 0

    for layer_name, coeff in STYLE_LAYERS:
        # Select the output tensor of the currently selected layer
        out = model[layer_name]

        # Set a_S to be the hidden layer activation from the layer we have selected, by running the session on out
        a_S = sess.run(out)

        # Set a_G to be the hidden layer activation from same layer. Here, a_G references model[layer_name]
        # and isn't evaluated yet. Later in the code, we'll assign the image G as the model input, so that
        # when we run the session, this will be the activations drawn from the appropriate layer, with G as input.
        a_G = out

        # Compute style_cost for the current layer
        J_style_layer = compute_layer_style_cost(a_S, a_G)

        # Add coeff * J_style_layer of this layer to overall style cost
        J_style += coeff * J_style_layer

    return J_style


# GRADED FUNCTION: total_cost
def total_cost(J_content, J_style, alpha=10, beta=40):
    """
    Computes the total cost function

    Arguments:
    J_content -- content cost coded above
    J_style -- style cost coded above
    alpha -- hyperparameter weighting the importance of the content cost
    beta -- hyperparameter weighting the importance of the style cost

    Returns:
    J -- total cost as defined by the formula above.
    """

    ### START CODE HERE ### (≈1 line)
    J = alpha * J_content + beta * J_style
    ### END CODE HERE ###

    return J


def model_nn(sess, input_image, num_iterations=200):
    # Initialize global variables (you need to run the session on the initializer)
    ### START CODE HERE ### (1 line)
    sess.run(tf.global_variables_initializer())
    ### END CODE HERE ###

    # Run the noisy input image (initial generated image) through the model. Use assign().
    ### START CODE HERE ### (1 line)
    sess.run(model['input'].assign(input_image))
    ### END CODE HERE ###

    for i in range(num_iterations):

        # Run the session on the train_step to minimize the total cost
        ### START CODE HERE ### (1 line)
        sess.run(train_step)
        ### END CODE HERE ###

        # Compute the generated image by running the session on the current model['input']
        ### START CODE HERE ### (1 line)
        generated_image = sess.run(model['input'])
        ### END CODE HERE ###

        # Print every 20 iteration.
        if i % 20 == 0:
            Jt, Jc, Js = sess.run([J, J_content, J_style])
            print("Iteration " + str(i) + " :")
            print("total cost = " + str(Jt))
            print("content cost = " + str(Jc))
            print("style cost = " + str(Js))

            # save current generated image in the "out1/3/" directory
            save_image("out1/3/" + str(i) + ".png", generated_image)

    # save last generated image
    save_image('out1/3/generated_image.jpg', generated_image)

    return generated_image




if __name__ == '__main__':

    starttime = datetime.datetime.now()

    ###############################################
    # Reset the graph
    tf.reset_default_graph()

    # Start interactive session
    sess = tf.InteractiveSession()
    content_image = scipy.misc.imread("input/y.jpg")

    content_image = reshape_and_normalize_image(content_image)

    style_image = scipy.misc.imread("images/sky.jpg")
    style_image = reshape_and_normalize_image(style_image)

    generated_image = generate_noise_image(content_image)
    plt.imshow(generated_image[0])
    plt.show()

    model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")

    STYLE_LAYERS = [                                     # layers whose activations feed the style cost, each with its weight
        ('conv1_1', 0.2),
        ('conv2_1', 0.2),
        ('conv3_1', 0.2),
        ('conv4_1', 0.2),
        ('conv5_1', 0.2)]

    # Assign the content image to be the input of the VGG model.
    sess.run(model['input'].assign(content_image))

    # Select the output tensor of layer conv4_2
    out = model['conv4_2']

    # Set a_C to be the hidden layer activation from the layer we have selected
    a_C = sess.run(out)

    # Set a_G to be the hidden layer activation from same layer. Here, a_G references model['conv4_2']
    # and isn't evaluated yet. Later in the code, we'll assign the image G as the model input, so that
    # when we run the session, this will be the activations drawn from the appropriate layer, with G as input.
    a_G = out

    # Compute the content cost
    J_content = compute_content_cost(a_C, a_G)

    # Assign the input of the model to be the "style" image
    sess.run(model['input'].assign(style_image))

    # Compute the style cost
    J_style = compute_style_cost(model, STYLE_LAYERS)

    ### START CODE HERE ### (1 line)
    J = total_cost(J_content=J_content, J_style=J_style)
    ### END CODE HERE ###

    # define optimizer (1 line)
    optimizer = tf.train.AdamOptimizer(2.0)

    # define train_step (1 line)
    train_step = optimizer.minimize(J)

    model_nn(sess, generated_image)

    #################################################
    endtime = datetime.datetime.now()
    print("the running time :" + str((endtime - starttime).seconds))
    print("END!")

Results

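The initial white-noise image (400×300). Through optimization the network turns this image into the desired picture, which is startling:
[Figure: initial white-noise image]

Content image (400×300):
[Figure: content image]

Style image (400×300):
[Figure: style image]

Generated image (400×300) after 200 iterations, by which point the result had stabilized:
[Figure: generated image]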

Deep Learning (Andrew Ng), Course 4 Week 4 Programming Assignment 2: Neural Style Transfer
