[Translation] TensorFlow Tutorial #13 – Visual Analysis

Posted by thrillerist on 2019-02-25

Cover image from: toyota.csail.mit.edu
This article covers visual analysis of convolutional neural networks.

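01 – Simple Linear Model | 02 – Convolutional Neural Network | 03 – PrettyTensor | 04 – Save & Restore
05 – Ensemble Learning | 06 – CIFAR-10 | 07 – Inception Model | 08 – Transfer Learning
09 – Video Data | 11 – Adversarial Examples | 12 – Adversarial Noise for MNIST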

by Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube
Chinese translation by thrillerist / GitHub

If you repost this article, please include a link to it.


Introduction

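Earlier tutorials on convolutional neural networks, such as Tutorials #02 and #06, showed the convolutional filter weights. But from the filter weights alone it is impossible to determine what the convolutional filters might recognize in an input image.

This tutorial presents a basic method for visually analyzing the inner workings of a neural network. The idea is to generate an image that maximizes an individual feature inside the network. The image is initialized with a little random noise and is then gradually changed using the gradient of the given feature with respect to the input image.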

This method of visually analyzing neural networks is also known as feature maximization or activation maximization.
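As a minimal sketch of the idea (not the Inception code used later), the following NumPy example performs gradient ascent on an input vector to maximize a single toy feature. The feature `tanh(w . x)`, the weights `w`, the learning rate, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy "feature": activation = tanh(w . x).
rng = np.random.RandomState(0)
w = rng.normal(size=16)

def feature(x):
    return np.tanh(np.dot(w, x))

def feature_gradient(x):
    # d/dx tanh(w . x) = (1 - tanh(w . x)^2) * w
    return (1.0 - np.tanh(np.dot(w, x)) ** 2) * w

# Start from small random noise, as the tutorial does for images.
x = rng.uniform(-0.01, 0.01, size=16)
initial_activation = feature(x)

# Gradient ascent: repeatedly move the input in the direction
# that increases the feature's activation.
for _ in range(50):
    x += 0.05 * feature_gradient(x)

final_activation = feature(x)
print(initial_activation, final_activation)
```

The same loop structure appears in the tutorial's optimization function, except that the gradient there comes from TensorFlow rather than from a hand-derived formula.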

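This tutorial builds on the previous ones. You should be roughly familiar with neural networks (see Tutorials #01 and #02), and knowing the Inception model is also helpful (Tutorial #07).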

Flowchart

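We will use the Inception model from Tutorial #07. We want to find an image that maximizes a given feature inside the neural network. The input image is initialized with a little noise and is then updated using the gradient of the given feature. After a number of optimization iterations, we obtain an image that this particular feature "likes to see".

Because the Inception model is constructed from many basic mathematical operations that have been combined, TensorFlow can easily find the gradient of the loss function using the chain rule of differentiation.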
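To illustrate the chain rule on a much smaller scale, the sketch below composes two elementary operations and checks the hand-derived gradient against a finite-difference estimate. The function `f(x) = mean((a*x + b)^2)` and its constants are assumptions for illustration only; TensorFlow does the equivalent bookkeeping automatically across the whole Inception graph.

```python
import numpy as np

a, b = 3.0, -1.0  # hypothetical constants

def f(x):
    u = a * x + b           # first operation
    return np.mean(u ** 2)  # second operation

def grad_f(x):
    # Chain rule: df/dx = df/du * du/dx = (2 * u / n) * a
    u = a * x + b
    return 2.0 * u / x.size * a

x = np.array([0.5, -1.0, 2.0])

# Central finite differences as an independent check.
eps = 1e-6
numeric = np.zeros_like(x)
for i in range(x.size):
    step = np.zeros_like(x)
    step[i] = eps
    numeric[i] = (f(x + step) - f(x - step)) / (2 * eps)

analytic = grad_f(x)
print(analytic, numeric)
```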

from IPython.display import Image, display
Image('images/13_visual_analysis_flowchart.png')

Imports

%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

# Functions and classes for loading and using the Inception model.
import inception

This was developed using Python 3.5.2 (Anaconda) and TensorFlow version:

tf.__version__

'1.1.0'

Inception Model

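Download the Inception model from the internet

The Inception model is downloaded from the internet. This is the default directory where the data files are saved. The directory is created automatically if it does not exist.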

# inception.data_dir = 'inception/'

Download the data for the Inception model if it does not already exist in the directory. It is 85 MB.

inception.maybe_download()

Downloading Inception v3 Model …
Download progress: 100.0%
Download finished. Extracting files.
Done.

Names of Convolutional Layers

This function returns a list of names for the convolutional layers in the Inception model.

def get_conv_layer_names():
    # Load the Inception model.
    model = inception.Inception()

    # Create a list of names for the operations in the graph
    # for the Inception model where the operator-type is 'Conv2D'.
    names = [op.name for op in model.graph.get_operations() if op.type=='Conv2D']

    # Close the TensorFlow session inside the model-object.
    model.close()

    return names

conv_names = get_conv_layer_names()

There are a total of 94 convolutional layers in this Inception model.

len(conv_names)

94

Print the names of the first 5 convolutional layers.

conv_names[:5]

['conv/Conv2D',
 'conv_1/Conv2D',
 'conv_2/Conv2D',
 'conv_3/Conv2D',
 'conv_4/Conv2D']

Print the names of the last 5 convolutional layers.

conv_names[-5:]

['mixed_10/tower_1/conv/Conv2D',
 'mixed_10/tower_1/conv_1/Conv2D',
 'mixed_10/tower_1/mixed/conv/Conv2D',
 'mixed_10/tower_1/mixed/conv_1/Conv2D',
 'mixed_10/tower_2/conv/Conv2D']

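Helper-function for finding the input image

This function finds the input image that maximizes a given feature in the network. It essentially performs optimization with gradient ascent. The image is initialized with small random values and is then iteratively updated using the gradient of the given feature with respect to the input image.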

def optimize_image(conv_id=None, feature=0,
                   num_iterations=30, show_progress=True):
    """
    Find an image that maximizes the feature
    given by the conv_id and feature number.

    Parameters:
    conv_id: Integer identifying the convolutional layer to
             maximize. It is an index into conv_names.
             If None then use the last fully-connected layer
             before the softmax output.
    feature: Index into the layer for the feature to maximize.
    num_iterations: Number of optimization iterations to perform.
    show_progress: Boolean whether to show the progress.
    """

    # Load the Inception model. This is done for each call of
    # this function because we will add a lot to the graph
    # which will cause the graph to grow and eventually the
    # computer will run out of memory.
    model = inception.Inception()

    # Reference to the tensor that takes the raw input image.
    resized_image = model.resized_image

    # Reference to the tensor for the predicted classes.
    # This is the output of the final layer's softmax classifier.
    y_pred = model.y_pred

    # Create the loss-function that must be maximized.
    if conv_id is None:
        # If we want to maximize a feature on the last layer,
        # then we use the fully-connected layer prior to the
        # softmax-classifier. The feature no. is the class-number
        # and must be an integer between 1 and 1000.
        # The loss-function is just the value of that feature.
        loss = model.y_logits[0, feature]
    else:
        # If instead we want to maximize a feature of a
        # convolutional layer inside the neural network.

        # Get the name of the convolutional operator.
        conv_name = conv_names[conv_id]

        # Get a reference to the tensor that is output by the
        # operator. Note that ":0" is added to the name for this.
        tensor = model.graph.get_tensor_by_name(conv_name + ":0")

        # Set the Inception model's graph as the default
        # so we can add an operator to it.
        with model.graph.as_default():
            # The loss-function is the average of all the
            # tensor-values for the given feature. This
            # ensures that we generate the whole input image.
            # You can try and modify this so it only uses
            # a part of the tensor.
            loss = tf.reduce_mean(tensor[:,:,:,feature])

    # Get the gradient for the loss-function with regard to
    # the resized input image. This creates a mathematical
    # function for calculating the gradient.
    gradient = tf.gradients(loss, resized_image)

    # Create a TensorFlow session so we can run the graph.
    session = tf.Session(graph=model.graph)

    # Generate a random image of the same size as the raw input.
    # Each pixel is a small random value between 128 and 129,
    # which is about the middle of the colour-range.
    image_shape = resized_image.get_shape()
    image = np.random.uniform(size=image_shape) + 128.0

    # Perform a number of optimization iterations to find
    # the image that maximizes the loss-function.
    for i in range(num_iterations):
        # Create a feed-dict. This feeds the image to the
        # tensor in the graph that holds the resized image, because
        # this is the final stage for inputting raw image data.
        feed_dict = {model.tensor_name_resized_image: image}

        # Calculate the predicted class-scores,
        # as well as the gradient and the loss-value.
        pred, grad, loss_value = session.run([y_pred, gradient, loss],
                                             feed_dict=feed_dict)

        # Squeeze the dimensionality for the gradient-array.
        grad = np.array(grad).squeeze()

        # The gradient now tells us how much we need to change the
        # input image in order to maximize the given feature.

        # Calculate the step-size for updating the image.
        # This step-size was found to give fast convergence.
        # The addition of 1e-8 is to protect from div-by-zero.
        step_size = 1.0 / (grad.std() + 1e-8)

        # Update the image by adding the scaled gradient
        # This is called gradient ascent.
        image += step_size * grad

        # Ensure all pixel-values in the image are between 0 and 255.
        image = np.clip(image, 0.0, 255.0)

        if show_progress:
            print("Iteration:", i)

            # Convert the predicted class-scores to a one-dim array.
            pred = np.squeeze(pred)

            # The predicted class for the Inception model.
            pred_cls = np.argmax(pred)

            # Name of the predicted class.
            cls_name = model.name_lookup.cls_to_name(pred_cls,
                                               only_first_name=True)

            # The score (probability) for the predicted class.
            cls_score = pred[pred_cls]

            # Print the predicted score etc.
            msg = "Predicted class-name: {0} (#{1}), score: {2:>7.2%}"
            print(msg.format(cls_name, pred_cls, cls_score))

            # Print statistics for the gradient.
            msg = "Gradient min: {0:>9.6f}, max: {1:>9.6f}, stepsize: {2:>9.2f}"
            print(msg.format(grad.min(), grad.max(), step_size))

            # Print the loss-value.
            print("Loss:", loss_value)

            # Newline.
            print()

    # Close the TensorFlow session inside the model-object.
    model.close()

    return image.squeeze()
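The step-size rule in the function above divides by the standard deviation of the gradient, so the update applied to the image has roughly unit standard deviation no matter how small the raw gradient is. Below is a small NumPy sketch of that effect using synthetic gradients (an assumption; these are not gradients from the Inception model, and `scaled_update` is a hypothetical helper name).

```python
import numpy as np

rng = np.random.RandomState(0)

def scaled_update(grad):
    # Same rule as in optimize_image(): normalize by the gradient's std.
    # The 1e-8 term protects against division by zero.
    step_size = 1.0 / (grad.std() + 1e-8)
    return step_size * grad

# Two synthetic gradients whose magnitudes differ by a factor of 1000.
tiny_grad = rng.normal(scale=1e-5, size=(299, 299, 3))
large_grad = rng.normal(scale=1e-2, size=(299, 299, 3))

tiny_update = scaled_update(tiny_grad)
large_update = scaled_update(large_grad)

# Both updates end up with a standard deviation close to 1.
print(tiny_update.std(), large_update.std())
```

This is why the printed step sizes in the results are so large: the raw gradients are tiny, and the rule rescales them to a pixel-level change of about one intensity unit per iteration.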

Helper-functions for plotting images and noise

This function normalizes an image so its pixel values are between 0.0 and 1.0.

def normalize_image(x):
    # Get the min and max values for all pixels in the input.
    x_min = x.min()
    x_max = x.max()

    # Normalize so all values are between 0.0 and 1.0
    x_norm = (x - x_min) / (x_max - x_min)

    return x_norm
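Note that `normalize_image` divides by `x_max - x_min`, which is zero for a constant image. The following variant is a sketch of one way to guard against that case; the name `normalize_image_safe` and the choice to return all zeros are assumptions, not part of the original tutorial.

```python
import numpy as np

def normalize_image_safe(x):
    x = np.asarray(x, dtype=np.float64)
    x_min = x.min()
    x_range = x.max() - x_min
    if x_range == 0:
        # Constant image: avoid 0/0 and return all zeros instead.
        return np.zeros_like(x)
    return (x - x_min) / x_range

flat = normalize_image_safe(np.full((2, 2), 128.0))
ramp = normalize_image_safe(np.array([10.0, 20.0, 30.0]))
print(flat)
print(ramp)
```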

This function plots a single image.

def plot_image(image):
    # Normalize the image so pixels are between 0.0 and 1.0
    img_norm = normalize_image(image)

    # Plot the image.
    plt.imshow(img_norm, interpolation='nearest')
    plt.show()

This function plots 6 images in a grid.

def plot_images(images, show_size=100):
    """
    The show_size is the number of pixels to show for each image.
    The max value is 299.
    """

    # Create figure with sub-plots.
    fig, axes = plt.subplots(2, 3)

    # Adjust vertical spacing.
    fig.subplots_adjust(hspace=0.1, wspace=0.1)

    # Use interpolation to smooth pixels?
    smooth = True

    # Interpolation type.
    if smooth:
        interpolation = 'spline16'
    else:
        interpolation = 'nearest'

    # For each entry in the grid.
    for i, ax in enumerate(axes.flat):
        # Get the i'th image and only use the desired pixels.
        img = images[i, 0:show_size, 0:show_size, :]

        # Normalize the image so its pixels are between 0.0 and 1.0
        img_norm = normalize_image(img)

        # Plot the image.
        ax.imshow(img_norm, interpolation=interpolation)

        # Remove ticks.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
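The slice `images[i, 0:show_size, 0:show_size, :]` keeps only the top-left corner of each image. A quick shape check on a dummy array follows; the zero-filled array is only a stand-in for real optimized images.

```python
import numpy as np

# Stand-in for 6 optimized images of 299 x 299 pixels with 3 colour channels.
images = np.zeros((6, 299, 299, 3))
show_size = 100

crop = images[0, 0:show_size, 0:show_size, :]
print(crop.shape)  # (100, 100, 3)

# A show_size larger than the image is clamped by NumPy slicing,
# so the full 299 x 299 image is returned.
full = images[0, 0:500, 0:500, :]
print(full.shape)  # (299, 299, 3)
```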

Helper-function for optimizing and plotting images

This function optimizes multiple images and plots them.

def optimize_images(conv_id=None, num_iterations=30, show_size=100):
    """
    Find 6 images that maximize the 6 first features in the layer
    given by the conv_id.

    Parameters:
    conv_id: Integer identifying the convolutional layer to
             maximize. It is an index into conv_names.
             If None then use the last layer before the softmax output.
    num_iterations: Number of optimization iterations to perform.
    show_size: Number of pixels to show for each image. Max 299.
    """

    # Which layer are we using?
    if conv_id is None:
        print("Final fully-connected layer before softmax.")
    else:
        print("Layer:", conv_names[conv_id])

    # Initialize the array of images.
    images = []

    # For each feature do the following. Note that the
    # last fully-connected layer only supports numbers
    # between 1 and 1000, while the convolutional layers
    # support numbers between 0 and some other number.
    # So we just use the numbers between 1 and 7.
    for feature in range(1,7):
        print("Optimizing image for feature no.", feature)

        # Find the image that maximizes the given feature
        # for the network layer identified by conv_id (or None).
        image = optimize_image(conv_id=conv_id, feature=feature,
                               show_progress=False,
                               num_iterations=num_iterations)

        # Squeeze the dim of the array.
        image = image.squeeze()

        # Append to the list of images.
        images.append(image)

    # Convert to numpy-array so we can index all dimensions easily.
    images = np.array(images)

    # Plot the images.
    plot_images(images=images, show_size=show_size)

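Results

Optimize an image for a shallow convolutional layer

As an example, find an input image that maximizes feature no. 2 of the convolutional layer named conv_names[conv_id], where conv_id=5.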

image = optimize_image(conv_id=5, feature=2,
                       num_iterations=30, show_progress=True)

Iteration: 0
Predicted class-name: dishwasher (#667), score: 4.81%
Gradient min: -0.000083, max: 0.000100, stepsize: 76290.32
Loss: 4.83793

Iteration: 1
Predicted class-name: kite (#397), score: 15.12%
Gradient min: -0.000142, max: 0.000126, stepsize: 71463.42
Loss: 5.59611

Iteration: 2
Predicted class-name: wall clock (#524), score: 6.85%
Gradient min: -0.000119, max: 0.000121, stepsize: 80427.39
Loss: 6.91725


Iteration: 28
Predicted class-name: bib (#941), score: 19.26%
Gradient min: -0.000043, max: 0.000043, stepsize: 214742.82
Loss: 17.7469

Iteration: 29
Predicted class-name: bib (#941), score: 18.87%
Gradient min: -0.000047, max: 0.000059, stepsize: 218511.00
Loss: 17.9321

plot_image(image)

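Optimize multiple images for convolutional layers

Next, we optimize multiple images for convolutional layers inside the Inception model and plot them. These images show what the convolutional layers "like to see". Notice how the patterns become increasingly complex for deeper layers.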

optimize_images(conv_id=0, num_iterations=10)

Layer: conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=3, num_iterations=30)

Layer: conv_3/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=4, num_iterations=30)

Layer: conv_4/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=5, num_iterations=30)

Layer: mixed/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=6, num_iterations=30)

Layer: mixed/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=7, num_iterations=30)

Layer: mixed/tower/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=8, num_iterations=30)

Layer: mixed/tower_1/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=9, num_iterations=30)

Layer: mixed/tower_1/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=10, num_iterations=30)

Layer: mixed/tower_1/conv_2/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=20, num_iterations=30)

Layer: mixed_2/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=30, num_iterations=30)

Layer: mixed_4/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=40, num_iterations=30)

Layer: mixed_5/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=50, num_iterations=30)

Layer: mixed_6/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=60, num_iterations=30)

Layer: mixed_7/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=70, num_iterations=30)

Layer: mixed_8/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=80, num_iterations=30)

Layer: mixed_9/tower_1/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=90, num_iterations=30)

Layer: mixed_10/tower_1/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=93, num_iterations=30)

Layer: mixed_10/tower_2/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

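Final fully-connected layer before softmax

Now we optimize and plot images for the final layer in the Inception model. This is the fully-connected layer just before the softmax classifier. The features in this layer correspond to the output classes.

We might have hoped to see recognizable patterns in these images, such as monkeys or birds corresponding to the output classes, but the images only show complex, abstract patterns.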

optimize_images(conv_id=None, num_iterations=30)

Final fully-connected layer before softmax.
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

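The images above show only 100×100 pixels, but they are actually 299×299 pixels. Recognizable patterns might appear if we run more optimization iterations and plot the full image. So let us optimize the first image again and plot it at full resolution.

The Inception model classifies the resulting image as a "kit fox" with about 100% certainty, but to the human eye the image is just abstract patterns.

If you want to test another feature number, note that it must be between 1 and 1000, because it must correspond to a valid class number for the final output layer.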

image = optimize_image(conv_id=None, feature=1,
                       num_iterations=100, show_progress=True)

Iteration: 0
Predicted class-name: dishwasher (#667), score: 4.98%
Gradient min: -0.006252, max: 0.004451, stepsize: 3734.48
Loss: -0.837608

Iteration: 1
Predicted class-name: ballpoint (#907), score: 8.52%
Gradient min: -0.007303, max: 0.006427, stepsize: 2152.89
Loss: -0.416723

Iteration: 98
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007732, max: 0.010692, stepsize: 1286.44
Loss: 67.5603

Iteration: 99
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005850, max: 0.006159, stepsize: 1863.65
Loss: 75.6356

plot_image(image=image)
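The near-100% score is what one would expect from a softmax once a single logit is pushed far above the rest. The sketch below assumes, purely for illustration, that one logit has the final loss value from the log above (about 75.6) while all other logits are zero; the real values of the other logits are not shown in the output, and the vector length of 1000 is also an assumption.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical logits: one class pushed to 75.6, all others at 0.
logits = np.zeros(1000)
logits[1] = 75.6

probs = softmax(logits)
print(probs[1])
```

With a gap this large, the other classes contribute a negligible share of the probability mass, which matches the printed score of 100.00%.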

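Close TensorFlow Session

The TensorFlow session has already been closed inside the functions above that use the Inception model. This is done to save memory, so that the computer does not crash when many gradient functions are added to the computational graph.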

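Conclusion

This tutorial showed how to optimize input images so as to maximize features inside a neural network. Because a given feature (or neuron) in the network responds most strongly to particular images, this lets us visually analyze what it "likes to see".

For the lower layers of the neural network, the images contain simple patterns such as different kinds of wavy lines. The image patterns become increasingly complex deeper into the network. We might have hoped that the patterns for deeper layers would be recognizable, such as monkeys, foxes, or cars, but in practice they are even more complex and abstract.

Why is that? Recall from Tutorial #11 that the Inception model can easily be fooled by a little adversarial noise into classifying any input image as another target class. So it is not hard to imagine that the Inception model recognizes abstract image patterns that are unclear to the human eye. There may be infinitely many images that maximize the features inside a neural network, and humans may be able to recognize only a small fraction of them. This may be why the optimization process only found abstract image patterns.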

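Other Methods

The research literature contains many proposals for guiding the optimization process so that it finds image patterns that are more recognizable to humans.

This paper proposes a combination of heuristics for guiding the optimization process toward such image patterns. The paper shows example images for several classes, such as flamingo, pelican, and black swan, all of which are more or less recognizable to the human eye. The method is implemented here (the exact line numbers may change in the future). It requires a combination of heuristics and fine-tuned parameters to generate these images, but the parameter choices are not made clear in the paper. Despite some attempts, I could not reproduce their results. Perhaps I misunderstood the paper, or perhaps the heuristics were finely tuned to their network architecture (a variant of AlexNet), whereas this tutorial uses the more advanced Inception model.

This paper proposes another method for producing images that are recognizable to humans. However, the method effectively cheats: it goes through all the images in the training set (e.g. ImageNet) and finds those that maximally activate a given feature inside the neural network. It then clusters and averages similar images and uses the result as the initial image for the optimization procedure. So it is no surprise that the method gives better results when it starts from an image constructed from real photos.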
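The select-and-average initialization described in the last paragraph can be sketched in a few lines. Everything here is a stand-in: the "dataset" is random noise, `feature_activation` is a hypothetical scoring function rather than a real network layer, and the clustering step is omitted so that only the selection and averaging are shown.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in dataset: 50 tiny random "images" of 8 x 8 pixels, 3 channels.
dataset = rng.uniform(0.0, 255.0, size=(50, 8, 8, 3))

def feature_activation(image):
    # Hypothetical feature: mean brightness of the red channel.
    return image[:, :, 0].mean()

# Score every image, then keep the k images with the highest activation.
scores = np.array([feature_activation(img) for img in dataset])
k = 5
top_idx = np.argsort(scores)[-k:]

# Average the top-k images to get a starting image for the optimizer,
# instead of starting from random noise.
initial_image = dataset[top_idx].mean(axis=0)
print(initial_image.shape)
```

In the tutorial's setting, `initial_image` would replace the random image created inside `optimize_image`, which is why the resulting pictures inherit structure from real photos.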

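Exercises

These are a few suggestions for exercises that may help improve your TensorFlow skills. Hands-on experience is important for learning how to use TensorFlow properly.

You may want to back up this Notebook before making any changes.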

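  • Try running the optimization several times for features in the lower layers of the network. Are the resulting images always the same?
  • Try using fewer or more optimization iterations. How does it affect the image quality?
  • Try changing the loss function for the convolutional features. This can be done in different ways. How does it affect the image patterns? Why?
  • Do you think the optimizer also amplifies other features besides the one we want to maximize? How can you measure this? Are you sure the optimizer only maximizes one feature at a time?
  • Try maximizing multiple features simultaneously.
  • Train a smaller network on the MNIST data set and try to visualize its features and layers. Is it easier to see patterns in the images?
  • Try implementing the methods from the papers above.
  • Try your own ideas for improving the optimized images.
  • Explain to a friend how the program works.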
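For the exercises on changing the loss function and maximizing several features at once, the NumPy sketch below shows a few candidate losses on a dummy activation tensor. The tensor shape, the patch coordinates, and the feature indices are assumptions; in the notebook the same expressions would be written with TensorFlow ops such as `tf.reduce_mean` on the layer's output tensor.

```python
import numpy as np

rng = np.random.RandomState(0)

# Dummy conv-layer output with shape [batch, height, width, features].
tensor = rng.normal(size=(1, 35, 35, 64))
feature = 2

# Loss used in the tutorial: mean over the whole feature map.
loss_full = np.mean(tensor[:, :, :, feature])

# Variant: only a central 5 x 5 patch of the feature map.
patch = tensor[:, 15:20, 15:20, feature]
loss_patch = np.mean(patch)

# Variant: maximize several features at once by summing their means.
features = [2, 3, 5]
loss_multi = sum(np.mean(tensor[:, :, :, f]) for f in features)

print(loss_full, loss_patch, loss_multi)
```

Restricting the loss to a patch would be expected to concentrate the generated pattern in one region of the image, while the summed loss trades off several features against each other; both are starting points for experimentation rather than recommended settings.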
