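Header image from: toyota.csail.mit.edu
This article focuses on visual analysis of convolutional neural networks. 01 – Simple Linear Model | 02 – Convolutional Neural Network | 03 – PrettyTensor | 04 – Save & Restore
05 – Ensemble Learning | 06 – CIFAR-10 | 07 – Inception Model | 08 – Transfer Learning
09 – Video Data | 11 – Adversarial Examples | 12 – Adversarial Noise for MNIST
by Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube
Chinese translation by thrillerist / Github
If you repost this article, please include a link to it.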
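Introduction
In some of the earlier tutorials on convolutional neural networks, such as Tutorials #02 and #06, we showed the convolutional filter weights. But from the filter weights alone it is impossible to determine what a convolutional filter might recognize in the input image.
In this tutorial we present a basic method for visually analysing the inner workings of a neural network. The idea is to generate an image that maximizes an individual feature inside the neural network. The image is initialized with a little random noise and is then gradually changed using the gradient of the given feature with respect to the input image.
This method for visual analysis of neural networks is also known as feature maximization or activation maximization.
This tutorial builds on the previous tutorials. You should be roughly familiar with neural networks (see Tutorials #01 and #02), and familiarity with the Inception model is also helpful (Tutorial #07).
Flowchart
We will use the Inception model from Tutorial #07. We want to find an image that maximizes a given feature inside the neural network. The input image is initialized with a little noise and is then updated using the gradient of the given feature. After a number of optimization iterations we obtain an image that this particular feature "likes to see".
Because the Inception model is built from many basic mathematical operations that have been combined, TensorFlow can use the chain rule of differentiation to find the gradient of the loss function easily.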
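To make the chain-rule remark concrete, here is a small NumPy sketch that does not use TensorFlow or Inception. The toy "network" (a linear layer, a tanh and a mean) and all names in it are assumptions made for illustration. It computes the gradient of the scalar output with respect to the input by hand and compares it with a finite-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # toy weights: 3 inputs -> 4 hidden units
x = rng.normal(size=3)        # toy "image" with 3 pixels

def feature(x):
    # Composite function: linear map, tanh non-linearity, then mean.
    return np.tanh(W @ x).mean()

def feature_grad(x):
    # Chain rule, applied from the output back to the input:
    # d mean / d h = 1/4, d tanh(z) / d z = 1 - tanh(z)^2, d z / d x = W.
    z = W @ x
    dh = np.full(4, 1.0 / 4)
    dz = dh * (1.0 - np.tanh(z) ** 2)
    return W.T @ dz

def numerical_grad(f, x, eps=1e-6):
    # Central finite differences, one input component at a time.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

analytic = feature_grad(x)
numeric = numerical_grad(feature, x)
print("gradients agree:", np.allclose(analytic, numeric, atol=1e-6))
```

TensorFlow builds the equivalent of feature_grad automatically for every operation in the graph, which is why no hand-written derivatives are needed for Inception.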
from IPython.display import Image, display
Image('images/13_visual_analysis_flowchart.png')
Imports
%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
# Functions and classes for loading and using the Inception model.
import inception
This was developed using Python 3.5.2 (Anaconda) and TensorFlow version:
tf.__version__
'1.1.0'
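Inception Model
Download the Inception model from the internet
The Inception model is downloaded from the internet. This is the default directory where the data files are saved. The directory will be created if it does not exist.
# inception.data_dir = 'inception/'
Download the data for the Inception model if it does not already exist in the directory. It is 85 MB.
inception.maybe_download()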
Downloading Inception v3 Model …
Download progress: 100.0%
Download finished. Extracting files.
Done.
Names of convolutional layers
This function returns a list of the names of the convolutional layers in the Inception model.
def get_conv_layer_names():
    # Load the Inception model.
    model = inception.Inception()
    # Create a list of names for the operations in the graph
    # for the Inception model where the operator-type is 'Conv2D'.
    names = [op.name for op in model.graph.get_operations() if op.type == 'Conv2D']
    # Close the TensorFlow session inside the model-object.
    model.close()
    return names
conv_names = get_conv_layer_names()
There are a total of 94 convolutional layers in the Inception model.
len(conv_names)
94
Print the names of the first 5 convolutional layers.
conv_names[:5]
['conv/Conv2D',
 'conv_1/Conv2D',
 'conv_2/Conv2D',
 'conv_3/Conv2D',
 'conv_4/Conv2D']
Print the names of the last 5 convolutional layers.
conv_names[-5:]
['mixed_10/tower_1/conv/Conv2D',
 'mixed_10/tower_1/conv_1/Conv2D',
 'mixed_10/tower_1/mixed/conv/Conv2D',
 'mixed_10/tower_1/mixed/conv_1/Conv2D',
 'mixed_10/tower_2/conv/Conv2D']
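The layer names encode where a convolution sits in the network. As a small sketch of how one might group them, the code below counts convolutions per top-level block. It only uses the ten names printed above rather than the full conv_names list, and the helper count_by_block is a hypothetical name introduced here.

```python
from collections import Counter

# The ten names printed above (first five and last five of conv_names).
shown_names = [
    "conv/Conv2D", "conv_1/Conv2D", "conv_2/Conv2D",
    "conv_3/Conv2D", "conv_4/Conv2D",
    "mixed_10/tower_1/conv/Conv2D",
    "mixed_10/tower_1/conv_1/Conv2D",
    "mixed_10/tower_1/mixed/conv/Conv2D",
    "mixed_10/tower_1/mixed/conv_1/Conv2D",
    "mixed_10/tower_2/conv/Conv2D",
]

def count_by_block(names):
    # The text before the first "/" is treated as the block name.
    return Counter(name.split("/")[0] for name in names)

block_counts = count_by_block(shown_names)
for block, count in block_counts.items():
    print(block, count)
```

Passing the full conv_names list instead would give an overview of how many convolutions each block of the model contains.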
Helper-function for finding the input image
This function finds the input image that maximizes a given feature in the network. It essentially performs optimization with gradient ascent. The image is initialized with small random values and is then iteratively updated using the gradient of the given feature with respect to the input image.
def optimize_image(conv_id=None, feature=0,
                   num_iterations=30, show_progress=True):
    """
    Find an image that maximizes the feature
    given by the conv_id and feature number.
    Parameters:
    conv_id: Integer identifying the convolutional layer to
             maximize. It is an index into conv_names.
             If None then use the last fully-connected layer
             before the softmax output.
    feature: Index into the layer for the feature to maximize.
    num_iterations: Number of optimization iterations to perform.
    show_progress: Boolean whether to show the progress.
    """
    # Load the Inception model. This is done for each call of
    # this function because we will add a lot to the graph
    # which will cause the graph to grow and eventually the
    # computer will run out of memory.
    model = inception.Inception()
    # Reference to the tensor that takes the raw input image.
    resized_image = model.resized_image
    # Reference to the tensor for the predicted classes.
    # This is the output of the final layer's softmax classifier.
    y_pred = model.y_pred
    # Create the loss-function that must be maximized.
    if conv_id is None:
        # If we want to maximize a feature on the last layer,
        # then we use the fully-connected layer prior to the
        # softmax-classifier. The feature no. is the class-number
        # and must be an integer between 1 and 1000.
        # The loss-function is just the value of that feature.
        loss = model.y_logits[0, feature]
    else:
        # If instead we want to maximize a feature of a
        # convolutional layer inside the neural network.
        # Get the name of the convolutional operator.
        conv_name = conv_names[conv_id]
        # Get a reference to the tensor that is output by the
        # operator. Note that ":0" is added to the name for this.
        tensor = model.graph.get_tensor_by_name(conv_name + ":0")
        # Set the Inception model's graph as the default
        # so we can add an operator to it.
        with model.graph.as_default():
            # The loss-function is the average of all the
            # tensor-values for the given feature. This
            # ensures that we generate the whole input image.
            # You can try and modify this so it only uses
            # a part of the tensor.
            loss = tf.reduce_mean(tensor[:, :, :, feature])
    # Get the gradient for the loss-function with regard to
    # the resized input image. This creates a mathematical
    # function for calculating the gradient.
    gradient = tf.gradients(loss, resized_image)
    # Create a TensorFlow session so we can run the graph.
    session = tf.Session(graph=model.graph)
    # Generate a random image of the same size as the raw input.
    # Each pixel is a small random value between 128 and 129,
    # which is about the middle of the colour-range.
    image_shape = resized_image.get_shape()
    image = np.random.uniform(size=image_shape) + 128.0
    # Perform a number of optimization iterations to find
    # the image that maximizes the loss-function.
    for i in range(num_iterations):
        # Create a feed-dict. This feeds the image to the
        # tensor in the graph that holds the resized image, because
        # this is the final stage for inputting raw image data.
        feed_dict = {model.tensor_name_resized_image: image}
        # Calculate the predicted class-scores,
        # as well as the gradient and the loss-value.
        pred, grad, loss_value = session.run([y_pred, gradient, loss],
                                             feed_dict=feed_dict)
        # Squeeze the dimensionality for the gradient-array.
        grad = np.array(grad).squeeze()
        # The gradient now tells us how much we need to change the
        # input image in order to maximize the given feature.
        # Calculate the step-size for updating the image.
        # This step-size was found to give fast convergence.
        # The addition of 1e-8 is to protect from div-by-zero.
        step_size = 1.0 / (grad.std() + 1e-8)
        # Update the image by adding the scaled gradient.
        # This is called gradient ascent.
        image += step_size * grad
        # Ensure all pixel-values in the image are between 0 and 255.
        image = np.clip(image, 0.0, 255.0)
        if show_progress:
            print("Iteration:", i)
            # Convert the predicted class-scores to a one-dim array.
            pred = np.squeeze(pred)
            # The predicted class for the Inception model.
            pred_cls = np.argmax(pred)
            # Name of the predicted class.
            cls_name = model.name_lookup.cls_to_name(pred_cls,
                                                     only_first_name=True)
            # The score (probability) for the predicted class.
            cls_score = pred[pred_cls]
            # Print the predicted score etc.
            msg = "Predicted class-name: {0} (#{1}), score: {2:>7.2%}"
            print(msg.format(cls_name, pred_cls, cls_score))
            # Print statistics for the gradient.
            msg = "Gradient min: {0:>9.6f}, max: {1:>9.6f}, stepsize: {2:>9.2f}"
            print(msg.format(grad.min(), grad.max(), step_size))
            # Print the loss-value.
            print("Loss:", loss_value)
            # Newline.
            print()
    # Close the TensorFlow session inside the model-object.
    model.close()
    return image.squeeze()
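The update rule in optimize_image() can be tried in isolation without loading Inception. The sketch below is a toy under stated assumptions: the "feature" is a plain linear filter (so its gradient is known in closed form), and the names weights and loss_and_grad are hypothetical. It keeps the same initialization, the same step size of 1 / (grad.std() + 1e-8) and the same clipping as the function above.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (8, 8, 3)                       # toy image size (assumption)
weights = rng.normal(size=shape)        # toy linear "feature detector"

def loss_and_grad(image):
    # Toy feature: mean of weights * pixels. Its gradient with respect
    # to the image is simply weights / number_of_pixels.
    loss = np.mean(weights * image)
    grad = weights / image.size
    return loss, grad

# Same initialization as in optimize_image(): values in [128, 129).
image = rng.uniform(size=shape) + 128.0

losses = []
for i in range(30):
    loss, grad = loss_and_grad(image)
    losses.append(loss)
    # Dividing by the gradient's standard deviation means the update
    # step_size * grad has a standard deviation of about one intensity
    # level, regardless of how small the raw gradient is.
    step_size = 1.0 / (grad.std() + 1e-8)
    # Gradient ascent step followed by clipping to the valid pixel range.
    image = np.clip(image + step_size * grad, 0.0, 255.0)

print("first loss:", losses[0], "last loss:", losses[-1])
```

This normalization explains the very large step sizes printed in the progress output of the real model: its raw gradients are tiny, so the step size has to be large for the pixels to change noticeably.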
Helper-functions for plotting images and noise
This function normalizes an image so its pixel values are between 0.0 and 1.0.
def normalize_image(x):
    # Get the min and max values for all pixels in the input.
    x_min = x.min()
    x_max = x.max()
    # Normalize so all values are between 0.0 and 1.0
    x_norm = (x - x_min) / (x_max - x_min)
    return x_norm
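One thing to keep in mind about normalize_image() is that it divides by x_max - x_min, which is zero for an image whose pixels are all equal. That is unlikely for the optimized images in this tutorial, but a guarded variant is easy to sketch. The function name normalize_image_safe and the choice of returning mid-grey are assumptions, not part of the original notebook.

```python
import numpy as np

def normalize_image_safe(x, eps=1e-8):
    # Hypothetical variant of normalize_image() that tolerates
    # constant images, where x.max() == x.min().
    x = np.asarray(x, dtype=np.float64)
    x_min = x.min()
    x_range = x.max() - x_min
    if x_range < eps:
        # All pixels are equal: return mid-grey instead of dividing by zero.
        return np.full_like(x, 0.5)
    return (x - x_min) / x_range

ramp = normalize_image_safe(np.array([[10.0, 20.0], [30.0, 50.0]]))
flat = normalize_image_safe(np.full((2, 2), 128.0))
print(ramp)
print(flat)
```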
This function plots a single image.
def plot_image(image):
    # Normalize the image so pixels are between 0.0 and 1.0
    img_norm = normalize_image(image)
    # Plot the image.
    plt.imshow(img_norm, interpolation='nearest')
    plt.show()
This function plots 6 images in a grid.
def plot_images(images, show_size=100):
    """
    The show_size is the number of pixels to show for each image.
    The max value is 299.
    """
    # Create figure with sub-plots.
    fig, axes = plt.subplots(2, 3)
    # Adjust vertical spacing.
    fig.subplots_adjust(hspace=0.1, wspace=0.1)
    # Use interpolation to smooth pixels?
    smooth = True
    # Interpolation type.
    if smooth:
        interpolation = 'spline16'
    else:
        interpolation = 'nearest'
    # For each entry in the grid.
    for i, ax in enumerate(axes.flat):
        # Get the i'th image and only use the desired pixels.
        img = images[i, 0:show_size, 0:show_size, :]
        # Normalize the image so its pixels are between 0.0 and 1.0
        img_norm = normalize_image(img)
        # Plot the image.
        ax.imshow(img_norm, interpolation=interpolation)
        # Remove ticks.
        ax.set_xticks([])
        ax.set_yticks([])
    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
Helper-function for optimizing and plotting images
This function optimizes multiple images and plots them.
def optimize_images(conv_id=None, num_iterations=30, show_size=100):
    """
    Find 6 images that maximize the 6 first features in the layer
    given by the conv_id.
    Parameters:
    conv_id: Integer identifying the convolutional layer to
             maximize. It is an index into conv_names.
             If None then use the last layer before the softmax output.
    num_iterations: Number of optimization iterations to perform.
    show_size: Number of pixels to show for each image. Max 299.
    """
    # Which layer are we using?
    if conv_id is None:
        print("Final fully-connected layer before softmax.")
    else:
        print("Layer:", conv_names[conv_id])
    # Initialize the array of images.
    images = []
    # For each feature do the following. Note that the
    # last fully-connected layer only supports numbers
    # between 1 and 1000, while the convolutional layers
    # support numbers between 0 and some other number.
    # So we just use the numbers between 1 and 7.
    for feature in range(1, 7):
        print("Optimizing image for feature no.", feature)
        # Find the image that maximizes the given feature
        # for the network layer identified by conv_id (or None).
        image = optimize_image(conv_id=conv_id, feature=feature,
                               show_progress=False,
                               num_iterations=num_iterations)
        # Squeeze the dim of the array.
        image = image.squeeze()
        # Append to the list of images.
        images.append(image)
    # Convert to numpy-array so we can index all dimensions easily.
    images = np.array(images)
    # Plot the images.
    plot_images(images=images, show_size=show_size)
Results
Optimize an image for a shallow convolutional layer
As an example, find an input image that maximizes feature no. 2 of the convolutional layer with the name conv_names[conv_id], where conv_id=5.
image = optimize_image(conv_id=5, feature=2,
                       num_iterations=30, show_progress=True)
Iteration: 0
Predicted class-name: dishwasher (#667), score: 4.81%
Gradient min: -0.000083, max: 0.000100, stepsize: 76290.32
Loss: 4.83793

Iteration: 1
Predicted class-name: kite (#397), score: 15.12%
Gradient min: -0.000142, max: 0.000126, stepsize: 71463.42
Loss: 5.59611

Iteration: 2
Predicted class-name: wall clock (#524), score: 6.85%
Gradient min: -0.000119, max: 0.000121, stepsize: 80427.39
Loss: 6.91725

…

Iteration: 28
Predicted class-name: bib (#941), score: 19.26%
Gradient min: -0.000043, max: 0.000043, stepsize: 214742.82
Loss: 17.7469

Iteration: 29
Predicted class-name: bib (#941), score: 18.87%
Gradient min: -0.000047, max: 0.000059, stepsize: 218511.00
Loss: 17.9321

plot_image(image)
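Optimize multiple images for convolutional layers
In the following we optimize and plot multiple images for convolutional layers inside the Inception model. These images show what the layers "like to see". Notice how the patterns become increasingly complex for the deeper layers.
optimize_images(conv_id=0, num_iterations=10)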
Layer: conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
optimize_images(conv_id=3, num_iterations=30)
Layer: conv_3/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=4, num_iterations=30)
Layer: conv_4/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=5, num_iterations=30)
Layer: mixed/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=6, num_iterations=30)
Layer: mixed/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=7, num_iterations=30)
Layer: mixed/tower/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=8, num_iterations=30)
Layer: mixed/tower_1/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=9, num_iterations=30)
Layer: mixed/tower_1/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=10, num_iterations=30)
Layer: mixed/tower_1/conv_2/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=20, num_iterations=30)
Layer: mixed_2/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=30, num_iterations=30)
Layer: mixed_4/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=40, num_iterations=30)
Layer: mixed_5/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=50, num_iterations=30)
Layer: mixed_6/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=60, num_iterations=30)
Layer: mixed_7/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=70, num_iterations=30)
Layer: mixed_8/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=80, num_iterations=30)
Layer: mixed_9/tower_1/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=90, num_iterations=30)
Layer: mixed_10/tower_1/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=93, num_iterations=30)
Layer: mixed_10/tower_2/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
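Final fully-connected layer before Softmax
Now we optimize and plot images for the last layer of the Inception model. This is the fully-connected layer just before the softmax classifier. The features in this layer correspond to the output classes.
We might have hoped to see recognizable patterns in these images, such as monkeys or birds corresponding to the output classes, but the images only show complex, abstract patterns.
optimize_images(conv_id=None, num_iterations=30)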
Final fully-connected layer before softmax.
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
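The images above only show 100×100 pixels, but the images are actually 299×299 pixels. Perhaps recognizable patterns would appear if we performed more optimization iterations and plotted the full image. So let us optimize the first image again and plot it at full resolution.
The Inception model classifies the resulting image as a "kit fox" with about 100% certainty, but to the human eye the image just shows abstract patterns.
If you want to test another feature number, note that it must be between 1 and 1000, because it must correspond to a valid class number for the final output layer.
image = optimize_image(conv_id=None, feature=1,
                       num_iterations=100, show_progress=True)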
Iteration: 0
Predicted class-name: dishwasher (#667), score: 4.98%
Gradient min: -0.006252, max: 0.004451, stepsize: 3734.48
Loss: -0.837608

Iteration: 1
Predicted class-name: ballpoint (#907), score: 8.52%
Gradient min: -0.007303, max: 0.006427, stepsize: 2152.89
Loss: -0.416723

…

Iteration: 98
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007732, max: 0.010692, stepsize: 1286.44
Loss: 67.5603

Iteration: 99
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005850, max: 0.006159, stepsize: 1863.65
Loss: 75.6356

plot_image(image=image)
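The printed score of 100.00% follows from how softmax treats one very large logit. As a rough sketch, the code below applies a numerically stable softmax to a vector in which class 1 has the final loss value printed above (75.6356). Setting every other logit to 0 and using exactly 1000 classes are assumptions made for illustration, since the other logits of the real model are not shown.

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

num_classes = 1000                 # assumption for illustration
logits = np.zeros(num_classes)     # assumption: all other logits are 0
logits[1] = 75.6356                # final loss value printed above

probs = softmax(logits)
print("{0:>7.2%}".format(probs[1]))
```

Because the optimization maximizes the logit directly and without bound, the softmax output saturates long before the image has to look like anything in particular.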
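Close TensorFlow Session
The TensorFlow session has already been closed inside the functions above that use the Inception model. This is done to save memory, so the computer does not crash when many gradient functions are added to the computational graph.
Conclusion
This tutorial showed how to optimize input images so as to maximize features inside a neural network. Because a given feature (or neuron) inside the network reacts most strongly to particular images, this lets us visually analyse what it "likes to see".
For the lower layers of the neural network, the images contain simple patterns such as different kinds of wavy lines. The image patterns become increasingly complex deeper into the network. We might have hoped that the patterns for the deep layers would be recognizable, such as monkeys, foxes, cars and so on, but instead the image patterns for the deep layers are even more complex and abstract.
Why is that? Recall from Tutorial #11 that the Inception model is easily fooled by a little adversarial noise into classifying any input image as another target class. So it is not hard to imagine that the Inception model can recognize abstract image patterns that are unclear to the human eye. There may be infinitely many images that maximize the features inside a neural network, and humans can only recognize a small fraction of them. This may be why the optimization process only finds abstract image patterns.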
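The point about many different images maximizing the same feature can be illustrated with a deliberately simple toy. In the sketch below, the linear "feature" ignores half of the pixels, so those pixels can take any value without changing the activation. The feature, the sign-based update and all names are assumptions made for illustration and say nothing about the real structure of Inception.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
weights = rng.normal(size=n)
weights[n // 2:] = 0.0          # the toy feature ignores these pixels

def maximize(seed, steps=60):
    # Ascent on the linear toy feature weights @ x, starting from a
    # seed-dependent random image. The gradient of weights @ x is
    # weights, and only its sign is used here as a simplification.
    x = np.random.default_rng(seed).uniform(0.0, 255.0, size=n)
    for _ in range(steps):
        x = np.clip(x + 5.0 * np.sign(weights), 0.0, 255.0)
    return x

a = maximize(seed=1)
b = maximize(seed=2)
print("activations:", weights @ a, weights @ b)
print("images identical:", np.allclose(a, b))
```

Both runs reach the same maximal activation, yet the two images differ in every pixel the feature does not look at.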
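Other methods
The research literature contains many suggestions for guiding the optimization process so as to find image patterns that are more recognizable to humans.
One paper proposes a combination of heuristics for guiding the optimization of the image patterns. The paper shows example images for several classes, such as flamingo, pelican and black swan, all of which are more or less recognizable to the human eye. An implementation of the method is available online (the exact line numbers may change in the future). The method requires a combination of heuristics and fine-tuning of their parameters to generate these images, but the choice of parameters is not entirely clear from the paper. Despite some attempts, I could not reproduce their results. Maybe I have misunderstood the paper, or maybe the heuristics were finely tuned to their network architecture (a variant of AlexNet), whereas this tutorial uses the more advanced Inception model.
Another paper proposes a different method for producing images that are recognizable to the human eye. However, the method is in effect cheating, because it goes through all the images in the training set (e.g. ImageNet) and finds the images that maximally activate a given feature inside the neural network. It then clusters and averages similar images, and uses the result as the initial image for the optimization procedure. So it is no surprise that the method gives better results when it starts from an image constructed from real photos.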
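The idea of starting from real images can be sketched roughly as follows. This is not the paper's implementation: the clustering step is left out, the "training set" is random data, the feature is a toy linear filter, and the names feature_activation and mean_of_top_k are hypothetical. The sketch only shows the mechanics of ranking images by activation and averaging the best ones into an initial image.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a training set: 200 tiny random "images".
dataset = rng.uniform(0.0, 255.0, size=(200, 8, 8, 3))
weights = rng.normal(size=(8, 8, 3))    # toy linear feature detector

def feature_activation(images):
    # Activation of the toy feature for each image in the batch.
    return np.tensordot(images, weights, axes=([1, 2, 3], [0, 1, 2]))

def mean_of_top_k(images, k=10):
    # Average the k images with the highest activation.
    activations = feature_activation(images)
    top_idx = np.argsort(activations)[-k:]
    return images[top_idx].mean(axis=0)

init_image = mean_of_top_k(dataset, k=10)
random_init = rng.uniform(size=(8, 8, 3)) + 128.0
print("activation of averaged top-k image:",
      feature_activation(init_image[None])[0])
print("activation of random initial image:",
      feature_activation(random_init[None])[0])
```

The averaged image could then replace the random initialization used in optimize_image(), so the gradient ascent starts from something that already resembles real data.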
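Exercises
These are a few suggested exercises that may help improve your skills with TensorFlow. Hands-on experience is important for learning how to use TensorFlow properly.
You may want to back up this Notebook before making any changes.
- Try running the optimization several times for features in the lower layers of the network. Are the resulting images always the same?
- Try using fewer or more optimization iterations. How does it affect the image quality?
- Try changing the loss function for the convolutional features. This can be done in different ways. How does it affect the image patterns? Why?
- Do you think the optimizer also amplifies other features than the one we want to maximize? How would you measure this? Are you sure the optimizer only maximizes one feature at a time?
- Try maximizing several features simultaneously.
- Train a smaller network on the MNIST data set and try to visualize its features and layers. Is it easier to see patterns in the images?
- Try implementing the methods from the papers mentioned above.
- Try your own ideas for improving the optimized images.
- Explain to a friend how the program works.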
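As a possible starting point for the exercises on measuring side effects and maximizing several features at once, here is a toy sketch. The "layer" is a random linear map with five features, and the names W, activations and ascend are assumptions. It records how every feature changes when one feature, or the sum of two features, is maximized by gradient ascent.

```python
import numpy as np

rng = np.random.default_rng(0)
num_pixels, num_features = 64, 5
# Toy layer: each row is one linear feature detector (assumption).
W = rng.normal(size=(num_features, num_pixels))

def activations(x):
    # Activation of every feature in the toy layer for image x.
    return W @ x

def ascend(targets, steps=30):
    # Maximize the sum of the target features by gradient ascent
    # and return the change in activation for all features.
    x = rng.uniform(size=num_pixels) + 128.0
    before = activations(x)
    grad = W[targets].sum(axis=0)          # gradient of the summed features
    step_size = 1.0 / (grad.std() + 1e-8)
    for _ in range(steps):
        x = np.clip(x + step_size * grad, 0.0, 255.0)
    return activations(x) - before

single = ascend(targets=[0])
joint = ascend(targets=[0, 1])
print("change when maximizing feature 0:   ", np.round(single, 1))
print("change when maximizing features 0+1:", np.round(joint, 1))
```

In the real model the same measurement would mean fetching the whole layer tensor in session.run() before and after the optimization and comparing the per-feature means.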