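This article demonstrates image recognition on the CIFAR-10 data-set.
Large parts of the text and code are repeated from the earlier tutorials, so readers who have seen those can skim quickly. Previous tutorials: 01 - Simple Linear Model | 02 - Convolutional Neural Network | 03 - PrettyTensor | 04 - Save & Restore | 05 - Ensemble Learning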
by Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube
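Chinese translation by thrillerist / Github
Introduction
This tutorial shows how to build a convolutional neural network for classifying images in the CIFAR-10 data-set. It also shows how to use different networks during training and testing.
It builds on the previous tutorials, so you should have a basic understanding of TensorFlow and the add-on package Pretty Tensor. Much of the code and text is similar to the previous tutorials and can be read quickly if you have already seen them.
Flowchart
The following chart shows roughly how the data flows in the convolutional neural network implemented below. First there is a pre-processing layer which distorts the input images so as to artificially inflate the training-set. Then come two convolutional layers, two fully-connected layers and finally a softmax classification layer. Larger plots further below show the weights and the outputs of the convolutional layers, and Tutorial #02 goes into more detail on how convolution works.
In the example in the chart the image is mis-classified. It actually shows a dog, but the neural network is unsure whether it is a dog or a cat and considers a cat slightly more likely.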
from IPython.display import Image
Image('images/06_network_flowchart.png')
Imports
%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix
import time
from datetime import timedelta
import math
import os
# Use PrettyTensor to simplify Neural Network construction.
import prettytensor as pt
This was developed using Python 3.5.2 (Anaconda) and TensorFlow version:
tf.__version__
'0.12.0-rc0'
PrettyTensor version:
pt.__version__
'0.7.1'
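Load Data
import cifar10
Set the path for storing the data-set on your computer.
# cifar10.data_path = "data/CIFAR-10/"
The CIFAR-10 data-set is about 163 MB and will be downloaded automatically if it is not found in the given path.
cifar10.maybe_download_and_extract()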
Data has apparently already been downloaded and unpacked.
Load the class-names.
class_names = cifar10.load_class_names()
class_names
Loading data: data/CIFAR-10/cifar-10-batches-py/batches.meta
['airplane',
'automobile',
'bird',
'cat',
'deer',
'dog',
'frog',
'horse',
'ship',
'truck']
Load the training-set. This returns the images, the class-numbers as integers, and the class-numbers as One-Hot encoded arrays called labels.
images_train, cls_train, labels_train = cifar10.load_training_data()
Loading data: data/CIFAR-10/cifar-10-batches-py/data_batch_1
Loading data: data/CIFAR-10/cifar-10-batches-py/data_batch_2
Loading data: data/CIFAR-10/cifar-10-batches-py/data_batch_3
Loading data: data/CIFAR-10/cifar-10-batches-py/data_batch_4
Loading data: data/CIFAR-10/cifar-10-batches-py/data_batch_5
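To make the relation between the integer class-numbers and the One-Hot labels concrete, here is a minimal pure-Python sketch. The helper one_hot() is hypothetical and written only for illustration; it is not part of the cifar10 module, which builds its label arrays internally.

```python
def one_hot(class_numbers, num_classes):
    """Return one One-Hot row (a list of floats) per integer class-number."""
    rows = []
    for cls in class_numbers:
        row = [0.0] * num_classes
        row[cls] = 1.0  # Only the position of the true class is set.
        rows.append(row)
    return rows

# Class 3 is 'cat' and class 5 is 'dog' in the class-names listed above.
labels = one_hot([3, 5], num_classes=10)
print(labels[0])
```

Each row has exactly one element set to 1, so the class-number can always be recovered as the index of that element.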
Load the test-set.
images_test, cls_test, labels_test = cifar10.load_test_data()
Loading data: data/CIFAR-10/cifar-10-batches-py/test_batch
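The CIFAR-10 data-set has now been loaded. It consists of 60,000 images and their associated labels (i.e. classifications of the images). The data-set is split into two mutually exclusive sub-sets, the training-set and the test-set.
print("Size of:")
print("- Training-set:\t\t{}".format(len(images_train)))
print("- Test-set:\t\t{}".format(len(images_test)))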
Size of:
- Training-set: 50000
- Test-set: 10000
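Data Dimensions
The data dimensions are used in several places in the source-code below. They are already defined in the cifar10 module, so we only need to import them.
from cifar10 import img_size, num_channels, num_classes
The images are 32 x 32 pixels, but we will crop them to 24 x 24 pixels.
img_size_cropped = 24
Helper-function for plotting images
This function plots 9 images in a 3x3 grid and writes the true and predicted classes below each image.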
def plot_images(images, cls_true, cls_pred=None, smooth=True):
    assert len(images) == len(cls_true) == 9

    # Create figure with sub-plots.
    fig, axes = plt.subplots(3, 3)

    # Adjust vertical spacing if we need to print ensemble and best-net.
    if cls_pred is None:
        hspace = 0.3
    else:
        hspace = 0.6
    fig.subplots_adjust(hspace=hspace, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Interpolation type.
        if smooth:
            interpolation = 'spline16'
        else:
            interpolation = 'nearest'

        # Plot image.
        ax.imshow(images[i, :, :, :],
                  interpolation=interpolation)

        # Name of the true class.
        cls_true_name = class_names[cls_true[i]]

        # Show true and predicted classes.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true_name)
        else:
            # Name of the predicted class.
            cls_pred_name = class_names[cls_pred[i]]
            xlabel = "True: {0}\nPred: {1}".format(cls_true_name, cls_pred_name)

        # Show the classes as the label on the x-axis.
        ax.set_xlabel(xlabel)

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
Plot a few images to see if the data is correct
# Get the first images from the test-set.
images = images_test[0:9]
# Get the true classes for those images.
cls_true = cls_test[0:9]
# Plot the images and labels using our helper-function above.
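plot_images(images=images, cls_true=cls_true, smooth=False)
The pixelated images above are what the neural network gets as input. The images might be a bit easier for the human eye to recognize if we smoothen the pixels.
plot_images(images=images, cls_true=cls_true, smooth=True)
TensorFlow Graph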
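The entire purpose of TensorFlow is to have a so-called computational graph that can be executed much more efficiently than if the same calculations were performed directly in Python. TensorFlow can be more efficient than NumPy because TensorFlow knows the entire computation graph that must be executed, while NumPy only knows the computation of a single mathematical operation at a time.
TensorFlow can also automatically calculate the gradients that are needed to optimize the variables of the graph so as to make the model perform better. This is because the graph is a combination of simple mathematical expressions, so the gradient of the entire graph can be calculated using the chain-rule for derivatives.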
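As a small illustration of the chain-rule idea, the sketch below composes two simple functions, derives the gradient by hand with the chain-rule, and compares it with a numerical estimate. The functions g and f are arbitrary examples chosen for this sketch; this is not how TensorFlow is implemented internally, only the mathematical principle it relies on.

```python
def g(x):
    # Inner function: a simple linear expression.
    return 3.0 * x + 1.0

def f(u):
    # Outer function: squares its input.
    return u * u

def grad_chain_rule(x):
    # d/dx f(g(x)) = f'(g(x)) * g'(x) = 2 * g(x) * 3
    return 2.0 * g(x) * 3.0

def grad_numeric(x, eps=1e-6):
    # Central finite difference as an independent check.
    return (f(g(x + eps)) - f(g(x - eps))) / (2.0 * eps)

x = 2.0
print(grad_chain_rule(x), grad_numeric(x))
```

Both values should agree up to a small numerical error, which is the property that lets a graph of simple operations be differentiated automatically.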
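TensorFlow can also take advantage of multi-core CPUs as well as GPUs, and Google has built special chips for TensorFlow called TPUs (Tensor Processing Units), which are reported to be even faster than GPUs.
A TensorFlow graph consists of the following parts, which are detailed below:
- Placeholder variables used for feeding input into the graph.
- Model variables that are going to be optimized so as to make the model perform better.
- The model itself, which is essentially a set of mathematical functions that calculate some output from the placeholder variables and the model variables.
- A cost measure that is used to guide the optimization of the variables.
- An optimization method which updates the variables of the model.
In addition, the TensorFlow graph may also contain various debugging statements, e.g. for logging data to be displayed using TensorBoard, which is not covered in this tutorial.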
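Placeholder variables
Placeholder variables serve as the input to the graph, and we may change them each time we execute the graph. This is called feeding the placeholder variables and is demonstrated further below.
First we define the placeholder variable for the input images. This allows us to change the images that are input to the TensorFlow graph. It is a so-called tensor, which just means a multi-dimensional array. The data-type is set to float32 and the shape is set to [None, img_size, img_size, num_channels], where None means that the tensor may hold an arbitrary number of images, each image being img_size pixels high and img_size pixels wide with num_channels colour channels.
x = tf.placeholder(tf.float32, shape=[None, img_size, img_size, num_channels], name='x')
Next we define the placeholder variable for the true labels associated with the images in the placeholder variable x. The shape of this placeholder variable is [None, num_classes], which means it may hold an arbitrary number of labels, where each label is a vector of length num_classes, which is 10 in this case.
y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')
We could also have a placeholder variable for the class-number, but we will instead calculate it using argmax. Note that this only adds an operation to the TensorFlow graph; nothing is calculated at this point.
y_true_cls = tf.argmax(y_true, dimension=1)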
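Helper-functions for creating pre-processing
The following helper-functions create the part of the TensorFlow computational graph that pre-processes the input images. Nothing is actually calculated at this point; the functions merely add nodes to the computational graph for TensorFlow.
The pre-processing is different for the training and testing of the neural network:
For training, the input images are randomly cropped and randomly flipped horizontally, and the hue, contrast, brightness and saturation are adjusted with random values. This creates random variations of the original input images and artificially inflates the training-set. Examples of distorted images are shown further below.
For testing, the input images are cropped around the centre and nothing else is adjusted.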
def pre_process_image(image, training):
    # This function takes a single image as input,
    # and a boolean whether to build the training or testing graph.

    if training:
        # For training, add the following to the TensorFlow graph.

        # Randomly crop the input image.
        image = tf.random_crop(image, size=[img_size_cropped, img_size_cropped, num_channels])

        # Randomly flip the image horizontally.
        image = tf.image.random_flip_left_right(image)

        # Randomly adjust hue, contrast, brightness and saturation.
        image = tf.image.random_hue(image, max_delta=0.05)
        image = tf.image.random_contrast(image, lower=0.3, upper=1.0)
        image = tf.image.random_brightness(image, max_delta=0.2)
        image = tf.image.random_saturation(image, lower=0.0, upper=2.0)

        # Some of these functions may overflow and result in pixel
        # values beyond the [0, 1] range. It is unclear from the
        # documentation of TensorFlow 0.10.0rc0 whether this is
        # intended. A simple solution is to limit the range.

        # Limit the image pixels between [0, 1] in case of overflow.
        image = tf.minimum(image, 1.0)
        image = tf.maximum(image, 0.0)
    else:
        # For testing, add the following to the TensorFlow graph.

        # Crop the input image around the centre so it is the same
        # size as images that are randomly cropped during training.
        image = tf.image.resize_image_with_crop_or_pad(image,
                                                       target_height=img_size_cropped,
                                                       target_width=img_size_cropped)

    return image
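The difference between the two cropping modes can be sketched without TensorFlow. The following is a rough pure-Python illustration that treats an image as a list of rows; the helpers crop(), center_crop() and random_crop() are hypothetical names for this sketch and only mimic the windowing idea behind tf.random_crop and a central crop, not their exact implementations.

```python
import random

def crop(image, top, left, size):
    """Cut a size x size window out of an image stored as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]

def center_crop(image, size):
    # Testing: always take the window around the centre of the image.
    top = (len(image) - size) // 2
    left = (len(image[0]) - size) // 2
    return crop(image, top, left, size)

def random_crop(image, size, rng=random):
    # Training: pick a random window that still fits inside the image.
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return crop(image, top, left, size)

# A dummy 32 x 32 "image" whose pixels record their own (row, column).
image = [[(r, c) for c in range(32)] for r in range(32)]
test_crop = center_crop(image, 24)
train_crop = random_crop(image, 24)
print(test_crop[0][0], train_crop[0][0])
```

With 32 x 32 inputs and a 24 x 24 window, the centre crop always starts at offset 4, whereas the random crop may start anywhere from 0 to 8 along each axis, so the same training image yields many slightly shifted variants.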
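The function above is called for each image in the input batch using the following function.
def pre_process(images, training):
    # Use TensorFlow to loop over all the input images and call
    # the function above which takes a single image as input.
    images = tf.map_fn(lambda image: pre_process_image(image, training), images)

    return images
In order to plot the distorted images, we create the pre-processing graph for TensorFlow, so we may execute it later.
distorted_images = pre_process(images=x, training=True)
Helper-function for creating the main processing
The following helper-function creates the main part of the convolutional neural network. It uses Pretty Tensor, which was described in the previous tutorials.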
def main_network(images, training):
    # Wrap the input images as a Pretty Tensor object.
    x_pretty = pt.wrap(images)

    # Pretty Tensor uses special numbers to distinguish between
    # the training and testing phases.
    if training:
        phase = pt.Phase.train
    else:
        phase = pt.Phase.infer

    # Create the convolutional neural network using Pretty Tensor.
    # It is very similar to the previous tutorials, except
    # the use of so-called batch-normalization in the first layer.
    with pt.defaults_scope(activation_fn=tf.nn.relu, phase=phase):
        y_pred, loss = x_pretty.\
            conv2d(kernel=5, depth=64, name='layer_conv1', batch_normalize=True).\
            max_pool(kernel=2, stride=2).\
            conv2d(kernel=5, depth=64, name='layer_conv2').\
            max_pool(kernel=2, stride=2).\
            flatten().\
            fully_connected(size=256, name='layer_fc1').\
            fully_connected(size=128, name='layer_fc2').\
            softmax_classifier(num_classes=num_classes, labels=y_true)

    return y_pred, loss
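Helper-function for creating the neural network
The following helper-function creates the full neural network, which consists of the pre-processing and the main processing defined above.
Note that the neural network is enclosed in the variable-scope named 'network'. This is because we are actually creating two neural networks in the TensorFlow graph. By assigning a variable-scope like this, the variables can be re-used between the two networks, so the variables that are optimized for the training-network are re-used in the network that is used for testing.
def create_network(training):
    # Wrap the neural network in the scope named 'network'.
    # Create new variables during training, and re-use during testing.
    with tf.variable_scope('network', reuse=not training):
        # Just rename the input placeholder variable for convenience.
        images = x

        # Create TensorFlow graph for pre-processing.
        images = pre_process(images=images, training=training)

        # Create TensorFlow graph for the main processing.
        y_pred, loss = main_network(images=images, training=training)

    return y_pred, loss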
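Create the neural network for the training phase
First create a TensorFlow variable that keeps track of the number of optimization iterations performed so far. In the previous tutorials this was a Python variable, but in this tutorial we want it to be saved in the checkpoints together with all the other TensorFlow variables. Note that trainable=False means that TensorFlow will not try to optimize this variable.
global_step = tf.Variable(initial_value=0,
                          name='global_step', trainable=False)
Create the neural network to be used for training. The function create_network() returns both y_pred and loss, but we only need the loss during training.
_, loss = create_network(training=True)
Create an optimizer which will minimize the loss. Also pass the global_step variable to the optimizer, so that it is increased by one after each iteration.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss, global_step=global_step)
Create the neural network for the test phase
Now create the neural network for the test phase. Once again, create_network() returns the predicted class-labels y_pred for the input images, as well as the loss that is used during optimization. During testing we only need y_pred.
y_pred, _ = create_network(training=False)
We then calculate the predicted class-number as an integer. The output of the network y_pred is an array with 10 elements, and the class-number is the index of the largest element in that array.
y_pred_cls = tf.argmax(y_pred, dimension=1)
Then we create a vector of booleans telling us whether the predicted class equals the true class of each image.
correct_prediction = tf.equal(y_pred_cls, y_true_cls)
The classification accuracy is calculated by first type-casting the vector of booleans to floats, so that False becomes 0 and True becomes 1, and then taking the average of these numbers.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))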
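Saver
In order to save the variables of the neural network, so they can be reloaded quickly without having to train the network again, we create a so-called Saver-object which is used for storing and retrieving all the variables of the TensorFlow graph. Nothing is actually saved at this point; that is done further below in the optimize() function.
saver = tf.train.Saver()
Getting the Weights
Further below, we want to plot the weights of the neural network. When the network is constructed using Pretty Tensor, all the variables of the layers are created indirectly by Pretty Tensor. We therefore have to retrieve the variables from TensorFlow.
We used the names layer_conv1 and layer_conv2 for the two convolutional layers. These are also called variable scopes (not to be confused with defaults_scope as described above). Pretty Tensor automatically gives names to the variables it creates for each layer, so we can retrieve the weights for a layer using the layer's scope-name and the variable-name.
The implementation is somewhat awkward because we have to use the TensorFlow function get_variable(), which was designed for another purpose: either creating a new variable or re-using an existing one. The easiest thing is to make the following helper-function.
def get_weights_variable(layer_name):
    # Retrieve an existing variable named 'weights' in the scope
    # with the given layer_name.
    # This is awkward because the TensorFlow function was
    # really intended for another purpose.
    with tf.variable_scope("network/" + layer_name, reuse=True):
        variable = tf.get_variable('weights')

    return variable
Using this helper-function we can retrieve the variables. These are TensorFlow objects. In order to get the contents of the variables, you must do something like: contents = session.run(weights_conv1), as demonstrated further below.
weights_conv1 = get_weights_variable(layer_name='layer_conv1')
weights_conv2 = get_weights_variable(layer_name='layer_conv2')
Getting the Layer Outputs
Similarly, we also need to retrieve the outputs of the convolutional layers. The function for doing this is slightly different from the function above for getting the weights. Here we instead retrieve the last tensor that is output by the convolutional layer.
def get_layer_output(layer_name):
    # The name of the last operation of the convolutional layer.
    # This assumes you are using Relu as the activation-function.
    tensor_name = "network/" + layer_name + "/Relu:0"

    # Get the tensor with this name.
    tensor = tf.get_default_graph().get_tensor_by_name(tensor_name)

    return tensor
Get the outputs of the convolutional layers so we can plot them later.
output_conv1 = get_layer_output(layer_name='layer_conv1')
output_conv2 = get_layer_output(layer_name='layer_conv2')
TensorFlow Run
Create TensorFlow session
Once the TensorFlow graph has been created, we have to create a TensorFlow session which is used to execute the graph.
session = tf.Session()
Restore or initialize variables
Training this neural network may take a long time, especially if you do not have a GPU. We therefore save checkpoints during training so we can continue training at another time (e.g. during the night), and also so we can perform the analysis later without having to train the neural network every time.
If you want to restart the training of the neural network, you have to delete the checkpoints first.
This is the directory used for the checkpoints.
save_dir = 'checkpoints/'
Create the directory if it does not exist.
if not os.path.exists(save_dir):
    os.makedirs(save_dir)
This is the base-filename for the checkpoints; TensorFlow will append the iteration number, etc.
save_path = os.path.join(save_dir, 'cifar10_cnn')
Try to restore the latest checkpoint. This may fail and raise an exception, e.g. if such a checkpoint does not exist, or if you have changed the TensorFlow graph.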
try:
    print("Trying to restore last checkpoint ...")

    # Use TensorFlow to find the latest checkpoint - if any.
    last_chk_path = tf.train.latest_checkpoint(checkpoint_dir=save_dir)

    # Try and load the data in the checkpoint.
    saver.restore(session, save_path=last_chk_path)

    # If we get to this point, the checkpoint was successfully loaded.
    print("Restored checkpoint from:", last_chk_path)
except Exception:
    # If the above failed for some reason, simply
    # initialize all the variables for the TensorFlow graph.
    # (Catching Exception rather than using a bare except lets
    # KeyboardInterrupt and SystemExit propagate normally.)
    print("Failed to restore checkpoint. Initializing variables instead.")
    session.run(tf.global_variables_initializer())
Trying to restore last checkpoint ...
Restored checkpoint from: checkpoints/cifar10_cnn-150000
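Helper-function to get a random training-batch
There are 50,000 images in the training-set. It takes a long time to calculate the gradient of the model using all these images. We therefore only use a small batch of images in each iteration of the optimizer.
If your computer crashes or becomes very slow because it runs out of RAM, then you may try to lower this number, but you may then need to perform more optimization iterations.
train_batch_size = 64
Function for selecting a random batch of images from the training-set.
def random_batch():
    # Number of images in the training-set.
    num_images = len(images_train)

    # Create a random index.
    idx = np.random.choice(num_images,
                           size=train_batch_size,
                           replace=False)

    # Use the random index to select random images and labels.
    x_batch = images_train[idx, :, :, :]
    y_batch = labels_train[idx, :]

    return x_batch, y_batch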
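The key property of the random index above is that it is drawn without replacement, so no image appears twice in the same batch. A minimal standard-library sketch of the same idea follows; random_batch_indices() is a hypothetical helper for illustration and uses random.sample instead of np.random.choice.

```python
import random

def random_batch_indices(num_images, batch_size, rng=random):
    """Draw batch_size distinct indices in the range [0, num_images)."""
    # random.sample never returns the same element twice, which
    # corresponds to replace=False in np.random.choice.
    return rng.sample(range(num_images), batch_size)

idx = random_batch_indices(num_images=50000, batch_size=64)
print(len(idx), len(set(idx)))
```

The indices would then be used to select the matching images and labels, exactly as x_batch and y_batch are selected in random_batch().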
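Helper-function to perform optimization iterations
This function performs a number of optimization iterations so as to gradually improve the variables of the network layers. In each iteration, a new batch of data is selected from the training-set and TensorFlow then executes the optimizer using those training samples. The progress is printed every 100 iterations. A checkpoint is saved every 1000 iterations and also after the last iteration.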
def optimize(num_iterations):
    # Start-time used for printing time-usage below.
    start_time = time.time()

    for i in range(num_iterations):
        # Get a batch of training examples.
        # x_batch now holds a batch of images and
        # y_true_batch are the true labels for those images.
        x_batch, y_true_batch = random_batch()

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        # We also want to retrieve the global_step counter.
        i_global, _ = session.run([global_step, optimizer],
                                  feed_dict=feed_dict_train)

        # Print status to screen every 100 iterations (and last).
        if (i_global % 100 == 0) or (i == num_iterations - 1):
            # Calculate the accuracy on the training-batch.
            batch_acc = session.run(accuracy,
                                    feed_dict=feed_dict_train)

            # Print status.
            msg = "Global Step: {0:>6}, Training Batch Accuracy: {1:>6.1%}"
            print(msg.format(i_global, batch_acc))

        # Save a checkpoint to disk every 1000 iterations (and last).
        if (i_global % 1000 == 0) or (i == num_iterations - 1):
            # Save all variables of the TensorFlow graph to a
            # checkpoint. Append the global_step counter
            # to the filename so we save the last several checkpoints.
            saver.save(session,
                       save_path=save_path,
                       global_step=global_step)

            print("Saved checkpoint.")

    # Ending time.
    end_time = time.time()

    # Difference between start and end-times.
    time_dif = end_time - start_time

    # Print the time-usage.
    print("Time usage: " + str(timedelta(seconds=int(round(time_dif)))))
Helper-function to plot example errors
Function for plotting examples of images from the test-set that have been mis-classified.
def plot_example_errors(cls_pred, correct):
    # This function is called from print_test_accuracy() below.

    # cls_pred is an array of the predicted class-number for
    # all images in the test-set.

    # correct is a boolean array whether the predicted class
    # is equal to the true class for each image in the test-set.

    # Negate the boolean array.
    incorrect = (correct == False)

    # Get the images from the test-set that have been
    # incorrectly classified.
    images = images_test[incorrect]

    # Get the predicted classes for those images.
    cls_pred = cls_pred[incorrect]

    # Get the true classes for those images.
    cls_true = cls_test[incorrect]

    # Plot the first 9 images.
    plot_images(images=images[0:9],
                cls_true=cls_true[0:9],
                cls_pred=cls_pred[0:9])
Helper-function to plot the confusion matrix
def plot_confusion_matrix(cls_pred):
    # This is called from print_test_accuracy() below.

    # cls_pred is an array of the predicted class-number for
    # all images in the test-set.

    # Get the confusion matrix using sklearn.
    cm = confusion_matrix(y_true=cls_test,  # True class for test-set.
                          y_pred=cls_pred)  # Predicted class.

    # Print the confusion matrix as text.
    for i in range(num_classes):
        # Append the class-name to each line.
        class_name = "({}) {}".format(i, class_names[i])
        print(cm[i, :], class_name)

    # Print the class-numbers for easy reference.
    class_numbers = [" ({0})".format(i) for i in range(num_classes)]
    print("".join(class_numbers))
Helper-functions for calculating classifications
This function calculates the predicted classes of images and also returns a boolean array telling whether the classification of each image is correct.
The calculation is done in batches because it might otherwise use too much RAM. If your computer crashes then you can try to lower the batch-size.
# Split the data-set in batches of this size to limit RAM usage.
batch_size = 256

def predict_cls(images, labels, cls_true):
    # Number of images.
    num_images = len(images)

    # Allocate an array for the predicted classes which
    # will be calculated in batches and filled into this array.
    # The built-in int is used as the dtype because the np.int
    # alias has been removed from recent NumPy versions.
    cls_pred = np.zeros(shape=num_images, dtype=int)

    # Now calculate the predicted classes for the batches.
    # We will just iterate through all the batches.
    # There might be a more clever and Pythonic way of doing this.

    # The starting index for the next batch is denoted i.
    i = 0

    while i < num_images:
        # The ending index for the next batch is denoted j.
        j = min(i + batch_size, num_images)

        # Create a feed-dict with the images and labels
        # between index i and j.
        feed_dict = {x: images[i:j, :],
                     y_true: labels[i:j, :]}

        # Calculate the predicted class using TensorFlow.
        cls_pred[i:j] = session.run(y_pred_cls, feed_dict=feed_dict)

        # Set the start-index for the next batch to the
        # end-index of the current batch.
        i = j

    # Create a boolean array whether each image is correctly classified.
    correct = (cls_true == cls_pred)

    return correct, cls_pred
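The batching logic in predict_cls() can be isolated from TensorFlow to see exactly which index ranges are processed. The generator batch_ranges() below is a hypothetical helper written for this sketch; it reproduces the same i/j bookkeeping as the while-loop above.

```python
def batch_ranges(num_images, batch_size):
    """Yield (start, end) index pairs that cover all images in order."""
    i = 0
    while i < num_images:
        # The last batch may be smaller than batch_size.
        j = min(i + batch_size, num_images)
        yield i, j
        i = j

# The test-set has 10,000 images and the batch-size above is 256.
ranges = list(batch_ranges(num_images=10000, batch_size=256))
print(len(ranges), ranges[0], ranges[-1])
```

Because 10,000 is not a multiple of 256, the final batch holds only the remaining 16 images, which is why the end-index is clamped with min().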
Calculate the predicted class for the test-set.
def predict_cls_test():
    return predict_cls(images = images_test,
                       labels = labels_test,
                       cls_true = cls_test)
Helper-functions for the classification accuracy
This function calculates the classification accuracy given a boolean array telling whether each image was correctly classified. For example, classification_accuracy() applied to the boolean array [True, True, False, False, False] gives 2/5 = 0.4. The function also returns the number of correct classifications.
def classification_accuracy(correct):
    # When averaging a boolean array, False means 0 and True means 1.
    # So we are calculating: number of True / len(correct) which is
    # the same as the classification accuracy.

    # Return the classification accuracy
    # and the number of correct classifications.
    return correct.mean(), correct.sum()
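The worked example from the text can be checked with a plain Python list. Note that classification_accuracy() itself expects a NumPy array, since lists have no mean() method; the function classification_accuracy_sketch() below is a hypothetical list-based stand-in used only for illustration.

```python
def classification_accuracy_sketch(correct):
    """Return (accuracy, number of correct classifications) for a list of booleans."""
    # In Python, True counts as 1 and False as 0 when summed.
    num_correct = sum(correct)
    return num_correct / len(correct), num_correct

acc, num_correct = classification_accuracy_sketch([True, True, False, False, False])
print(acc, num_correct)  # 0.4 2
```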
Helper-function for showing the performance
Function for printing the classification accuracy on the test-set.
It takes a while to compute the classification for all the images in the test-set, so the results are computed once here and passed directly to the functions above, which then do not have to recalculate the classifications.
def print_test_accuracy(show_example_errors=False,
                        show_confusion_matrix=False):

    # For all the images in the test-set,
    # calculate the predicted classes and whether they are correct.
    correct, cls_pred = predict_cls_test()

    # Classification accuracy and the number of correct classifications.
    acc, num_correct = classification_accuracy(correct)

    # Number of images being classified.
    num_images = len(correct)

    # Print the accuracy.
    msg = "Accuracy on Test-Set: {0:.1%} ({1} / {2})"
    print(msg.format(acc, num_correct, num_images))

    # Plot some examples of mis-classifications, if desired.
    if show_example_errors:
        print("Example errors:")
        plot_example_errors(cls_pred=cls_pred, correct=correct)

    # Plot the confusion matrix, if desired.
    if show_confusion_matrix:
        print("Confusion Matrix:")
        plot_confusion_matrix(cls_pred=cls_pred)
Helper-function for plotting convolutional weights
def plot_conv_weights(weights, input_channel=0):
    # Assume weights are TensorFlow ops for 4-dim variables
    # e.g. weights_conv1 or weights_conv2.

    # Retrieve the values of the weight-variables from TensorFlow.
    # A feed-dict is not necessary because nothing is calculated.
    w = session.run(weights)

    # Print statistics for the weights.
    print("Min:  {0:.5f}, Max:   {1:.5f}".format(w.min(), w.max()))
    print("Mean: {0:.5f}, Stdev: {1:.5f}".format(w.mean(), w.std()))

    # Get the lowest and highest values for the weights.
    # This is used to correct the colour intensity across
    # the images so they can be compared with each other.
    w_min = np.min(w)
    w_max = np.max(w)
    abs_max = max(abs(w_min), abs(w_max))

    # Number of filters used in the conv. layer.
    num_filters = w.shape[3]

    # Number of grids to plot.
    # Rounded-up, square-root of the number of filters.
    num_grids = math.ceil(math.sqrt(num_filters))

    # Create figure with a grid of sub-plots.
    fig, axes = plt.subplots(num_grids, num_grids)

    # Plot all the filter-weights.
    for i, ax in enumerate(axes.flat):
        # Only plot the valid filter-weights.
        if i < num_filters:
            # Get the weights for the i'th filter of the input channel.
            # The format of this 4-dim tensor is determined by the
            # TensorFlow API. See Tutorial #02 for more details.
            img = w[:, :, input_channel, i]

            # Plot image.
            ax.imshow(img, vmin=-abs_max, vmax=abs_max,
                      interpolation='nearest', cmap='seismic')

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
Helper-function for plotting the output of convolutional layers
def plot_layer_output(layer_output, image):
    # Assume layer_output is a 4-dim tensor
    # e.g. output_conv1 or output_conv2.

    # Create a feed-dict which holds the single input image.
    # Note that TensorFlow needs a list of images,
    # so we just create a list with this one image.
    feed_dict = {x: [image]}

    # Retrieve the output of the layer after inputting this image.
    values = session.run(layer_output, feed_dict=feed_dict)

    # Get the lowest and highest values.
    # This is used to correct the colour intensity across
    # the images so they can be compared with each other.
    values_min = np.min(values)
    values_max = np.max(values)

    # Number of image channels output by the conv. layer.
    num_images = values.shape[3]

    # Number of grid-cells to plot.
    # Rounded-up, square-root of the number of output channels.
    num_grids = math.ceil(math.sqrt(num_images))

    # Create figure with a grid of sub-plots.
    fig, axes = plt.subplots(num_grids, num_grids)

    # Plot all the output channels.
    for i, ax in enumerate(axes.flat):
        # Only plot the valid image-channels.
        if i < num_images:
            # Get the images for the i'th output channel.
            img = values[0, :, :, i]

            # Plot image.
            ax.imshow(img, vmin=values_min, vmax=values_max,
                      interpolation='nearest', cmap='binary')

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
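Examples of distorted input images
In order to artificially inflate the number of images available for training, the neural network uses pre-processing with random distortions of the input images. This is intended to make the neural network more flexible at recognizing and classifying images.
This is a helper-function for plotting distorted input images.
def plot_distorted_image(image, cls_true):
    # Repeat the input image 9 times.
    image_duplicates = np.repeat(image[np.newaxis, :, :, :], 9, axis=0)

    # Create a feed-dict for TensorFlow.
    feed_dict = {x: image_duplicates}

    # Calculate only the pre-processing of the TensorFlow graph
    # which distorts the images in the feed-dict.
    result = session.run(distorted_images, feed_dict=feed_dict)

    # Plot the images.
    plot_images(images=result, cls_true=np.repeat(cls_true, 9))
Helper-function for getting an image and its class-number from the test-set.
def get_test_image(i):
    return images_test[i, :, :, :], cls_test[i]
Get an image and its true class from the test-set.
img, cls = get_test_image(16)
Plot 9 random distortions of the image. If you re-run this code you will get slightly different results.
plot_distorted_image(img, cls)
Perform optimization
My laptop computer is a quad-core with 2 GHz per core. It has a GPU, but it is not fast enough for TensorFlow, so only the CPU is used. It takes about 1 hour to perform 10,000 optimization iterations on this computer. For this tutorial I performed 150,000 optimization iterations, which took about 15 hours in total. I let it run during the night and at various points during the day.
Because checkpoints are saved during optimization, and the latest checkpoint is restored when the code is re-run, we can stop and continue the optimization later.
if False:
    optimize(num_iterations=1000)
Results
After 150,000 optimization iterations, the classification accuracy on the test-set is about 79-80%. Examples of mis-classifications are plotted below. Some of these are difficult to recognize even for humans, and others are reasonable mistakes, e.g. between an automobile and a truck or between a cat and a dog, while other mistakes seem a bit strange.
print_test_accuracy(show_example_errors=True,
                    show_confusion_matrix=True)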
Accuracy on Test-Set: 79.3% (7932 / 10000)
Example errors:
Confusion Matrix:
[775 20 71 8 14 4 18 10 44 36] (0) airplane
[ 7 914 5 0 3 7 9 3 14 38] (1) automobile
[ 32 2 724 28 42 44 94 17 9 8] (2) bird
[ 18 7 48 508 56 209 99 29 7 19] (3) cat
[ 4 2 45 25 769 29 75 43 3 5] (4) deer
[ 8 6 34 89 35 748 38 32 1 9] (5) dog
[ 4 2 18 9 14 14 930 4 2 3] (6) frog
[ 6 2 23 18 31 55 17 833 0 15] (7) horse
[ 31 29 15 11 8 7 15 0 856 28] (8) ship
[ 13 67 4 5 0 4 7 7 18 875] (9) truck
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9)
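The confusion matrix printed above can be analysed a little further. The sketch below copies the printed numbers into a plain Python list (rows are true classes, columns are predicted classes) and derives the overall accuracy, the per-class accuracy, and the most common wrong prediction for each class. The variable and function names are chosen for this illustration only.

```python
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Confusion matrix copied from the output above.
cm = [
    [775,  20,  71,   8,  14,   4,  18,  10,  44,  36],
    [  7, 914,   5,   0,   3,   7,   9,   3,  14,  38],
    [ 32,   2, 724,  28,  42,  44,  94,  17,   9,   8],
    [ 18,   7,  48, 508,  56, 209,  99,  29,   7,  19],
    [  4,   2,  45,  25, 769,  29,  75,  43,   3,   5],
    [  8,   6,  34,  89,  35, 748,  38,  32,   1,   9],
    [  4,   2,  18,   9,  14,  14, 930,   4,   2,   3],
    [  6,   2,  23,  18,  31,  55,  17, 833,   0,  15],
    [ 31,  29,  15,  11,   8,   7,  15,   0, 856,  28],
    [ 13,  67,   4,   5,   0,   4,   7,   7,  18, 875],
]

# Correct classifications are on the diagonal.
num_correct = sum(cm[i][i] for i in range(len(cm)))
num_images = sum(sum(row) for row in cm)
accuracy = num_correct / num_images

# Per-class accuracy: fraction of each true class that was predicted correctly.
per_class = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]

def most_confused_with(i):
    """Index of the wrong class most often predicted for true class i."""
    wrong = [j for j in range(len(cm)) if j != i]
    return max(wrong, key=lambda j: cm[i][j])

print("Accuracy: {0:.1%} ({1} / {2})".format(accuracy, num_correct, num_images))
for i, name in enumerate(class_names):
    other = class_names[most_confused_with(i)]
    print("{0:<10} {1:.1%}  most often confused with {2}".format(name, per_class[i], other))
```

The diagonal sums to the same 7932 correct classifications reported above. The cat class stands out with by far the lowest per-class accuracy, and its most frequent wrong prediction is dog, which matches the remark about reasonable mistakes between cats and dogs.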
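Convolutional Weights
The following shows some of the weights (or filters) for the first convolutional layer. There are 3 input channels, so there are 3 of these sets, which you may plot by changing input_channel.
Note that positive weights are red and negative weights are blue.
plot_conv_weights(weights=weights_conv1, input_channel=0)
Min: -0.61643, Max: 0.63949
Mean: -0.00177, Stdev: 0.16469
The following shows some of the weights (or filters) for the second convolutional layer. They are closer to zero than the weights of the first convolutional layer, as can be seen from the lower standard deviation.
plot_conv_weights(weights=weights_conv2, input_channel=1)
Min: -0.73326, Max: 0.25344
Mean: -0.00394, Stdev: 0.05466
Output of convolutional layers
Helper-function for plotting an image.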
def plot_image(image):
    # Create figure with sub-plots.
    fig, axes = plt.subplots(1, 2)

    # References to the sub-plots.
    ax0 = axes.flat[0]
    ax1 = axes.flat[1]

    # Show raw and smoothened images in sub-plots.
    ax0.imshow(image, interpolation='nearest')
    ax1.imshow(image, interpolation='spline16')

    # Set labels.
    ax0.set_xlabel('Raw')
    ax1.set_xlabel('Smooth')

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
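Plot an image from the test-set. The raw pixelated image is used as input to the neural network.
img, cls = get_test_image(16)
plot_image(img)
Use the raw image as input to the neural network and plot the output of the first convolutional layer.
plot_layer_output(output_conv1, image=img)
Using the same image as input, plot the output of the second convolutional layer.
plot_layer_output(output_conv2, image=img)
Predicted class-labels
Get the predicted class-label and class-number for this image.
label_pred, cls_pred = session.run([y_pred, y_pred_cls],
                                   feed_dict={x: [img]})
Print the predicted class-label.
# Set the rounding options for numpy.
np.set_printoptions(precision=3, suppress=True)

# Print the predicted label.
print(label_pred[0])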
[ 0. 0. 0. 0.493 0. 0.49 0.006 0.01 0. 0. ]
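The predicted class-label is an array of length 10, where each element indicates how confident the neural network is that the image belongs to the given class.
In this case the element with index 3 has a value of 0.493, while the element with index 5 has a value of 0.490. This means the neural network believes the image shows either class 3 or class 5, i.e. a cat or a dog, respectively.
class_names[3]
'cat'
class_names[5]
'dog'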
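The step from the predicted label to a class-number is just an argmax. As a small standard-library sketch, the rounded values printed above are copied into a list and the two most likely classes are looked up; the variable names are chosen for this illustration.

```python
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Rounded predicted label copied from the output above.
label_pred = [0.0, 0.0, 0.0, 0.493, 0.0, 0.49, 0.006, 0.01, 0.0, 0.0]

# The predicted class-number is the index of the largest element.
cls_pred = max(range(len(label_pred)), key=lambda i: label_pred[i])

# The two highest-scoring classes, best first.
top2 = sorted(range(len(label_pred)), key=lambda i: label_pred[i], reverse=True)[:2]

print(class_names[cls_pred], [class_names[i] for i in top2])
```

The margin between the two top classes is only 0.003, which shows how uncertain the network is for this image even though the argmax reports a single class.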
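Close TensorFlow Session
We are now done using TensorFlow, so we close the session to release its resources.
# This has been commented out in case you want to modify and experiment
# with the Notebook without having to restart it.
# session.close()
Conclusion
This tutorial showed how to build a convolutional neural network for classifying images in the CIFAR-10 data-set. The classification accuracy on the test-set was about 79-80%.
The outputs of the convolutional layers were also plotted, but it is difficult to see from them how the neural network recognizes and classifies the input images. Better visualization techniques are needed.
Exercises
These are a few suggestions for exercises that may help improve your skills with TensorFlow. Hands-on experience is important for learning how to use TensorFlow properly.
You may want to back up this Notebook before making any changes.
- Run 10,000 optimization iterations and see what the classification accuracy is. This will create a checkpoint that saves all the variables of the TensorFlow graph.
- Run another 100,000 optimization iterations and see if the classification accuracy has improved. Then run another 100,000 iterations. Does the accuracy improve, and do you think it is worth the extra computation time?
- Try changing the image distortions in the pre-processing.
- Try changing the structure of the neural network. You can make the network larger or smaller. How does this affect the training time and the classification accuracy? Note that the checkpoints cannot be reloaded once you have changed the structure of the neural network.
- Try using batch-normalization for the second convolutional layer as well. Also try removing it from both layers.
- Research some of the better neural networks for CIFAR-10 and try to implement them.
- Explain to a friend how the program works.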