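[Translation] TensorFlow Tutorial #05 - Ensemble Learning

Posted by 活魚眼 on 2017-07-27

Header image from Combining Classifiers
This article introduces ensembles of neural networks.
Large parts of the text and code repeat the earlier tutorials; if you have read those, you can skip ahead to the section on creating the ensemble of neural networks below.

01 - Simple Linear Model | 02 - Convolutional Neural Network | 03 - PrettyTensor | 04 - Save & Restore

by Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube
Chinese translation by thrillerist / GitHub

If you repost this article, please include a link to it.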

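Introduction

This tutorial introduces ensembles of convolutional neural networks. Instead of using a single neural network, we use several and average their outputs.

The task is again to recognize handwritten digits in the MNIST data-set. The ensemble improves the classification accuracy on the test-set slightly, but the difference is so small that it may be due to chance. Furthermore, the ensemble mis-classifies some images that individual networks classify correctly.

This tutorial builds on the previous ones, so you should be familiar with basic TensorFlow and the add-on package Pretty Tensor. Much of the code and text is similar to the earlier tutorials, so you can skim it if you have already read them.

Flowchart

The following chart shows roughly how the data flows in the convolutional neural network implemented below. The network has two convolutional layers and two fully-connected layers, the last of which classifies the input image. See Tutorial #02 for a more detailed description of the network and of convolution.

This tutorial implements an ensemble of 5 such neural networks, which have the same structure but different weights and other variables.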

from IPython.display import Image
Image('images/02_network_flowchart.png')

Imports

%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix
import time
from datetime import timedelta
import math
import os

# Use PrettyTensor to simplify Neural Network construction.
import prettytensor as pt

This was developed using Python 3.5.2 (Anaconda) and TensorFlow version:

tf.__version__

'0.12.0-rc0'

PrettyTensor version:

pt.__version__

'0.7.1'

Load Data

The MNIST data-set is about 12 MB and is downloaded automatically if it is not found in the given path.

from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets('data/MNIST/', one_hot=True)

Extracting data/MNIST/train-images-idx3-ubyte.gz
Extracting data/MNIST/train-labels-idx1-ubyte.gz
Extracting data/MNIST/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/t10k-labels-idx1-ubyte.gz

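The MNIST data-set has now been loaded. It consists of 70,000 images and their associated labels (i.e. the class of each image). The data-set is split into 3 mutually exclusive sub-sets, but we will create our own random training-sets further below.

print("Size of:")
print("- Training-set:\t\t{}".format(len(data.train.labels)))
print("- Test-set:\t\t{}".format(len(data.test.labels)))
print("- Validation-set:\t{}".format(len(data.validation.labels)))

Size of:
- Training-set:     55000
- Test-set:         10000
- Validation-set:   5000

Class numbers

The class labels are One-Hot encoded, which means that each label is a vector of length 10 in which all elements are zero except for one. The index of that one element is the class number, i.e. the digit shown in the corresponding image. We also need the class numbers as integers for the test- and validation-sets, so we calculate them here.

data.test.cls = np.argmax(data.test.labels, axis=1)
data.validation.cls = np.argmax(data.validation.labels, axis=1)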
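As a minimal sketch of the relation between One-Hot labels and class numbers, the round trip can be tried on a few made-up labels. The array cls below is hypothetical toy data, not part of the MNIST data-set.

```python
import numpy as np

# Hypothetical toy labels: three images showing the digits 3, 0 and 9.
cls = np.array([3, 0, 9])
num_classes = 10

# One-Hot encode: row i is all zeros except for a 1 at column cls[i].
one_hot = np.eye(num_classes)[cls]

# Decode again: the index of the single non-zero element is the class number.
decoded = np.argmax(one_hot, axis=1)

print(one_hot[0])
print(decoded)
```

This is the same np.argmax call that is applied to data.test.labels and data.validation.labels above.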

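Helper-function for creating random training-sets

We will train 5 different neural networks on training-sets that are selected at random. First we combine the original training- and validation-sets into one big array. This is done for both the images and the labels.

combined_images = np.concatenate([data.train.images, data.validation.images], axis=0)
combined_labels = np.concatenate([data.train.labels, data.validation.labels], axis=0)

Check that the shapes of the combined arrays are correct.

print(combined_images.shape)
print(combined_labels.shape)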

(60000, 784)
(60000, 10)

Size of the combined data-set.

combined_size = len(combined_images)
combined_size

60000

Define the size of the training-set used for each neural network. You can try changing this.

train_size = int(0.8 * combined_size)
train_size

48000

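The validation-set is not used during training, but this would be its size.

validation_size = combined_size - train_size
validation_size

12000

Helper-function for splitting the combined data-set into a random training-set and validation-set.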

def random_training_set():
    # Create a randomized index into the full / combined training-set.
    idx = np.random.permutation(combined_size)

    # Split the random index into training- and validation-sets.
    idx_train = idx[0:train_size]
    idx_validation = idx[train_size:]

    # Select the images and labels for the new training-set.
    x_train = combined_images[idx_train, :]
    y_train = combined_labels[idx_train, :]

    # Select the images and labels for the new validation-set.
    x_validation = combined_images[idx_validation, :]
    y_validation = combined_labels[idx_validation, :]

    # Return the new training- and validation-sets.
    return x_train, y_train, x_validation, y_validation
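The splitting idea can be checked on a small scale. The following is only a sketch on index arrays: random_split and the fixed seed are illustrative assumptions, not part of the tutorial code. It shows that cutting one permutation in two yields parts that never overlap and together cover every item.

```python
import numpy as np

def random_split(num_items, train_size, rng):
    # Shuffle all indices once, then cut the shuffled order in two.
    idx = rng.permutation(num_items)
    return idx[:train_size], idx[train_size:]

# Fixed seed only to make this sketch repeatable.
rng = np.random.RandomState(0)
idx_train, idx_validation = random_split(num_items=10, train_size=8, rng=rng)

# No index appears in both parts, and together they cover every item.
overlap = np.intersect1d(idx_train, idx_validation)
covered = np.sort(np.concatenate([idx_train, idx_validation]))

print(len(idx_train), len(idx_validation), len(overlap))
```

Because every call draws a new permutation, each of the 5 networks gets a different 80% subset of the same 60,000 images.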

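Data dimensions

The data dimensions are used in several places in the source-code below. They are defined once, so we can use these variables instead of numbers throughout the code.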

# We know that MNIST images are 28 pixels in each dimension.
img_size = 28

# Images are stored in one-dimensional arrays of this length.
img_size_flat = img_size * img_size

# Tuple with height and width of images used to reshape arrays.
img_shape = (img_size, img_size)

# Number of colour channels for the images: 1 channel for gray-scale.
num_channels = 1

# Number of classes, one class for each of 10 digits.
num_classes = 10

Helper-function for plotting images

This function plots 9 images in a 3x3 grid and writes the true and predicted classes below each image.

def plot_images(images,                  # Images to plot, 2-d array.
                cls_true,                # True class-no for images.
                ensemble_cls_pred=None,  # Ensemble predicted class-no.
                best_cls_pred=None):     # Best-net predicted class-no.

    assert len(images) == len(cls_true)

    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)

    # Adjust vertical spacing if we need to print ensemble and best-net.
    if ensemble_cls_pred is None:
        hspace = 0.3
    else:
        hspace = 1.0
    fig.subplots_adjust(hspace=hspace, wspace=0.3)

    # For each of the sub-plots.
    for i, ax in enumerate(axes.flat):

        # There may not be enough images for all sub-plots.
        if i < len(images):
            # Plot image.
            ax.imshow(images[i].reshape(img_shape), cmap='binary')

            # Show true and predicted classes.
            if ensemble_cls_pred is None:
                xlabel = "True: {0}".format(cls_true[i])
            else:
                msg = "True: {0}\nEnsemble: {1}\nBest Net: {2}"
                xlabel = msg.format(cls_true[i],
                                    ensemble_cls_pred[i],
                                    best_cls_pred[i])

            # Show the classes as the label on the x-axis.
            ax.set_xlabel(xlabel)

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Plot a few images to see if the data is correct

# Get the first images from the test-set.
images = data.test.images[0:9]

# Get the true classes for those images.
cls_true = data.test.cls[0:9]

# Plot the images and labels using our helper-function above.
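plot_images(images=images, cls_true=cls_true)

TensorFlow Graph

The entire purpose of TensorFlow is to have a so-called computational graph that can be executed much more efficiently than if the same calculations were performed directly in Python. TensorFlow can be more efficient than NumPy because TensorFlow knows the entire computational graph that must be executed, while NumPy only knows the computation of a single mathematical operation at a time.

TensorFlow can also automatically calculate the gradients that are needed to optimize the variables of the graph so as to make the model perform better. This is because the graph is a combination of simple mathematical expressions, so the gradient of the entire graph can be calculated using the chain-rule for derivatives.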
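To make the chain-rule remark concrete, here is a tiny hand-written sketch in plain Python. It is not how TensorFlow is implemented; the functions loss and grad_chain_rule and the numbers are made up for illustration. The gradient of a two-step expression is the product of the derivatives of its steps, and a finite-difference estimate agrees with it.

```python
# Toy "graph" with two steps: u = w * x + b, then loss = u ** 2.
def loss(w, x=3.0, b=1.0):
    u = w * x + b
    return u ** 2

def grad_chain_rule(w, x=3.0, b=1.0):
    u = w * x + b
    dloss_du = 2.0 * u   # derivative of u ** 2 with respect to u
    du_dw = x            # derivative of w * x + b with respect to w
    return dloss_du * du_dw

w = 0.5
eps = 1e-6

# Central finite difference as an independent check of the gradient.
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
analytic = grad_chain_rule(w)

print(analytic, numeric)
```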

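TensorFlow can also take advantage of multi-core CPUs as well as GPUs, and Google has even built special chips just for TensorFlow, called TPUs (Tensor Processing Units), which are even faster than GPUs.

A TensorFlow graph consists of the following parts, which are detailed below:

- Placeholder variables used for feeding input into the graph.
- Model variables that are going to be optimized so as to make the model perform better.
- The model, which is essentially just a mathematical function that calculates some output given the input in the placeholder variables and the model variables.
- A cost measure that can be used to guide the optimization of the variables.
- An optimization method which updates the variables of the model.

In addition, the TensorFlow graph may also contain various debugging statements, e.g. for logging data to be displayed using TensorBoard, which is not covered in this tutorial.

Placeholder variables

Placeholder variables serve as the input to the graph, and we may change them each time we execute the graph. We call this feeding the placeholder variables, and it is demonstrated further below.

First we define the placeholder variable for the input images. This allows us to change the images that are input to the TensorFlow graph. This is a so-called tensor, which just means that it is a multi-dimensional vector or matrix. The data-type is set to float32 and the shape is set to [None, img_size_flat], where None means that the tensor may hold an arbitrary number of images, each image being a vector of length img_size_flat.

x = tf.placeholder(tf.float32, shape=[None, img_size_flat], name='x')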

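The convolutional layers expect x to be encoded as a 4-dim tensor, so we have to reshape it to [num_images, img_height, img_width, num_channels]. Note that img_height == img_width == img_size, and that num_images can be inferred automatically by using -1 for the size of the first dimension. So the reshape operation is:

x_image = tf.reshape(x, [-1, img_size, img_size, num_channels])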
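The effect of -1 can be tried outside TensorFlow as well. This sketch uses NumPy's reshape, which infers a -1 dimension in the same way; the batch of 5 all-zero images is a made-up example in which only the shape matters.

```python
import numpy as np

img_size = 28
num_channels = 1

# Hypothetical batch of 5 flattened images (all zeros, only the shape matters).
x_flat = np.zeros((5, img_size * img_size), dtype=np.float32)

# -1 lets the library infer the first dimension from the total number of elements.
x_image = x_flat.reshape(-1, img_size, img_size, num_channels)

print(x_image.shape)  # (5, 28, 28, 1)
```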

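Next we have the placeholder variable for the true labels associated with the images that are input in the placeholder variable x. The shape of this placeholder variable is [None, num_classes], which means it may hold an arbitrary number of labels, each label being a vector of length num_classes, which is 10 in this case.

y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')

We could also have a placeholder variable for the class-number, but we will instead calculate it using argmax. Note that this is a TensorFlow operator, so nothing is calculated at this point.

y_true_cls = tf.argmax(y_true, dimension=1)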

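Neural Network

This section implements the convolutional neural network using Pretty Tensor, which is much simpler than implementing it directly in TensorFlow, see Tutorial #03.

The basic idea is to wrap the input tensor x_image in a Pretty Tensor object, which has helper-functions for adding new computational layers so as to create an entire neural network. Pretty Tensor takes care of the variable allocation, etc.

x_pretty = pt.wrap(x_image)

Now that we have wrapped the input image in a Pretty Tensor object, we can add the convolutional and fully-connected layers in just a few lines of source-code.

Note that pt.defaults_scope(activation_fn=tf.nn.relu) makes activation_fn=tf.nn.relu an argument for each of the layers constructed inside the with-block, so that Rectified Linear Units (ReLU) are used for each of these layers. The defaults_scope makes it easy to change arguments for all of the layers.

with pt.defaults_scope(activation_fn=tf.nn.relu):
    y_pred, loss = x_pretty.\
        conv2d(kernel=5, depth=16, name='layer_conv1').\
        max_pool(kernel=2, stride=2).\
        conv2d(kernel=5, depth=36, name='layer_conv2').\
        max_pool(kernel=2, stride=2).\
        flatten().\
        fully_connected(size=128, name='layer_fc1').\
        softmax_classifier(num_classes=num_classes, labels=y_true)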
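It can help to trace how the image size changes through these layers. The sketch below is plain arithmetic, not Pretty Tensor code, and it assumes that both the convolutional and the pooling layers use 'SAME' padding, so that only the stride changes the width and height. The helper out_size is made up for this illustration.

```python
import math

def out_size(size, stride):
    # Assumption: 'SAME' padding, where the output size is ceil(size / stride).
    return int(math.ceil(size / float(stride)))

size = 28     # MNIST images are 28 x 28 pixels.
shapes = []

# (layer name, stride, output depth) in the order used in the network above.
layers = [("layer_conv1", 1, 16), ("max_pool", 2, 16),
          ("layer_conv2", 1, 36), ("max_pool", 2, 36)]

for name, stride, depth in layers:
    size = out_size(size, stride)
    shapes.append((name, size, size, depth))

# flatten() turns the last (height, width, depth) volume into one vector.
num_features = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]

for shape in shapes:
    print(shape)
print("flatten:", num_features)
```

Under that assumption the fully-connected layer layer_fc1 receives 7 * 7 * 36 = 1764 input features per image.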

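Optimization Method

Pretty Tensor gave us the predicted class-label (y_pred) as well as a loss-measure that must be minimized, so as to improve the ability of the neural network to classify the input images.

It is unclear from the documentation for Pretty Tensor whether the loss-measure is cross-entropy or something else. But we now use the AdamOptimizer to minimize the loss.

Note that optimization is not performed at this point. In fact, nothing is calculated at all; we just add the optimizer-object to the TensorFlow graph for later execution.

optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)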
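For reference, this is what a softmax cross-entropy loss would look like if written out in NumPy. It is only a sketch of that one common loss-measure and makes no claim about what Pretty Tensor computes internally; the functions softmax and cross_entropy and the toy numbers are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def cross_entropy(probs, one_hot_labels):
    # Mean over the batch of -log(probability assigned to the true class).
    true_class_prob = np.sum(probs * one_hot_labels, axis=1)
    return float(np.mean(-np.log(true_class_prob)))

# Toy batch of 2 images and 3 classes.
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = cross_entropy(probs, labels)

print(probs)
print(loss)
```

The loss shrinks as the probability assigned to the true class approaches 1, which is the quantity an optimizer would push down.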

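Performance Measures

We need a few more performance measures to display the progress to the user.

First we calculate the predicted class number from the output of the neural network y_pred, which is a vector with 10 elements. The class number is the index of the largest element.

y_pred_cls = tf.argmax(y_pred, dimension=1)

Then we create a vector of booleans telling us whether the predicted class equals the true class of each image.

correct_prediction = tf.equal(y_pred_cls, y_true_cls)

The classification accuracy is calculated by first type-casting the vector of booleans to floats, so that False becomes 0 and True becomes 1, and then taking the average of these numbers.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))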

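Saver

In order to save the variables of the neural network, we now create a Saver-object which is used for storing and retrieving all the variables of the TensorFlow graph. Nothing is actually saved at this point; that is done further below, after each network has been trained.

Note that if you have more than 100 neural networks in the ensemble then you must increase max_to_keep accordingly.

saver = tf.train.Saver(max_to_keep=100)

This is the directory used for saving and retrieving the data.

save_dir = 'checkpoints/'

Create the directory if it does not exist.

if not os.path.exists(save_dir):
    os.makedirs(save_dir)

This function returns the save-path for the data-file with the given network number.

def get_save_path(net_number):
    return save_dir + 'network' + str(net_number)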

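TensorFlow Run

Create TensorFlow session

Once the TensorFlow graph has been created, we have to create a TensorFlow session which is used to execute the graph.

session = tf.Session()

Initialize variables

The variables for weights and biases must be initialized before we start optimizing them. We make a simple wrapper-function for this, so it can be called again later.

def init_variables():
    session.run(tf.global_variables_initializer())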

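Helper-function to create a random training batch

There are thousands of images in the training-set. It takes a long time to calculate the gradient of the model using all these images. We therefore only use a small batch of images in each iteration of the optimizer.

If your computer crashes or becomes very slow because you run out of RAM, then you may try and lower this number, but you may then need to perform more optimization iterations.

train_batch_size = 64

Function for selecting a random training-batch of the given size.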

def random_batch(x_train, y_train):
    # Total number of images in the training-set.
    num_images = len(x_train)

    # Create a random index into the training-set.
    idx = np.random.choice(num_images,
                           size=train_batch_size,
                           replace=False)

    # Use the random index to select random images and labels.
    x_batch = x_train[idx, :]  # Images.
    y_batch = y_train[idx, :]  # Labels.

    # Return the batch.
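    return x_batch, y_batch

Helper-function to perform optimization iterations

Function for performing a number of optimization iterations so as to gradually improve the variables of the network layers. In each iteration, a new batch of data is selected from the training-set and then TensorFlow executes the optimizer using those training samples. The progress is printed every 100 iterations.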

def optimize(num_iterations, x_train, y_train):
    # Start-time used for printing time-usage below.
    start_time = time.time()

    for i in range(num_iterations):

        # Get a batch of training examples.
        # x_batch now holds a batch of images and
        # y_true_batch are the true labels for those images.
        x_batch, y_true_batch = random_batch(x_train, y_train)

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict=feed_dict_train)

        # Print status every 100 iterations.
        if i % 100 == 0:

            # Calculate the accuracy on the training-batch.
            acc = session.run(accuracy, feed_dict=feed_dict_train)

            # Status-message for printing.
            msg = "Optimization Iteration: {0:>6}, Training Batch Accuracy: {1:>6.1%}"

            # Print it.
            print(msg.format(i + 1, acc))

    # Ending time.
    end_time = time.time()

    # Difference between start and end-times.
    time_dif = end_time - start_time

    # Print the time-usage.
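    print("Time usage: " + str(timedelta(seconds=int(round(time_dif)))))

Create ensemble of neural networks

Number of neural networks in the ensemble.

num_networks = 5

Number of optimization iterations for each neural network.

num_iterations = 10000

Create the ensemble of neural networks. All networks use the same TensorFlow graph that was defined above. For each neural network the TensorFlow weights and variables are initialized to random values and then optimized. The variables are then saved to disk so they can be reloaded later.

You may want to skip this computation if you just want to re-run the Notebook with different analysis of the results.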

if True:
    # For each of the neural networks.
    for i in range(num_networks):
        print("Neural network: {0}".format(i))

        # Create a random training-set. Ignore the validation-set.
        x_train, y_train, _, _ = random_training_set()

        # Initialize the variables of the TensorFlow graph.
        session.run(tf.global_variables_initializer())

        # Optimize the variables using this training-set.
        optimize(num_iterations=num_iterations,
                 x_train=x_train,
                 y_train=y_train)

        # Save the optimized variables to disk.
        saver.save(sess=session, save_path=get_save_path(i))

        # Print newline.
        print()

Neural network: 0
Optimization Iteration: 1, Training Batch Accuracy: 6.2%
Optimization Iteration: 101, Training Batch Accuracy: 87.5%
...
Optimization Iteration: 9901, Training Batch Accuracy: 100.0%
Time usage: 0:00:40

Neural network: 1
Optimization Iteration: 1, Training Batch Accuracy: 7.8%
Optimization Iteration: 101, Training Batch Accuracy: 85.9%
...
Optimization Iteration: 9901, Training Batch Accuracy: 98.4%
Time usage: 0:00:40

Neural network: 2
Optimization Iteration: 1, Training Batch Accuracy: 3.1%
Optimization Iteration: 101, Training Batch Accuracy: 84.4%
...
Optimization Iteration: 9901, Training Batch Accuracy: 100.0%
Time usage: 0:00:39

Neural network: 3
Optimization Iteration: 1, Training Batch Accuracy: 9.4%
Optimization Iteration: 101, Training Batch Accuracy: 89.1%
...
Optimization Iteration: 9901, Training Batch Accuracy: 100.0%
Time usage: 0:00:39

Neural network: 4
Optimization Iteration: 1, Training Batch Accuracy: 9.4%
Optimization Iteration: 101, Training Batch Accuracy: 82.8%
...
Optimization Iteration: 9901, Training Batch Accuracy: 98.4%
Time usage: 0:00:39

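Helper-functions for calculating and predicting classifications

This function calculates the predicted labels of images, that is, for each image it calculates a vector of length 10 indicating which of the 10 classes the image is.

The calculation is done in batches because it might use too much RAM otherwise. If your computer crashes then you can try and lower the batch-size.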

# Split the data-set in batches of this size to limit RAM usage.
batch_size = 256

def predict_labels(images):
    # Number of images.
    num_images = len(images)

    # Allocate an array for the predicted labels which
    # will be calculated in batches and filled into this array.
    pred_labels = np.zeros(shape=(num_images, num_classes),
                           dtype=np.float64)

    # Now calculate the predicted labels for the batches.
    # We will just iterate through all the batches.
    # There might be a more clever and Pythonic way of doing this.

    # The starting index for the next batch is denoted i.
    i = 0

    while i < num_images:
        # The ending index for the next batch is denoted j.
        j = min(i + batch_size, num_images)

        # Create a feed-dict with the images between index i and j.
        feed_dict = {x: images[i:j, :]}

        # Calculate the predicted labels using TensorFlow.
        pred_labels[i:j] = session.run(y_pred, feed_dict=feed_dict)

        # Set the start-index for the next batch to the
        # end-index of the current batch.
        i = j

    return pred_labels
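The comment in the function above wonders about a more Pythonic way to walk through the batches. One possible sketch is a small generator built on range with a step. The names batch_ranges, predict_in_batches and fake_predict are made up here, and fake_predict is only a stand-in for running y_pred in the TensorFlow session.

```python
import numpy as np

def batch_ranges(num_items, batch_size):
    # Yield (start, end) index pairs that cover range(num_items) in order.
    for start in range(0, num_items, batch_size):
        yield start, min(start + batch_size, num_items)

def predict_in_batches(images, predict_fn, num_classes, batch_size=256):
    # predict_fn is a hypothetical stand-in for session.run(y_pred, ...).
    pred = np.zeros((len(images), num_classes), dtype=np.float64)
    for i, j in batch_ranges(len(images), batch_size):
        pred[i:j] = predict_fn(images[i:j])
    return pred

def fake_predict(batch):
    # Fake predictor: a uniform distribution over 10 classes for every image.
    return np.full((len(batch), 10), 0.1)

fake_images = np.zeros((600, 784))
out = predict_in_batches(fake_images, fake_predict, num_classes=10)
ranges = list(batch_ranges(600, 256))

print(ranges)
print(out.shape)
```

The last batch is simply shorter than the others, exactly as in the while-loop version.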

Calculate a boolean array whether the predicted classes for the images are correct.

def correct_prediction(images, labels, cls_true):
    # Calculate the predicted labels.
    pred_labels = predict_labels(images=images)

    # Calculate the predicted class-number for each image.
    cls_pred = np.argmax(pred_labels, axis=1)

    # Create a boolean array whether each image is correctly classified.
    correct = (cls_true == cls_pred)

    return correct

Calculate a boolean array whether the images in the test-set are classified correctly.

def test_correct():
    return correct_prediction(images = data.test.images,
                              labels = data.test.labels,
                              cls_true = data.test.cls)

Calculate a boolean array whether the images in the validation-set are classified correctly.

def validation_correct():
    return correct_prediction(images = data.validation.images,
                              labels = data.validation.labels,
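                              cls_true = data.validation.cls)

Helper-functions for calculating the classification accuracy

This function calculates the classification accuracy given a boolean array whether each image was correctly classified. E.g. classification_accuracy(np.array([True, True, False, False, False])) = 2/5 = 0.4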

def classification_accuracy(correct):
    # When averaging a boolean array, False means 0 and True means 1.
    # So we are calculating: number of True / len(correct) which is
    # the same as the classification accuracy.
    return correct.mean()
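The example from the description can be run as a quick check. Note that the function calls .mean() on its argument, so it needs a NumPy array; a plain Python list has no such method. The function is repeated here only to keep the sketch self-contained.

```python
import numpy as np

def classification_accuracy(correct):
    # Averaging booleans counts True as 1 and False as 0.
    return correct.mean()

# Two of the five hypothetical images were classified correctly.
correct = np.array([True, True, False, False, False])
acc = classification_accuracy(correct)

print(acc)  # 0.4
```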

Calculate the classification accuracy on the test-set.

def test_accuracy():
    # Get the array of booleans whether the classifications are correct
    # for the test-set.
    correct = test_correct()

    # Calculate the classification accuracy and return it.
    return classification_accuracy(correct)

Calculate the classification accuracy on the original validation-set.

def validation_accuracy():
    # Get the array of booleans whether the classifications are correct
    # for the validation-set.
    correct = validation_correct()

    # Calculate the classification accuracy and return it.
    return classification_accuracy(correct)

Results and analysis

Function for calculating the predicted labels for all the neural networks in the ensemble. The labels are combined further below.

def ensemble_predictions():
    # Empty list of predicted labels for each of the neural networks.
    pred_labels = []

    # Classification accuracy on the test-set for each network.
    test_accuracies = []

    # Classification accuracy on the validation-set for each network.
    val_accuracies = []

    # For each neural network in the ensemble.
    for i in range(num_networks):
        # Reload the variables into the TensorFlow graph.
        saver.restore(sess=session, save_path=get_save_path(i))

        # Calculate the classification accuracy on the test-set.
        test_acc = test_accuracy()

        # Append the classification accuracy to the list.
        test_accuracies.append(test_acc)

        # Calculate the classification accuracy on the validation-set.
        val_acc = validation_accuracy()

        # Append the classification accuracy to the list.
        val_accuracies.append(val_acc)

        # Print status message.
        msg = "Network: {0}, Accuracy on Validation-Set: {1:.4f}, Test-Set: {2:.4f}"
        print(msg.format(i, val_acc, test_acc))

        # Calculate the predicted labels for the images in the test-set.
        # This is already calculated in test_accuracy() above but
        # it is re-calculated here to keep the code a bit simpler.
        pred = predict_labels(images=data.test.images)

        # Append the predicted labels to the list.
        pred_labels.append(pred)

    return np.array(pred_labels), \
           np.array(test_accuracies), \
           np.array(val_accuracies)

pred_labels, test_accuracies, val_accuracies = ensemble_predictions()

Network: 0, Accuracy on Validation-Set: 0.9948, Test-Set: 0.9893
Network: 1, Accuracy on Validation-Set: 0.9936, Test-Set: 0.9880
Network: 2, Accuracy on Validation-Set: 0.9958, Test-Set: 0.9893
Network: 3, Accuracy on Validation-Set: 0.9938, Test-Set: 0.9889
Network: 4, Accuracy on Validation-Set: 0.9938, Test-Set: 0.9892

Summarize the classification accuracies on the test-set for the neural networks in the ensemble.

print("Mean test-set accuracy: {0:.4f}".format(np.mean(test_accuracies)))
print("Min test-set accuracy:  {0:.4f}".format(np.min(test_accuracies)))
print("Max test-set accuracy:  {0:.4f}".format(np.max(test_accuracies)))

Mean test-set accuracy: 0.9889
Min test-set accuracy: 0.9880
Max test-set accuracy: 0.9893

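The predicted labels of the ensemble form a 3-dim array: the first dim is the network-number, the second dim is the image-number, and the third dim is the classification vector.

pred_labels.shape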

(5, 10000, 10)

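Ensemble predictions

There are different ways to calculate the predicted labels for the ensemble. One way is to calculate the predicted class-number for each neural network, and then select the class-number with the most votes. But this requires a large number of neural networks relative to the number of classes.

The method used here is instead to take the average of the predicted labels for all the networks in the ensemble. This is simple to calculate and does not require a large number of networks in the ensemble.

ensemble_pred_labels = np.mean(pred_labels, axis=0)
ensemble_pred_labels.shape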

(10000, 10)

The ensemble's predicted class number is then the index of the highest number in the label, which is calculated using argmax as usual.

ensemble_cls_pred = np.argmax(ensemble_pred_labels, axis=1)
ensemble_cls_pred.shape

(10000,)
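The two ways of combining predictions can disagree. The following sketch compares averaging with majority voting on made-up predictions from 3 networks for 2 images and 3 classes; the numbers are hypothetical and chosen so that one very confident network outweighs two hesitant ones in the average.

```python
import numpy as np

# Hypothetical predictions with shape (networks, images, classes).
pred_labels = np.array([
    [[0.40, 0.60, 0.00], [0.9, 0.1, 0.0]],   # network 0
    [[0.45, 0.55, 0.00], [0.8, 0.2, 0.0]],   # network 1
    [[0.95, 0.05, 0.00], [0.7, 0.3, 0.0]],   # network 2
])
num_classes = pred_labels.shape[2]

# Averaging: mean over the network axis, then argmax for each image.
avg_cls = np.argmax(np.mean(pred_labels, axis=0), axis=1)

# Majority voting: argmax for each network, then the most frequent class.
votes = np.argmax(pred_labels, axis=2)   # shape (networks, images)
vote_cls = np.array([np.bincount(votes[:, k], minlength=num_classes).argmax()
                     for k in range(votes.shape[1])])

print(avg_cls)   # [0 0]
print(vote_cls)  # [1 0]
```

For the first image two networks vote for class 1, but the average favours class 0 because network 2 is far more confident than the other two.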

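Boolean array whether each of the images in the test-set was correctly classified by the ensemble of neural networks.

ensemble_correct = (ensemble_cls_pred == data.test.cls)

Negate the boolean array so we can use it to look up incorrectly classified images.

ensemble_incorrect = np.logical_not(ensemble_correct)

Best neural network

Now we find the single neural network that performed best on the test-set.

First list the classification accuracies on the test-set for all the neural networks in the ensemble.

test_accuracies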

array([ 0.9893, 0.988 , 0.9893, 0.9889, 0.9892])

The index of the neural network with the highest classification accuracy.

best_net = np.argmax(test_accuracies)
best_net

0
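Networks 0 and 2 are tied at 0.9893 in the list above. np.argmax returns the first index at which the maximum occurs, which is why network 0 is reported as the best. A small sketch using the printed accuracies:

```python
import numpy as np

# Test-set accuracies as printed above.
test_accuracies = np.array([0.9893, 0.9880, 0.9893, 0.9889, 0.9892])

# argmax returns the first index of the maximum when there are ties.
best_net = np.argmax(test_accuracies)

# All indices that share the maximum accuracy.
tied = np.flatnonzero(test_accuracies == test_accuracies.max())

print(best_net, tied)
```

So "best network" here means the first of the equally accurate networks, not a uniquely best one.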

The best neural network's classification accuracy on the test-set.

test_accuracies[best_net]

0.98929999999999996

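Predicted labels of the best neural network.

best_net_pred_labels = pred_labels[best_net, :, :]

The predicted class-number.

best_net_cls_pred = np.argmax(best_net_pred_labels, axis=1)

Boolean array whether the best neural network classified each image in the test-set correctly.

best_net_correct = (best_net_cls_pred == data.test.cls)

Boolean array whether each image is incorrectly classified.

best_net_incorrect = np.logical_not(best_net_correct)

Comparison of ensemble vs. the best single network

The number of images in the test-set that were correctly classified by the ensemble.

np.sum(ensemble_correct)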

9916

The number of images in the test-set that were correctly classified by the best neural network.

np.sum(best_net_correct)

9893

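Boolean array whether each image in the test-set was correctly classified by the ensemble and incorrectly classified by the best neural network.

ensemble_better = np.logical_and(best_net_incorrect,
                                 ensemble_correct)

Number of images in the test-set where the ensemble was better than the best single network:

ensemble_better.sum()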

39

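Boolean array whether each image in the test-set was correctly classified by the best single network and incorrectly classified by the ensemble.

best_net_better = np.logical_and(best_net_correct,
                                 ensemble_incorrect)

Number of images in the test-set where the best single network was better than the ensemble:

best_net_better.sum()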

16
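These four counts are tied together: images that both classifiers get right, or both get wrong, cancel out, so the difference in correct counts equals the difference between the two "better" counts. The sketch below checks this on made-up boolean arrays for 6 images and then on the numbers reported above.

```python
import numpy as np

# Hypothetical toy results for 6 test images.
ensemble_correct = np.array([True, True, False, True, True, False])
best_net_correct = np.array([True, False, True, True, False, False])

ensemble_better = np.logical_and(np.logical_not(best_net_correct),
                                 ensemble_correct)
best_net_better = np.logical_and(best_net_correct,
                                 np.logical_not(ensemble_correct))

# Only images where exactly one of the two is right explain the difference.
lhs = int(ensemble_correct.sum()) - int(best_net_correct.sum())
rhs = int(ensemble_better.sum()) - int(best_net_better.sum())

# The same identity applied to the counts reported above.
reported_ok = (9916 - 9893) == (39 - 16)

print(lhs, rhs, reported_ok)
```

With the reported numbers, the ensemble's net gain of 23 images is 39 images gained minus 16 images lost.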

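Helper-functions for plotting and printing comparisons

Function for plotting images from the test-set along with their true and predicted class-numbers.

def plot_images_comparison(idx):
    plot_images(images=data.test.images[idx, :],
                cls_true=data.test.cls[idx],
                ensemble_cls_pred=ensemble_cls_pred[idx],
                best_cls_pred=best_net_cls_pred[idx])

Function for printing the predicted labels.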

def print_labels(labels, idx, num=1):
    # Select the relevant labels based on idx.
    labels = labels[idx, :]

    # Select the first num labels.
    labels = labels[0:num, :]

    # Round numbers to 2 decimal points so they are easier to read.
    labels_rounded = np.round(labels, 2)

    # Print the rounded labels.
    print(labels_rounded)

Function for printing the predicted labels for the ensemble of neural networks.

def print_labels_ensemble(idx, **kwargs):
    print_labels(labels=ensemble_pred_labels, idx=idx, **kwargs)

Function for printing the predicted labels for the best single network.

def print_labels_best_net(idx, **kwargs):
    print_labels(labels=best_net_pred_labels, idx=idx, **kwargs)

Function for printing the predicted labels of all the neural networks in the ensemble. This only prints the labels for the first image.

def print_labels_all_nets(idx):
    for i in range(num_networks):
        print_labels(labels=pred_labels[i, :, :], idx=idx, num=1)

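Examples: Ensemble is better than the best network

Plot examples of images that were correctly classified by the ensemble and incorrectly classified by the best network.

plot_images_comparison(idx=ensemble_better)

The ensemble's predicted labels for the first of these images (top left image):

print_labels_ensemble(idx=ensemble_better, num=1)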

[[ 0. 0. 0. 0.76 0. 0. 0. 0. 0.23 0. ]]

The best network's predicted labels for the first of these images:

print_labels_best_net(idx=ensemble_better, num=1)

[[ 0. 0. 0. 0.21 0. 0. 0. 0. 0.79 0. ]]

The predicted labels of all the networks in the ensemble, for the first of these images:

print_labels_all_nets(idx=ensemble_better)

[[ 0. 0. 0. 0.21 0. 0. 0. 0. 0.79 0. ]]
[[ 0. 0. 0. 0.96 0. 0.01 0. 0. 0.03 0. ]]
[[ 0. 0. 0. 0.99 0. 0. 0. 0. 0.01 0. ]]
[[ 0. 0. 0. 0.88 0. 0. 0. 0. 0.12 0. ]]
[[ 0. 0. 0. 0.76 0. 0.01 0. 0. 0.22 0. ]]
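The ensemble label printed earlier can be reproduced from these five rows. This sketch copies the printed values, which are rounded to two decimals, so the result only matches the ensemble output approximately.

```python
import numpy as np

# Predicted labels of the 5 networks for this image, copied from the output above.
all_nets = np.array([
    [0., 0., 0., 0.21, 0., 0.,   0., 0., 0.79, 0.],
    [0., 0., 0., 0.96, 0., 0.01, 0., 0., 0.03, 0.],
    [0., 0., 0., 0.99, 0., 0.,   0., 0., 0.01, 0.],
    [0., 0., 0., 0.88, 0., 0.,   0., 0., 0.12, 0.],
    [0., 0., 0., 0.76, 0., 0.01, 0., 0., 0.22, 0.],
])

# Average over the networks, as done for ensemble_pred_labels.
ensemble = np.mean(all_nets, axis=0)

print(np.round(ensemble, 2))
print(np.argmax(ensemble), np.argmax(all_nets[0]))
```

The average puts about 0.76 on class 3 and 0.23 on class 8, so the ensemble predicts 3, while the first row (the best network) predicts 8 on its own.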

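Examples: Best network is better than ensemble

Now plot examples of images that were incorrectly classified by the ensemble but correctly classified by the best network.

plot_images_comparison(idx=best_net_better)

The ensemble's predicted labels for the first of these images (top left image):

print_labels_ensemble(idx=best_net_better, num=1)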

[[ 0.5 0. 0. 0. 0. 0.05 0.45 0. 0. 0. ]]

The best network's predicted labels for the first of these images:

print_labels_best_net(idx=best_net_better, num=1)

[[ 0.3 0. 0. 0. 0. 0.15 0.56 0. 0. 0. ]]

The predicted labels of all the networks in the ensemble, for the first of these images:

print_labels_all_nets(idx=best_net_better)

[[ 0.3 0. 0. 0. 0. 0.15 0.56 0. 0. 0. ]]
[[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
[[ 0.19 0. 0. 0. 0. 0. 0.81 0. 0. 0. ]]
[[ 0.15 0. 0. 0. 0. 0.12 0.72 0. 0. 0. ]]
[[ 0.85 0. 0. 0. 0. 0. 0.14 0. 0. 0. ]]

Close TensorFlow Session

We are now done using TensorFlow, so we close the session to release its resources.

# This has been commented out in case you want to modify and experiment
# with the Notebook without having to restart it.
# session.close()

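Conclusion

This tutorial created an ensemble of 5 convolutional neural networks for classifying handwritten digits in the MNIST data-set. The ensemble works by averaging the predicted class-labels of the 5 individual neural networks. This slightly improved the classification accuracy on the test-set: the ensemble classified 9916 of the 10,000 test images correctly (about 99.2%), compared to 9893 (about 98.9%) for the best individual network.

However, the ensemble did not always perform better than the individual neural networks, which classified some images correctly that were mis-classified by the ensemble. This suggests that the effect of using an ensemble of neural networks is somewhat random and may not provide a reliable way of improving on the performance of individual neural networks.

The form of ensemble learning used here is called bagging (or Bootstrap Aggregating), which is mainly useful for avoiding overfitting and may not be necessary for this particular neural network and data-set. So it is still possible that ensemble learning may work in other settings.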
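In its classic form, bagging draws each training-set as a bootstrap sample, i.e. with replacement, whereas the random_training_set() helper in this tutorial cuts a permutation and therefore never repeats an image. The sketch below contrasts the two sampling schemes on index arrays only; the seed and the size of 1000 are arbitrary choices for illustration.

```python
import numpy as np

# Fixed seed only to make this sketch repeatable.
rng = np.random.RandomState(0)
num_items = 1000

# Bootstrap sample as in classic bagging: draw indices WITH replacement.
idx_bootstrap = rng.choice(num_items, size=num_items, replace=True)

# Random subset as in this tutorial: a permutation cut at 80%, no repeats.
idx_subset = rng.permutation(num_items)[:int(0.8 * num_items)]

unique_bootstrap = len(np.unique(idx_bootstrap))
unique_subset = len(np.unique(idx_subset))

print(unique_bootstrap, unique_subset)
```

A bootstrap sample typically contains only about 63% of the distinct items, with the rest being repeats, while the 80% subset contains 800 distinct items by construction.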

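Technical Note

This implementation of ensemble learning used the TensorFlow Saver()-object to save and reload the variables of the neural network. But this functionality was really designed for another purpose and becomes awkward to use for ensemble learning with different types of neural networks, or if you want to load multiple neural networks at the same time. There is an add-on package for TensorFlow called sk-flow which makes this much easier, but as of August 2016 it was still in the early stages of development.

Exercises

These are a few suggestions for exercises that may help improve your skills with TensorFlow. It is important to get hands-on experience with TensorFlow in order to learn how to use it properly.

You may want to backup this Notebook before making any changes.

- Change different aspects of this program to see how it affects the performance:
  - Use more neural networks in the ensemble.
  - Change the size of the training-sets.
  - Change the number of optimization iterations, try both more and less.
- Explain to a friend how the program works.
- Do you think ensemble learning is worth more research effort, or should you rather focus on improving a single neural network?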
