[Translation] TensorFlow Tutorial #11 - Adversarial Examples

Published by 活魚眼 on 2017-08-29

Note: The author did not provide Tutorial #10; it will be added here if it is ever released.
This article demonstrates how to add "adversarial noise" to images in order to fool a model into misclassifying them.

01 - Simple Linear Model | 02 - Convolutional Neural Network | 03 - PrettyTensor | 04 - Save & Restore
05 - Ensemble Learning | 06 - CIFAR-10 | 07 - Inception Model | 08 - Transfer Learning
09 - Video Data

by Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube
Chinese translation by thrillerist / Github

If you repost this article, please include a link back to it.


Introduction

In the previous tutorials we used several different deep neural networks to classify images, with varying degrees of success. In this tutorial we will see a simple way of finding adversarial examples that cause a state-of-the-art neural network to misclassify any input image into whatever target class we choose. This is done simply by adding a small amount of "specialized" noise to the input image. The changes are imperceptible to humans, but they fool the neural network.

This article builds on the earlier tutorials. You should be roughly familiar with neural networks (Tutorials #01 and #02), and familiarity with the Inception model (Tutorial #07) is also helpful.

Flowchart

We take the Inception model from Tutorial #07 and modify/hack the TensorFlow graph to find adversarial examples that cause the Inception model to misclassify the input image.

In the flowchart below, some noise is added to an image from "Willy Wonka and the Chocolate Factory", and the result is used as input to the Inception model. The goal is to find the noise that makes the Inception model misclassify the image as our target class, here chosen to be "bookcase" (class number 300).

We also add a new loss function to the graph, which computes the cross-entropy; it measures how well the Inception model classifies the noisy image.
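Written as a formula, for a single target class t this cross-entropy reduces to L = -log(p_t), where p_t is the probability (the softmax output) that the model assigns to the target class; minimizing L therefore pushes p_t towards 1.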

Because the Inception model is constructed from many combined basic mathematical operations, TensorFlow can quickly find the gradient of the loss function using the chain rule of differentiation.

We use the gradient of the loss function with respect to the input image to find the adversarial noise. We look for noise that increases the score (i.e., probability) of the "bookcase" class rather than the original class of the input image.

This is essentially optimization by gradient descent, which we implement further below.
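As a preview, here is a minimal sketch of the update that the helper function find_adversary_noise() below performs on each iteration (the function name noise_update is just for illustration):

import numpy as np

def noise_update(noise, grad, noise_limit=3.0):
    # One gradient-descent step on the adversarial noise, mirroring
    # the update inside find_adversary_noise() further below.
    grad_absmax = max(np.abs(grad).max(), 1e-10)  # lower limit to avoid division by zero
    step_size = 7 / grad_absmax                   # so at least one pixel colour changes by 7
    noise = noise - step_size * grad              # step towards the target class
    # Limit the noise so the image is not distorted too much.
    return np.clip(noise, -noise_limit, noise_limit)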

from IPython.display import Image, display
Image('images/11_adversarial_examples_flowchart.png')

Imports

%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import os

# Functions and classes for loading and using the Inception model.
import inception

This was developed using Python 3.5.2 (Anaconda) and TensorFlow version:

tf.__version__

'0.11.0rc0'

Inception Model

Download the Inception model from the internet.

This is the default folder where the data files are stored. The folder is created automatically if it does not exist.

# inception.data_dir = 'inception/'

The Inception model is downloaded automatically if it is not already in the folder. It is about 85 MB.

inception.maybe_download()

Downloading Inception v3 Model ...
Data has apparently already been downloaded and unpacked.

Load the Inception Model

Load the Inception model so it is ready for classifying images.

Note the warning message, which might cause the program to fail in the future.

model = inception.Inception()

Get the Input and Output of the Inception Model

Get a reference to the input tensor of the Inception model. This tensor holds the resized image, i.e., 299 x 299 pixels with 3 colour channels. We will add noise to the resized image and feed the result back into the graph through this same tensor, so we need to make sure the resizing algorithm does not introduce noise of its own.

resized_image = model.resized_image

Get a reference to the output of the Inception model's softmax classifier.

y_pred = model.y_pred

Get a reference to the unscaled output of the Inception model's softmax classifier. These are often called "logits". The logits are needed because the new loss function that we will add to the graph requires these unscaled outputs.

y_logits = model.y_logits
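Conceptually, y_pred is just the softmax of y_logits, i.e. y_pred = tf.nn.softmax(y_logits). The loss function added below must be given the raw logits rather than y_pred, because tf.nn.sparse_softmax_cross_entropy_with_logits() applies the softmax internally in a numerically stable way.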

Hack the Inception Model

To find adversarial examples we need to add a new loss function to the graph of the Inception model. We also need the gradient of this loss function with respect to the input image.

# Set the graph for the Inception model as the default graph,
# so that all changes inside this with-block are done to that graph.
with model.graph.as_default():
    # Add a placeholder variable for the target class-number.
    # This will be set to e.g. 300 for the 'bookcase' class.
    pl_cls_target = tf.placeholder(dtype=tf.int32)

    # Add a new loss-function. This is the cross-entropy.
    # See Tutorial #01 for an explanation of cross-entropy.
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y_logits, labels=[pl_cls_target])

    # Get the gradient for the loss-function with regard to
    # the resized input image.
    gradient = tf.gradients(loss, resized_image)
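Note that tf.gradients() returns a list with one tensor per input tensor, so gradient is a list containing a single tensor here. This is why the helper function further below unwraps the result with np.array(grad).squeeze().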

TensorFlow Session

We need a TensorFlow session to execute the graph.

session = tf.Session(graph=model.graph)

Helper Function for Finding the Adversarial Noise

The function below finds the noise that must be added to the given input image so that it gets classified as the desired target class.

This function essentially performs optimization by gradient descent. The noise is initialized to zero and then gradually optimized using the gradient of the loss function with respect to the noisy input image, so that each iteration moves the classification closer to the desired target class. The optimization stops when the classification score reaches the required level (e.g., 99%) or when the maximum number of iterations has been performed.

def find_adversary_noise(image_path, cls_target, noise_limit=3.0,
                         required_score=0.99, max_iterations=100):
    """
    Find the noise that must be added to the given image so
    that it is classified as the target-class.

    image_path: File-path to the input-image (must be *.jpg).
    cls_target: Target class-number (integer between 1-1000).
    noise_limit: Limit for pixel-values in the noise.
    required_score: Stop when target-class score reaches this.
    max_iterations: Max number of optimization iterations to perform.
    """

    # Create a feed-dict with the image.
    feed_dict = model._create_feed_dict(image_path=image_path)

    # Use TensorFlow to calculate the predicted class-scores
    # (aka. probabilities) as well as the resized image.
    pred, image = session.run([y_pred, resized_image],
                              feed_dict=feed_dict)

    # Convert to one-dimensional array.
    pred = np.squeeze(pred)

    # Predicted class-number.
    cls_source = np.argmax(pred)

    # Score for the predicted class (aka. probability or confidence).
    score_source_org = pred.max()

    # Names for the source and target classes.
    name_source = model.name_lookup.cls_to_name(cls_source,
                                                only_first_name=True)
    name_target = model.name_lookup.cls_to_name(cls_target,
                                                only_first_name=True)

    # Initialize the noise to zero.
    noise = 0

    # Perform a number of optimization iterations to find
    # the noise that causes mis-classification of the input image.
    for i in range(max_iterations):
        print("Iteration:", i)

        # The noisy image is just the sum of the input image and noise.
        noisy_image = image + noise

        # Ensure the pixel-values of the noisy image are between
        # 0 and 255 like a real image. If we allowed pixel-values
        # outside this range then maybe the mis-classification would
        # be due to this 'illegal' input breaking the Inception model.
        noisy_image = np.clip(a=noisy_image, a_min=0.0, a_max=255.0)

        # Create a feed-dict. This feeds the noisy image to the
        # tensor in the graph that holds the resized image, because
        # this is the final stage for inputting raw image data.
        # This also feeds the target class-number that we desire.
        feed_dict = {model.tensor_name_resized_image: noisy_image,
                     pl_cls_target: cls_target}

        # Calculate the predicted class-scores as well as the gradient.
        pred, grad = session.run([y_pred, gradient],
                                 feed_dict=feed_dict)

        # Convert the predicted class-scores to a one-dim array.
        pred = np.squeeze(pred)

        # The scores (probabilities) for the source and target classes.
        score_source = pred[cls_source]
        score_target = pred[cls_target]

        # Squeeze the dimensionality for the gradient-array.
        grad = np.array(grad).squeeze()

        # The gradient now tells us how much we need to change the
        # noisy input image in order to move the predicted class
        # closer to the desired target-class.

        # Calculate the max of the absolute gradient values.
        # This is used to calculate the step-size.
        grad_absmax = np.abs(grad).max()

        # If the gradient is very small then use a lower limit,
        # because we will use it as a divisor.
        if grad_absmax < 1e-10:
            grad_absmax = 1e-10

        # Calculate the step-size for updating the image-noise.
        # This ensures that at least one pixel colour is changed by 7.
        # Recall that pixel colours can have 255 different values.
        # This step-size was found to give fast convergence.
        step_size = 7 / grad_absmax

        # Print the score etc. for the source-class.
        msg = "Source score: {0:>7.2%}, class-number: {1:>4}, class-name: {2}"
        print(msg.format(score_source, cls_source, name_source))

        # Print the score etc. for the target-class.
        msg = "Target score: {0:>7.2%}, class-number: {1:>4}, class-name: {2}"
        print(msg.format(score_target, cls_target, name_target))

        # Print statistics for the gradient.
        msg = "Gradient min: {0:>9.6f}, max: {1:>9.6f}, stepsize: {2:>9.2f}"
        print(msg.format(grad.min(), grad.max(), step_size))

        # Newline.
        print()

        # If the score for the target-class is not high enough.
        if score_target < required_score:
            # Update the image-noise by subtracting the gradient
            # scaled by the step-size.
            noise -= step_size * grad

            # Ensure the noise is within the desired range.
            # This avoids distorting the image too much.
            noise = np.clip(a=noise,
                            a_min=-noise_limit,
                            a_max=noise_limit)
        else:
            # Abort the optimization because the score is high enough.
            break

    return image.squeeze(), noisy_image.squeeze(), noise, \
           name_source, name_target, \
           score_source, score_source_org, score_target

Helper Functions for Plotting Images and Noise

This function normalizes its input so that all values are between 0.0 and 1.0, which is needed to display the noise properly.

def normalize_image(x):
    # Get the min and max values for all pixels in the input.
    x_min = x.min()
    x_max = x.max()

    # Normalize so all values are between 0.0 and 1.0
    x_norm = (x - x_min) / (x_max - x_min)

    return x_norm

This function plots the original image, the noisy image, and the noise. It also shows the class names and scores.

def plot_images(image, noise, noisy_image,
                name_source, name_target,
                score_source, score_source_org, score_target):
    """
    Plot the image, the noisy image and the noise.
    Also shows the class-names and scores.

    Note that the noise is amplified to use the full range of
    colours, otherwise if the noise is very low it would be
    hard to see.

    image: Original input image.
    noise: Noise that has been added to the image.
    noisy_image: Input image + noise.
    name_source: Name of the source-class.
    name_target: Name of the target-class.
    score_source: Score for the source-class.
    score_source_org: Original score for the source-class.
    score_target: Score for the target-class.
    """

    # Create figure with sub-plots.
    fig, axes = plt.subplots(1, 3, figsize=(10,10))

    # Adjust vertical spacing.
    fig.subplots_adjust(hspace=0.1, wspace=0.1)

    # Use interpolation to smooth pixels?
    smooth = True

    # Interpolation type.
    if smooth:
        interpolation = 'spline16'
    else:
        interpolation = 'nearest'

    # Plot the original image.
    # Note that the pixel-values are normalized to the [0.0, 1.0]
    # range by dividing with 255.
    ax = axes.flat[0]
    ax.imshow(image / 255.0, interpolation=interpolation)
    msg = "Original Image:\n{0} ({1:.2%})"
    xlabel = msg.format(name_source, score_source_org)
    ax.set_xlabel(xlabel)

    # Plot the noisy image.
    ax = axes.flat[1]
    ax.imshow(noisy_image / 255.0, interpolation=interpolation)
    msg = "Image + Noise:\n{0} ({1:.2%})\n{2} ({3:.2%})"
    xlabel = msg.format(name_source, score_source, name_target, score_target)
    ax.set_xlabel(xlabel)

    # Plot the noise.
    # The colours are amplified otherwise they would be hard to see.
    ax = axes.flat[2]
    ax.imshow(normalize_image(noise), interpolation=interpolation)
    xlabel = "Amplified Noise"
    ax.set_xlabel(xlabel)

    # Remove ticks from all the plots.
    for ax in axes.flat:
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Helper Function for Finding and Plotting Adversarial Examples

This function combines the two functions above: it first finds the adversarial noise and then plots the image with the noise.

def adversary_example(image_path, cls_target,
                      noise_limit, required_score):
    """
    Find and plot adversarial noise for the given image.

    image_path: File-path to the input-image (must be *.jpg).
    cls_target: Target class-number (integer between 1-1000).
    noise_limit: Limit for pixel-values in the noise.
    required_score: Stop when target-class score reaches this.
    """

    # Find the adversarial noise.
    image, noisy_image, noise, \
    name_source, name_target, \
    score_source, score_source_org, score_target = \
        find_adversary_noise(image_path=image_path,
                             cls_target=cls_target,
                             noise_limit=noise_limit,
                             required_score=required_score)

    # Plot the image and the noise.
    plot_images(image=image, noise=noise, noisy_image=noisy_image,
                name_source=name_source, name_target=name_target,
                score_source=score_source,
                score_source_org=score_source_org,
                score_target=score_target)

    # Print some statistics for the noise.
    msg = "Noise min: {0:.3f}, max: {1:.3f}, mean: {2:.3f}, std: {3:.3f}"
    print(msg.format(noise.min(), noise.max(),
                     noise.mean(), noise.std()))

Results

Parrot

This example takes an image of a parrot as input and finds adversarial noise that causes the Inception model to misclassify the image as a bookcase (class number 300).

The noise limit is set to 3.0, which means each pixel colour is only allowed to change by at most 3.0. Pixel colours range from 0 to 255, so a change of 3.0 corresponds to about 1.2% of the possible range (3.0 / 255 ≈ 0.012). Such a small amount of noise is invisible to the human eye, so the noisy image looks essentially identical to the original image, as shown below.

The required score is set to 0.99, which means the optimizer that finds the adversarial noise stops once the score for the target class reaches 0.99 or higher, so that the Inception model is almost certain the noisy image shows the desired target class.

image_path = "images/parrot_cropped1.jpg"

adversary_example(image_path=image_path,
                  cls_target=300,
                  noise_limit=3.0,
                  required_score=0.99)

Iteration: 0
Source score:  97.38%, class-number:  409, class-name: macaw
Target score:   0.00%, class-number:  300, class-name: bookcase
Gradient min: -0.001329, max:  0.001370, stepsize:   5110.94

Iteration: 1
Source score: 88.87%, class-number: 409, class-name: macaw
Target score: 0.01%, class-number: 300, class-name: bookcase
Gradient min: -0.001499, max: 0.001401, stepsize: 4668.28

Iteration: 2
Source score: 68.47%, class-number: 409, class-name: macaw
Target score: 0.06%, class-number: 300, class-name: bookcase
Gradient min: -0.003093, max: 0.002587, stepsize: 2262.91

Iteration: 3
Source score: 16.76%, class-number: 409, class-name: macaw
Target score: 0.22%, class-number: 300, class-name: bookcase
Gradient min: -0.001077, max: 0.001047, stepsize: 6499.39

...
Iteration: 23
Source score: 0.01%, class-number: 409, class-name: macaw
Target score: 95.90%, class-number: 300, class-name: bookcase
Gradient min: -0.000111, max: 0.000142, stepsize: 49346.70

Iteration: 24
Source score: 0.00%, class-number: 409, class-name: macaw
Target score: 98.98%, class-number: 300, class-name: bookcase
Gradient min: -0.000029, max: 0.000025, stepsize: 245266.90

Iteration: 25
Source score: 0.00%, class-number: 409, class-name: macaw
Target score: 99.12%, class-number: 300, class-name: bookcase
Gradient min: -0.000019, max: 0.000022, stepsize: 311258.06

Noise min: -3.000, max: 3.000, mean: 0.001, std: 1.492

As shown above, the original image of the parrot looks virtually identical to the noisy image; the human eye cannot tell the two images apart. The original image is correctly classified by the Inception model as a macaw (a type of parrot) with a score of 97.38%. But the noisy image scores 0.00% for macaw and 99.12% for bookcase.

We have thus fooled the Inception model into believing that an image of a parrot shows a bookcase, simply by adding a little "specialized" noise.

Note that the noise shown above has been greatly amplified. In reality, the noise changes each pixel's colour intensity by at most 1.2% of the possible range (assuming the noise limit is set to 3.0 as in the function call above). The noise is so faint that a human cannot see it, yet it causes the Inception model to completely misclassify the input image.

Elon Musk

We can also find adversarial noise for an image of Elon Musk. The target class is again set to "bookcase" (class number 300), and the noise limit and required score are the same as above.

image_path = "images/elon_musk.jpg"

adversary_example(image_path=image_path,
                  cls_target=300,
                  noise_limit=3.0,
                  required_score=0.99)

Iteration: 0
Source score: 19.73%, class-number: 837, class-name: sweatshirt
Target score: 0.01%, class-number: 300, class-name: bookcase
Gradient min: -0.008348, max: 0.005946, stepsize: 838.48

Iteration: 1
Source score: 1.77%, class-number: 837, class-name: sweatshirt
Target score: 0.24%, class-number: 300, class-name: bookcase
Gradient min: -0.002952, max: 0.005907, stepsize: 1185.13

Iteration: 2
Source score: 0.52%, class-number: 837, class-name: sweatshirt
Target score: 10.06%, class-number: 300, class-name: bookcase
Gradient min: -0.006741, max: 0.006555, stepsize: 1038.46

...
Iteration: 21
Source score: 0.03%, class-number: 535, class-name: sunglasses
Target score: 98.31%, class-number: 300, class-name: bookcase
Gradient min: -0.000033, max: 0.000026, stepsize: 213124.72

Iteration: 22
Source score: 0.03%, class-number: 535, class-name: sunglasses
Target score: 98.80%, class-number: 300, class-name: bookcase
Gradient min: -0.000023, max: 0.000027, stepsize: 260036.19

Iteration: 23
Source score: 0.03%, class-number: 535, class-name: sunglasses
Target score: 99.03%, class-number: 300, class-name: bookcase
Gradient min: -0.000022, max: 0.000024, stepsize: 294094.62


Noise min: -3.000, max: 3.000, mean: 0.010, std: 1.534

The image of Elon Musk was originally classified as "sweatshirt" by the Inception model, although with a low score of only 19.73%. Note in the output above how the highest-scoring source class drifted to "sunglasses" during the optimization. Once again, we were able to generate adversarial noise that makes the model classify the image as "bookcase" (score 99.03%).

The two images look identical. But if you tilt your computer screen, you may be able to see the noise pattern as slight changes in the white areas.

Willy Wonka and the Chocolate Factory (Old Version)

image_path = "images/willy_wonka_old.jpg"

adversary_example(image_path=image_path,
                  cls_target=300,
                  noise_limit=3.0,
                  required_score=0.99)

Iteration: 0
Source score: 97.22%, class-number: 817, class-name: bow tie
Target score: 0.00%, class-number: 300, class-name: bookcase
Gradient min: -0.002479, max: 0.003469, stepsize: 2017.94

Iteration: 1
Source score: 10.65%, class-number: 817, class-name: bow tie
Target score: 0.08%, class-number: 300, class-name: bookcase
Gradient min: -0.000859, max: 0.001458, stepsize: 4799.50

Iteration: 2
Source score: 2.21%, class-number: 817, class-name: bow tie
Target score: 0.25%, class-number: 300, class-name: bookcase
Gradient min: -0.000415, max: 0.000617, stepsize: 11350.70

...
Iteration: 13
Source score: 0.00%, class-number: 817, class-name: bow tie
Target score: 98.09%, class-number: 300, class-name: bookcase
Gradient min: -0.000037, max: 0.000041, stepsize: 168840.03

Iteration: 14
Source score: 0.07%, class-number: 817, class-name: bow tie
Target score: 95.18%, class-number: 300, class-name: bookcase
Gradient min: -0.000212, max: 0.000168, stepsize: 32997.19

Iteration: 15
Source score: 0.00%, class-number: 817, class-name: bow tie
Target score: 99.72%, class-number: 300, class-name: bookcase
Gradient min: -0.000004, max: 0.000004, stepsize: 1590352.60

Noise min: -3.000, max: 3.000, mean: -0.000, std: 1.309

The image from "Willy Wonka and the Chocolate Factory" (the old film) was originally classified as "bow tie" by the Inception model. Again, after adding the noise it is classified as "bookcase" (score 99.72%).

Close the TensorFlow Session

We are now done using TensorFlow, so we close the sessions to release their resources. Note that there are two TensorFlow sessions to close: the one we created ourselves and one inside the model object.

# This has been commented out in case you want to modify and experiment
# with the Notebook without having to restart it.
# session.close()
# model.close()

Conclusion

We have shown how to find adversarial examples that cause the Inception model to misclassify images. Through a simple optimization process, we found noise that, when added to the input image, makes the model classify the image incorrectly, even though each pixel is changed only slightly and the human eye cannot perceive the changes.

Furthermore, the optimized noise can give a score (probability or confidence) close to 100%. So not only is the input image misclassified, the neural network is also highly confident that it has classified the image correctly.

This is a general problem with neural networks, and a very serious one! Until we understand why this happens and how to fix it, we cannot trust neural networks in critical applications. Imagine a self-driving car ignoring a stop sign or a pedestrian crossing the road because its neural network has misclassified the input image.

Research into this problem is ongoing; you are encouraged to search the internet for the latest papers on the topic. Maybe you can find a solution to the problem?

Exercises

Below are a few suggested exercises that may help improve your skills with TensorFlow. Hands-on experience is important for learning how to use TensorFlow properly.

You may want to back up this Notebook before making any changes to it.

  • Try using some of your own images.
  • Try other arguments for adversary_example(). Try another target class, noise limit, and required score. What is the result?
  • Do you think it is possible to generate adversarial noise for every target class? How would you prove your theory?
  • Try another formula for calculating the step-size in find_adversary_noise(). Can you make the optimization converge faster?
  • Try blurring the noisy input image right before it is input into the neural network. Does it remove the adversarial noise and cause correct classification again? (See the sketch after this list.)
  • Try lowering the colour-depth of the noisy input image instead of blurring it. Does it remove the adversarial noise and result in correct classification? For example, limit the colours to 16 or 32 levels, when they can normally have 255 levels. (See the sketch after this list.)
  • Do you think your noise removal would also work for the hand-written digits of the MNIST data-set, or for strange geometric shapes? These are sometimes called "fooling images"; search the internet for them.
  • Can you find adversarial noise that works for all images, so you don't have to find specific noise for each image? How would you do this?
  • Can you implement find_adversary_noise() directly in TensorFlow instead of using NumPy? You would need to create a variable for the noise in the TensorFlow graph so it can be optimized.
  • Explain to a friend what adversarial examples are and how the program finds them.
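For the blurring and colour-depth exercises above, here is a minimal sketch, assuming noisy_image is an H x W x 3 array of floats in the range [0, 255]; the function names and the use of scipy are my own choices, not part of the original tutorial:

import numpy as np
from scipy.ndimage import gaussian_filter

def blur_image(noisy_image, sigma=1.0):
    # Blur each colour channel spatially but not across channels.
    # This may smear out the high-frequency adversarial noise.
    return gaussian_filter(noisy_image, sigma=(sigma, sigma, 0))

def lower_color_depth(noisy_image, levels=32):
    # Quantize the pixel colours to the given number of intensity levels.
    step = 255.0 / (levels - 1)
    return np.round(noisy_image / step) * step

To test either defense, feed the processed image back through the Inception model (as done with the noisy image inside find_adversary_noise()) and check whether the original class is recovered.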
