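Today we look at the Embedding Projector, a visualization tool built into TensorBoard. It is an interactive visualization for analyzing high-dimensional data such as embeddings.
The embedding projector reads the embeddings from your checkpoint file.
By default, the embedding projector uses PCA (principal component analysis) to project the high-dimensional data into 3D space. t-SNE is available as an alternative projection method.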
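
To make the default projection more concrete, here is a minimal NumPy sketch of PCA reducing points to three dimensions. It only illustrates the idea and is not the projector's own implementation; the function name `pca_project` and the random input data are assumptions made for this example.

```python
import numpy as np

def pca_project(data, n_components=3):
    """Project the rows of `data` onto their top principal components (illustrative helper)."""
    # PCA works on mean-centered data
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.RandomState(0)
points = rng.rand(500, 784)        # random stand-in with the shape of 500 flattened MNIST digits
projected = pca_project(points)
print(projected.shape)             # (500, 3)
```

Each row of `projected` is one point in the 3D view; the first axis carries the most variance, the third the least.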

The visualization takes three main steps:

Setup a 2D tensor that holds your embedding(s).

embedding_var = tf.Variable(....)

Periodically save your model variables in a checkpoint in LOG_DIR.

saver = tf.train.Saver()
saver.save(session, os.path.join(LOG_DIR, "model.ckpt"), step)

(Optional) Associate metadata with your embedding.

The official tutorial does not give a complete example for this section, so here is a simple one using MNIST.

1. Import projector and the data, and define the paths:

%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import os
from tensorflow.contrib.tensorboard.plugins import projector
from tensorflow.examples.tutorials.mnist import input_data

LOG_DIR = 'minimalsample'
NAME_TO_VISUALISE_VARIABLE = "mnistembedding"
TO_EMBED_COUNT = 500

path_for_mnist_sprites = os.path.join(LOG_DIR, 'mnistdigits.png')
path_for_mnist_metadata = os.path.join(LOG_DIR, 'metadata.tsv')

mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)
batch_xs, batch_ys = mnist.train.next_batch(TO_EMBED_COUNT)

2. Create the embedding, which is the first of the three steps above. The key point is knowing the name of the variable you want to visualize:

embedding_var = tf.Variable(batch_xs, name=NAME_TO_VISUALISE_VARIABLE)
summary_writer = tf.summary.FileWriter(LOG_DIR)

3. Create the embedding projector:
This step is important: it specifies the variable to visualize and the location of the metadata file.

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = embedding_var.name

# Specify where you find the metadata
embedding.metadata_path = path_for_mnist_metadata #'metadata.tsv'

# Specify where you find the sprite (we will create this later)
embedding.sprite.image_path = path_for_mnist_sprites #'mnistdigits.png'
embedding.sprite.single_image_dim.extend([28,28])

# Say that you want to visualise the embeddings
projector.visualize_embeddings(summary_writer, config)

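4. Save, which is the second of the three steps above:
TensorBoard loads the saved variable from the checkpoint, so initialize a session and the variables, then save them to the log directory: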

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.save(sess, os.path.join(LOG_DIR, "model.ckpt"), 1)

5. Define the helper functions:

**create_sprite_image:** arranges the sprites neatly on a square canvas
**vector_to_matrix_mnist:** converts MNIST data from vector form into images
**invert_grayscale:** turns the black background white
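
The bodies of vector_to_matrix_mnist and invert_grayscale are not shown in this post. A minimal sketch of what they could look like, assuming the inputs are flattened 28x28 digits with pixel values in [0, 1] (an assumption about the data, not something the listing here confirms):

```python
import numpy as np

def vector_to_matrix_mnist(mnist_digits):
    """Reshape flattened digits (batch, 28*28) into images (batch, 28, 28)."""
    return np.reshape(mnist_digits, (-1, 28, 28))

def invert_grayscale(mnist_digits):
    """Swap black and white, assuming pixel values lie in [0, 1]."""
    return 1 - mnist_digits

# Quick check on dummy data standing in for batch_xs
dummy = np.zeros((5, 784), dtype=np.float32)
images = invert_grayscale(vector_to_matrix_mnist(dummy))
print(images.shape)   # (5, 28, 28)
```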

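def create_sprite_image(images):
    """Returns a sprite image consisting of images passed as argument. Images should be count x width x height"""
    if isinstance(images, list):
        images = np.array(images)
    img_h = images.shape[1]
    img_w = images.shape[2]
    # Number of cells per side of the square canvas
    n_plots = int(np.ceil(np.sqrt(images.shape[0])))

    # Start from a canvas filled with ones
    spriteimage = np.ones((img_h * n_plots, img_w * n_plots))

    for i in range(n_plots):
        for j in range(n_plots):
            this_filter = i * n_plots + j
            # Cells beyond the last image are left untouched
            if this_filter < images.shape[0]:
                this_img = images[this_filter]
                spriteimage[i * img_h:(i + 1) * img_h,
                            j * img_w:(j + 1) * img_w] = this_img

    return spriteimage

6. Save the sprite image: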
Convert the vectors to images, invert the grayscale, then create and save the sprite image.

to_visualise = batch_xs
to_visualise = vector_to_matrix_mnist(to_visualise)
to_visualise = invert_grayscale(to_visualise)

sprite_image = create_sprite_image(to_visualise)

plt.imsave(path_for_mnist_sprites,sprite_image,cmap='gray')
plt.imshow(sprite_image,cmap='gray')
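
As a worked example of the layout arithmetic in create_sprite_image, the sketch below computes the grid size for the 500 digits used here. The helper name `sprite_grid` is made up for this illustration.

```python
import math

def sprite_grid(n_images, img_h=28, img_w=28):
    """Return (cells per side, sprite height, sprite width) for a square sprite sheet."""
    n_plots = int(math.ceil(math.sqrt(n_images)))
    return n_plots, img_h * n_plots, img_w * n_plots

n_plots, height, width = sprite_grid(500)
print(n_plots, height, width)       # 23 644 644
print(n_plots * n_plots - 500)      # 29 cells at the end of the grid hold no digit
```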
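
7. Save the metadata:
Write the labels to the metadata file. To show different digits in different colors in the visualization, TensorBoard needs the label of each image. The metadata file has two columns: "Index" and "Label".
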
with open(path_for_mnist_metadata, 'w') as f:
    # Header row, tab-separated
    f.write("Index\tLabel\n")
    for index, label in enumerate(batch_ys):
        f.write("%d\t%d\n" % (index, label))
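
To check that the file really is tab-separated, with a header plus one row per embedding, it can be read back. A minimal sketch; `write_metadata` and `read_metadata` are hypothetical helper names, and the three labels are made-up sample values rather than real MNIST labels:

```python
import csv
import os
import tempfile

def write_metadata(path, labels):
    """Write an Index/Label TSV with a header row."""
    with open(path, "w") as f:
        f.write("Index\tLabel\n")
        for index, label in enumerate(labels):
            f.write("%d\t%d\n" % (index, label))

def read_metadata(path):
    """Read the TSV back as a list of rows."""
    with open(path) as f:
        return list(csv.reader(f, delimiter="\t"))

demo_path = os.path.join(tempfile.mkdtemp(), "metadata.tsv")
write_metadata(demo_path, [7, 3, 4])
rows = read_metadata(demo_path)
print(rows[0])          # ['Index', 'Label']
print(len(rows) - 1)    # 3 data rows, one per label
```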

8. Run:
I wrote this in a Jupyter notebook. After running the code above, a minimalsample folder is created in the folder that contains the current ipynb.
To launch TensorBoard, run this in a terminal:
$ tensorboard --logdir=YOUR FOLDER/minimalsample
9. The plot now shows up under the Embeddings tab:
If you get the error "metadata.tsv is not a file", open the projector_config.pbtxt file in the minimalsample folder and change metadata_path: and image_path: to the absolute paths of your metadata.tsv and mnistdigits.png.
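
The same edit can be scripted instead of done by hand. The sketch below rewrites the two path fields in the text of a config file; the helper name `absolutize_paths`, the sample config text, and the base directory are all assumptions for illustration, and it assumes POSIX-style paths without backslashes.

```python
import os
import re

def absolutize_paths(config_text, base_dir):
    """Rewrite metadata_path / image_path values to absolute paths (hypothetical helper)."""
    def repl(match):
        key, path = match.group(1), match.group(2)
        # Paths that are already absolute are left as they are by os.path.join
        absolute = os.path.abspath(os.path.join(base_dir, path))
        return '%s: "%s"' % (key, absolute)
    return re.sub(r'(metadata_path|image_path):\s*"([^"]*)"', repl, config_text)

# Assumed example of what the generated file might contain
sample = (
    'embeddings {\n'
    '  tensor_name: "mnistembedding:0"\n'
    '  metadata_path: "minimalsample/metadata.tsv"\n'
    '  sprite {\n'
    '    image_path: "minimalsample/mnistdigits.png"\n'
    '  }\n'
    '}\n'
)
fixed = absolutize_paths(sample, os.getcwd())
print(fixed)
```

To apply it to the real file, read projector_config.pbtxt into a string, pass it through the function with the folder the notebook ran in as `base_dir`, and write the result back.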
