DCGAN
Reposted from: https://blog.csdn.net/liuxiao214/article/details/74502975
First, my thanks to the various reference blogs and links below.
1. Reference blog 1: link
— Main text begins below.
Update 2017/12/12: solved the problem of training not converging.
The update is in the last section.
1. A brief summary of DCGAN
Architecture guidelines for stable deep convolutional GANs:
- Replace all pooling layers with strided convolutions (discriminator) and fractionally-strided convolutions (generator).
- Use batch normalization in both the generator and the discriminator.
- Remove fully connected hidden layers for deeper architectures.
- Use ReLU activation in the generator for all layers except the output, which uses tanh.
- Use LeakyReLU activation in the discriminator for all layers.
Figure: the DCGAN generator used for LSUN scene modeling. A 100-dimensional uniform distribution z is projected to a small-spatial-extent convolutional representation with many feature maps. A series of four fractionally-strided convolutions (in some recent papers wrongly called deconvolutions) then converts this high-level representation into a 64*64-pixel image. Notably, no fully connected or pooling layers are used.
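As a quick aside (my own note, not from the paper): in TensorFlow, a transposed convolution with 'SAME' padding and stride $s$ maps an $i \times i$ feature map to $si \times si$, so each of the four stride-2 layers doubles the spatial size:

$$4 \times 4 \;\to\; 8 \times 8 \;\to\; 16 \times 16 \;\to\; 32 \times 32 \;\to\; 64 \times 64$$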
2. Implementing DCGAN
The DCGAN authors generated bedroom images; here, following the reference links above, we generate anime character faces instead. The results look like this:
Not shown yet, since we haven't started.
2.1 Collecting the raw dataset
First we need a large number of anime images, which we can get by crawling an anime image board, konachan.net. The crawler code is shown below:
```python
import requests                  # HTTP library
from bs4 import BeautifulSoup    # HTML parsing library for scraping
import os
import traceback                 # print stack traces on failure

def download(url, filename):
    if os.path.exists(filename):
        print('file exists!')
        return
    try:
        r = requests.get(url, stream=True, timeout=60)
        r.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)
                    f.flush()
        return filename
    except KeyboardInterrupt:
        # remove the partial file, then propagate the interrupt
        if os.path.exists(filename):
            os.remove(filename)
        raise KeyboardInterrupt
    except Exception:
        traceback.print_exc()
        if os.path.exists(filename):
            os.remove(filename)

if os.path.exists('imgs') is False:
    os.makedirs('imgs')

start = 1
end = 8000
for i in range(start, end + 1):
    url = 'http://konachan.net/post?page=%d&tags=' % i
    html = requests.get(url).text              # fetch the listing page
    soup = BeautifulSoup(html, 'html.parser')  # parse the HTML
    for img in soup.find_all('img', class_="preview"):  # every thumbnail of class "preview"
        target_url = 'http:' + img['src']
        filename = os.path.join('imgs', target_url.split('/')[-1])
        download(target_url, filename)
    print('%d / %d' % (i, end))
```
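One practical note of my own (not from the original post): crawling 8,000 listing pages back-to-back can get you throttled or banned, so a short pause between page requests is cheap insurance. A minimal sketch of the tweak, with a hypothetical fetch_page helper:

```python
import time
import requests

def fetch_page(i):
    # fetch one listing page, then pause briefly to stay polite (hypothetical helper)
    url = 'http://konachan.net/post?page=%d&tags=' % i
    html = requests.get(url, timeout=60).text
    time.sleep(1)   # throttle: roughly one page per second
    return html
```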
The goal was roughly 10,000 images. Since I'm running on a CPU with very little memory, training on too many images is hopeless anyway, so we start small and see how it goes.
A sample of the crawled images:
We now have the raw images, but the goal is to generate anime faces: we don't need the whole image, and the rest of its content would interfere with training, so we run face detection and crop out just the faces.
2.2 Detecting and cropping the faces
We use an OpenCV-based anime face detection cascade, lbpcascade_animeface.
First download the classifier:

```
wget https://raw.githubusercontent.com/nagadomi/lbpcascade_animeface/master/lbpcascade_animeface.xml
```

Once it has downloaded, run the following code to crop the faces out of the images.
```python
import cv2
import sys
import os.path
from glob import glob

def detect(filename, cascade_file="lbpcascade_animeface.xml"):
    if not os.path.isfile(cascade_file):
        raise RuntimeError("%s: not found" % cascade_file)

    cascade = cv2.CascadeClassifier(cascade_file)
    image = cv2.imread(filename)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)

    faces = cascade.detectMultiScale(
        gray,
        # detector options
        scaleFactor=1.1,
        minNeighbors=5,
        minSize=(48, 48))
    for i, (x, y, w, h) in enumerate(faces):
        face = image[y: y + h, x: x + w, :]
        face = cv2.resize(face, (96, 96))
        # note: every face from the same source image gets the same filename,
        # so later detections overwrite earlier ones
        save_filename = '%s.jpg' % (os.path.basename(filename).split('.')[0])
        cv2.imwrite("faces/" + save_filename, face)

if __name__ == '__main__':
    if os.path.exists('faces') is False:
        os.makedirs('faces')
    file_list = glob('imgs/*.jpg')
    for filename in file_list:
        detect(filename)
```
The processed images look like this:
2.3 Source code walkthrough
Based on DCGAN-tensorflow.
In total 11,053 images were crawled; face detection produced 3,533 face crops.
There are four files: main.py, model.py, ops.py, and utils.py.
2.3.1 main.py
Source (98 lines):
```python
import os
import scipy.misc
import numpy as np

from model import DCGAN
from utils import pp, visualize, to_json, show_all_variables

import tensorflow as tf

flags = tf.app.flags
flags.DEFINE_integer("epoch", 25, "Epoch to train [25]")
flags.DEFINE_float("learning_rate", 0.0002, "Learning rate for adam [0.0002]")
flags.DEFINE_float("beta1", 0.5, "Momentum term of adam [0.5]")
flags.DEFINE_integer("train_size", np.inf, "The size of train images [np.inf]")
flags.DEFINE_integer("batch_size", 64, "The size of batch images [64]")
flags.DEFINE_integer("input_height", 108, "The size of image to use (will be center cropped). [108]")
flags.DEFINE_integer("input_width", None, "The size of image to use (will be center cropped). If None, same value as input_height [None]")
flags.DEFINE_integer("output_height", 64, "The size of the output images to produce [64]")
flags.DEFINE_integer("output_width", None, "The size of the output images to produce. If None, same value as output_height [None]")
flags.DEFINE_string("dataset", "celebA", "The name of dataset [celebA, mnist, lsun]")
flags.DEFINE_string("input_fname_pattern", "*.jpg", "Glob pattern of filename of input images [*]")
flags.DEFINE_string("checkpoint_dir", "checkpoint", "Directory name to save the checkpoints [checkpoint]")
flags.DEFINE_string("sample_dir", "samples", "Directory name to save the image samples [samples]")
flags.DEFINE_boolean("train", False, "True for training, False for testing [False]")
flags.DEFINE_boolean("crop", False, "True for training, False for testing [False]")
flags.DEFINE_boolean("visualize", False, "True for visualizing, False for nothing [False]")
FLAGS = flags.FLAGS

def main(_):
    pp.pprint(flags.FLAGS.__flags)

    if FLAGS.input_width is None:
        FLAGS.input_width = FLAGS.input_height
    if FLAGS.output_width is None:
        FLAGS.output_width = FLAGS.output_height

    if not os.path.exists(FLAGS.checkpoint_dir):
        os.makedirs(FLAGS.checkpoint_dir)
    if not os.path.exists(FLAGS.sample_dir):
        os.makedirs(FLAGS.sample_dir)

    #gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
    run_config = tf.ConfigProto()
    run_config.gpu_options.allow_growth = True

    with tf.Session(config=run_config) as sess:
        if FLAGS.dataset == 'mnist':
            dcgan = DCGAN(
                sess,
                input_width=FLAGS.input_width,
                input_height=FLAGS.input_height,
                output_width=FLAGS.output_width,
                output_height=FLAGS.output_height,
                batch_size=FLAGS.batch_size,
                sample_num=FLAGS.batch_size,
                y_dim=10,
                dataset_name=FLAGS.dataset,
                input_fname_pattern=FLAGS.input_fname_pattern,
                crop=FLAGS.crop,
                checkpoint_dir=FLAGS.checkpoint_dir,
                sample_dir=FLAGS.sample_dir)
        else:
            dcgan = DCGAN(
                sess,
                input_width=FLAGS.input_width,
                input_height=FLAGS.input_height,
                output_width=FLAGS.output_width,
                output_height=FLAGS.output_height,
                batch_size=FLAGS.batch_size,
                sample_num=FLAGS.batch_size,
                dataset_name=FLAGS.dataset,
                input_fname_pattern=FLAGS.input_fname_pattern,
                crop=FLAGS.crop,
                checkpoint_dir=FLAGS.checkpoint_dir,
                sample_dir=FLAGS.sample_dir)

        show_all_variables()

        if FLAGS.train:
            dcgan.train(FLAGS)
        else:
            if not dcgan.load(FLAGS.checkpoint_dir)[0]:
                raise Exception("[!] Train a model first, then run test mode")

        # to_json("./web/js/layers.js", [dcgan.h0_w, dcgan.h0_b, dcgan.g_bn0],
        #                 [dcgan.h1_w, dcgan.h1_b, dcgan.g_bn1],
        #                 [dcgan.h2_w, dcgan.h2_b, dcgan.g_bn2],
        #                 [dcgan.h3_w, dcgan.h3_b, dcgan.g_bn3],
        #                 [dcgan.h4_w, dcgan.h4_b, None])

        # Below is codes for visualization
        OPTION = 1
        visualize(sess, dcgan, FLAGS, OPTION)

if __name__ == '__main__':
    tf.app.run()
```
This file calls into model.py and utils.py.
step0: Before main runs, the flags are parsed. Under the hood TensorFlow uses the python-gflags project, wrapped as the tf.app.flags interface; that is, TensorFlow passes the arguments tf.app.run() needs through flags. We can either initialize the flags in code before the program runs, or pass command-line arguments when launching it (a minimal sketch of the mechanism follows the flag list below).
The flags set here are:
- epoch: number of training epochs
- learning_rate: learning rate for Adam, default 0.0002
- beta1: momentum term for Adam
- train_size
- batch_size: number of images per iteration
- input_height: height of the input images (must be specified)
- input_width: width of the input images
- output_height: height of the output images
- output_width: width of the output images
- dataset: which dataset to process
- input_fname_pattern: glob pattern for the input image filenames
- checkpoint_dir: directory for saving checkpoints
- sample_dir: directory for saving image samples
- train: True for training, False for testing
- crop: True for training, False for testing
- visualize: whether to visualize
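As promised above, here is a minimal sketch of my own (not from the repo) of how the tf.app.flags mechanism behaves:

```python
import tensorflow as tf

flags = tf.app.flags
flags.DEFINE_integer("epoch", 25, "number of training epochs")
FLAGS = flags.FLAGS

def main(_):
    # prints 25 by default, or whatever was passed on the command line,
    # e.g.  python demo.py --epoch 300
    print(FLAGS.epoch)

if __name__ == '__main__':
    tf.app.run()   # parses the flags, then calls main()
```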
step1: First the parameters are printed; then, if the input or output width was not specified, it defaults to the corresponding height.
step2: Check whether the checkpoint and sample directories exist, and create them if not.
step3: Configure the session. tf.ConfigProto is generally used when creating a session, to set its parameters; see the referenced blog post for details.
```python
# tf.ConfigProto() options:
#   log_device_placement=True : log which device each op is placed on
#   allow_soft_placement=True : if the specified device does not exist,
#                               let TF pick one automatically
tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)

# Controlling GPU memory usage:
# allow_growth starts with a small GPU allocation and grows it on demand;
# memory is never released, so fragmentation is possible.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
```
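The commented-out gpu_options line in main.py hints at the other common approach, a fixed memory fraction instead of on-demand growth; a minimal sketch:

```python
import tensorflow as tf

# cap this process at roughly one third of the GPU's memory up front
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```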
step4: Run the session. First determine which dataset is being processed, then construct the DCGAN class (defined in model.py) with the corresponding arguments.
step5: Show all training-related variables.
step6: Branch on training vs. testing. If training, train; otherwise check whether a trained model exists and test it; if none has been trained, it raises "[!] Train a model first, then run test mode".
step7: Finally, visualization: visualize(sess, dcgan, FLAGS, OPTION).
main.py is the entry point of the program: it wires up the model and image-processing helpers defined elsewhere for training and testing.
2.3.2 utils.py
Source (250 lines):
"""
Some codes from https://github.com/Newmu/dcgan_code
"""
from __future__ import division
import math
import json
import random
import pprint # print data_struct
import scipy.misc
import numpy as np
from time import gmtime, strftime
from six.moves import xrange
import tensorflow as tf
import tensorflow.contrib.slim as slim
pp = pprint.PrettyPrinter()
get_stddev = lambda x, k_h, k_w: 1/math.sqrt(k_w*k_h*x.get_shape()[-1])
def show_all_variables():
model_vars = tf.trainable_variables()
slim.model_analyzer.analyze_vars(model_vars, print_info=True)
def get_image(image_path, input_height, input_width,
resize_height=64, resize_width=64,
crop=True, grayscale=False):
image = imread(image_path, grayscale)
return transform(image, input_height, input_width,
resize_height, resize_width, crop)
def save_images(images, size, image_path):
return imsave(inverse_transform(images), size, image_path)
def imread(path, grayscale = False):
if (grayscale):
return scipy.misc.imread(path, flatten = True).astype(np.float)
else:
return scipy.misc.imread(path).astype(np.float)
def merge_images(images, size):
return inverse_transform(images)
def merge(images, size):
h, w = images.shape[1], images.shape[2]
if (images.shape[3] in (3,4)):
c = images.shape[3]
img = np.zeros((h * size[0], w * size[1], c))
for idx, image in enumerate(images):
i = idx % size[1]
j = idx // size[1]
img[j * h:j * h + h, i * w:i * w + w, :] = image
return img
elif images.shape[3]==1:
img = np.zeros((h * size[0], w * size[1]))
for idx, image in enumerate(images):
i = idx % size[1]
j = idx // size[1]
img[j * h:j * h + h, i * w:i * w + w] = image[:,:,0]
return img
else:
raise ValueError('in merge(images,size) images parameter '
'must have dimensions: HxW or HxWx3 or HxWx4')
def imsave(images, size, path):
image = np.squeeze(merge(images, size))
return scipy.misc.imsave(path, image)
def center_crop(x, crop_h, crop_w,
resize_h=64, resize_w=64):
if crop_w is None:
crop_w = crop_h
h, w = x.shape[:2]
j = int(round((h - crop_h)/2.))
i = int(round((w - crop_w)/2.))
return scipy.misc.imresize(
x[j:j+crop_h, i:i+crop_w], [resize_h, resize_w])
def transform(image, input_height, input_width,
resize_height=64, resize_width=64, crop=True):
if crop:
cropped_image = center_crop(
image, input_height, input_width,
resize_height, resize_width)
else:
cropped_image = scipy.misc.imresize(image, [resize_height, resize_width])
return np.array(cropped_image)/127.5 - 1.
def inverse_transform(images):
return (images+1.)/2.
def to_json(output_path, *layers):
with open(output_path, "w") as layer_f:
lines = ""
for w, b, bn in layers:
layer_idx = w.name.split('/')[0].split('h')[1]
B = b.eval()
if "lin/" in w.name:
W = w.eval()
depth = W.shape[1]
else:
W = np.rollaxis(w.eval(), 2, 0)
depth = W.shape[0]
biases = {"sy": 1, "sx": 1, "depth": depth, "w": ['%.2f' % elem for elem in list(B)]}
if bn != None:
gamma = bn.gamma.eval()
beta = bn.beta.eval()
gamma = {"sy": 1, "sx": 1, "depth": depth, "w": ['%.2f' % elem for elem in list(gamma)]}
beta = {"sy": 1, "sx": 1, "depth": depth, "w": ['%.2f' % elem for elem in list(beta)]}
else:
gamma = {"sy": 1, "sx": 1, "depth": 0, "w": []}
beta = {"sy": 1, "sx": 1, "depth": 0, "w": []}
if "lin/" in w.name:
fs = []
for w in W.T:
fs.append({"sy": 1, "sx": 1, "depth": W.shape[0], "w": ['%.2f' % elem for elem in list(w)]})
lines += """
var layer_%s = {
"layer_type": "fc",
"sy": 1, "sx": 1,
"out_sx": 1, "out_sy": 1,
"stride": 1, "pad": 0,
"out_depth": %s, "in_depth": %s,
"biases": %s,
"gamma": %s,
"beta": %s,
"filters": %s
};""" % (layer_idx.split('_')[0], W.shape[1], W.shape[0], biases, gamma, beta, fs)
else:
fs = []
for w_ in W:
fs.append({"sy": 5, "sx": 5, "depth": W.shape[3], "w": ['%.2f' % elem for elem in list(w_.flatten())]})
lines += """
var layer_%s = {
"layer_type": "deconv",
"sy": 5, "sx": 5,
"out_sx": %s, "out_sy": %s,
"stride": 2, "pad": 1,
"out_depth": %s, "in_depth": %s,
"biases": %s,
"gamma": %s,
"beta": %s,
"filters": %s
};""" % (layer_idx, 2**(int(layer_idx)+2), 2**(int(layer_idx)+2),
W.shape[0], W.shape[3], biases, gamma, beta, fs)
layer_f.write(" ".join(lines.replace("'","").split()))
def make_gif(images, fname, duration=2, true_image=False):
import moviepy.editor as mpy
def make_frame(t):
try:
x = images[int(len(images)/duration*t)]
except:
x = images[-1]
if true_image:
return x.astype(np.uint8)
else:
return ((x+1)/2*255).astype(np.uint8)
clip = mpy.VideoClip(make_frame, duration=duration)
clip.write_gif(fname, fps = len(images) / duration)
def visualize(sess, dcgan, config, option):
image_frame_dim = int(math.ceil(config.batch_size**.5))
if option == 0:
z_sample = np.random.uniform(-0.5, 0.5, size=(config.batch_size, dcgan.z_dim))
samples = sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample})
save_images(samples, [image_frame_dim, image_frame_dim], './samples/test_%s.png' % strftime("%Y%m%d%H%M%S", gmtime()))
elif option == 1:
values = np.arange(0, 1, 1./config.batch_size)
for idx in xrange(100):
print(" [*] %d" % idx)
z_sample = np.zeros([config.batch_size, dcgan.z_dim])
for kdx, z in enumerate(z_sample):
z[idx] = values[kdx]
if config.dataset == "mnist":
y = np.random.choice(10, config.batch_size)
y_one_hot = np.zeros((config.batch_size, 10))
y_one_hot[np.arange(config.batch_size), y] = 1
samples = sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample, dcgan.y: y_one_hot})
else:
samples = sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample})
save_images(samples, [image_frame_dim, image_frame_dim], './samples/test_arange_%s.png' % (idx))
elif option == 2:
values = np.arange(0, 1, 1./config.batch_size)
for idx in [random.randint(0, 99) for _ in xrange(100)]:
print(" [*] %d" % idx)
z = np.random.uniform(-0.2, 0.2, size=(dcgan.z_dim))
z_sample = np.tile(z, (config.batch_size, 1))
#z_sample = np.zeros([config.batch_size, dcgan.z_dim])
for kdx, z in enumerate(z_sample):
z[idx] = values[kdx]
if config.dataset == "mnist":
y = np.random.choice(10, config.batch_size)
y_one_hot = np.zeros((config.batch_size, 10))
y_one_hot[np.arange(config.batch_size), y] = 1
samples = sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample, dcgan.y: y_one_hot})
else:
samples = sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample})
try:
make_gif(samples, './samples/test_gif_%s.gif' % (idx))
except:
save_images(samples, [image_frame_dim, image_frame_dim], './samples/test_%s.png' % strftime("%Y%m%d%H%M%S", gmtime()))
elif option == 3:
values = np.arange(0, 1, 1./config.batch_size)
for idx in xrange(100):
print(" [*] %d" % idx)
z_sample = np.zeros([config.batch_size, dcgan.z_dim])
for kdx, z in enumerate(z_sample):
z[idx] = values[kdx]
samples = sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample})
make_gif(samples, './samples/test_gif_%s.gif' % (idx))
elif option == 4:
image_set = []
values = np.arange(0, 1, 1./config.batch_size)
for idx in xrange(100):
print(" [*] %d" % idx)
z_sample = np.zeros([config.batch_size, dcgan.z_dim])
for kdx, z in enumerate(z_sample): z[idx] = values[kdx]
image_set.append(sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample}))
make_gif(image_set[-1], './samples/test_gif_%s.gif' % (idx))
new_image_set = [merge(np.array([images[idx] for images in image_set]), [10, 10]) \
for idx in range(64) + range(63, -1, -1)]
make_gif(new_image_set, './samples/test_gif_merged.gif', duration=8)
def image_manifold_size(num_images):
manifold_h = int(np.floor(np.sqrt(num_images)))
manifold_w = int(np.ceil(np.sqrt(num_images)))
assert manifold_h * manifold_w == num_images
return manifold_h, manifold_w
This file defines assorted image-processing helpers; it acts more or less as a header file for the other three.
step0: First pp = pprint.PrettyPrinter() is defined, for pretty-printing data structures; see the referenced blog post for details.
step1: get_stddev is defined: the reciprocal of the square root of the product of three quantities, presumably intended for weight initialization.
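Written out (my own reading of the lambda), with $k_h, k_w$ the kernel height and width and $C$ the channel count of x:

$$\text{stddev} = \frac{1}{\sqrt{k_h \, k_w \, C}}$$

i.e. a fan-in-based standard deviation in the spirit of Xavier initialization.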
step2: show_all_variables(). First, tf.trainable_variables returns the list of variables to be trained; then model_analyzer.analyze_vars from tensorflow.contrib.slim prints information on all of them. Usage:
Code:
```python
import tensorflow as tf
import tensorflow.contrib.slim as slim

x1 = tf.Variable(tf.constant(1, shape=[1], dtype=tf.float32), name='x11')
x2 = tf.Variable(tf.constant(2, shape=[1], dtype=tf.float32), name='x22')
m = tf.train.ExponentialMovingAverage(0.99, 5)

v = tf.trainable_variables()
for i in v:
    print(233)
    print(i)
print(23333333)
slim.model_analyzer.analyze_vars(v, print_info=True)
print(23333333)
```
The resulting output:
Note: steps 3 through 11 all define image-processing functions that call one another.
step3: get_image(image_path, input_height, input_width, resize_height=64, resize_width=64, crop=True, grayscale=False). First the image is read from the given path, optionally as grayscale; it is then cropped and resized according to the input parameters.
step4: save_images(images, size, image_path). Calls imsave(inverse_transform(images), size, image_path) and returns the new image.
step5: imread(path, grayscale=False). Calls scipy.misc.imread(); depending on the grayscale flag it flattens the image to grayscale, and converts the result to np.float.
step6: merge_images(images, size). Calls inverse_transform(images) and returns the result.
step7: merge(images, size). First get the image height and width. Then check whether the images are RGB or grayscale, handling each case separately. If the channel count is 3 or 4, allocate a zero-initialized canvas large enough for the whole batch (e.g. an 8*8 grid for batch_size=64), copy each image of the batch into its slot in turn, and return the large mosaic. If the channel count is 1, do the same, copying only the single channel. Anything else raises an error.
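A standalone sketch of what merge() computes, with made-up shapes for concreteness:

```python
import numpy as np

batch = np.random.rand(64, 96, 96, 3)      # a batch of 64 RGB images
grid = np.zeros((8 * 96, 8 * 96, 3))       # the 8x8 mosaic canvas
for idx, img in enumerate(batch):
    i, j = idx % 8, idx // 8               # column, row in the grid
    grid[j * 96:(j + 1) * 96, i * 96:(i + 1) * 96, :] = img
# grid is now one large image tiling the whole batch
```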
step8: imsave(images, size, path). First np.squeeze() removes the length-1 axes from the image returned by merge(); then scipy.misc.imsave() writes the new image to the given path.
step9: center_crop(x, crop_h, crop_w, resize_h=64, resize_w=64). Subtracts the crop's H and W from the image's H and W and rounds, uses the result as the offsets at which to slice out the centered crop, and resizes it with scipy.misc.imresize.
step10: transform(image, input_height, input_width, resize_height=64, resize_width=64, crop=True). Crops the input image: if crop is true, it uses center_crop() as described above; otherwise the image is skipped past cropping and passed straight to scipy.misc.imresize at 64*64. Finally the pixel values are rescaled and the image is returned.
step11: inverse_transform(images). Maps pixel values from the generator's range back into display range and returns the new images.
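Putting steps 10 and 11 together, the value ranges are (my own summary): transform maps a pixel $x \in [0, 255]$ to $x/127.5 - 1 \in [-1, 1]$, which matches the tanh output range of the generator, and inverse_transform maps a generated value $x'$ back to $(x' + 1)/2 \in [0, 1]$ for saving.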
In summary, these functions call one another to implement three image operations: get_image(), which reads an image and returns a cropped version; save_images(), which tiles every image in a batch into one large image and saves it; and merge_images(), which simply applies inverse_transform() and returns the result (honestly, I'm not sure where this one gets used). Their relationships are shown in the figure below.
step12: to_json(output_path, *layers). This appears to dump each layer's weights and biases, but the code doesn't seem to call it anywhere, so we skip it until it matters.
step13: make_gif(images, fname, duration=2, true_image=False). Uses the moviepy.editor module to build an animated GIF for visualization. It defines an inner function make_frame(t) that divides the number of images by the duration to pick the frame for time t; the clip is then rendered and written out as a GIF.
step14: visualize(sess, dcgan, config, option). Handles options 0 through 4. With option=0, the generated samples are simply displayed; with option=1, the samples are processed per dataset and saved with the save_images() function from above; and so on. main.py uses option=1.
step15: image_manifold_size(num_images). Takes the floor of the square root of the image count as h and the ceiling as w, asserts that h*w equals the image count (true for square batch sizes such as 64, where h = w = 8), and returns h and w.
That is all of utils.py: basic image operations (reading, saving, and rescaling images), plus visualization of the training process with moviepy.
2.3.3 ops.py
Source (105 lines):
```python
import math
import numpy as np
import tensorflow as tf

from tensorflow.python.framework import ops

from utils import *

try:
    image_summary = tf.image_summary
    scalar_summary = tf.scalar_summary
    histogram_summary = tf.histogram_summary
    merge_summary = tf.merge_summary
    SummaryWriter = tf.train.SummaryWriter
except:
    image_summary = tf.summary.image
    scalar_summary = tf.summary.scalar
    histogram_summary = tf.summary.histogram
    merge_summary = tf.summary.merge
    SummaryWriter = tf.summary.FileWriter

if "concat_v2" in dir(tf):
    def concat(tensors, axis, *args, **kwargs):
        return tf.concat_v2(tensors, axis, *args, **kwargs)
else:
    def concat(tensors, axis, *args, **kwargs):
        return tf.concat(tensors, axis, *args, **kwargs)

class batch_norm(object):
    def __init__(self, epsilon=1e-5, momentum=0.9, name="batch_norm"):
        with tf.variable_scope(name):
            self.epsilon = epsilon
            self.momentum = momentum
            self.name = name

    def __call__(self, x, train=True):
        return tf.contrib.layers.batch_norm(x,
                                            decay=self.momentum,
                                            updates_collections=None,
                                            epsilon=self.epsilon,
                                            scale=True,
                                            is_training=train,
                                            scope=self.name)

def conv_cond_concat(x, y):
    """Concatenate conditioning vector on feature map axis."""
    x_shapes = x.get_shape()
    y_shapes = y.get_shape()
    return concat([
        x, y*tf.ones([x_shapes[0], x_shapes[1], x_shapes[2], y_shapes[3]])], 3)

def conv2d(input_, output_dim,
           k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02,
           name="conv2d"):
    with tf.variable_scope(name):
        w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                            initializer=tf.truncated_normal_initializer(stddev=stddev))
        conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')

        biases = tf.get_variable('biases', [output_dim], initializer=tf.constant_initializer(0.0))
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())

        return conv

def deconv2d(input_, output_shape,
             k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02,
             name="deconv2d", with_w=False):
    with tf.variable_scope(name):
        # filter : [height, width, output_channels, in_channels]
        w = tf.get_variable('w', [k_h, k_w, output_shape[-1], input_.get_shape()[-1]],
                            initializer=tf.random_normal_initializer(stddev=stddev))

        try:
            deconv = tf.nn.conv2d_transpose(input_, w, output_shape=output_shape,
                                            strides=[1, d_h, d_w, 1])

        # Support for versions of TensorFlow before 0.7.0
        except AttributeError:
            deconv = tf.nn.deconv2d(input_, w, output_shape=output_shape,
                                    strides=[1, d_h, d_w, 1])

        biases = tf.get_variable('biases', [output_shape[-1]], initializer=tf.constant_initializer(0.0))
        deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())

        if with_w:
            return deconv, w, biases
        else:
            return deconv

def lrelu(x, leak=0.2, name="lrelu"):
    return tf.maximum(x, leak*x)

def linear(input_, output_size, scope=None, stddev=0.02, bias_start=0.0, with_w=False):
    shape = input_.get_shape().as_list()

    with tf.variable_scope(scope or "Linear"):
        matrix = tf.get_variable("Matrix", [shape[1], output_size], tf.float32,
                                 tf.random_normal_initializer(stddev=stddev))
        bias = tf.get_variable("bias", [output_size],
                               initializer=tf.constant_initializer(bias_start))
        if with_w:
            return tf.matmul(input_, matrix) + bias, matrix, bias
        else:
            return tf.matmul(input_, matrix) + bias
```
This file calls into utils.py.
step0: It first imports tensorflow.python.framework, which contains the definitions and operations for TensorFlow graphs, tensors, and so on.
step1: A try...except block defines a set of aliases: image_summary, scalar_summary, histogram_summary, merge_summary, SummaryWriter, each taken from the appropriate place in TensorFlow. If the old top-level names can be fetched directly they are used; otherwise the names are taken from tf.summary. (This keeps the code working across TensorFlow versions.)
step2: concat, used to join multiple tensors. dir(tf) is used to check whether "concat_v2" exists: if so, a concat(tensors, axis, *args, **kwargs) function is defined returning tf.concat_v2(tensors, axis, *args, **kwargs); otherwise the same function is defined, only returning tf.concat(tensors, axis, *args, **kwargs). tf.concat is used like this:
```python
t1 = tf.constant([[1, 2, 3], [4, 5, 6]])
t2 = tf.constant([[7, 8, 9], [10, 11, 12]])
t3 = tf.concat([t1, t2], 0)   # shape (4, 3): stacked along rows
t4 = tf.concat([t1, t2], 1)   # shape (2, 6): stacked along columns
print(t1)
print(t2)
print(t3)
print(t4)
```
step3: the batch_norm class, with an __init__ and a __call__ method. __init__(self, epsilon=1e-5, momentum=0.9, name="batch_norm") opens a variable scope with the given name and initializes self.epsilon, self.momentum, and self.name. __call__(self, x, train=True) applies batch normalization via tf.contrib.layers.batch_norm.
step4: conv_cond_concat(x, y). Concatenates x with y broadcast to a tensor of shape [x_shapes[0], x_shapes[1], x_shapes[2], y_shapes[3]] (via multiplication with a ones tensor), along the feature-map axis.
step5: conv2d(input_, output_dim, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, name="conv2d"). The convolution helper: create truncated-normal random weights, perform the convolution, create zero-initialized biases, add them to the convolution, and return the result.
step6: deconv2d(input_, output_shape, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, name="deconv2d", with_w=False). The transposed-convolution ("deconvolution") helper: create random-normal weights, run the transposed convolution, create the biases and add them; if with_w is true, return the deconvolution together with the weights and biases, otherwise just the deconvolution.
step7: lrelu(x, leak=0.2, name="lrelu"). Defines the LeakyReLU activation.
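In equation form, with the default leak of 0.2:

$$\mathrm{lrelu}(x) = \max(x,\, 0.2x) = \begin{cases} x, & x \ge 0 \\ 0.2x, & x < 0 \end{cases}$$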
step8: linear(input_, output_size, scope=None, stddev=0.02, bias_start=0.0, with_w=False). A linear (fully connected) layer: create a random-normal weight matrix and the initial biases; if with_w is true, return xW + b together with the weights W and biases b, otherwise just xW + b.
In short, this file defines helpers for tensor concatenation, batch normalization, convolution, transposed convolution, activation, and linear layers.
2.3.4 model.py
Source (530 lines):
```python
from __future__ import division
import os
import time
import math
from glob import glob  # file path search
import tensorflow as tf
import numpy as np
from six.moves import xrange

from ops import *
from utils import *

def conv_out_size_same(size, stride):
    return int(math.ceil(float(size) / float(stride)))

class DCGAN(object):
    def __init__(self, sess, input_height=108, input_width=108, crop=True,
                 batch_size=64, sample_num=64, output_height=64, output_width=64,
                 y_dim=None, z_dim=100, gf_dim=64, df_dim=64,
                 gfc_dim=1024, dfc_dim=1024, c_dim=3, dataset_name='default',
                 input_fname_pattern='*.jpg', checkpoint_dir=None, sample_dir=None):
        """
        Args:
            sess: TensorFlow session
            batch_size: The size of batch. Should be specified before training.
            y_dim: (optional) Dimension of dim for y. [None]
            z_dim: (optional) Dimension of dim for Z. [100]
            gf_dim: (optional) Dimension of gen filters in first conv layer. [64]
            df_dim: (optional) Dimension of discrim filters in first conv layer. [64]
            gfc_dim: (optional) Dimension of gen units for fully connected layer. [1024]
            dfc_dim: (optional) Dimension of discrim units for fully connected layer. [1024]
            c_dim: (optional) Dimension of image color. For grayscale input, set to 1. [3]
        """
        self.sess = sess
        self.crop = crop

        self.batch_size = batch_size
        self.sample_num = sample_num

        self.input_height = input_height
        self.input_width = input_width
        self.output_height = output_height
        self.output_width = output_width

        self.y_dim = y_dim
        self.z_dim = z_dim

        self.gf_dim = gf_dim
        self.df_dim = df_dim

        self.gfc_dim = gfc_dim
        self.dfc_dim = dfc_dim

        # batch normalization : deals with poor initialization helps gradient flow
        self.d_bn1 = batch_norm(name='d_bn1')
        self.d_bn2 = batch_norm(name='d_bn2')

        if not self.y_dim:
            self.d_bn3 = batch_norm(name='d_bn3')

        self.g_bn0 = batch_norm(name='g_bn0')
        self.g_bn1 = batch_norm(name='g_bn1')
        self.g_bn2 = batch_norm(name='g_bn2')

        if not self.y_dim:
            self.g_bn3 = batch_norm(name='g_bn3')

        self.dataset_name = dataset_name
        self.input_fname_pattern = input_fname_pattern
        self.checkpoint_dir = checkpoint_dir

        if self.dataset_name == 'mnist':
            self.data_X, self.data_y = self.load_mnist()
            self.c_dim = self.data_X[0].shape[-1]
        else:
            self.data = glob(os.path.join("./data", self.dataset_name, self.input_fname_pattern))
            imreadImg = imread(self.data[0])
            if len(imreadImg.shape) >= 3:  # check if image is a non-grayscale image by checking channel number
                self.c_dim = imread(self.data[0]).shape[-1]
            else:
                self.c_dim = 1

        self.grayscale = (self.c_dim == 1)

        self.build_model()

    def build_model(self):
        if self.y_dim:
            self.y = tf.placeholder(tf.float32, [self.batch_size, self.y_dim], name='y')

        if self.crop:
            image_dims = [self.output_height, self.output_width, self.c_dim]
        else:
            image_dims = [self.input_height, self.input_width, self.c_dim]

        self.inputs = tf.placeholder(
            tf.float32, [self.batch_size] + image_dims, name='real_images')

        inputs = self.inputs

        self.z = tf.placeholder(
            tf.float32, [None, self.z_dim], name='z')
        self.z_sum = histogram_summary("z", self.z)

        if self.y_dim:
            self.G = self.generator(self.z, self.y)
            self.D, self.D_logits = \
                self.discriminator(inputs, self.y, reuse=False)
            self.sampler = self.sampler(self.z, self.y)
            self.D_, self.D_logits_ = \
                self.discriminator(self.G, self.y, reuse=True)
        else:
            self.G = self.generator(self.z)
            self.D, self.D_logits = self.discriminator(inputs)
            self.sampler = self.sampler(self.z)
            self.D_, self.D_logits_ = self.discriminator(self.G, reuse=True)

        self.d_sum = histogram_summary("d", self.D)
        self.d__sum = histogram_summary("d_", self.D_)
        self.G_sum = image_summary("G", self.G)

        def sigmoid_cross_entropy_with_logits(x, y):
            try:
                return tf.nn.sigmoid_cross_entropy_with_logits(logits=x, labels=y)
            except:
                return tf.nn.sigmoid_cross_entropy_with_logits(logits=x, targets=y)

        self.d_loss_real = tf.reduce_mean(
            sigmoid_cross_entropy_with_logits(self.D_logits, tf.ones_like(self.D)))
        self.d_loss_fake = tf.reduce_mean(
            sigmoid_cross_entropy_with_logits(self.D_logits_, tf.zeros_like(self.D_)))
        self.g_loss = tf.reduce_mean(
            sigmoid_cross_entropy_with_logits(self.D_logits_, tf.ones_like(self.D_)))

        self.d_loss_real_sum = scalar_summary("d_loss_real", self.d_loss_real)
        self.d_loss_fake_sum = scalar_summary("d_loss_fake", self.d_loss_fake)

        self.d_loss = self.d_loss_real + self.d_loss_fake

        self.g_loss_sum = scalar_summary("g_loss", self.g_loss)
        self.d_loss_sum = scalar_summary("d_loss", self.d_loss)

        t_vars = tf.trainable_variables()

        self.d_vars = [var for var in t_vars if 'd_' in var.name]
        self.g_vars = [var for var in t_vars if 'g_' in var.name]

        self.saver = tf.train.Saver()

    def train(self, config):
        d_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
            .minimize(self.d_loss, var_list=self.d_vars)
        g_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
            .minimize(self.g_loss, var_list=self.g_vars)
        try:
            tf.global_variables_initializer().run()
        except:
            tf.initialize_all_variables().run()

        self.g_sum = merge_summary([self.z_sum, self.d__sum,
                                    self.G_sum, self.d_loss_fake_sum, self.g_loss_sum])
        self.d_sum = merge_summary(
            [self.z_sum, self.d_sum, self.d_loss_real_sum, self.d_loss_sum])
        self.writer = SummaryWriter("./logs", self.sess.graph)

        sample_z = np.random.uniform(-1, 1, size=(self.sample_num, self.z_dim))

        if config.dataset == 'mnist':
            sample_inputs = self.data_X[0:self.sample_num]
            sample_labels = self.data_y[0:self.sample_num]
        else:
            sample_files = self.data[0:self.sample_num]
            sample = [
                get_image(sample_file,
                          input_height=self.input_height,
                          input_width=self.input_width,
                          resize_height=self.output_height,
                          resize_width=self.output_width,
                          crop=self.crop,
                          grayscale=self.grayscale) for sample_file in sample_files]
            if (self.grayscale):
                sample_inputs = np.array(sample).astype(np.float32)[:, :, :, None]
            else:
                sample_inputs = np.array(sample).astype(np.float32)

        counter = 1
        start_time = time.time()
        could_load, checkpoint_counter = self.load(self.checkpoint_dir)
        if could_load:
            counter = checkpoint_counter
            print(" [*] Load SUCCESS")
        else:
            print(" [!] Load failed...")

        for epoch in xrange(config.epoch):
            if config.dataset == 'mnist':
                batch_idxs = min(len(self.data_X), config.train_size) // config.batch_size
            else:
                self.data = glob(os.path.join(
                    "./data", config.dataset, self.input_fname_pattern))
                batch_idxs = min(len(self.data), config.train_size) // config.batch_size

            for idx in xrange(0, batch_idxs):
                if config.dataset == 'mnist':
                    batch_images = self.data_X[idx*config.batch_size:(idx+1)*config.batch_size]
                    batch_labels = self.data_y[idx*config.batch_size:(idx+1)*config.batch_size]
                else:
                    batch_files = self.data[idx*config.batch_size:(idx+1)*config.batch_size]
                    batch = [
                        get_image(batch_file,
                                  input_height=self.input_height,
                                  input_width=self.input_width,
                                  resize_height=self.output_height,
                                  resize_width=self.output_width,
                                  crop=self.crop,
                                  grayscale=self.grayscale) for batch_file in batch_files]
                    if self.grayscale:
                        batch_images = np.array(batch).astype(np.float32)[:, :, :, None]
                    else:
                        batch_images = np.array(batch).astype(np.float32)

                batch_z = np.random.uniform(-1, 1, [config.batch_size, self.z_dim]) \
                    .astype(np.float32)

                if config.dataset == 'mnist':
                    # Update D network
                    _, summary_str = self.sess.run([d_optim, self.d_sum],
                                                   feed_dict={
                                                       self.inputs: batch_images,
                                                       self.z: batch_z,
                                                       self.y: batch_labels,
                                                   })
                    self.writer.add_summary(summary_str, counter)

                    # Update G network
                    _, summary_str = self.sess.run([g_optim, self.g_sum],
                                                   feed_dict={
                                                       self.z: batch_z,
                                                       self.y: batch_labels,
                                                   })
                    self.writer.add_summary(summary_str, counter)

                    # Run g_optim twice to make sure that d_loss does not go to zero (different from paper)
                    _, summary_str = self.sess.run([g_optim, self.g_sum],
                                                   feed_dict={self.z: batch_z, self.y: batch_labels})
                    self.writer.add_summary(summary_str, counter)

                    errD_fake = self.d_loss_fake.eval({
                        self.z: batch_z,
                        self.y: batch_labels
                    })
                    errD_real = self.d_loss_real.eval({
                        self.inputs: batch_images,
                        self.y: batch_labels
                    })
                    errG = self.g_loss.eval({
                        self.z: batch_z,
                        self.y: batch_labels
                    })
                else:
                    # Update D network
                    _, summary_str = self.sess.run([d_optim, self.d_sum],
                                                   feed_dict={self.inputs: batch_images, self.z: batch_z})
                    self.writer.add_summary(summary_str, counter)

                    # Update G network
                    _, summary_str = self.sess.run([g_optim, self.g_sum],
                                                   feed_dict={self.z: batch_z})
                    self.writer.add_summary(summary_str, counter)

                    # Run g_optim twice to make sure that d_loss does not go to zero (different from paper)
                    _, summary_str = self.sess.run([g_optim, self.g_sum],
                                                   feed_dict={self.z: batch_z})
                    self.writer.add_summary(summary_str, counter)

                    errD_fake = self.d_loss_fake.eval({self.z: batch_z})
                    errD_real = self.d_loss_real.eval({self.inputs: batch_images})
                    errG = self.g_loss.eval({self.z: batch_z})

                counter += 1
                print("Epoch: [%2d] [%4d/%4d] time: %4.4f, d_loss: %.8f, g_loss: %.8f" \
                    % (epoch, idx, batch_idxs,
                       time.time() - start_time, errD_fake+errD_real, errG))

                if np.mod(counter, 100) == 1:
                    if config.dataset == 'mnist':
                        samples, d_loss, g_loss = self.sess.run(
                            [self.sampler, self.d_loss, self.g_loss],
                            feed_dict={
                                self.z: sample_z,
                                self.inputs: sample_inputs,
                                self.y: sample_labels,
                            }
                        )
                        save_images(samples, image_manifold_size(samples.shape[0]),
                                    './{}/train_{:02d}_{:04d}.png'.format(config.sample_dir, epoch, idx))
                        print("[Sample] d_loss: %.8f, g_loss: %.8f" % (d_loss, g_loss))
                    else:
                        try:
                            samples, d_loss, g_loss = self.sess.run(
                                [self.sampler, self.d_loss, self.g_loss],
                                feed_dict={
                                    self.z: sample_z,
                                    self.inputs: sample_inputs,
                                },
                            )
                            save_images(samples, image_manifold_size(samples.shape[0]),
                                        './{}/train_{:02d}_{:04d}.png'.format(config.sample_dir, epoch, idx))
                            print("[Sample] d_loss: %.8f, g_loss: %.8f" % (d_loss, g_loss))
                        except:
                            print("one pic error!...")

                if np.mod(counter, 500) == 2:
                    self.save(config.checkpoint_dir, counter)

    def discriminator(self, image, y=None, reuse=False):
        with tf.variable_scope("discriminator") as scope:
            if reuse:
                scope.reuse_variables()

            if not self.y_dim:
                h0 = lrelu(conv2d(image, self.df_dim, name='d_h0_conv'))
                h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim*2, name='d_h1_conv')))
                h2 = lrelu(self.d_bn2(conv2d(h1, self.df_dim*4, name='d_h2_conv')))
                h3 = lrelu(self.d_bn3(conv2d(h2, self.df_dim*8, name='d_h3_conv')))
                h4 = linear(tf.reshape(h3, [self.batch_size, -1]), 1, 'd_h4_lin')

                return tf.nn.sigmoid(h4), h4
            else:
                yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim])
                x = conv_cond_concat(image, yb)

                h0 = lrelu(conv2d(x, self.c_dim + self.y_dim, name='d_h0_conv'))
                h0 = conv_cond_concat(h0, yb)

                h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim + self.y_dim, name='d_h1_conv')))
                h1 = tf.reshape(h1, [self.batch_size, -1])
                h1 = concat([h1, y], 1)

                h2 = lrelu(self.d_bn2(linear(h1, self.dfc_dim, 'd_h2_lin')))
                h2 = concat([h2, y], 1)

                h3 = linear(h2, 1, 'd_h3_lin')

                return tf.nn.sigmoid(h3), h3

    def generator(self, z, y=None):
        with tf.variable_scope("generator") as scope:
            if not self.y_dim:
                s_h, s_w = self.output_height, self.output_width
                s_h2, s_w2 = conv_out_size_same(s_h, 2), conv_out_size_same(s_w, 2)
                s_h4, s_w4 = conv_out_size_same(s_h2, 2), conv_out_size_same(s_w2, 2)
                s_h8, s_w8 = conv_out_size_same(s_h4, 2), conv_out_size_same(s_w4, 2)
                s_h16, s_w16 = conv_out_size_same(s_h8, 2), conv_out_size_same(s_w8, 2)

                # project `z` and reshape
                self.z_, self.h0_w, self.h0_b = linear(
                    z, self.gf_dim*8*s_h16*s_w16, 'g_h0_lin', with_w=True)

                self.h0 = tf.reshape(
                    self.z_, [-1, s_h16, s_w16, self.gf_dim * 8])
                h0 = tf.nn.relu(self.g_bn0(self.h0))

                self.h1, self.h1_w, self.h1_b = deconv2d(
                    h0, [self.batch_size, s_h8, s_w8, self.gf_dim*4], name='g_h1', with_w=True)
                h1 = tf.nn.relu(self.g_bn1(self.h1))

                h2, self.h2_w, self.h2_b = deconv2d(
                    h1, [self.batch_size, s_h4, s_w4, self.gf_dim*2], name='g_h2', with_w=True)
                h2 = tf.nn.relu(self.g_bn2(h2))

                h3, self.h3_w, self.h3_b = deconv2d(
                    h2, [self.batch_size, s_h2, s_w2, self.gf_dim*1], name='g_h3', with_w=True)
                h3 = tf.nn.relu(self.g_bn3(h3))

                h4, self.h4_w, self.h4_b = deconv2d(
                    h3, [self.batch_size, s_h, s_w, self.c_dim], name='g_h4', with_w=True)

                return tf.nn.tanh(h4)
            else:
                s_h, s_w = self.output_height, self.output_width
                s_h2, s_h4 = int(s_h/2), int(s_h/4)
                s_w2, s_w4 = int(s_w/2), int(s_w/4)

                # yb = tf.expand_dims(tf.expand_dims(y, 1), 2)
                yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim])
                z = concat([z, y], 1)

                h0 = tf.nn.relu(
                    self.g_bn0(linear(z, self.gfc_dim, 'g_h0_lin')))
                h0 = concat([h0, y], 1)

                h1 = tf.nn.relu(self.g_bn1(
                    linear(h0, self.gf_dim*2*s_h4*s_w4, 'g_h1_lin')))
                h1 = tf.reshape(h1, [self.batch_size, s_h4, s_w4, self.gf_dim * 2])

                h1 = conv_cond_concat(h1, yb)

                h2 = tf.nn.relu(self.g_bn2(deconv2d(h1,
                    [self.batch_size, s_h2, s_w2, self.gf_dim * 2], name='g_h2')))
                h2 = conv_cond_concat(h2, yb)

                return tf.nn.sigmoid(
                    deconv2d(h2, [self.batch_size, s_h, s_w, self.c_dim], name='g_h3'))

    def sampler(self, z, y=None):
        with tf.variable_scope("generator") as scope:
            scope.reuse_variables()

            if not self.y_dim:
                s_h, s_w = self.output_height, self.output_width
                s_h2, s_w2 = conv_out_size_same(s_h, 2), conv_out_size_same(s_w, 2)
                s_h4, s_w4 = conv_out_size_same(s_h2, 2), conv_out_size_same(s_w2, 2)
                s_h8, s_w8 = conv_out_size_same(s_h4, 2), conv_out_size_same(s_w4, 2)
                s_h16, s_w16 = conv_out_size_same(s_h8, 2), conv_out_size_same(s_w8, 2)

                # project `z` and reshape
                h0 = tf.reshape(
                    linear(z, self.gf_dim*8*s_h16*s_w16, 'g_h0_lin'),
                    [-1, s_h16, s_w16, self.gf_dim * 8])
                h0 = tf.nn.relu(self.g_bn0(h0, train=False))

                h1 = deconv2d(h0, [self.batch_size, s_h8, s_w8, self.gf_dim*4], name='g_h1')
                h1 = tf.nn.relu(self.g_bn1(h1, train=False))

                h2 = deconv2d(h1, [self.batch_size, s_h4, s_w4, self.gf_dim*2], name='g_h2')
                h2 = tf.nn.relu(self.g_bn2(h2, train=False))

                h3 = deconv2d(h2, [self.batch_size, s_h2, s_w2, self.gf_dim*1], name='g_h3')
                h3 = tf.nn.relu(self.g_bn3(h3, train=False))

                h4 = deconv2d(h3, [self.batch_size, s_h, s_w, self.c_dim], name='g_h4')

                return tf.nn.tanh(h4)
            else:
                s_h, s_w = self.output_height, self.output_width
                s_h2, s_h4 = int(s_h/2), int(s_h/4)
                s_w2, s_w4 = int(s_w/2), int(s_w/4)

                # yb = tf.reshape(y, [-1, 1, 1, self.y_dim])
                yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim])
                z = concat([z, y], 1)

                h0 = tf.nn.relu(self.g_bn0(linear(z, self.gfc_dim, 'g_h0_lin'), train=False))
                h0 = concat([h0, y], 1)

                h1 = tf.nn.relu(self.g_bn1(
                    linear(h0, self.gf_dim*2*s_h4*s_w4, 'g_h1_lin'), train=False))
                h1 = tf.reshape(h1, [self.batch_size, s_h4, s_w4, self.gf_dim * 2])
                h1 = conv_cond_concat(h1, yb)

                h2 = tf.nn.relu(self.g_bn2(
                    deconv2d(h1, [self.batch_size, s_h2, s_w2, self.gf_dim * 2], name='g_h2'), train=False))
                h2 = conv_cond_concat(h2, yb)

                return tf.nn.sigmoid(deconv2d(h2, [self.batch_size, s_h, s_w, self.c_dim], name='g_h3'))

    def load_mnist(self):
        data_dir = os.path.join("./data", self.dataset_name)

        fd = open(os.path.join(data_dir, 'train-images-idx3-ubyte'))
        loaded = np.fromfile(file=fd, dtype=np.uint8)
        trX = loaded[16:].reshape((60000, 28, 28, 1)).astype(np.float)

        fd = open(os.path.join(data_dir, 'train-labels-idx1-ubyte'))
        loaded = np.fromfile(file=fd, dtype=np.uint8)
        trY = loaded[8:].reshape((60000)).astype(np.float)

        fd = open(os.path.join(data_dir, 't10k-images-idx3-ubyte'))
        loaded = np.fromfile(file=fd, dtype=np.uint8)
        teX = loaded[16:].reshape((10000, 28, 28, 1)).astype(np.float)

        fd = open(os.path.join(data_dir, 't10k-labels-idx1-ubyte'))
        loaded = np.fromfile(file=fd, dtype=np.uint8)
        teY = loaded[8:].reshape((10000)).astype(np.float)

        trY = np.asarray(trY)
        teY = np.asarray(teY)

        X = np.concatenate((trX, teX), axis=0)
        y = np.concatenate((trY, teY), axis=0).astype(np.int)

        seed = 547
        np.random.seed(seed)
        np.random.shuffle(X)
        np.random.seed(seed)
        np.random.shuffle(y)

        y_vec = np.zeros((len(y), self.y_dim), dtype=np.float)
        for i, label in enumerate(y):
            y_vec[i, y[i]] = 1.0

        return X/255., y_vec

    @property
    def model_dir(self):
        return "{}_{}_{}_{}".format(
            self.dataset_name, self.batch_size,
            self.output_height, self.output_width)

    def save(self, checkpoint_dir, step):
        model_name = "DCGAN.model"
        checkpoint_dir = os.path.join(checkpoint_dir, self.model_dir)

        if not os.path.exists(checkpoint_dir):
            os.makedirs(checkpoint_dir)

        self.saver.save(self.sess,
                        os.path.join(checkpoint_dir, model_name),
                        global_step=step)

    def load(self, checkpoint_dir):
        import re
        print(" [*] Reading checkpoints...")
        checkpoint_dir = os.path.join(checkpoint_dir, self.model_dir)

        ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
            self.saver.restore(self.sess, os.path.join(checkpoint_dir, ckpt_name))
            counter = int(next(re.finditer("(\d+)(?!.*\d)", ckpt_name)).group(0))
            print(" [*] Success to read {}".format(ckpt_name))
            return True, counter
        else:
            print(" [*] Failed to find a checkpoint")
            return False, 0
```
This file defines the DCGAN model itself; it calls into utils.py and ops.py.
step0: conv_out_size_same(size, stride). Computes a layer's spatial size from the input size and the stride, rounding up.
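As a sanity check of my own, here is the chain of sizes the generator will use later in this post for a 48×48 output:

```python
import math

def conv_out_size_same(size, stride):
    return int(math.ceil(float(size) / float(stride)))

s = 48
for _ in range(4):
    s = conv_out_size_same(s, 2)
    print(s)   # 24, 12, 6, 3 -- so the generator grows 3 -> 6 -> 12 -> 24 -> 48
```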
step1: The DCGAN class is defined; all the remaining code lives in this class, so the following steps all take place inside it.
step2: the class initializer __init__. It mainly initializes the default parameters: the session, crop, batch size batch_size, number of samples sample_num, the input and output heights and widths, the various dimensions, the batch-norm objects for the generator and discriminator, the dataset name, the grayscale flag, and finally the call to the model-building function. Note the check on the dataset name: if it is mnist, the data is loaded directly with load_mnist(); otherwise it is read from the local data folder, inferring whether the images are grayscale.
step3: the model-building function build_model(self).
- First check y_dim, then define and initialize y as a tf.placeholder.
- Check crop: if true we are testing, and the image dimensions are the output dimensions; otherwise they are the input dimensions.
- Define inputs with tf.placeholder: the tensor of real images.
- Define and initialize the generator's noise z and z_sum.
- Check y_dim again: if true, initialize the generator G from the noise z and labels y, the discriminator D and D_logits from inputs, the sampler, and D_ and D_logits_ from G and y; if false, initialize the same variables in the same way, just without the labels y.
- Record the D, D_, G from the previous step in the summaries d_sum, d__sum, G_sum.
- Define the sigmoid cross-entropy loss helper sigmoid_cross_entropy_with_logits(x, y). Both branches call tf.nn.sigmoid_cross_entropy_with_logits; the try/except only bridges the keyword rename from targets to labels across TensorFlow versions.
- Define the loss values: the real-data discriminator loss d_loss_real, the fake-data discriminator loss d_loss_fake, the generator loss g_loss, and the discriminator loss d_loss (written out as equations after this list).
- Collect all trainable variables in t_vars.
- Split them into the generator and discriminator parameter sets.
- Finally, the saver.
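Written out, with sigmoid cross-entropy against all-ones and all-zeros targets, these are the standard non-saturating GAN objectives:

$$L_D = -\,\mathbb{E}_x\big[\log D(x)\big] - \mathbb{E}_z\big[\log\big(1 - D(G(z))\big)\big], \qquad L_G = -\,\mathbb{E}_z\big[\log D(G(z))\big]$$

where the first term of $L_D$ is d_loss_real, the second is d_loss_fake, and $L_G$ is g_loss.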
step4: the training function train(self, config).
- Define the discriminator optimizer d_optim and the generator optimizer g_optim.
- Initialize the variables.
- Merge the generator-related and the discriminator-related summaries into one variable each, and write them to the event file.
- Initialize the noise z.
- Depending on whether the dataset is mnist, fetch the input data and labels; this uses the get_image function from utils.py.
- Define the counter and the start time start_time.
- Load a checkpoint, and report whether loading succeeded.
- Start the for epoch in xrange(config.epoch) training loop. First check whether the dataset is mnist, then compute the number of batches.
- Start the for idx in xrange(0, batch_idxs) loop; depending on whether the dataset is mnist, initialize the batch of images and labels.
- Initialize the batch noise z.
- Depending on whether the dataset is mnist, update the discriminator and generator networks. Leaving the mnist handling aside, for other datasets: the generator optimizer is run twice so that the discriminator loss does not collapse to zero, and then the discriminator's real-data and fake-data losses and the generator loss are evaluated.
- Print the training status for this batch: which epoch, which batch, elapsed time, discriminator loss, generator loss.
- After every 100 batches, depending on whether the dataset is mnist, fetch the samples and the discriminator and generator losses, save the samples with the save_images function from utils.py, naming the file after the epoch and batch index, and print the two losses.
- After every 500 batches, save a checkpoint.
step5: the discriminator, discriminator(self, image, y=None, reuse=False).
- with tf.variable_scope("discriminator") as scope shares variables within the scope.
- If reuse is set, reuse_variables() is called on the scope.
- If y_dim is false, there are five layers: four convolutional layers with lrelu activations, then a final linear layer; it returns the sigmoid of h4 together with h4 itself.
- If y_dim is true, first reshape y into yb, then use conv_cond_concat from ops.py to join image and yb into x, then build four layers: two lrelu-activated convolutional layers and one lrelu-activated linear layer, then a final linear layer; it returns the sigmoid of h3 together with h3.
step6: the generator, generator(self, z, y=None).
- with tf.variable_scope("generator") as scope shares variables within the scope.
- The network is built differently depending on y_dim.
- If false: first get the output height and width, then derive the successively halved size pairs from them. Layer h0 projects the noise z linearly (also obtaining the weights w and biases b) and applies relu. For h1, deconvolve h0, obtaining this layer's weights and biases, and apply relu; h2 and h3 follow the same pattern as h1. Layer h4 deconvolves h3, and the tanh of h4 is returned directly.
- If true: likewise get the output height and width and derive the halved size pairs, then build yb and the noise z (concatenated with y). Layer h0 applies relu to a linear layer and is concatenated with y. Layer h1 applies relu after a linear fully connected layer and is concatenated with yb. Layer h2 applies relu after a deconvolution and is concatenated with yb. Finally it returns the sigmoid of the deconvolved h2.
step7: sampler(self, z, y=None).
- with tf.variable_scope("generator") as scope shares variables within the scope.
- reuse_variables() is called on the scope to reuse the generator's weights.
- The network is set up according to y_dim.
- From there it is essentially the generator again (but with train=False in the batch-norm calls), so we won't repeat it.
step8: load_mnist(self). This is specific to the mnist dataset, so we skip it for now.
step9: the model_dir(self) property. Returns a string of the dataset name, batch size, and output height and width.
step10: save(self, checkpoint_dir, step). Saves the trained model: create the checkpoint directory if the path does not exist, then save the model under it.
step11: load(self, checkpoint_dir). Reads a checkpoint: build the path, restore the checkpoint, recover the step counter from the filename, and print a success message; if no checkpoint path is found, print a failure message.
That is all of model.py: it defines the DCGAN class, implementing the generator and discriminator networks.
2.4 Training
All four files are now analyzed; time to run.
step0: Since we are using the anime face dataset, create a data folder under the source root and put the folder containing the data inside it, as shown below.
step1: Run the command below, specifying the arguments: the input height and width, the output height and width, which dataset, whether to test or train, and how many epochs to run.
If you have read this far: heads up. The whole series of problems below traces back to this step, where my training failed to converge and produced garbage, all because the argument names were spelled wrong!!! The correct command is:

```
python main.py --input_height 96 --output_height 48 --dataset faces --crop True --train True --epoch 10
```
The argument names below are wrong! (yes, I will say this again later)

```
python main.py --image_size 96 --output_size 48 --dataset faces --crop True --train True --epoch 10
```
step2: intermediate results
This is epoch 0, the first 3 batches:
The newly generated files:
step3: training and testing results
If you are reading this part, skip ahead to the Results section: everything here is the non-converging output caused by the mistyped arguments!
First epoch:
Ninth epoch:
As you can see, the results are not great, and far from the reference; that is because there were only 3,000-odd training images and just 10 epochs in total. This was only a trial run anyway: pure CPU, with about 2 GB to spare, sigh.
step4: This time I trained on 16,383 images with epoch==300. It ran all night and by this morning had only reached epoch 5; well, I'll keep waiting.
step5: Retrained on the server, this time with the dataset provided by the reference blog. The two datasets I collected and processed myself were probably too small and gave unsatisfying results, so I tried this roughly 50,000-image dataset directly, with epoch==300.
step6: The results were terrible. I don't know where the problem is; I'll post the screenshots for now and look into the cause when I have time. (I strongly suspect my dataset: I had manipulated the data while running locally, which may have corrupted it. To be dealt with later.)
Each result's caption gives the epoch and batch number.
2.5 Results
Finally found the cause: the argument names were spelled wrong, so the input and output sizes were never changed from the defaults of 108 and 64 to 96 and 48. So silly!! (Thanks to a commenter here; if they hadn't mentioned changing the parameters, I would never have noticed.)
Retrain:

```
python main.py --input_height 96 --output_height 48 --dataset faces --crop True --train True --epoch 10
```

With only 10 epochs the results are already quite respectable; I'll try 300 when the server is free.
Switched to epoch==300; here are a few intermediate results for now, and I'll post the full set once all 300 epochs finish.
epoch 0
epoch 5
epoch 10
epoch 20
epoch 100
epoch 200
epoch 300