DCGAN
Reposted from: https://blog.csdn.net/liuxiao214/article/details/74502975
First, my thanks to the various reference blogs and links below.
1. Reference blog 1: link
— Main text begins below.
Update 2017/12/12: solved the problem of training not converging.
The update is in the last section.
1. A brief summary of DCGAN
Architecture guidelines for stable deep convolutional GANs:
- Replace all pooling layers with strided convolutions (discriminator) and fractionally-strided convolutions (generator).
- Use batch normalization in both the generator and the discriminator.
- Remove fully connected hidden layers for deeper architectures.
- Use ReLU activation in the generator for all layers except the output, which uses tanh.
- Use LeakyReLU activation in the discriminator for all layers.
Figure: the DCGAN generator used for LSUN scene modeling. A 100-dimensional uniform distribution z is projected to a small-spatial-extent convolutional representation with many feature maps. A series of four fractionally-strided convolutions (in some recent papers wrongly called deconvolutions) then converts this high-level representation into a 64*64-pixel image. Notably, no fully connected or pooling layers are used.
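As a quick aside (my own note, not from the paper): in TensorFlow, a transposed convolution with 'SAME' padding and stride $s$ maps an $i \times i$ feature map to $si \times si$, so each of the four stride-2 layers doubles the spatial size:

$$4 \times 4 \;\to\; 8 \times 8 \;\to\; 16 \times 16 \;\to\; 32 \times 32 \;\to\; 64 \times 64$$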
2. Implementing DCGAN
The DCGAN authors generated bedroom images; here, following the reference links above, we generate anime character faces instead. The results look like this:
Not shown yet, since we haven't started.
2.1 Collecting the raw dataset
First we need a large number of anime images, which we can get by crawling an anime image board, konachan.net. The crawler code is shown below:
```python
import requests                  # HTTP library
from bs4 import BeautifulSoup    # HTML parsing library for scraping
import os
import traceback                 # print stack traces on failure

def download(url, filename):
    if os.path.exists(filename):
        print('file exists!')
        return
    try:
        r = requests.get(url, stream=True, timeout=60)
        r.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)
                    f.flush()
        return filename
    except KeyboardInterrupt:
        # remove the partial file, then propagate the interrupt
        if os.path.exists(filename):
            os.remove(filename)
        raise KeyboardInterrupt
    except Exception:
        traceback.print_exc()
        if os.path.exists(filename):
            os.remove(filename)

if os.path.exists('imgs') is False:
    os.makedirs('imgs')

start = 1
end = 8000
for i in range(start, end + 1):
    url = 'http://konachan.net/post?page=%d&tags=' % i
    html = requests.get(url).text              # fetch the listing page
    soup = BeautifulSoup(html, 'html.parser')  # parse the HTML
    for img in soup.find_all('img', class_="preview"):  # every thumbnail of class "preview"
        target_url = 'http:' + img['src']
        filename = os.path.join('imgs', target_url.split('/')[-1])
        download(target_url, filename)
    print('%d / %d' % (i, end))
```
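One practical note of my own (not from the original post): crawling 8,000 listing pages back-to-back can get you throttled or banned, so a short pause between page requests is cheap insurance. A minimal sketch of the tweak, with a hypothetical fetch_page helper:

```python
import time
import requests

def fetch_page(i):
    # fetch one listing page, then pause briefly to stay polite (hypothetical helper)
    url = 'http://konachan.net/post?page=%d&tags=' % i
    html = requests.get(url, timeout=60).text
    time.sleep(1)   # throttle: roughly one page per second
    return html
```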
The goal was roughly 10,000 images. Since I'm running on a CPU with very little memory, training on too many images is hopeless anyway, so we start small and see how it goes.
A sample of the crawled images:
We now have the raw images, but the goal is to generate anime faces: we don't need the whole image, and the rest of its content would interfere with training, so we run face detection and crop out just the faces.
2.2 Detecting and cropping the faces
We use an OpenCV-based anime face detection cascade, lbpcascade_animeface.
First download the classifier:

```
wget https://raw.githubusercontent.com/nagadomi/lbpcascade_animeface/master/lbpcascade_animeface.xml
```

Once it has downloaded, run the following code to crop the faces out of the images.
```python
import cv2
import sys
import os.path
from glob import glob

def detect(filename, cascade_file="lbpcascade_animeface.xml"):
    if not os.path.isfile(cascade_file):
        raise RuntimeError("%s: not found" % cascade_file)

    cascade = cv2.CascadeClassifier(cascade_file)
    image = cv2.imread(filename)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)

    faces = cascade.detectMultiScale(
        gray,
        # detector options
        scaleFactor=1.1,
        minNeighbors=5,
        minSize=(48, 48))
    for i, (x, y, w, h) in enumerate(faces):
        face = image[y: y + h, x: x + w, :]
        face = cv2.resize(face, (96, 96))
        # note: every face from the same source image gets the same filename,
        # so later detections overwrite earlier ones
        save_filename = '%s.jpg' % (os.path.basename(filename).split('.')[0])
        cv2.imwrite("faces/" + save_filename, face)

if __name__ == '__main__':
    if os.path.exists('faces') is False:
        os.makedirs('faces')
    file_list = glob('imgs/*.jpg')
    for filename in file_list:
        detect(filename)
```
The processed images look like this:
2.3 Source code walkthrough
Based on DCGAN-tensorflow.
In total 11,053 images were crawled; face detection produced 3,533 face crops.
There are four files: main.py, model.py, ops.py, and utils.py.
2.3.1 main.py
Source (98 lines):
```python
import os
import scipy.misc
import numpy as np

from model import DCGAN
from utils import pp, visualize, to_json, show_all_variables

import tensorflow as tf

flags = tf.app.flags
flags.DEFINE_integer("epoch", 25, "Epoch to train [25]")
flags.DEFINE_float("learning_rate", 0.0002, "Learning rate for adam [0.0002]")
flags.DEFINE_float("beta1", 0.5, "Momentum term of adam [0.5]")
flags.DEFINE_integer("train_size", np.inf, "The size of train images [np.inf]")
flags.DEFINE_integer("batch_size", 64, "The size of batch images [64]")
flags.DEFINE_integer("input_height", 108, "The size of image to use (will be center cropped). [108]")
flags.DEFINE_integer("input_width", None, "The size of image to use (will be center cropped). If None, same value as input_height [None]")
flags.DEFINE_integer("output_height", 64, "The size of the output images to produce [64]")
flags.DEFINE_integer("output_width", None, "The size of the output images to produce. If None, same value as output_height [None]")
flags.DEFINE_string("dataset", "celebA", "The name of dataset [celebA, mnist, lsun]")
flags.DEFINE_string("input_fname_pattern", "*.jpg", "Glob pattern of filename of input images [*]")
flags.DEFINE_string("checkpoint_dir", "checkpoint", "Directory name to save the checkpoints [checkpoint]")
flags.DEFINE_string("sample_dir", "samples", "Directory name to save the image samples [samples]")
flags.DEFINE_boolean("train", False, "True for training, False for testing [False]")
flags.DEFINE_boolean("crop", False, "True for training, False for testing [False]")
flags.DEFINE_boolean("visualize", False, "True for visualizing, False for nothing [False]")
FLAGS = flags.FLAGS

def main(_):
    pp.pprint(flags.FLAGS.__flags)

    if FLAGS.input_width is None:
        FLAGS.input_width = FLAGS.input_height
    if FLAGS.output_width is None:
        FLAGS.output_width = FLAGS.output_height

    if not os.path.exists(FLAGS.checkpoint_dir):
        os.makedirs(FLAGS.checkpoint_dir)
    if not os.path.exists(FLAGS.sample_dir):
        os.makedirs(FLAGS.sample_dir)

    #gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
    run_config = tf.ConfigProto()
    run_config.gpu_options.allow_growth = True

    with tf.Session(config=run_config) as sess:
        if FLAGS.dataset == 'mnist':
            dcgan = DCGAN(
                sess,
                input_width=FLAGS.input_width,
                input_height=FLAGS.input_height,
                output_width=FLAGS.output_width,
                output_height=FLAGS.output_height,
                batch_size=FLAGS.batch_size,
                sample_num=FLAGS.batch_size,
                y_dim=10,
                dataset_name=FLAGS.dataset,
                input_fname_pattern=FLAGS.input_fname_pattern,
                crop=FLAGS.crop,
                checkpoint_dir=FLAGS.checkpoint_dir,
                sample_dir=FLAGS.sample_dir)
        else:
            dcgan = DCGAN(
                sess,
                input_width=FLAGS.input_width,
                input_height=FLAGS.input_height,
                output_width=FLAGS.output_width,
                output_height=FLAGS.output_height,
                batch_size=FLAGS.batch_size,
                sample_num=FLAGS.batch_size,
                dataset_name=FLAGS.dataset,
                input_fname_pattern=FLAGS.input_fname_pattern,
                crop=FLAGS.crop,
                checkpoint_dir=FLAGS.checkpoint_dir,
                sample_dir=FLAGS.sample_dir)

        show_all_variables()

        if FLAGS.train:
            dcgan.train(FLAGS)
        else:
            if not dcgan.load(FLAGS.checkpoint_dir)[0]:
                raise Exception("[!] Train a model first, then run test mode")

        # to_json("./web/js/layers.js", [dcgan.h0_w, dcgan.h0_b, dcgan.g_bn0],
        #                 [dcgan.h1_w, dcgan.h1_b, dcgan.g_bn1],
        #                 [dcgan.h2_w, dcgan.h2_b, dcgan.g_bn2],
        #                 [dcgan.h3_w, dcgan.h3_b, dcgan.g_bn3],
        #                 [dcgan.h4_w, dcgan.h4_b, None])

        # Below is codes for visualization
        OPTION = 1
        visualize(sess, dcgan, FLAGS, OPTION)

if __name__ == '__main__':
    tf.app.run()
```
This file calls into model.py and utils.py.
step0: Before main runs, the flags are parsed. Under the hood TensorFlow uses the python-gflags project, wrapped as the tf.app.flags interface; that is, TensorFlow passes the arguments tf.app.run() needs through flags. We can either initialize the flags in code before the program runs, or pass command-line arguments when launching it (a minimal sketch of the mechanism follows the flag list below).
The flags set here are:
- epoch: number of training epochs
- learning_rate: learning rate for Adam, default 0.0002
- beta1: momentum term for Adam
- train_size
- batch_size: number of images per iteration
- input_height: height of the input images (must be specified)
- input_width: width of the input images
- output_height: height of the output images
- output_width: width of the output images
- dataset: which dataset to process
- input_fname_pattern: glob pattern for the input image filenames
- checkpoint_dir: directory for saving checkpoints
- sample_dir: directory for saving image samples
- train: True for training, False for testing
- crop: True for training, False for testing
- visualize: whether to visualize
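As promised above, here is a minimal sketch of my own (not from the repo) of how the tf.app.flags mechanism behaves:

```python
import tensorflow as tf

flags = tf.app.flags
flags.DEFINE_integer("epoch", 25, "number of training epochs")
FLAGS = flags.FLAGS

def main(_):
    # prints 25 by default, or whatever was passed on the command line,
    # e.g.  python demo.py --epoch 300
    print(FLAGS.epoch)

if __name__ == '__main__':
    tf.app.run()   # parses the flags, then calls main()
```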
step1: First the parameters are printed; then, if the input or output width was not specified, it defaults to the corresponding height.
step2: Check whether the checkpoint and sample directories exist, and create them if not.
step3: Configure the session. tf.ConfigProto is generally used when creating a session, to set its parameters; see the referenced blog post for details.
```python
# tf.ConfigProto() options:
#   log_device_placement=True : log which device each op is placed on
#   allow_soft_placement=True : if the specified device does not exist,
#                               let TF pick one automatically
tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)

# Controlling GPU memory usage:
# allow_growth starts with a small GPU allocation and grows it on demand;
# memory is never released, so fragmentation is possible.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
```
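The commented-out gpu_options line in main.py hints at the other common approach, a fixed memory fraction instead of on-demand growth; a minimal sketch:

```python
import tensorflow as tf

# cap this process at roughly one third of the GPU's memory up front
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```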
step4: Run the session. First determine which dataset is being processed, then construct the DCGAN class (defined in model.py) with the corresponding arguments.
step5: Show all training-related variables.
step6: Branch on training vs. testing. If training, train; otherwise check whether a trained model exists and test it; if none has been trained, it raises "[!] Train a model first, then run test mode".
step7: Finally, visualization: visualize(sess, dcgan, FLAGS, OPTION).
main.py is the entry point of the program: it wires up the model and image-processing helpers defined elsewhere for training and testing.
2.3.2 utils.py
Source (250 lines):
"""
Some codes from https://github.com/Newmu/dcgan_code
"""
from __future__ import division
import math
import json
import random
import pprint # print data_struct
import scipy.misc
import numpy as np
from time import gmtime, strftime
from six.moves import xrange
import tensorflow as tf
import tensorflow.contrib.slim as slim
pp = pprint.PrettyPrinter()
get_stddev = lambda x, k_h, k_w: 1/math.sqrt(k_w*k_h*x.get_shape()[-1])
def show_all_variables():
model_vars = tf.trainable_variables()
slim.model_analyzer.analyze_vars(model_vars, print_info=True)
def get_image(image_path, input_height, input_width,
resize_height=64, resize_width=64,
crop=True, grayscale=False):
image = imread(image_path, grayscale)
return transform(image, input_height, input_width,
resize_height, resize_width, crop)
def save_images(images, size, image_path):
return imsave(inverse_transform(images), size, image_path)
def imread(path, grayscale = False):
if (grayscale):
return scipy.misc.imread(path, flatten = True).astype(np.float)
else:
return scipy.misc.imread(path).astype(np.float)
def merge_images(images, size):
return inverse_transform(images)
def merge(images, size):
h, w = images.shape[1], images.shape[2]
if (images.shape[3] in (3,4)):
c = images.shape[3]
img = np.zeros((h * size[0], w * size[1], c))
for idx, image in enumerate(images):
i = idx % size[1]
j = idx // size[1]
img[j * h:j * h + h, i * w:i * w + w, :] = image
return img
elif images.shape[3]==1:
img = np.zeros((h * size[0], w * size[1]))
for idx, image in enumerate(images):
i = idx % size[1]
j = idx // size[1]
img[j * h:j * h + h, i * w:i * w + w] = image[:,:,0]
return img
else:
raise ValueError('in merge(images,size) images parameter '
'must have dimensions: HxW or HxWx3 or HxWx4')
def imsave(images, size, path):
image = np.squeeze(merge(images, size))
return scipy.misc.imsave(path, image)
def center_crop(x, crop_h, crop_w,
resize_h=64, resize_w=64):
if crop_w is None:
crop_w = crop_h
h, w = x.shape[:2]
j = int(round((h - crop_h)/2.))
i = int(round((w - crop_w)/2.))
return scipy.misc.imresize(
x[j:j+crop_h, i:i+crop_w], [resize_h, resize_w])
def transform(image, input_height, input_width,
resize_height=64, resize_width=64, crop=True):
if crop:
cropped_image = center_crop(
image, input_height, input_width,
resize_height, resize_width)
else:
cropped_image = scipy.misc.imresize(image, [resize_height, resize_width])
return np.array(cropped_image)/127.5 - 1.
def inverse_transform(images):
return (images+1.)/2.
def to_json(output_path, *layers):
with open(output_path, "w") as layer_f:
lines = ""
for w, b, bn in layers:
layer_idx = w.name.split('/')[0].split('h')[1]
B = b.eval()
if "lin/" in w.name:
W = w.eval()
depth = W.shape[1]
else:
W = np.rollaxis(w.eval(), 2, 0)
depth = W.shape[0]
biases = {"sy": 1, "sx": 1, "depth": depth, "w": ['%.2f' % elem for elem in list(B)]}
if bn != None:
gamma = bn.gamma.eval()
beta = bn.beta.eval()
gamma = {"sy": 1, "sx": 1, "depth": depth, "w": ['%.2f' % elem for elem in list(gamma)]}
beta = {"sy": 1, "sx": 1, "depth": depth, "w": ['%.2f' % elem for elem in list(beta)]}
else:
gamma = {"sy": 1, "sx": 1, "depth": 0, "w": []}
beta = {"sy": 1, "sx": 1, "depth": 0, "w": []}
if "lin/" in w.name:
fs = []
for w in W.T:
fs.append({"sy": 1, "sx": 1, "depth": W.shape[0], "w": ['%.2f' % elem for elem in list(w)]})
lines += """
var layer_%s = {
"layer_type": "fc",
"sy": 1, "sx": 1,
"out_sx": 1, "out_sy": 1,
"stride": 1, "pad": 0,
"out_depth": %s, "in_depth": %s,
"biases": %s,
"gamma": %s,
"beta": %s,
"filters": %s
};""" % (layer_idx.split('_')[0], W.shape[1], W.shape[0], biases, gamma, beta, fs)
else:
fs = []
for w_ in W:
fs.append({"sy": 5, "sx": 5, "depth": W.shape[3], "w": ['%.2f' % elem for elem in list(w_.flatten())]})
lines += """
var layer_%s = {
"layer_type": "deconv",
"sy": 5, "sx": 5,
"out_sx": %s, "out_sy": %s,
"stride": 2, "pad": 1,
"out_depth": %s, "in_depth": %s,
"biases": %s,
"gamma": %s,
"beta": %s,
"filters": %s
};""" % (layer_idx, 2**(int(layer_idx)+2), 2**(int(layer_idx)+2),
W.shape[0], W.shape[3], biases, gamma, beta, fs)
layer_f.write(" ".join(lines.replace("'","").split()))
def make_gif(images, fname, duration=2, true_image=False):
import moviepy.editor as mpy
def make_frame(t):
try:
x = images[int(len(images)/duration*t)]
except:
x = images[-1]
if true_image:
return x.astype(np.uint8)
else:
return ((x+1)/2*255).astype(np.uint8)
clip = mpy.VideoClip(make_frame, duration=duration)
clip.write_gif(fname, fps = len(images) / duration)
def visualize(sess, dcgan, config, option):
image_frame_dim = int(math.ceil(config.batch_size**.5))
if option == 0:
z_sample = np.random.uniform(-0.5, 0.5, size=(config.batch_size, dcgan.z_dim))
samples = sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample})
save_images(samples, [image_frame_dim, image_frame_dim], './samples/test_%s.png' % strftime("%Y%m%d%H%M%S", gmtime()))
elif option == 1:
values = np.arange(0, 1, 1./config.batch_size)
for idx in xrange(100):
print(" [*] %d" % idx)
z_sample = np.zeros([config.batch_size, dcgan.z_dim])
for kdx, z in enumerate(z_sample):
z[idx] = values[kdx]
if config.dataset == "mnist":
y = np.random.choice(10, config.batch_size)
y_one_hot = np.zeros((config.batch_size, 10))
y_one_hot[np.arange(config.batch_size), y] = 1
samples = sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample, dcgan.y: y_one_hot})
else:
samples = sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample})
save_images(samples, [image_frame_dim, image_frame_dim], './samples/test_arange_%s.png' % (idx))
elif option == 2:
values = np.arange(0, 1, 1./config.batch_size)
for idx in [random.randint(0, 99) for _ in xrange(100)]:
print(" [*] %d" % idx)
z = np.random.uniform(-0.2, 0.2, size=(dcgan.z_dim))
z_sample = np.tile(z, (config.batch_size, 1))
#z_sample = np.zeros([config.batch_size, dcgan.z_dim])
for kdx, z in enumerate(z_sample):
z[idx] = values[kdx]
if config.dataset == "mnist":
y = np.random.choice(10, config.batch_size)
y_one_hot = np.zeros((config.batch_size, 10))
y_one_hot[np.arange(config.batch_size), y] = 1
samples = sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample, dcgan.y: y_one_hot})
else:
samples = sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample})
try:
make_gif(samples, './samples/test_gif_%s.gif' % (idx))
except:
save_images(samples, [image_frame_dim, image_frame_dim], './samples/test_%s.png' % strftime("%Y%m%d%H%M%S", gmtime()))
elif option == 3:
values = np.arange(0, 1, 1./config.batch_size)
for idx in xrange(100):
print(" [*] %d" % idx)
z_sample = np.zeros([config.batch_size, dcgan.z_dim])
for kdx, z in enumerate(z_sample):
z[idx] = values[kdx]
samples = sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample})
make_gif(samples, './samples/test_gif_%s.gif' % (idx))
elif option == 4:
image_set = []
values = np.arange(0, 1, 1./config.batch_size)
for idx in xrange(100):
print(" [*] %d" % idx)
z_sample = np.zeros([config.batch_size, dcgan.z_dim])
for kdx, z in enumerate(z_sample): z[idx] = values[kdx]
image_set.append(sess.run(dcgan.sampler, feed_dict={dcgan.z: z_sample}))
make_gif(image_set[-1], './samples/test_gif_%s.gif' % (idx))
new_image_set = [merge(np.array([images[idx] for images in image_set]), [10, 10]) \
for idx in range(64) + range(63, -1, -1)]
make_gif(new_image_set, './samples/test_gif_merged.gif', duration=8)
def image_manifold_size(num_images):
manifold_h = int(np.floor(np.sqrt(num_images)))
manifold_w = int(np.ceil(np.sqrt(num_images)))
assert manifold_h * manifold_w == num_images
return manifold_h, manifold_w
This file defines assorted image-processing helpers; it acts more or less as a header file for the other three.
step0: First pp = pprint.PrettyPrinter() is defined, for pretty-printing data structures; see the referenced blog post for details.
step1: get_stddev is defined: the reciprocal of the square root of the product of three quantities, presumably intended for weight initialization.
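Written out (my own reading of the lambda), with $k_h, k_w$ the kernel height and width and $C$ the channel count of x:

$$\text{stddev} = \frac{1}{\sqrt{k_h \, k_w \, C}}$$

i.e. a fan-in-based standard deviation in the spirit of Xavier initialization.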
step2: show_all_variables(). First, tf.trainable_variables returns the list of variables to be trained; then model_analyzer.analyze_vars from tensorflow.contrib.slim prints information on all of them. Usage:
Code:
```python
import tensorflow as tf
import tensorflow.contrib.slim as slim

x1 = tf.Variable(tf.constant(1, shape=[1], dtype=tf.float32), name='x11')
x2 = tf.Variable(tf.constant(2, shape=[1], dtype=tf.float32), name='x22')
m = tf.train.ExponentialMovingAverage(0.99, 5)

v = tf.trainable_variables()
for i in v:
    print(233)
    print(i)
print(23333333)
slim.model_analyzer.analyze_vars(v, print_info=True)
print(23333333)
```
The resulting output:
Note: steps 3 through 11 all define image-processing functions that call one another.
step3: get_image(image_path, input_height, input_width, resize_height=64, resize_width=64, crop=True, grayscale=False). First the image is read from the given path, optionally as grayscale; it is then cropped and resized according to the input parameters.
step4: save_images(images, size, image_path). Calls imsave(inverse_transform(images), size, image_path) and returns the new image.
step5: imread(path, grayscale=False). Calls scipy.misc.imread(); depending on the grayscale flag it flattens the image to grayscale, and converts the result to np.float.
step6: merge_images(images, size). Calls inverse_transform(images) and returns the result.
step7: merge(images, size). First get the image height and width. Then check whether the images are RGB or grayscale, handling each case separately. If the channel count is 3 or 4, allocate a zero-initialized canvas large enough for the whole batch (e.g. an 8*8 grid for batch_size=64), copy each image of the batch into its slot in turn, and return the large mosaic. If the channel count is 1, do the same, copying only the single channel. Anything else raises an error.
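A standalone sketch of what merge() computes, with made-up shapes for concreteness:

```python
import numpy as np

batch = np.random.rand(64, 96, 96, 3)      # a batch of 64 RGB images
grid = np.zeros((8 * 96, 8 * 96, 3))       # the 8x8 mosaic canvas
for idx, img in enumerate(batch):
    i, j = idx % 8, idx // 8               # column, row in the grid
    grid[j * 96:(j + 1) * 96, i * 96:(i + 1) * 96, :] = img
# grid is now one large image tiling the whole batch
```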
step8: imsave(images, size, path). First np.squeeze() removes the length-1 axes from the image returned by merge(); then scipy.misc.imsave() writes the new image to the given path.
step9: center_crop(x, crop_h, crop_w, resize_h=64, resize_w=64). Subtracts the crop's H and W from the image's H and W and rounds, uses the result as the offsets at which to slice out the centered crop, and resizes it with scipy.misc.imresize.
step10: transform(image, input_height, input_width, resize_height=64, resize_width=64, crop=True). Crops the input image: if crop is true, it uses center_crop() as described above; otherwise the image is skipped past cropping and passed straight to scipy.misc.imresize at 64*64. Finally the pixel values are rescaled and the image is returned.
step11: inverse_transform(images). Maps pixel values from the generator's range back into display range and returns the new images.
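Putting steps 10 and 11 together, the value ranges are (my own summary): transform maps a pixel $x \in [0, 255]$ to $x/127.5 - 1 \in [-1, 1]$, which matches the tanh output range of the generator, and inverse_transform maps a generated value $x'$ back to $(x' + 1)/2 \in [0, 1]$ for saving.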
In summary, these functions call one another to implement three image operations: get_image(), which reads an image and returns a cropped version; save_images(), which tiles every image in a batch into one large image and saves it; and merge_images(), which simply applies inverse_transform() and returns the result (honestly, I'm not sure where this one gets used). Their relationships are shown in the figure below.
step12: to_json(output_path, *layers). This appears to dump each layer's weights and biases, but the code doesn't seem to call it anywhere, so we skip it until it matters.
step13: make_gif(images, fname, duration=2, true_image=False). Uses the moviepy.editor module to build an animated GIF for visualization. It defines an inner function make_frame(t) that divides the number of images by the duration to pick the frame for time t; the clip is then rendered and written out as a GIF.
step14: visualize(sess, dcgan, config, option). Handles options 0 through 4. With option=0, the generated samples are simply displayed; with option=1, the samples are processed per dataset and saved with the save_images() function from above; and so on. main.py uses option=1.
step15: image_manifold_size(num_images). Takes the floor of the square root of the image count as h and the ceiling as w, asserts that h*w equals the image count (true for square batch sizes such as 64, where h = w = 8), and returns h and w.
That is all of utils.py: basic image operations (reading, saving, and rescaling images), plus visualization of the training process with moviepy.
2.3.3 ops.py
Source (105 lines):
```python
import math
import numpy as np
import tensorflow as tf

from tensorflow.python.framework import ops

from utils import *

try:
    image_summary = tf.image_summary
    scalar_summary = tf.scalar_summary
    histogram_summary = tf.histogram_summary
    merge_summary = tf.merge_summary
    SummaryWriter = tf.train.SummaryWriter
except:
    image_summary = tf.summary.image
    scalar_summary = tf.summary.scalar
    histogram_summary = tf.summary.histogram
    merge_summary = tf.summary.merge
    SummaryWriter = tf.summary.FileWriter

if "concat_v2" in dir(tf):
    def concat(tensors, axis, *args, **kwargs):
        return tf.concat_v2(tensors, axis, *args, **kwargs)
else:
    def concat(tensors, axis, *args, **kwargs):
        return tf.concat(tensors, axis, *args, **kwargs)

class batch_norm(object):
    def __init__(self, epsilon=1e-5, momentum=0.9, name="batch_norm"):
        with tf.variable_scope(name):
            self.epsilon = epsilon
            self.momentum = momentum
            self.name = name

    def __call__(self, x, train=True):
        return tf.contrib.layers.batch_norm(x,
                                            decay=self.momentum,
                                            updates_collections=None,
                                            epsilon=self.epsilon,
                                            scale=True,
                                            is_training=train,
                                            scope=self.name)

def conv_cond_concat(x, y):
    """Concatenate conditioning vector on feature map axis."""
    x_shapes = x.get_shape()
    y_shapes = y.get_shape()
    return concat([
        x, y*tf.ones([x_shapes[0], x_shapes[1], x_shapes[2], y_shapes[3]])], 3)

def conv2d(input_, output_dim,
           k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02,
           name="conv2d"):
    with tf.variable_scope(name):
        w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                            initializer=tf.truncated_normal_initializer(stddev=stddev))
        conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')

        biases = tf.get_variable('biases', [output_dim], initializer=tf.constant_initializer(0.0))
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())

        return conv

def deconv2d(input_, output_shape,
             k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02,
             name="deconv2d", with_w=False):
    with tf.variable_scope(name):
        # filter : [height, width, output_channels, in_channels]
        w = tf.get_variable('w', [k_h, k_w, output_shape[-1], input_.get_shape()[-1]],
                            initializer=tf.random_normal_initializer(stddev=stddev))

        try:
            deconv = tf.nn.conv2d_transpose(input_, w, output_shape=output_shape,
                                            strides=[1, d_h, d_w, 1])

        # Support for versions of TensorFlow before 0.7.0
        except AttributeError:
            deconv = tf.nn.deconv2d(input_, w, output_shape=output_shape,
                                    strides=[1, d_h, d_w, 1])

        biases = tf.get_variable('biases', [output_shape[-1]], initializer=tf.constant_initializer(0.0))
        deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())

        if with_w:
            return deconv, w, biases
        else:
            return deconv

def lrelu(x, leak=0.2, name="lrelu"):
    return tf.maximum(x, leak*x)

def linear(input_, output_size, scope=None, stddev=0.02, bias_start=0.0, with_w=False):
    shape = input_.get_shape().as_list()

    with tf.variable_scope(scope or "Linear"):
        matrix = tf.get_variable("Matrix", [shape[1], output_size], tf.float32,
                                 tf.random_normal_initializer(stddev=stddev))
        bias = tf.get_variable("bias", [output_size],
                               initializer=tf.constant_initializer(bias_start))
        if with_w:
            return tf.matmul(input_, matrix) + bias, matrix, bias
        else:
            return tf.matmul(input_, matrix) + bias
```
This file calls into utils.py.
step0: It first imports tensorflow.python.framework, which contains the definitions and operations for TensorFlow graphs, tensors, and so on.
step1: A try...except block defines a set of aliases: image_summary, scalar_summary, histogram_summary, merge_summary, SummaryWriter, each taken from the appropriate place in TensorFlow. If the old top-level names can be fetched directly they are used; otherwise the names are taken from tf.summary. (This keeps the code working across TensorFlow versions.)
step2: concat, used to join multiple tensors. dir(tf) is used to check whether "concat_v2" exists: if so, a concat(tensors, axis, *args, **kwargs) function is defined returning tf.concat_v2(tensors, axis, *args, **kwargs); otherwise the same function is defined, only returning tf.concat(tensors, axis, *args, **kwargs). tf.concat is used like this:
```python
t1 = tf.constant([[1, 2, 3], [4, 5, 6]])
t2 = tf.constant([[7, 8, 9], [10, 11, 12]])
t3 = tf.concat([t1, t2], 0)   # shape (4, 3): stacked along rows
t4 = tf.concat([t1, t2], 1)   # shape (2, 6): stacked along columns
print(t1)
print(t2)
print(t3)
print(t4)
```
step3: the batch_norm class, with an __init__ and a __call__ method. __init__(self, epsilon=1e-5, momentum=0.9, name="batch_norm") opens a variable scope with the given name and initializes self.epsilon, self.momentum, and self.name. __call__(self, x, train=True) applies batch normalization via tf.contrib.layers.batch_norm.
step4: conv_cond_concat(x, y). Concatenates x with y broadcast to a tensor of shape [x_shapes[0], x_shapes[1], x_shapes[2], y_shapes[3]] (via multiplication with a ones tensor), along the feature-map axis.
step5: conv2d(input_, output_dim, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, name="conv2d"). The convolution helper: create truncated-normal random weights, perform the convolution, create zero-initialized biases, add them to the convolution, and return the result.
step6: deconv2d(input_, output_shape, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, name="deconv2d", with_w=False). The transposed-convolution ("deconvolution") helper: create random-normal weights, run the transposed convolution, create the biases and add them; if with_w is true, return the deconvolution together with the weights and biases, otherwise just the deconvolution.
step7: lrelu(x, leak=0.2, name="lrelu"). Defines the LeakyReLU activation.
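In equation form, with the default leak of 0.2:

$$\mathrm{lrelu}(x) = \max(x,\, 0.2x) = \begin{cases} x, & x \ge 0 \\ 0.2x, & x < 0 \end{cases}$$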
step8: linear(input_, output_size, scope=None, stddev=0.02, bias_start=0.0, with_w=False). A linear (fully connected) layer: create a random-normal weight matrix and the initial biases; if with_w is true, return xW + b together with the weights W and biases b, otherwise just xW + b.
In short, this file defines helpers for tensor concatenation, batch normalization, convolution, transposed convolution, activation, and linear layers.
2.3.4 model.py
Source (530 lines):
```python
from __future__ import division
import os
import time
import math
from glob import glob  # file path search
import tensorflow as tf
import numpy as np
from six.moves import xrange

from ops import *
from utils import *

def conv_out_size_same(size, stride):
    return int(math.ceil(float(size) / float(stride)))

class DCGAN(object):
    def __init__(self, sess, input_height=108, input_width=108, crop=True,
                 batch_size=64, sample_num=64, output_height=64, output_width=64,
                 y_dim=None, z_dim=100, gf_dim=64, df_dim=64,
                 gfc_dim=1024, dfc_dim=1024, c_dim=3, dataset_name='default',
                 input_fname_pattern='*.jpg', checkpoint_dir=None, sample_dir=None):
        """
        Args:
            sess: TensorFlow session
            batch_size: The size of batch. Should be specified before training.
            y_dim: (optional) Dimension of dim for y. [None]
            z_dim: (optional) Dimension of dim for Z. [100]
            gf_dim: (optional) Dimension of gen filters in first conv layer. [64]
            df_dim: (optional) Dimension of discrim filters in first conv layer. [64]
            gfc_dim: (optional) Dimension of gen units for fully connected layer. [1024]
            dfc_dim: (optional) Dimension of discrim units for fully connected layer. [1024]
            c_dim: (optional) Dimension of image color. For grayscale input, set to 1. [3]
        """
        self.sess = sess
        self.crop = crop

        self.batch_size = batch_size
        self.sample_num = sample_num

        self.input_height = input_height
        self.input_width = input_width
        self.output_height = output_height
        self.output_width = output_width

        self.y_dim = y_dim
        self.z_dim = z_dim

        self.gf_dim = gf_dim
        self.df_dim = df_dim

        self.gfc_dim = gfc_dim
        self.dfc_dim = dfc_dim

        # batch normalization : deals with poor initialization helps gradient flow
        self.d_bn1 = batch_norm(name='d_bn1')
        self.d_bn2 = batch_norm(name='d_bn2')

        if not self.y_dim:
            self.d_bn3 = batch_norm(name='d_bn3')

        self.g_bn0 = batch_norm(name='g_bn0')
        self.g_bn1 = batch_norm(name='g_bn1')
        self.g_bn2 = batch_norm(name='g_bn2')

        if not self.y_dim:
            self.g_bn3 = batch_norm(name='g_bn3')

        self.dataset_name = dataset_name
        self.input_fname_pattern = input_fname_pattern
        self.checkpoint_dir = checkpoint_dir

        if self.dataset_name == 'mnist':
            self.data_X, self.data_y = self.load_mnist()
            self.c_dim = self.data_X[0].shape[-1]
        else:
            self.data = glob(os.path.join("./data", self.dataset_name, self.input_fname_pattern))
            imreadImg = imread(self.data[0])
            if len(imreadImg.shape) >= 3:  # check if image is a non-grayscale image by checking channel number
                self.c_dim = imread(self.data[0]).shape[-1]
            else:
                self.c_dim = 1

        self.grayscale = (self.c_dim == 1)

        self.build_model()

    def build_model(self):
        if self.y_dim:
            self.y = tf.placeholder(tf.float32, [self.batch_size, self.y_dim], name='y')

        if self.crop:
            image_dims = [self.output_height, self.output_width, self.c_dim]
        else:
            image_dims = [self.input_height, self.input_width, self.c_dim]

        self.inputs = tf.placeholder(
            tf.float32, [self.batch_size] + image_dims, name='real_images')

        inputs = self.inputs

        self.z = tf.placeholder(
            tf.float32, [None, self.z_dim], name='z')
        self.z_sum = histogram_summary("z", self.z)

        if self.y_dim:
            self.G = self.generator(self.z, self.y)
            self.D, self.D_logits = \
                self.discriminator(inputs, self.y, reuse=False)
            self.sampler = self.sampler(self.z, self.y)
            self.D_, self.D_logits_ = \
                self.discriminator(self.G, self.y, reuse=True)
        else:
            self.G = self.generator(self.z)
            self.D, self.D_logits = self.discriminator(inputs)
            self.sampler = self.sampler(self.z)
            self.D_, self.D_logits_ = self.discriminator(self.G, reuse=True)

        self.d_sum = histogram_summary("d", self.D)
        self.d__sum = histogram_summary("d_", self.D_)
        self.G_sum = image_summary("G", self.G)

        def sigmoid_cross_entropy_with_logits(x, y):
            try:
                return tf.nn.sigmoid_cross_entropy_with_logits(logits=x, labels=y)
            except:
                return tf.nn.sigmoid_cross_entropy_with_logits(logits=x, targets=y)

        self.d_loss_real = tf.reduce_mean(
            sigmoid_cross_entropy_with_logits(self.D_logits, tf.ones_like(self.D)))
        self.d_loss_fake = tf.reduce_mean(
            sigmoid_cross_entropy_with_logits(self.D_logits_, tf.zeros_like(self.D_)))
        self.g_loss = tf.reduce_mean(
            sigmoid_cross_entropy_with_logits(self.D_logits_, tf.ones_like(self.D_)))

        self.d_loss_real_sum = scalar_summary("d_loss_real", self.d_loss_real)
        self.d_loss_fake_sum = scalar_summary("d_loss_fake", self.d_loss_fake)

        self.d_loss = self.d_loss_real + self.d_loss_fake

        self.g_loss_sum = scalar_summary("g_loss", self.g_loss)
        self.d_loss_sum = scalar_summary("d_loss", self.d_loss)

        t_vars = tf.trainable_variables()

        self.d_vars = [var for var in t_vars if 'd_' in var.name]
        self.g_vars = [var for var in t_vars if 'g_' in var.name]

        self.saver = tf.train.Saver()

    def train(self, config):
        d_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
            .minimize(self.d_loss, var_list=self.d_vars)
        g_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
            .minimize(self.g_loss, var_list=self.g_vars)
        try:
            tf.global_variables_initializer().run()
        except:
            tf.initialize_all_variables().run()

        self.g_sum = merge_summary([self.z_sum, self.d__sum,
                                    self.G_sum, self.d_loss_fake_sum, self.g_loss_sum])
        self.d_sum = merge_summary(
            [self.z_sum, self.d_sum, self.d_loss_real_sum, self.d_loss_sum])
        self.writer = SummaryWriter("./logs", self.sess.graph)

        sample_z = np.random.uniform(-1, 1, size=(self.sample_num, self.z_dim))

        if config.dataset == 'mnist':
            sample_inputs = self.data_X[0:self.sample_num]
            sample_labels = self.data_y[0:self.sample_num]
        else:
            sample_files = self.data[0:self.sample_num]
            sample = [
                get_image(sample_file,
                          input_height=self.input_height,
                          input_width=self.input_width,
                          resize_height=self.output_height,
                          resize_width=self.output_width,
                          crop=self.crop,
                          grayscale=self.grayscale) for sample_file in sample_files]
            if (self.grayscale):
                sample_inputs = np.array(sample).astype(np.float32)[:, :, :, None]
            else:
                sample_inputs = np.array(sample).astype(np.float32)

        counter = 1
        start_time = time.time()
        could_load, checkpoint_counter = self.load(self.checkpoint_dir)
        if could_load:
            counter = checkpoint_counter
            print(" [*] Load SUCCESS")
        else:
            print(" [!] Load failed...")

        for epoch in xrange(config.epoch):
            if config.dataset == 'mnist':
                batch_idxs = min(len(self.data_X), config.train_size) // config.batch_size
            else:
                self.data = glob(os.path.join(
                    "./data", config.dataset, self.input_fname_pattern))
                batch_idxs = min(len(self.data), config.train_size) // config.batch_size

            for idx in xrange(0, batch_idxs):
                if config.dataset == 'mnist':
                    batch_images = self.data_X[idx*config.batch_size:(idx+1)*config.batch_size]
                    batch_labels = self.data_y[idx*config.batch_size:(idx+1)*config.batch_size]
                else:
                    batch_files = self.data[idx*config.batch_size:(idx+1)*config.batch_size]
                    batch = [
                        get_image(batch_file,
                                  input_height=self.input_height,
                                  input_width=self.input_width,
                                  resize_height=self.output_height,
                                  resize_width=self.output_width,
                                  crop=self.crop,
                                  grayscale=self.grayscale) for batch_file in batch_files]
                    if self.grayscale:
                        batch_images = np.array(batch).astype(np.float32)[:, :, :, None]
                    else:
                        batch_images = np.array(batch).astype(np.float32)

                batch_z = np.random.uniform(-1, 1, [config.batch_size, self.z_dim]) \
                    .astype(np.float32)

                if config.dataset == 'mnist':
                    # Update D network
                    _, summary_str = self.sess.run([d_optim, self.d_sum],
                                                   feed_dict={
                                                       self.inputs: batch_images,
                                                       self.z: batch_z,
                                                       self.y: batch_labels,
                                                   })
                    self.writer.add_summary(summary_str, counter)

                    # Update G network
                    _, summary_str = self.sess.run([g_optim, self.g_sum],
                                                   feed_dict={
                                                       self.z: batch_z,
                                                       self.y: batch_labels,
                                                   })
                    self.writer.add_summary(summary_str, counter)

                    # Run g_optim twice to make sure that d_loss does not go to zero (different from paper)
                    _, summary_str = self.sess.run([g_optim, self.g_sum],
                                                   feed_dict={self.z: batch_z, self.y: batch_labels})
                    self.writer.add_summary(summary_str, counter)

                    errD_fake = self.d_loss_fake.eval({
                        self.z: batch_z,
                        self.y: batch_labels
                    })
                    errD_real = self.d_loss_real.eval({
                        self.inputs: batch_images,
                        self.y: batch_labels
                    })
                    errG = self.g_loss.eval({
                        self.z: batch_z,
                        self.y: batch_labels
                    })
                else:
                    # Update D network
                    _, summary_str = self.sess.run([d_optim, self.d_sum],
                                                   feed_dict={self.inputs: batch_images, self.z: batch_z})
                    self.writer.add_summary(summary_str, counter)

                    # Update G network
                    _, summary_str = self.sess.run([g_optim, self.g_sum],
                                                   feed_dict={self.z: batch_z})
                    self.writer.add_summary(summary_str, counter)

                    # Run g_optim twice to make sure that d_loss does not go to zero (different from paper)
                    _, summary_str = self.sess.run([g_optim, self.g_sum],
                                                   feed_dict={self.z: batch_z})
                    self.writer.add_summary(summary_str, counter)

                    errD_fake = self.d_loss_fake.eval({self.z: batch_z})
                    errD_real = self.d_loss_real.eval({self.inputs: batch_images})
                    errG = self.g_loss.eval({self.z: batch_z})

                counter += 1
                print("Epoch: [%2d] [%4d/%4d] time: %4.4f, d_loss: %.8f, g_loss: %.8f" \
                    % (epoch, idx, batch_idxs,
                       time.time() - start_time, errD_fake+errD_real, errG))

                if np.mod(counter, 100) == 1:
                    if config.dataset == 'mnist':
                        samples, d_loss, g_loss = self.sess.run(
                            [self.sampler, self.d_loss, self.g_loss],
                            feed_dict={
                                self.z: sample_z,
                                self.inputs: sample_inputs,
                                self.y: sample_labels,
                            }
                        )
                        save_images(samples, image_manifold_size(samples.shape[0]),
                                    './{}/train_{:02d}_{:04d}.png'.format(config.sample_dir, epoch, idx))
                        print("[Sample] d_loss: %.8f, g_loss: %.8f" % (d_loss, g_loss))
                    else:
                        try:
                            samples, d_loss, g_loss = self.sess.run(
                                [self.sampler, self.d_loss, self.g_loss],
                                feed_dict={
                                    self.z: sample_z,
                                    self.inputs: sample_inputs,
                                },
                            )
                            save_images(samples, image_manifold_size(samples.shape[0]),
                                        './{}/train_{:02d}_{:04d}.png'.format(config.sample_dir, epoch, idx))
                            print("[Sample] d_loss: %.8f, g_loss: %.8f" % (d_loss, g_loss))
                        except:
                            print("one pic error!...")

                if np.mod(counter, 500) == 2:
                    self.save(config.checkpoint_dir, counter)

    def discriminator(self, image, y=None, reuse=False):
        with tf.variable_scope("discriminator") as scope:
            if reuse:
                scope.reuse_variables()

            if not self.y_dim:
                h0 = lrelu(conv2d(image, self.df_dim, name='d_h0_conv'))
                h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim*2, name='d_h1_conv')))
                h2 = lrelu(self.d_bn2(conv2d(h1, self.df_dim*4, name='d_h2_conv')))
                h3 = lrelu(self.d_bn3(conv2d(h2, self.df_dim*8, name='d_h3_conv')))
                h4 = linear(tf.reshape(h3, [self.batch_size, -1]), 1, 'd_h4_lin')

                return tf.nn.sigmoid(h4), h4
            else:
                yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim])
                x = conv_cond_concat(image, yb)

                h0 = lrelu(conv2d(x, self.c_dim + self.y_dim, name='d_h0_conv'))
                h0 = conv_cond_concat(h0, yb)

                h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim + self.y_dim, name='d_h1_conv')))
                h1 = tf.reshape(h1, [self.batch_size, -1])
                h1 = concat([h1, y], 1)

                h2 = lrelu(self.d_bn2(linear(h1, self.dfc_dim, 'd_h2_lin')))
                h2 = concat([h2, y], 1)

                h3 = linear(h2, 1, 'd_h3_lin')

                return tf.nn.sigmoid(h3), h3

    def generator(self, z, y=None):
        with tf.variable_scope("generator") as scope:
            if not self.y_dim:
                s_h, s_w = self.output_height, self.output_width
                s_h2, s_w2 = conv_out_size_same(s_h, 2), conv_out_size_same(s_w, 2)
                s_h4, s_w4 = conv_out_size_same(s_h2, 2), conv_out_size_same(s_w2, 2)
                s_h8, s_w8 = conv_out_size_same(s_h4, 2), conv_out_size_same(s_w4, 2)
                s_h16, s_w16 = conv_out_size_same(s_h8, 2), conv_out_size_same(s_w8, 2)

                # project `z` and reshape
                self.z_, self.h0_w, self.h0_b = linear(
                    z, self.gf_dim*8*s_h16*s_w16, 'g_h0_lin', with_w=True)

                self.h0 = tf.reshape(
                    self.z_, [-1, s_h16, s_w16, self.gf_dim * 8])
                h0 = tf.nn.relu(self.g_bn0(self.h0))

                self.h1, self.h1_w, self.h1_b = deconv2d(
                    h0, [self.batch_size, s_h8, s_w8, self.gf_dim*4], name='g_h1', with_w=True)
                h1 = tf.nn.relu(self.g_bn1(self.h1))

                h2, self.h2_w, self.h2_b = deconv2d(
                    h1, [self.batch_size, s_h4, s_w4, self.gf_dim*2], name='g_h2', with_w=True)
                h2 = tf.nn.relu(self.g_bn2(h2))

                h3, self.h3_w, self.h3_b = deconv2d(
                    h2, [self.batch_size, s_h2, s_w2, self.gf_dim*1], name='g_h3', with_w=True)
                h3 = tf.nn.relu(self.g_bn3(h3))

                h4, self.h4_w, self.h4_b = deconv2d(
                    h3, [self.batch_size, s_h, s_w, self.c_dim], name='g_h4', with_w=True)

                return tf.nn.tanh(h4)
            else:
                s_h, s_w = self.output_height, self.output_width
                s_h2, s_h4 = int(s_h/2), int(s_h/4)
                s_w2, s_w4 = int(s_w/2), int(s_w/4)

                # yb = tf.expand_dims(tf.expand_dims(y, 1), 2)
                yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim])
                z = concat([z, y], 1)

                h0 = tf.nn.relu(
                    self.g_bn0(linear(z, self.gfc_dim, 'g_h0_lin')))
                h0 = concat([h0, y], 1)

                h1 = tf.nn.relu(self.g_bn1(
                    linear(h0, self.gf_dim*2*s_h4*s_w4, 'g_h1_lin')))
                h1 = tf.reshape(h1, [self.batch_size, s_h4, s_w4, self.gf_dim * 2])

                h1 = conv_cond_concat(h1, yb)

                h2 = tf.nn.relu(self.g_bn2(deconv2d(h1,
                    [self.batch_size, s_h2, s_w2, self.gf_dim * 2], name='g_h2')))
                h2 = conv_cond_concat(h2, yb)

                return tf.nn.sigmoid(
                    deconv2d(h2, [self.batch_size, s_h, s_w, self.c_dim], name='g_h3'))

    def sampler(self, z, y=None):
        with tf.variable_scope("generator") as scope:
            scope.reuse_variables()

            if not self.y_dim:
                s_h, s_w = self.output_height, self.output_width
                s_h2, s_w2 = conv_out_size_same(s_h, 2), conv_out_size_same(s_w, 2)
                s_h4, s_w4 = conv_out_size_same(s_h2, 2), conv_out_size_same(s_w2, 2)
                s_h8, s_w8 = conv_out_size_same(s_h4, 2), conv_out_size_same(s_w4, 2)
                s_h16, s_w16 = conv_out_size_same(s_h8, 2), conv_out_size_same(s_w8, 2)

                # project `z` and reshape
                h0 = tf.reshape(
                    linear(z, self.gf_dim*8*s_h16*s_w16, 'g_h0_lin'),
                    [-1, s_h16, s_w16, self.gf_dim * 8])
                h0 = tf.nn.relu(self.g_bn0(h0, train=False))

                h1 = deconv2d(h0, [self.batch_size, s_h8, s_w8, self.gf_dim*4], name='g_h1')
                h1 = tf.nn.relu(self.g_bn1(h1, train=False))

                h2 = deconv2d(h1, [self.batch_size, s_h4, s_w4, self.gf_dim*2], name='g_h2')
                h2 = tf.nn.relu(self.g_bn2(h2, train=False))

                h3 = deconv2d(h2, [self.batch_size, s_h2, s_w2, self.gf_dim*1], name='g_h3')
                h3 = tf.nn.relu(self.g_bn3(h3, train=False))

                h4 = deconv2d(h3, [self.batch_size, s_h, s_w, self.c_dim], name='g_h4')

                return tf.nn.tanh(h4)
            else:
                s_h, s_w = self.output_height, self.output_width
                s_h2, s_h4 = int(s_h/2), int(s_h/4)
                s_w2, s_w4 = int(s_w/2), int(s_w/4)

                # yb = tf.reshape(y, [-1, 1, 1, self.y_dim])
                yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim])
                z = concat([z, y], 1)

                h0 = tf.nn.relu(self.g_bn0(linear(z, self.gfc_dim, 'g_h0_lin'), train=False))
                h0 = concat([h0, y], 1)

                h1 = tf.nn.relu(self.g_bn1(
                    linear(h0, self.gf_dim*2*s_h4*s_w4, 'g_h1_lin'), train=False))
                h1 = tf.reshape(h1, [self.batch_size, s_h4, s_w4, self.gf_dim * 2])
                h1 = conv_cond_concat(h1, yb)

                h2 = tf.nn.relu(self.g_bn2(
                    deconv2d(h1, [self.batch_size, s_h2, s_w2, self.gf_dim * 2], name='g_h2'), train=False))
                h2 = conv_cond_concat(h2, yb)

                return tf.nn.sigmoid(deconv2d(h2, [self.batch_size, s_h, s_w, self.c_dim], name='g_h3'))

    def load_mnist(self):
        data_dir = os.path.join("./data", self.dataset_name)

        fd = open(os.path.join(data_dir, 'train-images-idx3-ubyte'))
        loaded = np.fromfile(file=fd, dtype=np.uint8)
        trX = loaded[16:].reshape((60000, 28, 28, 1)).astype(np.float)

        fd = open(os.path.join(data_dir, 'train-labels-idx1-ubyte'))
        loaded = np.fromfile(file=fd, dtype=np.uint8)
        trY = loaded[8:].reshape((60000)).astype(np.float)

        fd = open(os.path.join(data_dir, 't10k-images-idx3-ubyte'))
        loaded = np.fromfile(file=fd, dtype=np.uint8)
        teX = loaded[16:].reshape((10000, 28, 28, 1)).astype(np.float)

        fd = open(os.path.join(data_dir, 't10k-labels-idx1-ubyte'))
        loaded = np.fromfile(file=fd, dtype=np.uint8)
        teY = loaded[8:].reshape((10000)).astype(np.float)

        trY = np.asarray(trY)
        teY = np.asarray(teY)

        X = np.concatenate((trX, teX), axis=0)
        y = np.concatenate((trY, teY), axis=0).astype(np.int)

        seed = 547
        np.random.seed(seed)
        np.random.shuffle(X)
        np.random.seed(seed)
        np.random.shuffle(y)

        y_vec = np.zeros((len(y), self.y_dim), dtype=np.float)
        for i, label in enumerate(y):
            y_vec[i, y[i]] = 1.0

        return X/255., y_vec

    @property
    def model_dir(self):
        return "{}_{}_{}_{}".format(
            self.dataset_name, self.batch_size,
            self.output_height, self.output_width)

    def save(self, checkpoint_dir, step):
        model_name = "DCGAN.model"
        checkpoint_dir = os.path.join(checkpoint_dir, self.model_dir)

        if not os.path.exists(checkpoint_dir):
            os.makedirs(checkpoint_dir)

        self.saver.save(self.sess,
                        os.path.join(checkpoint_dir, model_name),
                        global_step=step)

    def load(self, checkpoint_dir):
        import re
        print(" [*] Reading checkpoints...")
        checkpoint_dir = os.path.join(checkpoint_dir, self.model_dir)

        ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
            self.saver.restore(self.sess, os.path.join(checkpoint_dir, ckpt_name))
            counter = int(next(re.finditer("(\d+)(?!.*\d)", ckpt_name)).group(0))
            print(" [*] Success to read {}".format(ckpt_name))
            return True, counter
        else:
            print(" [*] Failed to find a checkpoint")
            return False, 0
```
This file defines the DCGAN model itself; it calls into utils.py and ops.py.
step0: conv_out_size_same(size, stride). Computes a layer's spatial size from the input size and the stride, rounding up.
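As a sanity check of my own, here is the chain of sizes the generator will use later in this post for a 48×48 output:

```python
import math

def conv_out_size_same(size, stride):
    return int(math.ceil(float(size) / float(stride)))

s = 48
for _ in range(4):
    s = conv_out_size_same(s, 2)
    print(s)   # 24, 12, 6, 3 -- so the generator grows 3 -> 6 -> 12 -> 24 -> 48
```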
step1: The DCGAN class is defined; all the remaining code lives in this class, so the following steps all take place inside it.
step2: the class initializer __init__. It mainly initializes the default parameters: the session, crop, batch size batch_size, number of samples sample_num, the input and output heights and widths, the various dimensions, the batch-norm objects for the generator and discriminator, the dataset name, the grayscale flag, and finally the call to the model-building function. Note the check on the dataset name: if it is mnist, the data is loaded directly with load_mnist(); otherwise it is read from the local data folder, inferring whether the images are grayscale.
step3: the model-building function build_model(self).
- First check y_dim, then define and initialize y as a tf.placeholder.
- Check crop: if true we are testing, and the image dimensions are the output dimensions; otherwise they are the input dimensions.
- Define inputs with tf.placeholder: the tensor of real images.
- Define and initialize the generator's noise z and z_sum.
- Check y_dim again: if true, initialize the generator G from the noise z and labels y, the discriminator D and D_logits from inputs, the sampler, and D_ and D_logits_ from G and y; if false, initialize the same variables in the same way, just without the labels y.
- Record the D, D_, G from the previous step in the summaries d_sum, d__sum, G_sum.
- Define the sigmoid cross-entropy loss helper sigmoid_cross_entropy_with_logits(x, y). Both branches call tf.nn.sigmoid_cross_entropy_with_logits; the try/except only bridges the keyword rename from targets to labels across TensorFlow versions.
- Define the loss values: the real-data discriminator loss d_loss_real, the fake-data discriminator loss d_loss_fake, the generator loss g_loss, and the discriminator loss d_loss (written out as equations after this list).
- Collect all trainable variables in t_vars.
- Split them into the generator and discriminator parameter sets.
- Finally, the saver.
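Written out, with sigmoid cross-entropy against all-ones and all-zeros targets, these are the standard non-saturating GAN objectives:

$$L_D = -\,\mathbb{E}_x\big[\log D(x)\big] - \mathbb{E}_z\big[\log\big(1 - D(G(z))\big)\big], \qquad L_G = -\,\mathbb{E}_z\big[\log D(G(z))\big]$$

where the first term of $L_D$ is d_loss_real, the second is d_loss_fake, and $L_G$ is g_loss.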
step4: the training function train(self, config).
- Define the discriminator optimizer d_optim and the generator optimizer g_optim.
- Initialize the variables.
- Merge the generator-related and the discriminator-related summaries into one variable each, and write them to the event file.
- Initialize the noise z.
- Depending on whether the dataset is mnist, fetch the input data and labels; this uses the get_image function from utils.py.
- Define the counter and the start time start_time.
- Load a checkpoint, and report whether loading succeeded.
- Start the for epoch in xrange(config.epoch) training loop. First check whether the dataset is mnist, then compute the number of batches.
- Start the for idx in xrange(0, batch_idxs) loop; depending on whether the dataset is mnist, initialize the batch of images and labels.
- Initialize the batch noise z.
- Depending on whether the dataset is mnist, update the discriminator and generator networks. Leaving the mnist handling aside, for other datasets: the generator optimizer is run twice so that the discriminator loss does not collapse to zero, and then the discriminator's real-data and fake-data losses and the generator loss are evaluated.
- Print the training status for this batch: which epoch, which batch, elapsed time, discriminator loss, generator loss.
- After every 100 batches, depending on whether the dataset is mnist, fetch the samples and the discriminator and generator losses, save the samples with the save_images function from utils.py, naming the file after the epoch and batch index, and print the two losses.
- After every 500 batches, save a checkpoint.
step5: the discriminator, discriminator(self, image, y=None, reuse=False).
- with tf.variable_scope("discriminator") as scope shares variables within the scope.
- If reuse is set, reuse_variables() is called on the scope.
- If y_dim is false, there are five layers: four convolutional layers with lrelu activations, then a final linear layer; it returns the sigmoid of h4 together with h4 itself.
- If y_dim is true, first reshape y into yb, then use conv_cond_concat from ops.py to join image and yb into x, then build four layers: two lrelu-activated convolutional layers and one lrelu-activated linear layer, then a final linear layer; it returns the sigmoid of h3 together with h3.
step6: the generator, generator(self, z, y=None).
- with tf.variable_scope("generator") as scope shares variables within the scope.
- The network is built differently depending on y_dim.
- If false: first get the output height and width, then derive the successively halved size pairs from them. Layer h0 projects the noise z linearly (also obtaining the weights w and biases b) and applies relu. For h1, deconvolve h0, obtaining this layer's weights and biases, and apply relu; h2 and h3 follow the same pattern as h1. Layer h4 deconvolves h3, and the tanh of h4 is returned directly.
- If true: likewise get the output height and width and derive the halved size pairs, then build yb and the noise z (concatenated with y). Layer h0 applies relu to a linear layer and is concatenated with y. Layer h1 applies relu after a linear fully connected layer and is concatenated with yb. Layer h2 applies relu after a deconvolution and is concatenated with yb. Finally it returns the sigmoid of the deconvolved h2.
step7: sampler(self, z, y=None).
- with tf.variable_scope("generator") as scope shares variables within the scope.
- reuse_variables() is called on the scope to reuse the generator's weights.
- The network is set up according to y_dim.
- From there it is essentially the generator again (but with train=False in the batch-norm calls), so we won't repeat it.
step8: load_mnist(self). This is specific to the mnist dataset, so we skip it for now.
step9: the model_dir(self) property. Returns a string of the dataset name, batch size, and output height and width.
step10: save(self, checkpoint_dir, step). Saves the trained model: create the checkpoint directory if the path does not exist, then save the model under it.
step11: load(self, checkpoint_dir). Reads a checkpoint: build the path, restore the checkpoint, recover the step counter from the filename, and print a success message; if no checkpoint path is found, print a failure message.
That is all of model.py: it defines the DCGAN class, implementing the generator and discriminator networks.
2.4 Training
All four files are now analyzed; time to run.
step0: Since we are using the anime face dataset, create a data folder under the source root and put the folder containing the data inside it, as shown below.
step1: Run the command below, specifying the arguments: the input height and width, the output height and width, which dataset, whether to test or train, and how many epochs to run.
If you have read this far: heads up. The whole series of problems below traces back to this step, where my training failed to converge and produced garbage, all because the argument names were spelled wrong!!! The correct command is:

```
python main.py --input_height 96 --output_height 48 --dataset faces --crop True --train True --epoch 10
```
The argument names below are wrong! (yes, I will say this again later)

```
python main.py --image_size 96 --output_size 48 --dataset faces --crop True --train True --epoch 10
```
step2: intermediate results
This is epoch 0, the first 3 batches:
The newly generated files:
step3: training and testing results
If you are reading this part, skip ahead to the Results section: everything here is the non-converging output caused by the mistyped arguments!
First epoch:
Ninth epoch:
As you can see, the results are not great, and far from the reference; that is because there were only 3,000-odd training images and just 10 epochs in total. This was only a trial run anyway: pure CPU, with about 2 GB to spare, sigh.
step4: This time I trained on 16,383 images with epoch==300. It ran all night and by this morning had only reached epoch 5; well, I'll keep waiting.
step5: Retrained on the server, this time with the dataset provided by the reference blog. The two datasets I collected and processed myself were probably too small and gave unsatisfying results, so I tried this roughly 50,000-image dataset directly, with epoch==300.
step6: The results were terrible. I don't know where the problem is; I'll post the screenshots for now and look into the cause when I have time. (I strongly suspect my dataset: I had manipulated the data while running locally, which may have corrupted it. To be dealt with later.)
Each result's caption gives the epoch and batch number.
2.5 Results
Finally found the cause: the argument names were spelled wrong, so the input and output sizes were never changed from the defaults of 108 and 64 to 96 and 48. So silly!! (Thanks to a commenter here; if they hadn't mentioned changing the parameters, I would never have noticed.)
Retrain:

```
python main.py --input_height 96 --output_height 48 --dataset faces --crop True --train True --epoch 10
```

With only 10 epochs the results are already quite respectable; I'll try 300 when the server is free.
Switched to epoch==300; here are a few intermediate results for now, and I'll post the full set once all 300 epochs finish.
epoch 0
epoch 5
epoch 10
epoch 20
epoch 100
epoch 200
epoch 300