1. Introduction
In recent years, with the breakthroughs that deep learning, represented by convolutional neural networks (CNNs), has made in image recognition, more and more image recognition algorithms have emerged. Last year we made our first successful attempts at applying image recognition to testing: we reframed broken page styling on websites and device-model compatibility issues on mobile as "binary classification of normal versus abnormal images in a specific scenario", used Google's open-source Inception V3 network for transfer learning, and retrained image classification models for those scenarios, reaching over 95% accuracy on problem images.
Over the past year, our main work on intelligent image recognition has included:
- Deploying the model and tuning its parameters
- Turning the model into a service
- Optimizing the model service (introducing a database connection pool, adopting the gunicorn server, dockerization, etc.)
This article studies and analyzes the source code of model retraining, to deepen our understanding of the training process so that future adjustments to it can be made in a targeted way.
A brief explanation of transfer learning: image recognition models often contain millions of parameters, and training one from scratch requires a large amount of labeled images as well as considerable computing power (often hundreds of GPU-hours). Transfer learning is a shortcut: it continues training a new model on top of a similar model that has already been trained.
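As a minimal sketch of that idea (illustrative only, not the retrain.py code; the tfhub.dev URL names one published Inception V3 feature-vector module): the pre-trained network serves as a frozen feature extractor, and only a small new classifier on top of it is trained.

import tensorflow as tf
import tensorflow_hub as hub

# Load a pre-trained Inception V3 feature extractor from TF-Hub.
module_spec = hub.load_module_spec(
    'https://tfhub.dev/google/imagenet/inception_v3/feature_vector/3')
height, width = hub.get_expected_image_size(module_spec)

images = tf.placeholder(tf.float32, [None, height, width, 3])
labels = tf.placeholder(tf.int64, [None])

# The module's weights are not trainable by default, so only the new
# classification layer below is updated during training.
features = hub.Module(module_spec)(images)   # 2048-d bottleneck features
logits = tf.layers.dense(features, 2)        # new head: 2 classes
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)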
2. Source code analysis of retrain.py
The transfer-learning code used by our image intelligence service today is based on the open-source code at github: tensorflow/hub/image_retraining/retrain.py.
Below is a study and walkthrough of that source code:
2.1 The main entry point:
if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--image_dir',
      type=str,
      default='',
      help='Path to folders of labeled images.'
  )
  parser.add_argument(
      '--output_graph',
      type=str,
      default='/tmp/output_graph.pb',
      help='Where to save the trained graph.'
  )
  ...... omitted ......
  parser.add_argument(
      '--logging_verbosity',
      type=str,
      default='INFO',
      choices=['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'],
      help='How much logging output should be produced.')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
As you can see, the main entry point mostly declares and parses the input arguments. The arguments passed in at runtime are stored in the FLAGS variable, and then tf.app.run(main=main, argv=[sys.argv[0]] + unparsed) is executed to start the actual training.
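The only subtlety here is parse_known_args: unlike parse_args, it does not fail on unrecognized flags but returns them separately, and retrain.py forwards them on to tf.app.run. A small standalone illustration:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--image_dir', type=str, default='')

# Flags the parser knows go into the namespace; the rest are returned as-is.
FLAGS, unparsed = parser.parse_known_args(
    ['--image_dir', '/data/images', '--unknown_flag', '1'])
print(FLAGS.image_dir)  # /data/images
print(unparsed)         # ['--unknown_flag', '1']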
2.2 The main(_) method
def main(_):
  # Needed to make sure the logging output is visible.
  # See https://github.com/tensorflow/tensorflow/issues/3047
  ## Set the logging level
  logging_verbosity = logging_level_verbosity(FLAGS.logging_verbosity)
  tf.logging.set_verbosity(logging_verbosity)

  ## Check that --image_dir was passed in; it is the path to the image set used for training
  if not FLAGS.image_dir:
    tf.logging.error('Must set flag --image_dir.')
    return -1

  # Prepare necessary directories that can be used during training
  ## Recreate summaries_dir and make sure intermediate_output_graphs_dir exists
  prepare_file_system()

  # Look at the folder structure, and create lists of all the images.
  ## Split the input image set into training, testing and validation sets,
  ## according to the image directory and the testing/validation percentages
  image_lists = create_image_lists(FLAGS.image_dir, FLAGS.testing_percentage,
                                   FLAGS.validation_percentage)

  ## The number of classes is the number of subdirectories under image_dir.
  ## Each subdirectory is one class, and each class is split into its own
  ## training, testing and validation sets. Zero or one class is an error,
  ## since a classification problem needs at least two classes.
  class_count = len(image_lists.keys())
  if class_count == 0:
    tf.logging.error('No valid folders of images found at ' + FLAGS.image_dir)
    return -1
  if class_count == 1:
    tf.logging.error('Only one valid folder of images found at ' +
                     FLAGS.image_dir +
                     ' - multiple classes are needed for classification.')
    return -1

  # See if the command-line flags mean we're applying any distortions.
  ## Decide from the flags whether the input images should be distorted
  do_distort_images = should_distort_images(
      FLAGS.flip_left_right, FLAGS.random_crop, FLAGS.random_scale,
      FLAGS.random_brightness)

  # Set up the pre-trained graph.
  ## Load the TF-Hub module; Inception V3 by default, another pre-trained
  ## model can be selected with --tfhub_module
  module_spec = hub.load_module_spec(FLAGS.tfhub_module)
  ## Create the model graph
  graph, bottleneck_tensor, resized_image_tensor, wants_quantization = (
      create_module_graph(module_spec))

  # Add the new layer that we'll be training.
  ## Call add_final_retrain_ops to obtain the train step, cross entropy,
  ## bottleneck input, ground-truth input and final tensor
  with graph.as_default():
    (train_step, cross_entropy, bottleneck_input,
     ground_truth_input, final_tensor) = add_final_retrain_ops(
         class_count, FLAGS.final_tensor_name, bottleneck_tensor,
         wants_quantization, is_training=True)

  with tf.Session(graph=graph) as sess:
    # Initialize all weights: for the module to their pretrained values,
    # and for the newly added retraining layer to random initial values.
    ## Initialize the variables
    init = tf.global_variables_initializer()
    sess.run(init)

    # Set up the image decoding sub-graph.
    ## Set up the JPEG decoding ops to get the raw image tensor and the
    ## decoded image tensor
    jpeg_data_tensor, decoded_image_tensor = add_jpeg_decoding(module_spec)

    if do_distort_images:
      # We will be applying distortions, so set up the operations we'll need.
      (distorted_jpeg_data_tensor,
       distorted_image_tensor) = add_input_distortions(
           FLAGS.flip_left_right, FLAGS.random_crop, FLAGS.random_scale,
           FLAGS.random_brightness, module_spec)
    else:
      # We'll make sure we've calculated the 'bottleneck' image summaries and
      # cached them on disk.
      ## Compute the bottleneck values for each image and cache them on disk
      cache_bottlenecks(sess, image_lists, FLAGS.image_dir,
                        FLAGS.bottleneck_dir, jpeg_data_tensor,
                        decoded_image_tensor, resized_image_tensor,
                        bottleneck_tensor, FLAGS.tfhub_module)

    # Create the operations we need to evaluate the accuracy of our new layer.
    ## Create the evaluation operation
    evaluation_step, _ = add_evaluation_step(final_tensor, ground_truth_input)

    # Merge all the summaries and write them out to the summaries_dir
    ## Merge the summaries and write them to the summaries_dir directory
    merged = tf.summary.merge_all()
    train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train',
                                         sess.graph)
    validation_writer = tf.summary.FileWriter(
        FLAGS.summaries_dir + '/validation')

    # Create a train saver that is used to restore values into an eval graph
    # when exporting models.
    train_saver = tf.train.Saver()

    # Run the training for as many cycles as requested on the command line.
    ## Train for the requested number of steps
    for i in range(FLAGS.how_many_training_steps):
      # Get a batch of input bottleneck values, either calculated fresh every
      # time with distortions applied, or from the cache stored on disk.
      if do_distort_images:
        (train_bottlenecks,
         train_ground_truth) = get_random_distorted_bottlenecks(
             sess, image_lists, FLAGS.train_batch_size, 'training',
             FLAGS.image_dir, distorted_jpeg_data_tensor,
             distorted_image_tensor, resized_image_tensor, bottleneck_tensor)
      else:
        ## Fetch the cached bottleneck values used for training;
        ## train_batch_size defaults to 100, i.e. each step trains on a
        ## batch of 100 images
        (train_bottlenecks,
         train_ground_truth, _) = get_random_cached_bottlenecks(
             sess, image_lists, FLAGS.train_batch_size, 'training',
             FLAGS.bottleneck_dir, FLAGS.image_dir, jpeg_data_tensor,
             decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
             FLAGS.tfhub_module)
      # Feed the bottlenecks and ground truth into the graph, and run a training
      # step. Capture training summaries for TensorBoard with the `merged` op.
      ## Run the merged summary op and the train step, filling the
      ## placeholders with the contents of feed_dict
      train_summary, _ = sess.run(
          [merged, train_step],
          feed_dict={bottleneck_input: train_bottlenecks,
                     ground_truth_input: train_ground_truth})
      train_writer.add_summary(train_summary, i)

      # Every so often, print out how well the graph is training.
      ## Check whether this is the last training step
      is_last_step = (i + 1 == FLAGS.how_many_training_steps)
      ## eval_step_interval defaults to 10, i.e. print the current training
      ## results every 10 steps and after the final step
      if (i % FLAGS.eval_step_interval) == 0 or is_last_step:
        ## Print the training accuracy and the cross entropy
        train_accuracy, cross_entropy_value = sess.run(
            [evaluation_step, cross_entropy],
            feed_dict={bottleneck_input: train_bottlenecks,
                       ground_truth_input: train_ground_truth})
        tf.logging.info('%s: Step %d: Train accuracy = %.1f%%' %
                        (datetime.now(), i, train_accuracy * 100))
        tf.logging.info('%s: Step %d: Cross entropy = %f' %
                        (datetime.now(), i, cross_entropy_value))
        # TODO: Make this use an eval graph, to avoid quantization
        # moving averages being updated by the validation set, though in
        # practice this makes a negligible difference.
        ## Fetch the bottleneck values of the validation images, also 100
        ## per batch by default
        validation_bottlenecks, validation_ground_truth, _ = (
            get_random_cached_bottlenecks(
                sess, image_lists, FLAGS.validation_batch_size, 'validation',
                FLAGS.bottleneck_dir, FLAGS.image_dir, jpeg_data_tensor,
                decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
                FLAGS.tfhub_module))
        # Run a validation step and capture training summaries for TensorBoard
        # with the `merged` op.
        validation_summary, validation_accuracy = sess.run(
            [merged, evaluation_step],
            feed_dict={bottleneck_input: validation_bottlenecks,
                       ground_truth_input: validation_ground_truth})
        validation_writer.add_summary(validation_summary, i)
        ## Print the validation accuracy and the number of images evaluated
        tf.logging.info('%s: Step %d: Validation accuracy = %.1f%% (N=%d)' %
                        (datetime.now(), i, validation_accuracy * 100,
                         len(validation_bottlenecks)))

      # Store intermediate results
      ## Save the intermediate results
      intermediate_frequency = FLAGS.intermediate_store_frequency

      if (intermediate_frequency > 0 and (i % intermediate_frequency == 0)
          and i > 0):
        # If we want to do an intermediate save, save a checkpoint of the train
        # graph, to restore into the eval graph.
        train_saver.save(sess, CHECKPOINT_NAME)
        intermediate_file_name = (FLAGS.intermediate_output_graphs_dir +
                                  'intermediate_' + str(i) + '.pb')
        tf.logging.info('Save intermediate result to : ' +
                        intermediate_file_name)
        save_graph_to_file(intermediate_file_name, module_spec,
                           class_count)

    # After training is complete, force one last save of the train checkpoint.
    train_saver.save(sess, CHECKPOINT_NAME)

    # We've completed all our training, so run a final test evaluation on
    # some new images we haven't used before.
    ## Run the final evaluation
    run_final_eval(sess, module_spec, class_count, image_lists,
                   jpeg_data_tensor, decoded_image_tensor,
                   resized_image_tensor, bottleneck_tensor)

    # Write out the trained graph and labels with the weights stored as
    # constants.
    tf.logging.info('Save final result to : ' + FLAGS.output_graph)
    if wants_quantization:
      tf.logging.info('The model is instrumented for quantization with TF-Lite')
    save_graph_to_file(FLAGS.output_graph, module_spec, class_count)
    with tf.gfile.GFile(FLAGS.output_labels, 'w') as f:
      f.write('\n'.join(image_lists.keys()) + '\n')

    ## Export the trained graph as a SavedModel if requested
    if FLAGS.saved_model_dir:
      export_model(module_spec, class_count, FLAGS.saved_model_dir)
The details of the main method are annotated in the code above (comments starting with "##"). Its main steps are:
- Set the logging level
- Prepare the workspace
- Load the input image set from image_dir and build image_lists, a dictionary whose keys are the class names and whose values are the image sets of the corresponding classes (each split into training, testing and validation sets, with a default ratio of 0.8/0.1/0.1)
- Load the feature tensors of the Inception V3 network pre-trained on ImageNet
- For each image, set up the decoding ops to obtain the raw image tensor and the decoded image tensor
- For each image's jpeg_data_tensor and decoded_image_tensor, compute the corresponding bottlenecks (in fact 1×2048 tensors) and cache them to disk
- Obtain the train step and the cross entropy
- Start the training iterations
- Every 10 steps, print the training accuracy and cross entropy, plus the results on the validation set. By default each training and validation batch contains 100 images
- After training completes, run a final evaluation on the test set
- Print and save the results
2.3 Other methods
Having analyzed the main execution path, let's walk through the remaining methods. Since the full source is very long and space is limited, the other methods are briefly introduced below, in order.
2.3.1 create_image_lists
def create_image_lists(image_dir, testing_percentage, validation_percentage):
  ...... omitted ......
    result[label_name] = {
        'dir': dir_name,
        'training': training_images,
        'testing': testing_images,
        'validation': validation_images,
    }
  return result
Splits the image set under image_dir according to the testing_percentage and validation_percentage ratios. The returned structure looks like the following:
{
    'correct': {
        'dir': correct_image_dir,
        'training': correct_training_images,
        'testing': correct_testing_images,
        'validation': correct_validation_images
    },
    'error': {
        'dir': error_image_dir,
        'training': error_training_images,
        'testing': error_testing_images,
        'validation': error_validation_images
    }
}
The value under each training/testing/validation key is a list of image file names.
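Worth noting is how retrain.py decides which set an image belongs to: rather than shuffling, it hashes the file name, so an image always lands in the same set across runs even as more images are added. A simplified sketch of that decision (condensed from the real function, which strips a "_nohash_" suffix so near-duplicate photos stay in the same set):

import hashlib
import re

MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1  # same constant as retrain.py

def assign_set(file_name, testing_percentage, validation_percentage):
  # Variants of the same photo (marked with '_nohash_') should end up in
  # the same set, so that part of the name is ignored when hashing.
  hash_name = re.sub(r'_nohash_.*$', '', file_name)
  hash_name_hashed = hashlib.sha1(hash_name.encode('utf-8')).hexdigest()
  # Map the hash to a stable percentage in [0, 100).
  percentage_hash = ((int(hash_name_hashed, 16) %
                      (MAX_NUM_IMAGES_PER_CLASS + 1)) *
                     (100.0 / MAX_NUM_IMAGES_PER_CLASS))
  if percentage_hash < validation_percentage:
    return 'validation'
  elif percentage_hash < (testing_percentage + validation_percentage):
    return 'testing'
  return 'training'

print(assign_set('cat_001.jpg', 10, 10))  # deterministic for a given name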
2.3.2 get_image_path
Returns the full path of an image.
2.3.3 get_bottleneck_path
Returns the path of the bottleneck file for a given category (training, testing or validation).
2.3.4 create_module_graph
Creates the model graph from the given pre-trained TF-Hub module.
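Its core is only a few lines. A paraphrased sketch (assuming retrain.py's imports of tensorflow as tf and tensorflow_hub as hub; the quantization check is simplified away):

def create_module_graph(module_spec):
  ## Ask the module spec for its expected input size (299x299 for Inception V3).
  height, width = hub.get_expected_image_size(module_spec)
  with tf.Graph().as_default() as graph:
    resized_input_tensor = tf.placeholder(tf.float32, [None, height, width, 3])
    ## Instantiating the module inserts the pre-trained network into the graph;
    ## calling it on the input placeholder yields the bottleneck tensor.
    m = hub.Module(module_spec)
    bottleneck_tensor = m(resized_input_tensor)
    ## The real code also scans the graph for fake-quantization ops here to
    ## set wants_quantization; that check is omitted in this sketch.
    wants_quantization = False
  return graph, bottleneck_tensor, resized_input_tensor, wants_quantization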
2.3.5 run_bottleneck_on_image
def run_bottleneck_on_image(sess, image_data, image_data_tensor,
                            decoded_image_tensor, resized_input_tensor,
                            bottleneck_tensor):
  """Runs inference on an image to extract the 'bottleneck' summary layer.

  Args:
    sess: Current active TensorFlow Session.
    image_data: String of raw JPEG data.
    image_data_tensor: Input data layer in the graph.
    decoded_image_tensor: Output of initial image resizing and preprocessing.
    resized_input_tensor: The input node of the recognition graph.
    bottleneck_tensor: Layer before the final softmax.

  Returns:
    Numpy array of bottleneck values.
  """
  # First decode the JPEG image, resize it, and rescale the pixel values.
  resized_input_values = sess.run(decoded_image_tensor,
                                  {image_data_tensor: image_data})
  # Then run it through the recognition network.
  bottleneck_values = sess.run(bottleneck_tensor,
                               {resized_input_tensor: resized_input_values})
  bottleneck_values = np.squeeze(bottleneck_values)
  return bottleneck_values
Computes bottleneck_values from the decoded tensor of the given input image and then applies a squeeze (removing single-dimensional entries, i.e. dropping the dimensions of size 1 from the shape).
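For a single image, the module returns a batch of one feature vector, which np.squeeze flattens:

import numpy as np

bottleneck_values = np.ones((1, 2048))   # shape as returned for one image
squeezed = np.squeeze(bottleneck_values)
print(bottleneck_values.shape, squeezed.shape)  # (1, 2048) (2048,)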
2.3.6 ensure_dir_exists
Makes sure a directory exists, creating it if it does not.
2.3.7 create_bottleneck_file
Calls run_bottleneck_on_image to compute the bottleneck values, and caches them in a file on disk.
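The cache format is plain text: the bottleneck vector is serialized as one line of comma-separated floats, one file per image. A sketch of the write/read pair (the file path below is illustrative):

import os
import numpy as np

bottleneck_values = np.random.randn(2048)
bottleneck_path = '/tmp/bottleneck/correct/img_0001.jpg.txt'  # illustrative path
os.makedirs(os.path.dirname(bottleneck_path), exist_ok=True)

# Write: a single line of comma-separated float strings.
bottleneck_string = ','.join(str(x) for x in bottleneck_values)
with open(bottleneck_path, 'w') as f:
  f.write(bottleneck_string)

# Read back, which is what happens on a cache hit.
with open(bottleneck_path) as f:
  cached = [float(x) for x in f.read().split(',')]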
2.3.8 get_or_create_bottleneck
Retrieves the cached bottleneck values of an image, computing and caching them first if the cache file does not exist yet.
2.3.9 cache_bottlenecks
Caches the bottleneck values of all images in batch.
2.3.10 get_random_cached_bottlenecks
Randomly fetches a batch of cached bottlenecks, together with their corresponding ground_truths labels and filenames.
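The sampling itself is simple; a condensed, illustrative version (load_bottleneck stands in for retrain.py's get_or_create_bottleneck):

import random

def sample_cached_bottlenecks(image_lists, how_many, category, load_bottleneck):
  """Illustrative sketch: randomly draw `how_many` cached bottlenecks.

  `load_bottleneck(label_name, image_index, category)` stands in for
  retrain.py's get_or_create_bottleneck and returns a list of floats.
  """
  class_names = list(image_lists.keys())
  bottlenecks, ground_truths = [], []
  for _ in range(how_many):
    label_index = random.randrange(len(class_names))
    label_name = class_names[label_index]
    image_index = random.randrange(len(image_lists[label_name][category]))
    bottlenecks.append(load_bottleneck(label_name, image_index, category))
    ground_truths.append(label_index)  # the class index is the label
  return bottlenecks, ground_truths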
2.3.11 add_final_retrain_ops
def add_final_retrain_ops(class_count, final_tensor_name, bottleneck_tensor,
                          quantize_layer, is_training):
  batch_size, bottleneck_tensor_size = bottleneck_tensor.get_shape().as_list()
  assert batch_size is None, 'We want to work with arbitrary batch size.'
  with tf.name_scope('input'):
    bottleneck_input = tf.placeholder_with_default(
        bottleneck_tensor,
        shape=[batch_size, bottleneck_tensor_size],
        name='BottleneckInputPlaceholder')

    ground_truth_input = tf.placeholder(
        tf.int64, [batch_size], name='GroundTruthInput')

  # Organizing the following ops so they are easier to see in TensorBoard.
  layer_name = 'final_retrain_ops'
  with tf.name_scope(layer_name):
    with tf.name_scope('weights'):
      initial_value = tf.truncated_normal(
          [bottleneck_tensor_size, class_count], stddev=0.001)
      layer_weights = tf.Variable(initial_value, name='final_weights')
      variable_summaries(layer_weights)

    with tf.name_scope('biases'):
      layer_biases = tf.Variable(tf.zeros([class_count]), name='final_biases')
      variable_summaries(layer_biases)

    with tf.name_scope('Wx_plus_b'):
      logits = tf.matmul(bottleneck_input, layer_weights) + layer_biases
      tf.summary.histogram('pre_activations', logits)

  final_tensor = tf.nn.softmax(logits, name=final_tensor_name)

  # The tf.contrib.quantize functions rewrite the graph in place for
  # quantization. The imported model graph has already been rewritten, so upon
  # calling these rewrites, only the newly added final layer will be
  # transformed.
  if quantize_layer:
    if is_training:
      tf.contrib.quantize.create_training_graph()
    else:
      tf.contrib.quantize.create_eval_graph()

  tf.summary.histogram('activations', final_tensor)

  # If this is an eval graph, we don't need to add loss ops or an optimizer.
  if not is_training:
    return None, None, bottleneck_input, ground_truth_input, final_tensor

  with tf.name_scope('cross_entropy'):
    cross_entropy_mean = tf.losses.sparse_softmax_cross_entropy(
        labels=ground_truth_input, logits=logits)

  tf.summary.scalar('cross_entropy', cross_entropy_mean)

  with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
    train_step = optimizer.minimize(cross_entropy_mean)

  return (train_step, cross_entropy_mean, bottleneck_input, ground_truth_input,
          final_tensor)
Appends a new fully connected layer (y = Wx + b) followed by a softmax at the end of the network, used for training and evaluation. This is the same as a logistic (softmax) regression model: it is trained iteratively by gradient descent to minimize the cross entropy.
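To make the final-layer math concrete, here is a small numpy-only illustration (not part of retrain.py) of the forward pass and loss the function defines, plus one gradient-descent step:

import numpy as np

np.random.seed(0)
bottlenecks = np.random.randn(4, 2048)   # a batch of 4 bottleneck vectors
labels = np.array([0, 1, 1, 0])          # ground-truth class indices
W = np.random.randn(2048, 2) * 0.001     # final_weights (stddev=0.001)
b = np.zeros(2)                          # final_biases

logits = bottlenecks @ W + b                                        # Wx_plus_b
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
cross_entropy = -np.log(probs[np.arange(4), labels]).mean()
print(cross_entropy)  # ~0.693 (= ln 2) while predictions are near-uniform

# One gradient-descent step on W: the gradient of softmax cross entropy
# w.r.t. the logits is simply probs - one_hot(labels).
grad_logits = probs.copy()
grad_logits[np.arange(4), labels] -= 1
W -= 0.01 * (bottlenecks.T @ grad_logits) / 4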
2.3.12 add_evaluation_step
def add_evaluation_step(result_tensor, ground_truth_tensor):
  with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
      ## For each row vector, find the index of the largest value
      prediction = tf.argmax(result_tensor, 1)
      ## Compare each prediction with the ground truth: True if they match,
      ## otherwise False
      correct_prediction = tf.equal(prediction, ground_truth_tensor)
    with tf.name_scope('accuracy'):
      ## Cast the True/False values to floats and take the mean
      evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  tf.summary.scalar('accuracy', evaluation_step)
  return evaluation_step, prediction
The annotations are in the code above; the function returns the final accuracy op and the list of predicted values.
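The same accuracy computation, illustrated with numpy (not part of retrain.py):

import numpy as np

probs = np.array([[0.9, 0.1],    # predicted as class 0
                  [0.2, 0.8],    # predicted as class 1
                  [0.6, 0.4]])   # predicted as class 0
ground_truth = np.array([0, 1, 1])

prediction = np.argmax(probs, axis=1)          # [0, 1, 0]
correct = (prediction == ground_truth)         # [True, True, False]
accuracy = correct.astype(np.float32).mean()   # 0.6666667
print(accuracy)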
2.3.13 run_final_eval
Runs the final evaluation on the test set. If the print_misclassified_test_images flag is passed, the names and predicted classes of the misclassified test images are also printed.
2.3.14 save_graph_to_file
Saves the graph to a file.
2.3.15 prepare_file_system
Prepares the workspace (recreates summaries_dir and makes sure intermediate_output_graphs_dir exists).
2.3.16 add_jpeg_decoding
Builds the ops that decode the input JPEG data into an image tensor and resize it to the module's expected input size.
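A paraphrased sketch of the sub-graph it builds (again assuming retrain.py's tf/hub imports; details may differ slightly from the source):

def add_jpeg_decoding(module_spec):
  input_height, input_width = hub.get_expected_image_size(module_spec)
  input_depth = hub.get_num_image_channels(module_spec)
  ## The raw JPEG bytes are fed in as a string placeholder.
  jpeg_data = tf.placeholder(tf.string, name='DecodeJPGInput')
  decoded_image = tf.image.decode_jpeg(jpeg_data, channels=input_depth)
  ## Convert to floats in [0, 1] and add a batch dimension of 1.
  decoded_image_as_float = tf.image.convert_image_dtype(decoded_image,
                                                        tf.float32)
  decoded_image_4d = tf.expand_dims(decoded_image_as_float, 0)
  ## Resize to the size the module expects (299x299 for Inception V3).
  resized_image = tf.image.resize_bilinear(
      decoded_image_4d, tf.stack([input_height, input_width]))
  return jpeg_data, resized_image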
2.3.17 export_model
Exports the trained model.
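As a rough illustration of what a SavedModel export amounts to, here is a hedged sketch using tf.saved_model.simple_save. The input tensor name below is an assumption for illustration; 'final_result' is the default --final_tensor_name in retrain.py:

def export_sketch(sess, saved_model_dir):
  ## Illustrative only: export the graph in `sess` as a SavedModel whose
  ## signature maps the resized-image input to the final softmax output.
  graph = sess.graph
  in_image = graph.get_tensor_by_name('Placeholder:0')         # assumed input name
  out_prediction = graph.get_tensor_by_name('final_result:0')  # default final tensor
  tf.saved_model.simple_save(
      sess, saved_model_dir,
      inputs={'image': in_image},
      outputs={'prediction': out_prediction})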