BERT Training Process 3
Output parameters
INFO:tensorflow:*** Features ***
INFO:tensorflow: name = input_ids, shape = (8, 128)
INFO:tensorflow: name = input_mask, shape = (8, 128)
INFO:tensorflow: name = masked_lm_ids, shape = (8, 20)
INFO:tensorflow: name = masked_lm_positions, shape = (8, 20)
INFO:tensorflow: name = masked_lm_weights, shape = (8, 20)
INFO:tensorflow: name = next_sentence_labels, shape = (8, 1)
INFO:tensorflow: name = segment_ids, shape = (8, 128)
INFO:tensorflow:**** Trainable Variables ****
INFO:tensorflow: name = bert/embeddings/word_embeddings:0, shape = (30522, 768)
INFO:tensorflow: name = bert/embeddings/token_type_embeddings:0, shape = (2, 768)
INFO:tensorflow: name = bert/embeddings/position_embeddings:0, shape = (512, 768)
INFO:tensorflow: name = bert/embeddings/LayerNorm/beta:0, shape = (768,)
INFO:tensorflow: name = bert/embeddings/LayerNorm/gamma:0, shape = (768,)
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/query/kernel:0, shape = (768, 768)
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/query/bias:0, shape = (768,)
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/key/kernel:0, shape = (768, 768)
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/key/bias:0, shape = (768,)
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/value/kernel:0, shape = (768, 768)
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/value/bias:0, shape = (768,)
INFO:tensorflow: name = bert/encoder/layer_0/attention/output/dense/kernel:0, shape = (768, 768)
INFO:tensorflow: name = bert/encoder/layer_0/attention/output/dense/bias:0, shape = (768,)
INFO:tensorflow: name = bert/encoder/layer_0/attention/output/LayerNorm/beta:0, shape = (768,)
INFO:tensorflow: name = bert/encoder/layer_0/attention/output/LayerNorm/gamma:0, shape = (768,)
INFO:tensorflow: name = bert/encoder/layer_0/intermediate/dense/kernel:0, shape = (768, 3072)
INFO:tensorflow: name = bert/encoder/layer_0/intermediate/dense/bias:0, shape = (3072,)
INFO:tensorflow: name = bert/encoder/layer_0/output/dense/kernel:0, shape = (3072, 768)
INFO:tensorflow: name = bert/encoder/layer_0/output/dense/bias:0, shape = (768,)
INFO:tensorflow: name = bert/encoder/layer_0/output/LayerNorm/beta:0, shape = (768,)
INFO:tensorflow: name = bert/encoder/layer_0/output/LayerNorm/gamma:0, shape = (768,)
(bert/encoder/layer_1 through bert/encoder/layer_11 repeat exactly the same variables and shapes as layer_0 above, and are elided here.)
INFO:tensorflow: name = bert/pooler/dense/kernel:0, shape = (768, 768)
INFO:tensorflow: name = bert/pooler/dense/bias:0, shape = (768,)
INFO:tensorflow: name = cls/predictions/transform/dense/kernel:0, shape = (768, 768)
INFO:tensorflow: name = cls/predictions/transform/dense/bias:0, shape = (768,)
INFO:tensorflow: name = cls/predictions/transform/LayerNorm/beta:0, shape = (768,)
INFO:tensorflow: name = cls/predictions/transform/LayerNorm/gamma:0, shape = (768,)
INFO:tensorflow: name = cls/predictions/output_bias:0, shape = (30522,)
INFO:tensorflow: name = cls/seq_relationship/output_weights:0, shape = (2, 768)
INFO:tensorflow: name = cls/seq_relationship/output_bias:0, shape = (2,)
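As a quick sanity check on the log above, the shapes can be tallied into a total parameter count. The short script below is my own bookkeeping (not part of the original output); it reproduces the well-known ~110M figure for BERT-base:

# Tally of the trainable-variable shapes listed above (BERT-base).
embeddings = 30522 * 768 + 2 * 768 + 512 * 768 + 2 * 768  # word + token-type + position + LayerNorm
per_layer = (3 * (768 * 768 + 768)   # query, key, value
             + 768 * 768 + 768      # attention output dense
             + 2 * 768              # attention output LayerNorm
             + 768 * 3072 + 3072    # intermediate dense
             + 3072 * 768 + 768     # output dense
             + 2 * 768)             # output LayerNorm
pooler = 768 * 768 + 768
cls_heads = (768 * 768 + 768 + 2 * 768 + 30522  # cls/predictions transform + LayerNorm + output_bias
             + 2 * 768 + 2)                     # cls/seq_relationship
total = embeddings + 12 * per_layer + pooler + cls_heads
print(total)  # 110106428, i.e. roughly 110M parameters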
Normalizing the dataset
The Estimator requires the model input in a particular format (built with from_tensor_slices), so the data has to be wrapped in an input function closure:
"""Creates an `input_fn` closure to be passed to TPUEstimator."""
def input_fn(params):
"""The actual input function."""
batch_size = params["batch_size"] #32
#tf.FixedLenFeature 返回的是一個定長的tensor
name_to_features = {
"input_ids":
tf.FixedLenFeature([max_seq_length], tf.int64),
"input_mask":
tf.FixedLenFeature([max_seq_length], tf.int64),
"segment_ids":
tf.FixedLenFeature([max_seq_length], tf.int64),
"masked_lm_positions":
tf.FixedLenFeature([max_predictions_per_seq], tf.int64),
"masked_lm_ids":
tf.FixedLenFeature([max_predictions_per_seq], tf.int64),
"masked_lm_weights":
tf.FixedLenFeature([max_predictions_per_seq], tf.float32),
"next_sentence_labels":
tf.FixedLenFeature([1], tf.int64),
}
# For training, we want a lot of parallel reading and shuffling.
# For eval, we want no shuffling and parallel reading doesn't matter.
if is_training:
#它的作用是切分傳入Tensor的第一個維度,生成相應的dataset。
#dataset = tf.data.Dataset.from_tensor_slices(np.random.uniform(size=(5, 2)))
#傳入的數值是一個矩陣,它的形狀為(5, 2),tf.data.Dataset.from_tensor_slices就會切分它形狀上的第一個維度,最後生成的dataset中
#一個含有5個元素,每個元素的形狀是(2, ),即每個元素是矩陣的一行。
'''
對於更復雜的情形,比如元素是一個python中的元組或者字典:在影像識別中一個元素可以是{”image”:image_tensor,”label”:label_tensor}的形式。
dataset = tf.data.Dataset.from_tensor_slices ( { “a”:np.array([1.0,2.0,3.0,4.0,5.0]), “b”:np.random.uniform(size=(5,2) ) } )
這時,函式會分別切分”a”中的數值以及”b”中的數值,最後總dataset中的一個元素就是類似於{ “a”:1.0, “b”:[0.9,0.1] }的形式。tf.data.Dataset.from_tensor_slices真正作用是切分傳入Tensor的第一個維度,生成相應的dataset,即第一維表明資料集中資料的數量,之後切分batch等操作都以第一維為基礎。http://www.cnblogs.com/hellcat/p/8569651.html
repeat的功能就是將整個序列重複多次,主要用來處理機器學習中的epoch,假設原先的資料是一個epoch,使用repeat(2)就可以將之變成2個epoch:
'''
d = tf.data.Dataset.from_tensor_slices(tf.constant(input_files))
d = d.repeat()
d = d.shuffle(buffer_size=len(input_files))
# `cycle_length` is the number of parallel files that get read.
cycle_length = min(num_cpu_threads, len(input_files))
# `sloppy` mode means that the interleaving is not exact. This adds
# even more randomness to the training pipeline.
d = d.apply(
tf.contrib.data.parallel_interleave(
tf.data.TFRecordDataset,
sloppy=is_training,
cycle_length=cycle_length))
d = d.shuffle(buffer_size=100)
else:
d = tf.data.TFRecordDataset(input_files)
# Since we evaluate for a fixed number of steps we don't want to encounter
# out-of-range exceptions.
d = d.repeat()
# We must `drop_remainder` on training because the TPU requires fixed
# size dimensions. For eval, we assume we are evaluating on the CPU or GPU
# and we *don't* want to drop the remainder, otherwise we wont cover
# every sample.
d = d.apply(
tf.contrib.data.map_and_batch(
lambda record: _decode_record(record, name_to_features), #map_func:將tensor的巢狀結構對映到另一個tensor巢狀結構的函式。
batch_size=batch_size,
num_parallel_batches=num_cpu_threads,
drop_remainder=True))
return d
return input_fn
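To make the behavior described in the comments above concrete, here is a minimal standalone sketch (TF 1.x, toy data of my own) of from_tensor_slices and repeat:

import numpy as np
import tensorflow as tf

# A (5, 2) matrix is sliced along its first dimension: the dataset has
# 5 elements, each a row of shape (2,).
matrix_ds = tf.data.Dataset.from_tensor_slices(np.random.uniform(size=(5, 2)))

# A dict is sliced field by field: each element looks like
# {"a": scalar, "b": vector of shape (2,)}.
dict_ds = tf.data.Dataset.from_tensor_slices({
    "a": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
    "b": np.random.uniform(size=(5, 2)),
})

# repeat(2) turns one epoch into two: 10 elements in total.
repeated = dict_ds.repeat(2)

next_element = repeated.make_one_shot_iterator().get_next()
with tf.Session() as sess:
    for _ in range(10):  # 5 elements x 2 epochs
        print(sess.run(next_element))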
tf.contrib.data.map_and_batch(
    map_func,
    batch_size,
    num_parallel_batches=None,
    drop_remainder=False,
    num_parallel_calls=None
)
Defined in tensorflow/contrib/data/python/ops/batching.py.

A fused implementation of map and batch.

It maps map_func across batch_size consecutive elements of the dataset and then combines them into a batch. Functionally it is equivalent to map followed by batch, but fusing the two transformations lets the implementation run more efficiently. Exposing this transformation in the API is temporary: once automatic input-pipeline optimization is implemented, the fusing of map and batch will happen automatically and this API will be deprecated.

Arguments:
map_func: a function that maps a nested structure of tensors to another nested structure of tensors.
batch_size: a tf.int64 scalar tf.Tensor, the number of consecutive elements of this dataset to combine in a single batch.
num_parallel_batches: (optional) a tf.int64 scalar tf.Tensor, the number of batches to create in parallel. On the one hand, higher values can help mitigate the effect of stragglers; on the other hand, higher values can increase contention if the CPU is otherwise idle.
drop_remainder: (optional) a tf.bool scalar tf.Tensor, indicating whether the last batch should be dropped if its size is smaller than desired; the default behavior is not to drop the smaller batch.
num_parallel_calls: (optional) a tf.int32 scalar tf.Tensor, the number of elements to process in parallel. If not specified, batch_size * num_parallel_batches elements will be processed in parallel.

Returns:
A Dataset transformation function that can be passed to tf.data.Dataset.apply.
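A tiny usage sketch (toy dataset of my own, not from the BERT code) showing the fused transformation in action:

import tensorflow as tf

d = tf.data.Dataset.range(10)
# Square each element and group 4 at a time in one fused step;
# drop_remainder=True discards the final short batch of 2 elements.
d = d.apply(
    tf.contrib.data.map_and_batch(
        map_func=lambda x: x * x,
        batch_size=4,
        num_parallel_batches=2,
        drop_remainder=True))

batch = d.make_one_shot_iterator().get_next()
with tf.Session() as sess:
    print(sess.run(batch))  # [ 0  1  4  9]
    print(sess.run(batch))  # [16 25 36 49]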
def _decode_record(record, name_to_features):
  """Decodes a record to a TensorFlow example."""
  example = tf.parse_single_example(record, name_to_features)

  # tf.Example only supports tf.int64, but the TPU only supports tf.int32.
  # So cast all int64 to int32.
  for name in list(example.keys()):
    t = example[name]
    if t.dtype == tf.int64:
      t = tf.to_int32(t)
    example[name] = t
  # print(example)
  return example
#print(example):
{'masked_lm_weights': <tf.Tensor 'ParseSingleExample/ParseSingleExample:4' shape=(20,) dtype=float32>, 'segment_ids': <tf.Tensor 'ToInt32:0' shape=(128,) dtype=int32>, 'masked_lm_positions': <tf.Tensor 'ToInt32_1:0' shape=(20,) dtype=int32>, 'masked_lm_ids': <tf.Tensor 'ToInt32_2:0' shape=(20,) dtype=int32>, 'next_sentence_labels': <tf.Tensor 'ToInt32_3:0' shape=(1,) dtype=int32>, 'input_ids': <tf.Tensor 'ToInt32_4:0' shape=(128,) dtype=int32>, 'input_mask': <tf.Tensor 'ToInt32_5:0' shape=(128,) dtype=int32>}
See also: 『TensorFlow』 data-reading class tf.data.Dataset
# Input: the last encoder layer of the BERT model. Output: the loss and
# probability matrix for the masked-word prediction task.
def get_masked_lm_output(bert_config, input_tensor, output_weights, positions,
                         label_ids, label_weights):
  """Get loss and log probs for the masked LM."""
  # input_tensor = model.get_sequence_output()
  # output_weights = model.get_embedding_table()
  # positions = masked_lm_positions
  # label_ids = masked_lm_ids
  # label_weights = masked_lm_weights
  # input_tensor here is the final-layer output of the model,
  # [batch_size, seq_length, hidden_size].
  # output_weights is the word-embedding table, [vocab_size, embedding_size].

  # Gather the encoder outputs at `positions` (the positions to be predicted).
  input_tensor = gather_indexes(input_tensor, positions)  # [batch_size*max_pred_per_seq, hidden_size]
  # print("input_tensor", input_tensor)  # shape=(640, 768): the 20 masked positions per sequence

  with tf.variable_scope("cls/predictions"):
    # We apply one more non-linear transformation before the output layer.
    # This matrix is not used after pre-training.
    with tf.variable_scope("transform"):
      input_tensor = tf.layers.dense(  # a fully connected layer; output shape [batch_size*max_pred_per_seq, hidden_size]
          input_tensor,
          units=bert_config.hidden_size,
          activation=modeling.get_activation(bert_config.hidden_act),
          kernel_initializer=modeling.create_initializer(
              bert_config.initializer_range))
      input_tensor = modeling.layer_norm(input_tensor)

    # The output weights are the same as the input embeddings, but there is
    # an output-only bias for each token.
    output_bias = tf.get_variable(
        "output_bias",
        shape=[bert_config.vocab_size],
        initializer=tf.zeros_initializer())
    # output_weights is the embedding table; it is transposed in the matmul.
    logits = tf.matmul(input_tensor, output_weights, transpose_b=True)  # [batch_size*max_pred_per_seq, vocab_size]
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    label_ids = tf.reshape(label_ids, [-1])
    label_weights = tf.reshape(label_weights, [-1])

    one_hot_labels = tf.one_hot(
        label_ids, depth=bert_config.vocab_size, dtype=tf.float32)
    # print(one_hot_labels)  # bert-master/run_pretraining.py:284

    # The `positions` tensor might be zero-padded (if the sequence is too
    # short to have the maximum number of predictions). The `label_weights`
    # tensor has a value of 1.0 for every real prediction and 0.0 for the
    # padding predictions.
    per_example_loss = -tf.reduce_sum(log_probs * one_hot_labels, axis=[-1])
    numerator = tf.reduce_sum(label_weights * per_example_loss)
    denominator = tf.reduce_sum(label_weights) + 1e-5
    loss = numerator / denominator

  return (loss, per_example_loss, log_probs)
Input: model.get_sequence_output(), the final-layer output of the model, [batch_size, seq_length, hidden_size] = [32, 128, 768].
Labels: label_ids.
output_weights is the embedding table.
The comparison with the labels: per_example_loss = -tf.reduce_sum(log_probs * one_hot_labels, axis=[-1]) — this is in effect the cross-entropy between the predictions and the labels.
one_hot_labels has shape (640, 30522).
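The role of label_weights in this loss is easy to see on a toy example of my own (hypothetical vocabulary of 4, three predictions, the last one padding):

import tensorflow as tf

# [3 predictions, vocab=4]; the third row is a padded (fake) prediction.
logits = tf.constant([[2.0, 0.5, 0.1, 0.1],
                      [0.1, 3.0, 0.2, 0.1],
                      [0.3, 0.2, 0.1, 0.4]])
log_probs = tf.nn.log_softmax(logits, axis=-1)
label_ids = tf.constant([0, 1, 2])
label_weights = tf.constant([1.0, 1.0, 0.0])  # 0.0 masks out the padding slot

one_hot_labels = tf.one_hot(label_ids, depth=4, dtype=tf.float32)
per_example_loss = -tf.reduce_sum(log_probs * one_hot_labels, axis=[-1])
# The padded prediction contributes to neither the numerator nor the denominator.
numerator = tf.reduce_sum(label_weights * per_example_loss)
denominator = tf.reduce_sum(label_weights) + 1e-5
loss = numerator / denominator

with tf.Session() as sess:
    print(sess.run(per_example_loss))  # three values; only the first two count
    print(sess.run(loss))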
Why, when masking, use 80% [MASK], 10% the correct word, and 10% a wrong word?
Why can't all of them be replaced with [MASK]? Does the 10% of wrong words have any effect?
Reference: "Google finally open-sources the BERT code: 300 million parameters," a full walkthrough by Synced (機器之心).
The core of BERT is its pre-training procedure, which is also the highlight of the paper. In brief, the model draws two sentences from the dataset, where sentence B has a 50% probability of being the actual next sentence after A, and converts the pair into the input representation shown earlier. We then randomly mask 15% of the tokens in the input sequence and ask the Transformer to perform two tasks: predict the masked tokens, and predict the probability that B is the sentence that follows A.
For the binary classification task, when a sequence (A+B) is sampled, B is the true next sentence of A with 50% probability. If it is, the pair is labeled "IsNext"; otherwise it is labeled "NotNext". These labels serve as the ground truth for the binary classification task.
For the mask prediction task, 15% of the tokens in the sequence are first randomly masked. Masking here is not simply replacing tokens with the "[MASK]" symbol, because that would create a mismatch between pre-training and fine-tuning. So after deciding which tokens to mask, Google replaces a token with "[MASK]" 80% of the time, with some other random token 10% of the time, and keeps the original token the remaining 10% of the time.
Original sentence: my dog is hairy
80%: my dog is [MASK]
10%: my dog is apple
10%: my dog is hairy
Note that keeping the original token in the last 10% of cases biases the representation toward the actually observed word, while replacing with a random token in the other 10% does not hurt the model's understanding of the language, since it affects only 1.5% of all tokens (0.1 × 0.15). The authors also note in the paper that because only 15% of tokens are predicted per pass, the model converges rather slowly.
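The 80/10/10 rule can be sketched in a few lines. This is my own simplification of the logic in create_pretraining_data.py (the function name and the plain random module are assumptions, not the original code):

import random

def replace_for_masking(original_token, vocab_words, rng=random):
    # Simplified 80/10/10 choice for a position already selected for prediction.
    r = rng.random()
    if r < 0.8:
        return "[MASK]"                                      # 80%: mask symbol
    elif r < 0.9:
        return original_token                                # 10%: keep original token
    else:
        return vocab_words[rng.randrange(len(vocab_words))]  # 10%: random vocab word

vocab_words = ["my", "dog", "is", "hairy", "apple"]
print([replace_for_masking("hairy", vocab_words) for _ in range(8)])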
Next sentence prediction
Takes the encoder output at the [CLS] position as input (model.get_pooled_output()) and outputs the loss and probability matrix for the next-sentence prediction task.
Labels: 0 means B really is the next sentence; 1 means B is a random sentence.
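For comparison with get_masked_lm_output above, here is the next-sentence head as I understand it from get_next_sentence_output in run_pretraining.py; treat it as a sketch rather than a verbatim copy:

def get_next_sentence_output(bert_config, input_tensor, labels):
  """Get loss and log probs for the next-sentence prediction."""
  # input_tensor = model.get_pooled_output(), the [CLS] encoding, [batch_size, hidden_size].
  # Simple binary classification: 0 = B is the actual next sentence, 1 = random.
  with tf.variable_scope("cls/seq_relationship"):
    output_weights = tf.get_variable(
        "output_weights",
        shape=[2, bert_config.hidden_size],
        initializer=modeling.create_initializer(bert_config.initializer_range))
    output_bias = tf.get_variable(
        "output_bias", shape=[2], initializer=tf.zeros_initializer())

    logits = tf.matmul(input_tensor, output_weights, transpose_b=True)  # [batch_size, 2]
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)
    labels = tf.reshape(labels, [-1])
    one_hot_labels = tf.one_hot(labels, depth=2, dtype=tf.float32)
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)
    return (loss, per_example_loss, log_probs)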
Open question: what exactly are the data at the Transformer's encoder input and decoder output?
Code
def gather_indexes(sequence_tensor, positions):
  """Gathers the vectors at the specific positions over a minibatch."""
  sequence_shape = modeling.get_shape_list(sequence_tensor, expected_rank=3)
  batch_size = sequence_shape[0]  # 32
  seq_length = sequence_shape[1]  # 128
  width = sequence_shape[2]       # 768

  # tf.range(start, limit, delta), e.g. tf.range(3, 18, 3) -> [3, 6, 9, 12, 15]
  flat_offsets = tf.reshape(
      tf.range(0, batch_size, dtype=tf.int32) * seq_length, [-1, 1])  # per-row offset
  # print(tf.Session().run(flat_offsets))
  # print(seq_length)  # 128
  # print('flat_offsets', flat_offsets)  # Tensor("Reshape:0", shape=(32, 1), dtype=int32)
  flat_positions = tf.reshape(positions + flat_offsets, [-1])  # add each row's offset (a multiple of 128) to its 20 positions
  # print(positions + flat_offsets)  # Tensor("add_1:0", shape=(32, 20), dtype=int32)
  # print(positions)                 # Tensor("IteratorGetNext:3", shape=(32, 20), dtype=int32)
  # print('flat_positions', flat_positions)  # Tensor("Reshape_1:0", shape=(640,), dtype=int32)
  flat_sequence_tensor = tf.reshape(sequence_tensor,
                                    [batch_size * seq_length, width])  # [32*128, 768]
  # print(sequence_tensor)  # shape=(32, 128, 768)
  # print(width)            # hidden size, 768
  # print('flat_sequence_tensor', flat_sequence_tensor)  # Tensor("Reshape_2:0", shape=(4096, 768), dtype=float32)

  # tf.gather collects slices from the params axis according to the indices.
  # indices must be an integer tensor of any dimension (usually 0-D or 1-D).
  # The output tensor has shape params.shape[:axis] + indices.shape + params.shape[axis + 1:].
  output_tensor = tf.gather(flat_sequence_tensor, flat_positions)
  # print('output_tensor', output_tensor)  # Tensor("GatherV2:0", shape=(640, 768), dtype=float32)
  # In essence this extracts the trained vectors at the 20 masked positions of each sequence: (32*20, 768).
  return output_tensor
For batch_size = 32 and seq_length = 128,
flat_offsets = tf.reshape(
    tf.range(0, batch_size, dtype=tf.int32) * seq_length, [-1, 1])
evaluates to:
[[ 0]
[ 128]
[ 256]
[ 384]
[ 512]
[ 640]
[ 768]
[ 896]
[1024]
[1152]
[1280]
[1408]
[1536]
[1664]
[1792]
[1920]
[2048]
[2176]
[2304]
[2432]
[2560]
[2688]
[2816]
[2944]
[3072]
[3200]
[3328]
[3456]
[3584]
[3712]
[3840]
[3968]]
positions = masked_lm_positions, shape = [32, 20] (20 predictions per sequence).
Adding flat_offsets gives, row by row:
[0 + [1, …, 20]]
[128 + [1, …, 20]]
[128*2 + [1, …, 20]]
.
.
[128*31 + [1, …, 20]]
sequence_tensor = model.get_sequence_output(), reshaped to [32*128, 768].
Comparing against sequence_tensor: the 20 masked elements of the first row correspond to the first row block of the [32*128, 768] tensor, the second row to the second row block, and so on.
import tensorflow as tf

temp = tf.range(0, 10) * 10 + tf.constant(1, shape=[10])
temp2 = tf.gather(temp, [1, 5, 9])
with tf.Session() as sess:
    print(sess.run(temp))
    print(sess.run(temp2))
[ 1 11 21 31 41 51 61 71 81 91]
[11 51 91]
The execution of get_masked_lm_output() runs as follows: the input is the final Transformer output; from it the vectors at the 20 masked positions of each sequence are gathered, giving an input tensor of shape (32*20, 768); this tensor is then multiplied by the (transposed) embedding table of the Transformer.
The result is log_probs of shape (640, 30522) = (32*20, 30522): each predicted token corresponds to a vector over the 30522-word vocabulary. Finally this is compared with the labels of shape (640,) and the loss is computed.
What is label_weights for?
INFO:tensorflow:masked_lm_ids: 1011 1011 2171 2003 6442 1010 6697 1998 2015 8835 1010 2909 25636 4308 1011 1997 2015 1011 13610 0
INFO:tensorflow:masked_lm_weights: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0
When fewer than 20 tokens are masked, the remaining slots are padded with 0, and label_weights is 0 at those padded positions. This keeps the padded predictions out of the loss and saves some computation, and limiting the predictions to real positions may help accuracy somewhat — though this last point is just my personal understanding.
1) tokens: the actual word pieces.
2) input_ids: the tokens converted into their indices in the vocabulary.
3) input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 — length 128. Positions holding tokens are 1; the sequence is padded with 0 up to 128.
4) segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 — the leading 0s are sentence A (which can be viewed as the question part), the middle 1s are sentence B (which can be viewed as the answer part), and the trailing 0s pad the sequence to 128.
5) masked_lm_positions: the positions of the masked tokens in the sentence.
6) masked_lm_ids: the vocabulary indices of the masked tokens.
7) masked_lm_weights: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 — the weight is 0.0 for the padded slots when fewer than 20 tokens are masked.
8) next_sentence_labels: whether the sentence pair is a genuine consecutive pair.
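To tie the fields together, here is a sketch of how one such training instance is serialized as a tf.train.Example (toy values; the token ids are illustrative, and the two helper functions mirror those in create_pretraining_data.py):

import tensorflow as tf

def create_int_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))

def create_float_feature(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))

features = {
    # e.g. [CLS] my dog [MASK] [SEP], padded to 128 (ids illustrative)
    "input_ids": create_int_feature([101, 2026, 3899, 103, 102] + [0] * 123),
    "input_mask": create_int_feature([1] * 5 + [0] * 123),
    "segment_ids": create_int_feature([0] * 128),
    "masked_lm_positions": create_int_feature([3] + [0] * 19),   # one real prediction, 19 padded
    "masked_lm_ids": create_int_feature([2606] + [0] * 19),      # toy id for the masked word
    "masked_lm_weights": create_float_feature([1.0] + [0.0] * 19),
    "next_sentence_labels": create_int_feature([0]),
}
example = tf.train.Example(features=tf.train.Features(feature=features))
# This serialized string is what _decode_record later parses back.
print(len(example.SerializeToString()))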