前幾篇博文中介紹了Transformer,由於其優越的效能表現,在工業界使用的越來越廣泛,同時,配合遷移學習理論,越來越多的Transformer預訓練模型和原始碼庫逐漸開源,Huggingface就是其中做的最為出色的一家機構。Huggingface是一家在NLP社群做出傑出貢獻的紐約創業公司,其所提供的大量預訓練模型和程式碼等資源被廣泛的應用於學術研究當中。Huggingface所開源的Transformers提供了數以千計針對於各種任務的預訓練模型模型,開發者可以根據自身的需要,選擇模型進行訓練或微調,也可閱讀api文件和原始碼, 快速開發新模型。
本篇博文,我們對Huggingface所開源的Transformers進行介紹。在此之前,請透過下行命令安裝transformers庫:
pip install transformers
1 從AutoClass說起¶
transformers庫中提供了上百個演算法模型的實現,有BERT模型對應的BertModel類,有BART對應的BartModel類……,每當我們使用對應的預訓練模型時,都必須先找到對應類名,然後進行例項化,麻煩嗎?非常麻煩!
所以,transformers庫中提供統一的入口,也就是我們這裡說到的“AutoClass”系列的高階物件,透過在呼叫“AutoClass”的from_pretrained()方法時指定預訓練模型的名稱或預訓練模型所在目錄,即可快速、便捷得完成預訓練模型建立。有了“AutoClass”,只需要知道預訓練模型的名稱,或者將預訓練模型下載好,程式將根據預訓練模型配置檔案中model_type或者預訓練模型名稱、路徑進行模式匹配,自動決定例項化哪一個模型類,不再需要再到該模型在transfors庫中對應的類名。“AutoClass”所有類都不能夠透過init()方法進行例項化,只能透過from_pretrained()方法例項化指定的類。
如下所示,我們到Huggingface官網下載好一箇中文BERT預訓練模型,模型所有檔案存放在當前目錄下的“model/bert-base-chinese”路徑下。建立預訓練模型時,我們將這一路徑傳遞到from_pretrained()方法,即可完成模型建立,建立好的模型為BertModel類的例項。
from transformers import AutoModel
model = AutoModel.from_pretrained("./models/bert-base-chinese")
print(type(model))
Some weights of the model checkpoint at ./models/bert-base-chinese were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight'] - This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
<class 'transformers.models.bert.modeling_bert.BertModel'>
可以看到,。這一過程中,之所以會有提示資訊顯示,是因為有些權重引數並未使用上,這是正常的。
之所以說“AutoClass”是一個系列,是因為“AutoClass”不僅包括便捷建立預訓練模型的類物件AutoModel,還包括預訓練模型對應的分詞器類物件AutoTokenizer,預訓練模型配置管理類AutoTokenizer,以及其他各種特色功能的類物件,這裡不一一列舉,看一參考Huggingface官方文件對“AutoClass”的說明。
2 詞的向量表示——AutoTokenizer¶
幾乎所有的自然語言處理任務,都是從分詞和詞的向量表示開始的,Transformer演算法模型也不例外,所以,在Huggingface的transformers庫中提供了高階API物件——AutoTokenizer,用以載入預訓練的分詞器實現這一過程。
AutoTokenizer是Huggingface提供的“AutoClass”系列的高階物件,可以便捷的呼叫tokenizers庫(Huggingface提供的專門用於分詞等操作的程式碼庫)實現載入預訓練的分詞器。
透過在AutoTokenizer中定義的from_pretrained方法指定需要載入的分詞器名稱,即可從網路上自動載入分詞器,並例項化tokenizers庫中分詞器。tokenizers中定義的分詞器物件提供非常豐富的功能,例如定義詞庫、載入詞庫、截斷、填充、指定特殊標記等。
`這裡需要注意,大多數情況下,我們都是同時使用預定義的分詞器和預訓練模型,或者說是配套使用的,例如,我們使用的預訓練模型是“bert-base-chinese”,那麼,載入分詞器是,也必須使用“bert-base-chinese”對應的詞庫,否則,使用預訓練模型就效果將大大降低。`
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
sentence = "床前明月光"
tokenizer(sentence)
{'input_ids': [101, 2414, 1184, 3209, 3299, 1045, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
- 指定目錄載入分詞器
當然,有時候因為網路原因,也可以先手動從Huggingface官網下載模型,然後在from_pretrained方法中指定本地目錄方式進行載入。
tokenizer = AutoTokenizer.from_pretrained("./models/bert-base-chinese")
sentence = "床前明月光"
tokenizer(sentence)
{'input_ids': [101, 2414, 1184, 3209, 3299, 1045, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
- 同時轉化多個句子
sentence = ["床前明月光", "床前明月光,疑是地上霜。"]
tokenizer(sentence)
{'input_ids': [[101, 2414, 1184, 3209, 3299, 1045, 102], [101, 2414, 1184, 3209, 3299, 1045, 8024, 4542, 3221, 1765, 677, 7458, 511, 102]], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}
- 其他引數功能
tokenizer內部還提供其他豐富的引數用於實現多種多樣功能:
tokenizer(
["床前明月光", "床前明月光,疑是地上霜。"],
padding=True, # 長度不足max_length時是否進行填充
truncation=True, # 長度超過max_length時是否進行截斷
max_length=10,
return_tensors="pt", # 指定返回資料型別,pt:pytorch的張量,tf:TensorFlow的張量
)
{'input_ids': tensor([[ 101, 2414, 1184, 3209, 3299, 1045, 102, 0, 0, 0], [ 101, 2414, 1184, 3209, 3299, 1045, 8024, 4542, 3221, 102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
3 配置——AutoConfig¶
每一類的演算法模型的框架結構都是不一樣的,所以超引數配置也不一樣,如果每次載入預訓練模型,都要使用者手動去找到對應的配置項、配置類,就是非不便捷了,所以,在“AutoClass”中也提供有專門的配置管理入口——AutoConfig。
一般來說,就算同一個演算法的預訓練模型,也可能有不同的網路結構,所以,我們下載的預訓練模型本身就提供有一個配置檔案,例如在Huggingface官網下載的預訓練模型,提供有一個config.json
檔案,AutoConfig將從裡面載入當前預訓練模型的特定配置項資訊進行覆蓋。
以BERT模型為例,我們下來看看預設的配置項:
from transformers import BertConfig
config = BertConfig()
config
BertConfig { "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.24.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
from transformers import AutoConfig
config = AutoConfig.from_pretrained("./models/bert-base-chinese")
config
BertConfig { "_name_or_path": "./models/bert-base-chinese", "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "directionality": "bidi", "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "lstm_dropout_prob": 0.5, "lstm_embedding_size": 768, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "pooler_fc_size": 768, "pooler_num_attention_heads": 12, "pooler_num_fc_layers": 3, "pooler_size_per_head": 128, "pooler_type": "first_token_transform", "position_embedding_type": "absolute", "transformers_version": "4.24.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 21128 }
可以看到,從預訓練模型載入出來的配置項與之前的預設配置項略有不同。而且,這個配置例項就是BertConfig類的例項,如下所示:
type(config)
transformers.models.bert.configuration_bert.BertConfig
透過config例項,我們可以對配置項進行修改,例如,上述配置中,編碼器結構為12層編碼器層,我們將其修改為5層,如下所示,經過修改後,最終建立的模型編碼器只包含5層結構,也只有前5層會載入預訓練結構,其他權重將會被捨棄。
config.num_hidden_layers=5
print(config)
BertConfig { "_name_or_path": "./models/bert-base-chinese", "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "directionality": "bidi", "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "lstm_dropout_prob": 0.5, "lstm_embedding_size": 768, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 5, "pad_token_id": 0, "pooler_fc_size": 768, "pooler_num_attention_heads": 12, "pooler_num_fc_layers": 3, "pooler_size_per_head": 128, "pooler_type": "first_token_transform", "position_embedding_type": "absolute", "transformers_version": "4.24.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 21128 }
修改之後的引數,如果後續需要再次使用,可以儲存到本地,傳入儲存路徑,將在指定目錄儲存為config.json
檔案:
config.save_pretrained("./models/bert-base-chinese")
4 建立預訓練模型——AutoModel¶
Huggingface官方提供了很多的預訓練模型,可以在Huggingface官網很容易找到。透過AutoModel類,建立預訓練模型最簡單的方法就是直接傳入預訓練模型名稱或者本地路徑,因為國內網路環境原因,建議先去將預訓練模型下載到本地,透過指定目錄的方式進行載入:
from transformers import AutoModel
model = AutoModel.from_pretrained("./models/bert-base-chinese")
Some weights of the model checkpoint at ./models/bert-base-chinese were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight'] - This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
透過這種方法,模型將直接載入預訓練模型config.json
的配置項。也可以在載入模型時,指定配置類例項,這樣就可以實現對預訓練模型的自定義,例如,傳入我們上一小節中修改後的config例項:
model = AutoModel.from_pretrained("./models/bert-base-chinese", config=config)
Some weights of the model checkpoint at ./models/bert-base-chinese were not used when initializing BertModel: ['cls.seq_relationship.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.self.key.weight', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.9.attention.output.dense.weight', 'cls.predictions.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.output.LayerNorm.weight', 'cls.predictions.decoder.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.8.attention.self.value.weight', 'cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.7.output.dense.bias', 'cls.seq_relationship.bias', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.5.output.LayerNorm.bias'] - This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
因為在上一章節,我們將編碼器結構層數改為5,所以,這裡提示很多權重引數並未使用。
同時,我們也可以透過在from_pretrained()方法中直接傳參的方式,傳入配置項,例如,我們將編碼器層數改為3層。注意,這種方式在指定了config引數時不在生效。
model = AutoModel.from_pretrained("./models/bert-base-chinese", num_hidden_layers=3)
Some weights of the model checkpoint at ./models/bert-base-chinese were not used when initializing BertModel: ['bert.encoder.layer.4.attention.self.value.bias', 'cls.seq_relationship.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.9.attention.output.dense.weight', 'cls.predictions.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.output.LayerNorm.weight', 'cls.predictions.decoder.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.8.attention.self.value.weight', 'cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'cls.seq_relationship.bias', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.self.value.weight'] - This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
我們嘗試一下將tokenizer編碼後的張量在model中進行前向傳播:
tens = model(**tokenizer("床前明月光,疑是地上霜。", return_tensors="pt"))
tens.last_hidden_state.shape
torch.Size([1, 14, 768])
當模型修改或者重新訓練後,可以透過model.save_pretrained()方法再次儲存,儲存後,在指定目錄中將生成兩個檔案:配置檔案(config.json),權重檔案(pytorch_model.bin)。
model.save_pretrained("./new_model/bert-base-chinese")
5 使用現成的任務模型¶
在transformers庫中,Huggingface還提供有許多完整網路模型用於各式各樣的AI任務,例如影像分類、文字分類、語音分類、翻譯、問答等,這類API大多以AutoModelFor*開頭,我們列印輸出看看:
import transformers
for api in dir(transformers):
if api.startswith('AutoModelFor'):
print(api)
AutoModelForAudioClassification AutoModelForAudioFrameClassification AutoModelForAudioXVector AutoModelForCTC AutoModelForCausalLM AutoModelForDepthEstimation AutoModelForDocumentQuestionAnswering AutoModelForImageClassification AutoModelForImageSegmentation AutoModelForInstanceSegmentation AutoModelForMaskedImageModeling AutoModelForMaskedLM AutoModelForMultipleChoice AutoModelForNextSentencePrediction AutoModelForObjectDetection AutoModelForPreTraining AutoModelForQuestionAnswering AutoModelForSemanticSegmentation AutoModelForSeq2SeqLM AutoModelForSequenceClassification AutoModelForSpeechSeq2Seq AutoModelForTableQuestionAnswering AutoModelForTokenClassification AutoModelForVideoClassification AutoModelForVision2Seq AutoModelForVisualQuestionAnswering AutoModelForZeroShotObjectDetection
以其中的AutoModelForSequenceClassification為例,介紹怎麼使用:
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("./models/bert-base-chinese", num_labels=2)
Some weights of the model checkpoint at ./models/bert-base-chinese were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight'] - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). Some weights of BertForSequenceClassification were not initialized from the model checkpoint at ./models/bert-base-chinese and are newly initialized: ['classifier.weight', 'classifier.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
num_labels意思是,我們需要進行的任務最終標籤有兩類,即這是一個二分類模型。我們檢視一下模型結構:
model
BertForSequenceClassification( (bert): BertModel( (embeddings): BertEmbeddings( (word_embeddings): Embedding(21128, 768, padding_idx=0) (position_embeddings): Embedding(512, 768) (token_type_embeddings): Embedding(2, 768) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (encoder): BertEncoder( (layer): ModuleList( (0): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (1): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (2): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (3): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (4): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (5): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (6): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (7): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (8): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (9): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (10): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (11): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) (pooler): BertPooler( (dense): Linear(in_features=768, out_features=768, bias=True) (activation): Tanh() ) ) (dropout): Dropout(p=0.1, inplace=False) (classifier): Linear(in_features=768, out_features=2, bias=True) )
tokenizer("床前明月光,疑是地上霜。", return_tensors="pt")
{'input_ids': tensor([[ 101, 2414, 1184, 3209, 3299, 1045, 8024, 4542, 3221, 1765, 677, 7458, 511, 102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
tens = model(**tokenizer("床前明月光,疑是地上霜。", return_tensors="pt"))
tens
SequenceClassifierOutput(loss=None, logits=tensor([[-0.4371, -0.1223]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
最後輸出了兩個值,分貝對應於兩個類別。
6 自定義模型¶
from chb import *
import pandas as pd
from tqdm import tqdm
from collections import defaultdict
import torch
from torch import nn
from torch.utils.data.dataloader import DataLoader
from torch.utils.data.dataset import Dataset
from torch.optim import AdamW
from sklearn.model_selection import train_test_split
from transformers import AutoConfig,AutoModel,AutoTokenizer,get_linear_schedule_with_warmup,logging
import warnings
warnings.filterwarnings('ignore')
RANDOM_SEED = 1000
MAX_LEN = 50
BATCH_SIZE = 64
6.1 自定義資料集¶
x_lst = []
y_lst = []
with open('./data/中文文字-新聞分類資料集','r') as f:
# 獲得訓練資料的總行數
for _ in tqdm(f,desc='load dataset'):
try:
line = f.readline().replace('\u3000\u3000', '').replace('\n', '')
x, y = line.split('\t')
if y == 'label':
continue
x_lst.append(x)
y_lst.append(y)
except:
pass
load dataset: 5902it [00:00, 55378.83it/s]
len(x_lst), len(y_lst)
(5900, 5900)
x_lst[0], y_lst[0]
('昌平京基鷺府10月29日推別墅1200萬套起享97折新浪房產訊(編輯郭彪)京基鷺府(論壇相簿戶型樣板間點評地圖搜尋)售樓處位於昌平區京承高速北七家出口向西南公里路南。專案預計10月29日開盤,總價1200萬元/套起,2012年年底入住。待售戶型為聯排戶型面積為410-522平方米,獨棟戶型面積為938平方米,雙拼戶型面積為522平方米。京基鷺府專案位於昌平定泗路與東北路交界處。專案周邊配套齊全,幼兒園:伊頓雙語幼兒園、溫莎雙語幼兒園;中學:北師大亞太實驗學校、潞河中學(北京市重點);大學:王府語言學校、北京郵電大學、現代音樂學院;醫院:王府中西醫結合醫院(三級甲等)、潞河醫院、解放軍263醫院、安貞醫院昌平分院;購物:龍德廣場、中聯萬家商廈、世紀華聯超市、瑰寶購物中心、家樂福超市;酒店:拉斐特城堡、鮑魚島;休閒娛樂設施:九華山莊、溫都溫泉度假村、小湯山療養院、龍脈溫泉度假村、小湯山文化廣場、皇港高爾夫、高地高爾夫、北鴻高爾夫球場;銀行:工商銀行、建設銀行、中國銀行、北京農村商業銀行;郵局:中國郵政儲蓄;其它:北七家建材城、百安居建材超市、北七家鎮武裝部、北京宏翔鴻企業孵化基地等,享受便捷生活。京基鷺府坐守定泗路,立湯路交匯處。連線京昌、八達嶺、機場高速,南至5環,北上6環,緊鄰立湯路,一路向南,直抵鳥巢、水立方、長安街,距北京唯一不堵車的京承高速出口僅1公里,節約出行時間成本,形成了三橫、三縱的立體式交通網路專案周邊多為別墅專案,人口密度低,交通出行舒適度高。>>報名參加“樂動銀十”10月22日大型抄底看房團以上資訊僅供參考,最終以開發商公佈為準。訂閱會員置業刊我們將直接把最新的熱盤動向傳送到您的郵箱更多熱盤推薦:新銳白領淘低價1-2居網羅2萬內軌道精裝房手握20萬咋買板樓2居網羅城南沿軌優質盤關注娃娃教育網羅學區房不足百萬元住上通透2居', '房產')
label2id = dict()
id2label = dict()
for i, label in enumerate(set(y_lst)):
label2id[label] = i
id2label[i] = label
tokenizer = AutoTokenizer.from_pretrained("./models/bert-base-chinese")
先把所有的文字都轉化為編碼,而不是在後續資料集中轉化,這樣可以避免在後續訓練過程中,每一個epoch都要進行轉化,提升效率:
token_lens = []
for txt in tqdm(x_lst):
tokens = tokenizer.encode(txt, max_length=512)
token_lens.append(len(tokens))
0%| | 0/5900 [00:00<?, ?it/s]Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`. 100%|██████████| 5900/5900 [00:07<00:00, 739.64it/s]
class NewsDataset(Dataset):
def __init__(self,x_lst,y_lst,tokenizer,max_len):
self.x_lst=x_lst
self.y_lst=y_lst
self.tokenizer=tokenizer
self.max_len=max_len
def __len__(self):
return len(self.x_lst)
def __getitem__(self,index):
"""
index 為資料索引,迭代取第index條資料
"""
text=str(self.x_lst[index])
label=label2id[self.y_lst[index]]
encoding=self.tokenizer.encode_plus(
text,
add_special_tokens=True,
max_length=self.max_len,
return_token_type_ids=True,
pad_to_max_length=True,
return_attention_mask=True,
return_tensors='pt',
)
return {
'texts':text,
'input_ids':encoding['input_ids'].flatten(),
'attention_mask':encoding['attention_mask'].flatten(),
'labels':torch.tensor(label,dtype=torch.long)
}
x_train, x_val, y_train, y_val = train_test_split(x_lst, y_lst, test_size=0.15, random_state=RANDOM_SEED) # 劃分訓練集 測試集
# dataset
train_dataset = NewsDataset(x_train, y_train, tokenizer, MAX_LEN)
val_dataset = NewsDataset(x_val, y_val, tokenizer, MAX_LEN)
# dataloader
train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE)
val_dataloader = DataLoader(val_dataset, batch_size=BATCH_SIZE)
6.2 自定義網路¶
這裡我們使用BERT預訓練模型,同時接Dropout層和一層線形層,構成自定義網路:
class CustomBERTModel(nn.Module):
def __init__(self, n_classes):
super(CustomBERTModel, self).__init__()
self.bert = AutoModel.from_pretrained("./models/bert-base-chinese")
self.drop = nn.Dropout(p=0.3)
self.out = nn.Linear(self.bert.config.hidden_size, n_classes)
def forward(self, input_ids, attention_mask):
_, pooled_output = self.bert(
input_ids=input_ids,
attention_mask=attention_mask,
return_dict = False
)
output = self.drop(pooled_output) # dropout
return self.out(output)
device = set_device(cuda_index=1)
2022-12-20 16:12:39 set_device line 11 out: cuda:1
n_classes = len(label2id)
model = CustomBERTModel(n_classes)
model = model.to(device)
Some weights of the model checkpoint at ./models/bert-base-chinese were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight'] - This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
自定義資料集:
6.3 訓練¶
def train_epoch(model, data_loader,loss_fn,optimizer,device,scheduler, n_examples):
model.train()
losses = []
correct_predictions = 0
for i, d in bar(data_loader):
input_ids = d["input_ids"].to(device)
attention_mask = d["attention_mask"].to(device)
targets = d["labels"].to(device)
outputs = model(
input_ids=input_ids,
attention_mask=attention_mask
)
_, preds = torch.max(outputs, dim=1)
loss = loss_fn(outputs, targets)
correct_predictions += torch.sum(preds == targets)
losses.append(loss.item())
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()
optimizer.zero_grad()
return correct_predictions.double() / n_examples, np.mean(losses)
def eval_model(model, data_loader, loss_fn, device, n_examples):
model.eval() # 驗證預測模式
losses = []
correct_predictions = 0
with torch.no_grad():
for d in data_loader:
input_ids = d["input_ids"].to(device)
attention_mask = d["attention_mask"].to(device)
targets = d["labels"].to(device)
outputs = model(
input_ids=input_ids,
attention_mask=attention_mask
)
_, preds = torch.max(outputs, dim=1)
loss = loss_fn(outputs, targets)
correct_predictions += torch.sum(preds == targets)
losses.append(loss.item())
return correct_predictions.double() / n_examples, np.mean(losses)
EPOCHS = 5 # 訓練輪數
optimizer = AdamW(model.parameters(), lr=2e-5)
total_steps = len(train_dataloader) * EPOCHS
scheduler = get_linear_schedule_with_warmup(
optimizer,
num_warmup_steps=0,
num_training_steps=total_steps
)
loss_fn = nn.CrossEntropyLoss().to(device)
best_accuracy = 0
is_best = False
t = Tableprint(['epoch', 'train_accuracy', 'train_loss', 'test_accuracy', 'test_loss', 'is_best'])
t.print_header()
for epoch in range(EPOCHS):
train_acc, train_loss = train_epoch(model,train_dataloader,loss_fn,optimizer,device,scheduler,len(x_train))
val_acc, val_loss = eval_model(
model,
val_dataloader,
loss_fn,
device,
len(x_val)
)
if val_acc > best_accuracy:
is_best = True
torch.save(model.state_dict(), './models/news_classification/best_model_state.bin')
best_accuracy = val_acc
else:
is_best = False
t.print_row(epoch, f"{train_acc:.4f}", f"{train_loss:.4f}", f"{val_acc:.4f}", f"{val_loss:.4f}", is_best)
+======+===========+====================+================+===================+===============+=============+ | | epoch | train_accuracy | train_loss | test_accuracy | test_loss | is_best | +======+===========+====================+================+===================+===============+=============+ | 1 | 0 | 0.6080 | 1.4608 | 0.8893 | 0.5278 | True | +------+-----------+--------------------+----------------+-------------------+---------------+-------------+ | 2 | 1 | 0.9196 | 0.3766 | 0.9096 | 0.3583 | True | +------+-----------+--------------------+----------------+-------------------+---------------+-------------+ | 3 | 2 | 0.9589 | 0.2015 | 0.9153 | 0.3413 | True | +------+-----------+--------------------+----------------+-------------------+---------------+-------------+ | 4 | 3 | 0.9765 | 0.1272 | 0.9153 | 0.3286 | False | +------+-----------+--------------------+----------------+-------------------+---------------+-------------+ | 5 | 4 | 0.9836 | 0.0919 | 0.9220 | 0.3239 | True | +------+-----------+--------------------+----------------+-------------------+---------------+-------------+
使用BERT預訓練模型+自定義網路,模型初始時就具有了較高的準確率。