Fixing the shape mismatch (mutually transposed weights) when loading GPT-2's TensorFlow-pretrained Linear weights into PyTorch Linear layers

Posted by 惋奈 on 2024-04-17

The error to be fixed:

RuntimeError: Error(s) in loading state_dict for PyTorchBasedGPT2:

size mismatch for transformer.h.0.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768])......

1. Cause of the Error

In PyTorch, a Linear layer's weight is stored with shape [out_features, in_features], whereas in TensorFlow the corresponding weight is stored as [in_features, out_features].

This comes from the two libraries using different mathematical conventions (see https://www.null123.com/question/detail-2816063.html):

PyTorch: y = Wx + B

TensorFlow: y = xW + B
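
To see the two layouts side by side, here is a minimal check (assuming a recent transformers version, where Conv1D, the class GPT-2 actually uses for these layers, lives in transformers.pytorch_utils):

import torch
from transformers.pytorch_utils import Conv1D

# PyTorch's Linear stores its weight as [out_features, in_features]
print(torch.nn.Linear(768, 2304).weight.shape)  # torch.Size([2304, 768])

# Hugging Face's Conv1D (used by GPT-2) stores it as [in_features, out_features],
# matching the TensorFlow convention of the original checkpoint
print(Conv1D(2304, 768).weight.shape)           # torch.Size([768, 2304])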

The following happens when a GPT-2 model implemented directly in PyTorch tries to load GPT-2's pretrained parameters:

PyTorchBasedGPT2.from_pretrained("openai-community/gpt2")
RuntimeError: Error(s) in loading state_dict for PyTorchBasedGPT2:
    size mismatch for transformer.h.0.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
    size mismatch for transformer.h.0.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for transformer.h.0.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    size mismatch for transformer.h.1.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
    size mismatch for transformer.h.1.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for transformer.h.1.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    size mismatch for transformer.h.2.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
    size mismatch for transformer.h.2.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for transformer.h.2.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    size mismatch for transformer.h.3.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
    size mismatch for transformer.h.3.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for transformer.h.3.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    size mismatch for transformer.h.4.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
    size mismatch for transformer.h.4.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for transformer.h.4.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    size mismatch for transformer.h.5.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
    size mismatch for transformer.h.5.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for transformer.h.5.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    size mismatch for transformer.h.6.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
    size mismatch for transformer.h.6.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for transformer.h.6.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    size mismatch for transformer.h.7.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
    size mismatch for transformer.h.7.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for transformer.h.7.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    size mismatch for transformer.h.8.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
    size mismatch for transformer.h.8.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for transformer.h.8.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    size mismatch for transformer.h.9.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
    size mismatch for transformer.h.9.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for transformer.h.9.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    size mismatch for transformer.h.10.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
    size mismatch for transformer.h.10.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for transformer.h.10.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    size mismatch for transformer.h.11.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
    size mismatch for transformer.h.11.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for transformer.h.11.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.

2. Solution

The fix is to transpose the original weights first and then load the model with Model.from_pretrained(). Note that the `ignore_mismatched_sizes=True` hint at the end of the error message is not a real fix: it simply skips the mismatched layers and leaves them randomly initialized.

1. Pull the model from Hugging Face; model_path is the Hugging Face repo name. (low_cpu_mem_usage=True requires the accelerate package to be installed.)

import torch
import transformers

model_path = "openai-community/gpt2"
model = transformers.AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, torch_dtype=torch.bfloat16)

2. Transpose the Linear weight matrices in the original checkpoint

If you are unsure how to access these matrices, print the model first and inspect its structure:

print(model)
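
If the full printout is hard to scan, a short loop (a minimal sketch) lists only the weight matrices in question, making the [in_features, out_features] layout directly visible:

for name, param in model.named_parameters():
    # c_attn and c_proj live under attn; c_fc and c_proj under mlp
    if name.endswith(".weight") and any(k in name for k in ("c_attn", "c_fc", "c_proj")):
        print(name, tuple(param.shape))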

Fetch the weights and transpose them. The matrices to transpose are c_attn and c_proj inside attn, and c_fc and c_proj inside mlp. (These layers look like convolutions, but the underlying implementation, Hugging Face's Conv1D, is effectively a Linear layer with a transposed weight.) Note that attn.c_proj does not appear in the error message only because its weight is square (768×768); it is stored transposed all the same and must be converted too.

for layer in model.transformer.h:
    # .contiguous() returns a new tensor with the same data but a contiguous memory layout
    layer.attn.c_attn.weight = torch.nn.Parameter(layer.attn.c_attn.weight.transpose(0, 1).contiguous())
    layer.attn.c_proj.weight = torch.nn.Parameter(layer.attn.c_proj.weight.transpose(0, 1).contiguous())
    layer.mlp.c_fc.weight = torch.nn.Parameter(layer.mlp.c_fc.weight.transpose(0, 1).contiguous())
    layer.mlp.c_proj.weight = torch.nn.Parameter(layer.mlp.c_proj.weight.transpose(0, 1).contiguous())
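
A quick sanity check after the loop (assuming the standard GPT-2 small dimensions) confirms the weights are now in PyTorch's [out_features, in_features] layout:

w = model.transformer.h[0].attn.c_attn.weight
assert w.shape == torch.Size([2304, 768]) and w.is_contiguous()

Keep in mind that after this in-place transposition the Hugging Face model itself can no longer run a correct forward pass (its Conv1D layers still expect the [in_features, out_features] layout); from here on the object only serves as a container for exporting the converted checkpoint.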

3. Finally, save the model to the target path

output_dir = "new_gpt2"
model.save_pretrained(output_dir)
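
If the downstream model also needs the tokenizer files in the same directory, they can be copied over as well (optional, using the standard AutoTokenizer API):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path)  # "openai-community/gpt2"
tokenizer.save_pretrained(output_dir)                  # "new_gpt2"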

With that, the PyTorch implementation of the GPT-2-style model can load its parameters from the saved path without errors:

model = PyTorchBasedGPT2.from_pretrained("new_gpt2")
print(model)

This prints the loaded model, confirming that the weights now have the expected shapes.
