【Coursera GenAI with LLM】 Week 2 PEFT Class Notes

Posted by MiraMira on 2024-03-14

Original article: https://www.cnblogs.com/miramira/p/18071182

With PEFT, we train only a small portion of the model's parameters!

What uses memory while training a model?

Trainable weights
Optimizer states
Gradients
Forward Activations
Temporary memory
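
To get a feel for the first three items in that list, here is a rough back-of-the-envelope sketch. It assumes 32-bit floats (4 bytes each) and an Adam-style optimizer that keeps two extra values per trainable parameter; neither assumption comes from the notes, and `training_bytes` is just an illustrative helper. Activations and temporary memory depend on batch size and sequence length, so they are left out.

```python
# Rough memory estimate for weights, gradients and optimizer states.
# Assumptions: fp32 storage, Adam-style optimizer (two states per
# trainable parameter). Activations and temporary buffers are ignored.
BYTES_FP32 = 4

def training_bytes(total_params, trainable_params):
    """Estimate bytes needed for weights, gradients and optimizer states."""
    weights = total_params * BYTES_FP32            # every weight is stored
    gradients = trainable_params * BYTES_FP32      # one gradient per trainable weight
    optimizer = trainable_params * 2 * BYTES_FP32  # two Adam states per trainable weight
    return weights + gradients + optimizer

GB = 1e9
full = training_bytes(1_000_000_000, 1_000_000_000)  # full fine-tuning
peft = training_bytes(1_000_000_000, 10_000_000)     # only 1% of weights trainable
print(f"full fine-tuning: {full / GB:.2f} GB")
print(f"1% trainable:     {peft / GB:.2f} GB")
```

The weights still have to sit in memory either way; what shrinks is the gradient and optimizer-state portion, which scales with the number of trainable parameters.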

PEFT Trade-offs

Parameter Efficiency
Memory Efficiency
Model Performance
Training Speed
Inference Costs

PEFT Methods

Selective: fine-tune only a subset of the initial LLM parameters
Re-parameterize: re-parameterize the model weights using a low-rank representation, e.g. LoRA
Additive: add trainable layers or parameters to the model while keeping all of the original LLM weights frozen
1. Adapter methods: add new trainable layers to the model architecture, typically inside the encoder or decoder components after the attention or feed-forward layers.
2. Soft prompt methods: keep the model architecture fixed and frozen, and manipulate the input to achieve better performance.

Recap of how a Transformer works

1. The input prompt is turned into tokens.
2. The tokens are converted to embedding vectors and passed into the encoder and/or decoder parts of the transformer.
3. The encoder and decoder each contain two kinds of neural networks: self-attention and feed-forward networks.
4. The weights of these networks are learned during pre-training.
5. During full fine-tuning, every parameter in these layers is updated.

Or, instead of step 5, we can get LoRA going!

LoRA (Low-Rank Adaptation of Large Language Models): LoRA is a strategy that reduces the number of parameters trained during fine-tuning by freezing all of the original model parameters and injecting a pair of rank-decomposition matrices alongside the original weights. Only those two small matrices are trained, and the result is a LoRA fine-tuned LLM for a specific task.
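
A minimal sketch of that idea in plain Python, not a real training setup. The sizes `d`, `k`, `r` and the helpers `matmul`, `madd`, `lora_forward` are made up for illustration. It follows the usual LoRA formulation, where the frozen weight `W` (d x k) is used together with a trainable pair `B` (d x r) and `A` (r x k), and `B` starts at zero so the model initially behaves exactly like the original. The scaling factor that LoRA implementations normally apply to the update is left out here.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def madd(a, b):
    """Add two matrices of the same shape element by element."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

random.seed(0)
d, k, r = 4, 3, 2  # hypothetical sizes: W is d x k, the LoRA rank is r

W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen
B = [[0.0] * r for _ in range(d)]                                  # trainable, d x r, starts at zero
A = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # trainable, r x k

def lora_forward(x, W, B, A):
    """Return W x + B (A x) for a k x 1 column vector x."""
    return madd(matmul(W, x), matmul(B, matmul(A, x)))

x = [[1.0], [2.0], [3.0]]
base_out = matmul(W, x)
lora_out = lora_forward(x, W, B, A)
print(lora_out == base_out)  # True while B is still all zeros
```

During fine-tuning only `A` and `B` would receive gradient updates; `W` is never touched.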

Because so few parameters are trained, LoRA fine-tuning can often run on a single GPU instead of several.

You can swap the LoRA matrices in and out for different tasks. These matrices are typically very small compared with the original weights.
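
Swapping adapters can be sketched like this. The task names, the tiny 2 x 2 weight matrix and the rank-1 adapter values are all invented for illustration; the point is only that one frozen `W` is shared while each task keeps its own small `(B, A)` pair, which gets added to `W` when that task is needed.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def madd(a, b):
    """Add two matrices of the same shape element by element."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Shared frozen base weights (2 x 2), never modified.
W = [[1.0, 0.0],
     [0.0, 1.0]]

# One hypothetical rank-1 adapter per task: (B is 2 x 1, A is 1 x 2).
adapters = {
    "summarize": ([[1.0], [0.0]], [[0.0, 2.0]]),
    "translate": ([[0.0], [1.0]], [[3.0, 0.0]]),
}

def merged_weights(task):
    """Return W + B A for the chosen task without changing W itself."""
    B, A = adapters[task]
    return madd(W, matmul(B, A))

print(merged_weights("summarize"))
print(merged_weights("translate"))
```

Only the small `(B, A)` pairs need to be stored per task, rather than a full copy of the model for each one.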

Bigger matrices do not automatically mean better performance. Ranks in the range of 4-32 can give you a good trade-off between reducing trainable parameters and preserving performance.
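
To see how the rank drives the number of trainable parameters, here is a small calculation. The 512 x 64 weight matrix is an assumed example size, not something from these notes, and `lora_params` is an illustrative helper. A rank-r pair for a d x k matrix has d*r + r*k trainable values.

```python
def lora_params(d, k, r):
    """Trainable values in a rank-r pair: B is d x r and A is r x k."""
    return d * r + r * k

d, k = 512, 64            # assumed size of one weight matrix
full_params = d * k       # values updated by full fine-tuning of this matrix

for r in (4, 8, 16, 32):
    n = lora_params(d, k, r)
    print(f"rank {r:>2}: {n:>6} trainable values "
          f"({n / full_params:.1%} of {full_params})")
```

The count grows linearly with the rank, so going from rank 4 to rank 32 costs eight times as many trainable parameters.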

Prompt Tuning: unlike prompt engineering, you add extra trainable tokens (soft prompts) to your prompt and leave it to the supervised learning process to determine their optimal values.

Soft prompts: the weights of the model are frozen, but the embedding vectors of the soft prompt are updated over time to optimize the model's completion of the prompt.

The bigger the model, the more effective prompt tuning is.
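
A toy sketch of the mechanics, with made-up sizes and a placeholder gradient (a real setup would get gradients from backpropagation through the frozen model). `EMBED_DIM`, `NUM_VIRTUAL_TOKENS`, `build_input` and `update_soft_prompt` are illustrative names, not part of any library.

```python
import random

random.seed(0)
EMBED_DIM = 4            # hypothetical embedding size
NUM_VIRTUAL_TOKENS = 3   # hypothetical soft prompt length

def random_vectors(n):
    """Create n random embedding vectors of length EMBED_DIM."""
    return [[random.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(n)]

prompt_embeddings = random_vectors(5)               # from the frozen embedding layer
soft_prompt = random_vectors(NUM_VIRTUAL_TOKENS)    # the only trainable values

def build_input(soft_prompt, prompt_embeddings):
    """Prepend the soft prompt vectors to the real token embeddings."""
    return soft_prompt + prompt_embeddings

def update_soft_prompt(soft_prompt, grads, lr=0.1):
    """One gradient-descent step that touches only the soft prompt."""
    return [[v - lr * g for v, g in zip(vec, gvec)]
            for vec, gvec in zip(soft_prompt, grads)]

model_input = build_input(soft_prompt, prompt_embeddings)
print(len(model_input), "vectors go into the frozen model")

# Placeholder gradient of all ones, standing in for real backpropagation.
fake_grads = [[1.0] * EMBED_DIM for _ in range(NUM_VIRTUAL_TOKENS)]
new_soft_prompt = update_soft_prompt(soft_prompt, fake_grads)
print(NUM_VIRTUAL_TOKENS * EMBED_DIM, "trainable values in total")
```

The real token embeddings and every model weight stay as they were; only the few soft prompt vectors change from step to step.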
