【Coursera GenAI with LLM】 Week 3 Reinforcement Learning from Human Feedback Class Notes

Posted by MiraMira on 2024-03-15

Helpful? Honest? Harmless? Make sure the AI responds in those three ways (the HHH criteria).

If it doesn't, we can use RLHF to reduce the toxicity of the LLM.

Reinforcement learning is a type of machine learning in which an agent learns to make decisions related to a specific goal by taking actions in an environment, with the objective of maximizing some notion of cumulative reward. RLHF can also help build personalized LLMs.
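To make the agent-environment loop concrete, here is a toy multi-armed-bandit sketch in Python (the actions, reward function, and epsilon-greedy rule are illustrative assumptions, not course material): the agent repeatedly picks an action, observes a reward, and updates its value estimates so that cumulative reward grows over time.

```python
# Toy illustration (not from the course): an agent picks actions to maximize
# cumulative reward, updating its value estimates from observed rewards.
import random

actions = ["a", "b", "c"]                    # hypothetical action space
values = {a: 0.0 for a in actions}           # running estimate of each action's value
counts = {a: 0 for a in actions}
epsilon = 0.1                                # exploration rate

def get_reward(action):
    """Hypothetical environment: action 'b' secretly yields the highest reward."""
    means = {"a": 0.1, "b": 1.0, "c": 0.3}
    return random.gauss(means[action], 0.5)

total_reward = 0.0
for step in range(1000):
    # epsilon-greedy: mostly exploit the best-known action, sometimes explore
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: values[a])
    r = get_reward(action)
    total_reward += r
    counts[action] += 1
    # incremental average: move the estimate toward the observed reward
    values[action] += (r - values[action]) / counts[action]

print(values, total_reward)
```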

RLHF cycle: iterate until the reward score is high enough:

  1. Select an instruct model and define your model alignment criterion (e.g., helpfulness)
  2. Obtain human feedback through a labeler workforce to rank the completions
  3. Convert rankings into pairwise training data for the reward model
  4. Train reward model to predict preferred completion from {y_j, y_k} for prompt x
  5. Use the reward model as a binary classifier to automatically provide a reward value for each prompt-completion pair (see the sketch after this list)
    The lower the reward score, the worse the completion satisfies the alignment criterion
    softmax(logits) = probabilities: the reward model's logits are converted into preference probabilities via softmax
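A minimal sketch of how such a reward model can be trained on the pairwise data from step 3, assuming a toy setup with pre-computed embeddings instead of a full LLM (the class, shapes, and optimizer settings below are assumptions for illustration):

```python
# Toy sketch (assumed setup, not the course's code): a reward model scores a
# (prompt, completion) pair; softmax over the scores of the preferred (y_j) and
# rejected (y_k) completions gives the probability that y_j wins.
import torch
import torch.nn as nn

class ToyRewardModel(nn.Module):
    """Stand-in for an LLM with a scalar reward head; inputs are pre-computed embeddings."""
    def __init__(self, dim=16):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, emb):                    # emb: (batch, dim)
        return self.head(emb).squeeze(-1)      # one scalar reward (logit) per pair

reward_model = ToyRewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Fake batch: embeddings of the preferred and rejected completions for the same prompts
emb_preferred = torch.randn(8, 16)
emb_rejected = torch.randn(8, 16)

r_j = reward_model(emb_preferred)
r_k = reward_model(emb_rejected)

# softmax(logits) = probabilities: probability that the preferred completion wins;
# minimizing -log of that probability trains the model to rank y_j above y_k
logits = torch.stack([r_j, r_k], dim=-1)       # (batch, 2)
probs = torch.softmax(logits, dim=-1)
loss = -torch.log(probs[:, 0]).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Note how minimizing the negative log of the softmax probability of the preferred completion is exactly the "binary classifier" view described in step 5.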

RL Algorithm

  • The RL algorithm updates the weights of the LLM based on the reward assigned to the completions generated by the current version of the LLM
  • ex. Q-Learning, PPO (Proximal Policy Optimization, the most popular method)
  • PPO optimizes the LLM to be more aligned with human preferences (a sketch of its clipped objective follows below)
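A minimal sketch of PPO's clipped surrogate objective, the piece that keeps each policy update close to the previous policy (the tensors below are toy values; in a real RLHF loop the log-probs and advantages come from the LLM and the reward signal):

```python
# Toy sketch of PPO's clipped surrogate objective (illustrative tensors only;
# a real RLHF trainer computes per-token log-probs and advantages from the LLM).
import torch

def ppo_policy_loss(logprobs_new, logprobs_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss: large policy jumps away from the old policy are clipped."""
    ratio = torch.exp(logprobs_new - logprobs_old)                 # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                   # maximize -> minimize negative

logprobs_new = torch.randn(8, requires_grad=True)    # current policy's token log-probs
logprobs_old = torch.randn(8)                        # frozen copy from before the update
advantages = torch.randn(8)                          # how much better than expected each token was
loss = ppo_policy_loss(logprobs_new, logprobs_old, advantages)
loss.backward()
print(loss.item())
```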

Reward hacking: the model achieves a high reward score but doesn't actually align with the criterion, so the quality is not improved (e.g., it pads completions with exaggerated wording that the reward model happens to score well)

  • To avoid this, we can use the initial instruct model as a frozen reference model: during training, we pass the prompt dataset to both the reference model and the RL-updated LLM

  • Then, we calculate a KL divergence shift penalty between the two models' output distributions (KL divergence is a statistical measure of how different two probability distributions are)

  • Add the penalty to the reward, then continue the PPO update as usual; with PEFT, only the adapter weights are updated, so the reference model and the RL-updated model can share the same frozen base weights (see the sketch after this list)
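A minimal sketch of the KL shift penalty being folded into the reward, assuming we have per-token log-probabilities of the sampled completion under both the RL-updated model and the reference model (the function name, shapes, and kl_coef value are illustrative):

```python
# Toy sketch of the KL shift penalty being subtracted from the reward
# (names, shapes, and kl_coef are illustrative; libraries such as TRL apply this internally).
import torch

def kl_penalized_reward(reward, logprobs_rl, logprobs_ref, kl_coef=0.1):
    """Penalize the RL-updated model for drifting away from the frozen reference model."""
    kl_estimate = logprobs_rl - logprobs_ref   # per-token KL estimate along the sampled completion
    return reward - kl_coef * kl_estimate.sum()

# log-probs of the sampled completion's tokens under each model, plus the reward model's score
logprobs_rl = torch.tensor([-1.2, -0.8, -2.0])
logprobs_ref = torch.tensor([-1.5, -0.9, -1.8])
print(kl_penalized_reward(torch.tensor(0.7), logprobs_rl, logprobs_ref))
```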

Constitutional AI

  • First proposed in 2022 by researchers at Anthropic
  • A method for training models using a set of rules and principles that govern the model's behavior

Red Teaming: deliberately prompt the model to generate harmful responses; the model then critiques and revises those responses against the constitutional principles, so the harmful content is removed from the data used for fine-tuning
