Now there is hard evidence that large models shouldn't be trusted too readily.
Paper title: Alignment Faking in Large Language Models
Paper link: https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf
Video explainer: https://www.youtube.com/watch?v=9eXV64O2Xp8