A Complete Hands-On Walkthrough of Fine-Tuning Llama 3 with unsloth

Posted by 雨梦山人 on 2024-06-17

1. Why fine-tune a large model?

What fine-tuning is

Fine-tuning a large model means continuing to train an already pre-trained model on a dataset from a specific domain. The goal is to optimize the model's performance on particular tasks so that it adapts better to, and performs better on, work in that domain.

The core reasons for fine-tuning

Customization: the core reason to fine-tune is to give a large model more task-specific capabilities. General-purpose models are powerful, but they can fall short in specialized domains; fine-tuning lets the model adapt to the requirements and characteristics of a particular field.
Domain knowledge: training on a domain-specific dataset teaches the model the knowledge and language patterns of that field, which helps it perform better on tasks within it.

2. About unsloth

unsloth is an open-source project that fine-tunes Llama 3, Mistral, and Gemma 2-5x faster than a standard HuggingFace setup while using roughly 80% less memory.
GitHub: https://github.com/unslothai/unsloth

3. Hands-on tutorial

1. Hardware environment
I rented GPUs on AutoDL for the fine-tuning. The advantages: (1) it saves time and effort, since there is no hardware environment to build yourself; (2) it is cheap. For getting started, a 2x RTX 3080 instance with 20 GB of VRAM is plenty, at only 1.08 CNY per hour; booting the instance without a GPU attached costs just 0.01 CNY per hour.
2. Software environment
The environment is set up with Conda:

conda create --name unsloth_env python=3.10
conda activate unsloth_env

conda install pytorch-cuda=12.1 pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

pip install --no-deps trl peft accelerate bitsandbytes

The pip install that pulls from GitHub may fail. Use the academic-resource acceleration that AutoDL provides; the download is still not fast, so be patient.

source /etc/network_turbo
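
After the installation finishes, it is worth running a quick sanity check before moving on. The snippet below is just a minimal sketch; the exact version numbers will differ on your instance.

import torch
print(torch.__version__, torch.version.cuda)         # expect a CUDA 12.1 build of PyTorch
print(torch.cuda.is_available())                      # should print True
print(torch.cuda.get_device_name(0))                  # the rented GPU
print(round(torch.cuda.get_device_properties(0).total_memory / 1024**3, 1), "GB VRAM")
import unsloth, trl, peft, bitsandbytes               # all four should import without errors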

3. Run the code
Because of network restrictions you may not be able to reach huggingface directly, so use the domestic mirror instead: https://hf-mirror.com/

pip install -U huggingface_hub

export HF_ENDPOINT=https://hf-mirror.com
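
Optionally, you can confirm the mirror is actually being used by pre-downloading the base model before running the scripts below. This is only a sketch: huggingface_hub reads HF_ENDPOINT from the environment, so set it before the import if you skipped the export above.

import os
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")  # no effect if the export above was already run
from huggingface_hub import snapshot_download
# downloads (or resumes) the 4-bit base model into the local Hugging Face cache
snapshot_download("unsloth/llama-3-8b-bnb-4bit")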

Save the code below as a .py file, upload it to the server's /root directory, and run it with python test-unlora.py.
Test the model before fine-tuning

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048  # maximum context length
dtype = None           # None lets unsloth auto-detect float16 / bfloat16
load_in_4bit = True    # load the base model in 4-bit to save VRAM
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit", 
    max_seq_length = max_seq_length, 
    dtype = dtype,     
    load_in_4bit = load_in_4bit,
    token = "https://hf-mirror.com"  
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

FastLanguageModel.for_inference(model)  # switch the model into (faster) inference mode
inputs = tokenizer(
[
    alpaca_prompt.format(
        "海綿寶寶的書法是不是叫做海綿體",
        "", 
        "", 
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

The code above downloads the model through the mirror, which takes roughly ten minutes. If it runs and returns a reasonable answer, you can move on to fine-tuning.
Note that the 50 GB system disk is not enough for fine-tuning; add a 50-100 GB data disk.
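
If you are not sure how much space is left, you can check from Python. This is a minimal sketch; /root/autodl-tmp is where AutoDL normally mounts the data disk, so adjust the path if your instance differs.

import shutil
total, used, free = shutil.disk_usage("/root/autodl-tmp")  # assumed data-disk mount point; adjust as needed
print(f"free space: {free / 1024**3:.1f} GB")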
Fine-tune the model
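
Before running the full training script, it can help to peek at the dataset and confirm it has the instruction, input, and output columns that formatting_prompts_func below expects. A minimal sketch:

from datasets import load_dataset
ds = load_dataset("kigner/ruozhiba-llama3", split = "train")
print(ds)      # column names and number of rows
print(ds[0])   # one raw sample with instruction / input / output fields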

import os
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the 4-bit quantized base model (same settings as the pre-fine-tuning test)
max_seq_length = 2048
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit", 
    max_seq_length = max_seq_length, 
    dtype = dtype,     
    load_in_4bit = load_in_4bit,
    token = "https://hf-mirror.com"
)

# Prompt template used to build the training text
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # the EOS token must be appended to every training sample
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # append EOS_TOKEN, otherwise the model never learns to stop generating
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

# Load the dataset from the Hugging Face Hub (fetched through the mirror) and apply the prompt template
dataset = load_dataset("kigner/ruozhiba-llama3", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

# Attach LoRA adapters (PEFT); only these adapter weights are trained
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                              # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,   # trades compute for lower VRAM use
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,                  # rank-stabilized LoRA disabled
    loftq_config = None,
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,            # micro-batch size per GPU
        gradient_accumulation_steps = 4,            # accumulate 4 micro-batches per optimizer step
        warmup_steps = 10,
        max_steps = 60,                             # short demo run; raise this for real training
        fp16 = not torch.cuda.is_bf16_supported(),  # fall back to fp16 on GPUs without bf16
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",                       # 8-bit AdamW reduces optimizer memory
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
    ),
)
# Start training
trainer.train()

# Save only the LoRA adapter weights
model.save_pretrained("lora_model") 

# Merge the LoRA weights into the base model and save it in 16-bit Hugging Face format
# (this writes into the same "outputs" directory the trainer already uses)
model.save_pretrained_merged("outputs", tokenizer, save_method = "merged_16bit",)

# Merge and quantize to 4-bit GGUF (uncomment to enable)
#model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")


If you need the 4-bit GGUF, uncomment the last line. Quantization requires a fair amount of extra disk space.
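
After training, you will usually want to check whether the new behavior stuck. The script below is a minimal sketch that mirrors the pre-fine-tuning test above; it assumes the lora_model directory written by model.save_pretrained("lora_model"). Save it as another .py file and run it the same way.

from unsloth import FastLanguageModel
from transformers import TextStreamer

# load the base model together with the LoRA adapter saved by the training script
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model",   # local directory created by model.save_pretrained("lora_model")
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

inputs = tokenizer([alpaca_prompt.format("海綿寶寶的書法是不是叫做海綿體", "", "")], return_tensors = "pt").to("cuda")
_ = model.generate(**inputs, streamer = TextStreamer(tokenizer), max_new_tokens = 128)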
