A Complete Hands-On Walkthrough of Fine-Tuning Llama 3 with unsloth

Posted by 雨梦山人 on 2024-06-17

1. Why fine-tune a large model?

What fine-tuning is

Fine-tuning a large model means continuing to train an already pre-trained model on a dataset from a specific domain. The goal is to optimize the model's performance on particular tasks so that it adapts better to, and performs better on, work in that domain.

The core reasons for fine-tuning

Customization: the core reason to fine-tune is to give a large model more task-specific capabilities. General-purpose models are powerful, but they can fall short in specialized domains; fine-tuning lets the model adapt to the requirements and characteristics of a particular field.
Domain knowledge: training on a domain-specific dataset teaches the model the knowledge and language patterns of that field, which helps it perform better on tasks within it.

2. About unsloth

unsloth is an open-source project that fine-tunes Llama 3, Mistral, and Gemma 2-5x faster than a standard HuggingFace setup while using roughly 80% less memory.
GitHub: https://github.com/unslothai/unsloth

3. Hands-on tutorial

1. Hardware environment
I rented GPUs on AutoDL for the fine-tuning. The advantages: (1) it saves time and effort, since there is no hardware environment to build yourself; (2) it is cheap. For getting started, a 2x RTX 3080 instance with 20 GB of VRAM is plenty, at only 1.08 CNY per hour; booting the instance without a GPU attached costs just 0.01 CNY per hour.
2. Software environment
The environment is set up with Conda:

conda create --name unsloth_env python=3.10
conda activate unsloth_env

conda install pytorch-cuda=12.1 pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

pip install --no-deps trl peft accelerate bitsandbytes

The pip install that pulls from GitHub may fail. Use the academic-resource acceleration that AutoDL provides; the download is still not fast, so be patient.

source /etc/network_turbo
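
After the installation finishes, it is worth running a quick sanity check before moving on. The snippet below is just a minimal sketch; the exact version numbers will differ on your instance.

import torch
print(torch.__version__, torch.version.cuda)         # expect a CUDA 12.1 build of PyTorch
print(torch.cuda.is_available())                      # should print True
print(torch.cuda.get_device_name(0))                  # the rented GPU
print(round(torch.cuda.get_device_properties(0).total_memory / 1024**3, 1), "GB VRAM")
import unsloth, trl, peft, bitsandbytes               # all four should import without errors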

3. Run the code
Because of network restrictions you may not be able to reach huggingface directly, so use the domestic mirror instead: https://hf-mirror.com/

pip install -U huggingface_hub

export HF_ENDPOINT=https://hf-mirror.com
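
Optionally, you can confirm the mirror is actually being used by pre-downloading the base model before running the scripts below. This is only a sketch: huggingface_hub reads HF_ENDPOINT from the environment, so set it before the import if you skipped the export above.

import os
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")  # no effect if the export above was already run
from huggingface_hub import snapshot_download
# downloads (or resumes) the 4-bit base model into the local Hugging Face cache
snapshot_download("unsloth/llama-3-8b-bnb-4bit")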

Save the code below as a .py file, upload it to the server's /root directory, and run it with python test-unlora.py.
Test the model before fine-tuning

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048  # maximum context length
dtype = None           # None lets unsloth auto-detect float16 / bfloat16
load_in_4bit = True    # load the base model in 4-bit to save VRAM
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit", 
    max_seq_length = max_seq_length, 
    dtype = dtype,     
    load_in_4bit = load_in_4bit,
    token = "https://hf-mirror.com"  
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

FastLanguageModel.for_inference(model)  # switch the model into (faster) inference mode
inputs = tokenizer(
[
    alpaca_prompt.format(
        "海綿寶寶的書法是不是叫做海綿體",
        "", 
        "", 
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

The code above downloads the model through the mirror, which takes roughly ten minutes. If it runs and returns a reasonable answer, you can move on to fine-tuning.
Note that the 50 GB system disk is not enough for fine-tuning; add a 50-100 GB data disk.
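
If you are not sure how much space is left, you can check from Python. This is a minimal sketch; /root/autodl-tmp is where AutoDL normally mounts the data disk, so adjust the path if your instance differs.

import shutil
total, used, free = shutil.disk_usage("/root/autodl-tmp")  # assumed data-disk mount point; adjust as needed
print(f"free space: {free / 1024**3:.1f} GB")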
Fine-tune the model
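
Before running the full training script, it can help to peek at the dataset and confirm it has the instruction, input, and output columns that formatting_prompts_func below expects. A minimal sketch:

from datasets import load_dataset
ds = load_dataset("kigner/ruozhiba-llama3", split = "train")
print(ds)      # column names and number of rows
print(ds[0])   # one raw sample with instruction / input / output fields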

import os
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the 4-bit quantized base model (same settings as the pre-fine-tuning test)
max_seq_length = 2048
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit", 
    max_seq_length = max_seq_length, 
    dtype = dtype,     
    load_in_4bit = load_in_4bit,
    token = "https://hf-mirror.com"
)

# Prompt template used to build the training text
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # the EOS token must be appended to every training sample
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # append EOS_TOKEN, otherwise the model never learns to stop generating
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

# Load the dataset from the Hugging Face Hub (fetched through the mirror) and apply the prompt template
dataset = load_dataset("kigner/ruozhiba-llama3", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

# Attach LoRA adapters (PEFT); only these adapter weights are trained
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                              # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,   # trades compute for lower VRAM use
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,                  # rank-stabilized LoRA disabled
    loftq_config = None,
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,            # micro-batch size per GPU
        gradient_accumulation_steps = 4,            # accumulate 4 micro-batches per optimizer step
        warmup_steps = 10,
        max_steps = 60,                             # short demo run; raise this for real training
        fp16 = not torch.cuda.is_bf16_supported(),  # fall back to fp16 on GPUs without bf16
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",                       # 8-bit AdamW reduces optimizer memory
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
    ),
)
# Start training
trainer.train()

# Save only the LoRA adapter weights
model.save_pretrained("lora_model") 

# Merge the LoRA weights into the base model and save it in 16-bit Hugging Face format
# (this writes into the same "outputs" directory the trainer already uses)
model.save_pretrained_merged("outputs", tokenizer, save_method = "merged_16bit",)

# Merge and quantize to 4-bit GGUF (uncomment to enable)
#model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")


If you need the 4-bit GGUF, uncomment the last line. Quantization requires a fair amount of extra disk space.
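
After training, you will usually want to check whether the new behavior stuck. The script below is a minimal sketch that mirrors the pre-fine-tuning test above; it assumes the lora_model directory written by model.save_pretrained("lora_model"). Save it as another .py file and run it the same way.

from unsloth import FastLanguageModel
from transformers import TextStreamer

# load the base model together with the LoRA adapter saved by the training script
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model",   # local directory created by model.save_pretrained("lora_model")
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

inputs = tokenizer([alpaca_prompt.format("海綿寶寶的書法是不是叫做海綿體", "", "")], return_tensors = "pt").to("cuda")
_ = model.generate(**inputs, streamer = TextStreamer(tokenizer), max_new_tokens = 128)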
