Welcome Stable Diffusion 3.5 Large to 🧨 Diffusers

Published by HuggingFace on 2024-11-07

Original URL: https://www.cnblogs.com/huggingface/p/18532238

Stable Diffusion 3.5, an improved successor to Stable Diffusion 3, is now available on the Hugging Face Hub and can be run directly with 🧨 Diffusers.

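This release includes two checkpoints:

A large model (large, 8B)
A timestep-distilled version of that model, which generates images in just a few inference steps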

In this post, we show how to use Stable Diffusion 3.5 (SD3.5) in Diffusers, covering both inference and training.

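Architectural changes

The transformer used by SD3.5-large is largely the same as the one in SD3-medium, with the following changes:

QK normalization: normalizing queries and keys has become standard practice for training large transformer models, and SD3.5-large follows suit.
Dual attention layers: where the MMDiT blocks previously used a single attention layer for the text and image modalities, SD3.5-large uses two attention layers.

Everything else, namely the text encoders, the image variational autoencoder (VAE), and the noise scheduler, is unchanged from SD3-medium. For more on SD3, see the Stable Diffusion 3 paper listed under Important links.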
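To make the QK normalization idea concrete, here is a minimal, dependency-free sketch. It assumes an RMS-style normalization without a learned gain; the post does not say which norm SD3.5-large uses, and the helper names `rms_norm` and `attention_logit` are purely illustrative, not part of Diffusers.

```python
import math

def rms_norm(vec, eps=1e-6):
    """Rescale a vector so its root-mean-square is ~1 (learned gain omitted)."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [x * scale for x in vec]

def attention_logit(q, k, use_qk_norm=True):
    """Scaled dot-product logit for a single query/key pair."""
    if use_qk_norm:
        # Assumption: normalize q and k independently before the dot product.
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))

# Deliberately large activations, as can appear while training big models.
q = [100.0, -50.0, 25.0, 75.0]
k = [80.0, -40.0, 20.0, 60.0]

raw_logit = attention_logit(q, k, use_qk_norm=False)
normed_logit = attention_logit(q, k, use_qk_norm=True)
print(raw_logit, normed_logit)
```

Without normalization the logit grows with the magnitude of the activations, which can saturate the softmax. With normalization its absolute value is bounded by the square root of the head dimension, which is the usual motivation for the technique.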

Using SD3.5 with Diffusers

First, make sure you have the latest version of Diffusers installed:

pip install -U diffusers

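The model is gated, so you first need to visit the Stable Diffusion 3.5 Large page on Hugging Face, fill in the form, and accept the terms. You then need to be logged in to access the model. Log in to your Hugging Face account as follows: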

huggingface-cli login

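The following code downloads the 8B SD3.5 model. The weights are loaded in torch.bfloat16, the precision of Stability AI's original checkpoint, which is also the recommended precision for inference.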

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=4.5,
).images[0]

image.save("sd3_hello_world.png")

[Image: hello_world_cat]

This release also includes a timestep-distilled model that needs no classifier-free guidance at inference time and can generate images in just a few steps (typically 4 to 8).

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large-turbo", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    num_inference_steps=4,
    height=1024,
    width=1024,
    guidance_scale=1.0,
).images[0]

image.save("sd3_hello_world.png")

[Image: hello_world_cat_2]
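To see why the distilled model can run with guidance_scale=1.0, it may help to write out the guidance step. The sketch below assumes the standard classifier-free guidance formula and uses made-up numbers in place of real noise predictions; `apply_cfg` is an illustrative helper, not a Diffusers function.

```python
def apply_cfg(noise_uncond, noise_cond, guidance_scale):
    """Standard classifier-free guidance: move from the unconditional
    prediction toward the conditional one by a factor of guidance_scale."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]

# Toy stand-ins for the two model outputs at one denoising step.
noise_uncond = [0.10, -0.20, 0.30]
noise_cond = [0.40, 0.10, -0.50]

guided = apply_cfg(noise_uncond, noise_cond, guidance_scale=4.5)
unguided = apply_cfg(noise_uncond, noise_cond, guidance_scale=1.0)
print(guided)
print(unguided)
```

At a scale of 1.0 the combination collapses to the conditional prediction, so the unconditional forward pass contributes nothing. That fits the claim above that the distilled model does not need classifier-free guidance.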

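In addition, the optimization strategies covered in the SD3 blog post and the official Diffusers documentation all work with SD3.5. They focus heavily on reducing memory use at inference time. Because SD3.5-large is much larger than SD3-medium, memory optimization is especially important for running it on consumer hardware.

Using quantization for inference

Diffusers natively supports quantization with bitsandbytes, which further reduces memory use.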
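As a rough back-of-the-envelope check, the sketch below estimates the memory taken by the transformer weights alone. It assumes exactly 8 billion parameters (the "8B" figure is approximate) and ignores quantization constants, the text encoders, the VAE, and activations, so real usage will be higher. `weight_memory_gib` is an illustrative helper.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory for the weights alone, in GiB (no activations or overhead)."""
    return num_params * bits_per_param / 8 / 2**30

NUM_PARAMS = 8e9  # assumption: "8B" taken literally

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # bfloat16: 16 bits per weight
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)    # 4-bit quantized weights

print(f"bf16: {bf16_gib:.1f} GiB, 4-bit: {nf4_gib:.1f} GiB")
```

Under these assumptions the weights shrink from about 14.9 GiB to about 3.7 GiB, which is what makes a 24GB consumer GPU practical.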

First, install the required libraries:

pip install -Uq git+https://github.com/huggingface/transformers@main
pip install -Uq bitsandbytes

Next, load the model in "NF4" precision:

from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
import torch

model_id = "stabilityai/stable-diffusion-3.5-large"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)

Now we can run inference:

from diffusers import StableDiffusion3Pipeline

pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"
image = pipeline(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")

[Image: happy_hippo]

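To adjust other BitsAndBytesConfig options, refer to the official documentation.

You can also directly load a model that has already been quantized with the same nf4_config, which is very handy on machines with little RAM. A complete example is available in the Colab notebook listed under Important links.

Training LoRAs on SD3.5-large with quantization

With bitsandbytes and peft, large models such as SD3.5 can be fine-tuned on a consumer GPU with 24GB of VRAM. The SD3 training script we provide can be used to train a LoRA with the following command: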

accelerate launch train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3.5-large" \
  --dataset_name="Norod78/Yarn-art-style" \
  --output_dir="yart_art_sd3-5_lora" \
  --mixed_precision="bf16" \
  --instance_prompt="Frog, yarn art style" \
  --caption_column="text" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=4e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=700 \
  --rank=16 \
  --seed="0" \
  --push_to_hub

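To add quantization to training, however, a few adjustments are needed. In broad strokes:

When initializing the transformer in the script, pass a quantization config, or load an already-quantized model directly.
Then prepare the model with the prepare_model_for_kbit_training() function from peft.
The remaining steps stay the same as in the original script (thanks to peft's strong support for bitsandbytes).

A complete example is linked from the original English post.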
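The adjustments above can be sketched as follows. This is a minimal outline under stated assumptions, not the training script itself: the function name is hypothetical, the `target_modules` list, `lora_alpha=rank`, and `use_gradient_checkpointing=False` are illustrative choices, and the imports sit inside the function only so that defining it does not require the heavy dependencies.

```python
def load_nf4_transformer_for_lora(
    model_id="stabilityai/stable-diffusion-3.5-large", rank=16
):
    """Load the SD3.5 transformer in NF4 and attach a trainable LoRA adapter."""
    # Lazy imports: torch, diffusers, peft (and bitsandbytes) are only
    # needed when the function is actually called.
    import torch
    from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
    from peft import LoraConfig, prepare_model_for_kbit_training

    # Step 1: add a quantization config when initializing the transformer.
    nf4_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    transformer = SD3Transformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=nf4_config,
        torch_dtype=torch.bfloat16,
    )

    # Step 2: prepare the quantized model for k-bit training with peft.
    # Assumption: gradient checkpointing is left off in this sketch.
    transformer = prepare_model_for_kbit_training(
        transformer, use_gradient_checkpointing=False
    )

    # Step 3: proceed as in the unquantized script, here by adding a LoRA
    # adapter. The target modules are an illustrative choice.
    lora_config = LoraConfig(
        r=rank,
        lora_alpha=rank,
        init_lora_weights="gaussian",
        target_modules=["to_k", "to_q", "to_v", "to_out.0"],
    )
    transformer.add_adapter(lora_config)
    return transformer
```

Calling the function downloads the gated checkpoint and needs a CUDA GPU with bitsandbytes installed. The default rank of 16 mirrors the `--rank=16` flag in the command above.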

Loading the SD3.5 transformer with the single-file method

The Stable Diffusion 3.5 transformer can also be initialized from the original checkpoint file released by Stability AI, using the from_single_file method:

import torch
from diffusers import SD3Transformer2DModel, StableDiffusion3Pipeline

transformer = SD3Transformer2DModel.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/sd3.5_large.safetensors",
    torch_dtype=torch.bfloat16,
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
image = pipe("a cat holding a sign that says hello world").images[0]
image.save("sd35.png")

Important links

SD3.5-large model collection on the Hugging Face Hub
Official Diffusers documentation for SD3.5
Colab notebook for running quantized SD3.5 inference
LoRA training code
Stable Diffusion 3 paper
Stable Diffusion 3 blog post (Chinese)

Acknowledgements: thanks to Daniel Frank for providing the cover image for this post, and to Pedro Cuenca and Tom Aarsen for reviewing it.

English original: https://hf.co/blog/sd3-5

Original authors: YiYi Xu, Aryan V S, Dhruv Nair, Sayak Paul, Linoy Tsaban, Apolinário from multimodal AI art, Alvaro Somoza, Aritra Roy Gosthipaty

Translator: hugging-hoi2022
