Deploy Stable Diffusion 3 with Zero Code and Generate High-Quality Images in One Click

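Published by Huawei Cloud Developer Alliance (华为云开发者联盟) on 2024-07-12

This article is shared from the Huawei Cloud community post "Big news! [Chinese supported] stable-diffusion-3 installation and deployment tutorial: SD3 is here" (《重磅!【支援中文】stable-diffusion-3安裝部署教程-SD3 來了》), by 碼上開花_Lancer.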

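As promised, Stability AI officially open-sourced Stable Diffusion 3 (the Medium version) on June 12, living up to its reputation as the "open-source hero" of AI image generation. Recently, just as everyone was celebrating OpenAI's release of Sora, Stability AI also published the Stable Diffusion 3 technical report. Both technologies adopt the Diffusion Transformer architecture.

Notably, Stable Diffusion 3's strong performance does not come only from the architectural gains of the Diffusion Transformer: its prompt following, image quality, and text spelling have all improved substantially. So what makes Stable Diffusion 3 so powerful? The technical report is the place to look for the principles behind it.

Next, let's walk through how to deploy the latest Stable Diffusion 3 yourself. The process breaks down into the steps below (before you begin, make sure you have an unobstructed network connection):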

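I. Preparation

1. Log in to your official Huawei Cloud account.

Click "Console" in the upper-right corner and enter "ModelArts" in the search bar.

Click "Development Environment" > "Notebook", then click "Create".

On the notebook creation page, enter the name "notebook-LangChain", choose the GPU flavor "GPU: 1*T4(16GB)|CPU: 8 cores 32GB", set the disk size to "50G", click "Create Now", and then click "Create" to confirm. Note that the SD3 case itself, as stated later in this article, calls for a V100 or higher flavor with at least 32 GB of GPU memory, so pick a flavor that meets that requirement when you plan to run it.

Click to return to the "Task Center", then click the notebook to open it.

The steps above create a notebook on ModelArts by hand. Alternatively, you can open the ready-made case "stable-diffusion-3重磅來襲" and try it directly.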

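II. Download the Model

[Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource efficiency. For more technical details, see the [research paper](https://stability.ai/news/stable-diffusion-3-research-paper).

🔹 This case requires the PyTorch-2.0.1 image on a GPU-V100 or higher flavor.

🔹 Clicking "Run in ModelArts" takes you to ModelArts CodeLab, where you need to log in with a Huawei Cloud account. If you do not have one, register and complete real-name verification by following ["How to create a Huawei Cloud account and complete real-name verification"](https://bbs.huaweicloud.com/blogs/427460). After logging in, wait a moment and you will enter the CodeLab runtime environment.

🔹 If you hit Out Of Memory, check whether your parameter settings are too high. Lower them, restart the kernel, or switch to a higher-spec resource to work around it❗❗❗

First, switch the kernel.

1. Download the code and models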

import os
import moxing as mox

if not os.path.exists('opus-mt-zh-en'):
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/course/ModelBox/opus-mt-zh-en', 'opus-mt-zh-en')

if not os.path.exists('stable-diffusion-3-medium-diffusers'):
    mox.file.copy_parallel('obs://modelbox-course/stable-diffusion-3-medium-diffusers','stable-diffusion-3-medium-diffusers')
    
if not os.path.exists('/home/ma-user/work/frpc_linux_amd64'):
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/course/ModelBox/frpc_linux_amd64', '/home/ma-user/work/frpc_linux_amd64')
    INFO:root:Using MoXing-v2.1.0.5d9c87c8-5d9c87c8
    INFO:root:Using OBS-Python-SDK-3.20.9.1

import os
import moxing as mox
from PIL import Image,ImageDraw,ImageFont,ImageFilter

# Download the assets needed for the poster
if not os.path.exists("/home/ma-user/work/Style"):
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/case_zoo/StableDiffusion/Style/AI_paint.jpg',"/home/ma-user/work/Style/AI_paint.jpg")
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/case_zoo/StableDiffusion/Style/方正蘭亭準黑_GBK.ttf',"/home/ma-user/work/Style/方正蘭亭準黑_GBK.ttf")
    # Verify that the assets landed in the Style directory
    if os.path.exists("/home/ma-user/work/Style"):
        print('Download success')
    else:
        raise Exception('Download Failed')
else:
    print("Project already exists")
    Project already exists
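
The download cells above all follow the same "copy only if the target is missing" pattern. A minimal sketch of that pattern as a reusable helper is shown below. `fetch_if_missing` and its `copy_fn` parameter are hypothetical names introduced here; in the notebook, `copy_fn` would presumably be `mox.file.copy_parallel`.

```python
import os
from typing import Callable


def fetch_if_missing(src: str, dst: str, copy_fn: Callable[[str, str], None]) -> bool:
    """Copy src to dst with copy_fn unless dst already exists.

    Returns True if a copy was performed and False if dst was already present.
    Raises RuntimeError if dst is still missing after the copy.
    """
    if os.path.exists(dst):
        return False
    copy_fn(src, dst)
    if not os.path.exists(dst):
        raise RuntimeError(f"Download failed: {dst} was not created")
    return True


# Hypothetical usage inside the notebook (requires moxing):
# import moxing as mox
# fetch_if_missing('obs://modelbox-course/stable-diffusion-3-medium-diffusers',
#                  'stable-diffusion-3-medium-diffusers',
#                  mox.file.copy_parallel)
```

Passing the copy function in as an argument keeps the helper independent of MoXing, so the skip-and-verify logic can be exercised outside ModelArts.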

2. Configure the runtime environment

This case depends on Python 3.9.15 or later, so we first create a virtual environment:

!/home/ma-user/anaconda3/bin/conda clean -i
!/home/ma-user/anaconda3/bin/conda create -n python-3.9.15 python=3.9.15 -y --override-channels --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
!/home/ma-user/anaconda3/envs/python-3.9.15/bin/pip install ipykernel
    /home/ma-user/anaconda3/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version!
      RequestsDependencyWarning)
    Collecting package metadata (current_repodata.json): done
    Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
    Collecting package metadata (repodata.json): done
    Solving environment: done
    Collecting jupyter-client>=6.1.12 (from ipykernel)
    Successfully installed asttokens-2.4.1 comm-0.2.2 debugpy-1.8.2 decorator-5.1.1 exceptiongroup-1.2.1 executing-2.0.1 importlib-metadata-8.0.0 ipykernel-6.29.5 ipython-8.18.1 jedi-0.19.1 jupyter-client-8.6.2 jupyter-core-5.7.2 matplotlib-inline-0.1.7 nest-asyncio-1.6.0 packaging-24.1 parso-0.8.4 pexpect-4.9.0 platformdirs-4.2.2 prompt-toolkit-3.0.47 psutil-6.0.0 ptyprocess-0.7.0 pure-eval-0.2.2 pygments-2.18.0 python-dateutil-2.9.0.post0 pyzmq-26.0.3 six-1.16.0 stack-data-0.6.3 tornado-6.4.1 traitlets-5.14.3 typing-extensions-4.12.2 wcwidth-0.2.13 zipp-3.19.2

import json
import os

data = {
   "display_name": "python-3.9.15",
   "env": {
      "PATH": "/home/ma-user/anaconda3/envs/python-3.9.15/bin:/home/ma-user/anaconda3/envs/python-3.7.10/bin:/modelarts/authoring/notebook-conda/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/anaconda3/envs/PyTorch-1.8/bin"
   },
   "language": "python",
   "argv": [
      "/home/ma-user/anaconda3/envs/python-3.9.15/bin/python",
      "-m",
      "ipykernel",
      "-f",
      "{connection_file}"
   ]
}

if not os.path.exists("/home/ma-user/anaconda3/share/jupyter/kernels/python-3.9.15/"):
    os.mkdir("/home/ma-user/anaconda3/share/jupyter/kernels/python-3.9.15/")

with open('/home/ma-user/anaconda3/share/jupyter/kernels/python-3.9.15/kernel.json', 'w') as f:
    json.dump(data, f, indent=4)
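
The cell above hard-codes a single environment. As a rough sketch, the same kernel registration can be wrapped in a function. `write_kernel_spec` and its parameters are hypothetical names introduced here, and `os.makedirs(..., exist_ok=True)` stands in for the explicit existence check; the `env` argument is optional and would carry the `PATH` entry from the cell above.

```python
import json
import os
from typing import Dict, Optional


def write_kernel_spec(kernels_dir: str, env_name: str, python_bin: str,
                      env: Optional[Dict[str, str]] = None) -> str:
    """Write a Jupyter kernel.json that launches python_bin; return its path."""
    spec = {
        "display_name": env_name,
        "language": "python",
        # Same launch command as the cell above
        "argv": [python_bin, "-m", "ipykernel", "-f", "{connection_file}"],
    }
    if env:
        spec["env"] = env
    kernel_dir = os.path.join(kernels_dir, env_name)
    os.makedirs(kernel_dir, exist_ok=True)  # safe to call repeatedly
    path = os.path.join(kernel_dir, "kernel.json")
    with open(path, "w") as f:
        json.dump(spec, f, indent=4)
    return path


# Hypothetical usage with the paths from this tutorial:
# write_kernel_spec("/home/ma-user/anaconda3/share/jupyter/kernels",
#                   "python-3.9.15",
#                   "/home/ma-user/anaconda3/envs/python-3.9.15/bin/python")
```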

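Once the kernel is created, wait a moment or refresh the page, then click the kernel selector in the upper-right corner and choose python-3.9.15.

Check the Python version: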

!python -V
    Python 3.9.15
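
Since the case assumes Python 3.9.15 or later, a notebook cell can check this directly instead of relying on reading the output by eye. A minimal sketch, in which `meets_minimum` is a hypothetical helper:

```python
import sys
from typing import Tuple

MIN_PYTHON = (3, 9, 15)  # minimum version stated in this tutorial


def meets_minimum(version: Tuple[int, ...],
                  minimum: Tuple[int, ...] = MIN_PYTHON) -> bool:
    """Return True if (major, minor, micro) is at least the minimum version."""
    return tuple(version[:3]) >= minimum


if not meets_minimum(sys.version_info):
    required = ".".join(map(str, MIN_PYTHON))
    print(f"Python {sys.version.split()[0]} is older than {required}; "
          "switch to the python-3.9.15 kernel.")
```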

Check the GPU model; at least 32 GB of GPU memory is required:

!nvidia-smi
 

    Wed Jul 10 23:52:26 2024
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-PCIE...  On   | 00000000:00:0D.0 Off |                    0 |
    | N/A   30C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
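
The 32 GB requirement can also be checked programmatically. The sketch below parses the CSV output of `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`. The helper names and the 32,000 MiB threshold are assumptions made here: the V100 above reports 32510 MiB, slightly under 32 GiB, so a strict 32768 MiB cutoff would wrongly reject it.

```python
import subprocess
from typing import List, Tuple

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text: str) -> List[Tuple[str, int]]:
    """Parse 'name, memory_in_MiB' lines into (name, MiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem = line.rsplit(",", 1)  # memory is the last CSV field
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def has_enough_memory(gpus: List[Tuple[str, int]], min_mib: int = 32_000) -> bool:
    """Return True if at least one GPU reports min_mib MiB or more."""
    return any(mem >= min_mib for _, mem in gpus)


def query_gpus() -> List[Tuple[str, int]]:
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


# Hypothetical usage on the notebook instance:
# if not has_enough_memory(query_gpus()):
#     print("This GPU has less memory than the tutorial asks for.")
```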

Install the SD3 dependencies:

!pip install --upgrade pip
!pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 
!pip install diffusers transformers sentencepiece accelerate protobuf gradio spaces
!cp /home/ma-user/work/frpc_linux_amd64 /home/ma-user/anaconda3/envs/python-3.9.15/lib/python3.9/site-packages/gradio/frpc_linux_amd64_v0.2
!chmod +x /home/ma-user/anaconda3/envs/python-3.9.15/lib/python3.9/site-packages/gradio/frpc_linux_amd64_v0.2
    Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
    Requirement already satisfied: pip in /home/ma-user/anaconda3/envs/python-3.9.15/lib/python3.9/site-packages (24.0)
    Collecting pip
      Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/e7/54/0c1c068542cee73d8863336e974fc881e608d0170f3af15d0c0f28644531/pip-24.1.2-py3-none-any.whl (1.8 MB)
    Installing collected packages: pip
      Attempting uninstall: pip
        Found existing installation: pip 24.0
        Uninstalling pip-24.0:
          Successfully uninstalled pip-24.0
    Successfully installed pip-24.1.2
    Successfully installed accelerate-0.32.1 aiofiles-23.2.1 altair-5.3.0 annotated-types-0.7.0 anyio-4.4.0 attrs-23.2.0 click-8.1.7 contourpy-1.2.1 cycler-0.12.1 diffusers-0.29.2 dnspython-2.6.1 email_validator-2.2.0 fastapi-0.111.0 fastapi-cli-0.0.4 ffmpy-0.3.2 fonttools-4.53.1 fsspec-2024.6.1 gradio-4.37.2 gradio-client-1.0.2 h11-0.14.0 httpcore-1.0.5 httptools-0.6.1 httpx-0.27.0 huggingface-hub-0.23.4 importlib-resources-6.4.0 jsonschema-4.23.0 jsonschema-specifications-2023.12.1 kiwisolver-1.4.5 markdown-it-py-3.0.0 matplotlib-3.9.1 mdurl-0.1.2 numpy-1.26.4 orjson-3.10.6 pandas-2.2.2 protobuf-5.27.2 psutil-5.9.8 pydantic-2.8.2 pydantic-core-2.20.1 pydub-0.25.1 pyparsing-3.1.2 python-dotenv-1.0.1 python-multipart-0.0.9 pytz-2024.1 pyyaml-6.0.1 referencing-0.35.1 regex-2024.5.15 rich-13.7.1 rpds-py-0.19.0 ruff-0.5.1 safetensors-0.4.3 semantic-version-2.10.0 sentencepiece-0.2.0 shellingham-1.5.4 sniffio-1.3.1 spaces-0.28.3 starlette-0.37.2 tokenizers-0.19.1 tomlkit-0.12.0 toolz-0.12.1 tqdm-4.66.4 transformers-4.42.3 typer-0.12.3 tzdata-2024.1 ujson-5.10.0 uvicorn-0.30.1 uvloop-0.19.0 watchfiles-0.22.0 websockets-11.0.3
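
Because the `pip install` line for diffusers, transformers, and the other packages does not pin versions, a later run may resolve different releases than the ones logged above. One way to make the environment reproducible is to turn pip's "Successfully installed" line into pinned requirements. A minimal sketch, with `pins_from_log` and `to_requirements` as hypothetical helpers and the sample line shortened from the log above:

```python
from typing import Dict, Iterable, List

PREFIX = "Successfully installed "


def pins_from_log(line: str) -> Dict[str, str]:
    """Parse pip's 'Successfully installed a-1.0 b-2.0' line into {name: version}."""
    line = line.strip()
    if not line.startswith(PREFIX):
        raise ValueError("not a 'Successfully installed' line")
    pins = {}
    for token in line[len(PREFIX):].split():
        name, version = token.rsplit("-", 1)  # the version follows the last hyphen
        pins[name] = version
    return pins


def to_requirements(pins: Dict[str, str], names: Iterable[str]) -> List[str]:
    """Format the selected packages as 'name==version' requirement lines."""
    return [f"{name}=={pins[name]}" for name in names]


log = "Successfully installed diffusers-0.29.2 gradio-4.37.2 transformers-4.42.3"
print("\n".join(to_requirements(pins_from_log(log),
                                ["diffusers", "transformers", "gradio"])))
```

The printed lines can be saved to a requirements file and installed with `pip install -r` on the next run.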

3. Generate a single image

#@title Enter an English prompt
import torch
from diffusers import StableDiffusion3Pipeline

# Clear the GPU cache
torch.cuda.empty_cache()

# Use half-precision floats to reduce memory use
torch_dtype = torch.float16

# Use a modest number of inference steps
num_inference_steps = 20

# Set the guidance scale
guidance_scale = 5.0

# Define the prompt
prompt = "cinematic photo of a red apple on a table in a classroom, on the blackboard are the words go big or go home written in chalk" #@param {type:"string"}

# Load the model and move it to the GPU
pipe = StableDiffusion3Pipeline.from_pretrained("stable-diffusion-3-medium-diffusers", torch_dtype=torch_dtype).to("cuda")

# Generate an image from the prompt
image = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=guidance_scale).images[0]

# Path where the image will be saved
save_path = '/home/ma-user/work/your_generated_image.png'

# Save the image to that path
image.save(save_path)

# To view the image locally, use the show method
image.show()
    /home/ma-user/anaconda3/envs/python-3.9.15/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
      from .autonotebook import tqdm as notebook_tqdm
    Loading pipeline components...:  33%|███▎      | 3/9 [00:00<00:00,  7.87it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
    Loading pipeline components...:  44%|████▍     | 4/9 [00:00<00:00,  5.87it/s]
    Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
    Loading checkpoint shards:  50%|█████     | 1/2 [00:00<00:00,  3.92it/s]
    Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  3.95it/s]
    Loading pipeline components...: 100%|██████████| 9/9 [00:02<00:00,  3.06it/s]
    100%|██████████| 20/20 [00:08<00:00,  2.27it/s]

Note:

If Out Of Memory occurs, try restarting the kernel and running the cell again❗❗❗
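
Restarting the kernel frees memory, but if the settings themselves are too demanding the error will return. As a sketch of the "lower the parameters" advice, the helper below retries generation at progressively smaller resolutions. `generate_with_fallback` and the list of sizes are assumptions made here, not part of the original case. `pipe` is any callable that accepts `prompt`, `width`, and `height` the way the diffusers pipeline does, and the sketch relies on PyTorch reporting CUDA out-of-memory as a `RuntimeError` whose message contains "out of memory".

```python
from typing import Any, Callable, Iterable, Tuple

DEFAULT_SIZES = ((1024, 1024), (768, 768), (512, 512))  # illustrative choices


def generate_with_fallback(pipe: Callable[..., Any], prompt: str,
                           sizes: Iterable[Tuple[int, int]] = DEFAULT_SIZES,
                           **kwargs: Any) -> Tuple[Any, Tuple[int, int]]:
    """Call pipe at each (width, height) in turn until one fits in memory.

    Returns (result, size_used). Errors other than out-of-memory are
    re-raised immediately; if every size fails, the last OOM error is raised.
    """
    last_error = None
    for width, height in sizes:
        try:
            result = pipe(prompt=prompt, width=width, height=height, **kwargs)
            return result, (width, height)
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise
            last_error = err
    if last_error is None:
        raise ValueError("sizes must not be empty")
    raise last_error
```

In the notebook one would likely also call `torch.cuda.empty_cache()` between attempts and take `.images[0]` from the returned result.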

4. Fill in the artwork title and author name

#@title Fill in the artwork title and author name
from PIL import Image, ImageDraw, ImageFont, ImageFilter

def gen_poster(img, txt1, txt2, path, zt):
    # Load the font at two sizes
    font1 = ImageFont.truetype(zt, 30)
    font2 = ImageFont.truetype(zt, 25)
    # Create a Draw object for drawing on the image
    img_draw = ImageDraw.Draw(img)

    # Draw the title and the author name on the image
    img_draw.text((180, 860), txt1, font=font1, fill='#961900')
    img_draw.text((130, 903), txt2, font=font2, fill='#252b3a')

    # Save the image
    img.save(path)

# Paths to the template image and the font
template_img = "/home/ma-user/work/Style/AI_paint.jpg"
zt = r"/home/ma-user/work/Style/方正蘭亭準黑_GBK.ttf"

# Open the template image
temp_image = Image.open(template_img).convert("RGBA")

# Open the generated image
image_path = "/home/ma-user/work/your_generated_image.png"  # replace with the path of your generated image
image = Image.open(image_path)

# Compute a new size that matches the template's width while keeping the original aspect ratio
width_ratio = temp_image.width / image.width
new_height = int(image.height * width_ratio)
new_size = (temp_image.width, new_height)

# Resize the generated image with the LANCZOS resampling filter
image = image.resize(new_size, Image.Resampling.LANCZOS)

# Paste the resized image onto the template
# The paste origin is assumed to be (40, 266)
temp_image.paste(image, (40, 266))

# Artwork title and author name
title_char = "蘋果" #@param {type:"string"}
author_char = "ModelArts" #@param {type:"string"}

# Path where the poster will be saved
savepath = '/home/ma-user/work/AI_paint_output.png'  # make sure the path is correct and writable

# Generate the poster
gen_poster(temp_image, title_char, author_char, savepath, zt)

# Open and display the generated poster
Image.open(savepath).show()
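
The resize step above scales the generated image to the template's width while preserving its aspect ratio. The arithmetic in isolation, as a small sketch (`fit_to_width` is a hypothetical helper, and the sizes are illustrative rather than the real template dimensions):

```python
from typing import Tuple


def fit_to_width(size: Tuple[int, int], target_width: int) -> Tuple[int, int]:
    """Scale (width, height) to target_width, keeping the aspect ratio."""
    width, height = size
    ratio = target_width / width
    return target_width, int(height * ratio)


print(fit_to_width((1024, 1024), 768))  # (768, 768): a square image stays square
print(fit_to_width((1024, 512), 768))   # (768, 384): a 2:1 image stays 2:1
```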

5. Run the Gradio app

%%writefile demo.py
import gradio as gr

# NOTE: css, MAX_SEED, MAX_IMAGE_SIZE, and infer must be defined above this point;
# the source article does not show those definitions.
with gr.Blocks(css=css) as demo:
    gr.HTML("""<h1 align="center">Stable Diffusion 3</h1>""")

    with gr.Column(elem_id="col-container"):
        with gr.Row():
            # Prompt box; the app accepts prompts written in Chinese
            prompt = gr.Text(
                label="Prompt",
                show_label=False,
                max_lines=1,
                placeholder="Enter a prompt in Chinese",
                container=False,
            )

            run_button = gr.Button("Generate", scale=0)

        result = gr.Image(label="Result", show_label=False)

        with gr.Accordion("More parameters", open=False):

            negative_prompt = gr.Text(
                label="Negative prompt",
                max_lines=1,
                placeholder="Enter a negative prompt",
            )

            seed = gr.Slider(
                label="Seed",
                minimum=0,
                maximum=MAX_SEED,
                step=1,
                value=0,
            )

            randomize_seed = gr.Checkbox(label="Random seed", value=True)

            with gr.Row():

                width = gr.Slider(
                    label="Width",
                    minimum=256,
                    maximum=MAX_IMAGE_SIZE,
                    step=64,
                    value=1024,
                )

                height = gr.Slider(
                    label="Height",
                    minimum=256,
                    maximum=MAX_IMAGE_SIZE,
                    step=64,
                    value=1024,
                )

            with gr.Row():

                guidance_scale = gr.Slider(
                    label="Guidance scale",
                    minimum=0.0,
                    maximum=10.0,
                    step=0.1,
                    value=5.0,
                )

                num_inference_steps = gr.Slider(
                    label="Inference steps",
                    minimum=1,
                    maximum=50,
                    step=1,
                    value=28,
                )

    # Run inference when the button is clicked or either text box is submitted
    gr.on(
        triggers=[run_button.click, prompt.submit, negative_prompt.submit],
        fn=infer,
        inputs=[prompt, negative_prompt, seed, randomize_seed, width, height, guidance_scale, num_inference_steps],
        outputs=[result, seed],
    )

# share=True also creates a temporary public URL
demo.launch(share=True)
    Writing demo.py
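
The listing calls an `infer` function and uses `MAX_SEED` without showing either. What follows is a hypothetical sketch of how such a callback might be structured, not the author's code: the translator and the image generator are passed in as plain callables, and the value of `MAX_SEED` is an assumption.

```python
import random
from typing import Any, Callable

MAX_SEED = 2**31 - 1  # assumption: the article does not show the real value


def make_infer(translate: Callable[[str], str],
               generate: Callable[..., Any],
               max_seed: int = MAX_SEED) -> Callable[..., Any]:
    """Build a Gradio callback from a translator and an image generator."""

    def infer(prompt, negative_prompt, seed, randomize_seed,
              width, height, guidance_scale, num_inference_steps):
        # Pick a fresh seed when the "random seed" box is ticked
        if randomize_seed:
            seed = random.randint(0, max_seed)
        # Translate the Chinese prompt and log both versions
        prompt_en = translate(prompt)
        print(prompt)
        print(prompt_en)
        # Slider values may arrive as floats, so cast before generating
        image = generate(
            prompt=prompt_en,
            negative_prompt=negative_prompt,
            seed=int(seed),
            width=int(width),
            height=int(height),
            guidance_scale=float(guidance_scale),
            num_inference_steps=int(num_inference_steps),
        )
        # Matches outputs=[result, seed] in the listing
        return image, int(seed)

    return infer
```

In `demo.py`, `translate` could wrap a Hugging Face `transformers` translation pipeline loaded from the `opus-mt-zh-en` directory, and `generate` could wrap the `StableDiffusion3Pipeline` call with a `torch.Generator` seeded from `seed`. Both wirings are guesses based on the models downloaded in step 1, not code from the article.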

Run the Gradio app. Once it starts successfully, click the link shown after "Running on public URL" to try it out!

!python demo.py
    Loading pipeline components...:  56%|███████▏     | 5/9 [00:02<00:01,  2.28it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
    Loading pipeline components...:  67%|████████▋    | 6/9 [00:02<00:01,  2.61it/s]
    Loading checkpoint shards:   0%|                          | 0/2 [00:00<?, ?it/s]
    Loading checkpoint shards:  50%|█████████         | 1/2 [00:00<00:00,  3.54it/s]
    Loading checkpoint shards: 100%|██████████████████| 2/2 [00:00<00:00,  3.53it/s]
    Loading pipeline components...: 100%|█████████████| 9/9 [00:03<00:00,  2.83it/s]
    /home/ma-user/anaconda3/envs/python-3.9.15/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
      return self.fget.__get__(instance, owner)()
    /home/ma-user/anaconda3/envs/python-3.9.15/lib/python3.9/site-packages/transformers/models/marian/tokenization_marian.py:175: UserWarning: Recommended: pip install sacremoses.
      warnings.warn("Recommended: pip install sacremoses.")
    Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
    Running on local URL:  http://127.0.0.1:7860
    Running on public URL: https://9c48446865ca38cc99.gradio.live

    This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)

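After a prompt is submitted, the console prints the Chinese input followed by its English machine translation:

    一幅畫的是一位宇航員騎著一隻穿著芭蕾舞裙的豬,手裡拿著一把粉紅色的傘,豬旁邊的地上是一隻戴著大禮帽的知更鳥,角落裡寫著“穩定擴散”的字樣。
    A picture of an astronaut riding on a pig in a ballet dress with a pink umbrella next to a big hat on the ground, with the word “stable spread” in the corner.

If Out Of Memory occurs, try restarting the kernel and running again❗❗❗

Open the local URL http://127.0.0.1:7860 in a browser to see the running interface.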

III. More Examples

Prompt: cinematic photo of a red apple on a table in a classroom, on the blackboard are the words "go big or go home" written in chalk

Chinese prompt: 教室裡的桌子上有一個紅蘋果的電影照片,黑板上用粉筆寫著“要麼做大,要麼回家”

Prompt: a painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words "stable diffusion"

Chinese prompt: 一幅畫的是一位宇航員騎著一隻穿著芭蕾舞裙的豬,手裡拿著一把粉紅色的傘,豬旁邊的地上是一隻戴著大禮帽的知更鳥,角落裡寫著“穩定擴散”的字樣。

Prompt: Three transparent glass bottles on a wooden table. The one on the left has red liquid and the number 1. The one in the middle has blue liquid and the number 2. The one on the right has green liquid and the number 3.

Chinese prompt: 三個透明玻璃瓶放在木桌上。左邊的是紅色液體和數字1。中間有藍色液體和數字2。右邊的是綠色液體和數字3。

References:

Official site: Stable Diffusion 3 (Stability AI)

Case: stable-diffusion-3重磅來襲 (huaweicloud.com)
