Experience Open-Sora Text-to-Video with Zero Setup and Zero Barrier to Entry

Published by 华为云开发者联盟 (Huawei Cloud Developer Alliance) on 2024-06-06

This article is shared from the Huawei Cloud community post "Open-Sora text-to-video can now be experienced on AI Gallery", by 碼上開花_Lancer.

Demo link: Open-Sora text-to-video hands-on case

Not long ago, OpenAI's Sora shot to fame on the strength of its astonishing video generation results, standing out from a crowd of text-to-video models and becoming a global focus of attention. Soon after, the Colossal-AI team released "Open-Sora 1.0", a new open-source solution covering the entire training process, including data processing, all training details, and model checkpoints, joining hands with AI enthusiasts around the world to advance a new era of video creation.

For details, see: https://hpc-ai.com/blog/open-sora-v1.0

In April 2024 came the Open-Sora 1.1 update: it can generate videos of 2 s to 15 s at resolutions from 144p to 720p, and supports text-to-image, text-to-video, and image-to-video generation. Let's take a look at what Open-Sora 1.1 actually produces:

[Sample videos generated with Open-Sora 1.1]

Hands-on experience

🔹 This case must be run with the Pytorch-2.0.1 runtime on a GPU-V100 flavor or above.

🔹 Click Run in ModelArts to enter ModelArts CodeLab. You will be asked to log in to your Huawei Cloud account; if you don't have one, register and complete real-name verification first, following "How to create a Huawei Cloud account and complete real-name verification". After logging in, wait a moment and you will enter the CodeLab runtime environment.

🔹 If Out Of Memory occurs, check whether your parameter settings are too high; reduce the parameters and restart the kernel, or switch to a higher-spec resource ❗❗❗
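If you want to check your memory headroom from inside the notebook, a small sketch like the one below (assuming PyTorch is already installed, as in step 2) prints the free and total GPU memory:

import torch

# torch.cuda.mem_get_info() returns (free_bytes, total_bytes) for the current device
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1024**3:.1f} GiB / total: {total / 1024**3:.1f} GiB")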

1. Download the code and models

This step takes about one minute to run; please be patient.

import os
import moxing as mox  # MoXing: ModelArts' SDK for transferring files to/from OBS

# copy the Open-Sora code, model weights, and helper files from OBS
# (each copy is skipped if the target already exists)
if not os.path.exists('Open-Sora'):
    mox.file.copy_parallel('obs://modelbox-course/open-sora_1.1/Open-Sora', 'Open-Sora')

if not os.path.exists('/home/ma-user/.cache/huggingface'):
    mox.file.copy_parallel('obs://modelbox-course/huggingface', '/home/ma-user/.cache/huggingface')

if not os.path.exists('Open-Sora/opensora/models/sd-vae-ft-ema'):
    mox.file.copy_parallel('obs://modelbox-course/sd-vae-ft-ema', 'Open-Sora/opensora/models/sd-vae-ft-ema')

if not os.path.exists('Open-Sora/opensora/models/text_encoder/t5-v1_1-xxl'):
    mox.file.copy_parallel('obs://modelbox-course/t5-v1_1-xxl', 'Open-Sora/opensora/models/text_encoder/t5-v1_1-xxl')

if not os.path.exists('/home/ma-user/work/t5.py'):
    mox.file.copy_parallel('obs://modelbox-course/open-sora_1.1/t5.py', '/home/ma-user/work/t5.py')

if not os.path.exists('Open-Sora/opus-mt-zh-en'):
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/course/ModelBox/opus-mt-zh-en', 'Open-Sora/opus-mt-zh-en')

if not os.path.exists('/home/ma-user/work/frpc_linux_amd64'):
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/course/ModelBox/frpc_linux_amd64', '/home/ma-user/work/frpc_linux_amd64')
INFO:root:Using MoXing-v2.1.6.879ab2f4-879ab2f4
INFO:root:List OBS time cost: 0.02 seconds.
INFO:root:Copy parallel total time cost: 41.71 seconds.
INFO:root:List OBS time cost: 0.14 seconds.
INFO:root:Copy parallel total time cost: 2.91 seconds.

2. Configure the runtime environment

This case requires Python 3.10.10 or later, so we first create a virtual environment:

!/home/ma-user/anaconda3/bin/conda clean -i
!/home/ma-user/anaconda3/bin/conda create -n python-3.10.10 python=3.10.10 -y --override-channels --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
!/home/ma-user/anaconda3/envs/python-3.10.10/bin/pip install ipykernel
/home/ma-user/anaconda3/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
/home/ma-user/anaconda3/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/ma-user/anaconda3/envs/python-3.10.10

  added / updated specs:
    - python=3.10.10

The following packages will be downloaded:
    package                    |            build
    ---------------------------|-----------------
    _libgcc_mutex-0.1          |             main           3 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    _openmp_mutex-5.1          |            1_gnu          21 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    bzip2-1.0.8                |       h5eee18b_6         262 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    ca-certificates-2024.3.11  |       h06a4308_0         127 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    ld_impl_linux-64-2.38      |       h1181459_1         654 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libffi-3.4.4               |       h6a678d5_1         141 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libgcc-ng-11.2.0           |       h1234567_1         5.3 MB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libgomp-11.2.0             |       h1234567_1         474 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libstdcxx-ng-11.2.0        |       h1234567_1         4.7 MB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libuuid-1.41.5             |       h5eee18b_0          27 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    ncurses-6.4                |       h6a678d5_0         914 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    openssl-1.1.1w             |       h7f8727e_0         3.7 MB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    pip-24.0                   |  py310h06a4308_0         2.7 MB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    python-3.10.10             |       h7a1cb2a_2        26.9 MB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    readline-8.2               |       h5eee18b_0         357 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    setuptools-69.5.1          |  py310h06a4308_0        1012 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    sqlite-3.45.3              |       h5eee18b_0         1.2 MB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    tk-8.6.14                  |       h39e8969_0         3.4 MB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    tzdata-2024a               |       h04d1e81_0         116 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    wheel-0.43.0               |  py310h06a4308_0         110 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    xz-5.4.6                   |       h5eee18b_1         643 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    zlib-1.2.13                |       h5eee18b_1         111 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    ------------------------------------------------------------
                                           Total:        52.8 MB

The following NEW packages will be INSTALLED:
  _libgcc_mutex      anaconda/pkgs/main/linux-64::_libgcc_mutex-0.1-main
  _openmp_mutex      anaconda/pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu
  bzip2              anaconda/pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6
  ca-certificates    anaconda/pkgs/main/linux-64::ca-certificates-2024.3.11-h06a4308_0
  ld_impl_linux-64   anaconda/pkgs/main/linux-64::ld_impl_linux-64-2.38-h1181459_1
  libffi             anaconda/pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1
  libgcc-ng          anaconda/pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1
  libgomp            anaconda/pkgs/main/linux-64::libgomp-11.2.0-h1234567_1
  libstdcxx-ng       anaconda/pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1
  libuuid            anaconda/pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0
  ncurses            anaconda/pkgs/main/linux-64::ncurses-6.4-h6a678d5_0
  openssl            anaconda/pkgs/main/linux-64::openssl-1.1.1w-h7f8727e_0
  pip                anaconda/pkgs/main/linux-64::pip-24.0-py310h06a4308_0
  python             anaconda/pkgs/main/linux-64::python-3.10.10-h7a1cb2a_2
  readline           anaconda/pkgs/main/linux-64::readline-8.2-h5eee18b_0
  setuptools         anaconda/pkgs/main/linux-64::setuptools-69.5.1-py310h06a4308_0
  sqlite             anaconda/pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0
  tk                 anaconda/pkgs/main/linux-64::tk-8.6.14-h39e8969_0
  tzdata             anaconda/pkgs/main/noarch::tzdata-2024a-h04d1e81_0
  wheel              anaconda/pkgs/main/linux-64::wheel-0.43.0-py310h06a4308_0
  xz                 anaconda/pkgs/main/linux-64::xz-5.4.6-h5eee18b_1
  zlib               anaconda/pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1
Downloading and Extracting Packages
libffi-3.4.4         | 141 KB    | ##################################### | 100% 
_openmp_mutex-5.1    | 21 KB     | ##################################### | 100% 
xz-5.4.6             | 643 KB    | ##################################### | 100% 
tzdata-2024a         | 116 KB    | ##################################### | 100% 
_libgcc_mutex-0.1    | 3 KB      | ##################################### | 100% 
zlib-1.2.13          | 111 KB    | ##################################### | 100% 
bzip2-1.0.8          | 262 KB    | ##################################### | 100% 
libuuid-1.41.5       | 27 KB     | ##################################### | 100% 
ca-certificates-2024 | 127 KB    | ##################################### | 100% 
libstdcxx-ng-11.2.0  | 4.7 MB    | ##################################### | 100% 
ncurses-6.4          | 914 KB    | ##################################### | 100% 
openssl-1.1.1w       | 3.7 MB    | ##################################### | 100% 
wheel-0.43.0         | 110 KB    | ##################################### | 100% 
python-3.10.10       | 26.9 MB   | ##################################### | 100% 
pip-24.0             | 2.7 MB    | ##################################### | 100% 
readline-8.2         | 357 KB    | ##################################### | 100% 
tk-8.6.14            | 3.4 MB    | ##################################### | 100% 
setuptools-69.5.1    | 1012 KB   | ##################################### | 100% 
libgcc-ng-11.2.0     | 5.3 MB    | ##################################### | 100% 
ld_impl_linux-64-2.3 | 654 KB    | ##################################### | 100% 
libgomp-11.2.0       | 474 KB    | ##################################### | 100% 
sqlite-3.45.3        | 1.2 MB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate python-3.10.10
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Collecting ipykernel
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/53/9d/40d5207db523363d9b5698f33778c18b0d591e3fdb6e0116b894b2a2491c/ipykernel-6.29.4-py3-none-any.whl (117 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 117.1/117.1 kB 10.6 MB/s eta 0:00:00
......

  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/80/03/6ea8b1b2a5ab40a7a60dc464d3daa7aa546e0a74d74a9f8ff551ea7905db/executing-2.0.1-py2.py3-none-any.whl (24 kB)
Collecting asttokens>=2.1.0 (from stack-data->ipython>=7.23.1->ipykernel)
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/45/86/4736ac618d82a20d87d2f92ae19441ebc7ac9e7a581d7e58bbe79233b24a/asttokens-2.4.1-py2.py3-none-any.whl (27 kB)
Collecting pure-eval (from stack-data->ipython>=7.23.1->ipykernel)
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/2b/27/77f9d5684e6bce929f5cfe18d6cfbe5133013c06cb2fbf5933670e60761d/pure_eval-0.2.2-py3-none-any.whl (11 kB)
Installing collected packages: wcwidth, pure-eval, ptyprocess, typing-extensions, traitlets, tornado, six, pyzmq, pygments, psutil, prompt-toolkit, platformdirs, pexpect, parso, packaging, nest-asyncio, executing, exceptiongroup, decorator, debugpy, python-dateutil, matplotlib-inline, jupyter-core, jedi, comm, asttokens, stack-data, jupyter-client, ipython, ipykernel
Successfully installed asttokens-2.4.1 comm-0.2.2 debugpy-1.8.1 decorator-5.1.1 exceptiongroup-1.2.1 executing-2.0.1 ipykernel-6.29.4 ipython-8.25.0 jedi-0.19.1 jupyter-client-8.6.2 jupyter-core-5.7.2 matplotlib-inline-0.1.7 nest-asyncio-1.6.0 packaging-24.0 parso-0.8.4 pexpect-4.9.0 platformdirs-4.2.2 prompt-toolkit-3.0.46 psutil-5.9.8 ptyprocess-0.7.0 pure-eval-0.2.2 pygments-2.18.0 python-dateutil-2.9.0.post0 pyzmq-26.0.3 six-1.16.0 stack-data-0.6.3 tornado-6.4 traitlets-5.14.3 typing-extensions-4.12.1 wcwidth-0.2.13
import json
import os

# register the new conda env as a Jupyter kernel by writing a kernel.json
data = {
   "display_name": "python-3.10.10",
   "env": {
      "PATH": "/home/ma-user/anaconda3/envs/python-3.10.10/bin:/home/ma-user/anaconda3/envs/python-3.7.10/bin:/modelarts/authoring/notebook-conda/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/anaconda3/envs/PyTorch-1.8/bin"
   },
   "language": "python",
   "argv": [
      "/home/ma-user/anaconda3/envs/python-3.10.10/bin/python",
      "-m",
      "ipykernel",
      "-f",
      "{connection_file}"
   ]
}

if not os.path.exists("/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/"):
    os.mkdir("/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/")

with open('/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/kernel.json', 'w') as f:
    json.dump(data, f, indent=4)
conda env list
/home/ma-user/anaconda3/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
# conda environments:
#
base                  *  /home/ma-user/anaconda3
python-3.10.10           /home/ma-user/anaconda3/envs/python-3.10.10
python-3.7.10            /home/ma-user/anaconda3/envs/python-3.7.10
Note: you may need to restart the kernel to use updated packages.

After the environment is created, wait a moment or refresh the page, then click the kernel selector in the upper-right corner and choose python-3.10.10.
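To confirm that the new kernel was registered, you can optionally list the installed kernelspecs:

!jupyter kernelspec list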

Check the Python version:

!python -V
Python 3.10.10

Check the available GPU; at least 32 GB of GPU memory is required:

!nvidia-smi
Wed Jun  5 16:22:37 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:00:0D.0 Off |                    0 |
| N/A   28C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Install the dependencies:

!pip install --upgrade pip
!pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 xformers==0.0.22
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Requirement already satisfied: pip in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (24.0)
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Collecting torch==2.0.1
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/8c/4d/17e07377c9c3d1a0c4eb3fde1c7c16b5a0ce6133ddbabc08ceef6b7f2645/torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 619.9/619.9 MB 8.2 MB/s eta 0:00:00
Collecting torchvision==0.15.2
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/87/0f/88f023bf6176d9af0f85feedf4be129f9cf2748801c4d9c690739a10c100/torchvision-0.15.2-cp310-cp310-manylinux1_x86_64.whl (6.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.0/6.0 MB 109.5 MB/s eta 0:00:00
Collecting torchaudio==2.0.2
  Downloading

Collecting certifi>=2017.4.17 (from requests->torchvision==0.15.2)
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/5b/11/1e78951465b4a225519b8c3ad29769c49e0d8d157a070f681d5b6d64737f/certifi-2024.6.2-py3-none-any.whl (164 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 164.4/164.4 kB 23.1 MB/s eta 0:00:00
Collecting mpmath<1.4.0,>=1.1.0 (from sympy->torch==2.0.1)
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 32.8 MB/s eta 0:00:00
Installing collected packages: mpmath, lit, urllib3, sympy, pillow, nvidia-nvtx-cu11, nvidia-nccl-cu11, nvidia-cusparse-cu11, nvidia-curand-cu11, nvidia-cufft-cu11, nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-cupti-cu11, nvidia-cublas-cu11, numpy, networkx, MarkupSafe, idna, filelock, cmake, charset-normalizer, certifi, requests, nvidia-cusolver-cu11, nvidia-cudnn-cu11, jinja2, triton, torch, xformers, torchvision, torchaudio
Successfully installed MarkupSafe-2.1.5 certifi-2024.6.2 charset-normalizer-3.3.2 cmake-3.29.3 filelock-3.14.0 idna-3.7 jinja2-3.1.4 lit-18.1.6 mpmath-1.3.0 networkx-3.3 numpy-1.26.4 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 pillow-10.3.0 requests-2.32.3 sympy-1.12.1 torch-2.0.1 torchaudio-2.0.2 torchvision-0.15.2 triton-2.0.0 urllib3-2.2.1 xformers-0.0.22
%cd Open-Sora
/home/ma-user/work/ma_share/open-spra_1/Open-Sora
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library.
  self.shell.db['dhist'] = compress_dhist(dhist)[-100:]
'/home/ma-user/work/ma_share/open-spra_1/Open-Sora'
!pip install colossalai==0.3.6 accelerate==0.29.2 diffusers==0.27.2 ftfy==6.2.0 gdown==5.1.0 mmengine==0.10.3 pre-commit==3.7.0 pyav==12.0.5 tensorboard==2.16.2 timm==0.9.16 transformers==4.39.3 wandb==0.16.6
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Collecting colossalai==0.3.6
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/05/ed/57e80620ea8e35c3aa63a3207720b1890700fd12eea38b6592e9833e5c1b/colossalai-0.3.6.tar.gz (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 36.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting accelerate==0.29.2
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/1b/e8/2fc7af3fa77ddac89a9c9b390d2d31d1db0612247ba2274009946959604e/accelerate-0.29.2-py3-none-any.whl (297 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 297.4/297.4 kB 14.5 MB/s eta 0:00:00
Collecting diffusers==0.27.2
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/75/c5/3b84fd731dd93c549a0c25657e4ce5a957aeccd32d60dba2958cd3cdac23/diffusers-0.27.2-py3-none-any.whl (2.0 MB)
!pip install .
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Processing /home/ma-user/work/ma_share/open-spra_1/Open-Sora
  Preparing metadata (setup.py) ... done
Requirement already satisfied: colossalai in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (0.3.6)
Requirement already satisfied: accelerate in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (0.29.2)
Requirement already satisfied: diffusers in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (0.27.2)
Requirement already satisfied: ftfy in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (6.2.0)
Requirement already satisfied: gdown in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (5.1.0)
Requirement already satisfied: mmengine in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (0.10.3)
Collecting pandas (from opensora==1.1.0)
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/89/1b/12521efcbc6058e2673583bb096c2b5046a9df39bd73eca392c1efed24e5/pandas-2.2.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.0/13.0 MB 60.4 MB/s eta 0:00:00
Requirement already satisfied: pre-commit in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (3.7.0)
Collecting pyarrow (from opensora==1.1.0)
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/91/83/57572c088ec185582f04b607d545a4a6ef7599c0a3c1e60d397743b0d609/pyarrow-16.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (40.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.9/40.9 MB 36.9 MB/s eta 0:00:00
Collecting av (from opensora==1.1.0)
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/0a/11/2b501d0a4de22826217a0b909e832f52fb5d503df50f424f3e31023a7bcc/av-12.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34.3/34.3 MB 96.1 MB/s eta 0:00:00
Requirement already satisfied: tensorboard in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (2.16.2)
Requirement already satisfied: timm in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (0.9.16)
Requirement already satisfied: tqdm in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (4.66.4)
Requirement already satisfied: transformers in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (4.39.3)
Requirement already satisfied: wandb in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (0.16.6)
Collecting rotary_embedding_torch (from opensora==1.1.0)
  Downloading 
Building wheels for collected packages: opensora, pandarallel
  Building wheel for opensora (setup.py) ... done
  Created wheel for opensora: filename=opensora-1.1.0-py3-none-any.whl size=195249 sha256=86c66de7ded305b2e4fb07992d0147c0408086cc31cdc31d97bcea44d8f69596
  Stored in directory: /home/ma-user/.cache/pip/wheels/ae/34/85/7f84dd36f2e448d8d4455272d3358f557d0a570011d1701074
  Building wheel for pandarallel (setup.py) ... done
  Created wheel for pandarallel: filename=pandarallel-1.6.5-py3-none-any.whl size=16673 sha256=b97386c92d34443f19cc88ea717c6cca143ef2b8f1f1ac79f4645c37d230bafc
  Stored in directory: /home/ma-user/.cache/pip/wheels/f6/dd/25/a1c3775e721641ff67c71b3652e901e7e52611c6c3091784c9
Successfully built opensora pandarallel
Installing collected packages: pytz, tzdata, pyarrow, dill, beartype, av, pandas, pandarallel, rotary_embedding_torch, opensora
Successfully installed av-12.1.0 beartype-0.18.5 dill-0.3.8 opensora-1.1.0 pandarallel-1.6.5 pandas-2.2.2 pyarrow-16.1.0 pytz-2024.1 rotary_embedding_torch-0.6.2 tzdata-2024.1
!pip install spaces gradio MoviePy -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
!cp /home/ma-user/work/frpc_linux_amd64 /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2
!chmod +x /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting spaces
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b2/3c/6205090507ea96e6e56d0deda8d0fc4c507026ef3772e55b637a5d0b7c61/spaces-0.28.3-py3-none-any.whl (18 kB)
Collecting gradio
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d1/37/f49320600cdf1fa856cc605a2e20e9debd34b5425b53f49abdb2ea463716/gradio-4.32.2-py3-none-any.whl (12.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.3/12.3 MB 5.2 MB/s eta 0:00:00

      Successfully uninstalled decorator-5.1.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fabric 3.2.2 requires decorator>=5, but you have decorator 4.4.2 which is incompatible.
Successfully installed MoviePy-1.0.3 aiofiles-23.2.1 altair-5.3.0 anyio-4.4.0 decorator-4.4.2 dnspython-2.6.1 email_validator-2.1.1 fastapi-0.111.0 fastapi-cli-0.0.4 ffmpy-0.3.2 gradio-4.32.2 gradio-client-0.17.0 h11-0.14.0 httpcore-1.0.5 httptools-0.6.1 httpx-0.27.0 imageio-2.34.1 imageio_ffmpeg-0.5.1 importlib-resources-6.4.0 orjson-3.10.3 proglog-0.1.10 pydub-0.25.1 python-dotenv-1.0.1 python-multipart-0.0.9 ruff-0.4.7 semantic-version-2.10.0 shellingham-1.5.4 sniffio-1.3.1 spaces-0.28.3 starlette-0.37.2 tomlkit-0.12.0 toolz-0.12.1 typer-0.12.3 ujson-5.10.0 uvicorn-0.30.1 uvloop-0.19.0 watchfiles-0.22.0 websockets-11.0.3

3. Generate a video

Modify the model configuration file:

%%writefile configs/opensora-v1-1/inference/sample.py
num_frames = 16
frame_interval = 3
fps = 24
image_size = (240, 426)
multi_resolution = "STDiT2"

# Define model
model = dict(
    type="STDiT2-XL/2",
    from_pretrained="hpcai-tech/OpenSora-STDiT-v2-stage3",  # weights downloaded from Hugging Face
    input_sq_size=512,
    qk_norm=True,
    enable_flash_attn=True,
    enable_layernorm_kernel=True,
)
vae = dict(
    type="VideoAutoencoderKL",
    from_pretrained="./opensora/models/sd-vae-ft-ema",
    cache_dir=None,  # changed to load from the local directory
    micro_batch_size=4,
)
text_encoder = dict(
    type="t5",
    from_pretrained="./opensora/models/text_encoder/t5-v1_1-xxl",
    cache_dir=None,  # changed to load from the local directory
    model_max_length=200,
)
scheduler = dict(
    type="iddpm",
    num_sampling_steps=100,
    cfg_scale=7.0,
    cfg_channel=3,  # or None
)
dtype = "fp16"

# Condition
prompt_path = "./assets/texts/t2v_samples.txt"
prompt = None  # prompt has higher priority than prompt_path

# Others
batch_size = 1
seed = 42
save_dir = "./samples/samples/"
Overwriting configs/opensora-v1-1/inference/sample.py
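As a quick sanity check on these settings: Open-Sora saves clips at fps // frame_interval frames per second (see the save_sample call in the Gradio app later), so the values used in the command below imply a 4-second clip. A small illustrative sketch:

num_frames = 32       # passed via --num-frames in the command below
frame_interval = 3
fps = 24

saved_fps = fps // frame_interval   # 24 // 3 = 8 frames per second in the saved file
duration = num_frames / saved_fps   # 32 / 8 = 4.0 seconds
print(saved_fps, duration)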
import os
​
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
!cp /home/ma-user/work/t5.py /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/opensora/models/text_encoder/t5.py
# text to video
!python scripts/inference.py configs/opensora-v1-1/inference/sample.py --prompt "A fashion girl walking on the streets of Tokyo" --num-frames 32 --image-size 240 426
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/colossalai/shardformer/layer/normalization.py:45: UserWarning: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel
  warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel")
Config (path: configs/opensora-v1-1/inference/sample.py): {'num_frames': 32, 'frame_interval': 3, 'fps': 24, 'image_size': [240, 426], 'multi_resolution': 'STDiT2', 'model': {'type': 'STDiT2-XL/2', 'from_pretrained': 'hpcai-tech/OpenSora-STDiT-v2-stage3', 'input_sq_size': 512, 'qk_norm': True, 'enable_flash_attn': True, 'enable_layernorm_kernel': True}, 'vae': {'type': 'VideoAutoencoderKL', 'from_pretrained': './opensora/models/sd-vae-ft-ema', 'cache_dir': None, 'micro_batch_size': 4}, 'text_encoder': {'type': 't5', 'from_pretrained': './opensora/models/text_encoder/t5-v1_1-xxl', 'cache_dir': None, 'model_max_length': 200}, 'scheduler': {'type': 'iddpm', 'num_sampling_steps': 100, 'cfg_scale': 7.0, 'cfg_channel': 3}, 'dtype': 'fp16', 'prompt_path': './assets/texts/t2v_samples.txt', 'prompt': ['A fashion girl walking on the streets of Tokyo'], 'batch_size': 1, 'seed': 42, 'save_dir': './samples/samples/', 'config': 'configs/opensora-v1-1/inference/sample.py', 'prompt_as_path': False, 'reference_path': None, 'loop': 1, 'sample_name': None, 'num_sample': 1}
Loading checkpoint shards:   0%|                          | 0/2 [00:00<?, ?it/s]/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:35<00:00, 17.87s/it]
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
100%|█████████████████████████████████████████| 100/100 [02:11<00:00,  1.32s/it]
Prompt: A fashion girl walking on the streets of Tokyo
Saved to ./samples/samples/sample_0.mp4

The generated videos are saved in the Open-Sora/samples folder; pick one at random to view it:

import os
import random
from moviepy.editor import *
from IPython.display import Image

# directory where the videos are stored
video_root = 'samples/samples'
# list all files
videos = os.listdir(video_root)
# pick one video at random
video = random.sample(videos, 1)[0]
# input path of the video
video_path = os.path.join(video_root, video)
# load the original video
clip = VideoFileClip(video_path)
# save it as a GIF
clip.write_gif("output_animation.gif", fps=10)
# display the result
Image(open('output_animation.gif','rb').read())
MoviePy - Building file output_animation.gif with imageio.
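If you just want to preview the mp4 inline instead of converting it to a GIF, IPython can embed it directly (an optional alternative):

from IPython.display import Video

# embed the generated mp4 in the notebook output
Video(video_path, embed=True, width=426)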

4. Gradio interface

Modify the configuration file:

%%writefile configs/opensora-v1-1/inference/sample-ref.py
num_frames = 16
frame_interval = 3
fps = 24
image_size = (240, 426)
multi_resolution = "STDiT2"

# Condition
prompt_path = None
prompt = [
    "A car driving on the ocean.",
    "In an ornate, historical hall, a massive tidal wave peaks and begins to crash. Two surfers, seizing the moment, skillfully navigate the face of the wave.",
]

loop = 2
condition_frame_length = 4
# (
#   loop id, [the loop index of the condition image or video]
#   reference id, [the index of the condition image or video in the reference_path]
#   reference start, [the start frame of the condition image or video]
#   target start, [the location to insert]
#   length, [the number of frames to insert]
#   edit_ratio [the edit rate of the condition image or video]
# )
# See https://github.com/hpcaitech/Open-Sora/blob/main/docs/config.md#advanced-inference-config for more details
# See https://github.com/hpcaitech/Open-Sora/blob/main/docs/commands.md#inference-with-open-sora-11 for more examples
mask_strategy = [
    "0,0,0,0,8,0.3",
    None,
    "0",
]
reference_path = [
    "https://cdn.openai.com/tmp/s/interp/d0.mp4",
    None,
    "assets/images/condition/wave.png",
]

# Define model
model = dict(
    type="STDiT2-XL/2",
    from_pretrained="hpcai-tech/OpenSora-STDiT-v2-stage3",  # weights downloaded from Hugging Face
    input_sq_size=512,
    qk_norm=True,
    enable_flash_attn=True,
    enable_layernorm_kernel=True,
)
vae = dict(
    type="VideoAutoencoderKL",
    from_pretrained="./opensora/models/sd-vae-ft-ema",
    cache_dir=None,  # changed to load from the local directory
    micro_batch_size=4,
)
text_encoder = dict(
    type="t5",
    from_pretrained="./opensora/models/text_encoder/t5-v1_1-xxl",
    cache_dir=None,  # changed to load from the local directory
    model_max_length=200,
)
scheduler = dict(
    type="iddpm",
    num_sampling_steps=100,
    cfg_scale=7.0,
    cfg_channel=3,  # or None
)
dtype = "fp16"

# Others
batch_size = 1
seed = 42
save_dir = "./samples/samples/"
Overwriting configs/opensora-v1-1/inference/sample-ref.py
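To make the mask_strategy tuple format concrete, the first entry above unpacks as follows (an illustrative sketch mirroring the field order documented in the comment block):

# "0,0,0,0,8,0.3": in loop 0, take reference 0 starting at its frame 0,
# insert it at target frame 0 for 8 frames, with an edit ratio of 0.3
loop_id, ref_id, ref_start, target_start, length, edit_ratio = "0,0,0,0,8,0.3".split(",")
print(loop_id, ref_id, ref_start, target_start, length, edit_ratio)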

Modify the Gradio app:

%%writefile gradio/app-ref.py
import argparse
import importlib
import os
import subprocess
import sys
import re
import json
import math
import spaces
import torch
import gradio as gr
from tempfile import NamedTemporaryFile
import datetime
from transformers import pipeline

# translate Chinese prompts to English before feeding them to the model
zh2en = pipeline("translation", model="./opus-mt-zh-en")

MODEL_TYPES = ["v1.1-stage2", "v1.1-stage3"]
CONFIG_MAP = {
    "v1.1-stage2": "configs/opensora-v1-1/inference/sample-ref.py",
    "v1.1-stage3": "configs/opensora-v1-1/inference/sample-ref.py",
}
HF_STDIT_MAP = {
    "v1.1-stage2": "hpcai-tech/OpenSora-STDiT-v2-stage2",
    "v1.1-stage3": "hpcai-tech/OpenSora-STDiT-v2-stage3",
}
RESOLUTION_MAP = {
    "144p": {
        "16:9": (256, 144), 
        "9:16": (144, 256),
        "4:3": (221, 165),
        "3:4": (165, 221),
        "1:1": (192, 192),
    },
    "240p": {
        "16:9": (426, 240), 
        "9:16": (240, 426),
        "4:3": (370, 278),
        "3:4": (278, 370),
        "1:1": (320, 320),
    },
    "360p": {
        "16:9": (640, 360), 
        "9:16": (360, 640),
        "4:3": (554, 416),
        "3:4": (416, 554),
        "1:1": (480, 480),
    },
    "480p": {
        "16:9": (854, 480), 
        "9:16": (480, 854),
        "4:3": (740, 555),
        "3:4": (555, 740),
        "1:1": (640, 640),
    },
    "720p": {
        "16:9": (1280, 720),
        "9:16": (720, 1280),
        "4:3": (1108, 832),
        "3:4": (832, 1108),
        "1:1": (960, 960),
    },
}


# ============================
# Utils
# ============================
def collect_references_batch(reference_paths, vae, image_size):
    from opensora.datasets.utils import read_from_path

    refs_x = []
    for reference_path in reference_paths:
        if reference_path is None:
            refs_x.append([])
            continue
        ref_path = reference_path.split(";")
        ref = []
        for r_path in ref_path:
            r = read_from_path(r_path, image_size, transform_name="resize_crop")
            r_x = vae.encode(r.unsqueeze(0).to(vae.device, vae.dtype))
            r_x = r_x.squeeze(0)
            ref.append(r_x)
        refs_x.append(ref)
    # refs_x: [batch, ref_num, C, T, H, W]
    return refs_x


def process_mask_strategy(mask_strategy):
    mask_batch = []
    mask_strategy = mask_strategy.split(";")
    for mask in mask_strategy:
        mask_group = mask.split(",")
        assert len(mask_group) >= 1 and len(mask_group) <= 6, f"Invalid mask strategy: {mask}"
        if len(mask_group) == 1:
            mask_group.extend(["0", "0", "0", "1", "0"])
        elif len(mask_group) == 2:
            mask_group.extend(["0", "0", "1", "0"])
        elif len(mask_group) == 3:
            mask_group.extend(["0", "1", "0"])
        elif len(mask_group) == 4:
            mask_group.extend(["1", "0"])
        elif len(mask_group) == 5:
            mask_group.append("0")
        mask_batch.append(mask_group)
    return mask_batch


def apply_mask_strategy(z, refs_x, mask_strategys, loop_i):
    masks = []
    for i, mask_strategy in enumerate(mask_strategys):
        mask = torch.ones(z.shape[2], dtype=torch.float, device=z.device)
        if mask_strategy is None:
            masks.append(mask)
            continue
        mask_strategy = process_mask_strategy(mask_strategy)
        for mst in mask_strategy:
            loop_id, m_id, m_ref_start, m_target_start, m_length, edit_ratio = mst
            loop_id = int(loop_id)
            if loop_id != loop_i:
                continue
            m_id = int(m_id)
            m_ref_start = int(m_ref_start)
            m_length = int(m_length)
            m_target_start = int(m_target_start)
            edit_ratio = float(edit_ratio)
            ref = refs_x[i][m_id]  # [C, T, H, W]
            if m_ref_start < 0:
                m_ref_start = ref.shape[1] + m_ref_start
            if m_target_start < 0:
                # z: [B, C, T, H, W]
                m_target_start = z.shape[2] + m_target_start
            z[i, :, m_target_start : m_target_start + m_length] = ref[:, m_ref_start : m_ref_start + m_length]
            mask[m_target_start : m_target_start + m_length] = edit_ratio
        masks.append(mask)
    masks = torch.stack(masks)
    return masks


def process_prompts(prompts, num_loop):
    from opensora.models.text_encoder.t5 import text_preprocessing

    ret_prompts = []
    for prompt in prompts:
        if prompt.startswith("|0|"):
            prompt_list = prompt.split("|")[1:]
            text_list = []
            for i in range(0, len(prompt_list), 2):
                start_loop = int(prompt_list[i])
                text = prompt_list[i + 1]
                text = text_preprocessing(text)
                end_loop = int(prompt_list[i + 2]) if i + 2 < len(prompt_list) else num_loop
                text_list.extend([text] * (end_loop - start_loop))
            assert len(text_list) == num_loop, f"Prompt loop mismatch: {len(text_list)} != {num_loop}"
            ret_prompts.append(text_list)
        else:
            prompt = text_preprocessing(prompt)
            ret_prompts.append([prompt] * num_loop)
    return ret_prompts


def extract_json_from_prompts(prompts):
    additional_infos = []
    ret_prompts = []
    for prompt in prompts:
        parts = re.split(r"(?=[{\[])", prompt)
        assert len(parts) <= 2, f"Invalid prompt: {prompt}"
        ret_prompts.append(parts[0])
        if len(parts) == 1:
            additional_infos.append({})
        else:
            additional_infos.append(json.loads(parts[1]))
    return ret_prompts, additional_infos


# ============================
# Model-related
# ============================
def read_config(config_path):
    """
    Read the configuration file.
    """
    from mmengine.config import Config

    return Config.fromfile(config_path)


def build_models(model_type, config, enable_optimization=False):
    """
    Build the models for the given model type and configuration.
    """
    # build vae
    from opensora.registry import MODELS, build_module

    vae = build_module(config.vae, MODELS).cuda()

    # build text encoder
    text_encoder = build_module(config.text_encoder, MODELS)  # T5 must be fp32
    text_encoder.t5.model = text_encoder.t5.model.cuda()

    # build stdit
    # we load model from HuggingFace directly so that we don't need to
    # handle model download logic in HuggingFace Space
    from opensora.models.stdit.stdit2 import STDiT2

    stdit = STDiT2.from_pretrained(
        HF_STDIT_MAP[model_type],
        enable_flash_attn=enable_optimization,
        trust_remote_code=True,
    ).cuda()

    # build scheduler
    from opensora.registry import SCHEDULERS

    scheduler = build_module(config.scheduler, SCHEDULERS)

    # hack for classifier-free guidance
    text_encoder.y_embedder = stdit.y_embedder

    # move models to device
    vae = vae.to(torch.float16).eval()
    text_encoder.t5.model = text_encoder.t5.model.eval()  # t5 must be in fp32
    stdit = stdit.to(torch.float16).eval()

    # clear cuda
    torch.cuda.empty_cache()
    return vae, text_encoder, stdit, scheduler
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model-type",
        default="v1.1-stage3",
        choices=MODEL_TYPES,
        help=f"The type of model to run for the Gradio App, can only be {MODEL_TYPES}",
    )
    parser.add_argument("--output", default="./outputs", type=str, help="The path to the output folder")
    parser.add_argument("--port", default=None, type=int, help="The port to run the Gradio App on.")
    parser.add_argument("--host", default=None, type=str, help="The host to run the Gradio App on.")
    parser.add_argument("--share", action="store_true", help="Whether to share this gradio demo.")
    parser.add_argument(
        "--enable-optimization",
        action="store_true",
        help="Whether to enable optimization such as flash attention and fused layernorm",
    )
    return parser.parse_args()


# ============================
# Main Gradio Script
# ============================
# `run_inference` needs to be wrapped by `spaces.GPU` and its input can only be the prompt text,
# so we can't pass the models to `run_inference` as arguments;
# instead, we define them globally so that we can access them inside `run_inference`

# read config
args = parse_args()
config = read_config(CONFIG_MAP[args.model_type])

# make outputs dir
os.makedirs(args.output, exist_ok=True)

# disable torch jit as it can cause failure in gradio SDK
# gradio sdk uses torch with cuda 11.3
torch.jit._state.disable()

# import after installation
from opensora.datasets import IMG_FPS, save_sample
from opensora.utils.misc import to_torch_dtype

# some global variables
dtype = to_torch_dtype(config.dtype)
device = torch.device("cuda")

# build model
vae, text_encoder, stdit, scheduler = build_models(args.model_type, config, enable_optimization=args.enable_optimization)


def run_inference(mode, prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale):
    torch.manual_seed(seed)
    with torch.inference_mode():
        # ======================
        # 1. Preparation
        # ======================
        # parse the inputs
        resolution = RESOLUTION_MAP[resolution][aspect_ratio]

        # gather args from config
        num_frames = config.num_frames
        frame_interval = config.frame_interval
        fps = config.fps
        condition_frame_length = config.condition_frame_length

        # compute number of loops
        if mode == "Text2Image":
            num_frames = 1
            num_loop = 1
        else:
            num_seconds = int(length.rstrip('s'))
            if num_seconds <= 16:
                num_frames = num_seconds * fps // frame_interval
                num_loop = 1
            else:
                config.num_frames = 16
                total_number_of_frames = num_seconds * fps / frame_interval
                num_loop = math.ceil((total_number_of_frames - condition_frame_length) / (num_frames - condition_frame_length))

        # prepare model args
        if config.num_frames == 1:
            fps = IMG_FPS

        model_args = dict()
        height_tensor = torch.tensor([resolution[0]], device=device, dtype=dtype)
        width_tensor = torch.tensor([resolution[1]], device=device, dtype=dtype)
        num_frames_tensor = torch.tensor([num_frames], device=device, dtype=dtype)
        ar_tensor = torch.tensor([resolution[0] / resolution[1]], device=device, dtype=dtype)
        fps_tensor = torch.tensor([fps], device=device, dtype=dtype)
        model_args["height"] = height_tensor
        model_args["width"] = width_tensor
        model_args["num_frames"] = num_frames_tensor
        model_args["ar"] = ar_tensor
        model_args["fps"] = fps_tensor

        # compute latent size
        input_size = (num_frames, *resolution)
        latent_size = vae.get_latent_size(input_size)

        # process prompt
        prompt = zh2en(prompt_text)[0].get("translation_text")
        prompt_raw = [prompt]
        print(prompt_raw)
        prompt_raw, _ = extract_json_from_prompts(prompt_raw)
        prompt_loops = process_prompts(prompt_raw, num_loop)
        video_clips = []

        # prepare mask strategy
        if mode == "Text2Image":
            mask_strategy = [None]
        elif mode == "Text2Video":
            if reference_image is not None:
                mask_strategy = ['0']
            else:
                mask_strategy = [None]
        else:
            raise ValueError(f"Invalid mode: {mode}")

        # =========================
        # 2. Load reference images
        # =========================
        if mode == "Text2Image":
            refs_x = collect_references_batch([None], vae, resolution)
        elif mode == "Text2Video":
            if reference_image is not None:
                # save image to disk
                from PIL import Image
                im = Image.fromarray(reference_image)
​
                with NamedTemporaryFile(suffix=".jpg") as temp_file:
                    im.save(temp_file.name)
                    refs_x = collect_references_batch([temp_file.name], vae, resolution)
            else:
                refs_x = collect_references_batch([None], vae, resolution)
        else:
            raise ValueError(f"Invalid mode: {mode}")

        # 4.3. long video generation
        for loop_i in range(num_loop):
            # 4.4 sample in hidden space
            batch_prompts = [prompt[loop_i] for prompt in prompt_loops]
            z = torch.randn(len(batch_prompts), vae.out_channels, *latent_size, device=device, dtype=dtype)

            # 4.5. apply mask strategy
            masks = None

            # if cfg.reference_path is not None:
            if loop_i > 0:
                ref_x = vae.encode(video_clips[-1])
                for j, refs in enumerate(refs_x):
                    if refs is None:
                        refs_x[j] = [ref_x[j]]
                    else:
                        refs.append(ref_x[j])
                    if mask_strategy[j] is None:
                        mask_strategy[j] = ""
                    else:
                        mask_strategy[j] += ";"
                    mask_strategy[j] += f"{loop_i},{len(refs)-1},-{condition_frame_length},0,{condition_frame_length}"

            masks = apply_mask_strategy(z, refs_x, mask_strategy, loop_i)

            # 4.6. diffusion sampling
            # hack to update num_sampling_steps and cfg_scale
            scheduler_kwargs = config.scheduler.copy()
            scheduler_kwargs.pop('type')
            scheduler_kwargs['num_sampling_steps'] = sampling_steps
            scheduler_kwargs['cfg_scale'] = cfg_scale

            scheduler.__init__(**scheduler_kwargs)
            samples = scheduler.sample(
                stdit,
                text_encoder,
                z=z,
                prompts=batch_prompts,
                device=device,
                additional_args=model_args,
                mask=masks,  # scheduler must support mask
            )
            samples = vae.decode(samples.to(dtype))
            video_clips.append(samples)

            # 4.7. save video
            if loop_i == num_loop - 1:
                video_clips_list = [video_clips[0][0]] + [
                    video_clips[i][0][:, config.condition_frame_length:]
                    for i in range(1, num_loop)
                ]
                video = torch.cat(video_clips_list, dim=1)
                current_datetime = datetime.datetime.now()
                timestamp = current_datetime.timestamp()
                save_path = os.path.join(args.output, f"output_{timestamp}")
                saved_path = save_sample(video, save_path=save_path, fps=config.fps // config.frame_interval)
                return saved_path


@spaces.GPU(duration=200)
def run_image_inference(prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale):
    return run_inference("Text2Image", prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale)


@spaces.GPU(duration=200)
def run_video_inference(prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale):
    return run_inference("Text2Video", prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale)


def main():
    # create demo
    with gr.Blocks() as demo:
        with gr.Row():
            with gr.Column():
                gr.HTML("""<h1 align="center">Open-Sora 1.1</h1>""")

        with gr.Row():
            with gr.Column():
                prompt_text = gr.Textbox(
                    label="Prompt",
                    placeholder="請輸入中文提示詞",  # "Enter a prompt in Chinese"; it is translated to English by the zh2en pipeline above
                    lines=4,
                )
                resolution = gr.Radio(
                     choices=["144p", "240p", "360p", "480p", "720p"],
                     value="240p",
                    label="Resolution", 
                )
                aspect_ratio = gr.Radio(
                     choices=["9:16", "16:9", "3:4", "4:3", "1:1"],
                     value="9:16",
                    label="Aspect Ratio (H:W)", 
                )
                length = gr.Radio(
                    choices=["2s", "4s", "8s", "16s"], 
                    value="2s",
                    label="Video Length (only effective for video generation)", 
                    info="8s may fail as Hugging Face ZeroGPU has the limitation of max 200 seconds inference time."
                )

                with gr.Row():
                    seed = gr.Slider(
                        value=1024,
                        minimum=1,
                        maximum=2048,
                        step=1,
                        label="Seed"
                    )

                    sampling_steps = gr.Slider(
                        value=100,
                        minimum=1,
                        maximum=200,
                        step=1,
                        label="Sampling steps"
                    )
                    cfg_scale = gr.Slider(
                        value=7.0,
                        minimum=0.0,
                        maximum=10.0,
                        step=0.1,
                        label="CFG Scale"
                    )
                
                reference_image = gr.Image(
                    label="Reference Image (Optional)",
                )
            
            with gr.Column():
                output_video = gr.Video(
                    label="Output Video",
                    height="100%"
                )

        with gr.Row():
             image_gen_button = gr.Button("Generate image")
             video_gen_button = gr.Button("Generate video")

        image_gen_button.click(
             fn=run_image_inference, 
             inputs=[prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale], 
             outputs=reference_image
             )
        video_gen_button.click(
             fn=run_video_inference, 
             inputs=[prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale], 
             outputs=output_video
             )

    # launch
    demo.launch(share=True, inbrowser=True)


if __name__ == "__main__":
    main()
Writing gradio/app-ref.py

Run the Gradio app. Once it has started successfully, click the link shown after "Running on public URL" to try it out!

!python gradio/app-ref.py
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/transformers/models/marian/tokenization_marian.py:197: UserWarning: Recommended: pip install sacremoses.
  warnings.warn("Recommended: pip install sacremoses.")
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/colossalai/shardformer/layer/normalization.py:45: UserWarning: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel
  warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel")
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:32<00:00, 16.15s/it]
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://64147712240bbb3753.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)

[Screenshot: the Open-Sora 1.1 Gradio UI]

We have also prepared some prompts for reference. Note that the app's input box expects Chinese prompts, which are machine-translated to English by the bundled opus-mt-zh-en model:

一隻穿著紫色長袍的胖兔子穿過奇幻的風景 (A fat rabbit wearing a purple robe walks through a fantasy landscape)

海浪衝擊著孤零零的燈塔,不祥的燈光 (Waves crash against a lone lighthouse, ominous lighting)

一個神秘的森林展示了旅行者的冒險經歷 (A mystical forest showcases a traveler's adventures)

一個藍頭髮的法師在唱歌 (A blue-haired mage singing)

一個超現實的景觀,漂浮的島嶼和空中的瀑布 (A surreal landscape with floating islands and waterfalls in the sky)

一隻藍鳥站在水裡 (A bluebird standing in the water)

一個年輕人獨自走在海邊 (A young man walking alone by the sea)

粉紅色的玫瑰在玻璃表面滴,特寫 (A pink rose with droplets on a glass surface, close-up)

驅車遠眺,一列地鐵正從隧道中駛出 (Viewed from a car, a subway train emerging from a tunnel)

太空中所有的行星都是綠色和粉色的,背景是明亮的白色恆星 (All the planets in space are green and pink, against a background of bright white stars)

一座漂浮在星體空間的城市,有星星和星雲 (A city floating in astral space, with stars and nebulae)

高樓頂上的日出 (Sunrise seen from the top of a skyscraper)

粉色和青色粉末爆炸 (An explosion of pink and cyan powder)

樹林裡的鹿在陽光下凝視著相機 (A deer in the woods gazing at the camera in the sunlight)

一道閃電,一個巫師從稀薄的空氣中出現了,他的長袍在風中翻騰 (A flash of lightning, and a wizard appears out of thin air, his robe billowing in the wind)

夜晚的未來賽博朋克城市景觀,高聳的霓虹燈照亮的摩天大樓 (A futuristic cyberpunk cityscape at night, with towering neon-lit skyscrapers)

在這裡,樹木、花朵和動物聚集在一起,譜寫出一曲大自然的交響樂 (Here, trees, flowers, and animals come together to compose a symphony of nature)

一艘幽靈般的船在雲層中航行,在月光下的天空中航行 (A ghostly ship sailing among the clouds in the moonlit sky)

日落和美麗的海灘 (A sunset and a beautiful beach)

一個年輕人獨自走在森林裡 (A young man walking alone in the forest)

The generated videos can also be scored with MusicGen, making short-video creation an end-to-end AI workflow.
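Below is a minimal sketch of that idea, assuming `pip install audiocraft` and that a clip from step 3 exists; the model name, music prompt, and paths are illustrative rather than part of this case:

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write
from moviepy.editor import VideoFileClip, AudioFileClip

video = VideoFileClip("samples/samples/sample_0.mp4")

# generate a soundtrack roughly matching the clip length (MusicGen caps out around 30 s)
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=int(video.duration) + 1)
wav = model.generate(["gentle ambient electronic music"])  # tensor of shape [B, C, T]
audio_write("soundtrack", wav[0].cpu(), model.sample_rate, strategy="loudness")  # writes soundtrack.wav

# mux the audio back into the video
scored = video.set_audio(AudioFileClip("soundtrack.wav").subclip(0, video.duration))
scored.write_videofile("sample_0_scored.mp4", audio_codec="aac")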

5. Video showcase

Prompt: An extreme close-up of a gray-haired man with a beard in his 60s. He is deep in thought, pondering the history of the universe as he sits at a café in Paris. His eyes focus on people offscreen as they walk by while he sits mostly motionless. He is dressed in a wool suit coat with a button-down shirt, wears a brown beret and glasses, and has a very professorial appearance. At the end he offers a subtle closed-mouth smile, as if he has found the answer to the mystery of life. The lighting is very cinematic, with golden light and the Parisian streets and city in the background; depth of field; cinematic 35mm film.

[Generated video]

Prompt: Drone view of waves crashing against the rugged cliffs along Big Sur's Garay Point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff's edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff's edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.

[Generated video]

Prompt: Soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color against the vibrant turquoise of the sea. Seabirds can be seen taking flight around the cliff's precipices. As the drone slowly moves through different angles, the changing sunlight casts shifting shadows that highlight the rugged textures of the cliff and the calm sea below. The water gently laps at the rock base and the greenery that clings to the top of the cliff, and the scene gives a sense of peaceful isolation at the fringes of the ocean. The video captures the essence of pristine natural beauty untouched by human structures.

[Generated video]

Prompt: A majestic, beautiful waterfall cascades down a cliff into a serene lake. The waterfall, with its powerful flow, is the central focus of the video. The surrounding landscape is lush and green, with trees and foliage adding to the natural beauty of the scene. The camera angle provides a bird's-eye view of the waterfall, allowing viewers to appreciate its full height and grandeur. The video is a stunning representation of nature's power and beauty.

[Generated video]

Prompt: A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlamps. The scene is a blur of motion, with cars speeding by and pedestrians navigating the crosswalks. The cityscape is a mix of towering buildings and illuminated signs, creating a vibrant and dynamic atmosphere. The perspective of the video is from a high angle, providing a bird's-eye view of the street and its surroundings. The overall style of the video is dynamic and energetic, capturing the essence of urban life at night.

[Generated video]

Prompt: A serene night scene in a forested area. The first frame shows a tranquil lake reflecting the star-filled sky above. The second frame reveals a beautiful sunset casting a warm glow over the landscape. The third frame showcases the night sky, filled with stars and a vibrant Milky Way. The video is a time-lapse, capturing the transition from day to night, with the lake and forest as a constant backdrop. The style is naturalistic, emphasizing the beauty of the night sky and the tranquility of the forest.

[Generated video]

Click Follow to be the first to learn about fresh Huawei Cloud technologies!
