Datawhale X 魔搭 AI夏令營(三)

W12w發表於2024-08-17

一.初識ComfyUI

1.ComfyUI 是GUI的一種,是基於節點工作的使用者介面,主要用於操作影像的生成技術,ComfyUI 的特別之處在於它採用了一種模組化的設計,把影像生成的過程分解成了許多小的步驟,每個步驟都是一個節點。這些節點可以連線起來形成一個工作流程,這樣使用者就可以根據需要定製自己的影像生成過程

2.詳細教程見魔搭教程https://www.yuque.com/office/yuque/0/2024/pptx/1169882/1720432237458-e4b10804-b9cb-401d-aa2c-6de04b5276e0.pptx?from=https%3A%2F%2Fwww.yuque.com%2F2ai%2Fmodel%2Fgutsk9ezeymuebq9

二.開始實踐

1.下載安裝ComfyUI的執行檔案和task1中(見Datawhale X 魔搭 AI夏令營(一))微調完成Lora檔案
git lfs install git clone https://www.modelscope.cn/datasets/maochase/kolors_test_comfyui.git mv kolors_test_comfyui/* ./ rm -rf kolors_test_comfyui/ mkdir -p /mnt/workspace/models/lightning_logs/version_0/checkpoints/ mv epoch=0-step=500.ckpt /mnt/workspace/models/lightning_logs/version_0/checkpoints/

2.一鍵執行安裝程式

3.當執行到最後一個節點的內容輸出了一個訪問的連結的時候,複製連結到瀏覽器中訪問
https://internal-api-drive-stream.feishu.cn/space/api/box/stream/download/preview/GRrbbu8DXo3XrhxYzHwcvbvRnpf/?preview_type=16

三.淺嘗ComfyUI工作流

1.不帶Lora的工作流樣例(先下載工作流指令碼kolors_example.json)

點選檢視程式碼
      
{
  "last_node_id": 15,
  "last_link_id": 18,
  "nodes": [
    {
      "id": 11,
      "type": "VAELoader",
      "pos": [
        1323,
        240
      ],
      "size": {
        "0": 315,
        "1": 58
      },
      "flags": {},
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            12
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "sdxl.vae.safetensors"
      ]
    },
    {
      "id": 10,
      "type": "VAEDecode",
      "pos": [
        1368,
        369
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 18
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 12,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            13
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 14,
      "type": "KolorsSampler",
      "pos": [
        1011,
        371
      ],
      "size": {
        "0": 315,
        "1": 222
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "kolors_model",
          "type": "KOLORSMODEL",
          "link": 16
        },
        {
          "name": "kolors_embeds",
          "type": "KOLORS_EMBEDS",
          "link": 17
        }
      ],
      "outputs": [
        {
          "name": "latent",
          "type": "LATENT",
          "links": [
            18
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KolorsSampler"
      },
      "widgets_values": [
        1024,
        1024,
        1000102404233412,
        "fixed",
        25,
        5,
        "EulerDiscreteScheduler"
      ]
    },
    {
      "id": 6,
      "type": "DownloadAndLoadKolorsModel",
      "pos": [
        201,
        368
      ],
      "size": {
        "0": 315,
        "1": 82
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "kolors_model",
          "type": "KOLORSMODEL",
          "links": [
            16
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "DownloadAndLoadKolorsModel"
      },
      "widgets_values": [
        "Kwai-Kolors/Kolors",
        "fp16"
      ]
    },
    {
      "id": 3,
      "type": "PreviewImage",
      "pos": [
        1366,
        468
      ],
      "size": [
        535.4001724243165,
        562.2001106262207
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 13
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 12,
      "type": "KolorsTextEncode",
      "pos": [
        519,
        529
      ],
      "size": [
        457.2893696934723,
        225.28656056301645
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "chatglm3_model",
          "type": "CHATGLM3MODEL",
          "link": 14,
          "slot_index": 0
        }
      ],
      "outputs": [
        {
          "name": "kolors_embeds",
          "type": "KOLORS_EMBEDS",
          "links": [
            17
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KolorsTextEncode"
      },
      "widgets_values": [
        "cinematic photograph of an astronaut riding a horse in space |\nillustration of a cat wearing a top hat and a scarf  |\nphotograph of a goldfish in a bowl |\nanime screencap of a red haired girl",
        "",
        1
      ]
    },
    {
      "id": 15,
      "type": "Note",
      "pos": [
        200,
        636
      ],
      "size": [
        273.5273818969726,
        149.55464588512064
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "properties": {
        "text": ""
      },
      "widgets_values": [
        "Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 13,
      "type": "DownloadAndLoadChatGLM3",
      "pos": [
        206,
        522
      ],
      "size": [
        274.5334274291992,
        58
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "outputs": [
        {
          "name": "chatglm3_model",
          "type": "CHATGLM3MODEL",
          "links": [
            14
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "DownloadAndLoadChatGLM3"
      },
      "widgets_values": [
        "fp16"
      ]
    }
  ],
  "links": [
    [
      12,
      11,
      0,
      10,
      1,
      "VAE"
    ],
    [
      13,
      10,
      0,
      3,
      0,
      "IMAGE"
    ],
    [
      14,
      13,
      0,
      12,
      0,
      "CHATGLM3MODEL"
    ],
    [
      16,
      6,
      0,
      14,
      0,
      "KOLORSMODEL"
    ],
    [
      17,
      12,
      0,
      14,
      1,
      "KOLORS_EMBEDS"
    ],
    [
      18,
      14,
      0,
      10,
      0,
      "LATENT"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1.1,
      "offset": {
        "0": -114.73954010009766,
        "1": -139.79705810546875
      }
    }
  },
  "version": 0.4
}

    

2.完成第一次生圖

https://internal-api-drive-stream.feishu.cn/space/api/box/stream/download/preview/DIr4bvsLQoCzexxEnIUc2xAmneb/?preview_type=16

https://internal-api-drive-stream.feishu.cn/space/api/box/stream/download/preview/G4XjbLn9YoXhkuxHvNJcZyKinmd/?preview_type=16
3. 帶Lora的工作流樣例(工作流指令碼kolors_with_lora_example.json)

點選檢視程式碼
      
{
  "last_node_id": 16,
  "last_link_id": 20,
  "nodes": [
    {
      "id": 11,
      "type": "VAELoader",
      "pos": [
        1323,
        240
      ],
      "size": {
        "0": 315,
        "1": 58
      },
      "flags": {},
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            12
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "sdxl.vae.safetensors"
      ]
    },
    {
      "id": 10,
      "type": "VAEDecode",
      "pos": [
        1368,
        369
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 18
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 12,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            13
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 15,
      "type": "Note",
      "pos": [
        200,
        636
      ],
      "size": {
        "0": 273.5273742675781,
        "1": 149.5546417236328
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "properties": {
        "text": ""
      },
      "widgets_values": [
        "Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 13,
      "type": "DownloadAndLoadChatGLM3",
      "pos": [
        206,
        522
      ],
      "size": {
        "0": 274.5334167480469,
        "1": 58
      },
      "flags": {},
      "order": 2,
      "mode": 0,
      "outputs": [
        {
          "name": "chatglm3_model",
          "type": "CHATGLM3MODEL",
          "links": [
            14
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "DownloadAndLoadChatGLM3"
      },
      "widgets_values": [
        "fp16"
      ]
    },
    {
      "id": 6,
      "type": "DownloadAndLoadKolorsModel",
      "pos": [
        201,
        368
      ],
      "size": {
        "0": 315,
        "1": 82
      },
      "flags": {},
      "order": 3,
      "mode": 0,
      "outputs": [
        {
          "name": "kolors_model",
          "type": "KOLORSMODEL",
          "links": [
            19
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "DownloadAndLoadKolorsModel"
      },
      "widgets_values": [
        "Kwai-Kolors/Kolors",
        "fp16"
      ]
    },
    {
      "id": 12,
      "type": "KolorsTextEncode",
      "pos": [
        519,
        529
      ],
      "size": {
        "0": 457.28936767578125,
        "1": 225.28656005859375
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "chatglm3_model",
          "type": "CHATGLM3MODEL",
          "link": 14,
          "slot_index": 0
        }
      ],
      "outputs": [
        {
          "name": "kolors_embeds",
          "type": "KOLORS_EMBEDS",
          "links": [
            17
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KolorsTextEncode"
      },
      "widgets_values": [
        "二次元,長髮,少女,白色背景",
        "",
        1
      ]
    },
    {
      "id": 3,
      "type": "PreviewImage",
      "pos": [
        1366,
        469
      ],
      "size": {
        "0": 535.400146484375,
        "1": 562.2001342773438
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 13
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 16,
      "type": "LoadKolorsLoRA",
      "pos": [
        606,
        368
      ],
      "size": {
        "0": 317.4000244140625,
        "1": 82
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "kolors_model",
          "type": "KOLORSMODEL",
          "link": 19
        }
      ],
      "outputs": [
        {
          "name": "kolors_model",
          "type": "KOLORSMODEL",
          "links": [
            20
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "LoadKolorsLoRA"
      },
      "widgets_values": [
        "/mnt/workspace/models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt",
        2
      ]
    },
    {
      "id": 14,
      "type": "KolorsSampler",
      "pos": [
        1011,
        371
      ],
      "size": {
        "0": 315,
        "1": 266
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "kolors_model",
          "type": "KOLORSMODEL",
          "link": 20
        },
        {
          "name": "kolors_embeds",
          "type": "KOLORS_EMBEDS",
          "link": 17
        },
        {
          "name": "latent",
          "type": "LATENT",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "latent",
          "type": "LATENT",
          "links": [
            18
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KolorsSampler"
      },
      "widgets_values": [
        1024,
        1024,
        0,
        "fixed",
        25,
        5,
        "EulerDiscreteScheduler",
        1
      ]
    }
  ],
  "links": [
    [
      12,
      11,
      0,
      10,
      1,
      "VAE"
    ],
    [
      13,
      10,
      0,
      3,
      0,
      "IMAGE"
    ],
    [
      14,
      13,
      0,
      12,
      0,
      "CHATGLM3MODEL"
    ],
    [
      17,
      12,
      0,
      14,
      1,
      "KOLORS_EMBEDS"
    ],
    [
      18,
      14,
      0,
      10,
      0,
      "LATENT"
    ],
    [
      19,
      6,
      0,
      16,
      0,
      "KOLORSMODEL"
    ],
    [
      20,
      16,
      0,
      14,
      0,
      "KOLORSMODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1.2100000000000002,
      "offset": {
        "0": -183.91309381910426,
        "1": -202.11110769225016
      }
    }
  },
  "version": 0.4
}

    

4.生圖步驟同上

四.Lora詳解https://www.bilibili.com/video/BV1nT421k7Fa/?t=28.133192&spm_id_from=333.1350.jump_directly&vd_source=87de892b60ecaebe6b8575e21f4aa997

五.準備一個高質量的資料集

當我們進行圖片生成相關的工作時,選擇合適的資料集是非常重要的。如何找到適合自己的資料集呢,這裡給大家整理了一些重要的參考維度,希望可以幫助你快速找到適合的資料集:

1.明確你的需求和目標

a.關注應用場景**:確定你的模型將被應用到什麼樣的場景中(例如,藝術風格轉換、產品影像生成、醫療影像合成等)。

b.關注資料型別**:你需要什麼樣的圖片?比如是真實世界的照片還是合成影像?是黑白的還是彩色的?是高解析度還是低解析度?

c.關注資料量**:考慮你的任務應該需要多少圖片來支援訓練和驗證。

2.資料集來源整理

以下渠道來源均需要考慮合規性問題,請大家在使用資料集過程中謹慎選擇

相關文章