LISA: Reasoning Segmentation via Large Language Model

Posted by 脂环 on 2024-06-12

Original URL: https://www.cnblogs.com/lipoicyclic/p/18244045

Segmentation

Motivation & Abs

Existing perception systems rely on explicit human instructions and cannot actively reason about what the user actually intends.

New task: reasoning segmentation. Given a query text that is complex or carries implicit meaning, the model must output the corresponding segmentation mask.

New benchmark: a dataset of roughly 1,000 images, organized as image-instruction-mask samples.

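Model: LISA, which combines the language generation ability of an LLM with the ability to produce segmentation masks. Even when trained only on reasoning-free datasets, the model shows strong zero-shot ability, and fine-tuning on just a small amount of reasoning data improves performance substantially.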

Reasoning Segmentation

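Reasoning segmentation can be seen as a harder version of referring segmentation: the query text is a more complex expression or a longer sentence, and answering it requires reasoning with real-world knowledge. Dataset: the queries are short phrases and long sentences. There are 1,218 images in total, split into 239 training images, 200 validation images, and 779 test images.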

Method

Architecture

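Embedding as Mask. Earlier models such as LLaVA and BLIP-2 accept image input but output only text, so they cannot produce fine-grained segmentation masks. VisionLLM offers one solution: represent a mask as a sequence of polygon vertices so that it can be described as text. However, end-to-end training with polygon sequences is hard to optimize and may hurt generalization unless large amounts of data and compute are used. The authors therefore propose the embedding-as-mask paradigm to build segmentation ability into the LLM: the LLM vocabulary is extended with an additional <SEG> token that stands for the segmentation output.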


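Given a text instruction \(x_{txt}\) and an input image \(x_{img}\), the authors feed both to the multimodal LLM \(\mathcal{F}\), which produces the text output \(\hat{y}_{txt}\) (containing the <SEG> token). The dense feature from the SAM image encoder and the embedding of the <SEG> token are then passed to the SAM decoder to obtain the segmentation mask.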
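The data flow described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names, shapes, and the dot-product "decoder" are all assumptions made for illustration, and the real SAM decoder is a far more elaborate network. It only shows the idea that the hidden state at the <SEG> position acts as a query against per-pixel features.

```python
# Toy illustration of "embedding as mask"; every name and shape here is hypothetical.
SEG_TOKEN = "<SEG>"

def seg_embedding(tokens, hidden_states):
    """Return the hidden state at the position of the <SEG> token."""
    idx = tokens.index(SEG_TOKEN)  # raises ValueError if no <SEG> was generated
    return hidden_states[idx]

def project(vec, weight):
    """Linear projection; weight has shape out_dim x in_dim."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def decode_mask(dense_features, query, threshold=0.0):
    """Stand-in decoder: score each pixel by its dot product with the query."""
    return [[1 if sum(q * f for q, f in zip(query, pixel)) > threshold else 0
             for pixel in row]
            for row in dense_features]

tokens = ["It", "is", "<SEG>", "."]
hidden_states = [[0.0, 0.0], [0.0, 0.0], [1.0, -1.0], [0.0, 0.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity projection for the demo
dense_features = [[[1.0, 0.0], [0.0, 1.0]],
                  [[0.5, 0.5], [2.0, 1.0]]]  # 2 x 2 "image", 2-dim feature per pixel

query = project(seg_embedding(tokens, hidden_states), weight)
mask = decode_mask(dense_features, query)
```

Because the mask comes from an embedding rather than from generated polygon text, gradients from the mask loss can flow back through the <SEG> hidden state into the LLM.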

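Loss function, as given in the LISA paper, where \(y_{txt}\) is the ground-truth text, \(\hat{M}\) is the predicted mask, and \(M\) is the ground-truth mask:

\[\mathcal{L} = \lambda_{txt}\mathcal{L}_{txt} + \lambda_{mask}\mathcal{L}_{mask}\]

\[\mathcal{L}_{txt} = \mathbf{CE}(\hat{y}_{txt}, y_{txt}), \qquad \mathcal{L}_{mask} = \lambda_{bce}\,\mathbf{BCE}(\hat{M}, M) + \lambda_{dice}\,\mathbf{DICE}(\hat{M}, M)\]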
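The LISA paper trains with a weighted sum of a text loss and a mask loss, where the mask loss combines per-pixel binary cross-entropy and a Dice term. The following is a minimal sketch of that combination on flattened mask probabilities. The function names are hypothetical, the text loss is passed in as a precomputed number, and the default weights of 1.0 are placeholders rather than the paper's settings.

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened mask probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)

def dice_loss(pred, target, eps=1e-7):
    """1 - Dice coefficient; eps keeps the ratio defined for empty masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def total_loss(text_loss, pred, target,
               w_txt=1.0, w_mask=1.0, w_bce=1.0, w_dice=1.0):
    """Weighted sum of the text loss and the BCE + Dice mask loss."""
    mask_loss = w_bce * bce_loss(pred, target) + w_dice * dice_loss(pred, target)
    return w_txt * text_loss + w_mask * mask_loss
```

BCE penalizes each pixel independently, while Dice measures overlap of the whole region, so the two terms complement each other when the target object is small relative to the image.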

This design supports end-to-end training and is more effective than two-stage approaches.

Training

Training data formats.

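Semantic Segmentation Dataset: during training, a few categories are randomly chosen for each image, and the masks of those categories serve as the ground truth. The QA template looks like: “USER: <IMAGE> Can you segment the {class name} in this image? ASSISTANT: It is <SEG>.”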

Vanilla Referring Segmentation Dataset: each sample contains an image and a text description of the target object. QA template: “USER: <IMAGE> Can you segment {description} in this image? ASSISTANT: Sure, it is <SEG>.”

Visual Question Answering Dataset: used to preserve the VQA ability of the multimodal LLM.
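The two segmentation templates quoted above can be turned into training samples roughly as follows. This is a sketch under assumptions: the function names, the `class_to_mask` dictionary layout, and the use of plain strings as mask placeholders are hypothetical; only the template wording comes from the post.

```python
import random

SEMANTIC_TEMPLATE = ("USER: <IMAGE> Can you segment the {class_name} in this image? "
                     "ASSISTANT: It is <SEG>.")
REFERRING_TEMPLATE = ("USER: <IMAGE> Can you segment {description} in this image? "
                      "ASSISTANT: Sure, it is <SEG>.")

def semantic_samples(class_to_mask, k, rng):
    """Randomly pick up to k classes of one image; each class mask is the ground truth."""
    names = rng.sample(sorted(class_to_mask), min(k, len(class_to_mask)))
    return [(SEMANTIC_TEMPLATE.format(class_name=n), class_to_mask[n]) for n in names]

def referring_sample(description, mask):
    """Pair a referring expression with the mask of the object it describes."""
    return REFERRING_TEMPLATE.format(description=description), mask

# Masks are represented by placeholder strings in this demo.
image_classes = {"dog": "mask_dog", "cat": "mask_cat", "car": "mask_car"}
picked = semantic_samples(image_classes, 2, random.Random(0))
prompt, target = referring_sample("the person on the left", "mask_person")
```

Casting every segmentation sample as a question-answer pair lets the same next-token training objective cover both the segmentation data and the VQA data.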

Trainable parameters. The LLM is fine-tuned with LoRA and the image encoder is frozen. The mask decoder, the LLM token embeddings, the LLM head, and the projection layer are trained.
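The freezing policy above amounts to a rule over parameter names. The sketch below expresses it in plain Python. All prefixes and parameter names are hypothetical and chosen only for illustration; a real checkpoint will use different names, and in a deep learning framework the result would be used to switch gradient tracking on or off per parameter.

```python
# Hypothetical name prefixes; real checkpoints will name their modules differently.
FROZEN_PREFIXES = ("image_encoder.",)
TRAINABLE_PREFIXES = ("mask_decoder.", "llm.embed_tokens.", "llm.lm_head.", "projection.")

def is_trainable(name):
    """Decide whether a parameter receives gradient updates under the policy above."""
    if name.startswith(FROZEN_PREFIXES):
        return False  # the image encoder stays frozen
    if "lora_" in name:
        return True   # LoRA adapter weights inside the LLM
    return name.startswith(TRAINABLE_PREFIXES)

params = [
    "image_encoder.blocks.0.weight",
    "llm.layers.0.attn.q_proj.weight",
    "llm.layers.0.attn.q_proj.lora_A.weight",
    "llm.embed_tokens.weight",
    "llm.lm_head.weight",
    "projection.0.weight",
    "mask_decoder.out.weight",
]
trainable = [p for p in params if is_trainable(p)]
```

The base LLM weights are left untouched, so only the low-rank adapters and the listed modules change during training.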

Why catastrophic forgetting does not occur: the training data includes VQA data.

Experiments


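Metrics: gIoU and cIoU. gIoU is the average of the per-image IoUs, while cIoU is the cumulative intersection over the cumulative union. Because cIoU is heavily biased toward large-area objects and fluctuates too much, gIoU is preferred.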
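The difference between the two metrics is easy to see on a small example. The following sketch computes both from binary masks given as nested lists. The function names are hypothetical, and scoring an image whose union is empty as 1.0 is an assumption made here, not something the post specifies.

```python
def intersection_union(pred, gt):
    """Count intersecting and united pixels of two binary masks of equal shape."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter, union

def giou(preds, gts):
    """Average of per-image IoUs (empty union counted as 1.0 by assumption)."""
    ious = []
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)

def ciou(preds, gts):
    """Cumulative intersection divided by cumulative union over all images."""
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 1.0

# One small object that is missed entirely, one large object that is matched exactly.
preds = [[[1, 0], [0, 0]], [[1] * 10]]
gts = [[[0, 1], [0, 0]], [[1] * 10]]
g = giou(preds, gts)
c = ciou(preds, gts)
```

Here gIoU is 0.5, since each image counts equally, while cIoU is 10/12, about 0.83, because the large object dominates the pixel totals. This is the large-object bias mentioned above.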
