Table 7: The effect of synthetic detail caption data on LLaVA-1.5-7B model training [table not included]
The Doubao Large Model Team
ByteDance's Doubao Large Model Team was founded in 2023. It is dedicated to developing the industry's most advanced AI large-model technology, aiming to become a world-class research team and to contribute to technological and social progress. The team has a long-term vision and commitment in the AI field, with research directions spanning NLP, CV, speech, and more, and maintains labs and research positions in China, Singapore, the United States, and elsewhere. Drawing on the platform's ample data and compute resources, the team continues to invest in these areas and has released its own general-purpose large model with multimodal capabilities, which supports 50+ downstream businesses such as Doubao, Coze, and Jimeng, and is available to enterprise customers through Volcano Engine. The Doubao app is currently the AIGC application with the largest user base in the Chinese market. You are welcome to join ByteDance's Doubao Large Model Team. https://mp.weixin.qq.com/s/ZjQ-v6reZXhBP6G27cbmlQ