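AIxiv is the column in which 機器之心 (Jiqizhixin) publishes academic and technical content. Over the past several years, the column has received and covered more than 2,000 pieces of work from top laboratories at universities and companies around the world, helping to promote academic exchange and dissemination. If you have outstanding work to share, you are welcome to submit it or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com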
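Recently, researchers from the University of Science and Technology of China, Peking University, Shanghai AI Lab, and other institutions released the notable ShareGPT4Video series, which aims to improve both video understanding and video generation.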
Paper: https://arxiv.org/abs/2406.04325v1
Project page: https://sharegpt4video.github.io/
Dataset: https://huggingface.co/datasets/ShareGPT4Video/ShareGPT4Video
Code: https://github.com/ShareGPT4Omni/ShareGPT4Video
Demo: https://huggingface.co/spaces/Lin-Chen/ShareCaptioner-Video
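Clearly understanding temporal changes between frames.
Describing the content within each frame in detail and accurately.
Scalability to videos of arbitrary length.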
Caption 1:
The video segment documented a significant event in Kochi, Kerala, where 2 buildings razed in Kochi. The broadcast began with a split-screen presentation: on one side, thick clouds of dust were seen billowing into the sky, marking the onset of the demolition process, while on the other side, reporter Gopikrishnan provided live coverage, indicated by "BREAKING NEWS" captions and a consistent timestamp of "11:10 AM." The news ticker at the bottom of the screen simultaneously ran other global events, maintaining a flow of information. As the video progresses, the split-screen footage of the razed house turns into a close-up. A notable change in the headline to "KOCHI FLATS RAZED" signaled the demolition's culmination. A brief interlude offered a visual contradiction by showcasing the flats presumably before their demolition, providing a stark before and after comparison. As the video progressed, the left building's collapse initiated a dramatic alteration in the skyline, marked by significant dust plumes. Subsequently, another building was shown partially collapsing amid debris, fully obscured by dust in seconds, with surrounding greenery remaining untouched. This transitioned into a graphic interlude featuring the "India Today" logo, briefly pausing the live footage. Resuming to the aftermath, split imagery displayed the rubble and ongoing smoke. Then, the imagery continued to juxtapose the scenes of destruction against intact high-rise buildings nearby. The narrative was augmented by the revelation that the Supreme Court directed the demolition within a broader national news context. Throughout, the report maintained a real-time approach, threading continuity and urgency across the unfolding event's documentation.
Caption 2:
The video begins with an individual seated on a gray couch in a cozy domestic setting, about to unbox a product from a red CCM-branded box placed on a white table in front of them. Initially, the person is seen drinking from a blue can, indicating a casual atmosphere. Soon after, the individual shifts attention from the can to the red box, signifying the start of the unboxing process. The red box, initially closed, gradually becomes the focal point as the person prepares to open it, conveying a build-up of anticipation. As the video progresses, the box is flipped over and then opened, revealing its content still hidden under white tissue paper adorned with prints, adding to the suspense. The individual’s engagement with the box evolves, from initially preparing to open it, to actively delving into its contents. A momentary pause in activity is captured before the anticipation culminates with the individual lifting an object from the box. This object, identifiable by a yellow label, is then examined closely by the person, indicating a thorough inspection or perusal of the product or its packaging. Throughout the video, the surrounding environment remains consistent and undisturbed, with household items like a potted plant and a wall clock maintaining the setting's homely ambiance. The camera’s perspective remains fixed, focusing on the unfolding unboxing event without any movement, thus allowing the viewer to observe the narrative closely. Another partially open brown box is visible beside the main red box, though its role or contents are not elaborated upon. The video encapsulates the anticipation, action, and reveal inherent to unboxing experiences in a home setting.