1 code implementation • 12 Jan 2024 • Seongyun Lee, Seungone Kim, Sue Hyun Park, Geewook Kim, Minjoon Seo
Assessing long-form responses generated by Vision-Language Models (VLMs) is challenging.
1 code implementation • 13 Nov 2023 • Seongyun Lee, Sue Hyun Park, Yongrae Jo, Minjoon Seo
Building on this approach, we introduce Volcano, a multimodal self-feedback-guided revision model.
Ranked #43 on Visual Question Answering on MM-Vet
no code implementations • 5 Jul 2023 • Yongrae Jo, Seongyun Lee, Aiden SJ Lee, Hyunji Lee, Hanseok Oh, Minjoon Seo
This is accomplished by introducing a soft moment mask that represents a temporal segment in the video and jointly optimizing it with the prefix parameters of a language model.
1 code implementation • 3 Feb 2023 • Seongyun Lee, Hyunjae Kim, Jaewoo Kang
Question answering (QA) models often rely on large-scale training datasets, which necessitates the development of a data generation framework to reduce the cost of manual annotations.
Ranked #1 on Question Answering on MultiSpanQA