Search Results for author: Chae Won Kim

Found 5 papers, 3 papers with code

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

1 code implementation • 24 May 2024 • Byung-Kwan Lee, Chae Won Kim, Beomchan Park, Yong Man Ro

Recently, open-source LLVMs have curated high-quality visual instruction tuning datasets and utilized additional vision encoders or multiple computer vision models in order to narrow the performance gap with powerful closed-source LLVMs.

MoAI: Mixture of All Intelligence for Large Language and Vision Models

1 code implementation • 12 Mar 2024 • Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro

Therefore, we present a new LLVM, Mixture of All Intelligence (MoAI), which leverages auxiliary visual information obtained from the outputs of external segmentation, detection, scene graph generation (SGG), and OCR models.

Scene Understanding • Visual Question Answering
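
The snippet above only gestures at how the auxiliary information is used. As a rough illustration, the sketch below verbalizes outputs of hypothetical external models into text that could accompany an image prompt. All function and field names here are invented for this example; MoAI itself fuses these signals with learned modules rather than simple string concatenation.

```python
# Toy sketch: turn outputs of external computer-vision models
# (detection, scene graph generation, OCR) into auxiliary text for an
# LLVM prompt. Every name below is a hypothetical placeholder, not the
# paper's actual API.

from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    box: tuple  # (x1, y1, x2, y2) in pixels

def verbalize_aux_info(detections, ocr_tokens, relations):
    """Render external model outputs as a plain-text auxiliary prompt."""
    parts = []
    if detections:
        objs = ", ".join(f"{d.label} at {d.box}" for d in detections)
        parts.append(f"Detected objects: {objs}.")
    if relations:  # scene-graph triples (subject, predicate, object)
        rels = "; ".join(f"{s} {p} {o}" for s, p, o in relations)
        parts.append(f"Relations: {rels}.")
    if ocr_tokens:
        parts.append(f"Text in image: {' '.join(ocr_tokens)}.")
    return " ".join(parts)

# The auxiliary text is prepended to the user question.
aux = verbalize_aux_info(
    detections=[Detection("dog", (10, 20, 120, 200))],
    ocr_tokens=["EXIT"],
    relations=[("dog", "next to", "door")],
)
print(aux + "\nQuestion: What is the dog doing?")
```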

CoLLaVO: Crayon Large Language and Vision mOdel

1 code implementation • 17 Feb 2024 • Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro

Our findings reveal that the image understanding capabilities of current VLMs are strongly correlated with their zero-shot performance on vision language (VL) tasks.

Large Language Model • Object • +3
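
As a toy illustration of the correlation claim in the snippet above, the code below computes a Pearson correlation between hypothetical per-model scores. The numbers are placeholders, not results from the paper.

```python
# Toy illustration: quantify how strongly image-understanding scores
# co-vary with zero-shot vision-language (VL) scores across models.
# All values are fabricated placeholders for demonstration.

import numpy as np

understanding = np.array([0.42, 0.55, 0.61, 0.70, 0.78])  # probe accuracy per VLM
zero_shot_vl = np.array([0.31, 0.44, 0.50, 0.58, 0.69])   # zero-shot VL score per VLM

r = np.corrcoef(understanding, zero_shot_vl)[0, 1]  # Pearson correlation
print(f"Pearson r = {r:.3f}")  # a value near 1.0 indicates strong correlation
```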

Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video

no code implementations • 27 Feb 2023 • Minsu Kim, Chae Won Kim, Yong Man Ro

The proposed DVFA can align the input transcription (i.e., sentence) with the talking face video without accessing the speech audio.

Automatic Speech Recognition (ASR) • +3
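
For intuition about what aligning a transcription with video means here, the sketch below runs a generic dynamic-time-warping (DTW) style monotonic alignment over a toy text-to-frame similarity matrix. This is a stand-in for illustration only: DVFA learns the alignment with a neural model rather than classic DTW, and the similarity scores below are random placeholders.

```python
# Illustrative sketch only: monotonic alignment of text units to video
# frames via a DTW-style dynamic program over similarity scores.

import numpy as np

def dtw_align(sim):
    """sim[i, j]: similarity of text unit i to video frame j.
    Returns a monotonic path of (text_idx, frame_idx) pairs."""
    n, m = sim.shape
    cost = np.full((n + 1, m + 1), -np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = sim[i - 1, j - 1] + max(
                cost[i - 1, j - 1],  # advance to the next text unit
                cost[i, j - 1],      # hold the text unit over another frame
            )
    # Backtrack the best-scoring path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        if cost[i - 1, j - 1] >= cost[i, j - 1]:
            i, j = i - 1, j - 1
        else:
            j -= 1
    return path[::-1]

# Toy example: align 3 words to 5 frames with random similarities.
rng = np.random.default_rng(0)
print(dtw_align(rng.random((3, 5))))
```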
