Search Results for author: Masayoshi Kondo

Found 3 papers, 0 papers with code

On the Audio Hallucinations in Large Audio-Video Language Models

no code implementations • 18 Jan 2024 • Taichi Nishimura, Shota Nakada, Masayoshi Kondo

This paper refers to this phenomenon as audio hallucinations and analyzes them in large audio-video language models.

Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval

no code implementations • 1 Dec 2023 • Taichi Nishimura, Shota Nakada, Masayoshi Kondo

The zero-shot QASIR yields two discoveries: (1) it enables VLMs to generalize to super images and (2) the grid size $N$, image resolution, and VLM size are key trade-off parameters between performance and computation costs.

Image Retrieval, Partially Relevant Video Retrieval (+2 more tasks)
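
As a rough illustration of the super-image idea in the snippet above, this Python sketch tiles each group of $N \times N$ frames into one grid image in row-major order, so a VLM would encode ceil(T / N²) images instead of T separate frames. It is a minimal sketch under stated assumptions: frames are already decoded into a NumPy array of shape (T, H, W, C), the last grid is zero-padded, and the names `tile_grid` and `to_super_images` are hypothetical. The paper's actual frame sampling, ordering, padding, and resizing may differ.

```python
import math

import numpy as np


def tile_grid(frames: np.ndarray, n: int) -> np.ndarray:
    """Tile exactly n*n frames of shape (H, W, C) into one (n*H, n*W, C) image, row-major."""
    k, h, w, c = frames.shape
    if k != n * n:
        raise ValueError(f"expected {n * n} frames, got {k}")
    grid = frames.reshape(n, n, h, w, c)   # (grid_row, grid_col, H, W, C)
    grid = grid.transpose(0, 2, 1, 3, 4)   # (grid_row, H, grid_col, W, C)
    return grid.reshape(n * h, n * w, c)   # stitch rows and columns together


def to_super_images(frames: np.ndarray, n: int) -> list[np.ndarray]:
    """Split T frames into ceil(T / n^2) super images, zero-padding the final grid."""
    t, h, w, c = frames.shape
    per_image = n * n
    count = math.ceil(t / per_image)
    pad = count * per_image - t
    if pad:
        # Hypothetical choice: fill unused grid cells with black frames.
        filler = np.zeros((pad, h, w, c), dtype=frames.dtype)
        frames = np.concatenate([frames, filler])
    return [tile_grid(frames[i * per_image:(i + 1) * per_image], n) for i in range(count)]
```

For example, 10 frames of size 32×48 with $N = 2$ yield 3 super images of shape (64, 96, 3). A larger $N$ means fewer images to encode, but if each super image is then resized to a VLM's fixed input resolution, each frame keeps only about 1/$N$ of that resolution per side, which is one way to read the performance/computation trade-off the snippet mentions.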

Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track

no code implementations • 23 Oct 2023 • Shuhei Yokoo, Peifei Zhu, Yuchi Ishikawa, Mikihiro Tanaka, Masayoshi Kondo, Hirokatsu Kataoka

Our solution adopts the large multimodal models CLIP and BLIP-2 to filter and modify web-crawled data, and utilizes external datasets along with a bag of tricks to improve data quality.

text similarity
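
Image-text similarity filtering of the kind this snippet describes is commonly done by scoring each image-caption pair with the cosine similarity of their embeddings and keeping the highest-scoring pairs. The Python sketch below shows that step under stated assumptions: the embeddings are precomputed by some encoder such as CLIP (this code does not call one), and the keep-top-fraction rule and the name `filter_pairs` are hypothetical. The snippet does not state the exact filtering rule or thresholds the authors used.

```python
import numpy as np


def cosine_scores(image_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between paired image and caption embeddings."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return np.sum(img * txt, axis=1)


def filter_pairs(image_emb: np.ndarray, text_emb: np.ndarray, keep_fraction: float):
    """Return (sorted indices of kept pairs, all scores), keeping the top keep_fraction by score."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    scores = cosine_scores(image_emb, text_emb)
    k = max(1, int(round(keep_fraction * len(scores))))
    # Stable sort on negated scores: highest similarity first, ties keep original order.
    keep = np.argsort(-scores, kind="stable")[:k]
    return np.sort(keep), scores
```

Ranking by score and keeping a fraction, rather than applying a fixed similarity threshold, makes the retained dataset size predictable regardless of the encoder's score scale; a fixed threshold is an equally plausible alternative.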
