Search Results for author: Masayoshi Kondo

Found 3 papers, 0 papers with code

On the Audio Hallucinations in Large Audio-Video Language Models

no code implementations • 18 Jan 2024 • Taichi Nishimura, Shota Nakada, Masayoshi Kondo

This paper analyzes a phenomenon it terms audio hallucinations in large audio-video language models.

Hallucination, Sentence

Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval

no code implementations • 1 Dec 2023 • Taichi Nishimura, Shota Nakada, Masayoshi Kondo

The zero-shot QASIR yields two discoveries: (1) it enables VLMs to generalize to super images and (2) the grid size $N$, image resolution, and VLM size are key trade-off parameters between performance and computation costs.

Image Retrieval, Partially Relevant Video Retrieval

Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track

no code implementations • 23 Oct 2023 • Shuhei Yokoo, Peifei Zhu, Yuchi Ishikawa, Mikihiro Tanaka, Masayoshi Kondo, Hirokatsu Kataoka

Our solution adopts the large multimodal models CLIP and BLIP-2 to filter and modify web crawl data, and utilizes external datasets along with a bag of tricks to improve the data quality.

text similarity
