no code implementations • 28 Mar 2024 • Eri Onami, Shuhei Kurita, Taiki Miyanishi, Taro Watanabe
Document question answering is the task of answering questions about given documents such as reports, slides, pamphlets, and websites, and it is a truly demanding task as paper and electronic forms of documents are so common in our society.
no code implementations • 28 Feb 2024 • Koki Maeda, Shuhei Kurita, Taiki Miyanishi, Naoaki Okazaki
Given the accelerating progress of vision and language modeling, accurate evaluation of machine-generated image captions remains critical.
1 code implementation • NeurIPS 2023 • Taiki Miyanishi, Fumiya Kitamori, Shuhei Kurita, Jungdae Lee, Motoaki Kawanabe, Nakamasa Inoue
To tackle this problem, we introduce the CityRefer dataset for city-level visual grounding.
1 code implementation • 23 May 2023 • Taiki Miyanishi, Daichi Azuma, Shuhei Kurita, Motoaki Kawanabe
We present a novel task for cross-dataset visual grounding in 3D scenes (Cross3DVG), which overcomes limitations of existing 3D visual grounding models, specifically their restricted 3D resources and consequent tendency to overfit to a specific 3D dataset.
1 code implementation • CVPR 2022 • Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe
We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA).