no code implementations • 9 Apr 2024 • Juhong Min, Shyamal Buch, Arsha Nagrani, Minsu Cho, Cordelia Schmid
This paper addresses the task of video question answering (videoQA) via a decomposed multi-stage, modular reasoning framework.
Ranked #3 on Zero-Shot Video Question Answer on NExT-QA
no code implementations • 7 Nov 2023 • SeungWook Kim, Juhong Min, Minsu Cho
Recent studies show that leveraging the match-wise relationships within the 4D correlation map yields significant improvements in establishing semantic correspondences - but at the cost of increased computation and latency.
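The 4D correlation map referred to above pairs every location of one feature map with every location of the other. A minimal numpy sketch of building such a map from two L2-normalized feature maps (illustrative only, not the paper's implementation; function and variable names are hypothetical):

```python
import numpy as np

def correlation_4d(feat_a, feat_b):
    """Dense 4D correlation between two feature maps.

    feat_a: (H, W, C), feat_b: (H2, W2, C).
    Returns an (H, W, H2, W2) tensor of cosine similarities,
    one score per pair of spatial locations.
    """
    a = feat_a / (np.linalg.norm(feat_a, axis=-1, keepdims=True) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b, axis=-1, keepdims=True) + 1e-8)
    # Contract over the channel dimension for every location pair.
    return np.einsum("ijc,klc->ijkl", a, b)

rng = np.random.default_rng(0)
fa = rng.standard_normal((8, 8, 32))
fb = rng.standard_normal((8, 8, 32))
corr = correlation_4d(fa, fb)
print(corr.shape)  # (8, 8, 8, 8)
```

The quadratic number of location pairs in this tensor is exactly why exploiting match-wise relationships within it is expensive.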
1 code implementation • 14 Jun 2022 • Juhong Min, Yucheng Zhao, Chong Luo, Minsu Cho
We propose to incorporate peripheral position encoding into the multi-head self-attention layers, letting the network learn to partition the visual field into diverse peripheral regions given training data.
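One common way to realize a position encoding inside self-attention is a learned bias added to the attention logits as a function of spatial distance. The sketch below (a generic illustration under that assumption, not the paper's exact parameterization; all names are hypothetical) quantizes pairwise distances into rings and adds one learned scalar per ring:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def peripheral_attention(q, k, v, bias_per_ring, grid):
    """Single-head self-attention with a distance-dependent bias.

    q, k, v: (N, d) token projections; grid: (N, 2) token coordinates;
    bias_per_ring: learned scalar per quantized distance ring.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    # Quantize pairwise distances into rings around each query token.
    dist = np.linalg.norm(grid[:, None] - grid[None, :], axis=-1)
    rings = np.minimum(dist.astype(int), len(bias_per_ring) - 1)
    logits = logits + bias_per_ring[rings]
    return softmax(logits, axis=-1) @ v

rng = np.random.default_rng(0)
N, d = 16, 8
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))
grid = np.stack(np.meshgrid(np.arange(4), np.arange(4)),
                axis=-1).reshape(-1, 2).astype(float)
bias_per_ring = rng.standard_normal(5)
out = peripheral_attention(q, k, v, bias_per_ring, grid)
print(out.shape)  # (16, 8)
```

Making `bias_per_ring` learnable per head is what would let different heads specialize to different peripheral regions.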
1 code implementation • CVPR 2022 • SeungWook Kim, Juhong Min, Minsu Cho
Establishing correspondences between images remains a challenging task, especially under large appearance changes due to different viewpoints or intra-class variations.
Ranked #10 on Semantic correspondence on SPair-71k
no code implementations • 29 Sep 2021 • SeungWook Kim, Juhong Min, Minsu Cho
Establishing correspondences between images remains a challenging task, especially under large appearance changes due to different viewpoints and intra-class variations.
1 code implementation • 11 Sep 2021 • Juhong Min, SeungWook Kim, Minsu Cho
To validate the proposed techniques, we develop a neural network with CHM layers that perform convolutional matching in the space of translation and scaling.
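Convolutional matching over a correlation volume amounts to convolving the 4D tensor of match scores with a high-dimensional kernel so that each match aggregates evidence from geometrically nearby matches. A naive numpy sketch of a 4D convolution (illustrative only; the actual CHM layers also cover scaling and use efficient kernels):

```python
import numpy as np

def conv4d(corr, kernel):
    """Naive 4D convolution over a correlation volume (zero padding).

    corr: (H, W, H2, W2) match scores; kernel: (k, k, k, k), k odd.
    Each output match score aggregates scores of neighboring matches.
    """
    k = kernel.shape[0]
    p = k // 2
    padded = np.pad(corr, p)
    out = np.zeros_like(corr)
    H, W, H2, W2 = corr.shape
    for i in range(H):
        for j in range(W):
            for u in range(H2):
                for v in range(W2):
                    out[i, j, u, v] = np.sum(
                        padded[i:i+k, j:j+k, u:u+k, v:v+k] * kernel)
    return out

# Sanity check: a delta kernel reproduces the input volume.
corr = np.arange(81.0).reshape(3, 3, 3, 3)
delta = np.zeros((3, 3, 3, 3))
delta[1, 1, 1, 1] = 1.0
out = conv4d(corr, delta)
```

A learned (non-delta) kernel is what lets the layer enforce geometric consistency among candidate matches.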
1 code implementation • ICCV 2021 • Dahyun Kang, Heeseung Kwon, Juhong Min, Minsu Cho
We propose to address the problem of few-shot classification by meta-learning "what to observe" and "where to attend" from a relational perspective.
Ranked #15 on Few-Shot Image Classification on CUB 200 5-way 5-shot
1 code implementation • 4 Apr 2021 • Juhong Min, Dahyun Kang, Minsu Cho
Few-shot semantic segmentation aims at learning to segment a target object from a query image using only a few annotated support images of the target class.
Ranked #13 on Few-Shot Semantic Segmentation on FSS-1000 (5-shot)
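For orientation, the simplest baseline for this task is prototype matching: pool the support features under the support mask and compare the prototype against every query location. The sketch below shows that baseline only (it is not this paper's hypercorrelation-based method; all names are illustrative):

```python
import numpy as np

def prototype_segment(support_feat, support_mask, query_feat):
    """Minimal prototype baseline for few-shot segmentation.

    support_feat, query_feat: (H, W, C); support_mask: (H, W) in {0, 1}.
    Masked average pooling yields a class prototype; cosine similarity
    to each query location yields an (H, W) score map to threshold.
    """
    proto = (support_feat * support_mask[..., None]).sum((0, 1))
    proto = proto / (support_mask.sum() + 1e-8)
    q = query_feat / (np.linalg.norm(query_feat, axis=-1,
                                     keepdims=True) + 1e-8)
    p = proto / (np.linalg.norm(proto) + 1e-8)
    return q @ p

rng = np.random.default_rng(0)
sf, qf = rng.standard_normal((2, 16, 16, 32))
mask = (rng.random((16, 16)) > 0.5).astype(float)
scores = prototype_segment(sf, mask, qf)
print(scores.shape)  # (16, 16)
```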
1 code implementation • CVPR 2021 • Juhong Min, Minsu Cho
Despite advances in feature representation, leveraging geometric relations is crucial for establishing reliable visual correspondences under large variations of images.
Ranked #4 on Semantic correspondence on PF-WILLOW
no code implementations • ICCV 2021 • Juhong Min, Dahyun Kang, Minsu Cho
Few-shot semantic segmentation aims at learning to segment a target object from a query image using only a few annotated support images of the target class.
1 code implementation • ECCV 2020 • Juhong Min, Jongmin Lee, Jean Ponce, Minsu Cho
Feature representation plays a crucial role in visual correspondence, and recent methods for image matching resort to deeply stacked convolutional layers.
Ranked #2 on Semantic correspondence on Caltech-101
no code implementations • 28 Aug 2019 • Juhong Min, Jongmin Lee, Jean Ponce, Minsu Cho
In this paper, we present a new large-scale benchmark dataset of semantically paired images, SPair-71k, which contains 70,958 image pairs with diverse variations in viewpoint and scale.
1 code implementation • ICCV 2019 • Juhong Min, Jongmin Lee, Jean Ponce, Minsu Cho
Establishing visual correspondences under large intra-class variations requires analyzing images at different levels, from features linked to semantics and context to local patterns, while being invariant to instance-specific details.
Ranked #1 on Semantic correspondence on Caltech-101