no code implementations • 29 Nov 2022 • Heeseung Kwon, Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil, Karteek Alahari
Vision Transformers (ViTs), built on self-attention operators, have become a dominant paradigm for visual representation learning.
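The self-attention operator underlying ViTs can be sketched in a few lines. Below is a minimal single-head version in NumPy (the projection matrices `w_q`, `w_k`, `w_v` and the toy shapes are illustrative, not taken from any of the listed papers):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention: every token aggregates all tokens,
    weighted by a softmax over scaled query-key dot products."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (tokens, tokens)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # (tokens, dim)

rng = np.random.default_rng(0)
tokens, dim = 4, 8
x = rng.standard_normal((tokens, dim))
w = [rng.standard_normal((dim, dim)) for _ in range(3)]
out = self_attention(x, *w)
print(out.shape)  # (4, 8)
```

Because every token attends to every other token, the cost is quadratic in the number of tokens, which is one motivation for lighter-weight attention variants in video models.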
1 code implementation • NeurIPS 2021 • Manjin Kim, Heeseung Kwon, Chunyu Wang, Suha Kwak, Minsu Cho
Convolution has arguably been the most important feature transform for modern neural networks, driving the advance of deep learning.
Ranked #11 on Action Recognition on Diving-48
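The convolutional feature transform referred to above reduces to a sliding dot product. A minimal sketch (this is plain "valid" cross-correlation, as deep-learning frameworks implement convolution; the edge kernel and step image are illustrative):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution (cross-correlation, as in deep learning):
    slide the kernel over the image and take dot products."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edge = np.array([[1., 0., -1.]] * 3)       # simple vertical-edge kernel
img = np.tile([0., 0., 1., 1.], (4, 1))    # 4x4 image with a step edge
print(conv2d(img, edge))                   # strong response at the step
```

Stacking many such learned kernels, with nonlinearities in between, is the feature hierarchy that the sentence above credits for much of deep learning's progress.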
1 code implementation • ICCV 2021 • Dahyun Kang, Heeseung Kwon, Juhong Min, Minsu Cho
We propose to address the problem of few-shot classification by meta-learning "what to observe" and "where to attend" from a relational perspective.
Ranked #15 on Few-Shot Image Classification on CUB 200 5-way 5-shot
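One way to picture "where to attend" in few-shot matching is attention-weighted scoring of a query feature map against class prototypes. The sketch below is a generic illustration under that reading, not the authors' method; all names and shapes are assumptions:

```python
import numpy as np

def relational_match(query, support):
    """Few-shot matching sketch: score a query feature map against class
    prototypes. 'Where to attend' is approximated by a softmax over query
    positions, weighted by similarity to each prototype."""
    # query: (positions, C) flattened feature map; support: (classes, C)
    q = query / np.linalg.norm(query, axis=-1, keepdims=True)
    s = support / np.linalg.norm(support, axis=-1, keepdims=True)
    sim = q @ s.T                                         # (positions, classes)
    attn = np.exp(sim) / np.exp(sim).sum(axis=0, keepdims=True)
    return (attn * sim).sum(axis=0)                       # attention-pooled scores

rng = np.random.default_rng(0)
scores = relational_match(rng.standard_normal((16, 32)),   # 16 positions
                          rng.standard_normal((5, 32)))    # 5-way episode
print(scores.shape)  # (5,)
```

The query is then assigned to the class with the highest pooled score; in a 5-way 5-shot episode the prototypes would typically be per-class averages of the five support embeddings.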
1 code implementation • ICCV 2021 • Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho
Given a sufficiently large spatio-temporal neighborhood volume, the representation effectively captures long-term interactions and fast motion in video, leading to robust action recognition.
Ranked #18 on Action Recognition on Something-Something V1 (using extra training data)
1 code implementation • 1 Jan 2021 • Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho
We leverage the whole volume of spatio-temporal self-similarity (STSS) and let our model learn to extract an effective motion representation from it.
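The STSS volume mentioned above can be sketched as follows: for each feature position, compute similarities to features at neighboring positions in the next frame. This is a simplified, loop-based illustration (window size, cosine similarity, and the single-step temporal offset are assumptions for clarity):

```python
import numpy as np

def stss(features, window=1):
    """Spatio-temporal self-similarity sketch: for each feature at
    (t, y, x), cosine similarity to features at (t+1, y+dy, x+dx)
    within a spatial window, giving a motion-sensitive volume."""
    T, H, W, C = features.shape
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    offsets = [(dy, dx) for dy in range(-window, window + 1)
                        for dx in range(-window, window + 1)]
    out = np.zeros((T - 1, H, W, len(offsets)))
    for t in range(T - 1):
        for y in range(H):
            for x in range(W):
                for k, (dy, dx) in enumerate(offsets):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:
                        out[t, y, x, k] = f[t, y, x] @ f[t + 1, yy, xx]
    return out

rng = np.random.default_rng(0)
vol = stss(rng.standard_normal((3, 5, 5, 8)))  # (frames, H, W, channels)
print(vol.shape)  # (2, 5, 5, 9)
```

The key property is that the volume depends only on relations between features, not their absolute values, which makes it a natural raw material for learning motion representations.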
2 code implementations • ECCV 2020 • Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho
As frame-by-frame optical flow requires heavy computation, incorporating motion information has remained a major computational bottleneck for video understanding.
Ranked #1 on Video Classification on Something-Something V2
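A cheap correlation-based stand-in for dense optical flow, in the spirit of learned motion features, can be sketched as follows. This is a generic illustration, not the published method; the search radius and argmax readout are assumptions:

```python
import numpy as np

def displacement_map(f0, f1, radius=2):
    """Motion-feature sketch: for each position in frame t, correlate
    with a (2r+1)^2 neighborhood in frame t+1 and keep the
    best-matching displacement, avoiding full optical-flow computation."""
    H, W, C = f0.shape
    disp = np.zeros((H, W, 2), dtype=int)
    for y in range(H):
        for x in range(W):
            best, arg = -np.inf, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:
                        s = f0[y, x] @ f1[yy, xx]   # match score
                        if s > best:
                            best, arg = s, (dy, dx)
            disp[y, x] = arg
    return disp

rng = np.random.default_rng(1)
f0 = rng.standard_normal((6, 6, 8))
f1 = np.roll(f0, shift=1, axis=1)   # frame shifted right by one pixel
d = displacement_map(f0, f1)
print(d.shape)  # (6, 6, 2)
```

Because the search is restricted to a small local window on downsampled feature maps, the cost is a small fraction of frame-by-frame optical flow, which is the bottleneck the sentence above describes.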
2 code implementations • 13 Jul 2020 • Gyeongsik Moon, Heeseung Kwon, Kyoung Mu Lee, Minsu Cho
Most current action recognition methods rely heavily on appearance information, taking an RGB sequence covering entire image regions as input.