no code implementations • 24 Dec 2023 • Christian Simon, Sen He, Juan-Manuel Perez-Rua, Mengmeng Xu, Amine Benhalloum, Tao Xiang
Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods addressing it through diffusion models still rely on scene-specific optimization, constraining their generalization capability.
no code implementations • 7 Dec 2023 • Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua
In this study, we explore Transformer-based diffusion models for image and video generation.
no code implementations • 9 Oct 2023 • Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He
In this paper, for the first time, we introduce optical flow into the attention module in the diffusion model's U-Net to address the inconsistency issue for text-to-video editing.
1 code implementation • 27 Nov 2022 • Sauradip Nag, Mengmeng Xu, Xiatian Zhu, Juan-Manuel Perez-Rua, Bernard Ghanem, Yi-Zhe Song, Tao Xiang
In this work, we introduce a new multi-modality few-shot (MMFS) TAD problem, which can be considered as a marriage of FS-TAD and ZS-TAD by leveraging few-shot support videos and new class names jointly.
1 code implementation • CVPR 2023 • Mengmeng Xu, Yanghao Li, Cheng-Yang Fu, Bernard Ghanem, Tao Xiang, Juan-Manuel Perez-Rua
Our experiments show the proposed adaptations improve egocentric query detection, leading to a better visual query localization system in both 2D and 3D configurations.
1 code implementation • 3 Aug 2022 • Mengmeng Xu, Cheng-Yang Fu, Yanghao Li, Bernard Ghanem, Juan-Manuel Perez-Rua, Tao Xiang
The repeated gradient computation of the same object lead to an inefficient training; (2) The false positive rate is high on background frames.
no code implementations • 6 Oct 2021 • Swathikiran Sudhakaran, Adrian Bulat, Juan-Manuel Perez-Rua, Alex Falcon, Sergio Escalera, Oswald Lanz, Brais Martinez, Georgios Tzimiropoulos
This report presents the technical details of our submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021.
2 code implementations • 21 Jun 2021 • Andrés Villa, Juan-Manuel Perez-Rua, Vladimir Araujo, Juan Carlos Niebles, Victor Escorcia, Alvaro Soto
Recently, few-shot learning has received increasing interest.
1 code implementation • NeurIPS 2021 • Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos
In this work, we propose a Video Transformer model the complexity of which scales linearly with the number of frames in the video sequence and hence induces no overhead compared to an image-based Transformer model.
Ranked #32 on Action Classification on Kinetics-600
no code implementations • 28 Mar 2021 • Mengmeng Xu, Juan-Manuel Perez-Rua, Xiatian Zhu, Bernard Ghanem, Brais Martinez
This results in a task discrepancy problem for the video encoder -- trained for action classification, but used for TAL.
1 code implementation • 20 Jan 2021 • Xiatian Zhu, Antoine Toisoul, Juan-Manuel Perez-Rua, Li Zhang, Brais Martinez, Tao Xiang
Extensive experiments on four standard few-shot action benchmarks show that our method clearly outperforms previous state-of-the-art methods, with the improvement particularly significant (10+\%) on the most challenging fine-grained action recognition benchmark.
1 code implementation • ICCV 2021 • Mengmeng Xu, Juan-Manuel Perez-Rua, Victor Escorcia, Brais Martinez, Xiatian Zhu, Li Zhang, Bernard Ghanem, Tao Xiang
However, most existing models developed for these tasks are pre-trained on general video action classification tasks.
Ranked #23 on Temporal Action Localization on ActivityNet-1.3
no code implementations • 3 Jul 2020 • Juan-Manuel Perez-Rua, Antoine Toisoul, Brais Martinez, Victor Escorcia, Li Zhang, Xiatian Zhu, Tao Xiang
In this challenge, action recognition is posed as the problem of simultaneously predicting a single `verb' and `noun' class label given an input trimmed video clip.
no code implementations • 2 Apr 2020 • Juan-Manuel Perez-Rua, Brais Martinez, Xiatian Zhu, Antoine Toisoul, Victor Escorcia, Tao Xiang
Departing from existing alternatives, our W3 module models all three facets of video attention jointly.
Ranked #1 on Action Recognition on EgoGesture
no code implementations • CVPR 2020 • Juan-Manuel Perez-Rua, Xiatian Zhu, Timothy Hospedales, Tao Xiang
To this end we propose OpeN-ended Centre nEt (ONCE), a detector designed for incrementally learning to detect novel class objects with few examples.
1 code implementation • CVPR 2019 • Juan-Manuel Perez-Rua, Valentin Vielzeuf, Stephane Pateux, Moez Baccouche, Frederic Jurie
We tackle the problem of finding good architectures for multimodal classification problems.
no code implementations • 1 Aug 2018 • Juan-Manuel Perez-Rua, Moez Baccouche, Stephane Pateux
We demonstrate with experiments on the CIFAR-10 dataset that our method, denominated Efficient progressive neural architecture search (EPNAS), leads to increased search efficiency, while retaining competitiveness of found architectures.
no code implementations • 17 Apr 2018 • Juan-Manuel Perez-Rua, Tomas Crivelli, Patrick Bouthemy, Patrick Perez
We bypass the need for a tailored loss function on the regression parameters by attaching to our model a differentiable hard-wired decoder corresponding to the polynomial operation at hand.
no code implementations • CVPR 2016 • Juan-Manuel Perez-Rua, Tomas Crivelli, Patrick Bouthemy, Patrick Perez
With this in mind, we propose a novel approach to occlusion detection where visibility or not of a point in next frame is formulated in terms of visual reconstruction.