no code implementations • 10 Apr 2024 • Jiawei Ren, Mengmeng Xu, Jui-Chieh Wu, Ziwei Liu, Tao Xiang, Antoine Toisoul
Diffusion models generate images with an unprecedented level of quality, but how can we freely rearrange image layouts?
no code implementations • 24 Dec 2023 • Christian Simon, Sen He, Juan-Manuel Perez-Rua, Mengmeng Xu, Amine Benhalloum, Tao Xiang
Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods addressing it through diffusion models still rely on scene-specific optimization, constraining their generalization capability.
no code implementations • 7 Dec 2023 • Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua
In this study, we explore Transformer-based diffusion models for image and video generation.
no code implementations • 9 Oct 2023 • Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He
In this paper, for the first time, we introduce optical flow into the attention module in the diffusion model's U-Net to address the inconsistency issue for text-to-video editing.
no code implementations • 26 May 2023 • Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, Jürgen Schmidhuber
What should be the social structure of an NLSOM?
1 code implementation • 6 Apr 2023 • Mengmeng Xu, Mattia Soldan, Jialin Gao, Shuming Liu, Juan-Manuel Pérez-Rúa, Bernard Ghanem
To alleviate the boundary ambiguity, we propose to study the video activity localization problem from a denoising perspective.
Ranked #1 on Video Grounding on MAD
no code implementations • CVPR 2023 • Haoqian Wu, Keyu Chen, Haozhe Liu, Mingchen Zhuge, Bing Li, Ruizhi Qiao, Xiujun Shu, Bei Gan, Liangsheng Xu, Bo Ren, Mengmeng Xu, Wentian Zhang, Raghavendra Ramachandra, Chia-Wen Lin, Bernard Ghanem
Temporal video segmentation is the get-to-go automatic video analysis, which decomposes a long-form video into smaller components for the following-up understanding tasks.
1 code implementation • 27 Nov 2022 • Sauradip Nag, Mengmeng Xu, Xiatian Zhu, Juan-Manuel Perez-Rua, Bernard Ghanem, Yi-Zhe Song, Tao Xiang
In this work, we introduce a new multi-modality few-shot (MMFS) TAD problem, which can be considered as a marriage of FS-TAD and ZS-TAD by leveraging few-shot support videos and new class names jointly.
1 code implementation • CVPR 2023 • Mengmeng Xu, Yanghao Li, Cheng-Yang Fu, Bernard Ghanem, Tao Xiang, Juan-Manuel Perez-Rua
Our experiments show the proposed adaptations improve egocentric query detection, leading to a better visual query localization system in both 2D and 3D configurations.
1 code implementation • 3 Aug 2022 • Mengmeng Xu, Cheng-Yang Fu, Yanghao Li, Bernard Ghanem, Juan-Manuel Perez-Rua, Tao Xiang
The repeated gradient computation of the same object lead to an inefficient training; (2) The false positive rate is high on background frames.
1 code implementation • 14 May 2022 • Shuming Liu, Mengmeng Xu, Chen Zhao, Xu Zhao, Bernard Ghanem
We propose to sequentially forward the snippet frame through the video encoder, and backward only a small necessary portion of gradients to update the encoder.
no code implementations • 26 Apr 2022 • Mengmeng Xu, Erhan Gundogdu, Maksim Lapin, Bernard Ghanem, Michael Donoser, Loris Bazzani
Long-form video understanding requires designing approaches that are able to temporally localize activities or language.
Contrastive Learning Few Shot Temporal Action Localization +3
no code implementations • 3 Mar 2022 • Chen Zhao, Merey Ramazanova, Mengmeng Xu, Bernard Ghanem
To address these issues and precisely model temporal action detection, we formulate the task of temporal action detection in a novel perspective of semantic segmentation.
no code implementations • NeurIPS 2021 • Mengmeng Xu, Juan Manuel Perez Rua, Xiatian Zhu, Bernard Ghanem, Brais Martinez
This results in a task discrepancy problem for the video encoder – trained for action classification, but used for TAL.
Ranked #9 on Temporal Action Localization on HACS
7 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.
1 code implementation • EMNLP 2021 • Jialin Gao, Xin Sun, Mengmeng Xu, Xi Zhou, Bernard Ghanem
Temporal language grounding in videos aims to localize the temporal span relevant to the given query sentence.
no code implementations • 28 Mar 2021 • Mengmeng Xu, Juan-Manuel Perez-Rua, Xiatian Zhu, Bernard Ghanem, Brais Martinez
This results in a task discrepancy problem for the video encoder -- trained for action classification, but used for TAL.
1 code implementation • ICCV 2021 • Mengmeng Xu, Juan-Manuel Perez-Rua, Victor Escorcia, Brais Martinez, Xiatian Zhu, Li Zhang, Bernard Ghanem, Tao Xiang
However, most existing models developed for these tasks are pre-trained on general video action classification tasks.
Ranked #23 on Temporal Action Localization on ActivityNet-1.3
1 code implementation • 19 Nov 2020 • Mattia Soldan, Mengmeng Xu, Sisi Qu, Jesper Tegner, Bernard Ghanem
Grounding language queries in videos aims at identifying the time interval (or moment) semantically relevant to a language query.
Ranked #1 on Natural Language Moment Retrieval on DiDeMo
no code implementations • 24 Aug 2020 • Guohao Li, Mengmeng Xu, Silvio Giancola, Ali Thabet, Bernard Ghanem
In this paper, we introduce a new NAS framework, dubbed LC-NAS, where we search for point cloud architectures that are constrained to a target latency.
no code implementations • 10 Jul 2020 • Sisi Qu, Mengmeng Xu, Bernard Ghanem, Jesper Tegner
EDNA uses the diffusion signal as a proxy for computing node similarities between networks.
7 code implementations • CVPR 2020 • Mengmeng Xu, Chen Zhao, David S. Rojas, Ali Thabet, Bernard Ghanem
In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem.
Ranked #5 on Temporal Action Localization on EPIC-KITCHENS-100
no code implementations • 10 Apr 2019 • Alejandro Pardo, Mengmeng Xu, Ali Thabet, Pablo Arbelaez, Bernard Ghanem
We adopt a hybrid supervised learning framework to train the object detector from both these types of annotation.