Search Results for author: Mengmeng Xu

Found 23 papers, 10 papers with code

Move Anything with Layered Scene Diffusion

no code implementations • 10 Apr 2024 • Jiawei Ren, Mengmeng Xu, Jui-Chieh Wu, Ziwei Liu, Tao Xiang, Antoine Toisoul

Diffusion models generate images with an unprecedented level of quality, but how can we freely rearrange image layouts?

Paper
Add Code

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

no code implementations • 24 Dec 2023 • Christian Simon, Sen He, Juan-Manuel Perez-Rua, Mengmeng Xu, Amine Benhalloum, Tao Xiang

Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods addressing it through diffusion models still rely on scene-specific optimization, constraining their generalization capability.

Image to 3D Neural Rendering

Paper
Add Code

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation

no code implementations • 7 Dec 2023 • Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua

In this study, we explore Transformer-based diffusion models for image and video generation.

Text-to-Video Generation Video Generation

Paper
Add Code

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing

no code implementations • 9 Oct 2023 • Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He

In this paper, for the first time, we introduce optical flow into the attention module in the diffusion model's U-Net to address the inconsistency issue for text-to-video editing.

Optical Flow Estimation Text-to-Video Editing +1

Paper
Add Code

Mindstorms in Natural Language-Based Societies of Mind

no code implementations • 26 May 2023 • Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, Jürgen Schmidhuber

What should be the social structure of an NLSOM?

3D Generation Image Captioning +2

Paper
Add Code

Boundary-Denoising for Video Activity Localization

1 code implementation • 6 Apr 2023 • Mengmeng Xu, Mattia Soldan, Jialin Gao, Shuming Liu, Juan-Manuel Pérez-Rúa, Bernard Ghanem

To alleviate the boundary ambiguity, we propose to study the video activity localization problem from a denoising perspective.

Ranked #1 on Video Grounding on MAD

Action Detection Decoder +3

Paper
Code

NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation

no code implementations • CVPR 2023 • Haoqian Wu, Keyu Chen, Haozhe Liu, Mingchen Zhuge, Bing Li, Ruizhi Qiao, Xiujun Shu, Bei Gan, Liangsheng Xu, Bo Ren, Mengmeng Xu, Wentian Zhang, Raghavendra Ramachandra, Chia-Wen Lin, Bernard Ghanem

Temporal video segmentation is the get-to-go automatic video analysis, which decomposes a long-form video into smaller components for the following-up understanding tasks.

Video Segmentation Video Semantic Segmentation

Paper
Add Code

Multi-Modal Few-Shot Temporal Action Detection

1 code implementation • 27 Nov 2022 • Sauradip Nag, Mengmeng Xu, Xiatian Zhu, Juan-Manuel Perez-Rua, Bernard Ghanem, Yi-Zhe Song, Tao Xiang

In this work, we introduce a new multi-modality few-shot (MMFS) TAD problem, which can be considered as a marriage of FS-TAD and ZS-TAD by leveraging few-shot support videos and new class names jointly.

Action Detection Few-Shot Object Detection +3

Paper
Code

Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization

1 code implementation • CVPR 2023 • Mengmeng Xu, Yanghao Li, Cheng-Yang Fu, Bernard Ghanem, Tao Xiang, Juan-Manuel Perez-Rua

Our experiments show the proposed adaptations improve egocentric query detection, leading to a better visual query localization system in both 2D and 3D configurations.

Object

Paper
Code

Negative Frames Matter in Egocentric Visual Query 2D Localization

1 code implementation • 3 Aug 2022 • Mengmeng Xu, Cheng-Yang Fu, Yanghao Li, Bernard Ghanem, Juan-Manuel Perez-Rua, Tao Xiang

The repeated gradient computation of the same object lead to an inefficient training; (2) The false positive rate is high on background frames.

Object

Paper
Code

ETAD: Training Action Detection End to End on a Laptop

1 code implementation • 14 May 2022 • Shuming Liu, Mengmeng Xu, Chen Zhao, Xu Zhao, Bernard Ghanem

We propose to sequentially forward the snippet frame through the video encoder, and backward only a small necessary portion of gradients to update the encoder.

Action Detection Video Understanding

Paper
Code

Contrastive Language-Action Pre-training for Temporal Localization

no code implementations • 26 Apr 2022 • Mengmeng Xu, Erhan Gundogdu, Maksim Lapin, Bernard Ghanem, Michael Donoser, Loris Bazzani

Long-form video understanding requires designing approaches that are able to temporally localize activities or language.

Contrastive Learning Few Shot Temporal Action Localization +3

Paper
Add Code

SegTAD: Precise Temporal Action Detection via Semantic Segmentation

no code implementations • 3 Mar 2022 • Chen Zhao, Merey Ramazanova, Mengmeng Xu, Bernard Ghanem

To address these issues and precisely model temporal action detection, we formulate the task of temporal action detection in a novel perspective of semantic segmentation.

Action Detection object-detection +3

Paper
Add Code

Low-Fidelity Video Encoder Optimization for Temporal Action Localization

no code implementations • NeurIPS 2021 • Mengmeng Xu, Juan Manuel Perez Rua, Xiatian Zhu, Bernard Ghanem, Brais Martinez

This results in a task discrepancy problem for the video encoder – trained for action classification, but used for TAL.

Ranked #9 on Temporal Action Localization on HACS

Action Classification Optical Flow Estimation +2

Paper
Add Code

Ego4D: Around the World in 3,000 Hours of Egocentric Video

7 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

5,140

Paper
Code

Relation-aware Video Reading Comprehension for Temporal Language Grounding

1 code implementation • EMNLP 2021 • Jialin Gao, Xin Sun, Mengmeng Xu, Xi Zhou, Bernard Ghanem

Temporal language grounding in videos aims to localize the temporal span relevant to the given query sentence.

Reading Comprehension Relation +1

Paper
Code

Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization

no code implementations • 28 Mar 2021 • Mengmeng Xu, Juan-Manuel Perez-Rua, Xiatian Zhu, Bernard Ghanem, Brais Martinez

This results in a task discrepancy problem for the video encoder -- trained for action classification, but used for TAL.

Action Classification Model Optimization +3

Paper
Add Code

Boundary-sensitive Pre-training for Temporal Localization in Videos

1 code implementation • ICCV 2021 • Mengmeng Xu, Juan-Manuel Perez-Rua, Victor Escorcia, Brais Martinez, Xiatian Zhu, Li Zhang, Bernard Ghanem, Tao Xiang

However, most existing models developed for these tasks are pre-trained on general video action classification tasks.

Ranked #23 on Temporal Action Localization on ActivityNet-1.3

Action Classification Classification +3

Paper
Code

VLG-Net: Video-Language Graph Matching Network for Video Grounding

1 code implementation • 19 Nov 2020 • Mattia Soldan, Mengmeng Xu, Sisi Qu, Jesper Tegner, Bernard Ghanem

Grounding language queries in videos aims at identifying the time interval (or moment) semantically relevant to a language query.

Ranked #1 on Natural Language Moment Retrieval on DiDeMo

Graph Matching Moment Retrieval +3

Paper
Code

LC-NAS: Latency Constrained Neural Architecture Search for Point Cloud Networks

no code implementations • 24 Aug 2020 • Guohao Li, Mengmeng Xu, Silvio Giancola, Ali Thabet, Bernard Ghanem

In this paper, we introduce a new NAS framework, dubbed LC-NAS, where we search for point cloud architectures that are constrained to a target latency.

Neural Architecture Search Point Cloud Classification +2

Paper
Add Code

Learning Heat Diffusion for Network Alignment

no code implementations • 10 Jul 2020 • Sisi Qu, Mengmeng Xu, Bernard Ghanem, Jesper Tegner

EDNA uses the diffusion signal as a proxy for computing node similarities between networks.

Paper
Add Code

G-TAD: Sub-Graph Localization for Temporal Action Detection

7 code implementations • CVPR 2020 • Mengmeng Xu, Chen Zhao, David S. Rojas, Ali Thabet, Bernard Ghanem

In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem.

Ranked #5 on Temporal Action Localization on EPIC-KITCHENS-100

Temporal Action Localization

216

Paper
Code

BAOD: Budget-Aware Object Detection

no code implementations • 10 Apr 2019 • Alejandro Pardo, Mengmeng Xu, Ali Thabet, Pablo Arbelaez, Bernard Ghanem

We adopt a hybrid supervised learning framework to train the object detector from both these types of annotation.

Active Learning Object +2

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.