Search Results for author: Bernard Ghanem

Found 229 papers, 115 papers with code

Combating Missing Modalities in Egocentric Videos at Test Time

no code implementations • 23 Apr 2024 • Merey Ramazanova, Alejandro Pardo, Bernard Ghanem, Motasem Alfarra

Understanding videos that contain multiple modalities is crucial, especially in egocentric videos, where combining various sensory inputs significantly improves tasks like action recognition and moment localization.

Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation

1 code implementation • 19 Apr 2024 • Wenxuan Zhang, Youssef Mohamed, Bernard Ghanem, Philip H. S. Torr, Adel Bibi, Mohamed Elhoseiny

DietCL meticulously allocates computational budget for both types of data.

SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap

1 code implementation • 17 Apr 2024 • Vladimir Somers, Victor Joos, Anthony Cioppa, Silvio Giancola, Seyed Abolfazl Ghasemzadeh, Floriane Magera, Baptiste Standaert, Amir Mohammad Mansourian, Xin Zhou, Shohreh Kasaei, Bernard Ghanem, Alexandre Alahi, Marc Van Droogenbroeck, Christophe De Vleeschouwer

This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top-view of the pitch (i.e., a minimap).

Ranked #1 on Game State Reconstruction on SoccerNet-GSR

Camera Calibration Game State Reconstruction

X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model

no code implementations • 7 Apr 2024 • Jan Held, Hani Itani, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck

The rapid advancement of artificial intelligence has led to significant improvements in automated decision-making.

Action Recognition Decision Making +4

DATENeRF: Depth-Aware Text-based Editing of NeRFs

no code implementations • 6 Apr 2024 • Sara Rojas, Julien Philip, Kai Zhang, Sai Bi, Fujun Luan, Bernard Ghanem, Kalyan Sunkavalli

However, extending these techniques to edit scenes in Neural Radiance Fields (NeRF) is complex, as editing individual 2D frames can result in inconsistencies across multiple views.

Towards Automated Movie Trailer Generation

no code implementations • 4 Apr 2024 • Dawit Mureja Argaw, Mattia Soldan, Alejandro Pardo, Chen Zhao, Fabian Caba Heilbron, Joon Son Chung, Bernard Ghanem

Movie trailers are an essential tool for promoting films and attracting audiences.

Machine Translation

Privacy-preserving Optics for Enhancing Protection in Face De-identification

no code implementations • 31 Mar 2024 • Jhon Lopez, Carlos Hinojosa, Henry Arguello, Bernard Ghanem

Specifically, our approach first learns an optical encoder along with a regression model to obtain a face heatmap while hiding the face identity from the source image.

De-identification Privacy Preserving

Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders

1 code implementation • 26 Mar 2024 • Alexandre Eymaël, Renaud Vandeghen, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck

In particular, SiamMAE recently introduced a Siamese network, training a shared-weight encoder from two frames of a video with a high asymmetric masking ratio (95%).

Self-Supervised Learning

On Pretraining Data Diversity for Self-Supervised Learning

1 code implementation • 20 Mar 2024 • Hasan Abed Al Kader Hammoud, Tuhin Das, Fabio Pizzati, Philip Torr, Adel Bibi, Bernard Ghanem

We explore the impact of training with more diverse datasets, characterized by the number of unique samples, on the performance of self-supervised learning (SSL) under a fixed computational budget.

Self-Supervised Learning

GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning

1 code implementation • 18 Mar 2024 • Xiaojie Li, Yibo Yang, Xiangtai Li, Jianlong Wu, Yue Yu, Bernard Ghanem, Min Zhang

To tackle these challenges, we present GenView, a controllable framework that augments the diversity of positive views leveraging the power of pretrained generative models while preserving semantics.

Contrastive Learning Data Augmentation +1

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

1 code implementation • 15 Feb 2024 • Abdullah Hamdi, Luke Melas-Kyriazi, Guocheng Qian, Jinjie Mai, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi

With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.

3D Reconstruction Novel View Synthesis

Can Large Language Model Agents Simulate Human Trust Behaviors?

1 code implementation • 7 Feb 2024 • Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, Philip Torr, Bernard Ghanem, Guohao Li

In addition, we probe into the biases in agent trust and the differences in agent trust towards agents and humans.

Language Modelling Large Language Model

SPAD : Spatially Aware Multiview Diffusers

no code implementations • 7 Feb 2024 • Yash Kant, Ziyi Wu, Michael Vasilkovsky, Guocheng Qian, Jian Ren, Riza Alp Guler, Bernard Ghanem, Sergey Tulyakov, Igor Gilitschenski, Aliaksandr Siarohin

We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images.

3D Generation Novel View Synthesis +1

SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?

1 code implementation • 2 Feb 2024 • Hasan Abed Al Kader Hammoud, Hani Itani, Fabio Pizzati, Philip Torr, Adel Bibi, Bernard Ghanem

We present SynthCLIP, a novel framework for training CLIP models with entirely synthetic text-image pairs, significantly departing from previous methods relying on real data.

AToM: Amortized Text-to-Mesh using 2D Diffusion

no code implementations • 1 Feb 2024 • Guocheng Qian, Junli Cao, Aliaksandr Siarohin, Yash Kant, Chaoyang Wang, Michael Vasilkovsky, Hsin-Ying Lee, Yuwei Fang, Ivan Skorokhodov, Peiye Zhuang, Igor Gilitschenski, Jian Ren, Bernard Ghanem, Kfir Aberman, Sergey Tulyakov

We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously.

Text to 3D

Exploring Missing Modality in Multimodal Egocentric Datasets

no code implementations • 21 Jan 2024 • Merey Ramazanova, Alejandro Pardo, Humam Alwassel, Bernard Ghanem

Multimodal video understanding is crucial for analyzing egocentric videos, where integrating multiple sensory signals significantly enhances action recognition and moment localization.

Action Recognition Video Understanding

RAP-SAM: Towards Real-Time All-Purpose Segment Anything

1 code implementation • 18 Jan 2024 • Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang

Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation.

Interactive Segmentation Panoptic Segmentation +3

Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

1 code implementation • 8 Jan 2024 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem

We use two coefficients on either type of residual connections respectively, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision.

object-detection Small Object Detection +1

Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models

1 code implementation • 19 Dec 2023 • Angela Castillo, Jonas Kohler, Juan C. Pérez, Juan Pablo Pérez, Albert Pumarola, Bernard Ghanem, Pablo Arbeláez, Ali Thabet

Our findings provide insights into the efficiency of the conditional denoising process that contribute to more practical and swift deployment of text-conditioned diffusion models.

Denoising Neural Architecture Search

Artificial intelligence optical hardware empowers high-resolution hyperspectral video understanding at 1.2 Tb/s

no code implementations • 17 Dec 2023 • Maksim Makarenko, Qizhou Wang, Arturo Burguete-Lopez, Silvio Giancola, Bernard Ghanem, Luca Passone, Andrea Fratalocchi

The technology platform combines artificial intelligence hardware, processing information optically, with state-of-the-art machine vision networks, resulting in a data processing speed of 1.2 Tb/s with hundreds of frequency bands and megapixel spatial resolution at video rates.

Semantic Segmentation Video Semantic Segmentation +1

Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models

1 code implementation • 3 Dec 2023 • Andrés Villa, Juan Carlos León Alcázar, Alvaro Soto, Bernard Ghanem

This paper introduces a Multi-modal Evaluation Benchmark named MERLIM, a scalable test-bed to assess the performance of IT-LVLMs on fundamental computer vision tasks.

Hallucination

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

no code implementations • 30 Nov 2023 • Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames

2 code implementations • 28 Nov 2023 • Shuming Liu, Chen-Lin Zhang, Chen Zhao, Bernard Ghanem

In this paper, we reduce the memory consumption for end-to-end training, and manage to scale up the TAD backbone to 1 billion parameters and the input video to 1,536 frames, leading to significant detection performance.

Ranked #1 on Temporal Action Localization on EPIC-KITCHENS-100

Action Detection Temporal Action Localization

SplitNeRF: Split Sum Approximation Neural Field for Joint Geometry, Illumination, and Material Estimation

1 code implementation • 28 Nov 2023 • Jesus Zarzar, Bernard Ghanem

We present a novel approach for digitizing real-world objects by estimating their geometry, material properties, and environmental lighting from a set of posed images with fixed lighting.

From Categories to Classifier: Name-Only Continual Learning by Exploring the Web

no code implementations • 19 Nov 2023 • Ameya Prabhu, Hasan Abed Al Kader Hammoud, Ser-Nam Lim, Bernard Ghanem, Philip H. S. Torr, Adel Bibi

Continual Learning (CL) often relies on the availability of extensive annotated datasets, an assumption that is unrealistically time-consuming and costly in practice.

Continual Learning Image Classification +1

Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges

no code implementations • 12 Oct 2023 • Peifeng Gao, Qianqian Xu, Yibo Yang, Peisong Wen, Huiyang Shao, Zhiyong Yang, Bernard Ghanem, Qingming Huang

While there have been extensive studies on optimization characteristics showing the global optimality of neural collapse, little research has been done on the generalization behaviors during the occurrence of NC.

Automatic Animation of Hair Blowing in Still Portrait Photos

no code implementations • ICCV 2023 • Wenpeng Xiao, Wentao Liu, Yitong Wang, Bernard Ghanem, Bing Li

Considering the complexity of hair structure, we innovatively treat hair wisp extraction as an instance segmentation problem, where a hair wisp is referred to as an instance.

Image Animation Instance Segmentation +2

SoccerNet 2023 Challenges Results

2 code implementations • 12 Sep 2023 • Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, Chen Chen, Fabian Deuser, Feng Yan, Fufu Yu, Gal Shitrit, Guanshuo Wang, Gyusik Choi, Hankyul Kim, Hao Guo, Hasby Fahrudin, Hidenari Koguchi, Håkan Ardö, Ibrahim Salah, Ido Yerushalmy, Iftikar Muhammad, Ikuma Uchida, Ishay Be'ery, Jaonary Rabarisoa, Jeongae Lee, Jiajun Fu, Jianqin Yin, Jinghang Xu, Jongho Nang, Julien Denize, Junjie Li, Junpei Zhang, Juntae Kim, Kamil Synowiec, Kenji Kobayashi, Kexin Zhang, Konrad Habel, Kota Nakajima, Licheng Jiao, Lin Ma, Lizhi Wang, Luping Wang, Menglong Li, Mengying Zhou, Mohamed Nasr, Mohamed Abdelwahed, Mykola Liashuha, Nikolay Falaleev, Norbert Oswald, Qiong Jia, Quoc-Cuong Pham, Ran Song, Romain Hérault, Rui Peng, Ruilong Chen, Ruixuan Liu, Ruslan Baikulov, Ryuto Fukushima, Sergio Escalera, Seungcheon Lee, Shimin Chen, Shouhong Ding, Taiga Someya, Thomas B. Moeslund, Tianjiao Li, Wei Shen, Wei zhang, Wei Li, Wei Dai, Weixin Luo, Wending Zhao, Wenjie Zhang, Xinquan Yang, Yanbiao Ma, Yeeun Joo, Yingsen Zeng, Yiyang Gan, Yongqiang Zhu, Yujie Zhong, Zheng Ruan, Zhiheng Li, Zhijian Huang, Ziyu Meng

More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org.

Action Spotting Camera Calibration +3

Learning Semantic Segmentation with Query Points Supervision on Aerial Images

no code implementations • 11 Sep 2023 • Santiago Rivier, Carlos Hinojosa, Silvio Giancola, Bernard Ghanem

In this work, we present a weakly supervised learning algorithm to train semantic segmentation algorithms that only rely on query point annotations instead of full mask labels.

Image Segmentation Segmentation +3

Learning to Read Analog Gauges from Synthetic Data

no code implementations • 28 Aug 2023 • Juan Leon-Alcazar, Yazeed Alnumay, Cheng Zheng, Hassane Trigui, Sahejad Patel, Bernard Ghanem

We propose a two-stage CNN pipeline that identifies the key structural components of an analog gauge and outputs an angular reading.

ShadowNet for Data-Centric Quantum System Learning

no code implementations • 22 Aug 2023 • Yuxuan Du, Yibo Yang, Tongliang Liu, Zhouchen Lin, Bernard Ghanem, DaCheng Tao

Understanding the dynamics of large quantum systems is hindered by the curse of dimensionality.

Quantum State Tomography

Learning to Identify Critical States for Reinforcement Learning from Videos

1 code implementation • ICCV 2023 • Haozhe Liu, Mingchen Zhuge, Bing Li, Yuhui Wang, Francesco Faccio, Bernard Ghanem, Jürgen Schmidhuber

Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data which lack explicit information about executed actions.

reinforcement-learning

Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction

1 code implementation • 10 Aug 2023 • Yangyang Xu, Yibo Yang, Bernard Ghanem, Lefei Zhang, Du Bo, DaCheng Tao

In this work, we present a novel MTL model by combining both merits of deformable CNN and query-based Transformer with shared gating for multi-task learning of dense prediction.

Multi-Task Learning

Neural Collapse Terminus: A Unified Solution for Class Incremental Learning and Its Variants

2 code implementations • 3 Aug 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Jianlong Wu, Lefei Zhang, Zhouchen Lin, Philip Torr, DaCheng Tao, Bernard Ghanem

Beyond the normal case, long-tail class incremental learning and few-shot class incremental learning are also proposed to consider the data imbalance and data scarcity, respectively, which are common in real-world implementations and further exacerbate the well-known problem of catastrophic forgetting.

Few-Shot Class-Incremental Learning Incremental Learning

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

1 code implementation • 30 Jun 2023 • Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem

We present Magic123, a two-stage coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed image in the wild using both 2D and 3D priors.

Image to 3D

Towards Open Vocabulary Learning: A Survey

1 code implementation • 28 Jun 2023 • Jianzong Wu, Xiangtai Li, Shilin Xu, Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, DaCheng Tao

To our knowledge, this is the first comprehensive literature review of open vocabulary learning.

Open Set Learning Out-of-Distribution Detection +3

Enhancing Neural Rendering Methods with Image Augmentations

no code implementations • 15 Jun 2023 • Juan C. Pérez, Sara Rojas, Jesus Zarzar, Bernard Ghanem

We found that introducing image augmentations during training presents challenges such as geometric and photometric inconsistencies for learning NRMs from images.

3D Reconstruction Neural Rendering +1

Dynamically Masked Discriminator for Generative Adversarial Networks

1 code implementation • 13 Jun 2023 • Wentian Zhang, Haozhe Liu, Bing Li, Jinheng Xie, Yawen Huang, Yuexiang Li, Yefeng Zheng, Bernard Ghanem

By treating the generated data in training as a stream, we propose to detect whether the discriminator slows down the learning of new knowledge in generated data.

Continual Learning

Exploring Open-Vocabulary Semantic Segmentation without Human Labels

no code implementations • 1 Jun 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Mohamed Elhoseiny, Sean Chang Culatana

Although these VL models have acquired extensive knowledge of visual concepts, it is non-trivial to exploit that knowledge for the task of semantic segmentation, as they are usually trained at the image level.

Open Vocabulary Semantic Segmentation Segmentation +3

Just a Glimpse: Rethinking Temporal Information for Video Continual Learning

no code implementations • 28 May 2023 • Lama Alssum, Juan Leon Alcazar, Merey Ramazanova, Chen Zhao, Bernard Ghanem

Studying continual learning in the video domain poses even more challenges, as video data contains a large number of frames, which places a higher burden on the replay memory.

Class Incremental Learning Incremental Learning

Mindstorms in Natural Language-Based Societies of Mind

no code implementations • 26 May 2023 • Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, Jürgen Schmidhuber

What should be the social structure of an NLSOM?

3D Generation Image Captioning +2

How To Not Train Your Dragon: Training-free Embodied Object Goal Navigation with Semantic Frontiers

no code implementations • 26 May 2023 • Junting Chen, Guohao Li, Suryansh Kumar, Bernard Ghanem, Fisher Yu

Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.

Imitation Learning Navigate +2

Rapid Adaptation in Online Continual Learning: Are We Evaluating It Right?

1 code implementation • ICCV 2023 • Hasan Abed Al Kader Hammoud, Ameya Prabhu, Ser-Nam Lim, Philip H. S. Torr, Adel Bibi, Bernard Ghanem

We revisit the common practice of evaluating adaptation of Online Continual Learning (OCL) algorithms through the metric of online accuracy, which measures the accuracy of the model on the immediate next few samples.

Continual Learning

Large-capacity and Flexible Video Steganography via Invertible Neural Network

1 code implementation • CVPR 2023 • Chong Mou, Youmin Xu, Jiechong Song, Chen Zhao, Bernard Ghanem, Jian Zhang

For large capacity, we present a reversible pipeline that hides and recovers multiple videos through a single invertible neural network (INN).

LLM as A Robotic Brain: Unifying Egocentric Memory and Control

no code implementations • 19 Apr 2023 • Jinjie Mai, Jun Chen, Bing Li, Guocheng Qian, Mohamed Elhoseiny, Bernard Ghanem

In this paper, we propose a novel and generalizable framework called LLM-Brain: using Large-scale Language Model as a robotic brain to unify egocentric memory and control.

Embodied Question Answering Language Modelling +2

Revisiting Test Time Adaptation under Online Evaluation

1 code implementation • 10 Apr 2023 • Motasem Alfarra, Hani Itani, Alejandro Pardo, Shyma Alhuwaider, Merey Ramazanova, Juan C. Pérez, Zhipeng Cai, Matthias Müller, Bernard Ghanem

To address this issue, we propose a more realistic evaluation protocol for TTA methods, where data is received in an online fashion from a constant-speed data stream, thereby accounting for the method's adaptation speed.

Test-time Adaptation

SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries

no code implementations • 10 Apr 2023 • Hassan Mkhallati, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck

By providing broadcasters with a tool to summarize the content of their video with the same level of engagement as a live game, our method could help satisfy the needs of the numerous fans who follow their team but cannot necessarily watch the live game.

Dense Video Captioning

VARS: Video Assistant Referee System for Automated Soccer Decision Making from Multiple Views

no code implementations • 10 Apr 2023 • Jan Held, Anthony Cioppa, Silvio Giancola, Abdullah Hamdi, Bernard Ghanem, Marc Van Droogenbroeck

The Video Assistant Referee (VAR) has revolutionized association football, enabling referees to review incidents on the pitch, make informed decisions, and ensure fairness.

Decision Making Fairness

Towards Active Learning for Action Spotting in Association Football Videos

no code implementations • 9 Apr 2023 • Silvio Giancola, Anthony Cioppa, Julia Georgieva, Johsan Billingham, Andreas Serner, Kerry Peek, Bernard Ghanem, Marc Van Droogenbroeck

In this paper, we propose an active learning framework that selects the most informative video samples to be annotated next, thus drastically reducing the annotation effort and accelerating the training of action spotting models to reach the highest accuracy at a faster pace.

Action Spotting Active Learning

Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions

no code implementations • 6 Apr 2023 • Jia-Hong Huang, Modar Alfadly, Bernard Ghanem, Marcel Worring

This work proposes a new method that utilizes semantically related questions, referred to as basic questions, acting as noise to evaluate the robustness of VQA models.

In-Context Learning Question Answering +1

Boundary-Denoising for Video Activity Localization

1 code implementation • 6 Apr 2023 • Mengmeng Xu, Mattia Soldan, Jialin Gao, Shuming Liu, Juan-Manuel Pérez-Rúa, Bernard Ghanem

To alleviate the boundary ambiguity, we propose to study the video activity localization problem from a denoising perspective.

Ranked #1 on Video Grounding on MAD

Action Detection Denoising +2

Online Distillation with Continual Learning for Cyclic Domain Shifts

1 code implementation • 3 Apr 2023 • Joachim Houyon, Anthony Cioppa, Yasir Ghunaim, Motasem Alfarra, Anaïs Halin, Maxim Henry, Bernard Ghanem, Marc Van Droogenbroeck

In this paper, we propose a solution to this issue by leveraging the power of continual learning methods to reduce the impact of domain shifts.

Autonomous Driving Continual Learning

CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

2 code implementations • NeurIPS 2023 • Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, Bernard Ghanem

Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond: https://github.com/camel-ai/camel.

Instruction Following Language Modelling +1

Don't FREAK Out: A Frequency-Inspired Approach to Detecting Backdoor Poisoned Samples in DNNs

no code implementations • 23 Mar 2023 • Hasan Abed Al Kader Hammoud, Adel Bibi, Philip H. S. Torr, Bernard Ghanem

In this paper we investigate the frequency sensitivity of Deep Neural Networks (DNNs) when presented with clean samples versus poisoned samples.

Computationally Budgeted Continual Learning: What Does Matter?

1 code implementation • CVPR 2023 • Ameya Prabhu, Hasan Abed Al Kader Hammoud, Puneet Dokania, Philip H. S. Torr, Ser-Nam Lim, Bernard Ghanem, Adel Bibi

Our conclusions are consistent across different numbers of stream time steps, e.g., 20 to 200, and under several computational budgets.

Continual Learning

A Unified Continual Learning Framework with General Parameter-Efficient Tuning

1 code implementation • ICCV 2023 • Qiankun Gao, Chen Zhao, Yifan Sun, Teng Xi, Gang Zhang, Bernard Ghanem, Jian Zhang

1) Learning: the pre-trained model adapts to the new task by tuning an online PET module, along with our adaptation speed calibration to align different PET modules, 2) Accumulation: the task-specific knowledge learned by the online PET module is accumulated into an offline PET module through momentum update, 3) Ensemble: During inference, we respectively construct two experts with online/offline PET modules (which are favored by the novel/historical tasks) for prediction ensemble.

Continual Learning

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

1 code implementation • ICCV 2023 • Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, Jian Zhang

In this work, we propose a training-Free conditional Diffusion Model (FreeDoM) used for various conditions.

Face Detection

Re-ReND: Real-time Rendering of NeRFs across Devices

1 code implementation • ICCV 2023 • Sara Rojas, Jesus Zarzar, Juan Camilo Perez, Artsiom Sanakoyeu, Ali Thabet, Albert Pumarola, Bernard Ghanem

Re-ReND is designed to achieve real-time performance by converting the NeRF into a representation that can be efficiently processed by standard graphics pipelines.

Improving GAN Training via Feature Space Shrinkage

1 code implementation • 2 Mar 2023 • Haozhe Liu, Wentian Zhang, Bing Li, Haoqian Wu, Nanjun He, Yawen Huang, Yuexiang Li, Bernard Ghanem, Yefeng Zheng

The evaluation results demonstrate that our AdaptiveMix can facilitate the training of GANs and effectively improve the image quality of generated samples.

Out of Distribution (OOD) Detection

Localizing Moments in Long Video Via Multimodal Guidance

1 code implementation • ICCV 2023 • Wayner Barrios, Mattia Soldan, Alberto Mario Ceballos-Arroyo, Fabian Caba Heilbron, Bernard Ghanem

In this paper, we propose a method for improving the performance of natural language grounding in long videos by identifying and pruning out non-describable windows.

Ranked #1 on Natural Language Moment Retrieval on MAD

Natural Language Moment Retrieval Natural Language Visual Grounding +2

Real-Time Evaluation in Online Continual Learning: A New Hope

1 code implementation • CVPR 2023 • Yasir Ghunaim, Adel Bibi, Kumail Alhamoud, Motasem Alfarra, Hasan Abed Al Kader Hammoud, Ameya Prabhu, Philip H. S. Torr, Bernard Ghanem

We show that a simple baseline outperforms state-of-the-art CL methods under this evaluation, questioning the applicability of existing methods in realistic settings.

Continual Learning

Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition

no code implementations • 3 Jan 2023 • Hasan Abed Al Kader Hammoud, Shuming Liu, Mohammed Alkhrashi, Fahad Albalawi, Bernard Ghanem

Although backdoor attacks have been extensively studied in the image domain, there are very few works that explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain.

Action Recognition Temporal Action Localization

NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation

no code implementations • CVPR 2023 • Haoqian Wu, Keyu Chen, Haozhe Liu, Mingchen Zhuge, Bing Li, Ruizhi Qiao, Xiujun Shu, Bei Gan, Liangsheng Xu, Bo Ren, Mengmeng Xu, Wentian Zhang, Raghavendra Ramachandra, Chia-Wen Lin, Bernard Ghanem

Temporal video segmentation is the go-to step in automatic video analysis, decomposing a long-form video into smaller components for follow-up understanding tasks.

Video Segmentation Video Semantic Segmentation

AdaptiveMix: Improving GAN Training via Feature Space Shrinkage

1 code implementation • CVPR 2023 • Haozhe Liu, Wentian Zhang, Bing Li, Haoqian Wu, Nanjun He, Yawen Huang, Yuexiang Li, Bernard Ghanem, Yefeng Zheng

The evaluation results demonstrate that our AdaptiveMix can facilitate the training of GANs and effectively improve the image quality of generated samples.

Out of Distribution (OOD) Detection

Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only

no code implementations • ICCV 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Sean Chang Culatana, Mohamed Elhoseiny

Semantic segmentation is a crucial task in computer vision that involves segmenting images into semantically meaningful regions at the pixel level.

Open Vocabulary Semantic Segmentation Segmentation +3

Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

no code implementations • CVPR 2023 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Bernard Ghanem

Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content.

Temporal Action Localization

Paper
Add Code

MVTN: Learning Multi-View Transformations for 3D Understanding

1 code implementation • 27 Dec 2022 • Abdullah Hamdi, Faisal AlZahrani, Silvio Giancola, Bernard Ghanem

Multi-view projection techniques have shown themselves to be highly effective in achieving top-performing results in the recognition of 3D shapes.

3D Classification 3D Shape Classification +2

Paper
Code

SPARF: Large-Scale Learning of 3D Sparse Radiance Fields from Few Input Images

1 code implementation • 18 Dec 2022 • Abdullah Hamdi, Bernard Ghanem, Matthias Nießner

SuRFNet employs partial SRFs from few/one images and a specialized SRF loss to learn to generate high-quality sparse voxel radiance fields that can be rendered from novel views.

Novel View Synthesis

Paper
Code

EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries

1 code implementation • ICCV 2023 • Jinjie Mai, Abdullah Hamdi, Silvio Giancola, Chen Zhao, Bernard Ghanem

Yet, we point out that the low number of camera poses caused by camera re-localization from previous VQ3D methods severely hinders their overall success rate.

3D Reconstruction Object +2

Paper
Code

PIVOT: Prompting for Video Continual Learning

no code implementations • CVPR 2023 • Andrés Villa, Juan León Alcázar, Motasem Alfarra, Kumail Alhamoud, Julio Hurtado, Fabian Caba Heilbron, Alvaro Soto, Bernard Ghanem

In this paper, we address the problem of continual learning for video data.

Continual Learning

Paper
Add Code

SimCS: Simulation for Domain Incremental Online Continual Segmentation

no code implementations • 29 Nov 2022 • Motasem Alfarra, Zhipeng Cai, Adel Bibi, Bernard Ghanem, Matthias Müller

This work explores the problem of Online Domain-Incremental Continual Segmentation (ODICS), where the model is continually trained over batches of densely labeled images from different domains, with limited computation and no information about the task boundaries.

Autonomous Driving Continual Learning +2

Paper
Add Code

On Robust Learning from Noisy Labels: A Permutation Layer Approach

no code implementations • 29 Nov 2022 • Salman AlSubaihi, Mohammed Alkhrashi, Raied Aljadaany, Fahad Albalawi, Bernard Ghanem

We provide two variants of PermLL in this paper: one applies the permutation layer to the model's prediction, while the other applies it directly to the given noisy label.

Paper
Add Code

Multi-Modal Few-Shot Temporal Action Detection

1 code implementation • 27 Nov 2022 • Sauradip Nag, Mengmeng Xu, Xiatian Zhu, Juan-Manuel Perez-Rua, Bernard Ghanem, Yi-Zhe Song, Tao Xiang

In this work, we introduce a new multi-modality few-shot (MMFS) TAD problem, which can be considered as a marriage of FS-TAD and ZS-TAD by leveraging few-shot support videos and new class names jointly.

Action Detection Few-Shot Object Detection +3

Paper
Code

Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

1 code implementation • 25 Nov 2022 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Bernard Ghanem

Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content.

Temporal Action Localization

Paper
Code

SegNeRF: 3D Part Segmentation with Neural Radiance Fields

no code implementations • 21 Nov 2022 • Jesus Zarzar, Sara Rojas, Silvio Giancola, Bernard Ghanem

The predicted semantic fields allow SegNeRF to achieve an average mIoU of $\textbf{30.30\%}$ for 2D novel view segmentation, and $\textbf{37.46\%}$ for 3D part segmentation, boasting competitive performance against point-based methods by using only a few posed images.

3D Part Segmentation 3D Reconstruction +2

Paper
Add Code

Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training

1 code implementation • 21 Nov 2022 • Ling Yang, Zhilin Huang, Yang Song, Shenda Hong, Guohao Li, Wentao Zhang, Bin Cui, Bernard Ghanem, Ming-Hsuan Yang

Generating images from graph-structured inputs, such as scene graphs, is uniquely challenging due to the difficulty of aligning nodes and connections in graphs with objects and their relations in images.

Image Generation

Paper
Code

Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization

1 code implementation • CVPR 2023 • Mengmeng Xu, Yanghao Li, Cheng-Yang Fu, Bernard Ghanem, Tao Xiang, Juan-Manuel Perez-Rua

Our experiments show the proposed adaptations improve egocentric query detection, leading to a better visual query localization system in both 2D and 3D configurations.

Object

Paper
Code

Estimating more camera poses for ego-centric videos is essential for VQ3D

no code implementations • 18 Nov 2022 • Jinjie Mai, Chen Zhao, Abdullah Hamdi, Silvio Giancola, Bernard Ghanem

Visual queries 3D localization (VQ3D) is a task in the Ego4D Episodic Memory Benchmark.

Object Pose Estimation

Paper
Add Code

Decoupled Mixup for Generalized Visual Recognition

1 code implementation • 26 Oct 2022 • Haozhe Liu, Wentian Zhang, Jinheng Xie, Haoqian Wu, Bing Li, Ziqi Zhang, Yuexiang Li, Yawen Huang, Bernard Ghanem, Yefeng Zheng

Observing that noise-prone regions, such as textured and cluttered backgrounds, harm the generalization ability of CNN models during training, we enhance features from discriminative regions and suppress noise-prone ones when combining an image pair.

Paper
Code

SoccerNet 2022 Challenges Results

7 code implementations • 5 Oct 2022 • Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li

The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.

Action Spotting Camera Calibration +3

Paper
Code

Generalizability of Adversarial Robustness Under Distribution Shifts

no code implementations • 29 Sep 2022 • Kumail Alhamoud, Hasan Abed Al Kader Hammoud, Motasem Alfarra, Bernard Ghanem

Recent progress in empirical and certified robustness promises to deliver reliable and deployable Deep Neural Networks (DNNs).

Adversarial Robustness Domain Generalization

Paper
Add Code

Combating Mode Collapse in GANs via Manifold Entropy Estimation

1 code implementation • 25 Aug 2022 • Haozhe Liu, Bing Li, Haoqian Wu, Hanbang Liang, Yawen Huang, Yuexiang Li, Bernard Ghanem, Yefeng Zheng

In this paper, we propose a novel training pipeline to address the mode collapse issue of GANs.

Paper
Code

Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding

1 code implementation • 25 Aug 2022 • Guocheng Qian, Abdullah Hamdi, Xingdi Zhang, Bernard Ghanem

When PViT is pretrained on a large number of widely available images, significant gains are observed in the tasks of 3D point cloud classification, part segmentation, and semantic segmentation on ScanObjectNN, ShapeNetPart, and S3DIS, respectively.

3D Point Cloud Classification Inductive Bias +2

Paper
Code

Negative Frames Matter in Egocentric Visual Query 2D Localization

1 code implementation • 3 Aug 2022 • Mengmeng Xu, Cheng-Yang Fu, Yanghao Li, Bernard Ghanem, Juan-Manuel Perez-Rua, Tao Xiang

(1) The repeated gradient computation of the same object leads to inefficient training; (2) the false positive rate is high on background frames.

Object

Paper
Code

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

1 code implementation • 4 Jul 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

In this report, we propose a video-language pretraining (VLP) based solution for four Ego4D challenge tasks, including Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR).

Language Modelling Object State Change Classification

Paper
Code

PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies

3 code implementations • 9 Jun 2022 • Guocheng Qian, Yuchen Li, Houwen Peng, Jinjie Mai, Hasan Abed Al Kader Hammoud, Mohamed Elhoseiny, Bernard Ghanem

In this work, we revisit the classical PointNet++ through a systematic study of model training and scaling strategies, and offer two major contributions.

Ranked #3 on 3D Semantic Segmentation on OpenTrench3D

3D Classification 3D Part Segmentation +3

Paper
Code

Certified Robustness in Federated Learning

1 code implementation • 6 Jun 2022 • Motasem Alfarra, Juan C. Pérez, Egor Shulgin, Peter Richtárik, Bernard Ghanem

However, as in the single-node supervised learning setup, models trained in federated learning suffer from vulnerability to imperceptible input transformations known as adversarial attacks, questioning their deployment in security-related applications.

Federated Learning

Paper
Code

Egocentric Video-Language Pretraining

2 code implementations • 3 Jun 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

Video-Language Pretraining (VLP), which aims to learn transferable representation to advance a wide range of video-text downstream tasks, has recently received increasing attention.

Ranked #2 on Video Summarization on Query-Focused Video Summarization Dataset

Action Recognition Contrastive Learning +11

Paper
Code

ETAD: Training Action Detection End to End on a Laptop

1 code implementation • 14 May 2022 • Shuming Liu, Mengmeng Xu, Chen Zhao, Xu Zhao, Bernard Ghanem

We propose to sequentially forward the snippet frame through the video encoder, and backward only a small necessary portion of gradients to update the encoder.

Action Detection Video Understanding

Paper
Code

UnrealNAS: Can We Search Neural Architectures with Unreal Data?

no code implementations • 4 May 2022 • Zhen Dong, Kaicheng Zhou, Guohao Li, Qiang Zhou, Mingfei Guo, Bernard Ghanem, Kurt Keutzer, Shanghang Zhang

Neural architecture search (NAS) has shown great success in the automatic design of deep neural networks (DNNs).

Neural Architecture Search

Paper
Add Code

Contrastive Language-Action Pre-training for Temporal Localization

no code implementations • 26 Apr 2022 • Mengmeng Xu, Erhan Gundogdu, Maksim Lapin, Bernard Ghanem, Michael Donoser, Loris Bazzani

Long-form video understanding requires designing approaches that are able to temporally localize activities or language.

Contrastive Learning Few Shot Temporal Action Localization +3

Paper
Add Code

SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos

no code implementations • 14 Apr 2022 • Anthony Cioppa, Silvio Giancola, Adrien Deliege, Le Kang, Xin Zhou, Zhiyu Cheng, Bernard Ghanem, Marc Van Droogenbroeck

Tracking objects in soccer videos is extremely important to gather both player and team statistics, whether it is to estimate the total distance run, the ball possession or the team formation.

Benchmarking Multiple Object Tracking

Paper
Add Code

3DeformRS: Certifying Spatial Deformations on Point Clouds

1 code implementation • CVPR 2022 • Gabriel Pérez S., Juan C. Pérez, Motasem Alfarra, Silvio Giancola, Bernard Ghanem

In this work, we propose 3DeformRS, a method to certify the robustness of point cloud Deep Neural Networks (DNNs) against real-world deformations.

Autonomous Driving

Paper
Code

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search

1 code implementation • 11 Apr 2022 • Guocheng Qian, Xuanyang Zhang, Guohao Li, Chen Zhao, Yukang Chen, Xiangyu Zhang, Bernard Ghanem, Jian Sun

TNAS performs a modified bi-level Breadth-First Search in the proposed trees to discover a high-performance architecture.

Ranked #7 on Neural Architecture Search on NAS-Bench-201, CIFAR-10

Neural Architecture Search

Paper
Code

Real-time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders

1 code implementation • CVPR 2022 • Maksim Makarenko, Arturo Burguete-Lopez, Qizhou Wang, Fedor Getman, Silvio Giancola, Bernard Ghanem, Andrea Fratalocchi

Hyperspectral imaging has attracted significant attention to identify spectral signatures for image classification and automated pattern recognition in computer vision.

Image Classification Semantic Segmentation +1

Paper
Code

End-to-End Active Speaker Detection

1 code implementation • 27 Mar 2022 • Juan Leon Alcazar, Moritz Cordes, Chen Zhao, Bernard Ghanem

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation.

Ranked #4 on Audio-Visual Active Speaker Detection on AVA-ActiveSpeaker (using extra training data)

Audio-Visual Active Speaker Detection

Paper
Code

R-DFCIL: Relation-Guided Representation Learning for Data-Free Class Incremental Learning

1 code implementation • 24 Mar 2022 • Qiankun Gao, Chen Zhao, Bernard Ghanem, Jian Zhang

After RRL, the classification head is refined with global class-balanced classification loss to address the data imbalance issue as well as learn the decision boundaries between new and previous classes.

Class Incremental Learning Incremental Learning +3

Paper
Code

Learning Scene Flow in 3D Point Clouds with Noisy Pseudo Labels

no code implementations • 23 Mar 2022 • Bing Li, Cheng Zheng, Guohao Li, Bernard Ghanem

To provide an alternative, we propose a novel approach that utilizes monocular RGB images and point clouds to generate pseudo scene flow labels for training scene flow networks.

Pseudo Label Self-Supervised Learning

Paper
Add Code

SegTAD: Precise Temporal Action Detection via Semantic Segmentation

no code implementations • 3 Mar 2022 • Chen Zhao, Merey Ramazanova, Mengmeng Xu, Bernard Ghanem

To address these issues and precisely model temporal action detection, we formulate the task of temporal action detection in a novel perspective of semantic segmentation.

Action Detection object-detection +3

Paper
Add Code

OWL (Observe, Watch, Listen): Audiovisual Temporal Context for Localizing Actions in Egocentric Videos

no code implementations • 10 Feb 2022 • Merey Ramazanova, Victor Escorcia, Fabian Caba Heilbron, Chen Zhao, Bernard Ghanem

We validate our approach on two large-scale datasets, EPIC-Kitchens and HOMAGE.

Temporal Action Localization Temporal Localization

Paper
Add Code

Towards Assessing and Characterizing the Semantic Robustness of Face Recognition

no code implementations • 10 Feb 2022 • Juan C. Pérez, Motasem Alfarra, Ali Thabet, Pablo Arbeláez, Bernard Ghanem

We propose a methodology for assessing and characterizing the robustness of FRMs against semantic perturbations to their input.

Face Recognition

Paper
Add Code

On the Robustness of Quality Measures for GANs

1 code implementation • 31 Jan 2022 • Motasem Alfarra, Juan C. Pérez, Anna Frühstück, Philip H. S. Torr, Peter Wonka, Bernard Ghanem

Finally, we show that the FID can be robustified by simply replacing the standard Inception with a robust Inception.

Paper
Code

vCLIMB: A Novel Video Class Incremental Learning Benchmark

no code implementations • CVPR 2022 • Andrés Villa, Kumail Alhamoud, Juan León Alcázar, Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem

We perform in-depth evaluations of existing CL methods in vCLIMB, and observe two unique challenges in video data.

Class Incremental Learning Incremental Learning

Paper
Add Code

Spatio-temporal Relation Modeling for Few-shot Action Recognition

1 code implementation • CVPR 2022 • Anirudh Thatipelli, Sanath Narayan, Salman Khan, Rao Muhammad Anwer, Fahad Shahbaz Khan, Bernard Ghanem

Experiments are performed on four few-shot action recognition benchmarks: Kinetics, SSv2, HMDB51 and UCF101.

Ranked #1 on Few Shot Action Recognition on UCF101 (using extra training data)

Few-Shot action recognition Few Shot Action Recognition +1

Paper
Code

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

1 code implementation • CVPR 2022 • Mattia Soldan, Alejandro Pardo, Juan León Alcázar, Fabian Caba Heilbron, Chen Zhao, Silvio Giancola, Bernard Ghanem

The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques.

Ranked #2 on Natural Language Moment Retrieval on MAD

Moment Retrieval Natural Language Moment Retrieval

Paper
Code

Low-Fidelity Video Encoder Optimization for Temporal Action Localization

no code implementations • NeurIPS 2021 • Mengmeng Xu, Juan Manuel Perez Rua, Xiatian Zhu, Bernard Ghanem, Brais Martinez

This results in a task discrepancy problem for the video encoder – trained for action classification, but used for TAL.

Ranked #9 on Temporal Action Localization on HACS

Action Classification Optical Flow Estimation +2

Paper
Add Code

Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding

2 code implementations • 30 Nov 2021 • Abdullah Hamdi, Silvio Giancola, Bernard Ghanem

To this end, we introduce the concept of the multi-view point cloud (Voint cloud), representing each 3D point as a set of features extracted from several view-points.

3D Classification 3D Part Segmentation +3

Paper
Code

ASSANet: An Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning

1 code implementation • NeurIPS 2021 • Guocheng Qian, Hasan Abed Al Kader Hammoud, Guohao Li, Ali Thabet, Bernard Ghanem

We then introduce a new Anisotropic Reduction function into our Separable SA module and propose an Anisotropic Separable SA (ASSA) module that substantially increases the network's accuracy.

Ranked #33 on 3D Part Segmentation on ShapeNet-Part

3D Part Segmentation 3D Point Cloud Classification +2

Paper
Code

Ego4D: Around the World in 3,000 Hours of Egocentric Video

6 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Paper
Code

Relation-aware Video Reading Comprehension for Temporal Language Grounding

1 code implementation • EMNLP 2021 • Jialin Gao, Xin Sun, Mengmeng Xu, Xi Zhou, Bernard Ghanem

Temporal language grounding in videos aims to localize the temporal span relevant to the given query sentence.

Reading Comprehension Relation +1

Paper
Code

MovieCuts: A New Dataset and Benchmark for Cut Type Recognition

1 code implementation • 12 Sep 2021 • Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

Advances in automatic Cut-type recognition can unleash new experiences in the video editing industry, such as movie analysis for education, video re-editing, virtual cinematography, machine-assisted trailer generation, machine-assisted video editing, among others.

Video Editing Vocal Bursts Type Prediction

Paper
Code

Check Your Other Door! Creating Backdoor Attacks in the Frequency Domain

no code implementations • 12 Sep 2021 • Hasan Abed Al Kader Hammoud, Bernard Ghanem

Deep Neural Networks (DNNs) are ubiquitous and span a variety of applications ranging from image classification to real-time object detection.

Backdoor Attack Image Classification +2

Paper
Add Code

Learning to Cut by Watching Movies

1 code implementation • ICCV 2021 • Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise.

Contrastive Learning Video Editing

Paper
Code

Enhancing Adversarial Robustness via Test-time Transformation Ensembling

1 code implementation • 29 Jul 2021 • Juan C. Pérez, Motasem Alfarra, Guillaume Jeanneret, Laura Rueda, Ali Thabet, Bernard Ghanem, Pablo Arbeláez

Deep learning models are prone to being fooled by imperceptible perturbations known as adversarial attacks.

Adversarial Robustness

Paper
Code

ANCER: Anisotropic Certification via Sample-wise Volume Maximization

1 code implementation • 9 Jul 2021 • Francisco Eiras, Motasem Alfarra, M. Pawan Kumar, Philip H. S. Torr, Puneet K. Dokania, Bernard Ghanem, Adel Bibi

Randomized smoothing has recently emerged as an effective tool that enables certification of deep neural network classifiers at scale.

Paper
Code

DeformRS: Certifying Input Deformations with Randomized Smoothing

2 code implementations • 2 Jul 2021 • Motasem Alfarra, Adel Bibi, Naeemullah Khan, Philip H. S. Torr, Bernard Ghanem

Deep neural networks are vulnerable to input deformations in the form of vector fields of pixel displacements and to other parameterized geometric deformations, e.g., translations, rotations, etc.

Paper
Code

Training Graph Neural Networks with 1000 Layers

4 code implementations • 14 Jun 2021 • Guohao Li, Matthias Müller, Bernard Ghanem, Vladlen Koltun

Deep graph neural networks (GNNs) have achieved excellent results on various tasks on increasingly large graph datasets with millions of nodes and edges.

Ranked #1 on Node Property Prediction on ogbn-proteins

Graph Sampling Node Property Prediction

Paper
Code

APES: Audiovisual Person Search in Untrimmed Video

1 code implementation • 3 Jun 2021 • Juan Leon Alcazar, Long Mai, Federico Perazzi, Joon-Young Lee, Pablo Arbelaez, Bernard Ghanem, Fabian Caba Heilbron

To showcase the potential of our new dataset, we propose an audiovisual baseline and benchmark for person retrieval.

Person Retrieval Person Search +3

Paper
Code

SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation

1 code implementation • 10 May 2021 • Bing Li, Cheng Zheng, Silvio Giancola, Bernard Ghanem

We propose a novel scene flow estimation approach to capture and infer 3D motions from point clouds.

Scene Flow Estimation

Paper
Code

Camera Calibration and Player Localization in SoccerNet-v2 and Investigation of their Representations for Action Spotting

no code implementations • 19 Apr 2021 • Anthony Cioppa, Adrien Deliège, Floriane Magera, Silvio Giancola, Olivier Barnich, Bernard Ghanem, Marc Van Droogenbroeck

Specifically, we distill a powerful commercial calibration tool in a recent neural network architecture on the large-scale SoccerNet dataset, composed of untrimmed broadcast videos of 500 soccer games.

Action Spotting Camera Calibration +1

Paper
Add Code

Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts

1 code implementation • 14 Apr 2021 • Silvio Giancola, Bernard Ghanem

In this paper, we focus our analysis on action spotting in soccer broadcast, which consists in temporally localizing the main actions in a soccer game.

Ranked #7 on Action Spotting on SoccerNet-v2 (Average-mAP metric)

Action Spotting

Paper
Code

Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization

no code implementations • 28 Mar 2021 • Mengmeng Xu, Juan-Manuel Perez-Rua, Xiatian Zhu, Bernard Ghanem, Brais Martinez

This results in a task discrepancy problem for the video encoder -- trained for action classification, but used for TAL.

Action Classification Model Optimization +3

Paper
Add Code

Combating Adversaries with Anti-Adversaries

1 code implementation • ICML Workshop AML 2021 • Motasem Alfarra, Juan C. Pérez, Ali Thabet, Adel Bibi, Philip H. S. Torr, Bernard Ghanem

Deep neural networks are vulnerable to small input perturbations known as adversarial attacks.

Paper
Code

AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation

3 code implementations • 24 Feb 2021 • Bing Li, Yuanlue Zhu, Yitong Wang, Chia-Wen Lin, Bernard Ghanem, Linlin Shen

Specifically, a new generator architecture is proposed to simultaneously transfer color/texture styles and transform local facial shapes into anime-like counterparts based on the style of a reference anime-face, while preserving the global structure of the source photo-face.

Face Generation Translation

Paper
Code

MAAS: Multi-modal Assignation for Active Speaker Detection

1 code implementation • ICCV 2021 • Juan León-Alcázar, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem

Active speaker detection requires a solid integration of multi-modal cues.

Ranked #13 on Audio-Visual Active Speaker Detection on AVA-ActiveSpeaker

Audio-Visual Active Speaker Detection

Paper
Code

On the Decision Boundaries of Neural Networks. A Tropical Geometry Perspective

no code implementations • 1 Jan 2021 • Motasem Alfarra, Adel Bibi, Hasan Abed Al Kader Hammoud, Mohamed Gaafar, Bernard Ghanem

This work tackles the problem of characterizing and understanding the decision boundaries of neural networks with piecewise linear non-linearity activations.

Network Pruning

Paper
Add Code

High Quality Disparity Remapping With Two-Stage Warping

no code implementations • ICCV 2021 • Bing Li, Chia-Wen Lin, Cheng Zheng, Shan Liu, Junsong Yuan, Bernard Ghanem, C.-C. Jay Kuo

In the second stage, we derive another warping model to refine warping results in less important regions by eliminating serious distortions in shape, disparity and 3D structure.

Vocal Bursts Intensity Prediction Vocal Bursts Valence Prediction

Paper
Add Code

DeeperGCN: Training Deeper GCNs with Generalized Aggregation Functions

no code implementations • 1 Jan 2021 • Guohao Li, Chenxin Xiong, Ali Thabet, Bernard Ghanem

We add our generalized aggregation into a deep GCN framework and show it achieves state-of-the-art results in six benchmarks from OGB.

Point Cloud Classification Representation Learning

Paper
Add Code

SALA: Soft Assignment Local Aggregation for Parameter Efficient 3D Semantic Segmentation

no code implementations • 29 Dec 2020 • Hani Itani, Silvio Giancola, Ali Thabet, Bernard Ghanem

Since it is learnable, this mapping is allowed to be different per layer instead of being applied uniformly throughout the depth of the network.

3D Semantic Segmentation

Paper
Add Code

Data-Dependent Randomized Smoothing

no code implementations • 8 Dec 2020 • Motasem Alfarra, Adel Bibi, Philip H. S. Torr, Bernard Ghanem

In this work, we revisit Gaussian randomized smoothing and show that the variance of the Gaussian distribution can be optimized at each input so as to maximize the certification radius for the construction of the smooth classifier.

Paper
Add Code

Video Self-Stitching Graph Network for Temporal Action Localization

1 code implementation • ICCV 2021 • Chen Zhao, Ali Thabet, Bernard Ghanem

We have two key components in VSGN: video self-stitching (VSS) and cross-scale graph pyramid network (xGPN).

Ranked #16 on Temporal Action Localization on ActivityNet-1.3

Temporal Action Localization

Paper
Code

SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos

3 code implementations • 26 Nov 2020 • Adrien Deliège, Anthony Cioppa, Silvio Giancola, Meisam J. Seikavandi, Jacob V. Dueholm, Kamal Nasrollahi, Bernard Ghanem, Thomas B. Moeslund, Marc Van Droogenbroeck

In this work, we propose SoccerNet-v2, a novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production.

Ranked #1 on Camera shot segmentation on SoccerNet-v2

Action Spotting Boundary Detection +5

Paper
Code

MVTN: Multi-View Transformation Network for 3D Shape Recognition

2 code implementations • ICCV 2021 • Abdullah Hamdi, Silvio Giancola, Bernard Ghanem

MVTN exhibits clear performance gains in the tasks of 3D shape classification and 3D shape retrieval without the need for extra training supervision.

Ranked #1 on 3D Object Retrieval on ModelNet40

3D Classification 3D Object Retrieval +6

Paper
Code

TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks

1 code implementation • 23 Nov 2020 • Humam Alwassel, Silvio Giancola, Bernard Ghanem

Extensive experiments show that using features trained with our novel pretraining strategy significantly improves the performance of recent state-of-the-art methods on three tasks: Temporal Action Localization, Action Proposal Generation, and Dense Video Captioning.

Ranked #5 on Temporal Action Proposal Generation on ActivityNet-1.3

Action Classification Dense Video Captioning +2

Paper
Code

Boundary-sensitive Pre-training for Temporal Localization in Videos

1 code implementation • ICCV 2021 • Mengmeng Xu, Juan-Manuel Perez-Rua, Victor Escorcia, Brais Martinez, Xiatian Zhu, Li Zhang, Bernard Ghanem, Tao Xiang

However, most existing models developed for these tasks are pre-trained on general video action classification tasks.

Ranked #23 on Temporal Action Localization on ActivityNet-1.3

Action Classification Classification +3

Paper
Code

VLG-Net: Video-Language Graph Matching Network for Video Grounding

1 code implementation • 19 Nov 2020 • Mattia Soldan, Mengmeng Xu, Sisi Qu, Jesper Tegner, Bernard Ghanem

Grounding language queries in videos aims at identifying the time interval (or moment) semantically relevant to a language query.

Ranked #1 on Natural Language Moment Retrieval on DiDeMo

Graph Matching Moment Retrieval +3

Paper
Code

Robust Optimization as Data Augmentation for Large-scale Graphs

3 code implementations • CVPR 2022 • Kezhi Kong, Guohao Li, Mucong Ding, Zuxuan Wu, Chen Zhu, Bernard Ghanem, Gavin Taylor, Tom Goldstein

Data augmentation helps neural networks generalize better by enlarging the training set, but it remains an open question how to effectively augment graph data to enhance the performance of GNNs (Graph Neural Networks).

Ranked #1 on Graph Property Prediction on ogbg-ppa

Data Augmentation Graph Classification +4

266

Paper
Code

LC-NAS: Latency Constrained Neural Architecture Search for Point Cloud Networks

no code implementations • 24 Aug 2020 • Guohao Li, Mengmeng Xu, Silvio Giancola, Ali Thabet, Bernard Ghanem

In this paper, we introduce a new NAS framework, dubbed LC-NAS, where we search for point cloud architectures that are constrained to a target latency.

Neural Architecture Search Point Cloud Classification +2

Paper
Add Code

The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020)

1 code implementation • 3 Aug 2020 • Samuel Albanie, Yang Liu, Arsha Nagrani, Antoine Miech, Ernesto Coto, Ivan Laptev, Rahul Sukthankar, Bernard Ghanem, Andrew Zisserman, Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid, Shi-Zhe Chen, Yida Zhao, Qin Jin, Kaixu Cui, Hui Liu, Chen Wang, Yudong Jiang, Xiaoshuai Hao

This report summarizes the results of the first edition of the challenge together with the findings of the participants.

Natural Language Queries Retrieval +3

327

Paper
Code

Learning Heat Diffusion for Network Alignment

no code implementations • 10 Jul 2020 • Sisi Qu, Mengmeng Xu, Bernard Ghanem, Jesper Tegner

EDNA uses the diffusion signal as a proxy for computing node similarities between networks.

Paper
Add Code

Network Moments: Extensions and Sparse-Smooth Attacks

no code implementations • 21 Jun 2020 • Modar Alfadly, Adel Bibi, Emilio Botero, Salman AlSubaihi, Bernard Ghanem

This has incited research on the reaction of DNNs to noisy input, namely developing adversarial input attacks and strategies that make DNNs robust to these attacks.

Paper
Add Code

DeeperGCN: All You Need to Train Deeper GCNs

3 code implementations • 13 Jun 2020 • Guohao Li, Chenxin Xiong, Ali Thabet, Bernard Ghanem

Graph Convolutional Networks (GCNs) have been drawing significant attention with the power of representation learning on graphs.

Ranked #1 on Node Property Prediction on ogbn-proteins

Graph Learning Graph Property Prediction +3

12,992

Paper
Code

Rethinking Clustering for Robustness

1 code implementation • 13 Jun 2020 • Motasem Alfarra, Juan C. Pérez, Adel Bibi, Ali Thabet, Pablo Arbeláez, Bernard Ghanem

This paper studies how encouraging semantically-aligned features during deep neural network training can increase network robustness.

Clustering

Paper
Code

Active Speakers in Context

1 code implementation • CVPR 2020 • Juan Leon Alcazar, Fabian Caba Heilbron, Long Mai, Federico Perazzi, Joon-Young Lee, Pablo Arbelaez, Bernard Ghanem

Current methods for active speaker detection focus on modeling short-term audiovisual information from a single speaker.

Ranked #15 on Audio-Visual Active Speaker Detection on AVA-ActiveSpeaker

Audio-Visual Active Speaker Detection

Paper
Code

Adaptive Learning of the Optimal Batch Size of SGD

no code implementations • 3 May 2020 • Motasem Alfarra, Slavomir Hanzely, Alyazeed Albasyoni, Bernard Ghanem, Peter Richtarik

Recent advances in the theoretical understanding of SGD led to a formula for the optimal batch size minimizing the number of effective data passes, i.e., the number of iterations times the batch size.

Paper
Add Code

On the Decision Boundaries of Neural Networks: A Tropical Geometry Perspective

no code implementations • 20 Feb 2020 • Motasem Alfarra, Adel Bibi, Hasan Hammoud, Mohamed Gaafar, Bernard Ghanem

Our main finding is that the decision boundaries are a subset of a tropical hypersurface, which is intimately related to a polytope formed by the convex hull of two zonotopes.

Network Pruning

Paper
Add Code

RGB-based Semantic Segmentation Using Self-Supervised Depth Pre-Training

no code implementations • 6 Feb 2020 • Jean Lahoud, Bernard Ghanem

These labels, denoted by HN-labels, represent different height and normal patches, which allow mining of local semantic information that is useful in the task of semantic RGB segmentation.

Ranked #101 on Semantic Segmentation on NYU Depth v2

Segmentation Semantic Segmentation

Paper
Add Code

Analytical Moment Regularizer for Training Robust Networks

no code implementations • ICLR 2020 • Modar Alfadly, Adel Bibi, Muhammed Kocabas, Bernard Ghanem

In this work, we propose a new training regularizer that aims to minimize the probabilistic expected training loss of a DNN subject to a generic Gaussian input.

Data Augmentation

Paper
Add Code

Gabor Layers Enhance Network Robustness

1 code implementation • ECCV 2020 • Juan C. Pérez, Motasem Alfarra, Guillaume Jeanneret, Adel Bibi, Ali Thabet, Bernard Ghanem, Pablo Arbeláez

We revisit the benefits of merging classical vision concepts with deep learning models.

Paper
Code

A Context-Aware Loss Function for Action Spotting in Soccer Videos

1 code implementation • CVPR 2020 • Anthony Cioppa, Adrien Deliège, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck, Rikke Gade, Thomas B. Moeslund

We benchmark our loss on a large dataset of soccer videos, SoccerNet, and achieve an improvement of 12.8% over the baseline.

Ranked #3 on Action Spotting on SoccerNet

Action Spotting Video Understanding

Paper
Code

AdvPC: Transferable Adversarial Perturbations on 3D Point Clouds

1 code implementation • ECCV 2020 • Abdullah Hamdi, Sara Rojas, Ali Thabet, Bernard Ghanem

Our proposed attack increases the attack success rate by up to 40% for those transferred to unseen networks (transferability), while maintaining a high success rate on the attacked network.

Adversarial Attack Classify 3D Point Clouds

Paper
Code

SGAS: Sequential Greedy Architecture Search

1 code implementation • CVPR 2020 • Guohao Li, Guocheng Qian, Itzel C. Delgadillo, Matthias Müller, Ali Thabet, Bernard Ghanem

Architecture design has become a crucial component of successful deep learning.

Ranked #4 on Node Classification on PPI

Classification General Classification +4

160

Paper
Code

PU-GCN: Point Cloud Upsampling using Graph Convolutional Networks

1 code implementation • CVPR 2021 • Guocheng Qian, Abdulellah Abualshour, Guohao Li, Ali Thabet, Bernard Ghanem

We combine Inception DenseGCN with NodeShuffle into a new point upsampling pipeline called PU-GCN.

3D Reconstruction Point Cloud Super Resolution +1

164

Paper
Code

Assessing the Robustness of Visual Question Answering Models

no code implementations • 30 Nov 2019 • Jia-Hong Huang, Modar Alfadly, Bernard Ghanem, Marcel Worring

In this work, we propose a new method that uses semantically related questions, dubbed basic questions, acting as noise to evaluate the robustness of VQA models.

Question Answering Visual Question Answering

Paper
Add Code

Self-Supervised Learning by Cross-Modal Audio-Video Clustering

1 code implementation • NeurIPS 2020 • Humam Alwassel, Dhruv Mahajan, Bruno Korbar, Lorenzo Torresani, Bernard Ghanem, Du Tran

To the best of our knowledge, XDC is the first self-supervised learning method that outperforms large-scale fully-supervised pretraining for action recognition on the same architecture.

Ranked #2 on Self-Supervised Action Recognition on UCF101 (finetuned)

Audio Classification Clustering +5

Paper
Code

PointRGCN: Graph Convolution Networks for 3D Vehicles Detection Refinement

no code implementations • 27 Nov 2019 • Jesus Zarzar, Silvio Giancola, Bernard Ghanem

We integrate residual GCNs in a two-stage 3D object detection pipeline, where 3D object proposals are refined using a novel graph representation.

Ranked #14 on 3D Object Detection on KITTI Cars Hard

3D Object Detection Autonomous Driving +2

Paper
Add Code

G-TAD: Sub-Graph Localization for Temporal Action Detection

7 code implementations • CVPR 2020 • Mengmeng Xu, Chen Zhao, David S. Rojas, Ali Thabet, Bernard Ghanem

In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem.

Ranked #5 on Temporal Action Localization on EPIC-KITCHENS-100

Temporal Action Localization

216

Paper
Code

DeepGCNs: Making GCNs Go as Deep as CNNs

4 code implementations • 15 Oct 2019 • Guohao Li, Matthias Müller, Guocheng Qian, Itzel C. Delgadillo, Abdulellah Abualshour, Ali Thabet, Bernard Ghanem

This work transfers concepts such as residual/dense connections and dilated convolutions from CNNs to GCNs in order to successfully train very deep GCNs.

Ranked #5 on 3D Semantic Segmentation on PartNet

3D Point Cloud Classification 3D Semantic Segmentation +2

1,120

Paper
Code

Expected Tight Bounds for Robust Deep Neural Network Training

no code implementations • 25 Sep 2019 • Salman AlSubaihi, Adel Bibi, Modar Alfadly, Abdullah Hamdi, Bernard Ghanem

al. that bounded input intervals can be inexpensively propagated from layer to layer through deep networks.

Paper
Add Code

On the Decision Boundaries of Deep Neural Networks: A Tropical Geometry Perspective

no code implementations • 25 Sep 2019 • Motasem Alfarra, Adel Bibi, Hasan Hammoud, Mohamed Gaafar, Bernard Ghanem

We use tropical geometry, a new development in the area of algebraic geometry, to provide a characterization of the decision boundaries of a simple neural network of the form (Affine, ReLU, Affine).

Network Pruning

Paper
Add Code

Finding Moments in Video Collections Using Natural Language

2 code implementations • 30 Jul 2019 • Victor Escorcia, Mattia Soldan, Josef Sivic, Bernard Ghanem, Bryan Russell

We evaluate our approach on two recently proposed datasets for temporal localization of moments in video with natural language (DiDeMo and Charades-STA) extended to our video corpus moment retrieval setting.

Moment Retrieval Re-Ranking +3

148

Paper
Code

Constrained Clustering: General Pairwise and Cardinality Constraints

1 code implementation • 24 Jul 2019 • Adel Bibi, Ali Alqahtani, Bernard Ghanem

Extensive experiments on both synthetic and real data demonstrate that (1) when utilizing a single category of constraint, the proposed model is superior to or competitive with SOTA constrained clustering models, and (2) when utilizing both categories of constraints jointly, the proposed model performs better than with a single category.

Constrained Clustering

Paper
Code

3D Instance Segmentation via Multi-Task Metric Learning

no code implementations • ICCV 2019 • Jean Lahoud, Bernard Ghanem, Marc Pollefeys, Martin R. Oswald

The second goal is to learn instance information by densely estimating directional information of the instance's center of mass for each voxel.

Ranked #2 on 3D Semantic Instance Segmentation on ScanNetV2

3D Instance Segmentation 3D Reconstruction +6

Paper
Add Code

Expected Tight Bounds for Robust Training

2 code implementations • 28 May 2019 • Salman Al-Subaihi, Adel Bibi, Modar Alfadly, Abdullah Hamdi, Bernard Ghanem

In this paper, we closely examine the bounds of a block of layers composed in the form of Affine-ReLU-Affine.

Paper
Code

MAP Inference via L2-Sphere Linear Program Reformulation

1 code implementation • 9 May 2019 • Baoyuan Wu, Li Shen, Tong Zhang, Bernard Ghanem

Thus, LS-LP is equivalent to the original MAP inference problem.

valid

Paper
Code

Rethinking Learning-based Demosaicing, Denoising, and Super-Resolution Pipeline

1 code implementation • 7 May 2019 • Guocheng Qian, Yuanhao Wang, Jinjin Gu, Chao Dong, Wolfgang Heidrich, Bernard Ghanem, Jimmy S. Ren

In this work, we comprehensively study the effects of pipelines on the mixture problem of learning-based DN, DM, and SR, in both sequential and joint solutions.

Demosaicking Denoising +1

268

Paper
Code

Deep Layers as Stochastic Solvers

no code implementations • ICLR 2019 • Adel Bibi, Bernard Ghanem, Vladlen Koltun, Rene Ranftl

In particular, we show that a forward pass through a standard dropout layer followed by a linear layer and a non-linear activation is equivalent to optimizing a convex optimization objective with a single iteration of a $\tau$-nice Proximal Stochastic Gradient method.

Paper
Add Code

Analytical Moment Regularizer for Gaussian Robust Networks

1 code implementation • 24 Apr 2019 • Modar Alfadly, Adel Bibi, Bernard Ghanem

Despite the impressive performance of deep neural networks (DNNs) on numerous vision tasks, they still exhibit yet-to-understand uncouth behaviours.

Data Augmentation

Paper
Code

Learning a Controller Fusion Network by Online Trajectory Filtering for Vision-based UAV Racing

no code implementations • 18 Apr 2019 • Matthias Müller, Guohao Li, Vincent Casser, Neil Smith, Dominik L. Michels, Bernard Ghanem

A common approach is to learn an end-to-end policy that directly predicts controls from raw images by imitating an expert.

Paper
Add Code

IAN: Combining Generative Adversarial Networks for Imaginative Face Generation

no code implementations • 16 Apr 2019 • Abdullah Hamdi, Bernard Ghanem

Generative Adversarial Networks (GANs) have gained momentum for their ability to model image distributions.

Face Generation

Paper
Add Code

MAIN: Multi-Attention Instance Network for Video Segmentation

no code implementations • 11 Apr 2019 • Juan Leon Alcazar, Maria A. Bravo, Ali K. Thabet, Guillaume Jeanneret, Thomas Brox, Pablo Arbelaez, Bernard Ghanem

Instance-level video segmentation requires a solid integration of spatial and temporal information.

One-shot visual object segmentation Segmentation +2

Paper
Add Code

BAOD: Budget-Aware Object Detection

no code implementations • 10 Apr 2019 • Alejandro Pardo, Mengmeng Xu, Ali Thabet, Pablo Arbelaez, Bernard Ghanem

We adopt a hybrid supervised learning framework to train the object detector from both these types of annotation.

Active Learning Object +2

Paper
Add Code

ThumbNet: One Thumbnail Image Contains All You Need for Recognition

no code implementations • 10 Apr 2019 • Chen Zhao, Bernard Ghanem

Although deep convolutional neural networks (CNNs) have achieved great success in computer vision tasks, their real-world application is still impeded by their voracious demand for computational resources.

Paper
Add Code

Towards Analyzing Semantic Robustness of Deep Neural Networks

1 code implementation • 9 Apr 2019 • Abdullah Hamdi, Bernard Ghanem

Despite the impressive performance of Deep Neural Networks (DNNs) on various vision tasks, they still exhibit erroneous high sensitivity toward semantic primitives (e.g., object pose).

Adversarial Attack Autonomous Driving +1

Paper
Code

DeepGCNs: Can GCNs Go as Deep as CNNs?

1 code implementation • ICCV 2019 • Guohao Li, Matthias Müller, Ali Thabet, Bernard Ghanem

Finally, we use these new concepts to build a very deep 56-layer GCN, and show how it significantly boosts performance (+3.7% mIoU over state-of-the-art) in the task of point cloud semantic segmentation.

3D Semantic Segmentation Graph Classification +1

627

Paper
Code

RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization

1 code implementation • 30 Mar 2019 • Alejandro Pardo, Humam Alwassel, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem

RefineLoc achieves results competitive with the state of the art in weakly-supervised temporal localization.

Temporal Localization Weakly Supervised Action Localization +2

Paper
Code

MortonNet: Self-Supervised Learning of Local Features in 3D Point Clouds

1 code implementation • 30 Mar 2019 • Ali Thabet, Humam Alwassel, Bernard Ghanem

In fact, we show how Morton features can be used to significantly improve performance (+3% for 2 popular semantic segmentation algorithms) in the task of semantic segmentation of point clouds on the challenging and large-scale S3DIS dataset.

Segmentation Self-Supervised Learning +1

Paper
Code

Efficient Bird Eye View Proposals for 3D Siamese Tracking

no code implementations • 25 Mar 2019 • Jesus Zarzar, Silvio Giancola, Bernard Ghanem

Successively, we refine our selection of 3D object candidates by exploiting the similarity capability of a 3D Siamese network.

Object Tracking Region Proposal

Paper
Add Code

Leveraging Shape Completion for 3D Siamese Tracking

1 code implementation • CVPR 2019 • Silvio Giancola, Jesus Zarzar, Bernard Ghanem

We design a Siamese tracker that encodes model and candidate shapes into a compact latent representation.

3D Object Tracking Autonomous Vehicles +2

116

Paper
Code

SADA: Semantic Adversarial Diagnostic Attacks for Autonomous Applications

1 code implementation • 5 Dec 2018 • Abdullah Hamdi, Matthias Müller, Bernard Ghanem

In contrast, we present a general framework for adversarial attacks on trained agents, which covers semantic perturbations to the environment of the agent performing the task as well as pixel-level attacks.

Adversarial Attack Autonomous Driving +3

Paper
Code

SOD-MTGAN: Small Object Detection via Multi-Task Generative Adversarial Network

no code implementations • ECCV 2018 • Yancheng Bai, Yongqiang Zhang, Mingli Ding, Bernard Ghanem

In the MTGAN, the generator is a super-resolution network, which can up-sample small blurred images into fine-scale ones and recover detailed information for more accurate detection.

Generative Adversarial Network Object +4

Paper
Add Code

Face Super-resolution Guided by Facial Component Heatmaps

no code implementations • ECCV 2018 • Xin Yu, Basura Fernando, Bernard Ghanem, Fatih Porikli, Richard Hartley

State-of-the-art face super-resolution methods use deep convolutional neural networks to learn a mapping between low-resolution (LR) facial patterns and their corresponding high-resolution (HR) counterparts by exploring local information.

Face Hallucination Hallucination +1

Paper
Add Code

The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary

no code implementations • 11 Aug 2018 • Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia, Ranjay Krishna, Shyamal Buch, Cuong Duc Dao

The guest tasks focused on complementary aspects of the activity recognition problem at large scale and involved three challenging and recently compiled datasets: the Kinetics-600 dataset from Google DeepMind, the AVA dataset from Berkeley and Google, and the Moments in Time dataset from MIT and IBM Research.

Activity Recognition

Paper
Add Code

Diagnosing Error in Temporal Action Detectors

1 code implementation • ECCV 2018 • Humam Alwassel, Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem

Despite the recent progress in video understanding and the continuous rate of improvement in temporal action localization throughout the years, it is still unclear how far (or close?)

Temporal Action Localization Video Understanding

Paper
Code

Finding Tiny Faces in the Wild With Generative Adversarial Network

no code implementations • CVPR 2018 • Yancheng Bai, Yongqiang Zhang, Mingli Ding, Bernard Ghanem

In this paper, we propose an algorithm to directly generate a clear high-resolution face from a blurry small one by adopting a generative adversarial network (GAN).

Face Detection Generative Adversarial Network

Paper
Add Code

W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection

no code implementations • CVPR 2018 • Yongqiang Zhang, Yancheng Bai, Mingli Ding, Yongqiang Li, Bernard Ghanem

Finally, we use these pseudo ground-truths to train a fully-supervised detector.

Ranked #9 on Weakly Supervised Object Detection on PASCAL VOC 2012 test

Multiple Instance Learning object-detection +1

Paper
Add Code

Analytic Expressions for Probabilistic Moments of PL-DNN With Gaussian Input

no code implementations • CVPR 2018 • Adel Bibi, Modar Alfadly, Bernard Ghanem

Moreover, we show how these expressions can be used to systematically construct targeted and non-targeted adversarial attacks.

Image Classification

Paper
Add Code

Driving Policy Transfer via Modularity and Abstraction

no code implementations • 25 Apr 2018 • Matthias Müller, Alexey Dosovitskiy, Bernard Ghanem, Vladlen Koltun

Simulation can help end-to-end driving systems by providing a cheap, safe, and diverse training environment.

Autonomous Driving

Paper
Add Code

SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

2 code implementations • 12 Apr 2018 • Silvio Giancola, Mohieddine Amine, Tarek Dghaily, Bernard Ghanem

A total of 6,637 temporal annotations are automatically parsed from online match reports at a one-minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution).

Ranked #6 on Action Spotting on SoccerNet

Action Classification Action Detection +2

101

Paper
Code

Supervised Convolutional Sparse Coding

no code implementations • 8 Apr 2018 • Lama Affara, Bernard Ghanem, Peter Wonka

Convolutional Sparse Coding (CSC) is a well-established image representation model especially suited for image restoration tasks.

Image Reconstruction Image Restoration

Paper
Add Code

Guess Where? Actor-Supervision for Spatiotemporal Action Localization

2 code implementations • 5 Apr 2018 • Victor Escorcia, Cuong D. Dao, Mihir Jain, Bernard Ghanem, Cees Snoek

Second, we propose an actor-based attention mechanism that enables the localization of the actions from action class labels and actor proposals and is end-to-end trainable.

Action Localization Weakly Supervised Action Localization

Paper
Code

Multi-label Learning with Missing Labels using Mixed Dependency Graphs

no code implementations • 31 Mar 2018 • Baoyuan Wu, Fan Jia, Wei Liu, Bernard Ghanem, Siwei Lyu

This work focuses on the problem of multi-label learning with missing labels (MLML), which aims to label each test instance with multiple class labels given training instances that have an incomplete/partial set of these labels.

Image Retrieval Missing Labels +2

Paper
Add Code

Tagging like Humans: Diverse and Distinct Image Annotation

no code implementations • CVPR 2018 • Baoyuan Wu, Weidong Chen, Peng Sun, Wei Liu, Bernard Ghanem, Siwei Lyu

In D2IA, we generate a relevant and distinct tag subset, in which the tags are relevant to the image contents and semantically distinct from each other, using sequential sampling from a determinantal point process (DPP) model.

Generative Adversarial Network TAG

Paper
Add Code

TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

1 code implementation • ECCV 2018 • Matthias Müller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi, Bernard Ghanem

In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild.

Object object-detection +2

168

Paper
Code

OIL: Observational Imitation Learning

no code implementations • 3 Mar 2018 • Guohao Li, Matthias Müller, Vincent Casser, Neil Smith, Dominik L. Michels, Bernard Ghanem

Recent work has explored the problem of autonomous navigation by imitating a teacher and learning an end-to-end policy, which directly predicts controls from raw images.

Autonomous Driving Autonomous Navigation +2

Paper
Add Code

Contextual Multi-Scale Region Convolutional 3D Network for Activity Detection

no code implementations • 28 Jan 2018 • Yancheng Bai, Huijuan Xu, Kate Saenko, Bernard Ghanem

In this paper, we propose the contextual multi-scale region convolutional 3D network (CMS-RC3D) for activity detection.

Action Detection Activity Detection

Paper
Add Code

A Novel Framework for Robustness Analysis of Visual QA Models

no code implementations • 16 Nov 2017 • Jia-Hong Huang, Cuong Duc Dao, Modar Alfadly, Bernard Ghanem

In VQA, adversarial attacks can target the image and/or the proposed main question, and yet there is a lack of proper analysis of the latter.

Question Answering Visual Question Answering

Paper
Add Code

ActivityNet Challenge 2017 Summary

no code implementations • 22 Oct 2017 • Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Ranjay Krishna, Victor Escorcia, Kenji Hata, Shyamal Buch

The ActivityNet Large Scale Activity Recognition Challenge 2017 Summary: results and challenge participants' papers.

Activity Recognition

Paper
Add Code

2D-Driven 3D Object Detection in RGB-D Images

no code implementations • ICCV 2017 • Jean Lahoud, Bernard Ghanem

We then use the 3D information to orient, place, and score bounding boxes around objects.

Ranked #4 on Object Detection In Indoor Scenes on SUN RGB-D

3D Object Detection Object +2

Paper
Add Code

Constrained Convolutional Sparse Coding for Parametric Based Reconstruction of Line Drawings

no code implementations • ICCV 2017 • Sara Shaheen, Lama Affara, Bernard Ghanem

The process of drawing a line drawing can be approximated as the sparse spatial localization of a number of typical basic strokes, which in turn can be cast as a non-standard CSC model that considers the line drawing formation process from parametric curves.

Image Compression

Paper
Add Code
