no code implementations • 8 Mar 2024 • Lin Zhang, Shentong Mo, Yijing Zhang, Pedro Morgado
We hope our established benchmark can open new avenues for controllable visual generation.
1 code implementation • 2 Dec 2023 • Shentong Mo, Pedro Morgado
Thus, to address the computational complexity, we propose an alternative procedure that factorizes the local representations before modeling audio-visual interactions.
1 code implementation • ICCV 2023 • Cheng-En Wu, Yu Tian, Haichao Yu, Heng Wang, Pedro Morgado, Yu Hen Hu, Linjie Yang
Vision-language models such as CLIP learn a generic text-image embedding from large-scale training data.
1 code implementation • 30 May 2023 • Shentong Mo, Pedro Morgado
The ability to accurately recognize, localize and separate sound sources is fundamental to any audio-visual perception task.
1 code implementation • 27 Sep 2022 • Himangi Mittal, Pedro Morgado, Unnat Jain, Abhinav Gupta
However, learning representations from videos can be challenging.
Ranked #1 on Long Term Action Anticipation on Ego4D (ED@20 Noun metric)
1 code implementation • 30 Aug 2022 • Shentong Mo, Pedro Morgado
We also propose a new approach for visual sound source localization that addresses both these problems.
no code implementations • 23 Mar 2022 • Senthil Purushwalkam, Pedro Morgado, Abhinav Gupta
As a result, SSL holds the promise to learn representations from data in-the-wild, i.e., without the need for finite and static datasets.
1 code implementation • 17 Mar 2022 • Shentong Mo, Pedro Morgado
Unsupervised audio-visual source localization aims at localizing visible sound sources in a video without relying on ground-truth localization for training.
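A common way to localize sound sources without ground-truth boxes is to compare an audio embedding against every spatial position of a visual feature map and read off a similarity heatmap. The sketch below is a minimal illustration of that idea in NumPy; the function name and shapes are hypothetical and not taken from the paper's code.

```python
import numpy as np

def localization_map(visual_feats, audio_feat):
    """Cosine similarity between one audio embedding and each spatial
    visual feature; high values mark likely sound-source locations.
    visual_feats: (H, W, D) feature map, audio_feat: (D,) embedding."""
    v = visual_feats / np.linalg.norm(visual_feats, axis=-1, keepdims=True)
    a = audio_feat / np.linalg.norm(audio_feat)
    return v @ a  # (H, W) heatmap of cosine similarities in [-1, 1]

# Toy check: the cell whose feature matches the audio scores highest.
H, W, D = 4, 4, 8
rng = np.random.default_rng(0)
feats = rng.normal(size=(H, W, D))
audio = feats[2, 3].copy()  # pretend the source sits at cell (2, 3)
heat = localization_map(feats, audio)
print(np.unravel_index(heat.argmax(), heat.shape))  # → (2, 3)
```

In practice the heatmap is computed from learned encoders and thresholded or upsampled to produce a localization mask; here random features stand in for both.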
no code implementations • CVPR 2021 • Pedro Morgado, Ishan Misra, Nuno Vasconcelos
Second, since self-supervised contrastive learning relies on random sampling of negative instances, instances that are semantically similar to the base instance can be used as faulty negatives.
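The "faulty negatives" problem above can be made concrete: when negatives are drawn at random, some may be near-duplicates of the anchor and are then wrongly pushed away. A crude remedy, sketched below, drops sampled negatives whose similarity to the anchor exceeds a threshold; the threshold and function are illustrative assumptions, not the paper's actual mechanism (which weights instances rather than hard-filtering them).

```python
import numpy as np

def filter_faulty_negatives(anchor, negatives, sim_thresh=0.8):
    """Drop sampled negatives whose cosine similarity to the anchor
    exceeds sim_thresh -- likely the same semantic class as the anchor.
    sim_thresh is an illustrative knob, not a value from the paper."""
    a = anchor / np.linalg.norm(anchor)
    n = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    sims = n @ a
    return negatives[sims < sim_thresh]

rng = np.random.default_rng(1)
anchor = rng.normal(size=16)
negs = rng.normal(size=(10, 16))
negs[0] = anchor + 0.01 * rng.normal(size=16)  # near-duplicate: a faulty negative
kept = filter_faulty_negatives(anchor, negs)
print(len(kept))  # the near-duplicate is filtered out
```

Random high-dimensional vectors are nearly orthogonal, so genuine negatives survive the filter while the planted near-duplicate does not.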
no code implementations • NeurIPS 2020 • Pedro Morgado, Yi Li, Nuno Vasconcelos
To learn from these spatial cues, we tasked a network to perform contrastive audio-visual spatial alignment of 360° video and spatial audio.
no code implementations • 27 Jul 2020 • Pedro Morgado, Yunsheng Li, Jose Costa Pereira, Mohammad Saberian, Nuno Vasconcelos
The use of a fixed set of proxies (weights of the CNN classification layer) is proposed to eliminate this ambiguity, and a procedure to design proxy sets that are nearly optimal for both classification and hashing is introduced.
1 code implementation • ECCV 2020 • Tz-Ying Wu, Pedro Morgado, Pei Wang, Chih-Hui Ho, Nuno Vasconcelos
Motivated by this, a deep realistic taxonomic classifier (Deep-RTC) is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions.
1 code implementation • CVPR 2021 • Pedro Morgado, Nuno Vasconcelos, Ishan Misra
Our method uses contrastive learning for cross-modal discrimination of video from audio and vice-versa.
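Cross-modal instance discrimination of this kind is typically implemented as a symmetric InfoNCE loss over a batch: each video embedding should match its own audio clip against all others, and vice-versa. The following is a minimal NumPy sketch of that standard loss, assuming precomputed embeddings; it is not the paper's implementation, and the temperature value is an illustrative default.

```python
import numpy as np

def cross_modal_infonce(video_emb, audio_emb, tau=0.07):
    """Symmetric cross-modal contrastive loss: matched video/audio pairs
    sit on the diagonal of the (B, B) similarity matrix and are scored
    against all in-batch negatives in both directions."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = v @ a.T / tau                 # (B, B) scaled cosine similarities
    targets = np.arange(len(v))

    def ce(l):  # cross-entropy with the matched pair as the target class
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[targets, targets].mean()

    return 0.5 * (ce(logits) + ce(logits.T))          # video→audio + audio→video

# Sanity check: correctly paired embeddings incur a lower loss than shuffled ones.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
loss_matched = cross_modal_infonce(emb, emb)
loss_shuffled = cross_modal_infonce(emb, emb[::-1].copy())
print(loss_matched < loss_shuffled)  # → True
```

Trained versions replace the random embeddings with encoder outputs and backpropagate through both modalities; the symmetric form is what "video from audio and vice-versa" refers to.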
Ranked #3 on Self-Supervised Audio Classification on ESC-50
1 code implementation • CVPR 2019 • Pedro Morgado, Nuno Vasconcelos
Under the standard paradigm of network fine-tuning, an entirely new CNN is learned per task, and the final network size is independent of task complexity.
no code implementations • NeurIPS 2018 • Pedro Morgado, Nuno Vasconcelos, Timothy Langlois, Oliver Wang
We introduce an approach to convert mono audio recorded by a 360° video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere.
1 code implementation • 7 Sep 2018 • Pedro Morgado, Nuno Vasconcelos, Timothy Langlois, Oliver Wang
Using our approach, we show that it is possible to infer the spatial location of sound sources based only on 360° video and a mono audio track.
1 code implementation • CVPR 2017 • Pedro Morgado, Nuno Vasconcelos
The role of semantics in zero-shot learning is considered.