1 code implementation • 18 Apr 2024 • Nikolina Kubiak, Armin Mustafa, Graeme Phillipson, Stephen Jolly, Simon Hadfield
In this paper we present S3R-Net, the Self-Supervised Shadow Removal Network.
1 code implementation • 5 Dec 2023 • Soon Yau Cheong, Armin Mustafa, Andrew Gilbert
This paper introduces ViscoNet, a novel method that enhances text-to-image human generation models with visual prompting.
no code implementations • 25 Oct 2023 • Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa
In the context of Audio Visual Question Answering (AVQA) tasks, the audio visual modalities could be learnt on three levels: 1) Spatial, 2) Temporal, and 3) Semantic.
Ranked #3 on Audio-visual Question Answering on MUSIC-AVQA
no code implementations • 9 Aug 2023 • Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton
To address this issue, we (i) embed relative positional encoding in the self-attention mechanism and (ii) exploit multi-scale temporal relationships by designing a novel non-hierarchical network, in contrast to recent transformer-based approaches that use a hierarchical structure.
Ranked #1 on Action Detection on MultiTHUMOS
1 code implementation • 18 Apr 2023 • Soon Yau Cheong, Armin Mustafa, Andrew Gilbert
Text-to-image (T2I) models such as StableDiffusion have been used to generate high-quality images of people.
Ranked #1 on Pose Transfer on Deep-Fashion (FID metric)
no code implementations • 26 Mar 2023 • Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa
Generating grammatically and semantically correct captions in video captioning is a challenging task.
1 code implementation • 9 Mar 2022 • Soon Yau Cheong, Armin Mustafa, Andrew Gilbert
Therefore, we propose a new method, Keypoint Pose Encoding (KPE), which is 10 times more memory efficient and over 73% faster at generating high-quality images from text input conditioned on the pose.
1 code implementation • 25 Oct 2021 • Nikolina Kubiak, Armin Mustafa, Graeme Phillipson, Stephen Jolly, Simon Hadfield
We then remap this unified input domain using a discriminator that is presented with the generated outputs and the style reference, i.e. images of the desired illumination conditions.
no code implementations • CVPR 2021 • Armin Mustafa, Akin Caliskan, Lourdes Agapito, Adrian Hilton
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.
no code implementations • 19 Apr 2021 • Akin Caliskan, Armin Mustafa, Adrian Hilton
We present a novel method to learn temporally consistent 3D reconstruction of clothed people from a monocular video.
no code implementations • 29 Sep 2020 • Akin Caliskan, Armin Mustafa, Evren Imre, Adrian Hilton
This paper introduces two advances to overcome this limitation: firstly a new synthetic dataset of realistic clothed people, 3DVH; and secondly, a novel multiple-view loss function for training of monocular volumetric shape estimation, which is demonstrated to significantly improve generalisation and reconstruction accuracy.
no code implementations • 14 Apr 2020 • Mertalp Ocal, Armin Mustafa
In this paper, we introduce RealMonoDepth, a self-supervised monocular depth estimation approach which learns to estimate the real scene depth for a diverse range of indoor and outdoor scenes.
no code implementations • 2 Oct 2019 • Akin Caliskan, Armin Mustafa, Evren Imre, Adrian Hilton
We show that it is possible to learn stereo matching from a synthetic dataset of people and improve performance on real datasets for stereo reconstruction of people from narrow- and wide-baseline stereo data.
1 code implementation • 17 Sep 2019 • Quang-Hieu Pham, Pierre Sevestre, Ramanpreet Singh Pahwa, Huijing Zhan, Chun Ho Pang, Yuda Chen, Armin Mustafa, Vijay Chandrasekhar, Jie Lin
With the increasing global popularity of self-driving cars, there is an immediate need for challenging real-world datasets for benchmarking and training various computer vision tasks such as 3D object detection.
no code implementations • ICCV 2019 • Armin Mustafa, Chris Russell, Adrian Hilton
We introduce the first approach to solve the challenging problem of unsupervised 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video.
no code implementations • 18 Jul 2019 • Armin Mustafa, Marco Volino, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton
Existing techniques for dynamic scene reconstruction from multiple wide-baseline cameras primarily focus on reconstruction in controlled environments, with fixed calibrated cameras and strong prior constraints.
no code implementations • 30 Apr 2018 • Armin Mustafa, Marco Volino, Jean-Yves Guillemaut, Adrian Hilton
Evaluation of the proposed light-field scene flow against existing multi-view dense correspondence approaches demonstrates a significant improvement in accuracy of temporal coherence.
no code implementations • CVPR 2017 • Armin Mustafa, Adrian Hilton
Semantic co-segmentation exploits the coherence in semantic class labels both spatially, between views at a single time instant, and temporally, between widely spaced time instants of dynamic objects with similar shape and appearance.
no code implementations • CVPR 2016 • Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton
Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dynamic objects.
no code implementations • ICCV 2015 • Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton
The primary contributions of this paper are twofold: an automatic method for initial coarse dynamic scene segmentation and reconstruction without prior knowledge of background appearance or structure; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes from multiple wide-baseline static or moving cameras.