1 code implementation • 14 Apr 2024 • Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf
The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under the unique condition of a multilingual scenario.
1 code implementation • IEEE Transactions on Image Processing 2024 • Anas Zafar, Danyal Aftab, Rizwan Qureshi, Xinqi Fan, Pingjun Chen, Jia Wu, Hazrat Ali, Shah Nawaz, Sheheryar Khan, Mubarak Shah
In this paper, we propose a novel and computationally efficient architecture, the Single Stage Adaptive Multi-Attention Network (SSAMAN), for image restoration tasks, particularly image denoising and image deblurring.
Ranked #1 on Image Denoising on DND
no code implementations • 31 Jul 2023 • Vu Ngoc Tu, Van Thong Huynh, Hyung-Jeong Yang, M. Zaigham Zaheer, Shah Nawaz, Karthik Nandakumar, Soo-Hyung Kim
Conversational engagement estimation is posed as a regression problem, which entails identifying the favorable attention and involvement of the participants in the conversation.
1 code implementation • 10 Mar 2023 • Muhammad Saad Saeed, Shah Nawaz, Muhammad Haris Khan, Muhammad Zaigham Zaheer, Karthik Nandakumar, Muhammad Haroon Yousaf, Arif Mahmood
With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text.
no code implementations • 25 Feb 2023 • Saqlain Hussain Shah, Muhammad Saad Saeed, Shah Nawaz, Muhammad Haroon Yousaf
To achieve this, we propose a two-branch network to learn joint representations of faces and voices in a multimodal system.
1 code implementation • 22 Aug 2022 • Muhammad Saad Saeed, Shah Nawaz, Muhammad Haris Khan, Sajid Javed, Muhammad Haroon Yousaf, Alessio Del Bue
In addition, we leverage cross-modal verification and matching tasks to analyze the impact of multiple languages on face-voice association.
no code implementations • 15 Apr 2022 • Murad Popattia, Muhammad Rafi, Rizwan Qureshi, Shah Nawaz
A pairwise ranking objective is used to train this embedding space. It allows similar images, topics, and captions in the shared semantic space to maintain a partial order in the visual-semantic hierarchy, which helps the model produce more visually accurate captions.
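To make the idea of a ranking objective that preserves a partial order concrete, here is a minimal sketch, not the authors' implementation. It assumes an order-embedding style penalty (an image embedding should lie "below" its caption in every coordinate) combined with a margin-based pairwise ranking loss over a batch; the function names, the margin value, and the choice of this particular penalty are illustrative assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def order_violation(specific: Vector, general: Vector) -> float:
    """Assumed order-embedding penalty: zero when every coordinate of the
    more general item (caption) is <= the more specific item (image)."""
    return sum(max(0.0, g - s) ** 2 for s, g in zip(specific, general))


def pairwise_ranking_loss(images: List[Vector], captions: List[Vector],
                          margin: float = 0.05) -> float:
    """Hypothetical margin-based ranking loss for a batch where images[k]
    matches captions[k]; every other item in the batch is a negative."""
    n = len(images)
    total = 0.0
    for i in range(n):
        positive = order_violation(images[i], captions[i])
        for j in range(n):
            if j == i:
                continue
            # The matching caption should violate the order less than a wrong caption...
            total += max(0.0, margin + positive - order_violation(images[i], captions[j]))
            # ...and less than the caption paired with a wrong image.
            total += max(0.0, margin + positive - order_violation(images[j], captions[i]))
    return total
```

With correctly paired inputs whose negatives violate the order by more than the margin, the loss is zero; swapping the captions makes it positive, which is the signal that pushes matching pairs into the right order.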
no code implementations • 3 Jan 2022 • Shah Nawaz, Jacopo Cavazza, Alessio Del Bue
Zero-shot learning methods rely on fixed visual and semantic embeddings, extracted from independent vision and language models, both pre-trained for other large-scale tasks.
2 code implementations • 20 Dec 2021 • Muhammad Saad Saeed, Muhammad Haris Khan, Shah Nawaz, Muhammad Haroon Yousaf, Alessio Del Bue
Prior works adopt pairwise or triplet loss formulations to learn an embedding space amenable to the associated matching and verification tasks.
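For readers unfamiliar with the two loss families mentioned above, the following sketch shows a standard pairwise (contrastive) loss and a standard triplet loss on Euclidean distances between face and voice embeddings. It illustrates the generic formulations, not the method of this paper; the margins and function names are assumptions.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(face: Vector, voice: Vector, same_identity: bool,
                     margin: float = 1.0) -> float:
    """Pairwise loss: pull matching face-voice pairs together and push
    non-matching pairs at least `margin` apart."""
    d = euclidean(face, voice)
    if same_identity:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.2) -> float:
    """Triplet loss: the anchor (e.g. a face) should be closer to the
    positive (same-identity voice) than to the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

The pairwise form looks at one pair at a time, while the triplet form only constrains relative distances, which is one reason the two are often compared for cross-modal verification and matching.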
1 code implementation • 25 Feb 2021 • Ignazio Gallo, Shah Nawaz, Nicola Landro, Riccardo La Grassa
The question we answer with this paper is: ‘Can we convert a text document into an image to take advantage of image neural models to classify text documents?’ To answer this question, we present a novel text classification method that converts a document into an encoded image using word embeddings.
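As a rough illustration of turning a document into an image via word embeddings, the toy sketch below quantizes each word's embedding into 0-255 intensities and lays it out as a square tile in a fixed grid. The tile layout, grayscale encoding, value range of [-1, 1], and the tiny embedding table are all assumptions made for illustration; the paper's actual encoding scheme may differ.

```python
from typing import Dict, List, Sequence


def to_pixels(vector: Sequence[float]) -> List[int]:
    """Map embedding values (assumed to lie in [-1, 1]) to 0..255 intensities."""
    return [round((min(1.0, max(-1.0, v)) + 1.0) / 2.0 * 255) for v in vector]


def encode_document(words: Sequence[str], embeddings: Dict[str, Sequence[float]],
                    tile: int, grid: int) -> List[List[int]]:
    """Hypothetical encoder: each known word's embedding (length tile*tile)
    becomes a tile x tile block, placed row by row in a (grid*tile)-square
    grayscale image. Unknown words are skipped, extra words are truncated,
    and unused cells stay black (0)."""
    size = grid * tile
    image = [[0] * size for _ in range(size)]
    known = [w for w in words if w in embeddings][: grid * grid]
    for index, word in enumerate(known):
        pixels = to_pixels(embeddings[word])
        top, left = (index // grid) * tile, (index % grid) * tile
        for k, value in enumerate(pixels[: tile * tile]):
            image[top + k // tile][left + k % tile] = value
    return image
```

The resulting 2-D array can then be fed to an ordinary image CNN, which is the point of the conversion: reusing image classification models for text.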
no code implementations • 28 Apr 2020 • Muhammad Saad Saeed, Shah Nawaz, Pietro Morerio, Arif Mahmood, Ignazio Gallo, Muhammad Haroon Yousaf, Alessio Del Bue
Recent years have seen a surge of interest in finding associations between faces and voices in cross-modal biometric applications, along with speaker recognition.
3 code implementations • 16 Jan 2020 • Shah Nawaz, Alessandro Calefati, Moreno Caraffini, Nicola Landro, Ignazio Gallo
In recent years, natural language descriptions have been used to obtain information on discriminative parts of objects.
Ranked #1 on Multi-Modal Document Classification on CUB-200-2011
no code implementations • 18 Sep 2019 • Shah Nawaz, Muhammad Kamran Janjua, Ignazio Gallo, Arif Mahmood, Alessandro Calefati
We quantitatively and qualitatively evaluate the proposed approach on VoxCeleb, a benchmark audio-visual dataset, across a multitude of tasks including cross-modal verification, cross-modal matching, and cross-modal retrieval.
1 code implementation • 9 Sep 2019 • Ignazio Gallo, Shah Nawaz, Alessandro Calefati, Riccardo La Grassa, Nicola Landro
Visualization refers to our ability to create an image in our head based on the text we read or the words we hear.
no code implementations • 3 Sep 2019 • Shah Nawaz, Muhammad Kamran Janjua, Ignazio Gallo, Arif Mahmood, Alessandro Calefati, Faisal Shafait
Our proposed measure evaluates the semantic similarity between the image and text representations in the latent embedding space.
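One common way to score semantic similarity between image and text representations in a shared latent space is cosine similarity. The sketch below uses it to rank candidate texts for an image; it assumes both modalities have already been projected into the same space and does not reproduce the specific measure proposed in this work. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embeddings; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return dot / norm


def rank_texts(image_vec: Vector, text_vecs: List[Vector]) -> List[int]:
    """Indices of text embeddings, most similar to the image first."""
    scores = [cosine_similarity(image_vec, t) for t in text_vecs]
    return sorted(range(len(text_vecs)), key=lambda i: scores[i], reverse=True)
```

Cosine similarity ignores vector length, so it compares the direction of the two representations, which is usually what a shared semantic space is trained to align.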
1 code implementation • 2 Apr 2019 • Omer Arshad, Ignazio Gallo, Shah Nawaz, Alessandro Calefati
With the massive explosion of social media such as Twitter and Instagram, people share billions of multimedia posts containing images and text every day.
no code implementations • 16 Oct 2018 • Muhammad Kamran Janjua, Shah Nawaz, Alessandro Calefati, Ignazio Gallo
The majority of current dimensionality reduction and retrieval techniques rely on embedding the learned feature representations into a computable metric space.
1 code implementation • 3 Oct 2018 • Ignazio Gallo, Alessandro Calefati, Shah Nawaz, Muhammad Kamran Janjua
To learn feature representations of resulting images, standard Convolutional Neural Networks (CNNs) are employed for the classification task.
no code implementations • 31 Aug 2018 • Shah Nawaz, Alessandro Calefati, Muhammad Kamran Janjua, Ignazio Gallo
The question we answer with this work is: can we convert a text document into an image to exploit the best image classification models to classify documents?
1 code implementation • 23 Jul 2018 • Alessandro Calefati, Muhammad Kamran Janjua, Shah Nawaz, Ignazio Gallo
Conventionally, CNNs have been trained with softmax as the supervision signal to penalize the classification loss.
Ranked #8 on Face Verification on YouTube Faces DB
no code implementations • 19 Jul 2018 • Shah Nawaz, Muhammad Kamran Janjua, Alessandro Calefati, Ignazio Gallo
We show that text encodings can capture semantic relationships between multiple modalities.