2 code implementations • 7 Mar 2024 • Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazaki, Hao Chen, Xiaonan Huang, Bhiksha Raj
Referring perception, which aims at grounding visual objects with multimodal referring guidance, is essential for bridging the gap between humans, who provide instructions, and the environment where intelligent systems perceive.
1 code implementation • 2 Feb 2024 • Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj
Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering practical deployment.
1 code implementation • 1 Feb 2024 • Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang
Here, we exploit this capability and introduce PAM, a no-reference metric for assessing audio quality for different audio processing tasks.
1 code implementation • 15 Nov 2023 • Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj
This paper introduces a novel approach for identifying the possible large language models (LLMs) involved in text generation.
2 code implementations • ICCV 2023 • Yandong Wen, Weiyang Liu, Yao Feng, Bhiksha Raj, Rita Singh, Adrian Weller, Michael J. Black, Bernhard Schölkopf
In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL).
no code implementations • 3 Oct 2023 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh
In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs.
no code implementations • 2 Oct 2023 • Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Joseph Konan, Dareen Alharthi, Hazim T Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, Rita Singh
We hypothesize that for attacks to be transferable, it is sufficient if the proxy can approximate the target model in the neighborhood of the harmful query.
no code implementations • 1 Oct 2023 • Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, Zicheng Liu
This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components.
1 code implementation • 1 Oct 2023 • Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh
In this paper, we propose an evaluation technique involving the training of an ASR model on synthetic speech and assessing its performance on real speech.
3 code implementations • 29 Sep 2023 • Xiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj
We propose a semantic decomposition method based on product quantization, where the multi-source semantics can be decomposed and represented by several disentangled and noise-suppressed single-source semantics.
no code implementations • 23 Sep 2023 • Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj
Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, but only the bag labels are known.
1 code implementation • 14 Sep 2023 • Soham Deshmukh, Benjamin Elizalde, Dimitra Emmanouilidou, Bhiksha Raj, Rita Singh, Huaming Wang
During inference, the text encoder is replaced with the pretrained CLAP audio encoder.
no code implementations • 26 Jul 2023 • Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj
Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but mainly rely on coarse semantic cues such as gender, age, and emotion.
1 code implementation • 26 Jul 2023 • Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj
This work unveils the enigmatic link between phonemes and facial features.
no code implementations • 17 Jul 2023 • Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj
End-to-end speech summarization has been shown to improve performance over cascade baselines.
no code implementations • 22 May 2023 • Hao Chen, Ankit Shah, Jindong Wang, Ran Tao, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj
In this paper, we introduce imprecise label learning (ILL), a framework for the unification of learning with various imprecise label configurations.
Ranked #1 on Learning with noisy labels on mini WebVision 1.0
1 code implementation • NeurIPS 2023 • Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang
We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks.
2 code implementations • 13 May 2023 • Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj
This paper presents a novel approach for detecting ChatGPT-generated vs. human-written text using language models.
no code implementations • 14 Nov 2022 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh
We investigate how the model can learn to associate the audio with the descriptions, resulting in performance improvement of Speech Emotion Recognition and Speech Audio Retrieval.
no code implementations • 29 Oct 2022 • Roshan Sharma, Hira Dhamyal, Bhiksha Raj, Rita Singh
Accordingly, models that have been proposed for emotion detection use one or the other of these label types.
no code implementations • 25 Jun 2022 • Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj
This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track.
no code implementations • 11 Apr 2022 • Ankit Shah, Hira Dhamyal, Yang Gao, Daniel Arancibia, Mario Arancibia, Bhiksha Raj, Rita Singh
Lately, there has been a global effort by multiple research groups to detect COVID-19 from voice.
no code implementations • 10 Oct 2021 • Rita Singh, Ankit Shah, Hira Dhamyal
This paper reflects on the effect of several categories of medical conditions on human voice, focusing on those that may be hypothesized to have effects on voice, but for which the changes themselves may be subtle enough to have eluded observation in standard analytical examinations of the voice signal.
no code implementations • ICCV 2021 • Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh
We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos.
Ranked #15 on 3D Face Reconstruction on REALY
1 code implementation • 12 Sep 2021 • Weiyang Liu, Yandong Wen, Bhiksha Raj, Rita Singh, Adrian Weller
As one of the earliest works in hyperspherical face recognition, SphereFace explicitly proposed to learn face embeddings with large inter-class angular margin.
no code implementations • ICLR 2022 • Yandong Wen, Weiyang Liu, Adrian Weller, Bhiksha Raj, Rita Singh
In this paper, we start by identifying the discrepancy between training and evaluation in the existing multi-class classification framework and then discuss the potential limitations caused by the "competitive" nature of softmax normalization.
no code implementations • 16 Jul 2021 • Hao Liang, Lulan Yu, Guikang Xu, Bhiksha Raj, Rita Singh
With this in perspective, we propose a framework to morph a target face in response to a given voice such that the facial features are implicitly guided by the learned voice-face correlation.
1 code implementation • 12 Jun 2021 • Soham Deshmukh, Bhiksha Raj, Rita Singh
To that end, we propose a shared encoder architecture with sound event detection as a primary task and an additional secondary decoder for a self-supervised auxiliary task.
1 code implementation • 9 Nov 2020 • Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh
We further propose Multinomial Masked Proxy (MMP) loss to leverage the hardness of speaker pairs.
no code implementations • 29 Oct 2020 • Soham Deshmukh, Mahmoud Al Ismail, Rita Singh
In the pathogenesis of COVID-19, impairment of respiratory functions is often one of the key symptoms.
no code implementations • 21 Oct 2020 • Mahmoud Al Ismail, Soham Deshmukh, Rita Singh
Phonation, or the vibration of the vocal folds, is the primary source of vocalization in the production of voiced sounds by humans.
1 code implementation • 17 Aug 2020 • Soham Deshmukh, Bhiksha Raj, Rita Singh
Weakly labelled learning has garnered a lot of attention in recent years due to its potential to scale Sound Event Detection (SED), and is formulated as a Multiple Instance Learning (MIL) problem.
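As a rough illustration of the MIL formulation mentioned above (not the method of the listed paper), the sketch below assumes the standard MIL convention that a bag is positive if at least one of its instances is positive, and pools instance scores with a maximum. The function names, the 0.5 threshold, and the example scores are all hypothetical.

```python
def bag_score(instance_scores):
    """Max pooling: a bag is scored by its most positive instance (assumed convention)."""
    return max(instance_scores)


def bag_label(instance_scores, threshold=0.5):
    """Turn the pooled score into a binary bag label; the threshold is illustrative."""
    return int(bag_score(instance_scores) >= threshold)


# Hypothetical per-segment scores for two audio clips ("bags").
bags = {
    "bag_a": [0.1, 0.9, 0.2],  # one segment scores high, so the bag is positive
    "bag_b": [0.1, 0.3, 0.2],  # no segment reaches the threshold
}
predicted = {name: bag_label(scores) for name, scores in bags.items()}
```

Only the bag-level labels would be compared against ground truth during training; the instance scores stay unsupervised.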
1 code implementation • NeurIPS 2019 • Yandong Wen, Bhiksha Raj, Rita Singh
The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers, on a training set.
no code implementations • 13 Nov 2019 • Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh
Our tests show significant differences in the manner and choice of phonemes in acted and natural speech, from which we conclude that acted speech databases have moderate to low validity and value for emotion classification tasks.
no code implementations • 24 Oct 2019 • Shahan Ali Memon, Hira Dhamyal, Oren Wright, Daniel Justice, Vijaykumar Palat, William Boler, Bhiksha Raj, Rita Singh
While we limit ourselves to a single modality (i.e., speech), our framework is applicable to studies of emotion perception from all such loosely annotated data in general.
no code implementations • 26 May 2019 • Daanish Ali Khan, Linhong Li, Ninghao Sha, Zhuoran Liu, Abelino Jimenez, Bhiksha Raj, Rita Singh
Recent breakthroughs in the field of deep learning have led to advancements in a broad spectrum of tasks in computer vision, audio processing, natural language processing and other areas.
1 code implementation • 25 May 2019 • Yandong Wen, Rita Singh, Bhiksha Raj
Voice profiling aims at inferring various human parameters from their speech, e.g., gender, age, etc.
no code implementations • 18 Mar 2019 • Wenbo Zhao, Yang Gao, Shahan Ali Memon, Bhiksha Raj, Rita Singh
Addressing these problems, we propose a binary tree-structured hierarchical routing mixture of experts (HRME) model that has classifiers as non-leaf node experts and simple regression models as leaf node experts.
1 code implementation • 7 Feb 2019 • Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet
Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as the carrier.
no code implementations • 1 Oct 2018 • Shahan Ali Memon, Wenbo Zhao, Bhiksha Raj, Rita Singh
Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.
no code implementations • 27 Sep 2018 • Wenbo Zhao, Shahan Ali Memon, Bhiksha Raj, Rita Singh
Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.
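The conversion described above can be sketched as follows. This is a minimal illustration that assumes equal-width bins and midpoint decoding; it is not the binning strategy of either listed paper, and every function name and value in it is hypothetical.

```python
from bisect import bisect_right


def make_bins(y_min, y_max, n_bins):
    """Interior thresholds that split [y_min, y_max] into n_bins equal-width bins."""
    width = (y_max - y_min) / n_bins
    return [y_min + width * i for i in range(1, n_bins)]


def to_class(y, thresholds):
    """Map a continuous target to the index of the bin it falls in."""
    return bisect_right(thresholds, y)


def to_value(label, y_min, y_max, n_bins):
    """Map a predicted bin index back to a continuous estimate (the bin midpoint)."""
    width = (y_max - y_min) / n_bins
    return y_min + width * (label + 0.5)


# Discretize targets in [0, 100] into 4 classes, then decode one class back.
thresholds = make_bins(0.0, 100.0, 4)
labels = [to_class(y, thresholds) for y in (10.0, 30.0, 99.0)]
estimate = to_value(labels[1], 0.0, 100.0, 4)
```

A classifier would be trained on the bin indices instead of the raw targets, and its predicted class would be decoded with `to_value`. The choice of thresholds determines how much resolution is lost in the round trip.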
no code implementations • 12 Jul 2018 • Yandong Wen, Mahmoud Al Ismail, Bhiksha Raj, Rita Singh
In many retrieval problems, where we must retrieve one or more entries from a gallery in response to a probe, it is common practice to learn to do so by directly comparing the probe and gallery entries to one another.
no code implementations • ICLR 2019 • Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh
We propose a novel framework, called Disjoint Mapping Network (DIMNet), for cross-modal biometric matching, in particular of voices and faces.
no code implementations • 19 Feb 2018 • Yang Gao, Rita Singh, Bhiksha Raj
In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the target speaker.
Sound • Audio and Speech Processing
no code implementations • 1 Dec 2017 • Wenbo Zhao, Yang Gao, Rita Singh
The goal of this paper is to demonstrate that breath sounds are indeed bio-signatures that can be used to identify speakers.
no code implementations • 27 Feb 2016 • Rahul Radhakrishnan Iyer, Sanjeel Parekh, Vikas Mohandoss, Anush Ramsurat, Bhiksha Raj, Rita Singh
Existing video indexing and retrieval methods on popular web-based multimedia sharing websites are based on user-provided sparse tagging.
no code implementations • 27 Feb 2015 • Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj
Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright.