Search Results for author: Rita Singh

Found 47 papers, 18 papers with code

$\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

2 code implementations • 7 Mar 2024 • Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazaki, Hao Chen, Xiaonan Huang, Bhiksha Raj

Referring perception, which aims at grounding visual objects with multimodal referring guidance, is essential for bridging the gap between humans, who provide instructions, and the environment where intelligent systems perceive.

Benchmarking

A General Framework for Learning from Weak Supervision

1 code implementation • 2 Feb 2024 • Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj

Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering the practical deployment.

Weakly-supervised Learning

PAM: Prompting Audio-Language Models for Audio Quality Assessment

1 code implementation • 1 Feb 2024 • Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang

Here, we exploit this capability and introduce PAM, a no-reference metric for assessing audio quality for different audio processing tasks.

Music Generation Text-to-Music Generation

Token Prediction as Implicit Classification to Identify LLM-Generated Text

1 code implementation • 15 Nov 2023 • Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj

This paper introduces a novel approach for identifying the possible large language models (LLMs) involved in text generation.

text-classification Text Classification +1

Pairwise Similarity Learning is SimPLE

2 code implementations • ICCV 2023 • Yandong Wen, Weiyang Liu, Yao Feng, Bhiksha Raj, Rita Singh, Adrian Weller, Michael J. Black, Bernhard Schölkopf

In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL).

Face Recognition Image Retrieval +4

Prompting Audios Using Acoustic Properties For Emotion Representation

no code implementations • 3 Oct 2023 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs.

Contrastive Learning Retrieval +1

LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model

no code implementations • 2 Oct 2023 • Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Joseph Konan, Dareen Alharthi, Hazim T Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, Rita Singh

We hypothesize that for attacks to be transferrable, it is sufficient if the proxy can approximate the target model in the neighborhood of the harmful query.

Language Modelling Large Language Model

Completing Visual Objects via Bridging Generation and Segmentation

no code implementations • 1 Oct 2023 • Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, Zicheng Liu

This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components.

Image Generation Object +1

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

1 code implementation • 1 Oct 2023 • Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

In this paper, we propose an evaluation technique involving the training of an ASR model on synthetic speech and assessing its performance on real speech.

speech-recognition Speech Recognition +1

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition

3 code implementations • 29 Sep 2023 • Xiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj

We propose a semantic decomposition method based on product quantization, where the multi-source semantics can be decomposed and represented by several disentangled and noise-suppressed single-source semantics.

Quantization

Importance of negative sampling in weak label learning

no code implementations • 23 Sep 2023 • Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj

Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, but only the bag labels are known.

Training Audio Captioning Models without Audio

1 code implementation • 14 Sep 2023 • Soham Deshmukh, Benjamin Elizalde, Dimitra Emmanouilidou, Bhiksha Raj, Rita Singh, Huaming Wang

During inference, the text encoder is replaced with the pretrained CLAP audio encoder.

Audio captioning

Rethinking Voice-Face Correlation: A Geometry View

no code implementations • 26 Jul 2023 • Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj

Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but mainly rely on coarse semantic cues such as gender, age, and emotion.

3D Face Reconstruction Face Generation

The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features

1 code implementation • 26 Jul 2023 • Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj

This work unveils the enigmatic link between phonemes and facial features.

BASS: Block-wise Adaptation for Speech Summarization

no code implementations • 17 Jul 2023 • Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj

End-to-end speech summarization has been shown to improve performance over cascade baselines.

Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations

no code implementations • 22 May 2023 • Hao Chen, Ankit Shah, Jindong Wang, Ran Tao, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj

In this paper, we introduce imprecise label learning (ILL), a framework for the unification of learning with various imprecise label configurations.

Ranked #1 on Learning with noisy labels on mini WebVision 1.0

Learning with noisy labels Partial Label Learning

Pengi: An Audio Language Model for Audio Tasks

1 code implementation • NeurIPS 2023 • Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang

We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks.

Audio captioning Audio Question Answering +6

GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content

2 code implementations • 13 May 2023 • Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj

This paper presents a novel approach for detecting ChatGPT-generated vs. human-written text using language models.

text-classification Text Classification

Describing emotions with acoustic property prompts for speech emotion recognition

no code implementations • 14 Nov 2022 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

We investigate how the model can learn to associate the audio with the descriptions, resulting in performance improvement of Speech Emotion Recognition and Speech Audio Retrieval.

Retrieval Speech Emotion Recognition

Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition

no code implementations • 29 Oct 2022 • Roshan Sharma, Hira Dhamyal, Bhiksha Raj, Rita Singh

Accordingly, models that have been proposed for emotion detection use one or the other of these label types.

Multi-Task Learning Speech Emotion Recognition

Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction

no code implementations • 25 Jun 2022 • Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track.

On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice

no code implementations • 11 Apr 2022 • Ankit Shah, Hira Dhamyal, Yang Gao, Daniel Arancibia, Mario Arancibia, Bhiksha Raj, Rita Singh

Lately, there has been a global effort by multiple research groups to detect COVID-19 from voice.

An Overview of Techniques for Biomarker Discovery in Voice Signal

no code implementations • 10 Oct 2021 • Rita Singh, Ankit Shah, Hira Dhamyal

This paper reflects on the effect of several categories of medical conditions on human voice, focusing on those that may be hypothesized to have effects on voice, but for which the changes themselves may be subtle enough to have eluded observation in standard analytical examinations of the voice signal.

Self-Supervised 3D Face Reconstruction via Conditional Estimation

no code implementations • ICCV 2021 • Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh

We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos.

Ranked #15 on 3D Face Reconstruction on REALY

3D Face Reconstruction Disentanglement

SphereFace Revived: Unifying Hyperspherical Face Recognition

1 code implementation • 12 Sep 2021 • Weiyang Liu, Yandong Wen, Bhiksha Raj, Rita Singh, Adrian Weller

As one of the earliest works in hyperspherical face recognition, SphereFace explicitly proposed to learn face embeddings with large inter-class angular margin.

Face Recognition

SphereFace2: Binary Classification is All You Need for Deep Face Recognition

no code implementations • ICLR 2022 • Yandong Wen, Weiyang Liu, Adrian Weller, Bhiksha Raj, Rita Singh

In this paper, we start by identifying the discrepancy between training and evaluation in the existing multi-class classification framework and then discuss the potential limitations caused by the "competitive" nature of softmax normalization.

Binary Classification Classification +2

Controlled AutoEncoders to Generate Faces from Voices

no code implementations • 16 Jul 2021 • Hao Liang, Lulan Yu, Guikang Xu, Bhiksha Raj, Rita Singh

With this in perspective, we propose a framework to morph a target face in response to a given voice such that facial features are implicitly guided by the learned voice-face correlation.

MORPH Retrieval

Improving weakly supervised sound event detection with self-supervised auxiliary tasks

1 code implementation • 12 Jun 2021 • Soham Deshmukh, Bhiksha Raj, Rita Singh

To that extent, we propose a shared encoder architecture with sound event detection as a primary task and an additional secondary decoder for a self-supervised auxiliary task.

Event Detection Sound Event Detection +2

Masked Proxy Loss For Text-Independent Speaker Verification

1 code implementation • 9 Nov 2020 • Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh

We further propose Multinomial Masked Proxy (MMP) loss to leverage the hardness of speaker pairs.

Metric Learning Speaker Recognition +2

Interpreting glottal flow dynamics for detecting COVID-19 from voice

no code implementations • 29 Oct 2020 • Soham Deshmukh, Mahmoud Al Ismail, Rita Singh

In the pathogenesis of COVID-19, impairment of respiratory functions is often one of the key symptoms.

Detection of COVID-19 through the analysis of vocal fold oscillations

no code implementations • 21 Oct 2020 • Mahmoud Al Ismail, Soham Deshmukh, Rita Singh

Phonation, or the vibration of the vocal folds, is the primary source of vocalization in the production of voiced sounds by humans.

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

1 code implementation • 17 Aug 2020 • Soham Deshmukh, Bhiksha Raj, Rita Singh

Weakly labelled learning has garnered a lot of attention in recent years due to its potential to scale Sound Event Detection (SED) and is formulated as a Multiple Instance Learning (MIL) problem.

Event Detection Multiple Instance Learning +3

Face Reconstruction from Voice using Generative Adversarial Networks

1 code implementation • NeurIPS 2019 • Yandong Wen, Bhiksha Raj, Rita Singh

The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers, on a training set.

Face Reconstruction

The phonetic bases of vocal expressed emotion: natural versus acted

no code implementations • 13 Nov 2019 • Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Our tests show significant differences in the manner and choice of phonemes in acted and natural speech, concluding moderate to low validity and value in using acted speech databases for emotion classification tasks.

Emotion Classification General Classification +1

Detecting gender differences in perception of emotion in crowdsourced data

no code implementations • 24 Oct 2019 • Shahan Ali Memon, Hira Dhamyal, Oren Wright, Daniel Justice, Vijaykumar Palat, William Boler, Bhiksha Raj, Rita Singh

While we limit ourselves to a single modality (i.e., speech), our framework is applicable to studies of emotion perception from all such loosely annotated data in general.

Non-Determinism in Neural Networks for Adversarial Robustness

no code implementations • 26 May 2019 • Daanish Ali Khan, Linhong Li, Ninghao Sha, Zhuoran Liu, Abelino Jimenez, Bhiksha Raj, Rita Singh

Recent breakthroughs in the field of deep learning have led to advancements in a broad spectrum of tasks in computer vision, audio processing, natural language processing and other areas.

Adversarial Robustness

Reconstructing faces from voices

1 code implementation • 25 May 2019 • Yandong Wen, Rita Singh, Bhiksha Raj

Voice profiling aims at inferring various human parameters from their speech, e.g., gender, age, etc.

Hierarchical Routing Mixture of Experts

no code implementations • 18 Mar 2019 • Wenbo Zhao, Yang Gao, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Addressing these problems, we propose a binary tree-structured hierarchical routing mixture of experts (HRME) model that has classifiers as non-leaf node experts and simple regression models as leaf node experts.

regression

Hide and Speak: Towards Deep Neural Networks for Speech Steganography

1 code implementation • 7 Feb 2019 • Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet

Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as Carrier.

Neural Regression Trees

no code implementations • 1 Oct 2018 • Shahan Ali Memon, Wenbo Zhao, Bhiksha Raj, Rita Singh

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.

Classification General Classification +1

Neural Regression Tree

no code implementations • 27 Sep 2018 • Wenbo Zhao, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.

Classification regression

Optimal Strategies for Matching and Retrieval Problems by Comparing Covariates

no code implementations • 12 Jul 2018 • Yandong Wen, Mahmoud Al Ismail, Bhiksha Raj, Rita Singh

In many retrieval problems, where we must retrieve one or more entries from a gallery in response to a probe, it is common practice to learn to do so by directly comparing the probe and gallery entries to one another.

Retrieval

Disjoint Mapping Network for Cross-modal Matching of Voices and Faces

no code implementations • ICLR 2019 • Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh

We propose a novel framework, called Disjoint Mapping Network (DIMNet), for cross-modal biometric matching, in particular of voices and faces.

Voice Impersonation using Generative Adversarial Networks

no code implementations • 19 Feb 2018 • Yang Gao, Rita Singh, Bhiksha Raj

In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the target speaker.

Sound Audio and Speech Processing

Speaker identification from the sound of the human breath

no code implementations • 1 Dec 2017 • Wenbo Zhao, Yang Gao, Rita Singh

The goal of this paper is to demonstrate that breath sounds are indeed bio-signatures that can be used to identify speakers.

Speaker Identification Speaker Recognition

Content-based Video Indexing and Retrieval Using Corr-LDA

no code implementations • 27 Feb 2016 • Rahul Radhakrishnan Iyer, Sanjeel Parekh, Vikas Mohandoss, Anush Ramsurat, Bhiksha Raj, Rita Singh

Existing video indexing and retrieval methods on popular web-based multimedia sharing websites are based on user-provided sparse tagging.

Retrieval

Plagiarism Detection in Polyphonic Music using Monaural Signal Separation

no code implementations • 27 Feb 2015 • Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj

Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright.

General Classification
