Search Results for author: Bhiksha Raj

Found 125 papers, 44 papers with code

Learning with Noisy Foundation Models

no code implementations 11 Mar 2024 Hao Chen, Jindong Wang, Zihan Wang, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, Bhiksha Raj

Foundation models are usually pre-trained on large-scale datasets and then adapted to downstream tasks through tuning.

$\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

2 code implementations 7 Mar 2024 Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazaki, Hao Chen, Xiaonan Huang, Bhiksha Raj

Referring perception, which aims at grounding visual objects with multimodal referring guidance, is essential for bridging the gap between humans, who provide instructions, and the environment where intelligent systems perceive.

Benchmarking

AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition

no code implementations 18 Feb 2024 Zhaorun Chen, Zhuokai Zhao, Zhihong Zhu, Ruiqi Zhang, Xiang Li, Bhiksha Raj, Huaxiu Yao

Recent advancements in large language models (LLMs) have shown promise in multi-step reasoning tasks, yet their reliance on extensive manual labeling to provide procedural feedback remains a significant impediment.

Evaluating and Improving Continual Learning in Spoken Language Understanding

no code implementations 16 Feb 2024 Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, Bhiksha Raj

In this work, we propose an evaluation methodology that provides a unified evaluation on stability, plasticity, and generalizability in continual learning.

Continual Learning Spoken Language Understanding

Customizable Perturbation Synthesis for Robust SLAM Benchmarking

1 code implementation 12 Feb 2024 Xiaohao Xu, Tianyi Zhang, Sibo Wang, Xiang Li, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Xiaonan Huang

To this end, we propose a novel, customizable pipeline for noisy data synthesis, aimed at assessing the resilience of multi-modal SLAM models against various perturbations.

Benchmarking Simultaneous Localization and Mapping

On Catastrophic Inheritance of Large Foundation Models

no code implementations 2 Feb 2024 Hao Chen, Bhiksha Raj, Xing Xie, Jindong Wang

Large foundation models (LFMs) are claiming incredible performances.

A General Framework for Learning from Weak Supervision

1 code implementation 2 Feb 2024 Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj

Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering the practical deployment.

Weakly-supervised Learning

PAM: Prompting Audio-Language Models for Audio Quality Assessment

1 code implementation 1 Feb 2024 Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang

Here, we exploit this capability and introduce PAM, a no-reference metric for assessing audio quality for different audio processing tasks.

Music Generation Text-to-Music Generation

AugSumm: towards generalizable speech summarization using synthetic labels from large language model

1 code implementation 10 Jan 2024 Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, Shinji Watanabe

We tackle this challenge by proposing AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries for training and evaluation.

Language Modelling Large Language Model +1

Token Prediction as Implicit Classification to Identify LLM-Generated Text

1 code implementation 15 Nov 2023 Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj

This paper introduces a novel approach for identifying the possible large language models (LLMs) involved in text generation.

text-classification Text Classification +1

Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms

no code implementations 11 Oct 2023 Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Shuo Han, Yunyang Zeng, Ankit Shah, Bhiksha Raj

Within the ambit of VoIP (Voice over Internet Protocol) telecommunications, the complexities introduced by acoustic transformations merit rigorous analysis.

Benchmarking Denoising +1

Privacy-oriented manipulation of speaker representations

no code implementations 10 Oct 2023 Francisco Teixeira, Alberto Abad, Bhiksha Raj, Isabel Trancoso

Speaker embeddings are ubiquitous, with applications ranging from speaker recognition and diarization to speech synthesis and voice anonymisation.

Speaker Recognition Speech Synthesis

Continual Contrastive Spoken Language Understanding

no code implementations 4 Oct 2023 Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj

In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning.

Class Incremental Learning Contrastive Learning +2

Prompting Audios Using Acoustic Properties For Emotion Representation

no code implementations 3 Oct 2023 Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs.

Contrastive Learning Retrieval +1

uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

no code implementations 2 Oct 2023 Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu

Speech enhancement aims to improve speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs.

Denoising Self-Supervised Learning +2

Completing Visual Objects via Bridging Generation and Segmentation

no code implementations 1 Oct 2023 Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, Zicheng Liu

This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components.

Image Generation Object +1

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

1 code implementation 1 Oct 2023 Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

In this paper, we propose an evaluation technique involving the training of an ASR model on synthetic speech and assessing its performance on real speech.

speech-recognition Speech Recognition +1

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks

no code implementations 29 Sep 2023 Hao Chen, Jindong Wang, Ankit Shah, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, Bhiksha Raj

This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.

Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition

3 code implementations 29 Sep 2023 Xiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj

We propose a semantic decomposition method based on product quantization, where the multi-source semantics can be decomposed and represented by several disentangled and noise-suppressed single-source semantics.

Quantization

Importance of negative sampling in weak label learning

no code implementations 23 Sep 2023 Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj

Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, but only the bag labels are known.

Fixed Inter-Neuron Covariability Induces Adversarial Robustness

no code implementations 7 Aug 2023 Muhammad Ahmed Shah, Bhiksha Raj

The vulnerability to adversarial perturbations is a major flaw of Deep Neural Networks (DNNs) that raises questions about their reliability in real-world scenarios.

Adversarial Robustness

Rethinking Voice-Face Correlation: A Geometry View

no code implementations 26 Jul 2023 Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj

Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but mainly rely on coarse semantic cues such as gender, age, and emotion.

3D Face Reconstruction Face Generation

BASS: Block-wise Adaptation for Speech Summarization

no code implementations 17 Jul 2023 Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj

End-to-end speech summarization has been shown to improve performance over cascade baselines.

UTOPIA: Unconstrained Tracking Objects without Preliminary Examination via Cross-Domain Adaptation

no code implementations 16 Jun 2023 Pha Nguyen, Kha Gia Quach, John Gauch, Samee U. Khan, Bhiksha Raj, Khoa Luu

Then, a new cross-domain MOT adaptation from existing datasets is proposed without any pre-defined human knowledge in understanding and modeling objects.

Domain Adaptation Multiple Object Tracking +1

GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content

2 code implementations 13 May 2023 Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj

This paper presents a novel approach for detecting ChatGPT-generated vs. human-written text using language models.

text-classification Text Classification

FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding

1 code implementation CVPR 2023 Thanh-Dat Truong, Ngan Le, Bhiksha Raj, Jackson Cothren, Khoa Luu

Although Domain Adaptation in Semantic Scene Segmentation has shown impressive improvement in recent years, the fairness concerns in the domain adaptation have yet to be well defined and addressed.

Autonomous Driving Domain Adaptation +4

Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms

no code implementations 16 Mar 2023 Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Hojeong Lee, Ankit Shah, Shuo Han, Yunyang Zeng, Amanda Shu, Haohui Liu, Xuankai Chang, Hamza Khalid, Minseon Gwak, Kawon Lee, Minjeong Kim, Bhiksha Raj

In this paper, we present a method for fine-tuning models trained on the Deep Noise Suppression (DNS) 2020 Challenge to improve their performance on Voice over Internet Protocol (VoIP) applications.

Multi-Task Learning Speech Enhancement +2

Approach to Learning Generalized Audio Representation Through Batch Embedding Covariance Regularization and Constant-Q Transforms

no code implementations 7 Mar 2023 Ankit Shah, Shuyi Chen, Kejun Zhou, Yue Chen, Bhiksha Raj

Preliminary results show (1) the proposed BECR can incur a more dispersed embedding on the test set, (2) BECR improves the PaSST model without extra computation complexity, and (3) STFT preprocessing outperforms CQT in all tasks we tested.

Zero-Shot Learning

Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session

no code implementations 20 Feb 2023 Laurie M. Heller, Benjamin Elizalde, Bhiksha Raj, Soham Deshmukh

Machine Listening, as usually formalized, attempts to perform a task that is, from our perspective, fundamentally human-performable, and performed by humans.

Scene Recognition

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

2 code implementations 16 Feb 2023 Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

We can add this criterion as an auxiliary loss to any model that produces speech, to optimize speech outputs to match the values of clean speech in these features.

Speech Enhancement Time Series +1

SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning

4 code implementations 26 Jan 2023 Hao Chen, Ran Tao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Bhiksha Raj, Marios Savvides

The critical challenge of Semi-Supervised Learning (SSL) is how to effectively leverage the limited labeled data and massive unlabeled data to improve the model's generalization performance.

imbalanced classification

Understanding Political Polarisation using Language Models: A dataset and method

no code implementations 2 Jan 2023 Samiran Gode, Supreeth Bare, Bhiksha Raj, Hyungon Yoo

To understand the polarization we begin by showing results from some classical language models in Word2Vec and Doc2Vec.

Language Modelling

VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

1 code implementation 28 Nov 2022 Kashu Yamazaki, Khoa Vo, Sang Truong, Bhiksha Raj, Ngan Le

Video paragraph captioning aims to generate a multi-sentence description of an untrimmed video with several temporal event locations in coherent storytelling.

Sentence Video Captioning

Panoramic Video Salient Object Detection with Ambisonic Audio Guidance

no code implementations 26 Nov 2022 Xiang Li, Haoyuan Cao, Shijie Zhao, Junlin Li, Li Zhang, Bhiksha Raj

In this paper, we aim to tackle the video salient object detection problem for panoramic videos, with their corresponding ambisonic audios.

Object object-detection +2

An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning

no code implementations 20 Nov 2022 Hao Chen, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Marios Savvides, Bhiksha Raj

While standard SSL assumes uniform data distribution, we consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.

Pseudo Label

Describing emotions with acoustic property prompts for speech emotion recognition

no code implementations 14 Nov 2022 Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

We investigate how the model can learn to associate the audio with the descriptions, resulting in performance improvement of Speech Emotion Recognition and Speech Audio Retrieval.

Retrieval Speech Emotion Recognition

XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers

no code implementations 29 Oct 2022 Roshan Sharma, Bhiksha Raj

Transformers are among the state of the art for many tasks in speech, vision, and natural language processing, among others.

speech-recognition Speech Recognition

Privacy-preserving Automatic Speaker Diarization

no code implementations 26 Oct 2022 Francisco Teixeira, Alberto Abad, Bhiksha Raj, Isabel Trancoso

Automatic Speaker Diarization (ASD) is an enabling technology with numerous applications, which deals with recordings of multiple speakers, raising special concerns in terms of privacy.

Privacy Preserving speaker-diarization +1

There is more than one kind of robustness: Fooling Whisper with adversarial examples

1 code implementation 26 Oct 2022 Raphael Olivier, Bhiksha Raj

Whisper is a recent Automatic Speech Recognition (ASR) model displaying impressive robustness to both out-of-distribution inputs and random noise.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation

1 code implementation 5 Oct 2022 Khoa Vo, Sang Truong, Kashu Yamazaki, Bhiksha Raj, Minh-Triet Tran, Ngan Le

The PMR module represents each video snippet by a visual-linguistic feature, in which the main actors and surrounding environment are represented by visual information, whereas relevant objects are depicted by linguistic features through an image-text model.

Action Detection Temporal Action Proposal Generation

Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models

1 code implementation 17 Sep 2022 Raphael Olivier, Hadi Abdullah, Bhiksha Raj

To exploit ASR models in real-world, black-box settings, an adversary can leverage the transferability property, i.e., that an adversarial sample produced for a proxy ASR can also fool a different remote ASR.

Adversarial Attack Automatic Speech Recognition +3

Online Video Instance Segmentation via Robust Context Fusion

no code implementations 12 Jul 2022 Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu

We propose a robust context fusion network to tackle VIS in an online fashion, which predicts instance segmentation frame-by-frame with a few preceding frames.

Instance Segmentation Segmentation +2

How many perturbations break this model? Evaluating robustness beyond adversarial accuracy

1 code implementation 8 Jul 2022 Raphael Olivier, Bhiksha Raj

Finally, with sparsity we can measure increases in robustness that do not affect accuracy: we show for example that data augmentation can by itself increase adversarial robustness, without using adversarial training.

Adversarial Attack Adversarial Robustness +1

Improving Speech Enhancement through Fine-Grained Speech Characteristics

1 code implementation 1 Jul 2022 Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

We first identify key acoustic parameters that have been found to correlate well with voice quality (e.g., jitter, shimmer, and spectral flux) and then propose objective functions which are aimed at reducing the difference between clean speech and enhanced speech with respect to these features.

Speech Enhancement

Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction

no code implementations 25 Jun 2022 Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track.

Towards End-to-End Private Automatic Speaker Recognition

no code implementations 23 Jun 2022 Francisco Teixeira, Alberto Abad, Bhiksha Raj, Isabel Trancoso

This poses two important issues: first, knowledge of the speaker embedding extraction model may create security and robustness liabilities for the authentication system, as this knowledge might help attackers in crafting adversarial examples able to mislead the system; second, from the point of view of a service provider the speaker embedding extraction model is arguably one of the most valuable components in the system and, as such, disclosing it would be highly undesirable.

Privacy Preserving Speaker Recognition +1

Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution

no code implementations 18 Jun 2022 Chonghan Chen, Qi Jiang, Chih-Hao Wang, Noel Chen, Haohan Wang, Xiang Li, Bhiksha Raj

With our proposed QCM, the downstream fusion module receives visual features that are more discriminative and focused on the desired object described in the expression, leading to more accurate predictions.

Visual Grounding

FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning

4 code implementations 15 May 2022 Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, Xing Xie

Semi-supervised Learning (SSL) has witnessed great success owing to the impressive performances brought by various methods based on pseudo labeling and consistency regularization.

Fairness Semi-Supervised Image Classification

Recent improvements of ASR models in the face of adversarial attacks

1 code implementation 29 Mar 2022 Raphael Olivier, Bhiksha Raj

Like many other tasks involving neural networks, Speech Recognition models are vulnerable to adversarial attacks.

speech-recognition Speech Recognition

Point3D: tracking actions as moving points with 3D CNNs

no code implementations 20 Mar 2022 Shentong Mo, Jingfei Xia, Xiaoqing Tan, Bhiksha Raj

Our Point3D consists of a Point Head for action localization and a 3D Head for action classification.

Action Classification Action Localization +1

Ontological Learning from Weak Labels

no code implementations 4 Mar 2022 Larry Tang, Po Hao Chou, Yi Yu Zheng, Ziqian Ge, Ankit Shah, Bhiksha Raj

We find that the baseline Siamese does not perform better by incorporating ontology information in the weak and multi-label scenario, but that the GCN does capture the ontology knowledge better for weak, multi-labeled data.

Sequential Randomized Smoothing for Adversarially Robust Speech Recognition

1 code implementation EMNLP 2021 Raphael Olivier, Bhiksha Raj

We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Self-Supervised 3D Face Reconstruction via Conditional Estimation

no code implementations ICCV 2021 Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh

We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos.

3D Face Reconstruction Disentanglement

SphereFace Revived: Unifying Hyperspherical Face Recognition

1 code implementation 12 Sep 2021 Weiyang Liu, Yandong Wen, Bhiksha Raj, Rita Singh, Adrian Weller

As one of the earliest works in hyperspherical face recognition, SphereFace explicitly proposed to learn face embeddings with large inter-class angular margin.

Face Recognition

The Right to Talk: An Audio-Visual Transformer Approach

1 code implementation ICCV 2021 Thanh-Dat Truong, Chi Nhan Duong, The De Vu, Hoang Anh Pham, Bhiksha Raj, Ngan Le, Khoa Luu

Therefore, this work introduces a new Audio-Visual Transformer approach to the problem of localization and highlighting the main speaker in both audio and visual channels of a multi-speaker conversation video in the wild.

SphereFace2: Binary Classification is All You Need for Deep Face Recognition

no code implementations ICLR 2022 Yandong Wen, Weiyang Liu, Adrian Weller, Bhiksha Raj, Rita Singh

In this paper, we start by identifying the discrepancy between training and evaluation in the existing multi-class classification framework and then discuss the potential limitations caused by the "competitive" nature of softmax normalization.

Binary Classification Classification +2

Controlled AutoEncoders to Generate Faces from Voices

no code implementations 16 Jul 2021 Hao Liang, Lulan Yu, Guikang Xu, Bhiksha Raj, Rita Singh

With this in perspective, we propose a framework to morph a target face in response to a given voice in a way that facial features are implicitly guided by learned voice-face correlation in this paper.

MORPH Retrieval

Improving weakly supervised sound event detection with self-supervised auxiliary tasks

1 code implementation 12 Jun 2021 Soham Deshmukh, Bhiksha Raj, Rita Singh

To that extent, we propose a shared encoder architecture with sound event detection as a primary task and an additional secondary decoder for a self-supervised auxiliary task.

Event Detection Sound Event Detection +2

Training image classifiers using Semi-Weak Label Data

no code implementations 19 Mar 2021 Anxiang Zhang, Ankit Shah, Bhiksha Raj

Thus, this paper introduces a novel semi-weak label learning paradigm as a middle ground to mitigate the problem.

Multiple Instance Learning

Constant Random Perturbations Provide Adversarial Robustness with Minimal Effect on Accuracy

1 code implementation 15 Mar 2021 Bronya Roni Chernyak, Bhiksha Raj, Tamir Hazan, Joseph Keshet

This paper proposes an attack-independent (non-adversarial training) technique for improving adversarial robustness of neural network models, with minimal loss of standard accuracy.

Adversarial Robustness

Contrast and Order Representations for Video Self-Supervised Learning

no code implementations ICCV 2021 Kai Hu, Jie Shao, YuAn Liu, Bhiksha Raj, Marios Savvides, Zhiqiang Shen

To address this, we present a contrast-and-order representation (CORP) framework for learning self-supervised video representations that can automatically capture both the appearance information within each frame and temporal information across different frames.

Action Recognition Self-Supervised Learning

Is normalization indispensable for training deep neural network?

1 code implementation NeurIPS 2020 Jie Shao, Kai Hu, Changhu Wang, Xiangyang Xue, Bhiksha Raj

In this paper, we study what would happen when normalization layers are removed from the network, and show how to train deep neural networks without normalization layers and without performance degradation.

General Classification Image Classification +5

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

1 code implementation 17 Aug 2020 Soham Deshmukh, Bhiksha Raj, Rita Singh

Weakly Labelled learning has garnered a lot of attention in recent years due to its potential to scale Sound Event Detection (SED) and is formulated as a Multiple Instance Learning (MIL) problem.

Event Detection Multiple Instance Learning +3

Exploiting Non-Linear Redundancy for Neural Model Compression

no code implementations 28 May 2020 Muhammad A. Shah, Raphael Olivier, Bhiksha Raj

Deploying deep learning models, comprising non-linear combinations of millions, even billions, of parameters, is challenging given the memory, power and compute constraints of the real world.

Model Compression

Automatic In-the-wild Dataset Annotation with Deep Generalized Multiple Instance Learning

no code implementations LREC 2020 Joana Correia, Isabel Trancoso, Bhiksha Raj

The automation of the diagnosis and monitoring of speech affecting diseases in real life situations, such as Depression or Parkinson's disease, depends on the existence of rich and large datasets that resemble real life conditions, such as those collected from in-the-wild multimedia repositories like YouTube.

Multiple Instance Learning

Face Reconstruction from Voice using Generative Adversarial Networks

1 code implementation NeurIPS 2019 Yandong Wen, Bhiksha Raj, Rita Singh

The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers, on a training set.

Face Reconstruction

The phonetic bases of vocal expressed emotion: natural versus acted

no code implementations 13 Nov 2019 Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Our tests show significant differences in the manner and choice of phonemes in acted and natural speech, concluding moderate to low validity and value in using acted speech databases for emotion classification tasks.

Emotion Classification General Classification +1

Detecting gender differences in perception of emotion in crowdsourced data

no code implementations 24 Oct 2019 Shahan Ali Memon, Hira Dhamyal, Oren Wright, Daniel Justice, Vijaykumar Palat, William Boler, Bhiksha Raj, Rita Singh

While we limit ourselves to a single modality (i.e., speech), our framework is applicable to studies of emotion perception from all such loosely annotated data in general.

Non-Determinism in Neural Networks for Adversarial Robustness

no code implementations 26 May 2019 Daanish Ali Khan, Linhong Li, Ninghao Sha, Zhuoran Liu, Abelino Jimenez, Bhiksha Raj, Rita Singh

Recent breakthroughs in the field of deep learning have led to advancements in a broad spectrum of tasks in computer vision, audio processing, natural language processing and other areas.

Adversarial Robustness

Reconstructing faces from voices

1 code implementation 25 May 2019 Yandong Wen, Rita Singh, Bhiksha Raj

Voice profiling aims at inferring various human parameters from their speech, e.g., gender, age, etc.

Nonlinear Semi-Parametric Models for Survival Analysis

1 code implementation 14 May 2019 Chirag Nagpal, Rohan Sangave, Amit Chahar, Parth Shah, Artur Dubrawski, Bhiksha Raj

Semi-parametric survival analysis methods like the Cox Proportional Hazards (CPH) regression (Cox, 1972) are a popular approach for survival analysis.

regression Survival Analysis

Hierarchical Routing Mixture of Experts

no code implementations 18 Mar 2019 Wenbo Zhao, Yang Gao, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Addressing these problems, we propose a binary tree-structured hierarchical routing mixture of experts (HRME) model that has classifiers as non-leaf node experts and simple regression models as leaf node experts.

regression

Hide and Speak: Towards Deep Neural Networks for Speech Steganography

1 code implementation 7 Feb 2019 Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet

Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as Carrier.

Higher-order Network for Action Recognition

no code implementations 19 Nov 2018 Kai Hu, Bhiksha Raj

Capturing spatiotemporal dynamics is an essential topic in video recognition.

Action Recognition General Classification +2

Neural Regression Trees

no code implementations 1 Oct 2018 Shahan Ali Memon, Wenbo Zhao, Bhiksha Raj, Rita Singh

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.

Classification General Classification +1

Neural Regression Tree

no code implementations 27 Sep 2018 Wenbo Zhao, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.

Classification regression

Disjoint Mapping Network for Cross-modal Matching of Voices and Faces

no code implementations ICLR 2019 Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh

We propose a novel framework, called Disjoint Mapping Network (DIMNet), for cross-modal biometric matching, in particular of voices and faces.

Optimal Strategies for Matching and Retrieval Problems by Comparing Covariates

no code implementations 12 Jul 2018 Yandong Wen, Mahmoud Al Ismail, Bhiksha Raj, Rita Singh

In many retrieval problems, where we must retrieve one or more entries from a gallery in response to a probe, it is common practice to learn to do so by directly comparing the probe and gallery entries to one another.

Retrieval

A Closer Look at Weak Label Learning for Audio Events

1 code implementation 24 Apr 2018 Ankit Shah, Anurag Kumar, Alexander G. Hauptmann, Bhiksha Raj

In this work, we first describe a CNN based approach for weakly supervised training of audio events.

Audio Classification Event Detection +2

Voice Impersonation using Generative Adversarial Networks

no code implementations 19 Feb 2018 Yang Gao, Rita Singh, Bhiksha Raj

In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the target speaker.

Sound Audio and Speech Processing

Framework for evaluation of sound event detection in web videos

no code implementations 2 Nov 2017 Rohan Badlani, Ankit Shah, Benjamin Elizalde, Anurag Kumar, Bhiksha Raj

The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets.

Event Detection Sound Event Detection

Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting

no code implementations 13 Jul 2017 Anders Oland, Aayush Bansal, Roger B. Dannenberg, Bhiksha Raj

To this end, we demonstrate faster convergence and better performance on diverse classification tasks: image classification using CIFAR-10 and ImageNet, and semantic segmentation using PASCAL VOC 2012.

Classification General Classification +2

Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data

no code implementations 9 Jul 2017 Anurag Kumar, Bhiksha Raj

We propose that learning algorithms that can exploit weak labels offer an effective method to learn from web data.

SphereFace: Deep Hypersphere Embedding for Face Recognition

21 code implementations CVPR 2017 Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song

This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space.

Face Identification Face Recognition +1

On the Origin of Deep Learning

no code implementations 24 Feb 2017 Haohan Wang, Bhiksha Raj

This paper is a review of the evolutionary history of deep learning models.

The Incredible Shrinking Neural Network: New Perspectives on Learning Representations Through The Lens of Pruning

no code implementations 16 Jan 2017 Aditya Sharma, Nikolas Wolfe, Bhiksha Raj

How much can pruning algorithms teach us about the fundamentals of learning representations in neural networks?

Network Pruning

Audio Event and Scene Recognition: A Unified Approach using Strongly and Weakly Labeled Data

no code implementations 12 Nov 2016 Anurag Kumar, Bhiksha Raj

In this paper we propose a novel learning framework called Supervised and Weakly Supervised Learning where the goal is to learn simultaneously from weakly and strongly labeled data.

Scene Recognition Weakly-supervised Learning

Discovering Sound Concepts and Acoustic Relations In Text

no code implementations 23 Sep 2016 Anurag Kumar, Bhiksha Raj, Ndapandula Nakashole

In this paper we describe approaches for discovering acoustic concepts and relations in text.

Dependency Parsing

An Approach for Self-Training Audio Event Detectors Using Web Data

no code implementations 20 Sep 2016 Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube.

Event Detection

Features and Kernels for Audio Event Recognition

no code implementations 19 Jul 2016 Anurag Kumar, Bhiksha Raj

One of the most important problems in audio event detection research is the absence of benchmark results for comparison with any proposed method.

Sound Multimedia

AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

no code implementations 13 Jul 2016 Sebastian Sager, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj, Ian Lane

One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with this type of label.

TAG

Classifier Risk Estimation under Limited Labeling Resources

no code implementations 9 Jul 2016 Anurag Kumar, Bhiksha Raj

In this paper we propose strategies for estimating performance of a classifier when labels cannot be obtained for the whole test set.

Audio Event Detection using Weakly Labeled Data

no code implementations 9 May 2016 Anurag Kumar, Bhiksha Raj

This helps in obtaining a complete description of the recording and is notable since temporal information was never known in the first place in weakly labeled data.

Event Detection Multiple Instance Learning

Content-based Video Indexing and Retrieval Using Corr-LDA

no code implementations 27 Feb 2016 Rahul Radhakrishnan Iyer, Sanjeel Parekh, Vikas Mohandoss, Anush Ramsurat, Bhiksha Raj, Rita Singh

Existing video indexing and retrieval methods on popular web-based multimedia sharing websites are based on user-provided sparse tagging.

Retrieval

Environmental Noise Embeddings for Robust Speech Recognition

no code implementations 11 Jan 2016 Suyoun Kim, Bhiksha Raj, Ian Lane

We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model.

Management Multi-Task Learning +2

Handcrafted Local Features are Convolutional Neural Networks

no code implementations 16 Nov 2015 Zhenzhong Lan, Shoou-I Yu, Ming Lin, Bhiksha Raj, Alexander G. Hauptmann

We approach this problem by first showing that local handcrafted features and Convolutional Neural Networks (CNNs) share the same convolution-pooling network structure.

Action Recognition Optical Flow Estimation +2

Privacy-Preserving Multi-Document Summarization

no code implementations 6 Aug 2015 Luís Marujo, José Portêlo, Wang Ling, David Martins de Matos, João P. Neto, Anatole Gershman, Jaime Carbonell, Isabel Trancoso, Bhiksha Raj

State-of-the-art extractive multi-document summarization systems are usually designed without any concern about privacy issues, meaning that all documents are open to third parties.

Document Summarization Multi-Document Summarization +1

Plagiarism Detection in Polyphonic Music using Monaural Signal Separation

no code implementations 27 Feb 2015 Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj

Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright.

General Classification

Unsupervised Fusion Weight Learning in Multiple Classifier Systems

no code implementations 6 Feb 2015 Anurag Kumar, Bhiksha Raj

We also introduce a novel metric for ranking instances based on an index which depends upon the rank of weighted scores of test points among the weighted scores of training points.

Learning Model-Based Sparsity via Projected Gradient Descent

no code implementations 7 Sep 2012 Sohail Bahmani, Petros T. Boufounos, Bhiksha Raj

As an example, we elaborate on the application of the main results to estimation in Generalized Linear Models.

Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers

no code implementations NeurIPS 2010 Manas Pathak, Shantanu Rane, Bhiksha Raj

As increasing amounts of sensitive personal information find their way into data repositories, it is important to develop analysis mechanisms that can derive aggregate information from these repositories without revealing information about individual data instances.

Privacy Preserving

Sparse Overcomplete Latent Variable Decomposition of Counts Data

no code implementations NeurIPS 2007 Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis

An important problem in many fields is the analysis of counts data to extract meaningful latent components.
