Search Results for author: Rohan Kumar Das

Found 18 papers, 5 papers with code

Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

1 code implementation • 14 Apr 2024 • Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under the unique condition of a multilingual scenario.

Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training

no code implementations • 1 Apr 2024 • Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang, Haizhou Li

Audio-visual active speaker detection (AV-ASD) aims to identify which visible face is speaking in a scene with one or more persons.

Audio-Visual Active Speaker Detection Denoising +1

Dual Knowledge Distillation for Efficient Sound Event Detection

no code implementations • 5 Feb 2024 • Yang Xiao, Rohan Kumar Das

To address this issue, we introduce a novel framework referred to as dual knowledge distillation for developing efficient SED systems in this work.

Ranked #2 on Sound Event Detection on DESED (using extra training data)

Event Detection Knowledge Distillation +1

Adaptive-avg-pooling based Attention Vision Transformer for Face Anti-spoofing

no code implementations • 10 Jan 2024 • Jichen Yang, Fangfan Chen, Rohan Kumar Das, Zhengyu Zhu, Shunsi Zhang

In this work, we propose a novel vision transformer referred to as adaptive-avg-pooling based attention vision transformer (AAViT) that uses modules of adaptive average pooling and attention to replace the module of average value computing.

Avg Face Anti-Spoofing

A Multi-Task Learning Framework for Sound Event Detection using High-level Acoustic Characteristics of Sounds

no code implementations • 18 May 2023 • Tanmay Khandelwal, Rohan Kumar Das

Sound event detection (SED) entails identifying the type of sound and estimating its temporal boundaries from acoustic signals.

Event Detection Multi-Task Learning +1

Leveraging Audio-Tagging Assisted Sound Event Detection using Weakified Strong Labels and Frequency Dynamic Convolutions

no code implementations • 25 Apr 2023 • Tanmay Khandelwal, Rohan Kumar Das, Andrew Koh, Eng Siong Chng

Stage-1 of our proposed framework focuses on audio-tagging (AT), which assists the sound event detection (SED) system in Stage-2.

Audio Tagging Event Detection +1

I4U System Description for NIST SRE'20 CTS Challenge

no code implementations • 2 Nov 2022 • Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, Hanwu Sun, Liang He, Tianyu Liang, Qiongqiong Wang, Mickael Rouvier, Pierre-Michel Bousquet, Rohan Kumar Das, Ignacio Viñals Bailo, Meng Liu, Héctor Delgado, Xuechen Liu, Md Sahidullah, Sandro Cumani, Boning Zhang, Koji Okabe, Hitoshi Yamamoto, Ruijie Tao, Haizhou Li, Alfonso Ortega Giménez, Longbiao Wang, Luis Buera

This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge.

Speaker Recognition

Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

no code implementations • 27 Oct 2022 • Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

We study a novel neural architecture and training strategies for a speaker encoder for speaker recognition without using any identity labels.

Contrastive Learning Self-Supervised Learning +1

MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances

no code implementations • 3 Feb 2022 • Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li

The time delay neural network (TDNN) is one of the state-of-the-art neural solutions to text-independent speaker verification.

Text-Independent Speaker Verification

HLT-NUS SUBMISSION FOR 2020 NIST Conversational Telephone Speech SRE

3 code implementations • 12 Nov 2021 • Rohan Kumar Das, Ruijie Tao, Haizhou Li

This work provides a brief description of the Human Language Technology (HLT) Laboratory, National University of Singapore (NUS) system submission for the 2020 NIST conversational telephone speech (CTS) speaker recognition evaluation (SRE).

Domain Adaptation Speaker Recognition

Self-supervised Speaker Recognition with Loss-gated Learning

1 code implementation • 8 Oct 2021 • Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

In self-supervised learning for speaker recognition, pseudo labels are useful as the supervision signals.

Self-Supervised Learning Speaker Recognition

Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition

no code implementations • 2 Oct 2021 • Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna

The automatic recognition of pathological speech, particularly from children with any articulatory impairment, is a challenging task due to various reasons.

Data Augmentation speech-recognition +1

Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection

4 code implementations • 14 Jul 2021 • Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li

Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers.

Audio-Visual Active Speaker Detection

NUS-HLT Report for ActivityNet Challenge 2021 AVA (Speaker)

1 code implementation • The ActivityNet Large-Scale Activity Recognition Challenge Workshop, CVPR 2021 • Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li

Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers.

Ranked #9 on Audio-Visual Active Speaker Detection on AVA-ActiveSpeaker

Audio-Visual Active Speaker Detection

Speaker-Utterance Dual Attention for Speaker and Utterance Verification

no code implementations • 20 Aug 2020 • Tianchi Liu, Rohan Kumar Das, Maulik Madhavi, ShengMei Shen, Haizhou Li

The proposed SUDA features an attention mask mechanism to learn the interaction between the speaker and utterance information streams.

Speaker Verification

LONG RANGE ACOUSTIC AND DEEP FEATURES PERSPECTIVE ON ASVSPOOF 2019

no code implementations • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2020 • Rohan Kumar Das, Jichen Yang, Haizhou Li

In this paper, we summarize the findings from the perspective of long range acoustic and deep features for spoof detection.

Speaker Verification

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

no code implementations • 16 Apr 2019 • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Cheng-Lin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans

The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE).

Domain Adaptation Speaker Recognition

Generative x-vectors for text-independent speaker verification

no code implementations • 17 Sep 2018 • Longting Xu, Rohan Kumar Das, Emre Yilmaz, Jichen Yang, Haizhou Li

Speaker verification (SV) systems using deep neural network embeddings, the so-called x-vector systems, are becoming popular because they outperform i-vector systems.

Text-Independent Speaker Verification
