Search Results for author: Jee-weon Jung

Found 48 papers, 20 papers with code

a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification

1 code implementation • 3 Mar 2024 • Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, Nicholas Evans, Jean-Francois Bonastre, Itshak Lapidot

Spoofing detection is today a mainstream research topic.

Benchmarking Speaker Verification

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

1 code implementation • 25 Feb 2024 • Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro

We propose a novel Tri-Modal Translation (TMT) model that translates between arbitrary modalities spanning speech, image, and text.

Machine Translation Translation

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

no code implementations • 30 Jan 2024 • Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe

In this work, we aim to improve the performance and efficiency of OWSM without extra training data.

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

2 code implementations • 30 Jan 2024 • Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Barry-John Theobald, Ahmed Hussen Abdelaziz, Shinji Watanabe

First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models.

Ranked #1 on Speaker Verification on VoxCeleb (using extra training data)

Self-Supervised Learning Speaker Recognition +1

Improving Design of Input Condition Invariant Speech Enhancement

1 code implementation • 25 Jan 2024 • Wangyou Zhang, Jee-weon Jung, Shinji Watanabe, Yanmin Qian

In this paper, we propose novel architectures to improve the input condition invariant SE model so that performance in simulated conditions remains competitive while degradation in real conditions is substantially mitigated.

Speech Enhancement

AugSumm: towards generalizable speech summarization using synthetic labels from large language model

1 code implementation • 10 Jan 2024 • Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, Shinji Watanabe

We tackle this challenge by proposing AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries for training and evaluation.

Language Modelling Large Language Model +1

Understanding Probe Behaviors through Variational Bounds of Mutual Information

1 code implementation • 15 Dec 2023 • Kwanghee Choi, Jee-weon Jung, Shinji Watanabe

With the success of self-supervised representations, researchers seek a better understanding of the information encapsulated within a representation.

UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions

no code implementations • 4 Oct 2023 • Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe

Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing performance of task-specific models.

Ranked #1 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition

no code implementations • 2 Oct 2023 • Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini

This paper presents a novel framework for joint speaker diarization (SD) and automatic speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented recognition).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

1 code implementation • 25 Sep 2023 • Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

Pre-training speech models on large volumes of data has achieved remarkable success.

Speech Recognition Translation

VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

no code implementations • 14 Sep 2023 • Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe

We propose a decoder-only language model, VoxtLM, that can perform four tasks: speech recognition, speech synthesis, text generation, and speech continuation.

Language Modelling speech-recognition +3

Encoder-decoder multimodal speaker change detection

no code implementations • 1 Jun 2023 • Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You Jin Kim, Young-ki Kwon, Minjae Lee, Bong-Jin Lee

The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications.

Automatic Speech Recognition Change Detection +2

Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing

1 code implementation • 31 May 2023 • Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen

Audio anti-spoofing for automatic speaker verification aims to safeguard users' identities from spoofing attacks.

Speaker Verification

Towards single integrated spoofing-aware speaker verification embeddings

1 code implementation • 30 May 2023 • Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.

Speaker Verification

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge

1 code implementation • 20 Feb 2023 • Jaesung Huh, Andrew Brown, Jee-weon Jung, Joon Son Chung, Arsha Nagrani, Daniel Garcia-Romero, Andrew Zisserman

This paper summarises the findings from the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH 2022.

Speaker Diarization Speaker Recognition +1

Absolute decision corrupts absolutely: conservative online speaker diarisation

no code implementations • 9 Nov 2022 • Youngki Kwon, Hee-Soo Heo, Bong-Jin Lee, You Jin Kim, Jee-weon Jung

Our focus lies in developing an online speaker diarisation framework which demonstrates robust performance across diverse domains.

High-resolution embedding extractor for speaker diarisation

no code implementations • 8 Nov 2022 • Hee-Soo Heo, Youngki Kwon, Bong-Jin Lee, You Jin Kim, Jee-weon Jung

Extracted dense frame-level embeddings can each represent a speaker.

Vocal Bursts Intensity Prediction

Disentangled representation learning for multilingual speaker recognition

no code implementations • 1 Nov 2022 • Kihyun Nam, Youkyum Kim, Jaesung Huh, Hee Soo Heo, Jee-weon Jung, Joon Son Chung

The goal of this paper is to learn robust speaker representations for bilingual speaking scenarios.

Disentanglement Metric Learning +1

In search of strong embedding extractors for speaker diarisation

no code implementations • 26 Oct 2022 • Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung

First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation.

Data Augmentation Speaker Verification

Large-scale learning of generalised representations for speaker recognition

no code implementations • 20 Oct 2022 • Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesong Lee, Hye-jin Shim, Youngki Kwon, Joon Son Chung, Shinji Watanabe

We also show that training with proposed large data configurations gives better performance.

Inductive Bias Speaker Recognition

Frequency and Multi-Scale Selective Kernel Attention for Speaker Verification

1 code implementation • 3 Apr 2022 • Sung Hwan Mun, Jee-weon Jung, Min Hyun Han, Nam Soo Kim

The SKA mechanism allows each convolutional layer to adaptively select the kernel size in a data-driven fashion.

Speaker Verification

Curriculum learning for self-supervised speaker verification

no code implementations • 28 Mar 2022 • Hee-Soo Heo, Jee-weon Jung, Jingu Kang, Youngki Kwon, You Jin Kim, Bong-Jin Lee, Joon Son Chung

The goal of this paper is to train effective self-supervised speaker representations without identity labels.

Self-Supervised Learning Speaker Recognition +1

SASV 2022: The First Spoofing-Aware Speaker Verification Challenge

no code implementations • 28 Mar 2022 • Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, Tomi Kinnunen

Pre-trained spoofing detection and speaker verification models are provided as open source and are used in two baseline SASV solutions.

Speaker Verification

Pushing the limits of raw waveform speaker recognition

2 code implementations • 16 Mar 2022 • Jee-weon Jung, You Jin Kim, Hee-Soo Heo, Bong-Jin Lee, Youngki Kwon, Joon Son Chung

Our best model achieves an equal error rate of 0.89%, which is competitive with the state-of-the-art models based on handcrafted features, and outperforms the best model based on raw waveform inputs by a large margin.

Self-Supervised Learning Speaker Recognition +1

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

1 code implementation • 24 Feb 2022 • Hemlata Tak, Massimiliano Todisco, Xin Wang, Jee-weon Jung, Junichi Yamagishi, Nicholas Evans

The performance of spoofing countermeasure systems depends fundamentally upon the use of sufficiently representative training data.

Data Augmentation DeepFake Detection +3

Multi-scale speaker embedding-based graph attention networks for speaker diarisation

no code implementations • 7 Oct 2021 • Youngki Kwon, Hee-Soo Heo, Jee-weon Jung, You Jin Kim, Bong-Jin Lee, Joon Son Chung

The objective of this work is effective speaker diarisation using multi-scale speaker embeddings.

Graph Attention

Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity

no code implementations • 7 Oct 2021 • You Jin Kim, Hee-Soo Heo, Jee-weon Jung, Youngki Kwon, Bong-Jin Lee, Joon Son Chung

The objective of this work is to train noise-robust speaker embeddings adapted for speaker diarisation.

Dimensionality Reduction

AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks

1 code implementation • 4 Oct 2021 • Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, Nicholas Evans

Artefacts that differentiate spoofed from bona-fide utterances can reside in spectral or temporal domains.

Ranked #1 on Voice Anti-spoofing on ASVspoof 2019 - LA

Graph Attention Voice Anti-spoofing

End-to-End Spectro-Temporal Graph Attention Networks for Speaker Verification Anti-Spoofing and Speech Deepfake Detection

1 code implementation • 27 Jul 2021 • Hemlata Tak, Jee-weon Jung, Jose Patino, Madhu Kamble, Massimiliano Todisco, Nicholas Evans

Artefacts that serve to distinguish bona fide speech from spoofed or deepfake speech are known to reside in specific subbands and temporal segments.

DeepFake Detection Face Swapping +2

Attentive max feature map and joint training for acoustic scene classification

no code implementations • 15 Apr 2021 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-Jin Yu

Furthermore, adopting the proposed attentive max feature map, our team placed fourth in the recent DCASE 2021 challenge.

Acoustic Scene Classification Multi-Task Learning +1

Learning Metrics from Mean Teacher: A Supervised Learning Method for Improving the Generalization of Speaker Verification System

no code implementations • 14 Apr 2021 • Ju-ho Kim, Hye-jin Shim, Jee-weon Jung, Ha-Jin Yu

By learning the reliable intermediate representation of the mean teacher network, we expect that the proposed method can explore more discriminatory embedding spaces and improve the generalization performance of the speaker verification system.

Speaker Verification

Graph Attention Networks for Anti-Spoofing

no code implementations • 8 Apr 2021 • Hemlata Tak, Jee-weon Jung, Jose Patino, Massimiliano Todisco, Nicholas Evans

This paper reports our use of graph attention networks (GATs) to model these relationships and to improve spoofing detection performance.

Graph Attention Speaker Verification

Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network

no code implementations • 7 Apr 2021 • Jee-weon Jung, Hee-Soo Heo, Youngki Kwon, Joon Son Chung, Bong-Jin Lee

In this work, we propose an overlapped speech detection system trained as a three-class classifier.

Binary Classification speaker-diarization +1

Adapting Speaker Embeddings for Speaker Diarisation

no code implementations • 7 Apr 2021 • Youngki Kwon, Jee-weon Jung, Hee-Soo Heo, You Jin Kim, Bong-Jin Lee, Joon Son Chung

The goal of this paper is to adapt speaker embeddings for solving the problem of speaker diarisation.

Clustering Dimensionality Reduction +1

Graph Attention Networks for Speaker Verification

no code implementations • 22 Oct 2020 • Jee-weon Jung, Hee-Soo Heo, Ha-Jin Yu, Joon Son Chung

The proposed framework inputs segment-wise speaker embeddings from an enrollment and a test utterance and directly outputs a similarity score.

Graph Attention Speaker Verification

DCASENET: A joint pre-trained deep neural network for detecting and classifying acoustic scenes and events

1 code implementation • 21 Sep 2020 • Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu

Single task deep neural networks that perform a target task among diverse cross-related tasks in the acoustic scene and event literature are being developed.

Acoustic Scene Classification Audio Tagging +3

Capturing scattered discriminative information using a deep architecture in acoustic scene classification

no code implementations • 9 Jul 2020 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-Jin Yu

Various experiments are conducted using the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Task 1-A dataset to validate the proposed methods.

Acoustic Scene Classification General Classification +1

Integrated Replay Spoofing-aware Text-independent Speaker Verification

no code implementations • 10 Jun 2020 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Seung-bin Kim, Ha-Jin Yu

In this paper, we propose two approaches for building an integrated system of speaker verification and presentation attack detection: an end-to-end monolithic approach and a back-end modular approach.

Multi-Task Learning Speaker Identification +1

Segment Aggregation for short utterances speaker verification using raw waveforms

1 code implementation • 7 May 2020 • Seung-bin Kim, Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu

The proposed method segments an input utterance into several short utterances and then aggregates the segment embeddings extracted from the segmented inputs to compose a speaker embedding.

Speaker Verification

Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms

2 code implementations • 1 Apr 2020 • Jee-weon Jung, Seung-bin Kim, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu

Recent advances in deep learning have facilitated the design of speaker verification systems that directly input raw waveforms.

Text-Independent Speaker Verification

A study on the role of subsidiary information in replay attack spoofing detection

no code implementations • 31 Jan 2020 • Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu

In addition, we use the multi-task learning framework to incorporate subsidiary information into the code.

Binary Classification Multi-Task Learning

Self-supervised pre-training with acoustic configurations for replay spoofing detection

no code implementations • 22 Oct 2019 • Hye-jin Shim, Hee-Soo Heo, Jee-weon Jung, Ha-Jin Yu

Constructing a dataset for replay spoofing detection requires a physical process of playing an utterance and re-recording it, presenting a challenge to the collection of large-scale datasets.

Speaker Verification

Cosine similarity-based adversarial process

no code implementations • 1 Jul 2019 • Hee-Soo Heo, Jee-weon Jung, Hye-jin Shim, IL-Ho Yang, Ha-Jin Yu

In particular, the adversarial process degrades the performance of the subsidiary model by eliminating the subsidiary information in the input, which is assumed to degrade the performance of the primary model.

Speaker Identification

Replay attack detection with complementary high-resolution information using end-to-end DNN for the ASVspoof 2019 Challenge

1 code implementation • 23 Apr 2019 • Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu

To detect unrevealed characteristics that reside in a replayed speech, we directly input spectrograms into an end-to-end DNN without knowledge-based intervention.

RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification

4 code implementations • 17 Apr 2019 • Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu

In this study, we explore end-to-end deep neural networks that input raw waveforms to improve various aspects: front-end speaker embedding extraction including model architecture, pre-training scheme, additional objective functions, and back-end classification.

Classification Data Augmentation +2

End-to-end losses based on speaker basis vectors and all-speaker hard negative mining for speaker verification

no code implementations • 7 Feb 2019 • Hee-Soo Heo, Jee-weon Jung, IL-Ho Yang, Sung-Hyun Yoon, Hye-jin Shim, Ha-Jin Yu

Each speaker basis is designed to represent the corresponding speaker in the process of training deep neural networks.

Metric Learning Speaker Verification

Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings

no code implementations • 25 Oct 2018 • Jee-weon Jung, Hee-Soo Heo, Hye-jin Shim, Ha-Jin Yu

The short duration of an input utterance is one of the most critical threats that degrade the performance of speaker verification systems.

Text-Independent Speaker Verification

Replay spoofing detection system for automatic speaker verification using multi-task learning of noise classes

no code implementations • 29 Aug 2018 • Hye-jin Shim, Jee-weon Jung, Hee-Soo Heo, Sung-Hyun Yoon, Ha-Jin Yu

We explore the effectiveness of training a deep neural network simultaneously for replay attack spoofing detection and replay noise classification.

General Classification Multi-Task Learning +1
