Search Results for author: Yu Tsao

Found 161 papers, 48 papers with code

A Flexible and Extensible Framework for Multiple Answer Modes Question Answering

no code implementations ROCLING 2021 Cheng-Chung Fan, Chia-Chih Kuo, Shang-Bao Luo, Pei-Jun Liao, Kuang-Yu Chang, Chiao-Wei Hsu, Meng-Tse Wu, Shih-Hong Tsai, Tzu-Man Wu, Aleksandra Smolka, Chao-Chun Liang, Hsin-Min Wang, Kuan-Yu Chen, Yu Tsao, Keh-Yih Su

Only a few of them adopt several answer generation modules for providing different mechanisms; however, they either lack an aggregation mechanism to merge the answers from various modules, or are too complicated to be implemented with neural networks.

Answer Generation Question Answering

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

no code implementations26 Feb 2024 Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao, Yu-Chiang Frank Wang

To improve the robustness of the encoder for SE, a novel self-distillation mechanism combined with adversarial training is introduced.

Quantization Speech Enhancement

Audio-Visual Speech Enhancement in Noisy Environments via Emotion-Based Contextual Cues

no code implementations26 Feb 2024 Tassadaq Hussain, Kia Dashtipour, Yu Tsao, Amir Hussain

By integrating emotional features, the proposed system achieves significant improvements in both objective and subjective assessments of speech quality and intelligibility, especially in challenging noise environments.

Speech Enhancement

A Non-Intrusive Neural Quality Assessment Model for Surface Electromyography Signals

no code implementations8 Feb 2024 Cho-Yuan Lee, Kuan-Chen Wang, Kai-Chun Liu, Xugang Lu, Ping-Cheng Yeh, Yu Tsao

In practical scenarios involving the measurement of surface electromyography (sEMG) in muscles, particularly those areas near the heart, one of the primary sources of contamination is the presence of electrocardiogram (ECG) signals.

SDEMG: Score-based Diffusion Model for Surface Electromyographic Signal Denoising

1 code implementation6 Feb 2024 Yu-Tung Liu, Kuan-Chen Wang, Kai-Chun Liu, Sheng-Yu Peng, Yu Tsao

In this study, we proposed a novel approach, termed SDEMG, as a score-based diffusion model for sEMG signal denoising.

Denoising

HAAQI-Net: A non-intrusive neural music quality assessment model for hearing aids

1 code implementation2 Jan 2024 Dyah A. M. G. Wisnu, Epri W. Pratiwi, Stefano Rini, Ryandhimas E. Zezario, Hsin-Min Wang, Yu Tsao

This paper introduces HAAQI-Net, a non-intrusive deep learning model for music quality assessment tailored to hearing aid users.

Music Quality Assessment

D4AM: A General Denoising Framework for Downstream Acoustic Models

1 code implementation28 Nov 2023 Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen

To our knowledge, this is the first work that deploys an effective combination scheme of regression (denoising) and classification (ASR) objectives to derive a general pre-processor applicable to various unseen ASR systems.

Automatic Speech Recognition (ASR) +4

Lightly Weighted Automatic Audio Parameter Extraction for the Quality Assessment of Consensus Auditory-Perceptual Evaluation of Voice

no code implementations27 Nov 2023 Yi-Heng Lin, Wen-Hsuan Tseng, Li-Chin Chen, Ching-Ting Tan, Yu Tsao

The Consensus Auditory-Perceptual Evaluation of Voice is a widely employed tool in clinical voice quality assessment that is significant for streamlining communication among clinical professionals and benchmarking for the determination of further treatment.

Benchmarking

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model

no code implementations15 Nov 2023 Hsin-Tien Chiang, Szu-Wei Fu, Hsin-Min Wang, Yu Tsao, John H. L. Hansen

Furthermore, we demonstrated that incorporating SSL models resulted in greater transferability to out-of-domain (OOD) datasets.

AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection

no code implementations5 Nov 2023 Sahibzada Adil Shahzad, Ammarah Hashmi, Yan-Tsung Peng, Yu Tsao, Hsin-Min Wang

This study proposes a new method based on a multi-modal self-supervised-learning (SSL) feature extractor to exploit inconsistency between audio and visual modalities for multi-modal video forgery detection.

DeepFake Detection Face Swapping +2

Neural domain alignment for spoken language recognition based on optimal transport

no code implementations20 Oct 2023 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

Our previous study discovered that completely aligning the distributions between the source and target domains can introduce negative transfer, where irrelevant classes from the source domain map to a different class in the target domain during distribution alignment.

Unsupervised Domain Adaptation

The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains

no code implementations4 Oct 2023 Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi

We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech.

Speech Synthesis Text-To-Speech Synthesis

Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR

no code implementations28 Sep 2023 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

Due to the modality discrepancy between textual and acoustic modeling, efficiently transferring linguistic knowledge from a pretrained language model (PLM) to acoustic encoding for automatic speech recognition (ASR) still remains a challenging task.

Automatic Speech Recognition (ASR) +3

Cross-modal Alignment with Optimal Transport for CTC-based ASR

no code implementations24 Sep 2023 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

Since the PLM is built from text while the acoustic model is trained with speech, a cross-modal alignment is required in order to transfer the context dependent linguistic knowledge from the PLM to acoustic encoding.

Automatic Speech Recognition (ASR) +3

A Study on Incorporating Whisper for Robust Speech Assessment

1 code implementation22 Sep 2023 Ryandhimas E. Zezario, Yu-Wen Chen, Szu-Wei Fu, Yu Tsao, Hsin-Min Wang, Chiou-Shann Fuh

The first part of this study investigates the correlation between the embedding features of Whisper and two self-supervised learning (SSL) models with subjective quality and intelligibility scores.

Self-Supervised Learning

Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement

no code implementations20 Sep 2023 Shafique Ahmed, Chia-Wei Chen, Wenze Ren, Chin-Jou Li, Ernie Chu, Jun-Cheng Chen, Amir Hussain, Hsin-Min Wang, Yu Tsao, Jen-Cheng Hou

Recent studies have increasingly acknowledged the advantages of incorporating visual data into speech enhancement (SE) systems.

Speech Enhancement

Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model

no code implementations18 Aug 2023 Ryandhimas E. Zezario, Bo-Ren Brian Bai, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

This study proposes a multi-task pseudo-label learning (MPL)-based non-intrusive speech quality assessment model called MTQ-Net.

Multi-Task Learning Pseudo Label

Audio-Visual Speech Enhancement Using Self-supervised Learning to Improve Speech Intelligibility in Cochlear Implant Simulations

no code implementations15 Jul 2023 Richard Lee Lai, Jen-Cheng Hou, Mandar Gogate, Kia Dashtipour, Amir Hussain, Yu Tsao

The aim of this study is to explore the effectiveness of audio-visual speech enhancement (AVSE) in enhancing the intelligibility of vocoded speech in cochlear implant (CI) simulations.

Self-Supervised Learning Speech Enhancement

Study on the Correlation between Objective Evaluations and Subjective Speech Quality and Intelligibility

no code implementations10 Jul 2023 Hsin-Tien Chiang, Kuo-Hsuan Hung, Szu-Wei Fu, Heng-Cheng Kuo, Ming-Hsueh Tsai, Yu Tsao

Moreover, new objective measures are proposed that combine current objective measures using deep learning techniques to predict subjective quality and intelligibility.

IANS: Intelligibility-aware Null-steering Beamforming for Dual-Microphone Arrays

no code implementations9 Jul 2023 Wen-Yuan Ting, Syu-Siang Wang, Yu Tsao, Borching Su

Beamforming techniques are popular in speech-related applications due to their effective spatial filtering capabilities.

Deep denoising autoencoder-based non-invasive blood flow detection for arteriovenous fistula

no code implementations12 Jun 2023 Li-Chin Chen, Yi-Heng Lin, Li-Ning Peng, Feng-Ming Wang, Yu-Hsin Chen, Po-Hsun Huang, Shang-Feng Yang, Yu Tsao

Clinical guidelines underscore the importance of regularly monitoring and surveilling arteriovenous fistula (AVF) access in hemodialysis patients to promptly detect any dysfunction.

Denoising Dimensionality Reduction +1

ElectrodeNet -- A Deep Learning Based Sound Coding Strategy for Cochlear Implants

no code implementations26 May 2023 Enoch Hsin-Ho Huang, Rong Chao, Yu Tsao, Chao-Min Wu

ElectrodeNet, a deep learning based sound coding strategy for the cochlear implant (CI), is proposed to emulate the advanced combination encoder (ACE) strategy by replacing the conventional envelope detection with various artificial neural networks.

Sentence

Deep Learning-based Fall Detection Algorithm Using Ensemble Model of Coarse-fine CNN and GRU Networks

no code implementations13 Apr 2023 Chien-Pin Liu, Ju-Hsuan Li, En-Ping Chu, Chia-Yeh Hsieh, Kai-Chun Liu, Chia-Tai Chan, Yu Tsao

In order to achieve better fall detection performance, an ensemble model that combines a coarse-fine convolutional neural network and gated recurrent unit is proposed in this study.

Self-supervised learning-based general laboratory progress pretrained model for cardiovascular event detection

no code implementations13 Mar 2023 Li-Chin Chen, Kuo-Hsuan Hung, Yi-Ju Tseng, Hsin-Yao Wang, Tse-Min Lu, Wei-Chieh Huang, Yu Tsao

This study employed self-supervised learning (SSL) to pretrain a generalized laboratory progress (GLP) model that captures the overall progression of six common laboratory markers in prevalent cardiovascular cases, with the intention of transferring this knowledge to aid in the detection of specific cardiovascular events.

Event Detection Self-Supervised Learning +1

PreFallKD: Pre-Impact Fall Detection via CNN-ViT Knowledge Distillation

no code implementations7 Mar 2023 Tin-Han Chi, Kai-Chun Liu, Chia-Yeh Hsieh, Yu Tsao, Chia-Tai Chan

The experiment results show that PreFallKD could boost the student model during the testing phase, achieving a reliable F1-score (92.66%) and lead time (551.3 ms).

Data Augmentation Knowledge Distillation
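The entry above applies CNN-ViT knowledge distillation to pre-impact fall detection. As an illustration of the general technique, here is a minimal numpy sketch of the standard Hinton-style distillation loss; the temperature, weighting, toy logits, and function names are assumptions for illustration, not the paper's actual architecture or loss weights.

```python
import numpy as np

def softmax(z, temperature=1.0):
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """alpha-weighted sum of soft-target cross-entropy (teacher ->
    student at temperature T, scaled by T^2) and hard-label CE.
    Generic KD loss, not PreFallKD's exact objective."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = -(p_teacher * np.log(p_student + 1e-12)).sum(-1).mean()
    hard_probs = softmax(student_logits)[np.arange(len(labels)), labels]
    hard = -np.log(hard_probs + 1e-12).mean()
    return alpha * soft * temperature ** 2 + (1 - alpha) * hard

# Toy batch: 8 sensor windows, 3 classes (values are synthetic).
rng = np.random.default_rng(1)
teacher = rng.standard_normal((8, 3))
student = rng.standard_normal((8, 3))
labels = rng.integers(0, 3, size=8)
loss = distillation_loss(student, teacher, labels)
```

In practice the soft term would be backpropagated through the student only, with the teacher's logits held fixed.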

On the robustness of non-intrusive speech quality model by adversarial examples

no code implementations11 Nov 2022 Hsin-Yi Lin, Huan-Hsin Tseng, Yu Tsao

It has been shown recently that deep learning based models are effective on speech quality prediction and could outperform traditional metrics in various perspectives.

Multimodal Forgery Detection Using Ensemble Learning

1 code implementation Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2022 Ammarah Hashmi, Sahibzada Adil Shahzad, Wasim Ahmad, Chia Wen Lin, Yu Tsao, Hsin-Min Wang

The recent rapid revolution in Artificial Intelligence (AI) technology has enabled the creation of hyper-realistic deepfakes, and detecting deepfake videos (also known as AI-synthesized videos) has become a critical task.

 Ranked #1 on Multimodal Forgery Detection on FakeAVCeleb (using extra training data)

Ensemble Learning Face Swapping +1

Inference and Denoise: Causal Inference-based Neural Speech Enhancement

1 code implementation2 Nov 2022 Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao

This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.

Causal Inference Speech Enhancement

T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5

1 code implementation1 Nov 2022 Chan-Jan Hsu, Ho-Lam Chung, Hung-Yi Lee, Yu Tsao

In spoken language understanding (SLU), a natural solution is concatenating pre-trained speech models (e.g., HuBERT) and pretrained language models (PLMs, e.g., T5).

Language Modelling Question Answering +1

Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings

no code implementations31 Oct 2022 I-Chun Chern, Kuo-Hsuan Hung, Yi-Ting Chen, Tassadaq Hussain, Mandar Gogate, Amir Hussain, Yu Tsao, Jen-Cheng Hou

In summary, our results confirm the effectiveness of our proposed model for the AVSS task with proper fine-tuning strategies, demonstrating that multi-modal self-supervised embeddings obtained from AV-HuBERT can be generalized to audio-visual regression tasks.

Automatic Speech Recognition (ASR) +6

A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech

2 code implementations27 Oct 2022 Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

The lack of clean speech is a practical challenge to the development of speech enhancement systems, which means that there is an inevitable mismatch between their training criterion and evaluation metric.

Speech Enhancement

CasNet: Investigating Channel Robustness for Speech Separation

1 code implementation27 Oct 2022 Fan-Lin Wang, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

In this study, inheriting the use of our previously constructed TAT-2mix corpus, we address the channel mismatch problem by proposing a channel-aware audio separation network (CasNet), a deep learning framework for end-to-end time-domain speech separation.

Speech Separation

ECG Artifact Removal from Single-Channel Surface EMG Using Fully Convolutional Networks

1 code implementation24 Oct 2022 Kuan-Chen Wang, Kai-Chun Liu, Sheng-Yu Peng, Yu Tsao

Electrocardiogram (ECG) artifact contamination often occurs in surface electromyography (sEMG) applications when the measured muscles are in proximity to the heart.

Denoising

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

1 code implementation19 Jul 2022 Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe

To showcase such integration, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel ST and SLU tasks, which can be used as benchmark corpora for future research.

Automatic Speech Recognition (ASR) +5

XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding

no code implementations ACL 2022 Chan-Jan Hsu, Hung-Yi Lee, Yu Tsao

Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks.

Natural Language Understanding

A Study of Using Cepstrogram for Countermeasure Against Replay Attacks

1 code implementation9 Apr 2022 Shih-kuang Lee, Yu Tsao, Hsin-Min Wang

This study investigated the cepstrogram properties and demonstrated their effectiveness as powerful countermeasures against replay attacks.

Boosting Self-Supervised Embeddings for Speech Enhancement

1 code implementation7 Apr 2022 Kuo-Hsuan Hung, Szu-Wei Fu, Huan-Hsin Tseng, Hsin-Tien Chiang, Yu Tsao, Chii-Wann Lin

We further study the relationship between the noise robustness of SSL representation via clean-noisy distance (CN distance) and the layer importance for SE.

Self-Supervised Learning Speech Enhancement

MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

no code implementations7 Apr 2022 Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

In this study, we propose a multi-branched speech intelligibility prediction model (MBI-Net), for predicting the subjective intelligibility scores of HA users.

Perceptual Contrast Stretching on Target Feature for Speech Enhancement

1 code implementation31 Mar 2022 Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Specifically, the contrast of target features is stretched based on perceptual importance, thereby improving the overall SE performance.

Speech Enhancement
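The excerpt above describes stretching the contrast of target spectral features according to perceptual importance. For readers curious how such an operation might look, here is a minimal numpy sketch; the per-band power-law form, the `strength` parameter, and the band-importance weights are illustrative assumptions, not the paper's exact PCS formulation.

```python
import numpy as np

def perceptual_contrast_stretch(log_mag, band_importance,
                                base_gamma=1.0, strength=0.4):
    """Stretch spectral contrast more strongly in perceptually
    important frequency bands (parameter names are assumptions).
    log_mag: (freq, time) log-magnitude spectrogram.
    band_importance: (freq,) perceptual weight per frequency bin."""
    gamma = base_gamma + strength * band_importance   # one exponent per bin
    return log_mag * gamma[:, None]                   # broadcast over time

# Toy spectrogram: 4 frequency bins x 3 frames, mid bands weighted most.
spec = np.ones((4, 3))
importance = np.array([0.2, 1.0, 1.0, 0.5])
stretched = perceptual_contrast_stretch(spec, importance)
```

The stretched spectrogram would then replace the original clean target when training the SE model, so the model learns to emphasize perceptually salient regions.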

Partial Coupling of Optimal Transport for Spoken Language Identification

no code implementations31 Mar 2022 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

To reduce the domain discrepancy and improve the performance of cross-domain spoken language identification (SLID) systems, we have proposed a joint distribution alignment (JDA) model based on optimal transport (OT) as an unsupervised domain adaptation (UDA) method.

Language Identification Spoken language identification +1

Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks

1 code implementation30 Mar 2022 Fan-Lin Wang, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

However, domain mismatch between training/test situations due to factors, such as speaker, content, channel, and environment, remains a severe problem for speech separation.

Speech Separation

Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition

no code implementations28 Mar 2022 Hung-Shin Lee, Yu Tsao, Shyh-Kang Jeng, Hsin-Min Wang

Phonotactic constraints can be employed to distinguish languages by representing a speech utterance as a multinomial distribution or phone events.

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition

1 code implementation25 Mar 2022 Hung-Shin Lee, Pin-Yuan Chen, Yao-Fei Cheng, Yu Tsao, Hsin-Min Wang

In this paper, a noise-aware training framework based on two cascaded neural structures is proposed to jointly optimize speech enhancement and speech recognition.

Automatic Speech Recognition (ASR) +3

Continuous Speech for Improved Learning Pathological Voice Disorders

no code implementations22 Feb 2022 Syu-Siang Wang, Chi-Te Wang, Chih-Chung Lai, Yu Tsao, Shih-Hau Fang

The experiments were conducted on a large-scale database, wherein 1,045 continuous speech samples were collected by the speech clinic of a hospital from 2012 to 2019.

When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing

no code implementations17 Feb 2022 Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Yu Tsao, Pin-Yu Chen

Our experiments on intent classification show that our proposed BERT-QTC model attains competitive experimental results in the Snips and ATIS spoken language datasets.

Intent Classification +4

EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement

no code implementations14 Feb 2022 Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Yu Tsao

Multimodal learning has been proven to be an effective method to improve speech enhancement (SE) performance, especially in challenging situations such as low signal-to-noise ratios, speech noise, or unseen noise types.

Electromyography (EMG) Speech Enhancement

A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning

no code implementations11 Feb 2022 Tassadaq Hussain, Muhammad Diyan, Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Yu Tsao, Amir Hussain

Current deep learning (DL) based approaches to speech intelligibility enhancement in noisy environments are often trained to minimise the feature distance between noise-free speech and enhanced speech signals.

Speech Enhancement

Conditional Diffusion Probabilistic Model for Speech Enhancement

2 code implementations10 Feb 2022 Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs.

Speech Enhancement Speech Synthesis

A Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning for Hearing-Assistive Technologies

no code implementations8 Feb 2022 Tassadaq Hussain, Muhammad Diyan, Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Yu Tsao, Amir Hussain

Current deep learning (DL) based approaches to speech intelligibility enhancement in noisy environments are generally trained to minimise the distance between clean and enhanced speech features.

Speech Enhancement

A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement

no code implementations24 Jan 2022 Tassadaq Hussain, Wei-Chien Wang, Mandar Gogate, Kia Dashtipour, Yu Tsao, Xugang Lu, Adeel Ahsan, Amir Hussain

To address this problem, we propose to integrate a novel temporal attentive-pooling (TAP) mechanism into a conventional convolutional recurrent neural network, termed as TAP-CRNN.

Predicting the Travel Distance of Patients to Access Healthcare using Deep Neural Networks

no code implementations7 Dec 2021 Li-Chin Chen, Ji-Tian Sheu, Yuh-Jue Chuang, Yu Tsao

The aim of this study is to propose a deep neural network approach to model the complex decision of patient choice in travel distance to access care, which is an important indicator for policymaking in allocating resources.

Specificity

Toward Real-World Voice Disorder Classification

no code implementations5 Dec 2021 Heng-Cheng Kuo, Yu-Peng Hsieh, Huan-Hsin Tseng, Chi-Te Wang, Shih-Hau Fang, Yu Tsao

Conclusion: By deploying factorized convolutional neural networks and domain adversarial training, domain-invariant features can be derived for voice disorder classification with limited resources.

Classification Model Compression

Instrumented shoulder functional assessment using inertial measurement units for frozen shoulder

no code implementations26 Nov 2021 Ting-Yang Lu, Kai-Chun Liu, Chia-Yeh Hsieh, Chih-Ya Chang, Yu Tsao, Chia-Tai Chan

Moreover, features of subtasks provided subtle information related to clinical conditions that is not revealed in features of a complete task, especially the defined subtasks 1 and 2 of each task.

Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport

1 code implementation NeurIPS 2021 Hsin-Yi Lin, Huan-Hsin Tseng, Xugang Lu, Yu Tsao

This paper presents a novel discriminator-constrained optimal transport network (DOTN) that performs unsupervised domain adaptation for speech enhancement (SE), which is an essential regression task in speech processing.

Speech Enhancement Unsupervised Domain Adaptation

OSSEM: one-shot speaker adaptive speech enhancement using meta learning

no code implementations10 Nov 2021 Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli

Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers.

Meta-Learning Speech Enhancement

HASA-net: A non-intrusive hearing-aid speech assessment network

no code implementations10 Nov 2021 Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao

Without the need of a clean reference, non-intrusive speech assessment methods have caught great attention for objective evaluations.

SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points

no code implementations8 Nov 2021 Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo

In this paper, a novel sign-exponent-only floating-point network (SEOFP-NET) technique is proposed to compress the model size and accelerate the inference time for speech enhancement, a regression task of speech signal processing.

Model Compression regression +1

InQSS: a speech intelligibility and quality assessment model using a multi-task learning network

1 code implementation4 Nov 2021 Yu-Wen Chen, Yu Tsao

Speech intelligibility and quality assessment models are essential tools for researchers to evaluate and improve speech processing models.

Multi-Task Learning

Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features

1 code implementation3 Nov 2021 Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

In this study, we propose a cross-domain multi-objective speech assessment model called MOSA-Net, which can estimate multiple speech assessment metrics simultaneously.

Speech Enhancement

Speech Enhancement Based on Cyclegan with Noise-informed Training

no code implementations19 Oct 2021 Wen-Yuan Ting, Syu-Siang Wang, Hsin-Li Chang, Borching Su, Yu Tsao

Herein, we investigate a potential limitation of the clean-to-noisy conversion part and propose a novel noise-informed training (NIT) approach to improve the performance of the original CycleGAN SE system.

Speech Enhancement

Speech Enhancement-assisted Voice Conversion in Noisy Environments

no code implementations19 Oct 2021 Yun-Ju Chan, Chiang-Jen Peng, Syu-Siang Wang, Hsin-Min Wang, Yu Tsao, Tai-Shih Chi

Numerous voice conversion (VC) techniques have been proposed for the conversion of voices among different speakers.

Speech Enhancement Voice Conversion

MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech

2 code implementations12 Oct 2021 Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao

Most of the deep learning-based speech enhancement models are learned in a supervised manner, which implies that pairs of noisy and clean speech are required during training.

Speech Enhancement

Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition

1 code implementation8 Oct 2021 Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao

In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system.

Spoken Command Recognition Transfer Learning

Analyzing the Robustness of Unsupervised Speech Recognition

no code implementations7 Oct 2021 Guan-Ting Lin, Chan-Jan Hsu, Da-Rong Liu, Hung-Yi Lee, Yu Tsao

In this work, we further analyze the training robustness of unsupervised ASR on the domain mismatch scenarios in which the domains of unpaired speech and text are different.

Generative Adversarial Network Speech Recognition +2

Mutual Information Continuity-constrained Estimator

no code implementations29 Sep 2021 Tsun-An Hsieh, Cheng Yu, Ying Hung, Chung-Ching Lin, Yu Tsao

Accordingly, we propose the Mutual Information Continuity-constrained Estimator (MICE).

Density Estimation

Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion

no code implementations8 Sep 2021 Yi-Syuan Liou, Wen-Chin Huang, Ming-Chi Yen, Shu-Wei Tsai, Yu-Huai Peng, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Voice conversion (VC) is an effective approach to electrolaryngeal (EL) speech enhancement, a task that aims to improve the quality of the artificial voice from an electrolarynx device.

Dynamic Time Warping Speech Enhancement +1

A Study on Speech Enhancement Based on Diffusion Probabilistic Model

1 code implementation25 Jul 2021 Yen-Ju Lu, Yu Tsao, Shinji Watanabe

Based on this property, we propose a diffusion probabilistic model-based speech enhancement (DiffuSE) model that aims to recover clean speech signals from noisy signals.

Speech Enhancement
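The DiffuSE entry above builds on diffusion probabilistic models. As background, the standard DDPM forward process corrupts a clean signal in closed form; a minimal numpy sketch follows. This is the generic (unconditional) formulation with an illustrative noise schedule, not the paper's exact conditional DiffuSE recipe.

```python
import numpy as np

def diffuse(x0, alpha_bar_t, rng):
    """Closed-form forward-diffusion sample at step t:
    x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps

T = 50
betas = np.linspace(1e-4, 0.05, T)      # linear noise schedule (assumed)
alpha_bars = np.cumprod(1.0 - betas)    # cumulative signal retention

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)      # stand-in for a 1 s clean waveform
x_t, eps = diffuse(clean, alpha_bars[T // 2], rng)
```

A denoising network is then trained to predict `eps` from `x_t` (conditioned on the noisy recording in the SE setting), and the reverse process iterates from pure noise back to an estimate of the clean speech.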

SVSNet: An End-to-end Speaker Voice Similarity Assessment Model

no code implementations20 Jul 2021 Cheng-Hung Hu, Yu-Huai Peng, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

Neural evaluation metrics derived for numerous speech generation tasks have recently attracted great attention.

Voice Conversion Voice Similarity

Speech Recovery for Real-World Self-powered Intermittent Devices

no code implementations9 Jun 2021 Yu-Chen Lin, Tsun-An Hsieh, Kuo-Hsuan Hung, Cheng Yu, Harinath Garudadri, Yu Tsao, Tei-Wei Kuo

The incompleteness of speech inputs severely degrades the performance of all the related speech signal processing applications.

A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion

no code implementations2 Jun 2021 Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao, Hsin-Min Wang, Tomoki Toda

First, a powerful parallel sequence-to-sequence model converts the input dysarthric speech into the normal speech of a reference speaker as an intermediate product; then, a nonparallel, frame-wise VC model realized with a variational autoencoder converts the speaker identity of the reference speech back to that of the patient, while being assumed capable of preserving the enhanced quality.

Voice Conversion

Multimodal Deep Learning Framework for Image Popularity Prediction on Social Media

no code implementations18 May 2021 Fatma S. Abousaleh, Wen-Huang Cheng, Neng-Hao Yu, Yu Tsao

In this study, motivated by multimodal learning, which uses information from various modalities, and by the recent success of convolutional neural networks (CNNs) in various fields, we propose a deep learning model, called the visual-social convolutional neural network (VSCNN), which predicts the popularity of a posted image by incorporating various types of visual and social features into a unified network model.

Image popularity prediction Multimodal Deep Learning

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

3 code implementations8 Apr 2021 Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory.

Speech Enhancement

Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification

no code implementations7 Apr 2021 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

However, in most of the discriminative training for SiamNN, only the distribution of pair-wised sample distances is considered, and the additional discriminative information in joint distribution of samples is ignored.

Binary Classification feature selection +1

The AS-NU System for the M2VoC Challenge

no code implementations7 Apr 2021 Cheng-Hung Hu, Yi-Chiao Wu, Wen-Chin Huang, Yu-Huai Peng, Yu-Wen Chen, Pin-Jui Ku, Tomoki Toda, Yu Tsao, Hsin-Min Wang

The first track focuses on using a small number of 100 target utterances for voice cloning, while the second track focuses on using only 5 target utterances for voice cloning.

Voice Cloning

EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

no code implementations7 Feb 2021 Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Wen-Chin Huang, Xugang Lu, Yu Tsao

Synthesized speech from articulatory movements can have real-world use for patients with vocal cord disorders, situations requiring silent speech, or in high-noise environments.

Coupling a generative model with a discriminative learning framework for speaker verification

no code implementations9 Jan 2021 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

By initializing the two-branch neural network with the generatively learned model parameters of the JB model, we train the model parameters with the pairwise samples as a binary discrimination task.

Decision Making feature selection +1

Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario

1 code implementation7 Jan 2021 Chiang-Jen Peng, Yun-Ju Chan, Cheng Yu, Syu-Siang Wang, Yu Tsao, Tai-Shih Chi

In this study, we propose an attention-based MTL (ATM) approach that integrates MTL and the attention-weighting mechanism to simultaneously realize a multi-model learning structure that performs speech enhancement (SE) and speaker identification (SI).

Multi-Task Learning Speaker Identification +1

Unsupervised neural adaptation model based on optimal transport for spoken language identification

no code implementations24 Dec 2020 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

By minimizing the classification loss on the training data set with the adaptation loss on both training and testing data sets, the statistical distribution difference between training and testing domains is reduced.

Language Identification Spoken language identification
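The optimal-transport machinery this kind of adaptation loss builds on can be illustrated with a minimal entropic Sinkhorn solver. This is a generic sketch assuming uniform marginals and an arbitrary regularization weight, not the paper's actual adaptation model:

```python
import math

def sinkhorn(cost, reg=0.1, iters=200):
    """Entropic-regularized optimal transport via Sinkhorn iterations.
    cost: n x m list of lists of nonnegative costs; uniform marginals
    (1/n per source point, 1/m per target point) are assumed.
    Returns the n x m transport plan."""
    n, m = len(cost), len(cost[0])
    # Gibbs kernel: cheap pairs get large entries, expensive pairs small
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0 / n] * n
    v = [1.0 / m] * m
    for _ in range(iters):
        # alternately rescale rows and columns to match the marginals
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

With a cost matrix that is cheap on the diagonal, the resulting plan concentrates mass there while its row and column sums match the uniform marginals.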

Domain-adaptive Fall Detection Using Deep Adversarial Training

no code implementations20 Dec 2020 Kai-Chun Liu, Michael Can, Heng-Cheng Kuo, Chia-Yeh Hsieh, Hsiang-Yun Huang, Chia-Tai Chan, Yu Tsao

The proposed DAFD can transfer knowledge from the source domain to the target domain by minimizing the domain discrepancy to avoid mismatch problems.

BIG-bench Machine Learning Domain Adaptation +2

Speech Enhancement with Zero-Shot Model Selection

1 code implementation17 Dec 2020 Ryandhimas E. Zezario, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Experimental results confirmed that the proposed ZMOS approach can achieve better performance in both seen and unseen noise types compared to the baseline systems and other model selection systems, which indicates the effectiveness of the proposed approach in providing robust SE performance.

Ensemble Learning Model Selection +2

Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information

no code implementations15 Nov 2020 Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao

Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that contextual BPC information improves SE performance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

1 code implementation9 Nov 2020 Ryandhimas E. Zezario, Szu-Wei Fu, Chiou-Shann Fuh, Yu Tsao, Hsin-Min Wang

To overcome this limitation, we propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net.

A Study of Incorporating Articulatory Movement Information in Speech Enhancement

no code implementations3 Nov 2020 Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Xugang Lu, Yu Tsao

Although deep learning algorithms are widely used for improving speech enhancement (SE) performance, the performance remains limited under highly challenging conditions, such as unseen noise or noise signals having low signal-to-noise ratios (SNRs).

Speech Enhancement

Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement

1 code implementation28 Oct 2020 Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both related to a smooth transition in speech segments that may carry linguistic information, e.g., phones and syllables.

Speech Enhancement

The Academia Sinica Systems of Voice Conversion for VCC2020

no code implementations6 Oct 2020 Yu-Huai Peng, Cheng-Hung Hu, Alexander Kang, Hung-Shin Lee, Pin-Yuan Chen, Yu Tsao, Hsin-Min Wang

This paper describes the Academia Sinica systems for the two tasks of Voice Conversion Challenge 2020, namely voice conversion within the same language (Task 1) and cross-lingual voice conversion (Task 2).

Task 2 Voice Conversion

Improved Lite Audio-Visual Speech Enhancement

1 code implementation30 Aug 2020 Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao

Experimental results confirm that compared to conventional AVSE systems, iLAVSE can effectively overcome the aforementioned three practical issues and can improve enhancement performance.

Speech Enhancement

CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application

1 code implementation21 Aug 2020 Yu-Wen Chen, Kuo-Hsuan Hung, You-Jin Li, Alexander Chao-Fu Kang, Ya-Hsin Lai, Kai-Chun Liu, Szu-Wei Fu, Syu-Siang Wang, Yu Tsao

The CITISEN provides three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC), allowing CITISEN to be used as a platform for utilizing and evaluating SE models and flexibly extend the models to address various noise environments and users.

Acoustic Scene Classification Data Augmentation +2

Incorporating Broad Phonetic Information for Speech Enhancement

no code implementations13 Aug 2020 Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao

In noisy conditions, knowing speech contents facilitates listeners to more effectively suppress background noise components and to retrieve pure speech signals.

Denoising Speech Enhancement

Using Deep Learning and Explainable Artificial Intelligence in Patients' Choices of Hospital Levels

no code implementations24 Jun 2020 Lichin Chen, Yu Tsao, Ji-Tian Sheu

This study also used explainable artificial intelligence methods to interpret the contribution of features for the general public and individuals.

Explainable artificial intelligence Specificity

Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

no code implementations18 Jun 2020 Szu-Wei Fu, Chien-Feng Liao, Tsun-An Hsieh, Kuo-Hsuan Hung, Syu-Siang Wang, Cheng Yu, Heng-Cheng Kuo, Ryandhimas E. Zezario, You-Jin Li, Shang-Yi Chuang, Yen-Ju Lu, Yu Tsao

The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications.

Speech Enhancement

SERIL: Noise Adaptive Speech Enhancement using Regularization-based Incremental Learning

1 code implementation24 May 2020 Chi-Chang Lee, Yu-Chen Lin, Hsuan-Tien Lin, Hsin-Min Wang, Yu Tsao

The results verify that the SERIL model can effectively adjust itself to new noise environments while overcoming the catastrophic forgetting issue.

Incremental Learning Speech Enhancement

MIMO Speech Compression and Enhancement Based on Convolutional Denoising Autoencoder

no code implementations24 May 2020 You-Jin Li, Syu-Siang Wang, Yu Tsao, Borching Su

For speech-related applications in IoT environments, identifying effective methods to handle interference noises and compress the amount of data in transmissions is essential to achieve high-quality services.

Denoising

Lite Audio-Visual Speech Enhancement

1 code implementation24 May 2020 Shang-Yi Chuang, Yu Tsao, Chen-Chou Lo, Hsin-Min Wang

Previous studies have confirmed the effectiveness of incorporating visual information into speech enhancement (SE) systems.

Data Compression Denoising +1

WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-end Speech Enhancement

1 code implementation6 Apr 2020 Tsun-An Hsieh, Hsin-Min Wang, Xugang Lu, Yu Tsao

In WaveCRN, the speech locality feature is captured by a convolutional neural network (CNN), while the temporal sequential property of the locality feature is modeled by stacked simple recurrent units (SRU).

Denoising Speech Denoising +2

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

1 code implementation Interspeech 2020 Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi

The intelligibility of natural speech is seriously degraded when exposed to adverse noisy environments.

Audio and Speech Processing Sound

Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion

1 code implementation22 Jan 2020 Wen-Chin Huang, Hao Luo, Hsin-Te Hwang, Chen-Chou Lo, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang

In this paper, we extend the CDVAE-VC framework by incorporating the concept of adversarial learning, in order to further increase the degree of disentanglement, thereby improving the quality and similarity of converted speech.

Disentanglement Voice Conversion

Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders

no code implementations6 Jan 2020 Cheng Yu, Ryandhimas E. Zezario, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao

The DSDT is built based on a prior knowledge of speech and noisy conditions (the speaker, environment, and signal factors are considered in this paper), where each component of the multi-branched encoder performs a particular mapping from noisy to clean speech along the branch in the DSDT.

Denoising Speech Enhancement

Cross-scale Attention Model for Acoustic Event Classification

no code implementations27 Dec 2019 Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai

However, a potential limitation of the network is that the discriminative features from the bottom layers (which can model the short-range dependency) are smoothed out in the final representation.

Classification General Classification

MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing

no code implementations9 Dec 2019 Chao-I Tuan, Yuan-Kuei Wu, Hung-Yi Lee, Yu Tsao

Our experimental results first confirmed the robustness of our MiTAS on two types of perturbations in mixed audio.

Speech Separation

Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

no code implementations22 Nov 2019 Cheng Yu, Kuo-Hsuan Hung, Syu-Siang Wang, Szu-Wei Fu, Yu Tsao, Jeih-weih Hung

Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE).

Ensemble Learning Speech Enhancement

Improving the Intelligibility of Electric and Acoustic Stimulation Speech Using Fully Convolutional Networks Based Speech Enhancement

no code implementations26 Sep 2019 Natalie Yu-Hsien Wang, Hsiao-Lan Sharon Wang, Tao-Wei Wang, Szu-Wei Fu, Xugang Lu, Yu Tsao, Hsin-Min Wang

Recently, a time-domain speech enhancement algorithm based on the fully convolutional neural networks (FCN) with a short-time objective intelligibility (STOI)-based objective function (termed FCN(S) in short) has received increasing attention due to its simple structure and effectiveness of restoring clean speech signals from noisy counterparts.

Denoising Speech Enhancement +1 Sound Audio and Speech Processing

Increasing Compactness Of Deep Learning Based Speech Enhancement Models With Parameter Pruning And Quantization Techniques

no code implementations31 May 2019 Jyun-Yi Wu, Cheng Yu, Szu-Wei Fu, Chih-Ting Liu, Shao-Yi Chien, Yu Tsao

In addition, a parameter quantization (PQ) technique was applied to reduce the size of a neural network by representing its weights with fewer cluster centroids.

Denoising Quantization +1
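The cluster-centroid quantization idea (weight sharing) can be sketched as a 1-D k-means over the weight values, with each weight snapped to its nearest centroid. The initialization scheme and cluster count below are illustrative assumptions, not the paper's settings:

```python
def cluster_quantize(weights, k=4, iters=20):
    """Weight-sharing quantization: replace each weight by the nearest of
    k cluster centroids found by a simple 1-D k-means. Storing a small
    index per weight plus a k-entry codebook shrinks the model."""
    # initialize centroids evenly over the weight range (an assumption)
    lo, hi = min(weights), max(weights)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        # assignment step: group weights by nearest centroid
        groups = [[] for _ in range(k)]
        for w in weights:
            j = min(range(k), key=lambda i: abs(w - centroids[i]))
            groups[j].append(w)
        # update step: move each centroid to its group mean
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    # quantize: emit the nearest centroid for every weight
    return [min(centroids, key=lambda c: abs(w - c)) for w in weights]
```

After quantization, weights that fell into the same cluster share one value, so at most k distinct values remain.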

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement

5 code implementations13 May 2019 Szu-Wei Fu, Chien-Feng Liao, Yu Tsao, Shou-De Lin

Adversarial loss in a conditional generative adversarial network (GAN) is not designed to directly optimize evaluation metrics of a target task, and thus, may not always guide the generator in a GAN to generate data with improved metric scores.

Generative Adversarial Network Speech Enhancement

Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality

1 code implementation6 May 2019 Szu-Wei Fu, Chien-Feng Liao, Yu Tsao

Utilizing a human-perception-related objective function to train a speech enhancement model has become a popular topic recently.

Speech Enhancement

Incorporating Symbolic Sequential Modeling for Speech Enhancement

no code implementations30 Apr 2019 Chien-Feng Liao, Yu Tsao, Xugang Lu, Hisashi Kawai

In this study, the symbolic sequences for acoustic signals are obtained as discrete representations with a Vector Quantized Variational Autoencoder algorithm.

Language Modelling Speech Enhancement

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

6 code implementations17 Apr 2019 Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech.

Voice Conversion

Boundary-Preserved Deep Denoising of the Stochastic Resonance Enhanced Multiphoton Images

no code implementations12 Apr 2019 Sheng-Yong Niu, Lun-Zhang Guo, Yue Li, Tzung-Dau Wang, Yu Tsao, Tzu-Ming Liu

As the rapid growth of high-speed and deep-tissue imaging in biomedical research, it is urgent to find a robust and effective denoising method to retain morphological features for further texture analysis and segmentation.

Denoising Texture Classification

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

no code implementations27 Nov 2018 Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Conventional WaveNet vocoders are trained with natural acoustic features but conditioned on the converted features in the conversion stage for VC, and such a mismatch often causes significant quality and similarity degradation.

Voice Conversion

Robustness against the channel effect in pathological voice detection

no code implementations26 Nov 2018 Yi-Te Hsu, Zining Zhu, Chi-Te Wang, Shih-Hau Fang, Frank Rudzicz, Yu Tsao

In this study, we propose a detection system for pathological voice, which is robust against the channel effect.

Unsupervised Domain Adaptation

Speech Enhancement Based on Reducing the Detail Portion of Speech Spectrograms in Modulation Domain via Discrete Wavelet Transform

1 code implementation8 Nov 2018 Shih-kuang Lee, Syu-Siang Wang, Yu Tsao, Jeih-weih Hung

The presented DWT-based SE method with various scaling factors for the detail part is evaluated on a subset of the Aurora-2 database, and the PESQ metric is used to indicate the quality of the processed speech signals.

Speech Enhancement
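The detail-reduction idea can be illustrated with a single-level Haar DWT on a toy sequence: split the signal into approximation and detail coefficients, shrink the detail part by a scaling factor, then invert. This is a sketch of the general mechanism only; the paper applies it in the modulation domain of speech spectrograms, not to raw samples:

```python
import math

def haar_dwt_detail_scale(signal, alpha=0.5):
    """Single-level Haar DWT of an even-length sequence: compute
    approximation (pairwise sums) and detail (pairwise differences)
    coefficients, scale the detail part by alpha (< 1 suppresses fine
    structure), then apply the inverse transform."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [alpha * d for d in detail]  # shrink the detail portion
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out
```

With alpha = 1 the transform round-trips the input exactly; with alpha = 0 each pair collapses to its mean, i.e. the detail is fully removed.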

Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech

2 code implementations30 Oct 2018 Li-Wei Chen, Hung-Yi Lee, Yu Tsao

This paper focuses on using voice conversion (VC) to improve the speech intelligibility of surgical patients who have had parts of their articulators removed.

Speech Recognition Voice Conversion

Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders

1 code implementation29 Aug 2018 Wen-Chin Huang, Hsin-Te Hwang, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang

An effective approach to non-parallel voice conversion (VC) is to utilize deep neural networks (DNNs), specifically variational autoencoders (VAEs), to model the latent structure of speech in an unsupervised manner.

Voice Conversion

A study on speech enhancement using exponent-only floating point quantized neural network (EOFP-QNN)

no code implementations17 Aug 2018 Yi-Te Hsu, Yu-Chen Lin, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo

We evaluated the proposed EOFP quantization technique on two types of neural networks, namely a bidirectional long short-term memory (BLSTM) network and a fully convolutional neural network (FCN), on a speech enhancement task.

Quantization regression +1
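The exponent-only floating point idea can be illustrated on a single float32 value: keep the sign and exponent bits and zero the mantissa, so each weight's magnitude truncates down to a power of two. This is a toy sketch of the representation; the paper's exact bit layout and rounding behavior are assumptions here:

```python
import struct

def eofp_quantize(x: float) -> float:
    """Exponent-only float32: keep the sign bit and the 8 exponent bits,
    zero the 23 mantissa bits. The result is the largest power of two
    not exceeding |x|, with the original sign (truncation toward zero
    in magnitude is an assumption of this sketch)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFF800000  # 1 sign bit + 8 exponent bits; mantissa -> 0
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For example, 3.5 (1.75 x 2^1) maps to 2.0, and -0.3 maps to -0.25, so each weight needs only 9 of its 32 bits.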

Noise Adaptive Speech Enhancement using Domain Adversarial Training

1 code implementation19 Jul 2018 Chien-Feng Liao, Yu Tsao, Hung-Yi Lee, Hsin-Min Wang

The proposed noise adaptive SE system contains an encoder-decoder-based enhancement model and a domain discriminator model.

Sound Audio and Speech Processing

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

no code implementations12 Sep 2017 Szu-Wei Fu, Tao-Wei Wang, Yu Tsao, Xugang Lu, Hisashi Kawai

For example, in measuring speech intelligibility, most evaluation metrics are based on a short-time objective intelligibility (STOI) measure, while the frame-based minimum mean square error (MMSE) between the estimated and clean speech is widely used when optimizing the model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

no code implementations1 Sep 2017 Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang

Precisely speaking, the proposed AVDCNN model is structured as an audio-visual encoder-decoder network, in which audio and visual data are first processed using individual CNNs, and then fused into a joint network to generate enhanced speech (the primary task) and reconstructed images (the secondary task) at the output layer.

Multi-Task Learning Speech Enhancement

Complex spectrogram enhancement by convolutional neural network with multi-metrics learning

no code implementations27 Apr 2017 Szu-Wei Fu, Ting-yao Hu, Yu Tsao, Xugang Lu

This paper aims to address two issues existing in the current speech enhancement methods: 1) the difficulty of phase estimations; 2) a single objective function cannot consider multiple metrics simultaneously.

Speech Enhancement

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

no code implementations30 Mar 2017 Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang

Precisely speaking, the proposed AVDCNN model is structured as an audio-visual encoder-decoder network, in which audio and visual data are first processed using individual CNNs, and then fused into a joint network to generate enhanced speech (the primary task) and reconstructed images (the secondary task) at the output layer.

Multi-Task Learning Speech Enhancement

Raw Waveform-based Speech Enhancement by Fully Convolutional Networks

no code implementations7 Mar 2017 Szu-Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai

Because the fully connected layers, which are involved in deep neural networks (DNN) and convolutional neural networks (CNN), may not accurately characterize the local information of speech signals, particularly with high frequency components, we employed fully convolutional layers to model the waveform.

Denoising Speech Enhancement

Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network

no code implementations13 Oct 2016 Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

In this paper, we propose a dictionary update method for Nonnegative Matrix Factorization (NMF) with high dimensional data in a spectral conversion (SC) task.

Speech Enhancement Speech Synthesis +1

Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

4 code implementations13 Oct 2016 Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora.

Voice Conversion
