Search Results for author: Rong Gong

Found 7 papers, 5 papers with code

Spatial Processing Front-End For Distant ASR Exploiting Self-Attention Channel Combinator

no code implementations • 25 Mar 2022 • Dushyant Sharma, Rong Gong, James Fosburgh, Stanislav Yu. Kruchinin, Patrick A. Naylor, Ljubomir Milanovic

We present a novel multi-channel front-end based on channel shortening with the Weighted Prediction Error (WPE) method followed by a fixed MVDR beamformer, used in combination with a recently proposed self-attention-based channel combination (SACC) scheme, for tackling the distant ASR problem.

Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition

no code implementations • 10 Sep 2021 • Rong Gong, Carl Quillen, Dushyant Sharma, Andrew Goderre, José Laínez, Ljubomir Milanović

When sufficiently large far-field training data is available, jointly optimizing a multichannel frontend and an end-to-end (E2E) Automatic Speech Recognition (ASR) backend shows promising results.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

A Simple Fusion of Deep and Shallow Learning for Acoustic Scene Classification

2 code implementations • 19 Jun 2018 • Eduardo Fonseca, Rong Gong, Xavier Serra

In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine.

Acoustic Scene Classification, Classification, +3

Towards an efficient deep learning model for musical onset detection

2 code implementations • 18 Jun 2018 • Rong Gong, Xavier Serra

We first review the state-of-the-art deep learning models for MOD, and identify their shortcomings and challenges: (i) the lack of hyper-parameter tuning details, (ii) the non-availability of code for training models on other datasets, and (iii) ignoring the network capability when comparing different architectures.

Transfer Learning

Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions

3 code implementations • 5 Jun 2018 • Rong Gong, Xavier Serra

In the second step, the syllable and phoneme boundaries and labels are inferred hierarchically by using a duration-informed hidden Markov model (HMM).

Sound, Information Retrieval, Audio and Speech Processing

Audio to score matching by combining phonetic and duration information

1 code implementation • 12 Jul 2017 • Rong Gong, Jordi Pons, Xavier Serra

We approach the singing phrase audio to score matching problem by using phonetic and duration information - with a focus on studying the jingju a cappella singing case.

Sound

Timbre Analysis of Music Audio Signals with Convolutional Neural Networks

3 code implementations • 20 Mar 2017 • Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez, Xavier Serra

The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms.

Sound
