Search Results for author: Rong Gong

Found 7 papers, 5 papers with code

Spatial Processing Front-End For Distant ASR Exploiting Self-Attention Channel Combinator

no code implementations • 25 Mar 2022 • Dushyant Sharma, Rong Gong, James Fosburgh, Stanislav Yu. Kruchinin, Patrick A. Naylor, Ljubomir Milanovic

We present a novel multi-channel front-end based on channel shortening with the Weighted Prediction Error (WPE) method followed by a fixed MVDR beamformer used in combination with a recently proposed self-attention-based channel combination (SACC) scheme, for tackling the distant ASR problem.

Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition

no code implementations • 10 Sep 2021 • Rong Gong, Carl Quillen, Dushyant Sharma, Andrew Goderre, José Laínez, Ljubomir Milanović

When a sufficiently large amount of far-field training data is available, jointly optimizing a multichannel frontend and an end-to-end (E2E) Automatic Speech Recognition (ASR) backend shows promising results.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

A Simple Fusion of Deep and Shallow Learning for Acoustic Scene Classification

2 code implementations • 19 Jun 2018 • Eduardo Fonseca, Rong Gong, Xavier Serra

In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine.

Acoustic Scene Classification • Classification • +3

Towards an efficient deep learning model for musical onset detection

2 code implementations • 18 Jun 2018 • Rong Gong, Xavier Serra

We first review the state-of-the-art deep learning models for MOD, and identify their shortcomings and challenges: (i) the lack of hyper-parameter tuning details, (ii) the non-availability of code for training models on other datasets, and (iii) ignoring the network capability when comparing different architectures.

Transfer Learning

Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions

3 code implementations • 5 Jun 2018 • Rong Gong, Xavier Serra

In the second step, the syllable and phoneme boundaries and labels are inferred hierarchically by using a duration-informed hidden Markov model (HMM).

Sound • Information Retrieval • Audio and Speech Processing

Audio to score matching by combining phonetic and duration information

1 code implementation • 12 Jul 2017 • Rong Gong, Jordi Pons, Xavier Serra

We approach the singing phrase audio to score matching problem by using phonetic and duration information - with a focus on studying the jingju a cappella singing case.

Sound

Timbre Analysis of Music Audio Signals with Convolutional Neural Networks

3 code implementations • 20 Mar 2017 • Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez, Xavier Serra

The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms.

Sound
