Speech Recognition

266 papers with code · Speech

Speech recognition is the task of recognising speech within audio and converting it into text.

Latest papers with code

CoVoST 2: A Massively Multilingual Speech-to-Text Translation Corpus

20 Jul 2020 · facebookresearch/covost

Speech translation has recently become an increasingly popular topic of research, partly due to the development of benchmark datasets.

MACHINE TRANSLATION · SPEECH RECOGNITION

125 stars

Automatic Lyrics Transcription using Dilated Convolutional Neural Networks with Self-Attention

13 Jul 2020 · emirdemirel/AutomaticLyricsTranscription-with-Self-Attention

Speech recognition is a well-developed research field, and current state-of-the-art systems are used in many software applications; yet to date, no comparably robust system exists for recognising words and sentences in singing voice.

SPEECH RECOGNITION

0 stars

TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech

12 Jul 2020 · andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning

In our experiments, we show that through alteration along different dimensions, the model learns to encode distinct aspects of speech.

SELF-SUPERVISED LEARNING · SPEAKER RECOGNITION · SPEECH RECOGNITION · TRANSFER LEARNING

149 stars

Fast Transformers with Clustered Attention

9 Jul 2020 · idiap/fast-transformers

This results in a model with linear complexity with respect to the sequence length for a fixed number of clusters.

SPEECH RECOGNITION

370 stars

Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory

29 Jun 2020 · AntonioCarta/mslmn

The effectiveness of recurrent neural networks depends heavily on their ability to store, in their dynamical memory, information extracted from input sequences at different frequencies and timescales.

SPEECH RECOGNITION

0 stars

GIPFA: Generating IPA Pronunciation from Audio

13 Jun 2020 · marxav/gipfa

Transcribing spoken audio samples into International Phonetic Alphabet (IPA) has long been reserved for experts.

SPEECH RECOGNITION

1 star

audino: A Modern Annotation Tool for Audio and Speech

9 Jun 2020 · midas-research/audino

The tool allows audio data to be uploaded and assigned to a user through a key-based API.

ACTION DETECTION · ACTIVITY DETECTION · EMOTION RECOGNITION · SPEAKER IDENTIFICATION · SPEECH RECOGNITION

654 stars

Learning to Count Words in Fluent Speech enables Online Speech Recognition

8 Jun 2020 · georgesterpu/Taris

Sequence-to-sequence models, in particular the Transformer, achieve state-of-the-art results in automatic speech recognition.

SPEECH RECOGNITION

4 stars

Improved acoustic word embeddings for zero-resource languages using multilingual transfer

2 Jun 2020 · kamperh/globalphone_awe

We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs.

SPEECH RECOGNITION · WORD EMBEDDINGS

7 stars

On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition

28 May 2020 · cywang97/StreamingTransformer

We show that both streaming RNN-T and transformer-AED models can obtain better accuracy than a highly-optimized hybrid model.

SPEECH RECOGNITION

71 stars