Speech Recognition

1078 papers with code • 314 benchmarks • 86 datasets

Speech Recognition is the task of converting spoken language into text: recognizing the words spoken in an audio recording and transcribing them into written form. The goal is accurate transcription, either in real time or from recorded audio, despite variation in accent, speaking speed, and background noise.
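
Transcription accuracy is commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below is a minimal illustration that assumes whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical and not taken from any library listed on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of two reference words.
    print(word_error_rate("hello world", "hello word"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference. Published results usually normalize case and punctuation before scoring, which this sketch leaves out.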

Latest papers with no code

Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning

no code yet • 20 Mar 2024

In this paper, we present the development of an isometric NMT system using Reinforcement Learning (RL), with a focus on optimizing the alignment of phoneme counts in the source and target language sentence pairs.

AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition

no code yet • 18 Mar 2024

In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions.

Advanced Artificial Intelligence Algorithms in Cochlear Implants: Review of Healthcare Strategies, Challenges, and Perspectives

no code yet • 17 Mar 2024

Automatic speech recognition (ASR) plays a pivotal role in our daily lives, offering utility not only for interacting with machines but also for facilitating communication for individuals with either partial or profound hearing impairments.

Energy-Based Models with Applications to Speech and Language Processing

no code yet • 16 Mar 2024

The purpose of this monograph is to present a systematic introduction to energy-based models, including both algorithmic progress and applications in speech and language processing.

Initial Decoding with Minimally Augmented Language Model for Improved Lattice Rescoring in Low Resource ASR

no code yet • 16 Mar 2024

Our approach is applicable for training speech recognition systems under low resource conditions where speech data and compute resources are insufficient, while there is a large text corpus that is available in the target language.

Neural Networks Hear You Loud And Clear: Hearing Loss Compensation Using Deep Neural Networks

no code yet • 15 Mar 2024

In this study, we propose a DNN-based approach for hearing-loss compensation, which is trained on the outputs of hearing-impaired and normal-hearing DNN-based auditory models in response to speech signals.

More than words: Advancements and challenges in speech recognition for singing

no code yet • 14 Mar 2024

This paper addresses the challenges and advancements in speech recognition for singing, a domain distinctly different from standard speech recognition.

Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition

no code yet • 13 Mar 2024

Conformer-based attention models have become the de facto backbone model for Automatic Speech Recognition tasks.

Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children

no code yet • 13 Mar 2024

This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures.

Beyond the Labels: Unveiling Text-Dependency in Paralinguistic Speech Recognition Datasets

no code yet • 12 Mar 2024

This paper critically evaluates the prevalent assumption that machine learning models trained on such datasets genuinely learn to identify paralinguistic traits, rather than merely capturing lexical features.