Speech Recognition
1078 papers with code • 314 benchmarks • 86 datasets
Speech Recognition is the task of converting spoken language into text. It involves recognizing the words spoken in an audio recording and transcribing them into a written format. The goal is to transcribe speech accurately, in real time or from recorded audio, while accounting for factors such as accents, speaking rate, and background noise.
(Image credit: SpecAugment)
Libraries
Use these libraries to find Speech Recognition models and implementations
Datasets
Subtasks
Latest papers with no code
Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning
In this paper, we present the development of an isometric NMT system using Reinforcement Learning (RL), with a focus on optimizing the alignment of phoneme counts in source- and target-language sentence pairs.
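To make the phoneme-count-alignment idea concrete, here is a minimal sketch of what such a reward could look like. This is an illustrative toy formulation, not the reward function used in the cited paper: it simply scores a sentence pair by how close the target-to-source phoneme count ratio is to 1.

```python
def phoneme_ratio_reward(src_phoneme_count, tgt_phoneme_count):
    """Toy reward: 1.0 when phoneme counts match (ratio = 1), decaying toward 0
    as the target drifts longer or shorter than the source. Illustrative only;
    not the reward formulation from the cited paper."""
    if src_phoneme_count <= 0:
        raise ValueError("source phoneme count must be positive")
    ratio = tgt_phoneme_count / src_phoneme_count
    return 1.0 / (1.0 + abs(1.0 - ratio))

print(phoneme_ratio_reward(10, 10))  # 1.0 (perfectly isometric)
print(phoneme_ratio_reward(10, 15) > phoneme_ratio_reward(10, 20))  # True
```

In an RL setup, a reward like this would be combined with a translation-quality term so the policy is not pushed toward length matching at the expense of adequacy.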
AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition
In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions.
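To illustrate what "narrowly focused output distributions" means in a CTC-trained model, the following hedged toy sketch (not the AdaMER-CTC method itself) compares the Shannon entropy of a peaked versus a flat frame-wise distribution, and shows standard CTC greedy decoding: collapse repeated labels, then drop blanks.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def ctc_greedy_decode(frame_labels, blank=0):
    """Standard CTC greedy decoding: collapse repeats, then remove blanks."""
    collapsed, prev = [], None
    for lab in frame_labels:
        if lab != prev:
            collapsed.append(lab)
        prev = lab
    return [lab for lab in collapsed if lab != blank]

# A peaked (overconfident) vs. a flat frame-wise output distribution.
peaked = [0.97, 0.01, 0.01, 0.01]
flat = [0.25, 0.25, 0.25, 0.25]
print(entropy(peaked) < entropy(flat))  # True: peaked output has lower entropy

# Frame-wise argmax path over a toy vocabulary {0: blank, 1: 'a', 2: 'b'}.
path = [0, 1, 1, 0, 2, 2, 2, 0]
print(ctc_greedy_decode(path))  # [1, 2]
```

Entropy regularization in this setting adds a penalty that discourages the per-frame distributions from becoming as peaked as the first example.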
Advanced Artificial Intelligence Algorithms in Cochlear Implants: Review of Healthcare Strategies, Challenges, and Perspectives
Automatic speech recognition (ASR) plays a pivotal role in our daily lives, offering utility not only for interacting with machines but also for facilitating communication for individuals with either partial or profound hearing impairments.
Energy-Based Models with Applications to Speech and Language Processing
Therefore, the purpose of this monograph is to present a systematic introduction to energy-based models, including both algorithmic progress and applications in speech and language processing.
Initial Decoding with Minimally Augmented Language Model for Improved Lattice Rescoring in Low Resource ASR
Our approach is applicable to training speech recognition systems under low-resource conditions, where speech data and compute resources are insufficient but a large text corpus is available in the target language.
Neural Networks Hear You Loud And Clear: Hearing Loss Compensation Using Deep Neural Networks
In this study, we propose a DNN-based approach for hearing-loss compensation, which is trained on the outputs of hearing-impaired and normal-hearing DNN-based auditory models in response to speech signals.
More than words: Advancements and challenges in speech recognition for singing
This paper addresses the challenges and advancements in speech recognition for singing, a domain distinctly different from standard speech recognition.
Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition
Conformer-based attention models have become the de facto backbone model for Automatic Speech Recognition tasks.
Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
This study presents an automatic speech recognition (ASR) model designed to diagnose pronunciation issues in children with speech sound disorders (SSDs), replacing manual transcription in clinical procedures.
Beyond the Labels: Unveiling Text-Dependency in Paralinguistic Speech Recognition Datasets
This paper critically evaluates the prevalent assumption that machine learning models trained on such datasets genuinely learn to identify paralinguistic traits, rather than merely capturing lexical features.