Speech Recognition

1094 papers with code • 234 benchmarks • 87 datasets

Speech Recognition is the task of converting spoken language into text. It involves recognizing the words spoken in an audio recording and transcribing them into a written format. The goal is to accurately transcribe speech, in real time or from recorded audio, while accounting for factors such as accents, speaking speed, and background noise.
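Transcription accuracy is conventionally scored by word error rate (WER): the word-level edit distance between the system hypothesis and a reference transcript, normalized by the number of reference words. A minimal sketch (the function name and example strings are our own illustration):

```python
# Word error rate (WER): minimum number of word substitutions,
# insertions, and deletions needed to turn the hypothesis into the
# reference, divided by the number of reference words.
def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ≈ 0.167
```

Real evaluations normalize text (casing, punctuation) before scoring; toolkits differ in those conventions.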

(Image credit: SpecAugment)


Latest papers with no code

Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance

no code yet • 23 Apr 2024

To this end, we propose a novel analysis scheme based on the orthogonal projection-based decomposition of SE errors.
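The general idea of an orthogonal-projection-based decomposition can be illustrated as follows (this is our own sketch of the standard projection construction, not the paper's scheme): the enhancement error e = s_hat - s is split into a component parallel to the clean signal s (a pure gain error) and a residual orthogonal to s (distortion and artifact energy).

```python
import random

def dot(a, b):
    # Inner product of two equal-length signals.
    return sum(x * y for x, y in zip(a, b))

def decompose_error(s, s_hat):
    # Split e = s_hat - s into a part parallel to s and an orthogonal residual.
    e = [h - c for h, c in zip(s_hat, s)]
    alpha = dot(e, s) / dot(s, s)                 # projection coefficient
    e_par = [alpha * c for c in s]                # gain/scaling component
    e_orth = [x - y for x, y in zip(e, e_par)]    # orthogonal residual
    return e_par, e_orth

random.seed(0)
s = [random.gauss(0, 1) for _ in range(1000)]          # stand-in clean signal
s_hat = [0.8 * c + random.gauss(0, 0.1) for c in s]    # attenuated, noisy estimate
e_par, e_orth = decompose_error(s, s_hat)
# By construction, dot(e_par, e_orth) is (numerically) zero.
```

Separating the two components matters because a pure gain error is typically harmless to a recognizer, while the orthogonal residual captures the distortions that degrade it.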

Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks

no code yet • 22 Apr 2024

Understanding cognitive processes in the brain demands sophisticated models capable of replicating neural dynamics at large scales.

Learn2Talk: 3D Talking Face Learns from 2D Talking Face

no code yet • 19 Apr 2024

Speech-driven facial animation methods generally fall into two main classes, 3D and 2D talking faces, both of which have attracted considerable research attention in recent years.

Efficient infusion of self-supervised representations in Automatic Speech Recognition

no code yet • 19 Apr 2024

Self-supervised learning (SSL) models such as Wav2vec and HuBERT yield state-of-the-art results on speech-related tasks.

Artificial Neural Networks to Recognize Speakers Division from Continuous Bengali Speech

no code yet • 18 Apr 2024

In this paper, we present a method that identifies a speaker's geographical identity within a certain region using continuous Bengali speech.

Resilience of Large Language Models for Noisy Instructions

no code yet • 15 Apr 2024

As the domain of natural language processing (NLP) rapidly advances, large language models (LLMs) have emerged as powerful tools for interpreting human commands and generating text across various tasks.

Anatomy of Industrial Scale Multilingual ASR

no code yet • 15 Apr 2024

This paper describes AssemblyAI's industrial-scale automatic speech recognition (ASR) system, designed to meet the requirements of large-scale, multilingual ASR serving various application needs.

Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task

no code yet • 12 Apr 2024

Intention-based Human-Robot Interaction (HRI) systems allow robots to perceive and interpret user actions to proactively interact with humans and adapt to their behavior.

ASR advancements for indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa'ikhana

no code yet • 12 Apr 2024

Indigenous languages are a fundamental legacy in the development of human communication, embodying the unique identity and culture of local communities across the Americas.

An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution

no code yet • 11 Apr 2024

Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech.
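Hand-crafted features of the kind typically derived from an ASR transcript can be sketched as follows (the feature names and formulas here are our own examples, not the paper's feature set):

```python
# Illustrative fluency/lexical features computed from an ASR transcript
# and the utterance duration, as commonly used in automated speaking
# assessment pipelines.
def transcript_features(transcript: str, duration_sec: float) -> dict:
    words = transcript.lower().split()
    fillers = {"um", "uh", "er", "mm"}  # example filled-pause inventory
    n = max(len(words), 1)
    return {
        "speech_rate_wps": len(words) / duration_sec,       # words per second
        "type_token_ratio": len(set(words)) / n,            # lexical diversity
        "filler_ratio": sum(w in fillers for w in words) / n,
    }

feats = transcript_features("um i think the uh picture shows a park", 5.0)
```

Such features are then fed to a grader model; the data-scarcity problem the paper targets arises because labeled learner speech for training that grader is expensive to collect.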