Speech Recognition

1089 papers with code • 316 benchmarks • 87 datasets

Speech Recognition is the task of converting spoken language into text. It involves recognizing the words spoken in an audio recording and transcribing them into written form. The goal is to transcribe speech accurately, either in real time or from recorded audio, while coping with factors such as accents, speaking rate, and background noise.
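Transcription accuracy in ASR is most commonly reported as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The sketch below computes it with a word-level Levenshtein distance. The function name `word_error_rate` is illustrative, and it assumes whitespace tokenization with no text normalization (casing, punctuation), which real evaluations usually apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes plain whitespace tokenization; no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # prints 0.3333333333333333
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.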



Latest papers with no code

Resilience of Large Language Models for Noisy Instructions

no code yet • 15 Apr 2024

As the domain of natural language processing (NLP) advances rapidly, large language models (LLMs) have emerged as powerful tools for interpreting human commands and generating text across a variety of tasks.

Anatomy of Industrial Scale Multilingual ASR

no code yet • 15 Apr 2024

This paper describes AssemblyAI's industrial-scale automatic speech recognition (ASR) system, designed to meet the requirements of large-scale, multilingual ASR serving various application needs.

Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task

no code yet • 12 Apr 2024

Intention-based Human-Robot Interaction (HRI) systems allow robots to perceive and interpret user actions to proactively interact with humans and adapt to their behavior.

ASR advancements for indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa'ikhana

no code yet • 12 Apr 2024

Indigenous languages are a fundamental legacy in the development of human communication, embodying the unique identity and culture of local communities in the Americas.

An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution

no code yet • 11 Apr 2024

Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech.
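To make "hand-crafted feature extraction from the ASR transcript" concrete, here is a minimal sketch of a few fluency-style features computed from a word-level transcript with timestamps. The feature set, the `Word` structure, the filled-pause list (`uh`, `um`), and the 0.5 s pause threshold are all hypothetical illustrations, not the features used in the paper above.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


def fluency_features(words, filled_pauses=frozenset({"uh", "um"}), pause_threshold=0.5):
    """Compute illustrative fluency features from a time-aligned ASR transcript.

    The filled-pause set and pause threshold are assumed values for this sketch.
    """
    if not words:
        raise ValueError("transcript is empty")
    duration = words[-1].end - words[0].start
    if duration <= 0:
        raise ValueError("transcript must span a positive duration")
    tokens = [w.text.lower() for w in words]
    content = [t for t in tokens if t not in filled_pauses]
    # Count silent gaps between consecutive words that exceed the threshold.
    long_pauses = sum(1 for a, b in zip(words, words[1:]) if b.start - a.end > pause_threshold)
    return {
        "speech_rate_wps": len(content) / duration,  # non-filler words per second
        "type_token_ratio": len(set(content)) / len(content) if content else 0.0,
        "filled_pause_ratio": (len(tokens) - len(content)) / len(tokens),
        "long_pause_count": float(long_pauses),
    }


transcript = [
    Word("I", 0.0, 0.2), Word("um", 0.3, 0.6), Word("like", 1.4, 1.7),
    Word("like", 1.8, 2.0), Word("music", 2.1, 2.5),
]
print(fluency_features(transcript))
```

Features like these would typically be fed to a separate scoring model; any ASR errors propagate directly into them, which is one reason ASA pipelines are sensitive to recognition quality.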

Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

no code yet • 10 Apr 2024

This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources.
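For scale, the stated 91% share of the roughly 570k-hour training set works out as follows (approximate, since 570k is itself rounded):

```latex
0.91 \times 570{,}000\ \text{h} \approx 518{,}700\ \text{h (publicly available)}, \qquad
570{,}000 - 518{,}700 = 51{,}300\ \text{h (remaining sources)}
```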

An inclusive review on deep learning techniques and their scope in handwriting recognition

no code yet • 10 Apr 2024

This paper presents a survey of existing studies of deep learning in the field of handwriting recognition.

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

no code yet • 9 Apr 2024

Discrete speech tokens have become increasingly popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS), and singing voice synthesis (SVS).
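One widely used way to obtain discrete speech units is to cluster frame-level features from a self-supervised speech model with k-means and replace each frame by the index of its nearest centroid, often collapsing consecutive repeats. The sketch below shows only that quantization step, assuming the centroids are already given. The toy 2-D "features" and centroids are made up, and nothing here reflects the X-LANCE system's actual tokenizer.

```python
import math


def quantize(features, centroids):
    """Map each frame feature vector to the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(frame, centroids[k]))
        for frame in features
    ]


def dedup(units):
    """Collapse runs of identical consecutive unit IDs (e.g. [0, 0, 1] -> [0, 1])."""
    out = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


# Hypothetical centroids (in practice learned by k-means over many frames) and frames.
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, 0.0), (0.2, 0.1), (0.9, 1.1), (1.0, 0.8), (4.8, 5.2), (0.0, 0.1)]

units = quantize(frames, centroids)
print(units)         # prints [0, 0, 1, 1, 2, 0]
print(dedup(units))  # prints [0, 1, 2, 0]
```

Deduplication shortens the token sequence considerably, which matters when the units are consumed by sequence models; whether to apply it is a design choice that varies between systems.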

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

no code yet • 4 Apr 2024

This paper proposes Transducers with Pronunciation-aware Embeddings (PET).

Mai Ho'omāuna i ka 'Ai: Language Models Improve Automatic Speech Recognition in Hawaiian

no code yet • 3 Apr 2024

To do this, we train an external language model (LM) on ~1.5M words of Hawaiian text.
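The snippet does not say how the external LM is combined with the ASR model. Two common options are shallow fusion during beam search and rescoring an n-best list of hypotheses; the sketch below illustrates the second with a toy add-one-smoothed bigram LM and a log-linear score `acoustic_score + lm_weight * lm_logprob`. The function names, the `lm_weight` value of 0.5, and the tiny English training corpus are all hypothetical stand-ins, not the paper's setup.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Train an add-one-smoothed bigram LM and return a sentence log-probability function."""
    context_counts = Counter()
    bigram_counts = Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        context_counts.update(toks[:-1])                 # how often each token precedes another
        bigram_counts.update(zip(toks[:-1], toks[1:]))
    vocab_size = len({t for s in sentences for t in s.split()} | {"</s>"})

    def logprob(sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigram_counts[(a, b)] + 1) / (context_counts[a] + vocab_size))
            for a, b in zip(toks[:-1], toks[1:])
        )

    return logprob


def rescore(nbest, lm_logprob, lm_weight=0.5):
    """Pick the (text, acoustic_log_score) hypothesis with the best combined score."""
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))


lm = train_bigram_lm(["the cat sat", "the cat ran", "a dog sat"])
nbest = [("the cat sat", -4.0), ("the cat sad", -3.8)]  # acoustic model slightly prefers "sad"

print(rescore(nbest, lm, lm_weight=0.0)[0])  # prints the cat sad
print(rescore(nbest, lm, lm_weight=0.5)[0])  # prints the cat sat
```

The LM weight is normally tuned on held-out data; with too large a weight, fluent but acoustically wrong hypotheses start to win.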