Speech Recognition

1096 papers with code • 234 benchmarks • 87 datasets

Speech Recognition is the task of converting spoken language into text: recognizing the words in an audio signal and transcribing them into written form. The goal is accurate transcription, either in real time or from recorded audio, that stays robust to accents, speaking speed, and background noise.
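Transcription accuracy is commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below is a minimal illustration under simple assumptions (whitespace tokenization, no text normalization); the function name `word_error_rate` is hypothetical and not taken from any library listed on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Practical evaluations usually normalize case and punctuation before scoring, which this sketch omits.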

(Image credit: SpecAugment)


How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena

hlt-mt/fbk-fairseq 20 Feb 2024

The attention mechanism, a cornerstone of state-of-the-art neural models, faces computational hurdles in processing long sequences due to its quadratic complexity.
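The quadratic cost mentioned above comes from scoring every position against every other position. The following is a minimal sketch of plain single-head self-attention in pure Python, assuming no learned projections (queries, keys, and values are all the input itself); it is not the ConfHyena method, and the name `naive_self_attention` is hypothetical. It returns the number of pairwise scores alongside the output so the growth is visible.

```python
import math


def naive_self_attention(x):
    """Scaled dot-product self-attention with queries = keys = values = x.

    x is a list of n vectors of dimension d. Returns (outputs, score_count),
    where score_count is the number of pairwise scores computed.
    """
    n = len(x)
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    score_count = 0
    for i in range(n):
        # One score per (i, j) pair: this inner loop is what makes the
        # total work grow with n * n.
        scores = []
        for j in range(n):
            scores.append(scale * sum(a * b for a, b in zip(x[i], x[j])))
            score_count += 1
        # Numerically stable softmax over the n scores for position i.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of all value vectors.
        outputs.append(
            [sum(w * x[j][k] for j, w in enumerate(weights)) for k in range(d)]
        )
    return outputs, score_count


if __name__ == "__main__":
    frames = [[float(i), 1.0] for i in range(4)]
    _, count_short = naive_self_attention(frames)
    _, count_long = naive_self_attention(frames * 2)
    print(count_short, count_long)
```

Doubling the sequence length quadruples the number of scores. As an illustrative assumption, a 30-second utterance at 100 feature frames per second gives 3,000 positions and therefore 9,000,000 scores per attention head and layer, which is why long audio inputs are costly for attention-based encoders.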

28

Careless Whisper: Speech-to-Text Hallucination Harms

koenecke/hallucination_harms 12 Feb 2024

We then study why hallucinations occur by observing the disparities in hallucination rates between speakers with aphasia (who have a lowered ability to express themselves using speech and voice) and a control group.

0

DeepCover: Advancing RNN Test Coverage and Online Error Prediction using State Machine Extraction

pouriagr/deep-cover 10 Feb 2024

The proposed methodology along with its assessment metrics contribute to increasing explainability in RNN models by providing a clear representation of their internal decision making process through the extracted SM.

1

Streaming Sequence Transduction through Dynamic Compression

steventan0110/star 2 Feb 2024

We introduce STAR (Stream Transduction with Anchor Representations), a novel Transformer-based model designed for efficient sequence-to-sequence transduction over streams.

1

On Speaker Attribution with SURT

k2-fsa/icefall 28 Jan 2024

The Streaming Unmixing and Recognition Transducer (SURT) has recently become a popular framework for continuous, streaming, multi-talker speech recognition (ASR).

785

Towards Event Extraction from Speech with Contextual Clues

jodie-kang/speechee 27 Jan 2024

While text-based event extraction has been an active research area and has seen successful application in many domains, extracting semantic events from speech directly is an under-explored problem.

1

TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion

spkgyk/TDFNet 25 Jan 2024

TDANet serves as the architectural foundation for the auditory and visual networks within TDFNet, offering an efficient model with fewer parameters.

4

Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric

aixplain/NoRefER 20 Jan 2024

The findings suggest that NoRefER is not merely a tool for error detection but also a comprehensive framework for enhancing ASR systems' transparency, efficiency, and effectiveness.

12

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

yuchen005/robustger 19 Jan 2024

To this end, we propose to extract a language-space noise embedding from the N-best list to represent the noise conditions of source speech, which can promote the denoising process in GER.

101

Cascaded Cross-Modal Transformer for Audio-Textual Classification

ristea/ccmt 15 Jan 2024

Subsequently, we combine language-specific Bidirectional Encoder Representations from Transformers (BERT) with Wav2Vec2.0 audio features via a novel cascaded cross-modal transformer (CCMT).

1