no code implementations • 12 Nov 2023 • Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Ke Li, Junteng Jia, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer
In this work, we extend the instruction-tuned Llama-2 model with end-to-end general-purpose speech processing and reasoning abilities while maintaining the wide range of original LLM capabilities, without using any carefully curated paired data.
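The abstract describes coupling a speech encoder to the instruction-tuned LLM end to end. A minimal sketch of that general recipe, assuming a HuggingFace-style decoder that accepts inputs_embeds; every module and variable name below is illustrative, not taken from the paper:

    # Hypothetical sketch: project audio-encoder frames into the LLM's
    # token-embedding space and prepend them to the text prompt.
    import torch
    import torch.nn as nn

    class SpeechLLM(nn.Module):
        def __init__(self, audio_encoder, llm, audio_dim, llm_dim):
            super().__init__()
            self.audio_encoder = audio_encoder         # any acoustic encoder
            self.proj = nn.Linear(audio_dim, llm_dim)  # map audio to LLM space
            self.llm = llm                             # decoder-only LLM

        def forward(self, audio, text_ids):
            audio_emb = self.proj(self.audio_encoder(audio))  # (B, T_a, llm_dim)
            text_emb = self.llm.get_input_embeddings()(text_ids)
            inputs = torch.cat([audio_emb, text_emb], dim=1)  # speech first
            return self.llm(inputs_embeds=inputs)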
no code implementations • 19 Sep 2023 • Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen
Overall, we demonstrate that by adding only a handful of trainable parameters via adapters, we can unlock contextualized speech recognition capability for the pretrained LLM while keeping the same text-only input functionality.
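As a rough illustration of the adapter idea (a residual bottleneck added to an otherwise frozen backbone), the following sketch shows the standard construction; the bottleneck width and placement are assumptions, not the paper's configuration:

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        def __init__(self, dim, bottleneck=64):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.up = nn.Linear(bottleneck, dim)
            nn.init.zeros_(self.up.weight)  # adapter starts as the identity
            nn.init.zeros_(self.up.bias)

        def forward(self, x):
            return x + self.up(torch.relu(self.down(x)))  # residual bottleneck

    def freeze_backbone(llm: nn.Module) -> None:
        # Freeze every pretrained weight; only adapter parameters stay trainable.
        for p in llm.parameters():
            p.requires_grad = False

Zero-initializing the up-projection makes each adapter a no-op at the start of training, so the pretrained model's behavior is preserved until the adapters learn something useful.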
no code implementations • 21 Jul 2023 • Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer
Furthermore, we perform ablation studies investigating whether the LLM can be kept completely frozen during training to maintain its original capabilities, the effect of scaling up the audio encoder, and the effect of increasing the audio encoder stride to generate fewer embeddings.
Abstractive Text Summarization • Automatic Speech Recognition • +3
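The striding ablation mentioned above reduces the number of audio embeddings the LLM must attend over. One common way to get a larger effective stride is to stack adjacent encoder frames, as in this illustrative helper (the factor and shapes are assumptions):

    import torch

    def stack_frames(x: torch.Tensor, k: int) -> torch.Tensor:
        """x: (batch, frames, dim) -> (batch, frames // k, dim * k)."""
        b, t, d = x.shape
        t = (t // k) * k                  # drop any remainder frames
        return x[:, :t].reshape(b, t // k, d * k)

The stacked dim * k vectors would then typically be projected back down to the LLM embedding size by a linear layer.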
no code implementations • CVPR 2023 • Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen
Furthermore, when combined with large-scale pseudo-labeled audio-visual data, SynthVSR yields a new state-of-the-art VSR WER of 16.9% using publicly available data only, surpassing recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90,000 hours).
no code implementations • 7 Nov 2022 • Roshan Sharma, Weipeng He, Ju Lin, Egor Lakomkin, Yang Liu, Kaustubh Kalgaonkar
In this paper, we first demonstrate that egocentric visual information is helpful for noise suppression.
1 code implementation • EMNLP 2018 • Egor Lakomkin, Sven Magg, Cornelius Weber, Stefan Wermter
In this paper, we describe KT-Speech-Crawler: an approach for automatic dataset construction for speech recognition by crawling YouTube videos.
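A pipeline like this has to filter crawled caption segments before adding them to a corpus. Below is a plausible filtering step of that kind; the thresholds are assumptions for illustration, not the values used by KT-Speech-Crawler:

    import re

    def keep_segment(text: str, start: float, end: float) -> bool:
        """Keep only caption segments likely to be cleanly aligned speech."""
        dur = end - start
        if not 1.0 <= dur <= 10.0:        # plausible utterance length in seconds
            return False
        if re.search(r"\[(music|applause)\]", text, re.I):
            return False                   # non-speech caption markers
        rate = len(text.split()) / dur     # words per second
        return 0.5 <= rate <= 5.0          # plausible speaking rate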
no code implementations • 28 Feb 2019 • Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter
We argue that using ground-truth transcriptions during training and evaluation phases leads to a significant discrepancy in performance compared to real-world conditions, as the spoken text has to be recognized on the fly and can contain speech recognition mistakes.
Automatic Speech Recognition (ASR) • +4
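The setup argued for above replaces ground-truth transcripts with recognizer output at both training and evaluation time. Schematically, where asr and emotion_model stand in for any recognizer/classifier pair:

    def predict_emotion(audio, asr, emotion_model):
        hypothesis = asr.transcribe(audio)  # on-the-fly recognition, may contain errors
        return emotion_model(hypothesis)    # the classifier sees the noisy text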
no code implementations • 6 Apr 2018 • Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter
Speech emotion recognition (SER) is an important aspect of effective human-robot collaboration and has received considerable attention from the research community.
no code implementations • 3 Apr 2018 • Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter
Acoustically expressed emotions can make communication with a robot more efficient.
no code implementations • 30 Mar 2018 • Egor Lakomkin, Chandrakant Bothe, Stefan Wermter
Given the text of a tweet and its emotion category (anger, joy, fear, and sadness), the participants were asked to build a system that assigns emotion intensity values.
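A minimal sketch of the task interface, not the submitted system: condition a text representation on the emotion category and regress an intensity in [0, 1]. All dimensions are placeholders:

    import torch
    import torch.nn as nn

    class IntensityRegressor(nn.Module):
        def __init__(self, text_dim, n_emotions=4, hidden=64):
            super().__init__()
            self.emotion_emb = nn.Embedding(n_emotions, 32)  # anger/joy/fear/sadness
            self.head = nn.Sequential(
                nn.Linear(text_dim + 32, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, tweet_feats, emotion_id):
            z = torch.cat([tweet_feats, self.emotion_emb(emotion_id)], dim=-1)
            return torch.sigmoid(self.head(z)).squeeze(-1)   # intensity in [0, 1]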
no code implementations • IJCNLP 2017 • Egor Lakomkin, Cornelius Weber, Sven Magg, Stefan Wermter
Acoustic emotion recognition aims to categorize the affective state of the speaker and is still a difficult task for machine learning models.
no code implementations • EACL 2017 • Egor Lakomkin, Cornelius Weber, Stefan Wermter
In this work, we tackle the problem of speech emotion classification.
no code implementations • 14 Mar 2018 • Pablo Barros, Nikhil Churamani, Egor Lakomkin, Henrique Siqueira, Alexander Sutherland, Stefan Wermter
This paper is the basis paper for the accepted IJCNN challenge One-Minute Gradual-Emotion Recognition (OMG-Emotion), by which we hope to foster long-term emotion classification using neural models for the benefit of the IJCNN community.
Human-Computer Interaction
no code implementations • WS 2017 • Egor Lakomkin, Chandrakant Bothe, Stefan Wermter
Given the text of a tweet and its emotion category (anger, joy, fear, and sadness), the participants were asked to build a system that assigns emotion intensity values.