Search Results for author: Vladimir Iashin

Found 7 papers, 6 papers with code

Synchformer: Efficient Synchronization from Sparse Cues

2 code implementations • 29 Jan 2024 • Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse.

Audio-Visual Synchronization
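
As a rough illustration of the task above, here is a minimal sketch of audio-visual synchronization framed as classification over a discrete grid of candidate temporal offsets. Module names, sizes, and the offset grid are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch: audio-visual synchronization as offset classification.
# All module names and sizes are illustrative, not the paper's exact design.
import torch
import torch.nn as nn

class OffsetClassifier(nn.Module):
    def __init__(self, dim=512, num_offsets=21):  # e.g. -2.0s..+2.0s in 0.2s steps (assumed grid)
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=3)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))   # summary token
        self.head = nn.Linear(dim, num_offsets)

    def forward(self, vis_feats, aud_feats):
        # vis_feats: (B, Tv, dim) segment-level visual features
        # aud_feats: (B, Ta, dim) segment-level audio features
        B = vis_feats.size(0)
        tokens = torch.cat([self.cls.expand(B, -1, -1), vis_feats, aud_feats], dim=1)
        fused = self.fusion(tokens)
        return self.head(fused[:, 0])  # logits over candidate offsets

model = OffsetClassifier()
logits = model(torch.randn(2, 14, 512), torch.randn(2, 14, 512))
print(logits.shape)  # torch.Size([2, 21])
```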

Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors

2 code implementations • 13 Oct 2022 • Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

This contrasts with the case of synchronising videos of talking heads, where audio-visual correspondence is dense in both time and space.

Audio-Visual Synchronization
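
A minimal sketch of the "trainable selectors" idea named in the title: a small set of learnable query vectors cross-attends to a long, dense feature sequence and pools it into a few tokens, so sparse cues can be picked out. The class name and sizes are illustrative assumptions.

```python
# Minimal sketch: learnable selector queries compress a dense feature
# sequence into a few summary tokens via cross-attention. Sizes are assumed.
import torch
import torch.nn as nn

class Selectors(nn.Module):
    def __init__(self, dim=512, num_selectors=8, heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_selectors, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats):
        # feats: (B, T, dim) dense per-frame (or per-audio-segment) features
        q = self.queries.expand(feats.size(0), -1, -1)
        selected, _ = self.attn(query=q, key=feats, value=feats)
        return selected  # (B, num_selectors, dim) compact summary tokens

tokens = Selectors()(torch.randn(2, 500, 512))
print(tokens.shape)  # torch.Size([2, 8, 512])
```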

Taming Visually Guided Sound Generation

3 code implementations • 17 Oct 2021 • Vladimir Iashin, Esa Rahtu

In this work, we propose a single model capable of generating visually relevant, high-fidelity sounds, prompted with a set of frames from open-domain videos, in less time than it takes to play the sound, on a single GPU.

Audio Generation
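
A minimal sketch of the codebook-plus-transformer recipe this line of work follows: spectrograms are quantized into discrete codebook indices, and a transformer samples those indices autoregressively, conditioned on visual features (a decoder and vocoder would turn them back into audio). The names, sampling loop, and sizes here are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: a transformer samples spectrogram-codebook indices
# conditioned on visual features. Vocabulary size, shapes, and the decoding
# loop are assumptions for illustration only.
import torch
import torch.nn as nn

class CondSoundSampler(nn.Module):
    def __init__(self, codebook_size=1024, dim=512, max_len=265):
        super().__init__()
        self.tok_emb = nn.Embedding(codebook_size + 1, dim)  # +1 for BOS token
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, codebook_size)

    @torch.no_grad()
    def sample(self, vis_feats, num_tokens=160, bos=1024):
        # vis_feats: (B, Tv, dim) features of the conditioning video frames
        B = vis_feats.size(0)
        seq = torch.full((B, 1), bos, dtype=torch.long)
        for _ in range(num_tokens):
            x = torch.cat([vis_feats, self.tok_emb(seq)], dim=1)
            x = x + self.pos_emb[:, : x.size(1)]
            logits = self.head(self.transformer(x)[:, -1])
            nxt = torch.multinomial(logits.softmax(-1), 1)
            seq = torch.cat([seq, nxt], dim=1)
        return seq[:, 1:]  # codebook indices; a decoder + vocoder yield audio

sampler = CondSoundSampler()
codes = sampler.sample(torch.randn(2, 10, 512))
print(codes.shape)  # torch.Size([2, 160])
```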

A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer

2 code implementations • 17 May 2020 • Vladimir Iashin, Esa Rahtu

We show the effectiveness of the proposed model with audio and visual modalities on the dense video captioning task, yet the module is capable of digesting any two modalities in a sequence-to-sequence task.

Dense Video Captioning • Temporal Action Proposal Generation
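
A minimal sketch of a bi-modal attention layer in the spirit of the module described above: each modality attends to the other, so any two sequence modalities can be plugged in. Dimensions and the residual wiring are illustrative assumptions.

```python
# Minimal sketch: bi-directional cross-attention between two modality
# streams, e.g. audio and visual features. Sizes are assumed.
import torch
import torch.nn as nn

class BiModalAttention(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.a_to_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b_to_a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):
        # a: (B, Ta, dim), b: (B, Tb, dim) -- any two sequence modalities
        a_att, _ = self.a_to_b(query=a, key=b, value=b)  # a enriched by b
        b_att, _ = self.b_to_a(query=b, key=a, value=a)  # b enriched by a
        return a + a_att, b + b_att  # residual connections

audio = torch.randn(2, 30, 512)
video = torch.randn(2, 50, 512)
audio_out, video_out = BiModalAttention()(audio, video)
```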

Multi-modal Dense Video Captioning

4 code implementations • 17 Mar 2020 • Vladimir Iashin, Esa Rahtu

We apply an automatic speech recognition (ASR) system to obtain a temporally aligned textual description of the speech (similar to subtitles) and treat it as a separate input alongside the video frames and the corresponding audio track.

Automatic Speech Recognition (ASR) • +2
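
A minimal sketch of treating an ASR transcript as a third input stream alongside visual and audio features, as the abstract describes. The fusion scheme here (a shared encoder over modality-tagged tokens) is an illustrative choice, not the paper's exact architecture.

```python
# Minimal sketch: visual features, audio features, and ASR token ids are
# tagged with modality embeddings and encoded jointly. Sizes are assumed.
import torch
import torch.nn as nn

class TriModalEncoder(nn.Module):
    def __init__(self, dim=512, vocab=10000):
        super().__init__()
        self.speech_emb = nn.Embedding(vocab, dim)    # ASR token ids -> vectors
        self.modality_emb = nn.Embedding(3, dim)      # 0: visual, 1: audio, 2: speech
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, vis, aud, asr_ids):
        # vis: (B, Tv, dim), aud: (B, Ta, dim), asr_ids: (B, Ts) token ids
        streams = [vis, aud, self.speech_emb(asr_ids)]
        tagged = [s + self.modality_emb.weight[i] for i, s in enumerate(streams)]
        memory = self.encoder(torch.cat(tagged, dim=1))
        return memory  # to be consumed by a caption decoder via cross-attention

enc = TriModalEncoder()
mem = enc(torch.randn(2, 40, 512), torch.randn(2, 30, 512),
          torch.randint(0, 10000, (2, 20)))
print(mem.shape)  # torch.Size([2, 90, 512])
```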
