Search Results for author: Vladimir Iashin

Found 7 papers, 6 papers with code

Synchformer: Efficient Synchronization from Sparse Cues

2 code implementations • 29 Jan 2024 • Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse.

Audio-Visual Synchronization

Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors

2 code implementations • 13 Oct 2022 • Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

This contrasts with the case of synchronising videos of talking heads, where audio-visual correspondence is dense in both time and space.

Audio-Visual Synchronization

Taming Visually Guided Sound Generation

3 code implementations • 17 Oct 2021 • Vladimir Iashin, Esa Rahtu

In this work, we propose a single model capable of generating visually relevant, high-fidelity sounds prompted with a set of frames from open-domain videos in less time than it takes to play it on a single GPU.

Audio Generation

The CORSMAL benchmark for the prediction of the properties of containers

no code implementations • 27 Jul 2021 • Alessio Xompero, Santiago Donaher, Vladimir Iashin, Francesca Palermo, Gökhan Solak, Claudio Coppola, Reina Ishikawa, Yuichi Nagao, Ryo Hachiuma, Qi Liu, Fan Feng, Chuanlin Lan, Rosa H. M. Chan, Guilherme Christmann, Jyun-Ting Song, Gonuguntla Neeharika, Chinnakotla Krishna Teja Reddy, Dinesh Jain, Bakhtawar Ur Rehman, Andrea Cavallaro

In this paper, we present a range of methods and an open framework to benchmark acoustic and visual perception for the estimation of the capacity of a container, and the type, mass, and amount of its content.

Top-1 CORSMAL Challenge 2020 Submission: Filling Mass Estimation Using Multi-modal Observations of Human-robot Handovers

1 code implementation • 2 Dec 2020 • Vladimir Iashin, Francesca Palermo, Gökhan Solak, Claudio Coppola

The CORSMAL 2020 Challenge focuses on the perception part of this problem: the robot needs to estimate the filling mass of a container held by a human.

Ranked #1 on Filling Level Estimation on CORSMAL 2020 (Public Test)

Capacity Estimation, Filling Level Estimation (and 2 more tasks)

A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer

2 code implementations • 17 May 2020 • Vladimir Iashin, Esa Rahtu

We show the effectiveness of the proposed model with audio and visual modalities on the dense video captioning task, yet the module is capable of digesting any two modalities in a sequence-to-sequence task.

Ranked #1 on Temporal Action Proposal Generation on ActivityNet Captions

Dense Video Captioning, Temporal Action Proposal Generation

Multi-modal Dense Video Captioning

4 code implementations • 17 Mar 2020 • Vladimir Iashin, Esa Rahtu

We apply an automatic speech recognition (ASR) system to obtain a temporally aligned textual description of the speech (similar to subtitles) and treat it as a separate input alongside the video frames and the corresponding audio track.

Ranked #11 on Dense Video Captioning on ActivityNet Captions

Automatic Speech Recognition (ASR) (and 2 more tasks)