Search Results for author: Dan Oneata

Found 13 papers, 5 papers with code

Weakly-supervised deepfake localization in diffusion-generated images

1 code implementation • 8 Nov 2023 • Dragos Tantaru, Elisabeta Oneata, Dan Oneata

The remarkable generative capabilities of denoising diffusion models have raised new concerns regarding the authenticity of the images we see every day on the Internet.

DeepFake Detection, Denoising

Towards generalisable and calibrated synthetic speech detection with self-supervised representations

no code implementations • 11 Sep 2023 • Dan Oneata, Adriana Stan, Octavian Pascu, Elisabeta Oneata, Horia Cucu

Generalisation, the ability of a model to perform well on unseen data, is crucial for building reliable deepfake detectors.

Synthetic Speech Detection

Visually grounded few-shot word learning in low-resource settings

no code implementations • 20 Jun 2023 • Leanne Nortje, Dan Oneata, Herman Kamper

We propose an approach that can work on natural word-image pairs but with fewer examples, i.e. fewer shots, and then illustrate how this approach can be applied to multimodal few-shot learning in a real low-resource language, Yorùbá.

Few-Shot Learning

Multilingual Multimodal Learning with Machine Translated Text

1 code implementation • 24 Oct 2022 • Chen Qiu, Dan Oneata, Emanuele Bugliarello, Stella Frank, Desmond Elliott

We call this framework TD-MML: Translated Data for Multilingual Multimodal Learning, and it can be applied to any multimodal dataset and model.

Ranked #1 on Zero-Shot Cross-Lingual Text-to-Image Retrieval on WIT (IGLUE)

Zero-Shot Cross-Lingual Image-to-Text Retrieval, Zero-Shot Cross-Lingual Text-to-Image Retrieval

YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding

no code implementations • 10 Oct 2022 • Kayode Olaleye, Dan Oneata, Herman Kamper

We collect and release a new single-speaker dataset of audio captions for 6k Flickr images in Yorùbá, a real low-resource language spoken in Nigeria.

Visual Grounding

FlexLip: A Controllable Text-to-Lip System

no code implementations • 7 Jun 2022 • Dan Oneata, Beata Lorincz, Adriana Stan, Horia Cucu

The system's modularity enables easy replacement of each of its components, while also ensuring fast adaptation to new speaker identities by disentangling or projecting the input features.

Audio Generation, Text-to-Video Generation

Keyword localisation in untranscribed speech using visually grounded speech models

1 code implementation • 2 Feb 2022 • Kayode Olaleye, Dan Oneata, Herman Kamper

Mask-based localisation gives some of the best reported localisation scores from a visually grounded speech (VGS) model, with an accuracy of 57% when the system knows that a keyword occurs in an utterance and needs to predict its location.

Keyword Spotting, TAG

Speaker disentanglement in video-to-speech conversion

1 code implementation • 20 May 2021 • Dan Oneata, Adriana Stan, Horia Cucu

The task of video-to-speech aims to translate silent video of lip movements into the corresponding audio signal.

Disentanglement, Speech Synthesis

An evaluation of word-level confidence estimation for end-to-end automatic speech recognition

no code implementations • 14 Jan 2021 • Dan Oneata, Alexandru Caranica, Adriana Stan, Horia Cucu

In this paper we investigate confidence estimation for end-to-end automatic speech recognition (ASR).

Automatic Speech Recognition (ASR)

The Quo Vadis submission at Traffic4cast 2019

1 code implementation • 27 Oct 2019 • Dan Oneata, Cosmin George Alexandru, Marius Stanescu, Octavian Pascu, Alexandru Magan, Adrian Postelnicu, Horia Cucu

We describe the submission of the Quo Vadis team to the Traffic4cast competition, which was organized as part of the NeurIPS 2019 series of challenges.

Regression

Kite: Automatic speech recognition for unmanned aerial vehicles

no code implementations • 2 Jul 2019 • Dan Oneata, Horia Cucu

This paper addresses the problem of building a speech recognition system attuned to the control of unmanned aerial vehicles (UAVs).

Automatic Speech Recognition (ASR)

A robust and efficient video representation for action recognition

no code implementations • 21 Apr 2015 • Heng Wang, Dan Oneata, Jakob Verbeek, Cordelia Schmid

We also use the homography to cancel out camera motion from the optical flow.

Action Recognition, Homography Estimation

Efficient Action Localization with Approximately Normalized Fisher Vectors

no code implementations • CVPR 2014 • Dan Oneata, Jakob Verbeek, Cordelia Schmid

Transformation of the Fisher vector (FV) by power and L2 normalization has been shown to significantly improve its performance, and has led to state-of-the-art results for a range of image and video classification and retrieval tasks.

Action Recognition, General Classification
