1 code implementation • 8 Nov 2023 • Dragos Tantaru, Elisabeta Oneata, Dan Oneata
The remarkable generative capabilities of denoising diffusion models have raised new concerns regarding the authenticity of the images we see every day on the Internet.
no code implementations • 11 Sep 2023 • Dan Oneata, Adriana Stan, Octavian Pascu, Elisabeta Oneata, Horia Cucu
Generalisation -- the ability of a model to perform well on unseen data -- is crucial for building reliable deep fake detectors.
no code implementations • 20 Jun 2023 • Leanne Nortje, Dan Oneata, Herman Kamper
We propose an approach that can work on natural word-image pairs but with fewer examples, i.e. fewer shots, and then illustrate how this approach can be applied to multimodal few-shot learning in a real low-resource language, Yorùbá.
1 code implementation • 24 Oct 2022 • Chen Qiu, Dan Oneata, Emanuele Bugliarello, Stella Frank, Desmond Elliott
We call this framework TD-MML: Translated Data for Multilingual Multimodal Learning, and it can be applied to any multimodal dataset and model.
Zero-Shot Cross-Lingual Image-to-Text Retrieval, Zero-Shot Cross-Lingual Text-to-Image Retrieval
no code implementations • 10 Oct 2022 • Kayode Olaleye, Dan Oneata, Herman Kamper
We collect and release a new single-speaker dataset of audio captions for 6k Flickr images in Yorùbá -- a real low-resource language spoken in Nigeria.
no code implementations • 7 Jun 2022 • Dan Oneata, Beata Lorincz, Adriana Stan, Horia Cucu
This modularity enables the easy replacement of each of its components, while also ensuring the fast adaptation to new speaker identities by disentangling or projecting the input features.
1 code implementation • 2 Feb 2022 • Kayode Olaleye, Dan Oneata, Herman Kamper
Masked-based localisation gives some of the best reported localisation scores from a VGS model, with an accuracy of 57% when the system knows that a keyword occurs in an utterance and needs to predict its location.
1 code implementation • 20 May 2021 • Dan Oneata, Adriana Stan, Horia Cucu
The task of video-to-speech aims to translate silent video of lip movement to its corresponding audio signal.
no code implementations • 14 Jan 2021 • Dan Oneata, Alexandru Caranica, Adriana Stan, Horia Cucu
In this paper we investigate confidence estimation for end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
1 code implementation • 27 Oct 2019 • Dan Oneata, Cosmin George Alexandru, Marius Stanescu, Octavian Pascu, Alexandru Magan, Adrian Postelnicu, Horia Cucu
We describe the submission of the Quo Vadis team to the Traffic4cast competition, which was organized as part of the NeurIPS 2019 series of challenges.
no code implementations • 2 Jul 2019 • Dan Oneata, Horia Cucu
This paper addresses the problem of building a speech recognition system attuned to the control of unmanned aerial vehicles (UAVs).
Automatic Speech Recognition (ASR)
no code implementations • 21 Apr 2015 • Heng Wang, Dan Oneata, Jakob Verbeek, Cordelia Schmid
We also use the homography to cancel out camera motion from the optical flow.
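As a rough illustration of the idea in the sentence above, the sketch below subtracts the displacement induced by a homography from a measured optical-flow vector. It is a simplified per-point approximation under stated assumptions, not the paper's actual pipeline: the function names `apply_homography` and `cancel_camera_motion` are hypothetical, and `H` is assumed to be a 3x3 homography that maps pixel coordinates in one frame to the next.

```python
def apply_homography(H, x, y):
    """Map the point (x, y) through the 3x3 homography H (nested lists)."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w


def cancel_camera_motion(H, x, y, u, v):
    """Remove the camera-induced part from the flow (u, v) measured at (x, y).

    Assumption: H describes the background (camera) motion between two
    consecutive frames, so the camera-induced flow at (x, y) is H(p) - p.
    """
    xc, yc = apply_homography(H, x, y)
    return u - (xc - x), v - (yc - y)


# Hypothetical example: the camera shifts the image by (+2, -1) pixels.
H = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0],
     [0.0, 0.0, 1.0]]
residual = cancel_camera_motion(H, 5.0, 5.0, 3.0, 0.0)
```

Here `residual` is the part of the flow that the camera motion does not explain, which is the motion one would attribute to the scene itself.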
no code implementations • CVPR 2014 • Dan Oneata, Jakob Verbeek, Cordelia Schmid
Transformation of the FV by power and L2 normalizations has been shown to significantly improve its performance, and has led to state-of-the-art results for a range of image and video classification and retrieval tasks.
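The two normalizations mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the FV is given as a flat list of floats; the function name `power_l2_normalize` is hypothetical, and the exponent `alpha = 0.5` (signed square root) is a commonly used value chosen here as an assumption, not something stated in the excerpt.

```python
import math


def power_l2_normalize(fv, alpha=0.5, eps=1e-12):
    """Apply signed power normalization, then L2 normalization.

    Power step: each component z becomes sign(z) * |z| ** alpha.
    L2 step: the resulting vector is divided by its Euclidean norm.
    """
    powered = [math.copysign(abs(z) ** alpha, z) for z in fv]
    norm = math.sqrt(sum(p * p for p in powered))
    if norm < eps:
        # An all-zero vector cannot be rescaled; return it unchanged.
        return powered
    return [p / norm for p in powered]


normalized = power_l2_normalize([4.0, -9.0, 0.0])
```

After the call, `normalized` has unit Euclidean length and keeps the sign of each input component.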