no code implementations • 22 Apr 2024 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names.
no code implementations • 16 Jan 2024 • Mathis Petrovich, Or Litany, Umar Iqbal, Michael J. Black, Gül Varol, Xue Bin Peng, Davis Rempe
To generate composite animations from a multi-track timeline, we propose a new test-time denoising method.
no code implementations • 10 Oct 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.
1 code implementation • 28 Aug 2023 • Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol
Most CoIR approaches require manually annotated datasets, comprising image-text-image triplets, where the text describes a modification from the query image to the target image.
Ranked #1 on Composed Video Retrieval (CoVR) on WebVid-CoVR
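A minimal sketch of the kind of image/video-text-image triplet such composed-retrieval datasets contain; the field names and example file names below are purely illustrative and are not taken from the paper or its released code.

```python
# Hedged sketch (not the authors' code) of a composed-retrieval triplet:
# a query video, a modification text, and the target video it should retrieve.
from dataclasses import dataclass

@dataclass
class CoVRTriplet:
    query_video: str      # path or URL of the source video
    modification: str     # text describing the desired change
    target_video: str     # path or URL of the video satisfying the change

# Example instance; file names and text are purely illustrative.
example = CoVRTriplet(
    query_video="beach_day.mp4",
    modification="the same scene but at sunset",
    target_video="beach_sunset.mp4",
)
print(example.modification)
```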
no code implementations • ICCV 2023 • Mathis Petrovich, Michael J. Black, Gül Varol
We show that maintaining the motion generation loss, along with the contrastive training, is crucial to obtain good performance.
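To make that idea concrete, here is a hedged sketch of combining a motion-generation (reconstruction) loss with a text-motion contrastive loss; the specific loss choices, weight, and temperature below are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative combination of a generation loss and a contrastive (InfoNCE)
# loss over text and motion embeddings; values are placeholders.
import torch
import torch.nn.functional as F

def joint_loss(pred_motion, gt_motion, text_emb, motion_emb,
               contrastive_weight=0.1, temperature=0.07):
    # Generation term: reconstruct the ground-truth motion.
    gen_loss = F.smooth_l1_loss(pred_motion, gt_motion)

    # Contrastive term: matching text/motion pairs lie on the diagonal.
    text_emb = F.normalize(text_emb, dim=-1)
    motion_emb = F.normalize(motion_emb, dim=-1)
    logits = text_emb @ motion_emb.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    nce = (F.cross_entropy(logits, labels) +
           F.cross_entropy(logits.t(), labels)) / 2

    return gen_loss + contrastive_weight * nce
```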
no code implementations • ICCV 2023 • Nikos Athanasiou, Mathis Petrovich, Michael J. Black, Gül Varol
Motivated by the observation that the correspondence between actions and body parts is encoded in powerful language models, we extract this knowledge by prompting GPT-3 with text such as "what are the body parts involved in the action <action name>?"
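A small sketch of the prompting idea; the action list and the placeholder `query_llm` function are hypothetical and only illustrate how the template quoted in the abstract could be instantiated.

```python
# Sketch of constructing the body-part prompt described above; `query_llm`
# is a placeholder for whatever language-model API is used, not a real call.
ACTIONS = ["wave", "kick a ball", "sit down"]

def body_part_prompt(action):
    return f"What are the body parts involved in the action '{action}'?"

def query_llm(prompt):
    # Placeholder: in practice this would call GPT-3 or a similar model.
    raise NotImplementedError

for action in ACTIONS:
    print(body_part_prompt(action))
```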
1 code implementation • ICCV 2023 • Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gül Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky
We contribute Synthetic Visual Concepts (SyViC) - a million-scale synthetic dataset and data generation codebase that allows generating additional suitable data to improve the VLC understanding and compositional reasoning of VL models.
Ranked #68 on Visual Reasoning on Winoground
1 code implementation • CVPR 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form.
1 code implementation • 16 Nov 2022 • K R Prajwal, Hannah Bull, Liliane Momeni, Samuel Albanie, Gül Varol, Andrew Zisserman
Through extensive evaluations, we verify our method for automatic annotation and our model architecture.
1 code implementation • 9 Sep 2022 • Nikos Athanasiou, Mathis Petrovich, Michael J. Black, Gül Varol
In particular, our goal is to enable the synthesis of a series of actions, which we refer to as temporal action composition.
no code implementations • 4 Aug 2022 • Liliane Momeni, Hannah Bull, K R Prajwal, Samuel Albanie, Gül Varol, Andrew Zisserman
Recently, sign language researchers have turned to sign language interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available and large-scale source of training data.
1 code implementation • 17 May 2022 • Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
Our goal in this paper is the adaptation of image-text models for long video retrieval.
Ranked #4 on Zero-Shot Action Recognition on Charades
no code implementations • 9 May 2022 • Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman
The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.
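As a rough illustration of the task setup (not the authors' method), sign spotting can be framed as comparing an embedding of the isolated sign against embeddings of sliding windows of the continuous video; all names and values below are illustrative, with random vectors standing in for real embeddings.

```python
# Minimal numpy sketch of sign spotting as embedding similarity search.
import numpy as np

def spot_sign(query_emb, window_embs, threshold=0.8):
    """query_emb: (D,), window_embs: (T, D); L2-normalised embeddings."""
    sims = window_embs @ query_emb            # cosine similarity per window
    best = int(np.argmax(sims))
    found = sims[best] >= threshold           # "whether" it was signed
    return found, best, float(sims[best])     # "where" (best window index)

# Toy example with random unit vectors.
rng = np.random.default_rng(0)
q = rng.normal(size=64); q /= np.linalg.norm(q)
w = rng.normal(size=(100, 64)); w /= np.linalg.norm(w, axis=1, keepdims=True)
print(spot_sign(q, w))
```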
1 code implementation • 25 Apr 2022 • Mathis Petrovich, Michael J. Black, Gül Varol
In contrast to most previous work which focuses on generating a single, deterministic, motion from a textual description, we design a variational approach that can produce multiple diverse human motions.
Ranked #7 on Motion Synthesis on InterHuman
no code implementations • CVPR 2022 • Amanda Duarte, Samuel Albanie, Xavier Giró-i-Nieto, Gül Varol
Systems that can efficiently search collections of sign language videos have been highlighted as a useful application of sign language technology.
no code implementations • 5 Nov 2021 • Samuel Albanie, Gül Varol, Liliane Momeni, Hannah Bull, Triantafyllos Afouras, Himel Chowdhury, Neil Fox, Bencie Woll, Rob Cooper, Andrew McParland, Andrew Zisserman
In this work, we introduce the BBC-Oxford British Sign Language (BOBSL) dataset, a large-scale video collection of British Sign Language (BSL).
1 code implementation • 16 Aug 2021 • Yana Hasson, Gül Varol, Ivan Laptev, Cordelia Schmid
Our work aims to obtain 3D reconstruction of hands and manipulated objects from monocular videos.
Ranked #5 on hand-object pose on HO-3D
no code implementations • ICCV 2021 • Hannah Bull, Triantafyllos Afouras, Gül Varol, Samuel Albanie, Liliane Momeni, Andrew Zisserman
The goal of this work is to temporally align asynchronous subtitles in sign language videos.
1 code implementation • 28 Apr 2021 • Katrin Renz, Nicolaj C. Stache, Neil Fox, Gül Varol, Samuel Albanie
The objective of this work is to find temporal boundaries between signs in continuous sign language.
2 code implementations • ICCV 2021 • Mathis Petrovich, Michael J. Black, Gül Varol
By sampling from this latent space and querying a certain duration through a series of positional encodings, we synthesize variable-length motion sequences conditioned on a categorical action.
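A hedged sketch of that decoding idea: sample a latent vector, build positional encodings for the requested duration, and decode them into a pose sequence. The module sizes and the use of a generic transformer decoder are assumptions for illustration, not the paper's exact architecture.

```python
# Illustrative decoding of a variable-length motion from a sampled latent.
import math
import torch
import torch.nn as nn

def positional_encoding(length, dim):
    pos = torch.arange(length).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
    pe = torch.zeros(length, dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

latent_dim, pose_dim, duration = 256, 63, 60   # e.g. 60 frames of 63-D poses
decoder_layer = nn.TransformerDecoderLayer(d_model=latent_dim, nhead=4,
                                           batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)
to_pose = nn.Linear(latent_dim, pose_dim)

z = torch.randn(1, 1, latent_dim)                    # sampled latent vector
queries = positional_encoding(duration, latent_dim).unsqueeze(0)
poses = to_pose(decoder(tgt=queries, memory=z))      # (1, duration, pose_dim)
print(poses.shape)
```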
5 code implementations • ICCV 2021 • Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
Our objective in this work is video-text retrieval - in particular a joint embedding that enables efficient text-to-video retrieval.
Ranked #4 on Video Retrieval on QuerYD (using extra training data)
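For intuition, once text and videos are embedded in a joint space by trained encoders, text-to-video retrieval reduces to a ranked nearest-neighbour search; the sketch below is illustrative only, with random vectors standing in for real embeddings.

```python
# Minimal sketch of retrieval in a joint text-video embedding space.
import numpy as np

def rank_videos(text_emb, video_embs):
    """text_emb: (D,), video_embs: (N, D); both assumed L2-normalised."""
    sims = video_embs @ text_emb
    return np.argsort(-sims)        # video indices, best match first

rng = np.random.default_rng(1)
t = rng.normal(size=128); t /= np.linalg.norm(t)
v = rng.normal(size=(5, 128)); v /= np.linalg.norm(v, axis=1, keepdims=True)
print(rank_videos(t, v))
```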
no code implementations • CVPR 2021 • Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman
Our contributions are as follows: (1) we demonstrate the ability to leverage large quantities of continuous signing videos with weakly-aligned subtitles to localise signs in continuous sign language; (2) we employ the learned attention to automatically generate hundreds of thousands of annotations for a large sign vocabulary; (3) we collect a set of 37K manually verified sign instances across a vocabulary of 950 sign classes to support our study of sign language recognition; (4) by training on the newly annotated data from our method, we outperform the prior state of the art on the BSL-1K sign language recognition benchmark.
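As an illustration of contribution (2) above, per-frame attention weights could be thresholded into a time interval that serves as an automatic sign annotation; the function below is a hypothetical sketch, not the paper's annotation pipeline.

```python
# Hypothetical conversion of per-frame attention weights into an annotation.
import numpy as np

def attention_to_annotation(attn, fps=25.0, threshold=0.5):
    """attn: (T,) attention weights over video frames."""
    attn = attn / (attn.max() + 1e-8)          # normalise to [0, 1]
    active = np.where(attn >= threshold)[0]
    if active.size == 0:
        return None                            # no confident localisation
    start, end = active[0] / fps, (active[-1] + 1) / fps
    return {"start_sec": float(start), "end_sec": float(end)}

print(attention_to_annotation(np.array([0.1, 0.2, 0.9, 0.8, 0.3])))
```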
1 code implementation • 25 Nov 2020 • Katrin Renz, Nicolaj C. Stache, Samuel Albanie, Gül Varol
The objective of this work is to determine the location of temporal boundaries between signs in continuous sign language videos.
1 code implementation • 8 Oct 2020 • Liliane Momeni, Gül Varol, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman
The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.
1 code implementation • ECCV 2020 • Samuel Albanie, Gül Varol, Liliane Momeni, Triantafyllos Afouras, Joon Son Chung, Neil Fox, Andrew Zisserman
Recent progress in fine-grained gesture and action classification, and machine translation, points to the possibility of automated sign language recognition becoming a reality.
Ranked #4 on Sign Language Recognition on WLASL-2000
1 code implementation • 9 Dec 2019 • Gül Varol, Ivan Laptev, Cordelia Schmid, Andrew Zisserman
Although synthetic training data has been shown to be beneficial for tasks such as human pose estimation, its use for RGB human action recognition is relatively unexplored.
3 code implementations • CVPR 2019 • Yana Hasson, Gül Varol, Dimitrios Tzionas, Igor Kalevatykh, Michael J. Black, Ivan Laptev, Cordelia Schmid
Previous work has made significant progress towards reconstruction of hand poses and object shapes in isolation.
Ranked #7 on hand-object pose on DexYCB
2 code implementations • ECCV 2018 • Gül Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev, Cordelia Schmid
Human shape estimation is an important task for video editing, animation and the fashion industry.
Ranked #3 on 3D Human Pose Estimation on Surreal (using extra training data)
2 code implementations • CVPR 2017 • Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev, Cordelia Schmid
In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically generated but realistic images of people rendered from 3D sequences of human motion capture data.
1 code implementation • 15 Apr 2016 • Gül Varol, Ivan Laptev, Cordelia Schmid
Typical human actions last several seconds and exhibit characteristic spatio-temporal structure.
Ranked #63 on Action Recognition on HMDB-51
no code implementations • 6 Apr 2016 • Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, Abhinav Gupta
Each video is annotated with multiple free-text descriptions, action labels, action intervals and classes of interacted objects.