no code implementations • 22 Apr 2024 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names.
no code implementations • 16 Jan 2024 • Mathis Petrovich, Or Litany, Umar Iqbal, Michael J. Black, Gül Varol, Xue Bin Peng, Davis Rempe
To generate composite animations from a multi-track timeline, we propose a new test-time denoising method.
no code implementations • 10 Oct 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.
1 code implementation • 28 Aug 2023 • Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol
Most CoIR approaches require manually annotated datasets, comprising image-text-image triplets, where the text describes a modification from the query image to the target image.
Ranked #1 on Composed Video Retrieval (CoVR) on WebVid-CoVR
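A minimal sketch of the kind of image/video-text-image triplet such composed-retrieval datasets contain; the field names and example file names below are purely illustrative and are not taken from the paper or its released code.

```python
# Hedged sketch (not the authors' code) of a composed-retrieval triplet:
# a query video, a modification text, and the target video it should retrieve.
from dataclasses import dataclass

@dataclass
class CoVRTriplet:
    query_video: str      # path or URL of the source video
    modification: str     # text describing the desired change
    target_video: str     # path or URL of the video satisfying the change

# Example instance; file names and text are purely illustrative.
example = CoVRTriplet(
    query_video="beach_day.mp4",
    modification="the same scene but at sunset",
    target_video="beach_sunset.mp4",
)
print(example.modification)
```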
no code implementations • ICCV 2023 • Mathis Petrovich, Michael J. Black, Gül Varol
We show that maintaining the motion generation loss, along with the contrastive training, is crucial to obtain good performance.
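To make that idea concrete, here is a hedged sketch of combining a motion-generation (reconstruction) loss with a text-motion contrastive loss; the specific loss choices, weight, and temperature below are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative combination of a generation loss and a contrastive (InfoNCE)
# loss over text and motion embeddings; values are placeholders.
import torch
import torch.nn.functional as F

def joint_loss(pred_motion, gt_motion, text_emb, motion_emb,
               contrastive_weight=0.1, temperature=0.07):
    # Generation term: reconstruct the ground-truth motion.
    gen_loss = F.smooth_l1_loss(pred_motion, gt_motion)

    # Contrastive term: matching text/motion pairs lie on the diagonal.
    text_emb = F.normalize(text_emb, dim=-1)
    motion_emb = F.normalize(motion_emb, dim=-1)
    logits = text_emb @ motion_emb.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    nce = (F.cross_entropy(logits, labels) +
           F.cross_entropy(logits.t(), labels)) / 2

    return gen_loss + contrastive_weight * nce
```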
no code implementations • ICCV 2023 • Nikos Athanasiou, Mathis Petrovich, Michael J. Black, Gül Varol
Motivated by the observation that the correspondence between actions and body parts is encoded in powerful language models, we extract this knowledge by prompting GPT-3 with text such as "what are the body parts involved in the action <action name>?"
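A small sketch of the prompting idea; the action list and the placeholder `query_llm` function are hypothetical and only illustrate how the template quoted in the abstract could be instantiated.

```python
# Sketch of constructing the body-part prompt described above; `query_llm`
# is a placeholder for whatever language-model API is used, not a real call.
ACTIONS = ["wave", "kick a ball", "sit down"]

def body_part_prompt(action):
    return f"What are the body parts involved in the action '{action}'?"

def query_llm(prompt):
    # Placeholder: in practice this would call GPT-3 or a similar model.
    raise NotImplementedError

for action in ACTIONS:
    print(body_part_prompt(action))
```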
1 code implementation • ICCV 2023 • Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gül Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky
We contribute Synthetic Visual Concepts (SyViC) - a million-scale synthetic dataset and data generation codebase that allows generating additional suitable data to improve the VLC understanding and compositional reasoning of VL models.
Ranked #68 on Visual Reasoning on Winoground
1 code implementation • CVPR 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form.
1 code implementation • 16 Nov 2022 • K R Prajwal, Hannah Bull, Liliane Momeni, Samuel Albanie, Gül Varol, Andrew Zisserman
Through extensive evaluations, we verify our method for automatic annotation and our model architecture.
1 code implementation • 9 Sep 2022 • Nikos Athanasiou, Mathis Petrovich, Michael J. Black, Gül Varol
In particular, our goal is to enable the synthesis of a series of actions, which we refer to as temporal action composition.
no code implementations • 4 Aug 2022 • Liliane Momeni, Hannah Bull, K R Prajwal, Samuel Albanie, Gül Varol, Andrew Zisserman
Recently, sign language researchers have turned to sign language interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available and large-scale source of training data.
1 code implementation • 17 May 2022 • Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
Our goal in this paper is the adaptation of image-text models for long video retrieval.
Ranked #4 on Zero-Shot Action Recognition on Charades
no code implementations • 9 May 2022 • Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman
The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.
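As a rough illustration of the task setup (not the authors' method), sign spotting can be framed as comparing an embedding of the isolated sign against embeddings of sliding windows of the continuous video; all names and values below are illustrative, with random vectors standing in for real embeddings.

```python
# Minimal numpy sketch of sign spotting as embedding similarity search.
import numpy as np

def spot_sign(query_emb, window_embs, threshold=0.8):
    """query_emb: (D,), window_embs: (T, D); L2-normalised embeddings."""
    sims = window_embs @ query_emb            # cosine similarity per window
    best = int(np.argmax(sims))
    found = sims[best] >= threshold           # "whether" it was signed
    return found, best, float(sims[best])     # "where" (best window index)

# Toy example with random unit vectors.
rng = np.random.default_rng(0)
q = rng.normal(size=64); q /= np.linalg.norm(q)
w = rng.normal(size=(100, 64)); w /= np.linalg.norm(w, axis=1, keepdims=True)
print(spot_sign(q, w))
```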
1 code implementation • 25 Apr 2022 • Mathis Petrovich, Michael J. Black, Gül Varol
In contrast to most previous work which focuses on generating a single, deterministic, motion from a textual description, we design a variational approach that can produce multiple diverse human motions.
Ranked #7 on Motion Synthesis on InterHuman
no code implementations • CVPR 2022 • Amanda Duarte, Samuel Albanie, Xavier Giró-i-Nieto, Gül Varol
Systems that can efficiently search collections of sign language videos have been highlighted as a useful application of sign language technology.
no code implementations • 5 Nov 2021 • Samuel Albanie, Gül Varol, Liliane Momeni, Hannah Bull, Triantafyllos Afouras, Himel Chowdhury, Neil Fox, Bencie Woll, Rob Cooper, Andrew McParland, Andrew Zisserman
In this work, we introduce the BBC-Oxford British Sign Language (BOBSL) dataset, a large-scale video collection of British Sign Language (BSL).
1 code implementation • 16 Aug 2021 • Yana Hasson, Gül Varol, Ivan Laptev, Cordelia Schmid
Our work aims to obtain 3D reconstruction of hands and manipulated objects from monocular videos.
Ranked #5 on hand-object pose on HO-3D
no code implementations • ICCV 2021 • Hannah Bull, Triantafyllos Afouras, Gül Varol, Samuel Albanie, Liliane Momeni, Andrew Zisserman
The goal of this work is to temporally align asynchronous subtitles in sign language videos.
1 code implementation • 28 Apr 2021 • Katrin Renz, Nicolaj C. Stache, Neil Fox, Gül Varol, Samuel Albanie
The objective of this work is to find temporal boundaries between signs in continuous sign language.
2 code implementations • ICCV 2021 • Mathis Petrovich, Michael J. Black, Gül Varol
By sampling from this latent space and querying a certain duration through a series of positional encodings, we synthesize variable-length motion sequences conditioned on a categorical action.
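A hedged sketch of that decoding idea: sample a latent vector, build positional encodings for the requested duration, and decode them into a pose sequence. The module sizes and the use of a generic transformer decoder are assumptions for illustration, not the paper's exact architecture.

```python
# Illustrative decoding of a variable-length motion from a sampled latent.
import math
import torch
import torch.nn as nn

def positional_encoding(length, dim):
    pos = torch.arange(length).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
    pe = torch.zeros(length, dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

latent_dim, pose_dim, duration = 256, 63, 60   # e.g. 60 frames of 63-D poses
decoder_layer = nn.TransformerDecoderLayer(d_model=latent_dim, nhead=4,
                                           batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)
to_pose = nn.Linear(latent_dim, pose_dim)

z = torch.randn(1, 1, latent_dim)                    # sampled latent vector
queries = positional_encoding(duration, latent_dim).unsqueeze(0)
poses = to_pose(decoder(tgt=queries, memory=z))      # (1, duration, pose_dim)
print(poses.shape)
```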
5 code implementations • ICCV 2021 • Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
Our objective in this work is video-text retrieval - in particular a joint embedding that enables efficient text-to-video retrieval.
Ranked #4 on Video Retrieval on QuerYD (using extra training data)
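For intuition, once text and videos are embedded in a joint space by trained encoders, text-to-video retrieval reduces to a ranked nearest-neighbour search; the sketch below is illustrative only, with random vectors standing in for real embeddings.

```python
# Minimal sketch of retrieval in a joint text-video embedding space.
import numpy as np

def rank_videos(text_emb, video_embs):
    """text_emb: (D,), video_embs: (N, D); both assumed L2-normalised."""
    sims = video_embs @ text_emb
    return np.argsort(-sims)        # video indices, best match first

rng = np.random.default_rng(1)
t = rng.normal(size=128); t /= np.linalg.norm(t)
v = rng.normal(size=(5, 128)); v /= np.linalg.norm(v, axis=1, keepdims=True)
print(rank_videos(t, v))
```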
no code implementations • CVPR 2021 • Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman
Our contributions are as follows: (1) we demonstrate the ability to leverage large quantities of continuous signing videos with weakly-aligned subtitles to localise signs in continuous sign language; (2) we employ the learned attention to automatically generate hundreds of thousands of annotations for a large sign vocabulary; (3) we collect a set of 37K manually verified sign instances across a vocabulary of 950 sign classes to support our study of sign language recognition; (4) by training on the newly annotated data from our method, we outperform the prior state of the art on the BSL-1K sign language recognition benchmark.
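As an illustration of contribution (2) above, per-frame attention weights could be thresholded into a time interval that serves as an automatic sign annotation; the function below is a hypothetical sketch, not the paper's annotation pipeline.

```python
# Hypothetical conversion of per-frame attention weights into an annotation.
import numpy as np

def attention_to_annotation(attn, fps=25.0, threshold=0.5):
    """attn: (T,) attention weights over video frames."""
    attn = attn / (attn.max() + 1e-8)          # normalise to [0, 1]
    active = np.where(attn >= threshold)[0]
    if active.size == 0:
        return None                            # no confident localisation
    start, end = active[0] / fps, (active[-1] + 1) / fps
    return {"start_sec": float(start), "end_sec": float(end)}

print(attention_to_annotation(np.array([0.1, 0.2, 0.9, 0.8, 0.3])))
```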
1 code implementation • 25 Nov 2020 • Katrin Renz, Nicolaj C. Stache, Samuel Albanie, Gül Varol
The objective of this work is to determine the location of temporal boundaries between signs in continuous sign language videos.
1 code implementation • 8 Oct 2020 • Liliane Momeni, Gül Varol, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman
The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.
1 code implementation • ECCV 2020 • Samuel Albanie, Gül Varol, Liliane Momeni, Triantafyllos Afouras, Joon Son Chung, Neil Fox, Andrew Zisserman
Recent progress in fine-grained gesture and action classification, and machine translation, points to the possibility of automated sign language recognition becoming a reality.
Ranked #4 on Sign Language Recognition on WLASL-2000
1 code implementation • 9 Dec 2019 • Gül Varol, Ivan Laptev, Cordelia Schmid, Andrew Zisserman
Although synthetic training data has been shown to be beneficial for tasks such as human pose estimation, its use for RGB human action recognition is relatively unexplored.
3 code implementations • CVPR 2019 • Yana Hasson, Gül Varol, Dimitrios Tzionas, Igor Kalevatykh, Michael J. Black, Ivan Laptev, Cordelia Schmid
Previous work has made significant progress towards reconstruction of hand poses and object shapes in isolation.
Ranked #7 on hand-object pose on DexYCB
2 code implementations • ECCV 2018 • Gül Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev, Cordelia Schmid
Human shape estimation is an important task for video editing, animation and the fashion industry.
Ranked #3 on 3D Human Pose Estimation on Surreal (using extra training data)
2 code implementations • CVPR 2017 • Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev, Cordelia Schmid
In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically generated but realistic images of people rendered from 3D sequences of human motion capture data.
1 code implementation • 15 Apr 2016 • Gül Varol, Ivan Laptev, Cordelia Schmid
Typical human actions last several seconds and exhibit characteristic spatio-temporal structure.
Ranked #63 on Action Recognition on HMDB-51
no code implementations • 6 Apr 2016 • Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, Abhinav Gupta
Each video is annotated with multiple free-text descriptions, action labels, action intervals and classes of interacted objects.