no code implementations • 22 Apr 2024 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names.
no code implementations • 1 Apr 2024 • Joao F. Henriques, Dylan Campbell, Tengda Han
As the horses have long left the barn, our proposal may be seen as antiquated and irrelevant.
no code implementations • 21 Dec 2023 • Zeqian Li, Qirui Chen, Tengda Han, Ya Zhang, Yanfeng Wang, Weidi Xie
In this paper, we consider the problem of temporally aligning videos and texts from instructional videos: specifically, given a long-term video and associated text sentences, our goal is to determine their corresponding timestamps in the video.
no code implementations • 10 Oct 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.
1 code implementation • 17 Jul 2023 • Lukas Knobel, Tengda Han, Yuki M. Asano
While recent supervised methods for reference-based object counting continue to improve performance on benchmark datasets, they must rely on small datasets due to the cost of manually annotating dozens of objects per image.
1 code implementation • 2 Jun 2023 • Niki Amini-Naieni, Kiana Amini-Naieni, Tengda Han, Andrew Zisserman
Our objective is open-world object counting in images, where the target object class is specified by a text description.
Ranked #1 on Zero-Shot Counting on FSC147
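The text-specified counting setup above lends itself to a simple sketch: embed the class description with a vision-language text encoder, correlate it with a spatial image feature map, and decode a density map whose sum is the count. The module below is purely illustrative; the architecture and all names are assumptions, not the paper's actual model:

```python
import torch
import torch.nn as nn

class TextConditionedCounter(nn.Module):
    """Illustrative open-world counter: count the objects named by a text prompt.

    A sketch of the general recipe only (not the paper's architecture):
    correlate a text embedding with a spatial image feature map, decode the
    correlation into a density map, and sum the density to get the count.
    """
    def __init__(self):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1), nn.ReLU(),  # ReLU keeps the density non-negative
        )

    def forward(self, img_feats, text_emb):
        # img_feats: (B, D, H, W) from a vision backbone
        # text_emb:  (B, D) from the matching text encoder
        corr = torch.einsum("bdhw,bd->bhw", img_feats, text_emb)  # (B, H, W)
        density = self.decoder(corr.unsqueeze(1))                 # (B, 1, H, W)
        return density.sum(dim=(1, 2, 3))                         # per-image counts

model = TextConditionedCounter()
print(model(torch.randn(2, 256, 32, 32), torch.randn(2, 256)))  # two counts
```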
1 code implementation • CVPR 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form.
no code implementations • ICCV 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.
1 code implementation • 12 Oct 2022 • Jochem Loedeman, Maarten C. Stol, Tengda Han, Yuki M. Asano
With the introduction of the transformer architecture in computer vision, increasing model scale has been demonstrated to be a clear path to achieving performance and robustness gains.
no code implementations • 10 Oct 2022 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is an efficient training method for video tasks.
5 code implementations • NeurIPS 2022 • Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan
Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research.
Ranked #1 on Action Recognition on RareAct
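The few-shot adaptation described above is typically exercised through an interleaved image-text prompt: a handful of (image, answer) support examples followed by the query. A minimal sketch of building such a prompt, with a hypothetical `Image` placeholder and the actual model call omitted:

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Image:
    path: str  # placeholder for real pixel data

Prompt = List[Union[Image, str]]

def build_few_shot_prompt(examples: List[Tuple[Image, str]], query: Image) -> Prompt:
    """Interleave (image, answer) support examples, then append the query.

    A few-shot visual language model conditions on such a prompt and
    completes the final answer with no gradient updates; the model call
    itself is omitted here.
    """
    prompt: Prompt = []
    for img, qa in examples:
        prompt += [img, qa]
    prompt += [query, "Question: what is shown? Answer:"]
    return prompt

examples = [
    (Image("flamingo.jpg"), "Question: what is shown? Answer: a flamingo."),
    (Image("pangolin.jpg"), "Question: what is shown? Answer: a pangolin."),
]
print(build_few_shot_prompt(examples, Image("query.jpg")))
```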
1 code implementation • CVPR 2022 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is a temporal alignment network that ingests long-term video sequences and associated text sentences, in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, then determine its alignment.
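A minimal sketch of the two decisions just described, assuming pre-computed, L2-normalised per-segment video features and sentence embeddings; the threshold value and nearest-neighbour rule are illustrative, not the paper's actual network:

```python
import numpy as np

def align_sentences(video_feats, sent_feats, align_thresh=0.5):
    """Per sentence: (1) decide if it is alignable at all, by thresholding
    its best cosine similarity against any video segment; (2) if alignable,
    return the index of the best-matching segment.

    video_feats: (T, D) and sent_feats: (S, D), both L2-normalised.
    """
    sim = sent_feats @ video_feats.T   # (S, T) similarity matrix
    best = sim.argmax(axis=1)          # best segment per sentence
    conf = sim.max(axis=1)             # confidence of that match
    return [int(b) if c >= align_thresh else None for b, c in zip(best, conf)]

rng = np.random.default_rng(0)
v = rng.normal(size=(50, 64)); v /= np.linalg.norm(v, axis=1, keepdims=True)
s = rng.normal(size=(4, 64));  s /= np.linalg.norm(s, axis=1, keepdims=True)
print(align_sentences(v, s))  # None marks a sentence judged non-alignable
```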
1 code implementation • 8 Dec 2021 • Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie
Image-based visual-language (I-VL) pre-training has shown great success for learning joint visual-textual representations from large-scale web data, revealing a remarkable ability for zero-shot generalisation.
Ranked #5 on Zero-Shot Action Detection on ActivityNet-1.3
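Zero-shot generalisation with a joint visual-textual embedding space usually reduces to nearest-neighbour matching between an image (or clip) embedding and embedded class-name prompts. A sketch with a toy stand-in encoder; the prompt template and encoder are assumptions, not the paper's method:

```python
import numpy as np

def zero_shot_classify(clip_emb, class_names, encode_text):
    """Zero-shot recognition with a joint visual-textual embedding space:
    embed each class name as a text prompt and return the nearest one."""
    prompts = [f"a video of a person {c}" for c in class_names]
    text_embs = np.stack([encode_text(p) for p in prompts])  # (C, D)
    return class_names[int((text_embs @ clip_emb).argmax())]

def toy_encoder(s, dim=32):
    # Per-string toy vectors standing in for a real text encoder.
    rng = np.random.default_rng(abs(hash(s)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

print(zero_shot_classify(toy_encoder("a clip of diving"),
                         ["diving", "surfing", "archery"], toy_encoder))
```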
1 code implementation • NeurIPS 2020 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is visual-only self-supervised video representation learning.
Ranked #12 on Self-Supervised Action Recognition on HMDB51 (finetuned)
1 code implementation • ECCV 2020 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is self-supervised learning from video, in particular for representations for action recognition.
1 code implementation • 10 Sep 2019 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition.
Ranked #33 on Self-Supervised Action Recognition on UCF101
Tasks: Representation Learning, Self-Supervised Action Recognition (+2 more)
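Self-supervised video representation methods of this family commonly optimise a contrastive objective. Below is a generic InfoNCE loss as a sketch; it is a standard formulation, not necessarily the exact loss used in the paper above:

```python
import torch
import torch.nn.functional as F

def info_nce(pred, target, temperature=0.07):
    """Generic InfoNCE loss: each predicted embedding should match its own
    target (e.g. the representation of a future clip) and not the targets
    of other samples in the batch. Positives sit on the diagonal."""
    pred = F.normalize(pred, dim=1)
    target = F.normalize(target, dim=1)
    logits = pred @ target.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(pred.size(0))        # i-th pred matches i-th target
    return F.cross_entropy(logits, labels)

print(info_nce(torch.randn(8, 128), torch.randn(8, 128)).item())
```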
no code implementations • 19 Sep 2017 • Tengda Han, Jue Wang, Anoop Cherian, Stephen Gould
For effective human-robot interaction, it is important that a robotic assistant can forecast the next action a human will consider in a given task.
no code implementations • 24 Jul 2017 • Sam Toyer, Anoop Cherian, Tengda Han, Stephen Gould
Human pose forecasting is an important problem in computer vision with applications to human-robot interaction, visual surveillance, and autonomous driving.