Search Results for author: Tengda Han

Found 18 papers, 10 papers with code

AutoAD III: The Prequel -- Back to the Pixels

no code implementations · 22 Apr 2024 · Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names.

Stale Diffusion: Hyper-realistic 5D Movie Generation Using Old-school Methods

no code implementations · 1 Apr 2024 · Joao F. Henriques, Dylan Campbell, Tengda Han

As the horses have long left the barn, our proposal may be seen as antiquated and irrelevant.

A Strong Baseline for Temporal Video-Text Alignment

no code implementations · 21 Dec 2023 · Zeqian Li, Qirui Chen, Tengda Han, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of temporally aligning video and text from instructional videos: given a long-term video and associated text sentences, our goal is to determine their corresponding timestamps in the video.

Descriptive, Language Modelling +3

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

no code implementations · 10 Oct 2023 · Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.

Language Modelling, Text Generation

Learning to Count without Annotations

1 code implementation · 17 Jul 2023 · Lukas Knobel, Tengda Han, Yuki M. Asano

While recent supervised methods for reference-based object counting continue to improve the performance on benchmark datasets, they have to rely on small datasets due to the cost associated with manually annotating dozens of objects in images.

Object Counting

Open-world Text-specified Object Counting

1 code implementation · 2 Jun 2023 · Niki Amini-Naieni, Kiana Amini-Naieni, Tengda Han, Andrew Zisserman

Our objective is open-world object counting in images, where the target object class is specified by a text description.

Object, Object Counting +1

AutoAD: Movie Description in Context

1 code implementation · CVPR 2023 · Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form.

Image Captioning, Text Generation

AutoAD II: The Sequel - Who, When, and What in Movie Audio Description

no code implementations · ICCV 2023 · Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.

Language Modelling, Text Generation

Prompt Generation Networks for Input-based Adaptation of Frozen Vision Transformers

1 code implementation · 12 Oct 2022 · Jochem Loedeman, Maarten C. Stol, Tengda Han, Yuki M. Asano

With the introduction of the transformer architecture in computer vision, increasing model scale has been demonstrated as a clear path to achieving performance and robustness gains.

Transfer Learning

Turbo Training with Token Dropout

no code implementations · 10 Oct 2022 · Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is an efficient training method for video tasks.

Action Classification, Classification +1

Temporal Alignment Networks for Long-term Video

1 code implementation · CVPR 2022 · Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is a temporal alignment network that ingests long-term video sequences and associated text sentences in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, determine its alignment.

Action Recognition, Action Segmentation +4

Prompting Visual-Language Models for Efficient Video Understanding

1 code implementation · 8 Dec 2021 · Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie

Image-based visual-language (I-VL) pre-training has shown great success for learning joint visual-textual representations from large-scale web data, revealing remarkable ability for zero-shot generalisation.

Action Recognition, Language Modelling +4

Memory-augmented Dense Predictive Coding for Video Representation Learning

1 code implementation · ECCV 2020 · Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is self-supervised learning from video, in particular for representations for action recognition.

Action Classification, Action Recognition +5

Video Representation Learning by Dense Predictive Coding

1 code implementation · 10 Sep 2019 · Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition.

Representation Learning, Self-Supervised Action Recognition +2

Human Action Forecasting by Learning Task Grammars

no code implementations · 19 Sep 2017 · Tengda Han, Jue Wang, Anoop Cherian, Stephen Gould

For effective human-robot interaction, it is important that a robotic assistant can forecast the next action a human will consider in a given task.

Action Recognition, Temporal Action Localization

Human Pose Forecasting via Deep Markov Models

no code implementations · 24 Jul 2017 · Sam Toyer, Anoop Cherian, Tengda Han, Stephen Gould

Human pose forecasting is an important problem in computer vision with applications to human-robot interaction, visual surveillance, and autonomous driving.

Autonomous Driving, Human Pose Forecasting
