Search Results for author: Ankush Gupta

Found 21 papers, 12 papers with code

BootsTAP: Bootstrapped Training for Tracking-Any-Point

2 code implementations 1 Feb 2024 Carl Doersch, Yi Yang, Dilara Gokay, Pauline Luc, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ross Goroshin, João Carreira, Andrew Zisserman

To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes.

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

1 code implementation ICCV 2023 Chuhan Zhang, Ankush Gupta, Andrew Zisserman

We demonstrate the performance of the object-aware representations learnt by our model, by: (i) evaluating it for strong transfer, i.e. through zero-shot testing, on a number of downstream video-text retrieval and classification benchmarks; and (ii) using the representations learned as input for long-term video understanding tasks (e.g. Episodic Memory in Ego4D).

Object · Text Retrieval +3

SuS-X: Training-Free Name-Only Transfer of Vision-Language Models

2 code implementations ICCV 2023 Vishaal Udandarao, Ankush Gupta, Samuel Albanie

Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models.

Retrieval · Zero-Shot Learning
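The CLIP pre-training objective mentioned in this abstract is a symmetric contrastive (InfoNCE) loss over paired image and text embeddings. The following is a minimal numpy sketch of that loss with illustrative names; it is not the SuS-X or CLIP codebase.

```python
import numpy as np

def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_feats, text_feats: (N, D) arrays of L2-normalised embeddings,
    where row i of each array forms a matching image-text pair.
    """
    logits = image_feats @ text_feats.T / temperature  # (N, N) similarities
    n = logits.shape[0]

    def xent(l):
        # cross-entropy with the matching pair on the diagonal as the target
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned embeddings (e.g. identical one-hot vectors on both sides) the loss approaches zero; mismatched pairs drive it up, which is what pulls matching image and text embeddings together during training.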

TAP-Vid: A Benchmark for Tracking Any Point in a Video

3 code implementations 7 Nov 2022 Carl Doersch, Ankush Gupta, Larisa Markeeva, Adrià Recasens, Lucas Smaira, Yusuf Aytar, João Carreira, Andrew Zisserman, Yi Yang

Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move.

Optical Flow Estimation · Point Tracking
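The tracking-any-point task this benchmark formalises takes a query point in one frame and asks for its position and visibility in every frame. A minimal sketch of that interface and one position-accuracy ingredient of the evaluation is below; the field and function names are illustrative, not the TAP-Vid API.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PointTrack:
    """One TAP-style track: a query point and its per-frame trajectory."""
    query_xy: np.ndarray    # (2,) pixel location in the query frame
    query_frame: int        # index of the frame the query was placed in
    positions: np.ndarray   # (T, 2) predicted/ground-truth (x, y) per frame
    occluded: np.ndarray    # (T,) bool, True where the point is not visible

def position_accuracy(pred: PointTrack, gt: PointTrack, thresh: float = 8.0) -> float:
    """Fraction of visible ground-truth frames where the predicted point lies
    within `thresh` pixels of the ground truth (one ingredient of the
    benchmark's Jaccard-style evaluation, which also scores occlusion)."""
    visible = ~gt.occluded
    err = np.linalg.norm(pred.positions - gt.positions, axis=1)
    return float((err[visible] < thresh).mean())
```

Separating position error (on visible frames) from occlusion prediction is what lets the benchmark score surface deformation and disappearance rather than just box-level object tracking.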

Is an Object-Centric Video Representation Beneficial for Transfer?

no code implementations 20 Jul 2022 Chuhan Zhang, Ankush Gupta, Andrew Zisserman

The model learns a set of object-centric summary vectors for the video, and uses these vectors to fuse the visual and spatio-temporal trajectory 'modalities' of the video clip.

Action Classification · Object +1
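The abstract describes a set of summary vectors that pool a fused token set built from two modalities. One common way to realise that pattern is cross-attention pooling with learned queries; the sketch below shows that generic mechanism under assumed shapes, and is not the authors' model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def summarise(visual_tokens, traj_tokens, queries):
    """Cross-attention pooling over two fused modalities.

    visual_tokens: (Tv, D) frame/patch features; traj_tokens: (Tt, D)
    trajectory features; queries: (K, D) learned summary-vector queries.
    Returns (K, D): one summary vector per query.
    """
    # fuse the two 'modalities' into a single token set
    tokens = np.concatenate([visual_tokens, traj_tokens], axis=0)   # (Tv+Tt, D)
    # each query attends over all tokens from both modalities
    attn = softmax(queries @ tokens.T / np.sqrt(queries.shape[1]))  # (K, Tv+Tt)
    return attn @ tokens                                            # (K, D)
```

Because every query attends jointly over the concatenated token set, the resulting summary vectors can mix appearance and motion evidence, which is the fusion role the abstract attributes to them.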

Temporal Query Networks for Fine-grained Video Understanding

no code implementations CVPR 2021 Chuhan Zhang, Ankush Gupta, Andrew Zisserman

It attends to relevant segments for each query with a temporal attention mechanism, and can be trained using only the labels for each query.

Action Classification · Action Recognition +1
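The mechanism in this abstract — each query attending to relevant temporal segments and being supervised only by its own label — can be sketched as query-conditioned temporal attention followed by per-query classifier heads. Shapes and names below are assumptions for illustration, not the TQN implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_query_attention(segments, queries, heads):
    """Per-query temporal attention with per-query label heads.

    segments: (T, D) features for T temporal segments of the video;
    queries: (Q, D) learned query vectors (one per fine-grained attribute);
    heads: list of Q weight matrices, heads[q] of shape (D, C_q) mapping
    query q's readout to the logits of its own label set.
    """
    # each query attends over the temporal segments
    attn = softmax(queries @ segments.T / np.sqrt(segments.shape[1]))  # (Q, T)
    readouts = attn @ segments                                         # (Q, D)
    # decode each query's readout with its own classifier head
    return [readouts[q] @ W for q, W in enumerate(heads)]
```

Since each head only produces logits for its own query's labels, a cross-entropy loss per query is enough for training — matching the abstract's claim that only per-query labels are needed, with no temporal annotation.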

Adaptive Text Recognition through Visual Matching

no code implementations ECCV 2020 Chuhan Zhang, Ankush Gupta, Andrew Zisserman

In this work, our objective is to address the problems of generalization and flexibility for text recognition in documents.

Representation Learning

CrossTransformers: spatially-aware few-shot transfer

4 code implementations NeurIPS 2020 Carl Doersch, Ankush Gupta, Andrew Zisserman

In this work, we illustrate how the neural network representations which underpin modern vision systems are subject to supervision collapse, whereby they lose any information that is not necessary for performing the training task, including information that may be necessary for transfer to new tasks or domains.

Self-Supervised Learning

Compliance Change Tracking in Business Process Services

no code implementations 20 Aug 2019 Srikanth G Tamilselvam, Ankush Gupta, Arvind Agarwal

Compliance officers responsible for maintaining adherence constantly struggle to keep up with the large amount of changes in regulatory requirements.

Classification · feature selection +2

Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos

no code implementations CVPR 2020 Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi

We propose KeypointGAN, a new method for recognizing the pose of objects from a single image, which learns using only unlabelled videos and a weak empirical prior on the object poses.

Facial Landmark Detection · Image-to-Image Translation +4

Inductive Visual Localisation: Factorised Training for Superior Generalisation

no code implementations 21 Jul 2018 Ankush Gupta, Andrea Vedaldi, Andrew Zisserman

End-to-end trained Recurrent Neural Networks (RNNs) have been successfully applied to numerous problems that require processing sequences, such as image captioning, machine translation, and text recognition.

Image Captioning · Machine Translation +2
