Search Results for author: Ankush Gupta

Found 21 papers, 12 papers with code

BootsTAP: Bootstrapped Training for Tracking-Any-Point

2 code implementations 1 Feb 2024 Carl Doersch, Yi Yang, Dilara Gokay, Pauline Luc, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ross Goroshin, João Carreira, Andrew Zisserman

To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes.

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

1 code implementation ICCV 2023 Chuhan Zhang, Ankush Gupta, Andrew Zisserman

We demonstrate the performance of the object-aware representations learnt by our model, by: (i) evaluating it for strong transfer, i.e. through zero-shot testing, on a number of downstream video-text retrieval and classification benchmarks; and (ii) using the representations learned as input for long-term video understanding tasks (e.g. Episodic Memory in Ego4D).

Object · Text Retrieval +3

SuS-X: Training-Free Name-Only Transfer of Vision-Language Models

2 code implementations ICCV 2023 Vishaal Udandarao, Ankush Gupta, Samuel Albanie

Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models.

Retrieval · Zero-Shot Learning
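The CLIP pre-training objective mentioned in this abstract is a symmetric contrastive (InfoNCE) loss over paired image and text embeddings. The following is a minimal numpy sketch of that loss with illustrative names; it is not the SuS-X or CLIP codebase.

```python
import numpy as np

def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_feats, text_feats: (N, D) arrays of L2-normalised embeddings,
    where row i of each array forms a matching image-text pair.
    """
    logits = image_feats @ text_feats.T / temperature  # (N, N) similarities
    n = logits.shape[0]

    def xent(l):
        # cross-entropy with the matching pair on the diagonal as the target
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned embeddings (e.g. identical one-hot vectors on both sides) the loss approaches zero; mismatched pairs drive it up, which is what pulls matching image and text embeddings together during training.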

TAP-Vid: A Benchmark for Tracking Any Point in a Video

3 code implementations 7 Nov 2022 Carl Doersch, Ankush Gupta, Larisa Markeeva, Adrià Recasens, Lucas Smaira, Yusuf Aytar, João Carreira, Andrew Zisserman, Yi Yang

Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move.

Optical Flow Estimation · Point Tracking
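The tracking-any-point task this benchmark formalises takes a query point in one frame and asks for its position and visibility in every frame. A minimal sketch of that interface and one position-accuracy ingredient of the evaluation is below; the field and function names are illustrative, not the TAP-Vid API.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PointTrack:
    """One TAP-style track: a query point and its per-frame trajectory."""
    query_xy: np.ndarray    # (2,) pixel location in the query frame
    query_frame: int        # index of the frame the query was placed in
    positions: np.ndarray   # (T, 2) predicted/ground-truth (x, y) per frame
    occluded: np.ndarray    # (T,) bool, True where the point is not visible

def position_accuracy(pred: PointTrack, gt: PointTrack, thresh: float = 8.0) -> float:
    """Fraction of visible ground-truth frames where the predicted point lies
    within `thresh` pixels of the ground truth (one ingredient of the
    benchmark's Jaccard-style evaluation, which also scores occlusion)."""
    visible = ~gt.occluded
    err = np.linalg.norm(pred.positions - gt.positions, axis=1)
    return float((err[visible] < thresh).mean())
```

Separating position error (on visible frames) from occlusion prediction is what lets the benchmark score surface deformation and disappearance rather than just box-level object tracking.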

Is an Object-Centric Video Representation Beneficial for Transfer?

no code implementations 20 Jul 2022 Chuhan Zhang, Ankush Gupta, Andrew Zisserman

The model learns a set of object-centric summary vectors for the video, and uses these vectors to fuse the visual and spatio-temporal trajectory 'modalities' of the video clip.

Action Classification · Object +1
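The abstract describes a set of summary vectors that pool a fused token set built from two modalities. One common way to realise that pattern is cross-attention pooling with learned queries; the sketch below shows that generic mechanism under assumed shapes, and is not the authors' model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def summarise(visual_tokens, traj_tokens, queries):
    """Cross-attention pooling over two fused modalities.

    visual_tokens: (Tv, D) frame/patch features; traj_tokens: (Tt, D)
    trajectory features; queries: (K, D) learned summary-vector queries.
    Returns (K, D): one summary vector per query.
    """
    # fuse the two 'modalities' into a single token set
    tokens = np.concatenate([visual_tokens, traj_tokens], axis=0)   # (Tv+Tt, D)
    # each query attends over all tokens from both modalities
    attn = softmax(queries @ tokens.T / np.sqrt(queries.shape[1]))  # (K, Tv+Tt)
    return attn @ tokens                                            # (K, D)
```

Because every query attends jointly over the concatenated token set, the resulting summary vectors can mix appearance and motion evidence, which is the fusion role the abstract attributes to them.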

Temporal Query Networks for Fine-grained Video Understanding

no code implementations CVPR 2021 Chuhan Zhang, Ankush Gupta, Andrew Zisserman

It attends to relevant segments for each query with a temporal attention mechanism, and can be trained using only the labels for each query.

Action Classification · Action Recognition +1
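The mechanism in this abstract — each query attending to relevant temporal segments and being supervised only by its own label — can be sketched as query-conditioned temporal attention followed by per-query classifier heads. Shapes and names below are assumptions for illustration, not the TQN implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_query_attention(segments, queries, heads):
    """Per-query temporal attention with per-query label heads.

    segments: (T, D) features for T temporal segments of the video;
    queries: (Q, D) learned query vectors (one per fine-grained attribute);
    heads: list of Q weight matrices, heads[q] of shape (D, C_q) mapping
    query q's readout to the logits of its own label set.
    """
    # each query attends over the temporal segments
    attn = softmax(queries @ segments.T / np.sqrt(segments.shape[1]))  # (Q, T)
    readouts = attn @ segments                                         # (Q, D)
    # decode each query's readout with its own classifier head
    return [readouts[q] @ W for q, W in enumerate(heads)]
```

Since each head only produces logits for its own query's labels, a cross-entropy loss per query is enough for training — matching the abstract's claim that only per-query labels are needed, with no temporal annotation.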

Adaptive Text Recognition through Visual Matching

no code implementations ECCV 2020 Chuhan Zhang, Ankush Gupta, Andrew Zisserman

In this work, our objective is to address the problems of generalization and flexibility for text recognition in documents.

Representation Learning

CrossTransformers: spatially-aware few-shot transfer

4 code implementations NeurIPS 2020 Carl Doersch, Ankush Gupta, Andrew Zisserman

In this work, we illustrate how the neural network representations which underpin modern vision systems are subject to supervision collapse, whereby they lose any information that is not necessary for performing the training task, including information that may be necessary for transfer to new tasks or domains.

Self-Supervised Learning

Compliance Change Tracking in Business Process Services

no code implementations 20 Aug 2019 Srikanth G Tamilselvam, Ankush Gupta, Arvind Agarwal

Compliance officers responsible for maintaining adherence constantly struggle to keep up with the large amount of changes in regulatory requirements.

Classification · feature selection +2

Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos

no code implementations CVPR 2020 Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi

We propose KeypointGAN, a new method for recognizing the pose of objects from a single image, which learns using only unlabelled videos and a weak empirical prior on the object poses.

Facial Landmark Detection · Image-to-Image Translation +4

Inductive Visual Localisation: Factorised Training for Superior Generalisation

no code implementations 21 Jul 2018 Ankush Gupta, Andrea Vedaldi, Andrew Zisserman

End-to-end trained Recurrent Neural Networks (RNNs) have been successfully applied to numerous problems that require processing sequences, such as image captioning, machine translation, and text recognition.

Image Captioning · Machine Translation +2
