no code implementations • 25 Apr 2024 • Olivia Wiles, Chuhan Zhang, Isabela Albuquerque, Ivana Kajić, Su Wang, Emanuele Bugliarello, Yasumasa Onoe, Chris Knutsen, Cyrus Rashtchian, Jordi Pont-Tuset, Aida Nematzadeh
Human-rated prompt sets are generally small and the reliability of the ratings -- and thereby the prompt set used to compare models -- is not evaluated.
no code implementations • 9 Dec 2023 • Chuhan Zhang, Wei Pan, Cosimo Della Santina
Motor imagery, an important category in electroencephalogram (EEG) research, often intersects with scenarios demanding low energy consumption, such as portable medical devices and isolated environment operations.
1 code implementation • ICCV 2023 • Chuhan Zhang, Ankush Gupta, Andrew Zisserman
We demonstrate the performance of the object-aware representations learnt by our model, by: (i) evaluating it for strong transfer, i.e. through zero-shot testing, on a number of downstream video-text retrieval and classification benchmarks; and (ii) by using the representations learned as input for long-term video understanding tasks (e.g. Episodic Memory in Ego4D).
no code implementations • 3 May 2023 • Chuhan Zhang, Antoine Miech, Jiajun Shen, Jean-Baptiste Alayrac, Pauline Luc
Large-scale visual language models are widely used as pre-trained models and then adapted for various downstream tasks.
no code implementations • 20 Jul 2022 • Chuhan Zhang, Ankush Gupta, Andrew Zisserman
The model learns a set of object-centric summary vectors for the video, and uses these vectors to fuse the visual and spatio-temporal trajectory 'modalities' of the video clip.
no code implementations • CVPR 2021 • Chuhan Zhang, Ankush Gupta, Andrew Zisserman
It attends to relevant segments for each query with a temporal attention mechanism, and can be trained using only the labels for each query.
Ranked #12 on Action Recognition on Diving-48
no code implementations • ECCV 2020 • Chuhan Zhang, Ankush Gupta, Andrew Zisserman
In this work, our objective is to address the problems of generalization and flexibility for text recognition in documents.