2 code implementations • NeurIPS 2021 • Mandela Patrick, Dylan Campbell, Yuki M. Asano, Ishan Misra, Florian Metze, Christoph Feichtenhofer, Andrea Vedaldi, João F. Henriques
In video transformers, the time dimension is often treated in the same way as the two spatial dimensions.
Ranked #15 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)
1 code implementation • ICCV 2021 • Mandela Patrick, Yuki M. Asano, Bernie Huang, Ishan Misra, Florian Metze, João F. Henriques, Andrea Vedaldi
First, for space, we show that spatial augmentations such as cropping also work well for videos, but that, due to their high processing and memory cost, previous implementations could not apply them at a scale sufficient to be effective.
1 code implementation • NAACL 2021 • Po-Yao Huang, Mandela Patrick, Junjie Hu, Graham Neubig, Florian Metze, Alexander Hauptmann
Specifically, we focus on multilingual text-to-video search and propose a Transformer-based model that learns contextualized multilingual multimodal embeddings.
no code implementations • ICLR 2021 • Mandela Patrick, Po-Yao Huang, Yuki M. Asano, Florian Metze, Alexander Hauptmann, João F. Henriques, Andrea Vedaldi
The dominant paradigm for learning video-text representations -- noise contrastive learning -- increases the similarity of the representations of pairs of samples that are known to be related, such as text and video from the same sample, and pushes away the representations of all other pairs.
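The noise-contrastive objective described here can be sketched as an InfoNCE-style loss over a batch of paired embeddings. This is a generic illustration of the paradigm, not the paper's implementation; the function name `info_nce` and the temperature value are assumptions for the example.

```python
import numpy as np

def info_nce(video_emb, text_emb, temperature=0.07):
    """InfoNCE-style contrastive loss: pull each matched (video, text)
    pair together and push all mismatched pairs apart.
    Rows of each array are embeddings; pair i is (video_emb[i], text_emb[i])."""
    # L2-normalise so the dot product is a cosine similarity
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature          # pairwise similarity matrix
    labels = np.arange(len(v))              # i-th video matches i-th text
    # cross-entropy per row: -log softmax at the matching index
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[labels, labels].mean()
```

The loss is near zero when each video embedding is most similar to its own text embedding, and grows as unrelated pairs become as similar as related ones, which is the behaviour the abstract describes.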
no code implementations • 28 Sep 2020 • Mandela Patrick, Yuki M. Asano, Polina Kuznetsova, Ruth Fong, João F. Henriques, Geoffrey Zweig, Andrea Vedaldi
In this paper, we show that, for videos, the answer is more complex, and that better results can be obtained by accounting for the interplay between invariance, distinctiveness, multiple modalities and time.
1 code implementation • NeurIPS 2020 • Yuki M. Asano, Mandela Patrick, Christian Rupprecht, Andrea Vedaldi
A large part of the current success of deep learning lies in the effectiveness of data -- more precisely: labelled data.
1 code implementation • ICCV 2021 • Mandela Patrick, Yuki M. Asano, Polina Kuznetsova, Ruth Fong, João F. Henriques, Geoffrey Zweig, Andrea Vedaldi
In the image domain, excellent representations can be learned by inducing invariance to content-preserving transformations via noise contrastive learning.
2 code implementations • ICCV 2019 • Ruth Fong, Mandela Patrick, Andrea Vedaldi
In this paper, we discuss some of the shortcomings of existing approaches to perturbation analysis and address them by introducing the concept of extremal perturbations, which are theoretically grounded and interpretable.