Search Results for author: Haoqi Fan

Found 28 papers, 19 papers with code

Reversible Vision Transformers

4 code implementations CVPR 2022 Karttikeya Mangalam, Haoqi Fan, Yanghao Li, Chao-yuan Wu, Bo Xiong, Christoph Feichtenhofer, Jitendra Malik

Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5x at roughly identical model complexity, parameters and accuracy, demonstrating the promise of reversible vision transformers as an efficient backbone for hardware resource-limited training regimes.

Image Classification Object Detection +2
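
The memory savings come from reversible residual coupling: each block's inputs can be recomputed from its outputs during the backward pass, so intermediate activations need not be stored. A minimal RevNet-style sketch of such a block (the two-stream partitioning and layer choices below are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Two-stream reversible residual block: outputs suffice to recover inputs."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def f(self, x):
        h = self.norm1(x)
        return self.attn(h, h, h, need_weights=False)[0]

    def g(self, x):
        return self.mlp(self.norm2(x))

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)   # Y1 = X1 + Attention(X2)
        y2 = x2 + self.g(y1)   # Y2 = X2 + MLP(Y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Recompute the inputs from the outputs, so activations need not be cached.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2
```

In training, a custom autograd function would call `inverse` during backpropagation to regenerate activations on the fly; that bookkeeping is omitted here.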

Scaling Language-Image Pre-training via Masking

4 code implementations CVPR 2023 Yanghao Li, Haoqi Fan, Ronghang Hu, Christoph Feichtenhofer, Kaiming He

We present Fast Language-Image Pre-training (FLIP), a simple and more efficient method for training CLIP.
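
The efficiency gain in FLIP comes from randomly dropping a large fraction of image patches before the image encoder during contrastive training, so each step processes far fewer tokens. A rough sketch of that masking step (the function name and the 50% default ratio are illustrative assumptions, not the released code):

```python
import torch

def random_patch_drop(patch_tokens, mask_ratio=0.5):
    """Keep a random subset of patch tokens per image (FLIP-style sparse encoding).

    patch_tokens: (batch, num_patches, dim) tensor of embedded image patches.
    """
    b, n, d = patch_tokens.shape
    n_keep = int(n * (1 - mask_ratio))
    # Independent random ordering of patch indices for every image in the batch.
    noise = torch.rand(b, n, device=patch_tokens.device)
    keep_idx = noise.argsort(dim=1)[:, :n_keep]           # (b, n_keep)
    keep_idx = keep_idx.unsqueeze(-1).expand(-1, -1, d)   # (b, n_keep, d)
    return patch_tokens.gather(1, keep_idx)               # visible tokens only
```

The remaining visible tokens are encoded and matched against text embeddings with the usual CLIP contrastive loss; unlike MAE, no pixel reconstruction is involved.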

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference

no code implementations CVPR 2023 Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Lin

Vision Transformers (ViTs) have shown impressive performance but still require a high computation cost compared to convolutional neural networks (CNNs); one reason is that ViTs' attention measures global similarities and thus has a quadratic complexity in the number of input tokens.
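
The quadratic cost referred to above comes from materializing the N x N matrix QK^T. Linear-attention variants (Castling-ViT's linear-angular attention is one instance) avoid this by reordering the products so that only d x d summaries are formed; the sketch below shows that generic reordering, not the paper's specific angular kernel:

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard attention: the (N x N) score matrix makes this O(N^2) in tokens.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ v

def linear_attention(q, k, v):
    # Kernelized attention: compute the (d x d) summary K^T V first,
    # so the cost is O(N * d^2) instead of O(N^2 * d).
    q, k = F.elu(q) + 1, F.elu(k) + 1                       # positive feature maps
    kv = k.transpose(-2, -1) @ v                            # (d, d) summary
    normalizer = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)
    return (q @ kv) / (normalizer + 1e-6)
```

For N tokens of dimension d, the second form scales linearly in N, which is the asymptotic argument the abstract alludes to.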

Masked Autoencoders As Spatiotemporal Learners

3 code implementations 18 May 2022 Christoph Feichtenhofer, Haoqi Fan, Yanghao Li, Kaiming He

We randomly mask out spacetime patches in videos and learn an autoencoder to reconstruct them in pixels.

Inductive Bias Representation Learning
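
A compact sketch of the random spacetime masking described above: the video is split into patch tokens, a high fraction is dropped (the 90% default below is my recollection of the paper's ratio and should be treated as an assumption), the encoder sees only the visible tokens, and a light decoder reconstructs pixels at the masked positions. Only the masking bookkeeping is shown:

```python
import torch

def mask_spacetime_patches(tokens, mask_ratio=0.9):
    """Randomly drop spacetime patch tokens for MAE-style pretraining.

    tokens: (batch, num_patches, dim), patches already flattened over (T, H, W).
    Returns the visible tokens plus the index sets needed to restore order.
    """
    b, n, d = tokens.shape
    n_keep = int(n * (1 - mask_ratio))
    shuffle = torch.rand(b, n, device=tokens.device).argsort(dim=1)
    restore = shuffle.argsort(dim=1)                 # inverse permutation
    keep = shuffle[:, :n_keep]
    visible = tokens.gather(1, keep.unsqueeze(-1).expand(-1, -1, d))
    # The decoder appends learned mask tokens, un-shuffles via `restore`,
    # and regresses the raw pixels of the masked patches.
    return visible, keep, restore
```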

On the Importance of Asymmetry for Siamese Representation Learning

1 code implementation CVPR 2022 Xiao Wang, Haoqi Fan, Yuandong Tian, Daisuke Kihara, Xinlei Chen

Many recent self-supervised frameworks for visual representation learning are based on certain forms of Siamese networks.

Representation Learning

Unified Transformer Tracker for Object Tracking

1 code implementation CVPR 2022 Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan

Although UniTrack (Wang et al., 2021) demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit the large-scale tracking datasets for training and performs poorly on single object tracking.

Multiple Object Tracking Object

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

1 code implementation CVPR 2022 Chao-yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.

Ranked #3 on Action Anticipation on EPIC-KITCHENS-100 (using extra training data)

Action Anticipation Action Classification +2
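
The cached "memory" is, in essence, activations (keys and values) from clips processed earlier, which the current clip attends to without gradients flowing back through them. A rough sketch of that online caching pattern (MeMViT's learned memory compression and exact cache length are omitted; the class below is illustrative):

```python
import torch
import torch.nn as nn

class MemoryAttention(nn.Module):
    """Attend over the current clip plus a detached cache of past-clip activations."""
    def __init__(self, dim, heads=8, max_mem=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.max_mem = max_mem   # how many past clips to keep in the cache
        self.memory = []         # list of (batch, tokens, dim) tensors

    def forward(self, x):
        # Past activations are detached: no gradient flows into earlier clips.
        context = torch.cat(self.memory + [x], dim=1) if self.memory else x
        out, _ = self.attn(x, context, context, need_weights=False)
        self.memory = (self.memory + [x.detach()])[-self.max_mem:]
        return out
```

Processing a long video then amounts to feeding its clips through the model in temporal order, with the cache extending the effective temporal context at roughly constant per-step cost.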

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

6 code implementations CVPR 2022 Yanghao Li, Chao-yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.

Ranked #1 on Action Classification on Kinetics-600 (GFLOPs metric)

Action Classification Action Recognition +6

PyTorchVideo: A Deep Learning Library for Video Understanding

1 code implementation 18 Nov 2021 Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.

Self-Supervised Learning Video Understanding
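
A typical entry point is pulling a pretrained video classifier from the model zoo through torch.hub; the repository path, model name, and expected clip shape below follow my recollection of the PyTorchVideo docs and should be verified against the current model zoo:

```python
import torch

# Load a Kinetics-pretrained Slow-only ResNet-50 from the PyTorchVideo hub
# (repo/model names assumed from the model zoo docs; verify before use).
model = torch.hub.load("facebookresearch/pytorchvideo", "slow_r50", pretrained=True)
model = model.eval()

# Dummy clip: (batch, channels, frames, height, width)
clip = torch.randn(1, 3, 8, 224, 224)
with torch.no_grad():
    logits = model(clip)
print(logits.shape)  # expected (1, 400) for Kinetics-400 classes
```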

Multiscale Vision Transformers

7 code implementations ICCV 2021 Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals on a variety of video recognition tasks, where it outperforms concurrent vision transformers that rely on large-scale external pre-training and are 5-10x more costly in computation and parameters.

Action Classification Action Recognition +2
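
The architectural prior in question is a multiscale hierarchy inside the transformer: attention pools its key/value (and, in deeper stages, query) sequences to progressively coarser spatiotemporal resolution while channel width grows. The sketch below illustrates pooling attention with a plain strided pool over the token axis; as far as I recall, MViT uses learned (convolutional) pooling over (T, H, W), so treat this as a schematic rather than the paper's layer:

```python
import torch
import torch.nn as nn

class PoolingAttention(nn.Module):
    """Attention in which K/V are pooled to a smaller number of tokens."""
    def __init__(self, dim, heads=8, kv_stride=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Illustrative pooling along the flattened token axis.
        self.pool = nn.MaxPool1d(kernel_size=kv_stride, stride=kv_stride)

    def forward(self, x):                                     # x: (batch, tokens, dim)
        kv = self.pool(x.transpose(1, 2)).transpose(1, 2)     # fewer key/value tokens
        out, _ = self.attn(x, kv, kv, need_weights=False)
        return out
```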

Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories

no code implementations CVPR 2021 Xitong Yang, Haoqi Fan, Lorenzo Torresani, Larry Davis, Heng Wang

The standard way of training video models entails sampling a single clip from a video at each iteration and optimizing the clip prediction with respect to the video-level label.

Action Detection Action Recognition +1

Multiview Pseudo-Labeling for Semi-supervised Learning from Video

no code implementations ICCV 2021 Bo Xiong, Haoqi Fan, Kristen Grauman, Christoph Feichtenhofer

We present a multiview pseudo-labeling approach to video learning, a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video.

Representation Learning Video Recognition

Improved Baselines with Momentum Contrastive Learning

36 code implementations 9 Mar 2020 Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He

Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR.

Contrastive Learning Data Augmentation +3
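
MoCo's core ingredients, which MoCo v2 refines (MLP projection head, stronger augmentation, cosine learning-rate schedule), are a momentum-updated key encoder and a queue of negative keys scored with an InfoNCE loss. A compact sketch of the momentum update and the loss (the 0.999 momentum and 0.2 temperature follow my recollection of the papers' defaults):

```python
import torch
import torch.nn.functional as F

def momentum_update(encoder_q, encoder_k, m=0.999):
    """Key-encoder parameters track the query encoder via an exponential moving average."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1 - m)

def moco_infonce(q, k, queue, temperature=0.2):
    """q, k: (batch, dim) embeddings of two augmented views; queue: (K, dim) negatives."""
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    l_pos = (q * k).sum(dim=1, keepdim=True)               # (batch, 1) positive logits
    l_neg = q @ F.normalize(queue, dim=1).t()               # (batch, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.shape[0], dtype=torch.long, device=q.device)  # positive at index 0
    return F.cross_entropy(logits, labels)
```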

Stacked Latent Attention for Multimodal Reasoning

no code implementations CVPR 2018 Haoqi Fan, Jiatong Zhou

Attention has been shown to be a pivotal development in deep learning and has been used for a multitude of multimodal learning tasks such as visual question answering and image captioning.

Image Captioning Multimodal Reasoning +2

Efficient K-Shot Learning with Regularized Deep Networks

no code implementations6 Oct 2017 Donghyun Yoo, Haoqi Fan, Vishnu Naresh Boddeti, Kris M. Kitani

To efficiently search for optimal groupings conditioned on the input data, we propose a reinforcement learning search strategy using recurrent networks to learn the optimal group assignments for each network layer.

Going Deeper into First-Person Activity Recognition

no code implementations CVPR 2016 Minghuang Ma, Haoqi Fan, Kris M. Kitani

Our appearance stream encodes prior knowledge of the egocentric paradigm by explicitly training the network to segment hands and localize objects.

Action Recognition Object +1
