Search Results for author: Yanghao Li

Found 34 papers, 26 papers with code

Idempotence and Perceptual Image Compression

1 code implementation 17 Jan 2024 Tongda Xu, Ziran Zhu, Dailan He, Yanghao Li, Lina Guo, Yuanyuan Wang, Zhe Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

However, we find that, theoretically: 1) a conditional generative model-based perceptual codec satisfies idempotence; 2) an unconditional generative model with an idempotence constraint is equivalent to a conditional generative codec.
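
In this setting, idempotence means that re-compressing a codec's own reconstruction reproduces that reconstruction, i.e. f(f(x)) = f(x) for the encode-decode round trip f. A minimal sketch of the property as a runnable check, with a toy quantizer standing in for a real codec:

```python
import numpy as np

def round_trip(image, codec):
    """Run one encode-decode cycle; `codec` is any lossy compressor."""
    return codec(image)

def is_idempotent(image, codec, tol=1e-6):
    """Check f(f(x)) == f(x): re-compressing the reconstruction
    should leave it (numerically) unchanged."""
    x_hat = round_trip(image, codec)
    x_hat2 = round_trip(x_hat, codec)
    return np.max(np.abs(x_hat2 - x_hat)) < tol

# Toy "codec" that quantizes to 16 levels -- quantization is
# idempotent, so the check passes.
toy_codec = lambda x: np.round(x * 15) / 15
img = np.random.rand(64, 64)
print(is_idempotent(img, toy_codec))  # True
```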

Image Compression

Bandwidth-efficient Inference for Neural Image Compression

no code implementations 6 Sep 2023 Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu

With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices.

Data Compression Image Compression +1

Conditional Perceptual Quality Preserving Image Compression

no code implementations 16 Aug 2023 Tongda Xu, Qian Zhang, Yanghao Li, Dailan He, Zhe Wang, Yuanyuan Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

We propose conditional perceptual quality, an extension of the perceptual quality defined in Blau & Michaeli (2018), by conditioning it on user-defined information.
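
As a hedged reading of that extension (the paper's exact formulation may differ): Blau and Michaeli measure perceptual quality as a divergence between the source and reconstruction distributions, and the conditional variant evaluates that divergence given the user-defined information Y:

```latex
% Perceptual quality (Blau & Michaeli, 2018): d(p_X, p_{\hat{X}})
% Conditional extension (sketch): condition both distributions on Y
\mathbb{E}_{y \sim p_Y}\!\left[ d\!\left( p_{X \mid Y=y},\; p_{\hat{X} \mid Y=y} \right) \right]
```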

Image Compression

R-MAE: Regions Meet Masked Autoencoders

1 code implementation 8 Jun 2023 Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen

In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning.

Contrastive Learning Interactive Segmentation +4

Reversible Vision Transformers

4 code implementations CVPR 2022 Karttikeya Mangalam, Haoqi Fan, Yanghao Li, Chao-yuan Wu, Bo Xiong, Christoph Feichtenhofer, Jitendra Malik

Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5x at roughly identical model complexity, parameters and accuracy, demonstrating the promise of reversible vision transformers as an efficient backbone for hardware-resource-limited training regimes.

Image Classification object-detection +2

Scaling Language-Image Pre-training via Masking

4 code implementations CVPR 2023 Yanghao Li, Haoqi Fan, Ronghang Hu, Christoph Feichtenhofer, Kaiming He

We present Fast Language-Image Pre-training (FLIP), a simple and more efficient method for training CLIP.
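
The efficiency gain comes from randomly masking out a large fraction of image patches during training, so the image encoder processes only the visible subset. A minimal sketch of that masking step (the 50% ratio and the ViT-B/16 token shape are illustrative defaults):

```python
import torch

def random_patch_mask(patch_tokens, mask_ratio=0.5):
    """Randomly drop a fraction of image patch tokens, as in masking-based
    speedups for contrastive pre-training. patch_tokens: (B, N, D)."""
    B, N, D = patch_tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    # Per-sample random permutation of patch indices; keep the first n_keep.
    noise = torch.rand(B, N)
    keep = noise.argsort(dim=1)[:, :n_keep]            # (B, n_keep)
    keep = keep.unsqueeze(-1).expand(-1, -1, D)        # (B, n_keep, D)
    return torch.gather(patch_tokens, dim=1, index=keep)

# The visible subset is fed to the image encoder; the text tower and the
# contrastive loss are unchanged, so each step is roughly (1 - mask_ratio)x
# cheaper on the image side.
tokens = torch.randn(8, 196, 768)   # e.g. ViT-B/16 on 224x224 inputs
visible = random_patch_mask(tokens, mask_ratio=0.5)
print(visible.shape)  # torch.Size([8, 98, 768])
```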

Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization

1 code implementation CVPR 2023 Mengmeng Xu, Yanghao Li, Cheng-Yang Fu, Bernard Ghanem, Tao Xiang, Juan-Manuel Perez-Rua

Our experiments show the proposed adaptations improve egocentric query detection, leading to a better visual query localization system in both 2D and 3D configurations.

Object

Negative Frames Matter in Egocentric Visual Query 2D Localization

1 code implementation 3 Aug 2022 Mengmeng Xu, Cheng-Yang Fu, Yanghao Li, Bernard Ghanem, Juan-Manuel Perez-Rua, Tao Xiang

(1) Repeated gradient computation on the same object leads to inefficient training; (2) the false positive rate is high on background frames.

Object

Masked Autoencoders As Spatiotemporal Learners

3 code implementations 18 May 2022 Christoph Feichtenhofer, Haoqi Fan, Yanghao Li, Kaiming He

We randomly mask out spacetime patches in videos and learn an autoencoder to reconstruct them in pixels.
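
A minimal sketch of that recipe with placeholder encoder/decoder outputs: tokens are split into visible and masked sets at a high masking ratio, and the loss is pixel-space MSE on the masked patches only (the 90% ratio follows the video setting; real module details are omitted):

```python
import torch
import torch.nn.functional as F

def mask_spacetime_patches(video_patches, mask_ratio=0.9):
    """Split tokens into visible/masked sets. video_patches: (B, N, D)
    where N = T_patches * H_patches * W_patches spacetime patches."""
    B, N, D = video_patches.shape
    n_keep = int(N * (1 - mask_ratio))
    order = torch.rand(B, N).argsort(dim=1)
    keep_idx, mask_idx = order[:, :n_keep], order[:, n_keep:]
    visible = torch.gather(video_patches, 1,
                           keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return visible, keep_idx, mask_idx

# Training-step sketch: the encoder sees only visible patches, a light
# decoder predicts raw pixels for the masked ones, and the loss is MSE
# on masked patches only (the decoder output below is a placeholder).
patches = torch.randn(2, 8 * 14 * 14, 768)   # 8 temporal x 14x14 spatial
visible, keep_idx, mask_idx = mask_spacetime_patches(patches, 0.9)
target = torch.gather(patches, 1, mask_idx.unsqueeze(-1).expand(-1, -1, 768))
pred = torch.randn_like(target)              # placeholder decoder output
loss = F.mse_loss(pred, target)
```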

Inductive Bias Representation Learning

Exploring Plain Vision Transformer Backbones for Object Detection

6 code implementations 30 Mar 2022 Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He

This design enables the original ViT architecture to be fine-tuned for object detection without needing to redesign a hierarchical backbone for pre-training.
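
One core ingredient is building multi-scale features for the detector from the ViT's single-scale (stride-16) output using only deconvolution and pooling. A rough sketch of that simple feature pyramid, not the released code (the real implementation also projects all levels to a common channel width):

```python
import torch
import torch.nn as nn

class SimplePyramid(nn.Module):
    """Derive a feature pyramid from a plain ViT's last feature map
    (stride 16) without a hierarchical backbone."""
    def __init__(self, dim=768):
        super().__init__()
        self.up4 = nn.Sequential(                       # stride 16 -> 4
            nn.ConvTranspose2d(dim, dim // 2, 2, stride=2),
            nn.GELU(),
            nn.ConvTranspose2d(dim // 2, dim // 4, 2, stride=2))
        self.up8 = nn.ConvTranspose2d(dim, dim // 2, 2, stride=2)  # -> 8
        self.down32 = nn.MaxPool2d(kernel_size=2, stride=2)        # -> 32

    def forward(self, x):                   # x: (B, dim, H/16, W/16)
        return [self.up4(x), self.up8(x), x, self.down32(x)]

feats = SimplePyramid()(torch.randn(1, 768, 32, 32))
print([f.shape[-1] for f in feats])         # [128, 64, 32, 16]
```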

Instance Segmentation Object +2

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

1 code implementation CVPR 2022 Chao-yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.
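
A minimal sketch of the caching idea, assuming a simplified attention interface: keys/values from past clips are detached and concatenated with the current clip's, so attention can reach far into the past without backpropagating through it:

```python
import torch

class MemoryBank:
    """Cache (detached) keys/values from past clips so attention at the
    current step can look far back at little extra cost -- a rough sketch
    of the online caching idea, not the paper's exact code."""
    def __init__(self, max_clips=4):
        self.max_clips = max_clips
        self.k, self.v = [], []

    def extend(self, k, v):
        # Stop gradients into the past: cached memory is not backprop'd.
        self.k.append(k.detach()); self.v.append(v.detach())
        self.k = self.k[-self.max_clips:]; self.v = self.v[-self.max_clips:]

    def augmented_kv(self, k, v):
        # Current keys/values attend jointly with the cached history.
        return (torch.cat(self.k + [k], dim=1),
                torch.cat(self.v + [v], dim=1))

bank = MemoryBank(max_clips=2)
for _ in range(3):                  # three consecutive clips
    k = v = torch.randn(1, 196, 96)
    k_all, v_all = bank.augmented_kv(k, v)
    bank.extend(k, v)
print(k_all.shape)                  # torch.Size([1, 588, 96]): 2 cached + current
```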

Ranked #3 on Action Anticipation on EPIC-KITCHENS-100 (using extra training data)

Action Anticipation Action Classification +2

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

6 code implementations CVPR 2022 Yanghao Li, Chao-yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.

Ranked #1 on Action Classification on Kinetics-600 (GFLOPs metric)

Action Classification Action Recognition +6

Benchmarking Detection Transfer Learning with Vision Transformers

2 code implementations 22 Nov 2021 Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollar, Kaiming He, Ross Girshick

The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.

Benchmarking object-detection +3

PyTorchVideo: A Deep Learning Library for Video Understanding

1 code implementation 18 Nov 2021 Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.
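
As a quick illustration, loading a pretrained model from the PyTorchVideo model zoo via torch.hub and running a dummy clip (the model name and expected clip shape follow the library's tutorials; check the current hub listing if they have changed):

```python
import torch

# Load a pretrained Slow R50 video classifier from the PyTorchVideo zoo.
model = torch.hub.load('facebookresearch/pytorchvideo', 'slow_r50',
                       pretrained=True).eval()

# The Slow pathway expects a clip tensor of shape (B, C, T, H, W).
clip = torch.randn(1, 3, 8, 224, 224)
with torch.no_grad():
    logits = model(clip)
print(logits.shape)  # (1, 400) -- Kinetics-400 classes
```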

Self-Supervised Learning Video Understanding

Ego4D: Around the World in 3,000 Hours of Egocentric Video

5 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Multiscale Vision Transformers

7 code implementations ICCV 2021 Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals on a variety of video recognition tasks, where it outperforms concurrent vision transformers that rely on large-scale external pre-training and are 5-10x more costly in computation and parameters.

Action Classification Action Recognition +2

Learning Model-Blind Temporal Denoisers without Ground Truths

no code implementations 7 Jul 2020 Yanghao Li, Bichuan Guo, Jiangtao Wen, Zhen Xia, Shan Liu, Yuxing Han

Denoisers trained with synthetic data often fail to cope with the diversity of unknown noises, giving way to methods that can adapt to existing noise without knowing its ground truth.

Denoising Management +2

Modality Compensation Network: Cross-Modal Adaptation for Action Recognition

no code implementations 31 Jan 2020 Sijie Song, Jiaying Liu, Yanghao Li, Zongming Guo

In this work, we propose a Modality Compensation Network (MCN) to explore the relationships of different modalities, and boost the representations for human action recognition.

Action Recognition Optical Flow Estimation +2

EGO-TOPO: Environment Affordances from Egocentric Video

1 code implementation CVPR 2020 Tushar Nagarajan, Yanghao Li, Christoph Feichtenhofer, Kristen Grauman

We introduce a model for environment affordances that is learned directly from egocentric video.

Scale-Aware Trident Networks for Object Detection

4 code implementations ICCV 2019 Yanghao Li, Yuntao Chen, Naiyan Wang, Zhao-Xiang Zhang

In this work, we first present a controlled experiment to investigate the effect of receptive fields for scale variation in object detection.
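
The trident block that follows from this study applies the same weight-shared convolution at several dilation rates, so parallel branches differ only in receptive field. A minimal sketch of that idea, assuming 3x3 kernels and illustrative dilation rates:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TridentConv(nn.Module):
    """Weight-shared 3x3 conv applied at several dilation rates: each
    branch sees a different receptive field while sharing parameters."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 3)):
        super().__init__()
        self.dilations = dilations
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, 3, 3))
        nn.init.kaiming_normal_(self.weight)

    def forward(self, x):
        # Same weights, different dilation => branches differ only in
        # receptive field, isolating its effect on scale variation.
        return [F.conv2d(x, self.weight, padding=d, dilation=d)
                for d in self.dilations]

branches = TridentConv(64, 64)(torch.randn(1, 64, 56, 56))
print([b.shape[-1] for b in branches])  # [56, 56, 56] -- same resolution
```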

Object object-detection +1

PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding

no code implementations 22 Mar 2017 Chunhui Liu, Yueyu Hu, Yanghao Li, Sijie Song, Jiaying Liu

Although many 3D human activity benchmarks have been proposed, most existing action datasets focus on action recognition tasks for segmented videos.

Action Detection Action Recognition +2

Demystifying Neural Style Transfer

3 code implementations 4 Jan 2017 Yanghao Li, Naiyan Wang, Jiaying Liu, Xiaodi Hou

Neural Style Transfer has recently demonstrated very exciting results, attracting wide attention in both academia and industry.
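
The paper's key observation is that the classic Gram-matrix style loss can be read as aligning feature distributions (a maximum mean discrepancy with a second-order polynomial kernel), which links style transfer to domain adaptation. A minimal sketch of the Gram-matrix computation that loss is built on:

```python
import torch

def gram_matrix(features):
    """Channel-wise Gram matrix of a feature map (B, C, H, W). Matching
    Gram matrices across images is the classic style loss; the paper
    reinterprets it as matching feature distributions."""
    B, C, H, W = features.shape
    f = features.reshape(B, C, H * W)
    return f @ f.transpose(1, 2) / (H * W)   # (B, C, C)

# Style loss between two images' features at one layer:
fa, fb = torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32)
style_loss = (gram_matrix(fa) - gram_matrix(fb)).pow(2).mean()
```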

Domain Adaptation Style Transfer

Factorized Bilinear Models for Image Recognition

1 code implementation ICCV 2017 Yanghao Li, Naiyan Wang, Jiaying Liu, Xiaodi Hou

Although Deep Convolutional Neural Networks (CNNs) have unleashed their power in various computer vision tasks, their most important components, convolutional layers and fully connected layers, are still limited to linear transformations.

Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks

no code implementations 24 Mar 2016 Wentao Zhu, Cuiling Lan, Junliang Xing, Wen-Jun Zeng, Yanghao Li, Li Shen, Xiaohui Xie

Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions.

Action Recognition Skeleton Based Action Recognition +1

Revisiting Batch Normalization For Practical Domain Adaptation

1 code implementation 15 Mar 2016 Yanghao Li, Naiyan Wang, Jianping Shi, Jiaying Liu, Xiaodi Hou

However, it remains a common annoyance during the training phase that one has to prepare at least thousands of labeled images to fine-tune a network to a specific domain.
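
The proposed fix (AdaBN) sidesteps labeled fine-tuning by re-estimating BatchNorm statistics on unlabeled target-domain data. A minimal sketch, assuming a standard PyTorch model with BatchNorm2d layers and an unlabeled target loader:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def adapt_bn_stats(model, target_loader, device='cpu'):
    """Re-estimate BatchNorm running statistics on unlabeled target-domain
    data: no labels, no gradient updates; only BN buffers change."""
    # Reset BN buffers so the new running stats reflect only target data.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.reset_running_stats()
            m.momentum = None            # use a cumulative moving average
    model.train()                        # BN updates stats in train mode
    for images in target_loader:         # labels are not needed
        model(images.to(device))
    model.eval()
    return model
```

Call this once before evaluating a source-trained model on the target domain; everything except the BN statistics stays frozen.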

Domain Adaptation Image Classification +2
